Publishing Inferences

Once a model's schema has been onboarded, you can publish events (inferences) so that Fiddler can analyze the inference data and help ensure the model performs as expected. Inference data can be either pre-production or production data.

Publish Pre-production Data to a Model

Because pre-production data is available to the customer at model creation (unlike production data), Fiddler supports only batch publishing for pre-production data. You can upload a dataset from a DataFrame, a Parquet file, or a CSV file.

job = model.publish(
    source=DATASET_FILE_PATH,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=DATASET_NAME,
)
job.wait()
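
Since the text above lists CSV and Parquet as the accepted file types, a small pre-flight check can catch a wrong path before a publish job is started. This is only a sketch: `check_dataset_path` is a hypothetical helper, not part of the Fiddler client, and the accepted suffixes are an assumption drawn from the sentence above.

```python
from pathlib import Path

# Assumed from the text above: CSV and Parquet files are accepted.
SUPPORTED_SUFFIXES = {".csv", ".parquet"}


def check_dataset_path(path):
    """Hypothetical helper: return the path as a Path if its suffix looks publishable."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported dataset file type: {p.suffix or '(none)'}")
    return p
```

The returned path could then be passed as `source` to `model.publish`.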

List Pre-production Dataset(s) Published to a Model

You can publish multiple pre-production datasets to a model.

# List the pre-production datasets published to a Model instance
for dataset in model.datasets:
    print(f'Dataset: {dataset.id} - {dataset.name}')

Publish Production Data to a Model

Publish Events/Inferences as a Stream

Fiddler supports publishing event streams to a model.

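from uuid import uuid4

import pandas as pd

# Columns that hold the event timestamp and the event ID
model.event_ts_col = 'timestamp_col'
model.event_id_col = 'event_id_col'
DATASET_FILE_PATH = "dataset.csv"

df = pd.read_csv(DATASET_FILE_PATH)

# Generate event IDs, which are needed later for label updates
df[model.event_id_col] = [str(uuid4()) for _ in range(len(df))]
# _add_timestamp is a helper not defined in this snippet
# (assumed to populate the timestamp column)
_add_timestamp(df=df, event_ts_col=model.event_ts_col)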

event_ids = model.publish(source=df)

print(f'{len(event_ids)} events published')
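
The `_add_timestamp` helper used above is not defined in this section. As a rough, standard-library-only sketch of the same preparation step, the hypothetical function below stamps a list of event dictionaries with a unique ID and a timestamp. The column names are taken from the snippet above. The choice of Unix epoch milliseconds is an assumption, so check which timestamp formats your Fiddler deployment accepts.

```python
import time
from uuid import uuid4


def stamp_events(events, id_col="event_id_col", ts_col="timestamp_col", now_ms=None):
    """Hypothetical helper: return copies of the events with an ID and a timestamp."""
    if now_ms is None:
        # Assumption: timestamps are Unix epoch milliseconds
        now_ms = int(time.time() * 1000)
    stamped = []
    for event in events:
        row = dict(event)  # copy so the caller's dictionaries are not mutated
        row.setdefault(id_col, str(uuid4()))  # keep an existing ID if present
        row.setdefault(ts_col, now_ms)  # keep an existing timestamp if present
        stamped.append(row)
    return stamped
```

Keeping any ID that is already present matters because the same IDs are needed later to attach label updates to the right events.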

Publish Production Events - Batch

The Fiddler client supports publishing micro-batch streams (up to 1K events per batch; the limit is configurable).

# Convert 10 sampled rows into a list of event dictionaries
events = df.sample(10).to_dict(orient='records')

event_ids = model.publish(source=events)

print(f'{len(event_ids)} events published')
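
If you hold more events than fit in one micro-batch, one option is to split the list client-side and publish each slice in turn. The sketch below assumes a limit of 1,000 events per call (the figure quoted above, which is configurable) and takes the publish function as a parameter. `chunked` and `publish_in_chunks` are hypothetical names; in practice you might pass something like `lambda batch: model.publish(source=batch)`.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def publish_in_chunks(events, publish, max_batch=1000):
    """Hypothetical helper: call `publish` once per chunk and collect the returned IDs."""
    event_ids = []
    for batch in chunked(events, max_batch):
        # Assumption: `publish` returns an iterable of event IDs for the batch
        event_ids.extend(publish(batch))
    return event_ids
```

Passing the publish function in as an argument keeps the chunking logic testable without a live Fiddler connection.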

Publish Production Label Updates - Stream

Fiddler supports updating existing events, identified by their event IDs.

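# Build one update per event: the event ID plus the new target value.
# For illustration, every sampled event gets the first class in target_class_order.
updated_events = [
    {
        model.event_id_col: event_id,
        model.spec.targets[0]: model.task_params.target_class_order[0],
    }
    for event_id in df.sample(100)[model.event_id_col]  # df needs at least 100 rows
]

# update=True marks these as updates to existing events
event_ids = model.publish(source=updated_events, update=True)

print(f'{len(event_ids)} events updated')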
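
The example above assigns the same class to a random sample of events, which is handy for a demo. With real ground truth you would typically have one label per event ID. A minimal sketch of building that payload follows; `build_label_updates` is a hypothetical helper, and the column names in the usage line are placeholders standing in for `model.event_id_col` and `model.spec.targets[0]`.

```python
def build_label_updates(labels_by_event_id, id_col, target_col):
    """Hypothetical helper: turn {event_id: label} into a list of update dicts."""
    return [
        {id_col: event_id, target_col: label}
        for event_id, label in labels_by_event_id.items()
    ]


# Placeholder IDs and labels, for illustration only
updates = build_label_updates(
    {"id-1": "churned", "id-2": "retained"}, "event_id_col", "target"
)
```

The resulting list has the same shape as `updated_events` above and could be passed to `model.publish(source=updates, update=True)`.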

© 2024 Fiddler AI