Publishing Inferences

This document explains how to publish pre-production and production data to a Fiddler model for inference data analysis. It provides code examples for batch publishing pre-production data and streaming production data.

Once a model's schema has been onboarded, you can publish events/inferences so that Fiddler can analyze the inference data and verify the model is performing as expected. Inference data comes in two forms: pre-production and production.

Publish Pre-Production data to a Model

Because pre-production data is available at model creation (unlike production data), Fiddler supports only batch publication of pre-production data. DataFrames are not supported for this purpose; upload the dataset as a Parquet or CSV file.

job = model.publish(
    source=DATASET_FILE_PATH,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=DATASET_NAME,
)
job.wait()
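Since pre-production publishing expects a file path rather than an in-memory DataFrame, data held in a DataFrame must be written out first. A minimal sketch, assuming a hypothetical `baseline_df` and file name (CSV is used here; Parquet works the same way via `to_parquet`):

```python
import pandas as pd

# Hypothetical in-memory baseline; in practice this would be your training data.
baseline_df = pd.DataFrame({'feature': [0.1, 0.5], 'label': [0, 1]})

# Write it to disk so the file path can be passed to model.publish(...).
DATASET_FILE_PATH = 'baseline.csv'
baseline_df.to_csv(DATASET_FILE_PATH, index=False)
```

The resulting `DATASET_FILE_PATH` is what you pass as `source` in the publish call above.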

List pre-production dataset(s) onboarded to a model

You may publish multiple pre-production datasets to a model.

# List of pre-production datasets

for x in model.datasets:
    print(f'Dataset: {x.id} - {x.name}')

Publish Production data to a model

Publish events/inferences as a stream

Fiddler supports publishing a stream of events to a model.

from uuid import uuid4

import pandas as pd

model.event_ts_col = 'timestamp_col'
model.event_id_col = 'event_id_col'
DATASET_FILE_PATH = "dataset.csv"

df = pd.read_csv(DATASET_FILE_PATH)

# Generate event_id which is later needed for label updates
df[model.event_id_col] = [str(uuid4()) for _ in range(len(df))]
# _add_timestamp is a helper (not shown here) that stamps each row with an event timestamp
_add_timestamp(df=df, event_ts_col=model.event_ts_col)

event_ids = model.publish(source=df)

print(f'{len(event_ids)} events published')
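The `_add_timestamp` helper used above is not defined in this snippet. A minimal sketch of such a helper, assuming the current epoch time in milliseconds is an acceptable event timestamp:

```python
import time

import pandas as pd

def _add_timestamp(df: pd.DataFrame, event_ts_col: str) -> None:
    # Stamp every row with the current epoch time in milliseconds.
    df[event_ts_col] = int(time.time() * 1000)

# Hypothetical sample frame standing in for the real inference data.
sample_df = pd.DataFrame({'feature': [1.0, 2.0, 3.0]})
_add_timestamp(df=sample_df, event_ts_col='timestamp_col')
```

If your data already carries real event times, set the timestamp column from those values instead of the wall clock.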

Publish production events - batch

The Fiddler client also supports publishing micro-batches of events (up to 1,000 events per call, configurable).

events = df.sample(10).to_dict(orient='records')  # a list of event dictionaries

event_ids = model.publish(source=events)

print(f'{len(event_ids)} events published')
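To stay under the per-call limit when you have more events than fit in one micro-batch, the list can be split into slices before publishing. A sketch, assuming a hypothetical `micro_batches` helper and a 1,000-event limit:

```python
def micro_batches(events, batch_size=1000):
    # Yield successive slices of at most batch_size events.
    for start in range(0, len(events), batch_size):
        yield events[start:start + batch_size]

# Hypothetical list of event dictionaries larger than one micro-batch.
events = [{'id': i} for i in range(2500)]
batches = list(micro_batches(events))

# Each batch can then be passed to model.publish(source=batch) in turn.
```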

Publish production label updates - stream

Fiddler supports updating existing events by their event IDs, for example to attach labels that arrive after the original inference.

updated_events = [
    {
        model.event_id_col: event_id,
        model.spec.targets[0]: model.task_params.target_class_order[0],
    }
    for event_id in df.sample(100)[model.event_id_col]
]

event_ids = model.publish(source=updated_events, update=True)

print(f'{len(event_ids)} events updated')