Publishing Inferences
This document explains how to publish pre-production and production data to a Fiddler model for inference data analysis. It provides code examples for batch publishing pre-production data and streaming production data.
Once a model's schema has been onboarded, you can publish events (inferences) so that Fiddler can analyze them and monitor model performance. This data can come in the form of either pre-production or production data.
Publish Pre-Production data to a Model
Because pre-production data is available at model creation (unlike production data), Fiddler supports only batch publication of pre-production data. DataFrames are not supported for this purpose; use a Parquet or CSV file to upload the dataset.
job = model.publish(
    source=DATASET_FILE_PATH,
    environment=fdl.EnvType.PRE_PRODUCTION,
    dataset_name=DATASET_NAME,
)
job.wait()
List pre-production dataset(s) onboarded to a model
You may publish multiple pre-production datasets to a model.
# List the pre-production datasets onboarded to the model
for dataset in model.datasets:
    print(f'Dataset: {dataset.id} - {dataset.name}')
Publish Production data to a model
Publish events/inferences as a stream
Fiddler supports publishing a stream of events to a model.
import pandas as pd
from uuid import uuid4

model.event_ts_col = 'timestamp_col'
model.event_id_col = 'event_id_col'

DATASET_FILE_PATH = 'dataset.csv'
df = pd.read_csv(DATASET_FILE_PATH)

# Generate an event_id for each row; it is needed later for label updates
df[model.event_id_col] = [str(uuid4()) for _ in range(len(df))]
_add_timestamp(df=df, event_ts_col=model.event_ts_col)

event_ids = model.publish(source=df)
print(f'{len(event_ids)} events published')
Publish production events - batch
The Fiddler client supports publishing micro-batches of events (up to 1,000 events per batch, configurable).
# Convert sampled rows to a list of event dictionaries
events = df.sample(10).to_dict(orient='records')

event_ids = model.publish(source=events)
print(f'{len(event_ids)} events published')
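When a DataFrame holds more events than the micro-batch limit, it can be split into batches before publishing. A minimal sketch (the `model.publish` calls themselves are omitted):

```python
import pandas as pd


def to_micro_batches(df: pd.DataFrame, batch_size: int = 1000) -> list:
    # Split the DataFrame into lists of event dictionaries,
    # each at most batch_size events long.
    return [
        df.iloc[start:start + batch_size].to_dict(orient='records')
        for start in range(0, len(df), batch_size)
    ]


# Each resulting batch can then be passed to model.publish(source=batch)
```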
Publish production label updates - stream
Fiddler supports updating existing events by their event IDs.
updated_events = [
    {
        model.event_id_col: event_id,
        model.spec.targets[0]: model.task_params.target_class_order[0],
    }
    for event_id in df.sample(100)[model.event_id_col]
]

event_ids = model.publish(source=updated_events, update=True)
print(f'{len(event_ids)} events updated')
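In practice, updated labels usually come from a ground-truth source keyed by event ID rather than being hard-coded. A minimal sketch, assuming a hypothetical `labels_by_event_id` mapping and column names of your choosing:

```python
def build_label_updates(
    labels_by_event_id: dict,
    event_id_col: str,
    target_col: str,
) -> list:
    # One update record per event: the event ID plus the new label value.
    return [
        {event_id_col: event_id, target_col: label}
        for event_id, label in labels_by_event_id.items()
    ]


# The result can be passed to model.publish(source=..., update=True)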