client.publish_events_batch

Publishes a batch of events to Fiddler asynchronously.

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| project_id | str | None | The unique identifier for the project. |
| model_id | str | None | The unique identifier for the model. |
| batch_source | Union[pd.DataFrame, str] | None | Either a pandas DataFrame containing a batch of events, or the path to a file containing a batch of events. Supported file types are CSV (.csv), Parquet (.pq), and pickled DataFrame (.pkl). |
| id_field | Optional[str] | None | The field containing event IDs for events in the batch. If not specified, Fiddler will generate its own IDs, which can be retrieved using the get_slice API. |
| update_event | Optional[bool] | None | If True, will only modify existing events, referenced by id_field. If an ID is provided for which there is no event, no change will take place. |
| timestamp_field | Optional[str] | None | The field containing timestamps for events in the batch. The format of these timestamps is given by timestamp_format. If no timestamp is provided for a given row, the current time will be used. |
| timestamp_format | Optional[fdl.FiddlerTimestamp] | fdl.FiddlerTimestamp.INFER | The format of the timestamps in timestamp_field. Can be one of fdl.FiddlerTimestamp.INFER, fdl.FiddlerTimestamp.EPOCH_MILLISECONDS, fdl.FiddlerTimestamp.EPOCH_SECONDS, or fdl.FiddlerTimestamp.ISO_8601. |
| data_source | Optional[fdl.BatchPublishType] | None | The location of the data source provided. By default, Fiddler will try to infer the value. Can be one of fdl.BatchPublishType.DATAFRAME, fdl.BatchPublishType.LOCAL_DISK, or fdl.BatchPublishType.AWS_S3. |
| casting_type | Optional[bool] | False | If True, will try to cast the event data to match the data types defined in the model's ModelInfo object. |
| credentials | Optional[dict] | None | A dictionary containing authorization information for AWS or GCP. For AWS, the expected keys are 'aws_access_key_id', 'aws_secret_access_key', and 'aws_session_token'. For GCP, the expected keys are 'gcs_access_key_id', 'gcs_secret_access_key', and 'gcs_session_token'. |
| group_by | Optional[str] | None | The field used to group events together when computing performance metrics (for ranking models only). |

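
Before publishing from a file or a cloud bucket, it can help to check the inputs locally against the file types and credential keys listed in the parameter descriptions above. The following is a minimal sketch, not part of the Fiddler client: the helper names check_batch_file and missing_credential_keys are hypothetical, and whether every listed credential key is strictly required is an assumption that the descriptions above do not confirm.

```python
from pathlib import Path

# File extensions listed in the batch_source description.
SUPPORTED_EXTENSIONS = {".csv", ".pq", ".pkl"}

# Credential keys listed in the credentials description.
AWS_CREDENTIAL_KEYS = {
    "aws_access_key_id",
    "aws_secret_access_key",
    "aws_session_token",
}
GCP_CREDENTIAL_KEYS = {
    "gcs_access_key_id",
    "gcs_secret_access_key",
    "gcs_session_token",
}


def check_batch_file(path: str) -> str:
    """Return the path unchanged if its extension is a listed file type.

    Hypothetical helper: raises ValueError for any other extension.
    """
    extension = Path(path).suffix
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported batch file type: {extension!r}")
    return path


def missing_credential_keys(credentials: dict, provider: str = "aws") -> list:
    """Return the listed keys that are absent from credentials, sorted by name.

    Hypothetical helper: assumes all listed keys are expected to be present.
    """
    expected = AWS_CREDENTIAL_KEYS if provider == "aws" else GCP_CREDENTIAL_KEYS
    return sorted(expected - set(credentials))
```

A path that passes check_batch_file can be given as batch_source, and a dictionary for which missing_credential_keys returns an empty list contains every key named in the credentials description.
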
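import pandas as pd

# `client` is assumed to be an already initialized Fiddler API client.
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'

# Load the batch of events; the 'inference_date' column holds each event's timestamp.
df_events = pd.read_csv('events.csv')

client.publish_events_batch(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    batch_source=df_events,
    timestamp_field='inference_date',
)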
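
To make the explicit timestamp_format options more concrete, the sketch below expresses a single instant in the three non-inferred representations using only the Python standard library. The chosen date is arbitrary, and which ISO 8601 variants Fiddler accepts is not stated in this reference, so treat the ISO string as one plausible form rather than the required one. The commented call at the end is illustrative only and assumes an initialized client and `import fiddler as fdl`.

```python
from datetime import datetime, timezone

# One instant in time (midnight UTC on 1 January 2023), chosen arbitrarily.
moment = datetime(2023, 1, 1, tzinfo=timezone.utc)

# Seconds since the Unix epoch, as for fdl.FiddlerTimestamp.EPOCH_SECONDS.
epoch_seconds = int(moment.timestamp())

# Milliseconds since the Unix epoch, as for fdl.FiddlerTimestamp.EPOCH_MILLISECONDS.
epoch_milliseconds = epoch_seconds * 1000

# An ISO 8601 string, as for fdl.FiddlerTimestamp.ISO_8601.
iso_8601 = moment.isoformat()

print(epoch_seconds)       # 1672531200
print(epoch_milliseconds)  # 1672531200000
print(iso_8601)            # 2023-01-01T00:00:00+00:00

# Illustrative only: passing an explicit format instead of relying on INFER.
# client.publish_events_batch(
#     project_id=PROJECT_ID,
#     model_id=MODEL_ID,
#     batch_source=df_events,
#     timestamp_field='inference_date',
#     timestamp_format=fdl.FiddlerTimestamp.EPOCH_SECONDS,
# )
```

Stating the format explicitly avoids relying on inference when a column could be read more than one way, for example integer values that might be either seconds or milliseconds.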