
Publishing Events


In this section, we cover how to publish events for a model in Fiddler. We will do so by sending monitoring events to a specific model and project already housed in Fiddler.

Initialize Fiddler client

We begin this section as usual by establishing a connection to our Fiddler instance. We can establish this connection either by specifying our credentials directly, or by utilizing our fiddler.ini file. More information can be found in the setup tutorial.

import fiddler as fdl

# client = fdl.FiddlerApi(url=url, org_id=org_id, auth_token=auth_token)
client = fdl.FiddlerApi()
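
If you take the fiddler.ini route, the file holds the same credentials the constructor would take. A minimal sketch (field names mirror the constructor arguments; the values shown are placeholders):

[FIDDLER]
url = https://your-org.fiddler.ai
org_id = your_org
auth_token = your_auth_token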

Load Event Log

Event logs must contain the model's input features and predictions. For this demonstration, we have collected events and saved them in a file called events.log.

import pandas as pd
event_log = pd.read_csv('/app/fiddler_samples/samples/datasets/winequality/events.log')

project_id = 'tutorial'
model_id = 'wine_quality_model'
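
Before publishing, it is worth a quick check that the log's columns line up with the model's inputs and outputs. A minimal sketch (the exact column names depend on your model's schema):

# Inspect the available columns and the first few events
print(event_log.columns.tolist())
event_log.head()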

Add monitoring config

We can add a config for an entire org, a project, or a model by using add_monitoring_config. In the following example, we will add the config for a specific model.

config_info = {'min_bin_value': 3600,  # possible values: 300, 3600, 7200, 43200, 86400, 604800 secs
               'time_ranges': ['Day', 'Week', 'Month', 'Quarter', 'Year'],
               'default_time_range': 7200,
               'tag': 'config for wine quality model'}

client.add_monitoring_config(config_info,
                             project_id,
                             model_id)
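
Because only a fixed set of bin widths is accepted, a small client-side check can catch typos before the call. A minimal sketch using the allowed values from the comment above (we assume default_time_range is drawn from the same set, as it is in this example):

# Bin widths (in seconds) accepted by the monitoring config
ALLOWED_BIN_SECS = {300, 3600, 7200, 43200, 86400, 604800}

assert config_info['min_bin_value'] in ALLOWED_BIN_SECS
assert config_info['default_time_range'] in ALLOWED_BIN_SECS  # assumed to share the same set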

Compute and Cache model predictions

For a new model, we need to compute model predictions and cache them before sending events. We can skip this step if the predictions have already been computed.

dataset_id = 'winequality'
client.trigger_model_predictions(project_id, model_id, dataset_id)

Publish Events

First Option

In this step, we simulate traffic for model monitoring by using publish_event. This is equivalent to running our model separately on data and either sending each result to Fiddler as it is produced, or saving the results to a log and sending them later.

For this demonstration, we take the log-based approach. The log contains rows with inputs and predictions. To simulate a realistic time series, we also generate a timestamp for each event, starting eight days in the past and spaced five minutes apart. Real data will ideally carry a timestamp of when the event took place; if none is provided, the current time is used.

We can send the inputs, outputs, and targets, as well as decision variables.

Note

The timestamp must be in UTC milliseconds (milliseconds since the Unix epoch, in UTC).
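
For reference, here is a minimal sketch of converting a Python datetime into the expected UTC-millisecond form (the to_utc_ms helper is ours, not part of the client):

import datetime

def to_utc_ms(dt):
    # Milliseconds since the Unix epoch, in UTC
    return int(dt.astimezone(datetime.timezone.utc).timestamp() * 1000)

to_utc_ms(datetime.datetime.now(datetime.timezone.utc))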

import datetime
import time
from IPython.display import clear_output

NUM_EVENTS_TO_SEND = 50

FIVE_MINUTES_MS = 5 * 60 * 1000
ONE_DAY_MS = 24 * 60 * 60 * 1000

# Start the simulated traffic eight days in the past
start_date = round(time.time() * 1000) - (ONE_DAY_MS * 8)
print(datetime.datetime.fromtimestamp(start_date / 1000.0))

# Convert the dataframe into a list of event dictionaries, one per row
event_list_dict = event_log.sample(n=NUM_EVENTS_TO_SEND, random_state=42).to_dict(orient='records')

for ind, event_dict in enumerate(event_list_dict):
    # Space the simulated events five minutes apart
    event_time = start_date + ind * FIVE_MINUTES_MS
    result = client.publish_event(project_id,
                                  model_id,
                                  event_dict,
                                  event_timestamp=event_time,
                                  event_id=str(ind + 100),
                                  update_event=False)

    readable_timestamp = datetime.datetime.fromtimestamp(event_time / 1000.0)
    clear_output(wait=True)

    print(f'Sending {ind + 1} / {NUM_EVENTS_TO_SEND} \n{readable_timestamp} UTC: \n{event_dict}')
    time.sleep(0.1)
Sending 3 / 50

Note

If we want to update the events later, we need to specify an event_id. To update an event, we need to call publish_event again with the same event_id and update_event=True.
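For example, to amend the first event we sent above (event_id '100'), we can re-publish it with a field changed. A minimal sketch; the 'quality' target column is an assumption based on the wine quality dataset:

# Re-send the same event_id with update_event=True to amend the stored event
updated_event = dict(event_list_dict[0])
updated_event['quality'] = 6  # assumed target column for this dataset
client.publish_event(project_id,
                     model_id,
                     updated_event,
                     event_id='100',
                     update_event=True)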

Second Option

As an alternative, we can send the entire log dataframe at once by using publish_events_log.

We can embed the event_timestamp as a field in the input dataframe and then use the ts_column argument to specify which column holds the timestamp. If no timestamp is provided, the current time will be used.

We can send the inputs, outputs, and targets, as well as decision variables.

Note

The timestamp must be in UTC milliseconds (milliseconds since the Unix epoch, in UTC).

import datetime

now = datetime.datetime.now()
start_date = now - datetime.timedelta(days=2)

# Build one timestamp per event, five minutes apart, starting two days ago
list_timestamp = [start_date + datetime.timedelta(minutes=5) * ind for ind in range(NUM_EVENTS_TO_SEND)]
list_timestamp = [x.isoformat(' ') for x in list_timestamp]

Optionally, we can also embed the event_id as a field in the input data if we want to update those events later.

# Attach per-event ids so these events can be updated later
event_id = [str(x) for x in range(NUM_EVENTS_TO_SEND)]
event_log = pd.concat([event_log.sample(n=NUM_EVENTS_TO_SEND, random_state=42).reset_index(drop=True),
                       pd.Series(list_timestamp, name='timestamp'),
                       pd.Series(event_id, name='__event_id')], axis=1)

client.publish_events_log(project_id,
                          model_id,
                          event_log,
                          ts_column='timestamp')