
Quick Start

This guide walks you through the basic onboarding steps required to use Fiddler for production model monitoring and explainability. API documentation can be found here.

Using Fiddler is a simple three-step process: upload baseline data, register the model, and send monitoring traffic.

Info

This quick start notebook can be downloaded from this link

# Importing all the required packages at the beginning
import pandas as pd
import pathlib
import shutil
import yaml
import datetime
import time
from IPython.display import clear_output
from random import sample, randint

import fiddler as fdl

Client Setup

First, we need to initialize the client object by specifying:

  • url: the Fiddler URL you have been provided, usually of the form ‘XXXXX.fiddler.ai’. Contact us if you don’t have it.
  • org_id: an identifier for the account. See Fiddler_URL/settings/general to find this id (listed as "Organization ID").
  • auth_token: the token used to authenticate access. See Fiddler_URL/settings/credentials to find, create, or change this token.
  • verbose: a boolean, false by default. If set to true, API calls are logged verbosely.

Warning

When verbose is set as true, all the information required for debugging will be logged, including the auth_token.

You can also save this config in a file called fiddler.ini in the same folder as the notebook/script, which saves you from specifying the parameters in every notebook and script.

!pip install fiddler-client

%%writefile fiddler.ini

[FIDDLER]
url = https://your-org.fiddler.ai
org_id = <org_id>
auth_token = YOUR_ORG_TOKEN

# Alternatively, pass the parameters explicitly:
# client = fdl.FiddlerApi(url='https://trial.fiddler.ai', org_id='your_org_id', auth_token='your_auth_token')
client = fdl.FiddlerApi()
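As a sanity check, you can confirm the ini file parses as expected with Python's standard configparser. This only illustrates the file format from above; the values are placeholders and the client reads the file on its own:

```python
import configparser

# Parse the fiddler.ini contents written above and confirm the expected keys exist.
config = configparser.ConfigParser()
config.read_string("""
[FIDDLER]
url = https://your-org.fiddler.ai
org_id = <org_id>
auth_token = YOUR_ORG_TOKEN
""")

section = config["FIDDLER"]
for key in ("url", "org_id", "auth_token"):
    assert key in section, f"missing key: {key}"
print(section["url"])
```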

Step One: Upload Baseline Data

Create Project

First we will create a project, a convenient container for housing the models and datasets associated with a given ML use case.

project_id = 'monitoring_quickstart' # project_id may only contain lowercase letters, numbers or underscore
client.create_project(project_id)
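Since project_id (like the dataset and model ids later) may only contain lowercase letters, numbers, or underscores, a small helper like this (hypothetical, not part of the client) can catch an invalid id before the API call:

```python
import re

def is_valid_fiddler_id(identifier: str) -> bool:
    """Check that an id uses only lowercase letters, numbers, or underscores."""
    return bool(re.fullmatch(r"[a-z0-9_]+", identifier))

print(is_valid_fiddler_id("monitoring_quickstart"))  # True
print(is_valid_fiddler_id("Monitoring-Quickstart"))  # False: uppercase and hyphen
```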

Upload dataset

Next we will upload the dataset (training data or a representative sample of it) that will serve as the baseline for various product capabilities, including monitoring and explainability of the models. For this tutorial, we will use a cleaned version of an auto insurance dataset that can be found here. We are predicting whether a customer will be high value or not.

df = pd.read_csv("../samples/datasets/auto_insurance/data_cleaned.csv")
df.head()
# Uploading the dataset
dataset_id = 'auto_insurance' # dataset_id may only contain lowercase letters, numbers or underscore
client.upload_dataset(project_id=project_id, dataset_id=dataset_id, dataset={'baseline': df},
                      info=fdl.DatasetInfo.from_dataframe(df, max_inferred_cardinality=1000))
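The max_inferred_cardinality argument caps how many distinct values a column may have while still being inferred as categorical. A rough plain-Python illustration of the idea (not Fiddler's actual implementation):

```python
def infer_as_categorical(values, max_inferred_cardinality=1000):
    """Treat a column as categorical when its distinct-value count is small."""
    return len(set(values)) <= max_inferred_cardinality

states = ["CA", "WA", "OR", "CA", "WA"]  # few distinct values -> categorical
premiums = list(range(5000))             # many distinct values -> not categorical
print(infer_as_categorical(states))      # True
print(infer_as_categorical(premiums))    # False
```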

Step Two: Register Model

As you may have noted, the dataset upload step did not ask for the model’s features, targets, or any other model-specific information. That’s because we allow multiple models to be linked to a given dataset schema. Hence we require an infer-model-schema step, which tells us the features relevant to the model and the model task. Here you can specify the input features, the target column, decision columns, metadata columns, and the type of model.

  • We can infer the model task from the target column, or it can be set explicitly. Currently we support three model tasks; for details, refer to fdl.ModelInfo:

    • Regression
    • Binary Classification
    • Multi-class Classification
model_id = 'high_value_classifier' # model_id may only contain lowercase letters, numbers or underscore

outputs = ['probability_high_value'] # output of the model
target = 'high_value' # we're predicting whether the customer is high value (1) or not (0)
decision_cols = ['Campaign_A'] # Based on the predicted high_value - should we send the customer this campaign
#input_features = df.drop(['probability_high_value', 'high_value','Campaign_A'], axis = 1).columns
input_features = df.drop(['high_value','Campaign_A'], axis = 1).columns

model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=client.get_dataset_info(project_id, dataset_id),
    features = input_features,
    target=target,
    decision_cols=decision_cols,
    outputs=outputs,
    input_type=fdl.ModelInputType.TABULAR,
    model_task=fdl.ModelTask.BINARY_CLASSIFICATION,
    display_name='High value prediction model',
    description='This is a Binary Classification model from the tutorial',
)

model_info

Register model

# register model
client.register_model(project_id, model_id, dataset_id, model_info)

Step Three: Simulate Monitoring Traffic

Streaming data example

In this step, we will simulate traffic for model monitoring using publish_event. This is the equivalent of running our model separately on some data and either sending the results to Fiddler immediately, or saving them to a log and sending them at a later point.

For this demonstration, we will be going with a streaming approach. We will utilize a log containing rows with fields corresponding to:

  • inputs
  • predictions
  • labels (targets)
  • decisions

We can find the fields that will be used by referring to our fdl.ModelInfo object.
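The event log's columns should line up with the fields declared in the schema: inputs, outputs, target, and decision columns. A quick sketch of that check, using column names from this tutorial (the input feature subset and the event-log columns here are assumed for illustration):

```python
# Columns declared when building the ModelInfo in Step Two.
input_features = ["customer_lifetime_value", "income"]  # illustrative subset
outputs = ["probability_high_value"]
target = ["high_value"]
decision_cols = ["Campaign_A"]

expected_columns = set(input_features + outputs + target + decision_cols)

# Columns observed in a (hypothetical) event-log row.
event_columns = {"customer_lifetime_value", "income",
                 "probability_high_value", "high_value", "Campaign_A"}

missing = expected_columns - event_columns
print(missing)  # set() -> nothing missing
```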

event_log = pd.read_csv('../samples/datasets/auto_insurance/event_log.csv')
event_log.head()

Now we will publish these rows as events. To most accurately simulate a time series, we will also call a function to generate a timestamp within the last 10 days. Real data will ideally have a timestamp of when the event took place; otherwise, the current time is used.

Note

The timestamp must be in UTC milliseconds. See here for more details
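Since the timestamp must be in UTC milliseconds, the usual conversion is to take a UTC epoch time in seconds and multiply by 1000:

```python
import datetime

def utc_now_ms() -> int:
    """Current time as UTC milliseconds since the epoch."""
    return int(datetime.datetime.now(datetime.timezone.utc).timestamp() * 1000)

# Converting a fixed datetime shows the scale of the value.
fixed = datetime.datetime(2021, 1, 1, tzinfo=datetime.timezone.utc)
fixed_ms = int(fixed.timestamp() * 1000)
print(fixed_ms)  # 1609459200000
```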

DAYS = 10
INTERVAL_MINS = 10
INTERVAL_MS = INTERVAL_MINS*60*1e3
NUM_EVENTS_TO_SEND = int(24*60/INTERVAL_MINS)*DAYS # publish an event every 10 minutes for 10 days
ONE_DAY_MS = 8.64e+7
start_date = round(time.time() * 1000) - (ONE_DAY_MS * DAYS)

# Convert this dataframe into a list of dictionary events, where each event is its own dictionary
event_list_dict = event_log.sample(n=NUM_EVENTS_TO_SEND).to_dict(orient='records')

for ind, event_dict in enumerate(event_list_dict):
    event_ms_time_stamp = start_date + ind * INTERVAL_MS
    client.publish_event(project_id, model_id, event_dict, event_timestamp=event_ms_time_stamp)

    clear_output(wait = True)
    readable_timestamp = datetime.datetime.fromtimestamp(event_ms_time_stamp/1000.0)

    print(f'Sending {ind+1} / {NUM_EVENTS_TO_SEND} \n{readable_timestamp} UTC: \n{event_dict}')
    time.sleep(0.01)

Note

In the case that labels are ingested in a future point, an event can be updated by calling:

  • res = client.publish_event(project_id, model_id, event, event_id=event_id, update_event=True, event_timestamp=row['__occurred_at'])

By setting the update_event flag to true, the event identified by event_id will be updated with whatever additional information you pass in through event, including a target label. See here for more details.
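Conceptually, the update merges the new fields into the previously stored event. A plain-Python illustration of that merge semantics (this is not the Fiddler API; the field values are made up):

```python
# The event as originally published, before the label was known.
stored_event = {"income": 64000, "probability_high_value": 0.83}

# Later, the ground-truth label arrives for the same event_id.
label_update = {"high_value": 1}

# update_event=True behaves like merging the new fields into the stored record.
stored_event.update(label_update)
print(stored_event)
```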

Viewing Monitoring Traffic

We can now consult our Fiddler instance to visualize our monitoring results. We can see our newly created project in the Projects Overview section.


Within our project, we can click high_value_classifier to see the model we created. From there, we can see the traffic reflecting the events we sent by going to the Monitor section at the top.


To learn more about navigating the product, please consult our Product Tour.
