If you want to authenticate with Fiddler without passing this information directly into the function call, you can store it in a file named _fiddler.ini_, located in the same directory as your notebook or script.
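A minimal _fiddler.ini_ might look like the following. The section name and keys shown here are assumptions based on common Fiddler client setups; check the client documentation for your version for the exact format:

```ini
; fiddler.ini — hypothetical example values
[FIDDLER]
url = https://your_company.fiddler.ai
org_id = your_org
auth_token = YOUR_AUTH_TOKEN
```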
Projects are used to organize your models and datasets. Each project can represent a machine learning task (e.g. predicting house prices, assessing creditworthiness, or detecting fraud).
A project can contain one or more models (e.g. lin_reg_house_predict, random_forest_house_predict).
You cannot delete a project without first deleting the datasets and models associated with it.
Datasets
Datasets (or baseline datasets) are used for making comparisons with production data.
A baseline dataset should be sampled from your model's training set, so it can serve as a representation of what the model expects to see in production.
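For example, a baseline dataset can be drawn as a random sample of the training data with pandas. The column names and sample size below are purely illustrative:

```python
import pandas as pd

# Illustrative training set; in practice, load the data your model was trained on
df_train = pd.DataFrame({
    'age':    [25, 32, 47, 51, 62, 38, 29, 44],
    'income': [40_000, 55_000, 80_000, 72_000, 90_000, 61_000, 48_000, 67_000],
})

# Draw a fixed-size random sample to serve as the baseline dataset
df_baseline = df_train.sample(n=5, random_state=42)
```

Sampling (rather than uploading the full training set) keeps the baseline small while preserving the distribution the model expects to see.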
A model is a representation of your machine learning model. Each model must have an associated dataset to be used as a baseline for monitoring, explainability, and fairness capabilities.
You do not need to upload your model artifact in order to onboard your model, but doing so will significantly improve the quality of explanations generated by Fiddler.
Vertical scaling: Model deployments support vertical scaling via the cpu and memory parameters. Some models need more CPU or memory to load their artifacts or to process requests.
Scale down: You may want to scale down a model deployment to avoid allocating resources while the model is not in use. Use the active parameter to scale down the deployment.
Event publication is the process of sending your model's prediction logs, or events, to the Fiddler platform. Using the Fiddler Client, events can be published in batch or streaming mode. Using these events, Fiddler will calculate metrics around feature drift, prediction drift, and model performance. These events are also stored in Fiddler to allow for ad hoc segment analysis. Please read the sections that follow to learn more about how to use the Fiddler Client for event publication.
In this example, event_id and inference_date are columns in df_events. Both are optional. If they are not passed, Fiddler generates a unique UUID for each event and uses the current timestamp for event_timestamp.
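These defaults can also be filled in on the client side before publishing. The sketch below shows what the generated values might look like; the data and column names mirror the example above and are not prescribed by Fiddler:

```python
import uuid
from datetime import datetime, timezone

import pandas as pd

# Illustrative events to be published
df_events = pd.DataFrame({'age': [34, 57], 'prediction': [0.12, 0.87]})

# One unique UUID per event, and the current UTC timestamp for the batch
df_events['event_id'] = [str(uuid.uuid4()) for _ in range(len(df_events))]
df_events['inference_date'] = datetime.now(timezone.utc).isoformat()
```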
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'

df_to_update = pd.read_csv('events_update.csv')  # event_id is a column in df_to_update

client.publish_events_batch(
    project_id = PROJECT_ID,
    model_id = MODEL_ID,
    update_event = True,
    batch_source = df_to_update,
    id_field = 'event_id',
)
When updating events, id_field is required as the unique identifier of the previously published events. For details on which columns are eligible to be updated, refer to Updating Events.
{
    'status': 202,
    'job_uuid': '4ae7bd3a-2b3f-4444-b288-d51e07b6736d',
    'files': ['ssoqj_tmpzmczjuob.csv'],
    'message': 'Successfully received the event data. Please allow time for the event ingestion to complete in the Fiddler platform.'
}
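Since publication is asynchronous (note the 202 status), the returned job_uuid can be kept for later inspection. A minimal sketch of handling the response, using the example payload above:

```python
# Example response from publish_events_batch (copied from the payload above)
response = {
    'status': 202,
    'job_uuid': '4ae7bd3a-2b3f-4444-b288-d51e07b6736d',
    'files': ['ssoqj_tmpzmczjuob.csv'],
    'message': 'Successfully received the event data. Please allow time for the '
               'event ingestion to complete in the Fiddler platform.',
}

# 202 means "accepted for ingestion", not "ingestion finished"
if response['status'] != 202:
    raise RuntimeError(f"Event publication failed: {response['message']}")

job_uuid = response['job_uuid']  # keep this to track the ingestion job
```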
from fiddler import BaselineType, WindowSize

PROJECT_NAME = 'example_project'
BASELINE_NAME = 'example_rolling'
DATASET_NAME = 'example_validation'
MODEL_NAME = 'example_model'

client.add_baseline(
    project_id = PROJECT_NAME,
    model_id = MODEL_NAME,
    baseline_id = BASELINE_NAME,
    type = BaselineType.ROLLING_PRODUCTION,
    offset = WindowSize.ONE_MONTH,  # how far back to set the window
    window_size = WindowSize.ONE_WEEK,  # size of the sliding window
)
client.get_baseline
get_baseline retrieves the configuration parameters of an existing baseline.
list_baselines gets all the baselines in a project, or only those attached to a single model within the project.
PROJECT_NAME = 'example_project'
MODEL_NAME = 'example_model'

# list baselines across all models within a project
client.list_baselines(
    project_id = PROJECT_NAME,
)

# list baselines within a model
client.list_baselines(
    project_id = PROJECT_NAME,
    model_id = MODEL_NAME,
)
The Fiddler client can be used to create a variety of alert rules. Rules can be of **Data Drift**, **Performance**, **Data Integrity**, or **Service Metrics** type, and they can compare metrics to absolute values (compare_to = RAW_VALUE) or to relative values from a past time period (compare_to = TIME_PERIOD).
# To add a Performance type alert rule which triggers an email notification
# when the precision metric is 5% higher than it was in the same 1-hour bin one day ago.
import fiddler as fdl
notifications_config = client.build_notifications_config(
emails = "user_1@abc.com, user_2@abc.com",
)
client.add_alert_rule(
name = "perf-gt-5prec-1hr-1d-ago",
project_id = 'project-a',
model_id = 'model-a',
alert_type = fdl.AlertType.PERFORMANCE,
metric = fdl.Metric.PRECISION,
bin_size = fdl.BinSize.ONE_HOUR,
compare_to = fdl.CompareTo.TIME_PERIOD,
compare_period = fdl.ComparePeriod.ONE_DAY,
warning_threshold = 0.05,
critical_threshold = 0.1,
condition = fdl.AlertCondition.GREATER,
priority = fdl.Priority.HIGH,
notifications_config = notifications_config
)
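As a rough illustration of the TIME_PERIOD comparison above: the current bin's metric is compared against the same bin from the comparison period, and the thresholds apply to the relative change. The exact evaluation is performed inside Fiddler; this sketch and its numbers are only an assumption about the semantics:

```python
def relative_increase(current: float, previous: float) -> float:
    """Relative change of a metric versus the comparison period."""
    return (current - previous) / previous

# Precision in the current 1-hour bin vs. the same bin one day ago (made-up values)
current_precision, previous_precision = 0.90, 0.80

# Thresholds mirror warning_threshold = 0.05 and critical_threshold = 0.1 above
warning = relative_increase(current_precision, previous_precision) > 0.05
critical = relative_increase(current_precision, previous_precision) > 0.1
```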
# To add a Data Integrity type alert rule which triggers an email notification when
# published events have more than 5 null values in any 1-hour bin for the _age_ column.
# Notice compare_to = fdl.CompareTo.RAW_VALUE.
import fiddler as fdl
client.add_alert_rule(
name = "age-null-1hr",
project_id = 'project-a',
model_id = 'model-a',
alert_type = fdl.AlertType.DATA_INTEGRITY,
metric = fdl.Metric.MISSING_VALUE,
bin_size = fdl.BinSize.ONE_HOUR,
compare_to = fdl.CompareTo.RAW_VALUE,
priority = fdl.Priority.HIGH,
warning_threshold = 5,
critical_threshold = 10,
condition = fdl.AlertCondition.GREATER,
column = "age",
notifications_config = notifications_config
)
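As a rough illustration, the quantity being alerted on above is simply the number of nulls in the column within a bin. The binning itself is handled by Fiddler; the data below is made up:

```python
import pandas as pd

# Events that arrived within one 1-hour bin (illustrative data)
df_bin = pd.DataFrame({'age': [34, None, 51, None, 28, None]})

null_count = int(df_bin['age'].isna().sum())  # compared against the 5 / 10 thresholds
null_pct = 100 * null_count / len(df_bin)     # the same quantity as a percentage of the bin
```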
# To add a Data Drift type alert rule which triggers an email notification when the
# PSI metric for the 'age' column, computed over a 1-hour bin against the 'baseline_name'
# baseline, exceeds the 0.05 warning threshold (0.1 for critical).
import fiddler as fdl
client.add_baseline(
    project_id = 'project-a',
    model_id = 'model-a',
    baseline_id = 'baseline_name',
    type = fdl.BaselineType.PRE_PRODUCTION,
    dataset_id = 'dataset-a',
)
notifications_config = client.build_notifications_config(
emails = "user_1@abc.com, user_2@abc.com",
)
client.add_alert_rule(
name = "psi-gt-5prec-age-baseline_name",
project_id = 'project-a',
model_id = 'model-a',
alert_type = fdl.AlertType.DATA_DRIFT,
metric = fdl.Metric.PSI,
bin_size = fdl.BinSize.ONE_HOUR,
compare_to = fdl.CompareTo.RAW_VALUE,
warning_threshold = 0.05,
critical_threshold = 0.1,
condition = fdl.AlertCondition.GREATER,
priority = fdl.Priority.HIGH,
notifications_config = notifications_config,
columns = ["age"],
baseline_id = 'baseline_name'
)
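For reference, PSI compares the binned distribution of a column in production against the baseline. The following is a minimal, self-contained sketch of the metric; Fiddler's exact binning and smoothing may differ:

```python
import numpy as np

def psi(baseline, production, bins=10):
    """Population Stability Index over bins derived from the baseline."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    b_counts, _ = np.histogram(baseline, bins=edges)
    p_counts, _ = np.histogram(production, bins=edges)
    # Clip zero proportions to avoid division by zero and log(0)
    b_pct = np.clip(b_counts / b_counts.sum(), 1e-6, None)
    p_pct = np.clip(p_counts / p_counts.sum(), 1e-6, None)
    return float(np.sum((p_pct - b_pct) * np.log(p_pct / b_pct)))
```

Identical distributions give a PSI of 0, and the value grows as the production distribution shifts away from the baseline, which is why the rule above alerts when PSI exceeds a raw threshold.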
# To add a Data Drift type alert rule which triggers an email notification when the
# value of the JSD metric is more than 0.5 in any 1-hour bin for the _age_ or _gender_ columns.
# Notice compare_to = fdl.CompareTo.RAW_VALUE.
import fiddler as fdl
notifications_config = client.build_notifications_config(
emails = "user_1@abc.com, user_2@abc.com",
)
client.add_alert_rule(
name = "jsd_multi_col_1hr",
project_id = 'project-a',
model_id = 'model-a',
alert_type = fdl.AlertType.DATA_DRIFT,
metric = fdl.Metric.JSD,
bin_size = fdl.BinSize.ONE_HOUR,
compare_to = fdl.CompareTo.RAW_VALUE,
warning_threshold = 0.4,
critical_threshold = 0.5,
condition = fdl.AlertCondition.GREATER,
priority = fdl.Priority.HIGH,
notifications_config = notifications_config,
columns = ["age", "gender"],
)
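For reference, the Jensen-Shannon divergence compares two binned distributions and, with base-2 logarithms, is bounded between 0 (identical) and 1 (fully disjoint). A minimal sketch of the metric, not Fiddler's exact implementation:

```python
import numpy as np

def jsd(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2) between two histograms."""
    p = np.asarray(p_counts, dtype=float) / np.sum(p_counts)
    q = np.asarray(q_counts, dtype=float) / np.sum(q_counts)
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))
```

The bounded range makes raw-value thresholds like the 0.4 / 0.5 pair above easy to interpret.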
# To add a Data Integrity type alert rule which triggers an email notification when
# more than 5% of published events have a null value in the _age_ column in any 1-hour bin.
import fiddler as fdl
client.add_alert_rule(
name = "age_null_percentage_greater_than_10",
project_id = 'project-a',
model_id = 'model-a',
alert_type = fdl.AlertType.DATA_INTEGRITY,
metric = 'null_violation_percentage',
bin_size = fdl.BinSize.ONE_HOUR,
compare_to = fdl.CompareTo.RAW_VALUE,
priority = fdl.Priority.HIGH,
warning_threshold = 5,
critical_threshold = 10,
condition = fdl.AlertCondition.GREATER,
column = "age",
notifications_config = notifications_config
)
The Fiddler client can be used to get a list of alert rules that match the given filter parameters.
import fiddler as fdl
alert_rules = client.get_alert_rules(
project_id = 'project-a',
model_id = 'model-a',
alert_type = fdl.AlertType.DATA_INTEGRITY,
metric = fdl.Metric.MISSING_VALUE,
columns = ["age", "gender"],
    ordering = ['critical_threshold'],  # ['-critical_threshold'] for descending order
    limit = 4,  # maximum number of rules to return
    offset = 0,  # page offset (a multiple of limit)
)
client.get_triggered_alerts
📘 Info
The Fiddler client can be used to get a list of triggered alerts for a given alert rule and time range.
triggered_alerts = client.get_triggered_alerts(
alert_rule_uuid = "588744b2-5757-4ae9-9849-1f4e076a58de",
start_time = "2022-05-01",
end_time = "2022-09-30",
    ordering = ['alert_time_bucket'],  # ['-alert_time_bucket'] for descending order
    limit = 4,  # maximum number of alerts to return
    offset = 0,  # page offset
)
client.delete_alert_rule
📘 Info
The Fiddler client can be used to delete an existing alert rule.