Connecting to Fiddler
fdl.FiddlerApi
The Client object is used to communicate with Fiddler. In order to use the client, you'll need to provide authentication details as shown below.
For more information, see Authorizing the Client.
Parameter | Type | Default | Description |
---|---|---|---|
url | str | None | The URL used to connect to Fiddler |
org_id | str | None | The organization ID for a Fiddler instance. Can be found on the General tab of the Settings page. |
auth_token | str | None | The authorization token used to authenticate with Fiddler. Can be found on the Credentials tab of the Settings page. |
proxies | Optional[dict] | None | A dictionary containing proxy URLs. |
verbose | Optional[bool] | False | If True, client calls will be logged verbosely. |
verify | Optional[bool] | True | If False, the client will allow self-signed SSL certificates from the Fiddler server environment. If True, the SSL certificates must be signed by a certificate authority (CA). |
Warning
If verbose is set to True, all information required for debugging will be logged, including the authorization token.
Info
To maximize compatibility, please ensure that your client version matches the server version for your Fiddler instance.
When you connect to Fiddler using the code below, you'll receive a notification if there is a version mismatch between the client and server.
You can install a specific version of fiddler-client using pip:
pip install fiddler-client==X.X.X
import fiddler as fdl
URL = 'https://app.fiddler.ai'
ORG_ID = 'my_org'
AUTH_TOKEN = 'p9uqlkKz1zAA3KAU8kiB6zJkXiQoqFgkUgEa1sv4u58'
client = fdl.FiddlerApi(
url=URL,
org_id=ORG_ID,
auth_token=AUTH_TOKEN
)
import fiddler as fdl
URL = 'https://app.fiddler.ai'
ORG_ID = 'my_org'
AUTH_TOKEN = 'p9uqlkKz1zAA3KAU8kiB6zJkXiQoqFgkUgEa1sv4u58'
client = fdl.FiddlerApi(
url=URL,
org_id=ORG_ID,
auth_token=AUTH_TOKEN,
verify=False
)
proxies = {
'http' : 'http://proxy.example.com:1234',
'https': 'https://proxy.example.com:5678'
}
client = fdl.FiddlerApi(
url=URL,
org_id=ORG_ID,
auth_token=AUTH_TOKEN,
proxies=proxies
)
If you want to authenticate with Fiddler without passing this information directly into the function call, you can store it in a file named fiddler.ini, which should be stored in the same directory as your notebook or script.
%%writefile fiddler.ini
[FIDDLER]
url = https://app.fiddler.ai
org_id = my_org
auth_token = p9uqlkKz1zAA3KAU8kiB6zJkXiQoqFgkUgEa1sv4u58
client = fdl.FiddlerApi()
Projects
Projects are used to organize your models and datasets. Each project can represent a machine learning task (e.g. predicting house prices, assessing creditworthiness, or detecting fraud).
A project can contain one or more models (e.g. lin_reg_house_predict, random_forest_house_predict).
For more information on projects, click here.
client.list_projects
response = client.list_projects()
Return Type | Description |
---|---|
list | A list containing the project ID string for each project |
[
'project_a',
'project_b',
'project_c'
]
client.create_project
Input Parameters | Type | Default | Description |
---|---|---|---|
project_id | str | None | A unique identifier for the project. Must be a lowercase string between 2-30 characters containing only alphanumeric characters and underscores. Additionally, it must not start with a numeric character. |
PROJECT_ID = 'example_project'
client.create_project(
project_id=PROJECT_ID
)
Return Type | Description |
---|---|
dict | A dictionary mapping project_name to the project ID string specified, once the project is successfully created. |
{
'project_name': 'example_project'
}
client.delete_project
Input Parameters | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
PROJECT_ID = 'example_project'
client.delete_project(
project_id=PROJECT_ID
)
Return Type | Description |
---|---|
bool | A boolean denoting whether deletion was successful. |
True
Caution
You cannot delete a project without first deleting the datasets and models associated with that project.
Datasets
Datasets (or baseline datasets) are used for making comparisons with production data.
A baseline dataset should be sampled from your model's training set, so it can serve as a representation of what the model expects to see in production.
For more information, see Uploading a Baseline Dataset.
For guidance on how to design a baseline dataset, see Designing a Baseline Dataset.
client.list_datasets
Input Parameters | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
PROJECT_ID = "example_project"
client.list_datasets(
project_id=PROJECT_ID
)
Return Type | Description |
---|---|
list | A list containing the dataset ID strings for each dataset in the project. |
[
'dataset_a',
'dataset_b',
'dataset_c'
]
client.upload_dataset
Input Parameters | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
dataset | dict | None | A dictionary mapping dataset slice names to pandas DataFrames. |
dataset_id | str | None | A unique identifier for the dataset. Must be a lowercase string between 2-30 characters containing only alphanumeric characters and underscores. Additionally, it must not start with a numeric character. |
info | Optional [fdl.DatasetInfo] | None | The Fiddler fdl.DatasetInfo() object used to describe the dataset. |
size_check_enabled | Optional [bool] | True | If True, will issue a warning when a dataset has a large number of rows. |
import pandas as pd
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(
df=df
)
client.upload_dataset(
project_id=PROJECT_ID,
dataset_id=DATASET_ID,
dataset={
'baseline': df
},
info=dataset_info
)
Return Type | Description |
---|---|
dict | A dictionary containing information about the uploaded dataset. |
{'uuid': '7046dda1-2779-4987-97b4-120e6185cc0b',
'name': 'Ingestion dataset Upload',
'info': {'project_name': 'example_model',
'resource_name': 'acme_data',
'resource_type': 'DATASET'},
'status': 'SUCCESS',
'progress': 100.0,
'error_message': None,
'error_reason': None}
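Because the dataset argument maps slice names to DataFrames, a single upload_dataset call can carry multiple slices, such as separate train and test splits. A minimal sketch with small in-memory DataFrames; the column names are hypothetical, and the upload call is left commented out since it requires an authenticated client:

```python
import pandas as pd

# Hypothetical feature/target columns for illustration
df_train = pd.DataFrame({'feature_1': [1.0, 2.0, 3.0], 'target_column': [0, 1, 0]})
df_test = pd.DataFrame({'feature_1': [4.0], 'target_column': [1]})

# One slice name per DataFrame
dataset = {'train': df_train, 'test': df_test}

# With an authenticated client:
# dataset_info = fdl.DatasetInfo.from_dataframe(df=df_train)
# client.upload_dataset(
#     project_id='example_project',
#     dataset_id='example_dataset_split',
#     dataset=dataset,
#     info=dataset_info
# )
```

Each slice name becomes a queryable partition of the uploaded dataset.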
client.delete_dataset
Input Parameters | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
dataset_id | str | None | A unique identifier for the dataset. |
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
client.delete_dataset(
project_id=PROJECT_ID,
dataset_id=DATASET_ID
)
Return Type | Description |
---|---|
str | A message confirming that the dataset was deleted. |
'Dataset deleted example_dataset'
Caution
You cannot delete a dataset without deleting the models associated with that dataset first.
client.get_dataset_info
Input Parameters | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
dataset_id | str | None | A unique identifier for the dataset. |
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
dataset_info = client.get_dataset_info(
project_id=PROJECT_ID,
dataset_id=DATASET_ID
)
Return Type | Description |
---|---|
fdl.DatasetInfo | The fdl.DatasetInfo() object associated with the specified dataset. |
Models
A model is a representation of your machine learning model. Each model must have an associated dataset to be used as a baseline for monitoring, explainability, and fairness capabilities.
You do not need to upload your model artifact in order to onboard your model, but doing so will significantly improve the quality of explanations generated by Fiddler.
client.add_model
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
model_id | str | None | A unique identifier for the model. Must be a lowercase string between 2-30 characters containing only alphanumeric characters and underscores. Additionally, it must not start with a numeric character. |
dataset_id | str | None | The unique identifier for the dataset. |
model_info | fdl.ModelInfo | None | A fdl.ModelInfo() object containing information about the model. |
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'
dataset_info = client.get_dataset_info(
project_id=PROJECT_ID,
dataset_id=DATASET_ID
)
model_task = fdl.ModelTask.BINARY_CLASSIFICATION
model_target = 'target_column'
model_output = 'output_column'
model_features = [
'feature_1',
'feature_2',
'feature_3'
]
model_info = fdl.ModelInfo.from_dataset_info(
dataset_info=dataset_info,
target=model_target,
outputs=[model_output],
model_task=model_task
)
client.add_model(
project_id=PROJECT_ID,
dataset_id=DATASET_ID,
model_id=MODEL_ID,
model_info=model_info
)
Return Type | Description |
---|---|
str | A message confirming that the model was added. |
client.add_model_artifact
Note
Before calling this function, you must have already added a model using add_model.
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
model_id | str | None | A unique identifier for the model. |
model_dir | str | None | A path to the directory containing all of the model files needed to run the model. |
deployment_params | Optional[fdl.DeploymentParams] | None | Deployment parameters object for tuning the model deployment spec. Supported from server version 23.1 and above with Model Deployment feature enabled. |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
client.add_model_artifact(
project_id=PROJECT_ID,
model_id=MODEL_ID,
model_dir='model_dir/',
)
client.add_model_surrogate
Note
Before calling this function, you must have already added a model using add_model.
Surrogate models are not supported for input_type = fdl.ModelInputType.TEXT.
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | A unique identifier for the project. |
model_id | str | None | A unique identifier for the model. |
deployment_params | Optional[fdl.DeploymentParams] | None | Deployment parameters object for tuning the model deployment spec. |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
client.add_model_surrogate(
project_id=PROJECT_ID,
model_id=MODEL_ID
)
# with deployment_params
client.add_model_surrogate(
project_id=PROJECT_ID,
model_id=MODEL_ID,
deployment_params=fdl.DeploymentParams(cpu=250, memory=500)
)
Return Type | Description |
---|---|
None | Returns None |
client.delete_model
For more information, see Uploading a Model Artifact.
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
model_id | str | None | A unique identifier for the model |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
client.delete_model(
project_id=PROJECT_ID,
model_id=MODEL_ID
)
client.get_model_info
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
model_id | str | None | A unique identifier for the model. Must be a lowercase string between 2-30 characters containing only alphanumeric characters and underscores. Additionally, it must not start with a numeric character. |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
model_info = client.get_model_info(
project_id=PROJECT_ID,
model_id=MODEL_ID
)
Return Type | Description |
---|---|
fdl.ModelInfo | The ModelInfo object associated with the specified model. |
client.list_models
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
PROJECT_ID = 'example_project'
client.list_models(
project_id=PROJECT_ID
)
Return Type | Description |
---|---|
list | A list containing the string ID of each model. |
[
'model_a',
'model_b',
'model_c'
]
client.register_model
Not supported with client 2.0 and above
Please use client.add_model() going forward.
client.trigger_pre_computation
Not supported with client 2.0 and above
This method is called automatically now when calling client.add_model_surrogate() or client.add_model_artifact().
client.update_model
For more information, see Uploading a Model Artifact.
Warning
This function does not allow for changes in a model's schema. The inputs and outputs to the model must remain the same.
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
model_id | str | None | A unique identifier for the model. |
model_dir | pathlib.Path | None | A path to the directory containing all of the model files needed to run the model. |
force_pre_compute | bool | True | If True, re-run precomputation steps for the model. This can also be done manually by calling client.trigger_pre_computation. |
import pathlib
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
model_dir = pathlib.Path('model_dir')
client.update_model(
project_id=PROJECT_ID,
model_id=MODEL_ID,
model_dir=model_dir
)
Return Type | Description |
---|---|
bool | A boolean denoting whether the update was successful. |
True
client.update_model_artifact
Note
Before calling this function, you must have already added a model using add_model_surrogate or add_model_artifact.
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
model_id | str | None | A unique identifier for the model. |
model_dir | str | None | A path to the directory containing all of the model files needed to run the model. |
deployment_params | Optional[fdl.DeploymentParams] | None | Deployment parameters object for tuning the model deployment spec. Supported from server version 23.1 and above with Model Deployment feature enabled. |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
client.update_model_artifact(
project_id=PROJECT_ID,
model_id=MODEL_ID,
model_dir='model_dir/',
)
client.update_model_package
Not supported with client 2.0 and above
Please use client.add_model_artifact() going forward.
client.update_model_surrogate
Note
This method cannot replace a model artifact uploaded via add_model_artifact; it can only re-generate the surrogate model for an existing model.
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | A unique identifier for the project. |
model_id | str | None | A unique identifier for the model. |
deployment_params | Optional[fdl.DeploymentParams] | None | Deployment parameters object for tuning the model deployment spec. |
wait | Optional[bool] | True | Whether to wait for the async job to finish (True) or return immediately (False). |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
client.update_model_surrogate(
project_id=PROJECT_ID,
model_id=MODEL_ID
)
# with deployment_params
client.update_model_surrogate(
project_id=PROJECT_ID,
model_id=MODEL_ID,
deployment_params=fdl.DeploymentParams(cpu=250, memory=500)
)
Return Type | Description |
---|---|
None | Returns None |
Model Deployment
client.get_model_deployment
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
model_id | str | None | The unique identifier for the model. |
PROJECT_NAME = 'example_project'
MODEL_NAME = 'example_model'
client.get_model_deployment(
project_id=PROJECT_NAME,
model_id=MODEL_NAME,
)
Return Type | Description |
---|---|
dict | A dictionary containing all fields related to the model deployment. |
{
id: 106548,
uuid: UUID("123e4567-e89b-12d3-a456-426614174000"),
model_id: "MODEL_NAME",
project_id : "PROJECT_NAME",
organization_id: "ORGANIZATION_NAME",
artifact_type: "PYTHON_PACKAGE",
deployment_type: "BASE_CONTAINER",
active: True,
image_uri: "md-base/python/machine-learning:1.0.0",
replicas: 1,
cpu: 250,
memory: 512,
created_by: {
id: 4839,
full_name: "first_name last_name",
email: "[email protected]",
},
updated_by: {
id: 4839,
full_name: "first_name last_name",
email: "[email protected]",
},
created_at: datetime(2023, 1, 27, 10, 9, 39, 793829),
updated_at: datetime(2023, 1, 30, 17, 3, 17, 813865),
job_uuid: UUID("539j9630-a69b-98d5-g496-326117174805")
}
client.update_model_deployment
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
model_id | str | None | The unique identifier for the model. |
active | Optional [bool] | None | Set False to scale down model deployment and True to scale up. |
replicas | Optional[int] | None | The number of replicas running the model. |
cpu | Optional [int] | None | The amount of CPU (milli cpus) reserved per replica. |
memory | Optional [int] | None | The amount of memory (mebibytes) reserved per replica. |
wait | Optional[bool] | True | Whether to wait for the async job to finish (True) or not (False). |
Example use cases
- Horizontal scaling: Model deployments support horizontal scaling via the replicas parameter. This will create multiple Kubernetes pods internally to handle requests.

PROJECT_NAME = 'example_project'
MODEL_NAME = 'example_model'

# Create 3 Kubernetes pods internally to handle requests
client.update_model_deployment(
    project_id=PROJECT_NAME,
    model_id=MODEL_NAME,
    replicas=3,
)

- Vertical scaling: Model deployments support vertical scaling via the cpu and memory parameters. Some models might need more memory to load the artifacts or process requests.

PROJECT_NAME = 'example_project'
MODEL_NAME = 'example_model'

client.update_model_deployment(
    project_id=PROJECT_NAME,
    model_id=MODEL_NAME,
    cpu=500,
    memory=1024,
)

- Scale down: You may want to scale down model deployments to avoid allocating resources when the model is not in use. Use the active parameter to scale down the deployment.

PROJECT_NAME = 'example_project'
MODEL_NAME = 'example_model'

client.update_model_deployment(
    project_id=PROJECT_NAME,
    model_id=MODEL_NAME,
    active=False,
)

- Scale up: This will re-create the model deployment Kubernetes pods with the resource values available in the database.

PROJECT_NAME = 'example_project'
MODEL_NAME = 'example_model'

client.update_model_deployment(
    project_id=PROJECT_NAME,
    model_id=MODEL_NAME,
    active=True,
)
Return Type | Description |
---|---|
dict | A dictionary containing all fields related to the model deployment. |
Supported from server version 23.1 and above with the Flexible Model Deployment feature enabled.
{
id: 106548,
uuid: UUID("123e4567-e89b-12d3-a456-426614174000"),
model_id: "MODEL_NAME",
project_id : "PROJECT_NAME",
organization_id: "ORGANIZATION_NAME",
artifact_type: "PYTHON_PACKAGE",
deployment_type: "BASE_CONTAINER",
active: True,
image_uri: "md-base/python/machine-learning:1.0.0",
replicas: 1,
cpu: 250,
memory: 512,
created_by: {
id: 4839,
full_name: "first_name last_name",
email: "[email protected]",
},
updated_by: {
id: 4839,
full_name: "first_name last_name",
email: "[email protected]",
},
created_at: datetime(2023, 1, 27, 10, 9, 39, 793829),
updated_at: datetime(2023, 1, 30, 17, 3, 17, 813865),
job_uuid: UUID("539j9630-a69b-98d5-g496-326117174805")
}
Event Publication
Event publication is the process of sending your model's prediction logs, or events, to the Fiddler platform. Using the Fiddler Client, events can be published in batch or streaming mode. Using these events, Fiddler will calculate metrics around feature drift, prediction drift, and model performance. These events are also stored in Fiddler to allow for ad hoc segment analysis. Please read the sections that follow to learn more about how to use the Fiddler Client for event publication.
client.publish_event
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
model_id | str | None | A unique identifier for the model. Must be a lowercase string between 2-30 characters containing only alphanumeric characters and underscores. Additionally, it must not start with a numeric character. |
event | dict | None | A dictionary mapping field names to field values. Any fields found that are not present in the model's ModelInfo object will be dropped from the event. |
event_id | Optional[str] | None | A unique identifier for the event. If not specified, Fiddler will generate its own ID, which can be retrieved using the get_slice API. |
update_event | Optional [bool] | None | If True, will only modify an existing event, referenced by event_id. If no event is found, no change will take place. |
event_timestamp | Optional[int] | None | The timestamp at which the event took place, in the format given by timestamp_format. If no timestamp is provided, the current time will be used. |
timestamp_format | Optional[fdl.FiddlerTimestamp] | fdl.FiddlerTimestamp.INFER | The format of the timestamp passed in event_timestamp. Can be one of fdl.FiddlerTimestamp.INFER, fdl.FiddlerTimestamp.EPOCH_MILLISECONDS, fdl.FiddlerTimestamp.EPOCH_SECONDS, or fdl.FiddlerTimestamp.ISO_8601. |
casting_type | Optional [bool] | False | If True, will try to cast the data in event to be in line with the data types defined in the model's ModelInfo object. |
dry_run | Optional [bool] | False | If True, the event will not be published, and instead a report will be generated with information about any problems with the event. Useful for debugging issues with event publishing. |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
example_event = {
'feature_1': 20.7,
'feature_2': 45000,
'feature_3': True,
'output_column': 0.79,
'target_column': 1
}
client.publish_event(
project_id=PROJECT_ID,
model_id=MODEL_ID,
event=example_event,
event_id='event_001',
event_timestamp=1637344470000
)
Return Type | Description |
---|---|
str | A string containing a UUID acknowledging that the event was successfully received. |
'66cfbeb6-5651-4e8b-893f-90286f435b8d'
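If your timestamps are strings rather than epoch milliseconds, you can pass an ISO 8601 value and set timestamp_format accordingly. A sketch of such a payload, assuming the same hypothetical model schema as above; the publish call is left commented out since it requires an authenticated client:

```python
from datetime import datetime, timezone

# Build the event payload (hypothetical field names)
example_event = {
    'feature_1': 20.7,
    'feature_2': 45000,
    'feature_3': True,
    'output_column': 0.79,
    'target_column': 1
}

# An ISO 8601 timestamp string instead of epoch milliseconds
event_time = datetime(2021, 11, 19, 17, 54, 30, tzinfo=timezone.utc).isoformat()

# With an authenticated client:
# client.publish_event(
#     project_id='example_project',
#     model_id='example_model',
#     event=example_event,
#     event_id='event_002',
#     event_timestamp=event_time,
#     timestamp_format=fdl.FiddlerTimestamp.ISO_8601
# )
```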
client.publish_events_batch
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
model_id | str | None | A unique identifier for the model. |
batch_source | Union[pd.DataFrame, str] | None | Either a pandas DataFrame containing a batch of events, or the path to a file containing a batch of events. Supported file types are CSV (.csv), Parquet (.pq), and pickled DataFrame (.pkl). |
id_field | Optional[str] | None | The field containing event IDs for events in the batch. If not specified, Fiddler will generate its own ID, which can be retrieved using the get_slice API. |
update_event | Optional [bool] | None | If True, will only modify an existing event, referenced by id_field. If an ID is provided for which there is no event, no change will take place. |
timestamp_field | Optional [str] | None | The field containing timestamps for events in the batch. The format of these timestamps is given by timestamp_format. If no timestamp is provided for a given row, the current time will be used. |
timestamp_format | Optional[fdl.FiddlerTimestamp] | fdl.FiddlerTimestamp.INFER | The format of the timestamps in timestamp_field. Can be one of fdl.FiddlerTimestamp.INFER, fdl.FiddlerTimestamp.EPOCH_MILLISECONDS, fdl.FiddlerTimestamp.EPOCH_SECONDS, or fdl.FiddlerTimestamp.ISO_8601. |
data_source | Optional[fdl.BatchPublishType] | None | The location of the data source provided. By default, Fiddler will try to infer the value. Can be one of fdl.BatchPublishType.DATAFRAME, fdl.BatchPublishType.LOCAL_DISK, or fdl.BatchPublishType.AWS_S3. |
casting_type | Optional [bool] | False | If True, will try to cast the data in event to be in line with the data types defined in the model's ModelInfo object. |
credentials | Optional[dict] | None | A dictionary containing authorization information for AWS or GCP. For AWS, the expected keys are 'aws_access_key_id', 'aws_secret_access_key', and 'aws_session_token'. For GCP, the expected keys are 'gcs_access_key_id', 'gcs_secret_access_key', and 'gcs_session_token'. |
group_by | Optional [str] | None | The field used to group events together when computing performance metrics (for ranking models only). |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
df_events = pd.read_csv('events.csv')
client.publish_events_batch(
project_id=PROJECT_ID,
model_id=MODEL_ID,
batch_source=df_events,
timestamp_field='inference_date')
Return Type | Description |
---|---|
dict | A dictionary object which reports the result of the batch publication. |
{'status': 202,
'job_uuid': '4ae7bd3a-2b3f-4444-b288-d51e07b6736d',
'files': ['ssoqj_tmpzmczjuob.csv'],
'message': 'Successfully received the event data. Please allow time for the event ingestion to complete in the Fiddler platform.'}
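batch_source can also be a file path instead of a DataFrame, in which case Fiddler reads the batch from disk (data_source can be set explicitly with fdl.BatchPublishType.LOCAL_DISK). A sketch that writes a small CSV of events, with hypothetical column names; the publish call is left commented out since it requires an authenticated client:

```python
import csv

# Write a tiny batch of events to a CSV file (hypothetical column names)
rows = [
    {'feature_1': 20.7, 'output_column': 0.79, 'target_column': 1,
     'inference_date': '2023-01-15 10:30:00'},
    {'feature_1': 18.2, 'output_column': 0.21, 'target_column': 0,
     'inference_date': '2023-01-15 11:00:00'},
]
with open('events.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)

# With an authenticated client:
# client.publish_events_batch(
#     project_id='example_project',
#     model_id='example_model',
#     batch_source='events.csv',
#     timestamp_field='inference_date',
#     data_source=fdl.BatchPublishType.LOCAL_DISK
# )
```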
client.publish_events_batch_schema
Input Parameter | Type | Default | Description |
---|---|---|---|
batch_source | Union[pd.DataFrame, str] | None | Either a pandas DataFrame containing a batch of events, or the path to a file containing a batch of events. Supported file types are CSV (.csv). |
publish_schema | dict | None | A dictionary used for locating fields within complex or nested data structures. |
data_source | Optional[fdl.BatchPublishType] | None | The location of the data source provided. By default, Fiddler will try to infer the value. Can be one of fdl.BatchPublishType.DATAFRAME, fdl.BatchPublishType.LOCAL_DISK, or fdl.BatchPublishType.AWS_S3. |
credentials | Optional[dict] | None | A dictionary containing authorization information for AWS or GCP. For AWS, the expected keys are 'aws_access_key_id', 'aws_secret_access_key', and 'aws_session_token'. For GCP, the expected keys are 'gcs_access_key_id', 'gcs_secret_access_key', and 'gcs_session_token'. |
group_by | Optional [str] | None | The field used to group events together when computing performance metrics (for ranking models only). |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
path_to_batch = 'events_batch.avro'
schema = {
'__static': {
'__project': PROJECT_ID,
'__model': MODEL_ID
},
'__dynamic': {
'feature_1': 'features/feature_1',
'feature_2': 'features/feature_2',
'feature_3': 'features/feature_3',
'output_column': 'outputs/output_column',
'target_column': 'targets/target_column'
}
}
Reserved schema keys include __org, __model, __project, __timestamp, __default_timestamp, __timestamp_format, __event_id, __is_update_event, __status, __latency, and __iterator_key.
client.publish_events_batch_schema(
batch_source=path_to_batch,
publish_schema=schema
)
Return Type | Description |
---|---|
dict | A dictionary object which reports the result of the batch publication. |
{'status': 202,
'job_uuid': '5ae7bd3a-2b3f-4444-b288-d51e098a01d',
'files': ['rroqj_tmpzmczjttb.csv'],
'message': 'Successfully received the event data. Please allow time for the event ingestion to complete in the Fiddler platform.'}
Baselines
client.add_baseline
Input Parameter | Type | Required | Description |
---|---|---|---|
project_id | string | Yes | The unique identifier for the project |
model_id | string | Yes | The unique identifier for the model |
baseline_id | string | Yes | The unique identifier for the baseline |
type | fdl.BaselineType | Yes | One of: PRE_PRODUCTION, STATIC_PRODUCTION, ROLLING_PRODUCTION |
dataset_id | string | No | Training or validation dataset uploaded to Fiddler for a PRE_PRODUCTION baseline |
start_time | int | No | Seconds since epoch to be used as the start time for a STATIC_PRODUCTION baseline |
end_time | int | No | Seconds since epoch to be used as the end time for a STATIC_PRODUCTION baseline |
offset | fdl.WindowSize | No | Offset in seconds relative to the current time to be used for a ROLLING_PRODUCTION baseline |
window_size | fdl.WindowSize | No | Width of the window in seconds to be used for a ROLLING_PRODUCTION baseline |
Add a pre-production baseline
PROJECT_NAME = 'example_project'
BASELINE_NAME = 'example_pre'
DATASET_NAME = 'example_validation'
MODEL_NAME = 'example_model'
client.add_baseline(
project_id=PROJECT_NAME,
model_id=MODEL_NAME,
baseline_id=BASELINE_NAME,
type=BaselineType.PRE_PRODUCTION,
dataset_id=DATASET_NAME,
)
Add a static production baseline
from datetime import datetime
from fiddler import BaselineType, WindowSize
start = datetime(2023, 1, 1, 0, 0) # 12 am, 1st Jan 2023
end = datetime(2023, 1, 2, 0, 0) # 12 am, 2nd Jan 2023
PROJECT_NAME = 'example_project'
BASELINE_NAME = 'example_static'
DATASET_NAME = 'example_dataset'
MODEL_NAME = 'example_model'
START_TIME = int(start.timestamp())
END_TIME = int(end.timestamp())
client.add_baseline(
project_id=PROJECT_NAME,
model_id=MODEL_NAME,
baseline_id=BASELINE_NAME,
type=BaselineType.STATIC_PRODUCTION,
start_time=START_TIME,
end_time=END_TIME,
)
Add a rolling time window baseline
from fiddler import BaselineType, WindowSize
PROJECT_NAME = 'example_project'
BASELINE_NAME = 'example_rolling'
DATASET_NAME = 'example_validation'
MODEL_NAME = 'example_model'
client.add_baseline(
project_id=PROJECT_NAME,
model_id=MODEL_NAME,
baseline_id=BASELINE_NAME,
type=BaselineType.ROLLING_PRODUCTION,
offset=WindowSize.ONE_MONTH, # How far back to set our window
window_size=WindowSize.ONE_WEEK, # Size of the sliding window
)
Return Type | Description |
---|---|
fdl.Baseline | Baseline schema object with all the configuration parameters |
client.get_baseline
get_baseline retrieves the configuration parameters of an existing baseline.
Input Parameter | Type | Required | Description |
---|---|---|---|
project_id | string | Yes | The unique identifier for the project |
model_id | string | Yes | The unique identifier for the model |
baseline_id | string | Yes | The unique identifier for the baseline |
PROJECT_NAME = 'example_project'
MODEL_NAME = 'example_model'
BASELINE_NAME = 'example_preconfigured'
baseline = client.get_baseline(
project_id=PROJECT_NAME,
model_id=MODEL_NAME,
baseline_id=BASELINE_NAME,
)
Return Type | Description |
---|---|
fdl.Baseline | Baseline schema object with all the configuration parameters |
client.list_baselines
Gets all the baselines in a project, or those attached to a single model within a project.
Input Parameter | Type | Required | Description |
---|---|---|---|
project_id | string | Yes | The unique identifier for the project |
model_id | string | No | The unique identifier for the model |
PROJECT_NAME = 'example_project'
MODEL_NAME = 'example_model'
# list baselines across all models within a project
client.list_baselines(
project_id=PROJECT_NAME
)
# list baselines within a model
client.list_baselines(
project_id=PROJECT_NAME,
model_id=MODEL_NAME,
)
Return Type | Description |
---|---|
List[fdl.Baseline] | List of baseline config objects |
client.delete_baseline
Deletes an existing baseline from a project
Input Parameter | Type | Required | Description |
---|---|---|---|
project_id | string | Yes | The unique identifier for the project |
model_id | string | Yes | The unique identifier for the model |
baseline_id | string | Yes | The unique identifier for the baseline |
PROJECT_NAME = 'example_project'
MODEL_NAME = 'example_model'
BASELINE_NAME = 'example_preconfigured'
client.delete_baseline(
project_id=PROJECT_NAME,
model_id=MODEL_NAME,
baseline_id=BASELINE_NAME,
)
Monitoring
client.add_monitoring_config
Input Parameters | Type | Default | Description |
---|---|---|---|
config_info | dict | None | Monitoring config info for an entire org or a project or a model. |
project_id | Optional [str] | None | The unique identifier for the project. |
model_id | Optional [str] | None | The unique identifier for the model. |
Info
add_monitoring_config can be applied at the model, project, or organization level.
- If project_id and model_id are specified, the configuration will be applied at the model level.
- If project_id is specified but model_id is not, the configuration will be applied at the project level.
- If neither project_id nor model_id are specified, the configuration will be applied at the organization level.
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
monitoring_config = {
'min_bin_value': 3600,
'time_ranges': ['Day', 'Week', 'Month', 'Quarter', 'Year'],
'default_time_range': 7200
}
client.add_monitoring_config(
config_info=monitoring_config,
project_id=PROJECT_ID,
model_id=MODEL_ID
)
client.add_alert_rule
Input Parameters | Type | Default | Description |
---|---|---|---|
name | str | None | A name for the alert rule |
project_id | str | None | The unique identifier for the project. |
model_id | str | None | The unique identifier for the model. |
alert_type | fdl.AlertType | None | One of AlertType.PERFORMANCE, AlertType.DATA_DRIFT, AlertType.DATA_INTEGRITY, AlertType.SERVICE_METRICS, or AlertType.STATISTIC |
metric | fdl.Metric | None | When alert_type is AlertType.SERVICE_METRICS, this should be Metric.TRAFFIC. When alert_type is AlertType.PERFORMANCE, choose one of the following based on the ML model task. For binary classification: Metric.ACCURACY, Metric.TPR, Metric.FPR, Metric.PRECISION, Metric.RECALL, Metric.F1_SCORE, Metric.ECE, Metric.AUC. For regression: Metric.R2, Metric.MSE, Metric.MAE, Metric.MAPE, Metric.WMAPE. For multi-class classification: Metric.ACCURACY, Metric.LOG_LOSS. For ranking: Metric.MAP, Metric.MEAN_NDCG. When alert_type is AlertType.DATA_DRIFT, choose one of: Metric.PSI, Metric.JSD. When alert_type is AlertType.DATA_INTEGRITY, choose one of: Metric.RANGE_VIOLATION, Metric.MISSING_VALUE, Metric.TYPE_VIOLATION. When alert_type is AlertType.STATISTIC, choose one of: Metric.AVERAGE, Metric.SUM, Metric.FREQUENCY |
bin_size | fdl.BinSize | ONE_DAY | Duration for which the metric value is calculated. One of BinSize.ONE_HOUR, BinSize.ONE_DAY, BinSize.SEVEN_DAYS |
compare_to | fdl.CompareTo | None | Whether the metric value is compared against a static value or against the same bin from a previous time period. One of CompareTo.RAW_VALUE, CompareTo.TIME_PERIOD |
compare_period | fdl.ComparePeriod | None | Required only when compare_to is CompareTo.TIME_PERIOD. One of ComparePeriod.ONE_DAY, ComparePeriod.SEVEN_DAYS, ComparePeriod.ONE_MONTH, ComparePeriod.THREE_MONTHS |
priority | fdl.Priority | None | One of Priority.LOW, Priority.MEDIUM, Priority.HIGH |
warning_threshold | float | None | [Optional] Threshold which, when crossed, triggers a warning-severity alert. This should be a decimal representing a percentage (e.g. 0.45). |
critical_threshold | float | None | Threshold which, when crossed, triggers a critical-severity alert. This should be a decimal representing a percentage (e.g. 0.45). |
condition | fdl.AlertCondition | None | Specifies whether the rule should trigger when the metric is greater than or less than the thresholds. One of AlertCondition.LESSER, AlertCondition.GREATER |
notifications_config | Dict[str, Dict[str, Any]] | None | [Optional] Notifications config object created using the helper method build_notifications_config() |
columns | List[str] | None | Column names on which the alert rule is to be created. Applicable only when alert_type is AlertType.DATA_INTEGRITY or AlertType.DATA_DRIFT. When alert_type is AlertType.DATA_INTEGRITY, it can take ['__ANY__'] to check all columns. |
baseline_id | str | None | Name of the baseline whose histogram is compared against the histogram derived from current data. If no baseline_id is specified, the default baseline is used. Used only when alert_type is AlertType.DATA_DRIFT. |
segment | str | None | The segment to alert on. See Segments for more details. |
Info
The Fiddler client can be used to create a variety of alert rules. Rules can be of the Data Drift, Performance, Data Integrity, or Service Metrics types, and they can compare against absolute values (compare_to = RAW_VALUE) or relative values (compare_to = TIME_PERIOD).
# To add a Performance-type alert rule which triggers an email notification
# when the precision metric is 5% higher than the same 1-hour bin one day ago.
import fiddler as fdl
notifications_config = client.build_notifications_config(
emails = "[email protected], [email protected]",
)
client.add_alert_rule(
name = "perf-gt-5prec-1hr-1d-ago",
project_id = 'project-a',
model_id = 'model-a',
alert_type = fdl.AlertType.PERFORMANCE,
metric = fdl.Metric.PRECISION,
bin_size = fdl.BinSize.ONE_HOUR,
compare_to = fdl.CompareTo.TIME_PERIOD,
compare_period = fdl.ComparePeriod.ONE_DAY,
warning_threshold = 0.05,
critical_threshold = 0.1,
condition = fdl.AlertCondition.GREATER,
priority = fdl.Priority.HIGH,
notifications_config = notifications_config
)
# To add a Data Integrity alert rule which triggers an email notification when
# published events have more than 5 null values in any 1-hour bin for the age column.
# Note compare_to = fdl.CompareTo.RAW_VALUE.
import fiddler as fdl
client.add_alert_rule(
name = "age-null-1hr",
project_id = 'project-a',
model_id = 'model-a',
alert_type = fdl.AlertType.DATA_INTEGRITY,
metric = fdl.Metric.MISSING_VALUE,
bin_size = fdl.BinSize.ONE_HOUR,
compare_to = fdl.CompareTo.RAW_VALUE,
priority = fdl.Priority.HIGH,
warning_threshold = 5,
critical_threshold = 10,
condition = fdl.AlertCondition.GREATER,
column = "age",
notifications_config = notifications_config
)
# To add a Data Drift type alert rule which triggers an email notification
# when the PSI metric for the 'age' column over a 1-hour bin is 5% higher than the 'baseline_name' baseline.
import fiddler as fdl
client.add_baseline(
    project_id='project-a',
    model_id='model-a',
    baseline_name='baseline_name',
    type=fdl.BaselineType.PRE_PRODUCTION,
    dataset_id='dataset-a'
)
notifications_config = client.build_notifications_config(
emails = "[email protected], [email protected]",
)
client.add_alert_rule(
name = "psi-gt-5prec-age-baseline_name",
project_id = 'project-a',
model_id = 'model-a',
alert_type = fdl.AlertType.DATA_DRIFT,
metric = fdl.Metric.PSI,
bin_size = fdl.BinSize.ONE_HOUR,
compare_to = fdl.CompareTo.RAW_VALUE,
warning_threshold = 0.05,
critical_threshold = 0.1,
condition = fdl.AlertCondition.GREATER,
priority = fdl.Priority.HIGH,
notifications_config = notifications_config,
columns = ["age"],
baseline_id = 'baseline_name'
)
# To add a Data Drift type alert rule which triggers an email notification when
# the JSD metric value is more than 0.5 in a 1-hour bin for the age or gender columns.
# Note compare_to = fdl.CompareTo.RAW_VALUE.
import fiddler as fdl
notifications_config = client.build_notifications_config(
emails = "[email protected], [email protected]",
)
client.add_alert_rule(
name = "jsd_multi_col_1hr",
project_id = 'project-a',
model_id = 'model-a',
alert_type = fdl.AlertType.DATA_DRIFT,
metric = fdl.Metric.JSD,
bin_size = fdl.BinSize.ONE_HOUR,
compare_to = fdl.CompareTo.RAW_VALUE,
warning_threshold = 0.4,
critical_threshold = 0.5,
condition = fdl.AlertCondition.GREATER,
priority = fdl.Priority.HIGH,
notifications_config = notifications_config,
columns = ["age", "gender"],
)
# To add a Data Integrity alert rule which triggers an email notification when
# published events have more than 5 percent null values in any 1-hour bin for the age column.
import fiddler as fdl
client.add_alert_rule(
name = "age_null_percentage_greater_than_10",
project_id = 'project-a',
model_id = 'model-a',
alert_type = fdl.AlertType.DATA_INTEGRITY,
metric = 'null_violation_percentage',
bin_size = fdl.BinSize.ONE_HOUR,
compare_to = fdl.CompareTo.RAW_VALUE,
priority = fdl.Priority.HIGH,
warning_threshold = 5,
critical_threshold = 10,
condition = fdl.AlertCondition.GREATER,
column = "age",
notifications_config = notifications_config
)
Return Type | Description |
---|---|
AlertRule | The created alert rule object |
Example responses:
[AlertRule(alert_rule_uuid='9b8711fa-735e-4a72-977c-c4c8b16543ae',
organization_name='some_org_name',
project_id='project-a',
model_id='model-a',
name='perf-gt-5prec-1hr-1d-ago',
alert_type=AlertType.PERFORMANCE,
metric=Metric.PRECISION,
priority=Priority.HIGH,
compare_to=CompareTo.TIME_PERIOD,
compare_period=ComparePeriod.ONE_DAY,
compare_threshold=None,
raw_threshold=None,
warning_threshold=0.05,
critical_threshold=0.1,
condition=AlertCondition.GREATER,
bin_size=BinSize.ONE_HOUR)]
AlertRule(alert_rule_uuid='e1aefdd5-ef22-4e81-b869-3964eff8b5cd',
organization_name='some_org_name',
project_id='project-a',
model_id='model-a',
name='age-null-1hr',
alert_type=AlertType.DATA_INTEGRITY,
metric=Metric.MISSING_VALUE,
column='age',
priority=Priority.HIGH,
compare_to=CompareTo.RAW_VALUE,
compare_period=None,
warning_threshold=5,
critical_threshold=10,
condition=AlertCondition.GREATER,
bin_size=BinSize.ONE_HOUR)
AlertRule(alert_rule_uuid='e1aefdd5-ef22-4e81-b869-3964eff8b5cd',
organization_name='some_org_name',
project_id='project-a',
model_id='model-a',
name='psi-gt-5prec-age-baseline_name',
alert_type=AlertType.DATA_DRIFT,
metric=Metric.PSI,
priority=Priority.HIGH,
compare_to=CompareTo.RAW_VALUE,
compare_period=None,
warning_threshold=0.05,
critical_threshold=0.1,
condition=AlertCondition.GREATER,
bin_size=BinSize.ONE_HOUR,
columns=['age'],
baseline_id='baseline_name')
[AlertRule(alert_rule_uuid='9b8711fa-735e-4a72-977c-c4c8b16543ae',
organization_name='some_org_name',
project_id='project-a',
model_id='model-a',
name='jsd_multi_col_1hr',
alert_type=AlertType.DATA_DRIFT,
metric=Metric.JSD,
priority=Priority.HIGH,
compare_to=CompareTo.RAW_VALUE,
compare_period=None,
compare_threshold=None,
raw_threshold=None,
warning_threshold=0.4,
critical_threshold=0.5,
condition=AlertCondition.GREATER,
bin_size=BinSize.ONE_HOUR,
columns=['age', 'gender'])]
client.get_alert_rules
Input Parameters | Type | Default | Description |
---|---|---|---|
project_id | Optional [str] | None | A unique identifier for the project. |
model_id | Optional [str] | None | A unique identifier for the model. |
alert_type | Optional[fdl.AlertType] | None | Alert type. One of: AlertType.PERFORMANCE, AlertType.DATA_DRIFT, AlertType.DATA_INTEGRITY, or AlertType.SERVICE_METRICS |
metric | Optional[fdl.Metric] | None | When alert_type is SERVICE_METRICS: Metric.TRAFFIC. When alert_type is PERFORMANCE, choose one of the following based on the ML model task. 1) For binary classification: one of Metric.ACCURACY, Metric.TPR, Metric.FPR, Metric.PRECISION, Metric.RECALL, Metric.F1_SCORE, Metric.ECE, Metric.AUC. 2) For regression: one of Metric.R2, Metric.MSE, Metric.MAE, Metric.MAPE, Metric.WMAPE. 3) For multi-class classification: Metric.ACCURACY, Metric.LOG_LOSS. 4) For ranking: Metric.MAP, Metric.MEAN_NDCG. When alert_type is DATA_DRIFT: Metric.PSI or Metric.JSD. When alert_type is DATA_INTEGRITY: one of Metric.RANGE_VIOLATION, Metric.MISSING_VALUE, Metric.TYPE_VIOLATION |
columns | Optional[List[str]] | None | [Optional] List of column names on which the alert rule was created. Note that alert rules matching any column from this list will be returned. |
offset | Optional[int] | None | Pointer to the start of the page index |
limit | Optional[int] | None | Number of records to be retrieved per page, also referred to as page_size |
ordering | Optional[List[str]] | None | List of alert rule fields to order by, e.g. ['critical_threshold'], or ['-critical_threshold'] for descending order. |
Info
The Fiddler client can be used to get a list of alert rules with respect to the filtering parameters.
import fiddler as fdl
alert_rules = client.get_alert_rules(
project_id = 'project-a',
model_id = 'model-a',
alert_type = fdl.AlertType.DATA_INTEGRITY,
metric = fdl.Metric.MISSING_VALUE,
columns = ["age", "gender"],
ordering = ['critical_threshold'], #['-critical_threshold'] for descending
limit= 4, ## to set number of rules to show in one go
offset = 0, # page offset (multiple of limit)
)
Return Type | Description |
---|---|
List[AlertRule] | A List containing AlertRule objects returned by the query. |
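Once retrieved, the list can be processed like any Python list. A minimal sketch, using lightweight stand-in objects whose field names (`name`, `alert_rule_uuid`) are taken from the example responses shown earlier in this document; real `AlertRule` objects may carry additional fields:

```python
from collections import namedtuple

# Stand-in for the fdl AlertRule objects returned by client.get_alert_rules
# (field names taken from the example responses above)
AlertRule = namedtuple('AlertRule', ['alert_rule_uuid', 'name', 'critical_threshold'])

alert_rules = [
    AlertRule('9b8711fa-735e-4a72-977c-c4c8b16543ae', 'perf-gt-5prec-1hr-1d-ago', 0.1),
    AlertRule('e1aefdd5-ef22-4e81-b869-3964eff8b5cd', 'age-null-1hr', 10),
]

# Summarize the rules by name and uuid, e.g. for logging or cleanup scripts
summary = {rule.name: rule.alert_rule_uuid for rule in alert_rules}
```

A mapping like this is handy when passing rule UUIDs on to client.get_triggered_alerts or client.delete_alert_rule.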
client.get_triggered_alerts
Input Parameters | Type | Default | Description |
---|---|---|---|
alert_rule_uuid | str | None | The unique system generated identifier for the alert rule. |
start_time | Optional[datetime] | 7 days ago | Start time to filter triggered alerts, in yyyy-MM-dd format, inclusive. |
end_time | Optional[datetime] | today | End time to filter triggered alerts, in yyyy-MM-dd format, inclusive. |
offset | Optional[int] | None | Pointer to the start of the page index |
limit | Optional[int] | None | Number of records to be retrieved per page, also referred to as page_size |
ordering | Optional[List[str]] | None | List of fields to order by, e.g. ['alert_time_bucket'], or ['-alert_time_bucket'] for descending order. |
Info
The Fiddler client can be used to get a list of triggered alerts for a given alert rule and time duration.
triggered_alerts = client.get_triggered_alerts(
alert_rule_uuid = "588744b2-5757-4ae9-9849-1f4e076a58de",
start_time = "2022-05-01",
end_time = "2022-09-30",
ordering = ['alert_time_bucket'], #['-alert_time_bucket'] for descending
limit= 4, ## to set number of rules to show in one go
offset = 0, # page offset
)
Return Type | Description |
---|---|
List[TriggeredAlerts] | A List containing TriggeredAlerts objects returned by the query. |
client.delete_alert_rule
Input Parameters | Type | Default | Description |
---|---|---|---|
alert_rule_uuid | str | None | The unique system generated identifier for the alert rule. |
Info
The Fiddler client can be used to delete an existing alert rule.
client.delete_alert_rule(
alert_rule_uuid = "588744b2-5757-4ae9-9849-1f4e076a58de",
)
Return Type | Description |
---|---|
None |
client.build_notifications_config
Input Parameters | Type | Default | Description |
---|---|---|---|
emails | Optional[str] | None | Comma-separated list of emails |
pagerduty_services | Optional[str] | None | Comma-separated list of PagerDuty services |
pagerduty_severity | Optional[str] | None | Severity for the alerts triggered via PagerDuty |
webhooks | Optional[List[str]] | None | List of valid UUIDs of available webhooks |
Info
The Fiddler client can be used to build notification configuration to be used while creating alert rules.
notifications_config = client.build_notifications_config(
emails = "[email protected]",
)
notifications_config = client.build_notifications_config(
emails = "[email protected],[email protected]",
pagerduty_services = 'pd_service_1',
pagerduty_severity = 'critical'
)
notifications_config = client.build_notifications_config(
webhooks = ["894d76e8-2268-4c2e-b1c7-5561da6f84ae", "3814b0ac-b8fe-4509-afc9-ae86c176ef13"]
)
Return Type | Description |
---|---|
Dict[str, Dict[str, Any]] | A dict with emails, pagerduty, and webhooks entries. Unused fields are stored as empty strings (or an empty list for webhooks). |
Example Response:
{'emails': {'email': '[email protected]'}, 'pagerduty': {'service': '', 'severity': ''}, 'webhooks': []}
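Because the returned config is a plain dict of the shape shown above, its fields can be read back directly. A small sketch (the email addresses here are hypothetical placeholders), splitting the comma-separated email string into individual recipients:

```python
# Config of the shape shown in the example response above
# (email addresses are placeholders)
notifications_config = {
    'emails': {'email': 'alice@example.com,bob@example.com'},
    'pagerduty': {'service': '', 'severity': ''},
    'webhooks': [],
}

# The emails entry is a single comma-separated string; split it back
# into a list of individual recipients
recipients = [e.strip() for e in notifications_config['emails']['email'].split(',')]
```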
client.add_webhook
Input Parameters | Type | Default | Description |
---|---|---|---|
name | str | None | A unique name for the webhook. |
url | str | None | The webhook URL used for sending notification messages. |
provider | str | None | The platform that provides the webhook functionality. Only 'SLACK' is supported. |
client.add_webhook(
name='range_violation_channel',
url='https://hooks.slack.com/services/T9EAVLUQ5/P982J/G8ISUczk37hxQ15C28d',
provider='SLACK'
)
Return Type | Description |
---|---|
fdl.Webhook | Details of the webhook created. |
Example responses:
Webhook(uuid='df2397d3-23a8-4eb3-987a-2fe43b758b08',
name='range_violation_channel', organization_name='some_org_name',
url='https://hooks.slack.com/services/T9EAVLUQ5/P982J/G8ISUczk37hxQ15C28d',
provider='SLACK')
Add Slack webhook
Use the Slack API reference to generate a webhook for your Slack App
client.delete_webhook
Input Parameters | Type | Default | Description |
---|---|---|---|
uuid | str | None | The unique system generated identifier for the webhook. |
client.delete_webhook(
uuid = "ffcc2ddf-f896-41f0-bc50-4e7b76bb9ace",
)
Return Type | Description |
---|---|
None |
client.get_webhook
Input Parameters | Type | Default | Description |
---|---|---|---|
uuid | str | None | The unique system generated identifier for the webhook. |
client.get_webhook(
uuid = "a5f085bc-6772-4eff-813a-bfc20ff71002",
)
Return Type | Description |
---|---|
fdl.Webhook | Details of Webhook. |
Example responses:
Webhook(uuid='a5f085bc-6772-4eff-813a-bfc20ff71002',
name='binary_classification_alerts_channel',
organization_name='some_org',
url='https://hooks.slack.com/services/T9EAVLUQ5/P982J/G8ISUczk37hxQ15C28d',
provider='SLACK')
client.get_webhooks
Input Parameters | Type | Default | Description |
---|---|---|---|
limit | Optional[int] | 300 | Number of records to be retrieved per page. |
offset | Optional[int] | 0 | Pointer to the starting of the page index. |
response = client.get_webhooks()
Return Type | Description |
---|---|
List[fdl.Webhook] | A List containing webhooks. |
Example Response
[
Webhook(uuid='e20bf4cc-d2cf-4540-baef-d96913b14f1b', name='model_1_alerts', organization_name='some_org', url='https://hooks.slack.com/services/T9EAVLUQ5/P982J/G8ISUczk37hxQ15C28d', provider='SLACK'),
Webhook(uuid='bd4d02d7-d1da-44d7-b194-272b4351cff7', name='drift_alerts_channel', organization_name='some_org', url='https://hooks.slack.com/services/T9EAVLUQ5/P982J/G8ISUczk37hxQ15C28d', provider='SLACK'),
Webhook(uuid='761da93b-bde2-4c1f-bb17-bae501abd511', name='project_1_alerts', organization_name='some_org', url='https://hooks.slack.com/services/T9EAVLUQ5/P982J/G8ISUczk37hxQ15C28d', provider='SLACK')
]
client.update_webhook
Input Parameters | Type | Default | Description |
---|---|---|---|
name | str | None | A unique name for the webhook. |
url | str | None | The webhook url used for sending notification messages. |
provider | str | None | The platform that provides webhooks functionality. Only 'SLACK' is supported. |
uuid | str | None | The unique system generated identifier for the webhook. |
client.update_webhook(uuid='e20bf4cc-d2cf-4540-baef-d96913b14f1b',
name='drift_violation',
url='https://hooks.slack.com/services/T9EAVLUQ5/P982J/G8ISUczk37hxQ15C28d',
provider='SLACK')
Return Type | Description |
---|---|
fdl.Webhook | Details of Webhook after modification. |
Example Response:
Webhook(uuid='e20bf4cc-d2cf-4540-baef-d96913b14f1b',
name='drift_violation', organization_name='some_org_name',
url='https://hooks.slack.com/services/T9EAVLUQ5/P982J/G8ISUczk37hxQ15C28d',
provider='SLACK')
client.update_alert_notification_status
Input Parameters | Type | Default | Description |
---|---|---|---|
notification_status | bool | None | The notification status to set for the alerts. |
alert_config_ids | Optional[List[str]] | None | List of alert rule IDs to update. |
model_id | Optional[str] | None | The model ID for which to update all alerts. |
Info
The Fiddler client can be used to update the notification status of multiple alerts at once.
updated_alert_configs = client.update_alert_notification_status(
notification_status = True,
model_id = "9f8180d3-3fa0-40c4-8656-b9b1d2de1b69",
)
updated_alert_configs = client.update_alert_notification_status(
notification_status = True,
alert_config_ids = ["9b8711fa-735e-4a72-977c-c4c8b16543ae"],
)
Return Type | Description |
---|---|
List[AlertRule] | List of Alert Rules updated from this method. |
Example responses:
[AlertRule(alert_rule_uuid='9b8711fa-735e-4a72-977c-c4c8b16543ae',
organization_name='some_org_name',
project_id='project-a',
model_id='model-a',
name='perf-gt-5prec-1hr-1d-ago',
alert_type=AlertType.PERFORMANCE,
metric=Metric.PRECISION,
priority=Priority.HIGH,
compare_to=CompareTo.TIME_PERIOD,
compare_period=ComparePeriod.ONE_DAY,
compare_threshold=None,
raw_threshold=None,
warning_threshold=0.05,
critical_threshold=0.1,
condition=AlertCondition.GREATER,
bin_size=BinSize.ONE_HOUR)]
Custom Metrics
client.get_custom_metric
Input Parameter | Type | Required | Description |
---|---|---|---|
metric_id | string | Yes | The unique identifier for the custom metric |
METRIC_ID = '7d06f905-80b1-4a41-9711-a153cbdda16c'
custom_metric = client.get_custom_metric(
metric_id=METRIC_ID
)
Return Type | Description |
---|---|
fiddler.schema.custom_metric.CustomMetric | Custom metric object with details about the metric |
client.get_custom_metrics
Input Parameter | Type | Default | Required | Description |
---|---|---|---|---|
project_id | string | | Yes | The unique identifier for the project |
model_id | string | | Yes | The unique identifier for the model |
limit | Optional[int] | 300 | No | Maximum number of items to return |
offset | Optional[int] | 0 | No | Number of items to skip before returning |
PROJECT_ID = 'my_project'
MODEL_ID = 'my_model'
custom_metrics = client.get_custom_metrics(
project_id=PROJECT_ID,
model_id=MODEL_ID
)
Return Type | Description |
---|---|
List[fiddler.schema.custom_metric.CustomMetric] | List of custom metric objects for the given model |
client.add_custom_metric
For details on supported constants, operators, and functions, see Fiddler Query Language.
Input Parameter | Type | Required | Description |
---|---|---|---|
name | string | Yes | Name of the custom metric |
project_id | string | Yes | The unique identifier for the project |
model_id | string | Yes | The unique identifier for the model |
definition | string | Yes | The FQL metric definition for the custom metric |
description | string | No | A description of the custom metric |
PROJECT_ID = 'my_project'
MODEL_ID = 'my_model'
definition = """
average(if(Prediction < 0.5 and Target == 1, -40, if(Prediction >= 0.5 and Target == 0, -400, 250)))
"""
client.add_custom_metric(
name='Loan Value',
description='A custom value score assigned to a loan',
project_id=PROJECT_ID,
model_id=MODEL_ID,
definition=definition
)
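The FQL definition above encodes a per-row payoff: false negatives cost 40, false positives cost 400, and correct decisions earn 250, averaged over all rows. A plain-Python mirror of that logic (the function name and sample rows are illustrative only), useful for sanity-checking a definition before registering it:

```python
# Plain-Python mirror of the FQL custom-metric definition above
def loan_value(prediction: float, target: int) -> int:
    if prediction < 0.5 and target == 1:
        return -40   # false negative
    if prediction >= 0.5 and target == 0:
        return -400  # false positive
    return 250       # correct decision

# Synthetic (prediction, target) rows
rows = [(0.9, 1), (0.2, 1), (0.7, 0), (0.1, 0)]

# average(...) over the rows, as the FQL definition computes
avg = sum(loan_value(p, t) for p, t in rows) / len(rows)
```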
client.delete_custom_metric
Input Parameter | Type | Required | Description |
---|---|---|---|
metric_id | string | Yes | The unique identifier for the custom metric |
METRIC_ID = '7d06f905-80b1-4a41-9711-a153cbdda16c'
client.delete_custom_metric(
metric_id=METRIC_ID
)
Segments
client.get_segment
Input Parameter | Type | Required | Description |
---|---|---|---|
segment_id | string | Yes | The unique identifier for the segment |
SEGMENT_ID = '7d06f905-80b1-4a41-9711-a153cbdda16c'
segment = client.get_segment(
segment_id=SEGMENT_ID
)
Return Type | Description |
---|---|
fdl.Segment | Segment object with details about the segment |
client.get_segments
Input Parameter | Type | Default | Required | Description |
---|---|---|---|---|
project_id | string | | Yes | The unique identifier for the project |
model_id | string | | Yes | The unique identifier for the model |
limit | Optional[int] | 300 | No | Maximum number of items to return |
offset | Optional[int] | 0 | No | Number of items to skip before returning |
PROJECT_ID = 'my_project'
MODEL_ID = 'my_model'
custom_metrics = client.get_segments(
project_id=PROJECT_ID,
model_id=MODEL_ID
)
Return Type | Description |
---|---|
List[fdl.Segment] | List of segment objects for the given model |
client.add_segment
For details on supported constants, operators, and functions, see Fiddler Query Language.
Input Parameter | Type | Required | Description |
---|---|---|---|
name | string | Yes | Name of the segment |
project_id | string | Yes | The unique identifier for the project |
model_id | string | Yes | The unique identifier for the model |
definition | string | Yes | The FQL metric definition for the segment |
description | string | No | A description of the segment |
PROJECT_ID = 'my_project'
MODEL_ID = 'my_model'
definition = """
age > 50
"""
client.add_segment(
name='Over 50',
description='All people over the age of 50',
project_id=PROJECT_ID,
model_id=MODEL_ID,
definition=definition
)
Segment(
id='50a1c32d-c2b4-4faf-9006-f4aeadd7a859',
name='Over 50',
project_name='my_project',
organization_name='mainbuild',
definition='age > 50',
description='All people over the age of 50',
created_at=None,
created_by=None
)
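A segment definition is a row-level predicate. As a sketch of what the definition 'age > 50' above selects, here is the same predicate applied to a few synthetic rows in plain Python:

```python
# Plain-Python mirror of the FQL segment definition 'age > 50',
# applied to synthetic event rows
rows = [{'age': 34}, {'age': 51}, {'age': 72}]

# Rows that would fall inside the 'Over 50' segment
over_50 = [r for r in rows if r['age'] > 50]
```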
client.delete_segment
Input Parameter | Type | Required | Description |
---|---|---|---|
segment_id | string | Yes | The unique identifier for the segment |
SEGMENT_ID = '7d06f905-80b1-4a41-9711-a153cbdda16c'
client.delete_segment(
segment_id=SEGMENT_ID
)
Explainability
client.get_predictions
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | A unique identifier for the project. |
model_id | str | None | A unique identifier for the model. |
input_df | pd.DataFrame | None | A pandas DataFrame containing model input vectors as rows. |
chunk_size | Optional[int] | 10000 | The chunk size for fetching predictions. Defaults to 10,000 rows per chunk. |
import pandas as pd
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
input_df = pd.read_csv('example_data.csv')
# Example without chunk size specified:
predictions = client.get_predictions(
project_id=PROJECT_ID,
model_id=MODEL_ID,
input_df=input_df,
)
# Example with chunk size specified:
predictions = client.get_predictions(
project_id=PROJECT_ID,
model_id=MODEL_ID,
input_df=input_df,
chunk_size=1000,
)
Return Type | Description |
---|---|
pd.DataFrame | A pandas DataFrame containing model predictions for the given input vectors. |
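Since the return value is a pandas DataFrame with one prediction row per input row, it can be aligned side by side with the inputs for inspection. A minimal sketch using hypothetical stand-in frames (the column names are illustrative, not part of the API):

```python
import pandas as pd

# Stand-ins for input_df and the DataFrame returned by client.get_predictions
input_df = pd.DataFrame({'age': [25, 40], 'income': [30000, 80000]})
predictions = pd.DataFrame({'predicted_churn': [0.12, 0.67]})

# Place each prediction next to the input row that produced it
combined = pd.concat(
    [input_df.reset_index(drop=True), predictions.reset_index(drop=True)],
    axis=1,
)
```

Resetting both indexes first avoids misalignment when the input frame carries a non-default index.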
client.get_explanation
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | A unique identifier for the project. |
model_id | str | None | A unique identifier for the model. |
input_data_source | Union[fdl.RowDataSource, fdl.EventIdDataSource] | None | Type of data source for the input dataset to compute explanation on (RowDataSource, EventIdDataSource). A single row explanation is currently supported. |
ref_data_source | Optional[Union[fdl.DatasetDataSource, fdl.SqlSliceQueryDataSource] ] | None | Type of data source for the reference data to compute explanation on (DatasetDataSource, SqlSliceQueryDataSource). Only used for non-text models and the following methods: 'SHAP', 'FIDDLER_SHAP', 'PERMUTE', 'MEAN_RESET' |
explanation_type | Optional[str] | 'FIDDLER_SHAP' | Explanation method name. Can be your custom explanation method or one of the following methods: 'SHAP', 'FIDDLER_SHAP', 'IG', 'PERMUTE', 'MEAN_RESET', 'ZERO_RESET' |
num_permutations | Optional[int] | 300 | - For Fiddler SHAP, num_permutations corresponds to the number of coalitions to sample to estimate the Shapley values of each single-reference game. - For the permutation algorithms, num_permutations corresponds to the number of permutations from the dataset to use for the computation. |
ci_level | Optional[float] | 0.95 | The confidence level (between 0 and 1). |
top_n_class | Optional[int] | None | For multi-class classification models only: specifies whether only the top n classes are computed, or all classes (when the parameter is None). |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
DATASET_ID = 'example_dataset'
# FIDDLER SHAP - Dataset reference data source
row = df.to_dict(orient='records')[0]
client.get_explanation(
project_id=PROJECT_ID,
model_id=MODEL_ID,
input_data_source=fdl.RowDataSource(row=row),
ref_data_source=fdl.DatasetDataSource(dataset_id=DATASET_ID, num_samples=300),
explanation_type='FIDDLER_SHAP',
num_permutations=200,
ci_level=0.95,
)
# FIDDLER SHAP - Slice ref data source
row = df.to_dict(orient='records')[0]
query = f'SELECT * from {DATASET_ID}.{MODEL_ID} WHERE sulphates >= 0.8'
client.get_explanation(
project_id=PROJECT_ID,
model_id=MODEL_ID,
input_data_source=fdl.RowDataSource(row=row),
ref_data_source=fdl.SqlSliceQueryDataSource(query=query, num_samples=100),
explanation_type='FIDDLER_SHAP',
num_permutations=200,
ci_level=0.95,
)
# FIDDLER SHAP - Multi-class classification (top classes)
row = df.to_dict(orient='records')[0]
client.get_explanation(
project_id=PROJECT_ID,
model_id=MODEL_ID,
input_data_source=fdl.RowDataSource(row=row),
ref_data_source=fdl.DatasetDataSource(dataset_id=DATASET_ID),
explanation_type='FIDDLER_SHAP',
top_n_class=2
)
# IG (not available by default; must be enabled via package.py)
row = df.to_dict(orient='records')[0]
client.get_explanation(
project_id=PROJECT_ID,
model_id=MODEL_ID,
input_data_source=fdl.RowDataSource(row=row),
explanation_type='IG',
)
Return Type | Description |
---|---|
tuple | A named tuple with the explanation results. |
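The exact fields of the returned named tuple depend on the explanation type. Because Python named tuples expose their fields via the standard `_fields` attribute, you can always discover what a result contains. A sketch with a hypothetical stand-in result (the field names below are assumptions, not the client's actual fields):

```python
from collections import namedtuple

# Hypothetical stand-in for an explanation result; field names are illustrative
Explanation = namedtuple('Explanation', ['feature_names', 'values'])
result = Explanation(feature_names=['age', 'income'], values=[0.3, -0.1])

# _fields is standard on any named tuple, so it reliably lists
# what a returned explanation actually contains
available = result._fields
```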
client.get_feature_impact
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | A unique identifier for the project. |
model_id | str | None | A unique identifier for the model. |
data_source | Union[fdl.DatasetDataSource, fdl.SqlSliceQueryDataSource] | None | Type of data source for the input dataset to compute feature impact on (DatasetDataSource or SqlSliceQueryDataSource) |
num_iterations | Optional[int] | 10000 | The maximum number of ablated model inferences per feature. Used for TABULAR data only. |
num_refs | Optional[int] | 10000 | Number of reference points used in the explanation. Used for TABULAR data only. |
ci_level | Optional[float] | 0.95 | The confidence level (between 0 and 1). Used for TABULAR data only. |
output_columns | Optional[List[str]] | None | Only used for NLP (TEXT inputs) models. Output column names to compute feature impact on. Useful for Multi-class Classification models. If None, compute for all output columns. |
min_support | Optional[int] | 15 | Only used for NLP (TEXT inputs) models. Specify a minimum support (number of times a specific word was present in the sample data) to retrieve top words. Defaults to 15. |
overwrite_cache | Optional[bool] | False | Whether to overwrite the feature impact cached values or not. |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
DATASET_ID = 'example_dataset'
# Feature Impact for TABULAR data - Dataset Data Source
feature_impact = client.get_feature_impact(
project_id=PROJECT_ID,
model_id=MODEL_ID,
data_source=fdl.DatasetDataSource(dataset_id=DATASET_ID, num_samples=200),
num_iterations=300,
num_refs=200,
ci_level=0.90,
)
# Feature Impact for TABULAR data - Slice Query data source
query = f'SELECT * FROM {DATASET_ID}.{MODEL_ID} WHERE CreditScore > 700'
feature_impact = client.get_feature_impact(
project_id=PROJECT_ID,
model_id=MODEL_ID,
data_source=fdl.SqlSliceQueryDataSource(query=query, num_samples=80),
num_iterations=300,
num_refs=200,
ci_level=0.90,
)
# Feature Impact for TEXT data
feature_impact = client.get_feature_impact(
project_id=PROJECT_ID,
model_id=MODEL_ID,
data_source=fdl.DatasetDataSource(dataset_id=DATASET_ID, num_samples=50),
output_columns= ['probability_A', 'probability_B'],
min_support=30
)
Return Type | Description |
---|---|
tuple | A named tuple with the feature impact results. |
client.get_feature_importance
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | A unique identifier for the project. |
model_id | str | None | A unique identifier for the model. |
data_source | Union[fdl.DatasetDataSource, fdl.SqlSliceQueryDataSource] | None | Type of data source for the input dataset to compute feature importance on (DatasetDataSource or SqlSliceQueryDataSource) |
num_iterations | Optional[int] | 10000 | The maximum number of ablated model inferences per feature. |
num_refs | Optional[int] | 10000 | Number of reference points used in the explanation. |
ci_level | Optional[float] | 0.95 | The confidence level (between 0 and 1). |
overwrite_cache | Optional[bool] | False | Whether to overwrite the feature importance cached values or not |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
DATASET_ID = 'example_dataset'
# Feature Importance - Dataset data source
feature_importance = client.get_feature_importance(
project_id=PROJECT_ID,
model_id=MODEL_ID,
data_source=fdl.DatasetDataSource(dataset_id=DATASET_ID, num_samples=200),
num_iterations=300,
num_refs=200,
ci_level=0.90,
)
# Feature Importance - Slice Query data source
query = f'SELECT * FROM {DATASET_ID}.{MODEL_ID} WHERE CreditScore > 700'
feature_importance = client.get_feature_importance(
project_id=PROJECT_ID,
model_id=MODEL_ID,
data_source=fdl.SqlSliceQueryDataSource(query=query, num_samples=80),
num_iterations=300,
num_refs=200,
ci_level=0.90,
)
Return Type | Description |
---|---|
tuple | A named tuple with the feature importance results. |
client.get_mutual_information
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | A unique identifier for the project. |
dataset_id | str | None | A unique identifier for the dataset. |
query | str | None | Slice query to compute Mutual information on. |
column_name | str | None | Column name to compute mutual information with respect to all the columns in the dataset. |
normalized | Optional[bool] | False | If set to True, it will compute Normalized Mutual Information. |
num_samples | Optional[int] | 10000 | Number of samples to select for computation. |
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'
query = f'SELECT * FROM {DATASET_ID}.{MODEL_ID} WHERE CreditScore > 700'
mutual_info = client.get_mutual_information(
project_id=PROJECT_ID,
dataset_id=DATASET_ID,
query=query,
column_name='Geography',
normalized=True,
num_samples=20000,
)
Return Type | Description |
---|---|
dict | A dictionary with the mutual information results. |
Analytics
client.get_slice
Input Parameter | Type | Default | Description |
---|---|---|---|
sql_query | str | None | The SQL query used to retrieve the slice. |
project_id | str | None | The unique identifier for the project. The model and/or the dataset to be queried within the project are designated in the sql_query itself. |
columns_override | Optional [list] | None | A list of columns to include in the slice, even if they aren't specified in the query. |
import pandas as pd
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'
query = f""" SELECT * FROM "{DATASET_ID}.{MODEL_ID}" """
slice_df = client.get_slice(
sql_query=query,
project_id=PROJECT_ID
)
import pandas as pd
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
query = f""" SELECT * FROM "production.{MODEL_ID}" """
slice_df = client.get_slice(
sql_query=query,
project_id=PROJECT_ID
)
Return Type | Description |
---|---|
pd.DataFrame | A pandas DataFrame containing the slice returned by the query. |
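Because the result is an ordinary pandas DataFrame, any standard pandas operation applies to it. A small sketch with a hypothetical slice result (in practice, `slice_df` comes from `client.get_slice()`):

```python
import pandas as pd

# Hypothetical slice result; in practice this DataFrame is returned
# by client.get_slice().
slice_df = pd.DataFrame({
    'CreditScore': [710, 650, 780],
    'Geography': ['France', 'Spain', 'Germany'],
})

# Ordinary pandas filtering works on the returned slice.
high_scores = slice_df[slice_df['CreditScore'] > 700]
print(len(high_scores))  # 2
```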
Info
Only read-only SQL operations are supported. Certain SQL operations like aggregations and joins might not result in a valid slice.
Fairness
client.get_fairness
Only binary classification models with categorical protected attributes are currently supported.
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
model_id | str | None | The unique identifier for the model. |
data_source | Union[fdl.DatasetDataSource, fdl.SqlSliceQueryDataSource] | None | DataSource for the input dataset to compute fairness on (DatasetDataSource or SqlSliceQueryDataSource). |
protected_features | list[str] | None | A list of protected features. |
positive_outcome | Union[str, int, float, bool] | None | Value of the positive outcome (from the target column) for Fairness analysis. |
score_threshold | Optional [float] | 0.5 | The score threshold used to calculate model outcomes. |
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
DATASET_ID = 'example_dataset'
# Fairness - Dataset data source
fairness_metrics = client.get_fairness(
project_id=PROJECT_ID,
model_id=MODEL_ID,
data_source=fdl.DatasetDataSource(dataset_id=DATASET_ID, num_samples=200),
protected_features=['feature_1', 'feature_2'],
positive_outcome='Approved',
score_threshold=0.6
)
# Fairness - Slice Query data source
query = f'SELECT * FROM {DATASET_ID}.{MODEL_ID} WHERE CreditScore > 700'
fairness_metrics = client.get_fairness(
project_id=PROJECT_ID,
model_id=MODEL_ID,
data_source=fdl.SqlSliceQueryDataSource(query=query, num_samples=200),
protected_features=['feature_1', 'feature_2'],
positive_outcome='Approved',
score_threshold=0.6
)
Return Type | Description |
---|---|
dict | A dictionary containing fairness metric results. |
Access Control
client.list_org_roles
Warning
Only administrators can use client.list_org_roles().
client.list_org_roles()
Return Type | Description |
---|---|
dict | A dictionary of users and their roles in the organization. |
{
'members': [
{
'id': 1,
'user': '[email protected]',
'email': '[email protected]',
'isLoggedIn': True,
'firstName': 'Example',
'lastName': 'Administrator',
'imageUrl': None,
'settings': {'notifyNews': True,
'notifyAccount': True,
'sliceTutorialCompleted': True},
'role': 'ADMINISTRATOR'
},
{
'id': 2,
'user': '[email protected]',
'email': '[email protected]',
'isLoggedIn': True,
'firstName': 'Example',
'lastName': 'User',
'imageUrl': None,
'settings': {'notifyNews': True,
'notifyAccount': True,
'sliceTutorialCompleted': True},
'role': 'MEMBER'
}
],
'invitations': [
{
'id': 3,
'user': '[email protected]',
'role': 'MEMBER',
'invited': True,
'link': 'http://app.fiddler.ai/signup/vSQWZkt3FP--pgzmuYe_-3-NNVuR58OLZalZOlvR0GY'
}
]
}
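The returned dictionary can be filtered with ordinary Python. A minimal sketch that extracts administrator emails from a trimmed-down version of the sample output above (`org_roles` stands in for the value returned by `client.list_org_roles()`):

```python
# Trimmed-down stand-in for the dictionary returned by client.list_org_roles().
org_roles = {
    'members': [
        {'email': '[email protected]', 'role': 'ADMINISTRATOR'},
        {'email': '[email protected]', 'role': 'MEMBER'},
    ],
    'invitations': [],
}

# Collect the email of every member holding the ADMINISTRATOR role.
admins = [
    member['email']
    for member in org_roles['members']
    if member['role'] == 'ADMINISTRATOR'
]
print(len(admins))  # 1
```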
client.list_project_roles
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
PROJECT_ID = 'example_project'
client.list_project_roles(
project_id=PROJECT_ID
)
Return Type | Description |
---|---|
dict | A dictionary of users and their roles for the specified project. |
{
'roles': [
{
'user': {
'email': '[email protected]'
},
'team': None,
'role': {
'name': 'OWNER'
}
},
{
'user': {
'email': '[email protected]'
},
'team': None,
'role': {
'name': 'READ'
}
}
]
}
client.list_teams
client.list_teams()
Return Type | Description |
---|---|
dict | A dictionary containing information about teams and users. |
{
'example_team': {
'members': [
{
'user': '[email protected]',
'role': 'MEMBER'
},
{
'user': '[email protected]',
'role': 'MEMBER'
}
]
}
}
client.share_project
Info
Administrators can share any project with any user. If you lack the required permissions to share a project, contact your organization administrator.
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
role | str | None | The permissions role being shared. Can be one of - 'READ' - 'WRITE' - 'OWNER' |
user_name | Optional [str] | None | A username with which the project will be shared. Typically an email address. |
team_name | Optional [str] | None | A team with which the project will be shared. |
PROJECT_ID = 'example_project'
client.share_project(
project_id=PROJECT_ID,
role='READ',
user_name='[email protected]'
)
client.unshare_project
Info
Administrators and project owners can unshare any project with any user. If you lack the required permissions to unshare a project, contact your organization administrator.
Input Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | None | The unique identifier for the project. |
role | str | None | The permissions role being revoked. Can be one of - 'READ' - 'WRITE' - 'OWNER' |
user_name | Optional [str] | None | The username whose access to the project will be revoked. Typically an email address. |
team_name | Optional [str] | None | The team whose access to the project will be revoked. |
PROJECT_ID = 'example_project'
client.unshare_project(
project_id=PROJECT_ID,
role='READ',
user_name='[email protected]'
)
Fiddler Objects
fdl.DatasetInfo
For information on how to customize these objects, see Customizing Your Dataset Schema.
Input Parameters | Type | Default | Description |
---|---|---|---|
display_name | str | None | A display name for the dataset. |
columns | list | None | A list of fdl.Column objects containing information about the columns. |
files | Optional [list] | None | A list of strings pointing to CSV files to use. |
dataset_id | Optional [str] | None | The unique identifier for the dataset |
**kwargs | Additional arguments to be passed. |
columns = [
fdl.Column(
name='feature_1',
data_type=fdl.DataType.FLOAT
),
fdl.Column(
name='feature_2',
data_type=fdl.DataType.INTEGER
),
fdl.Column(
name='feature_3',
data_type=fdl.DataType.BOOLEAN
),
fdl.Column(
name='output_column',
data_type=fdl.DataType.FLOAT
),
fdl.Column(
name='target_column',
data_type=fdl.DataType.INTEGER
)
]
dataset_info = fdl.DatasetInfo(
display_name='Example Dataset',
columns=columns
)
fdl.DatasetInfo.from_dataframe
Input Parameters | Type | Default | Description |
---|---|---|---|
df | Union [pd.DataFrame, list] | Either a single pandas DataFrame or a list of DataFrames. If a list is given, all DataFrames must have the same columns. | |
display_name | str | ' ' | A display name for the dataset. |
max_inferred_cardinality | Optional [int] | 100 | If specified, any string column containing fewer than max_inferred_cardinality unique values will be converted to a categorical data type. |
dataset_id | Optional [str] | None | The unique identifier for the dataset |
import pandas as pd
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(df=df, max_inferred_cardinality=100)
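To preview which columns `max_inferred_cardinality` would affect, the unique-value counts can be checked with pandas before constructing the `DatasetInfo`. A rough sketch with hypothetical column names (this mirrors the documented rule, not Fiddler's internal implementation):

```python
import pandas as pd

# Hypothetical dataset with one low-cardinality and one high-cardinality column.
df = pd.DataFrame({
    'color': ['red', 'blue', 'red', 'blue'],   # 2 unique values
    'user_id': ['u1', 'u2', 'u3', 'u4'],       # 4 unique values
})

max_inferred_cardinality = 3

# Per the documented rule, string columns with fewer than
# max_inferred_cardinality unique values are converted to categorical.
categorical_candidates = [
    col for col in df.columns
    if df[col].nunique() < max_inferred_cardinality
]
print(categorical_candidates)  # ['color']
```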
Return Type | Description |
---|---|
fdl.DatasetInfo | A fdl.DatasetInfo() object constructed from the pandas DataFrame provided. |
fdl.DatasetInfo.from_dict
Input Parameters | Type | Default | Description |
---|---|---|---|
deserialized_json | dict | The dictionary object to be converted |
import pandas as pd
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(df=df, max_inferred_cardinality=100)
dataset_info_dict = dataset_info.to_dict()
new_dataset_info = fdl.DatasetInfo.from_dict(
deserialized_json={
'dataset': dataset_info_dict
}
)
Return Type | Description |
---|---|
fdl.DatasetInfo | A fdl.DatasetInfo() object constructed from the dictionary. |
fdl.DatasetInfo.to_dict
Return Type | Description |
---|---|
dict | A dictionary containing information from the fdl.DatasetInfo() object. |
import pandas as pd
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(df=df, max_inferred_cardinality=100)
dataset_info_dict = dataset_info.to_dict()
{
'name': 'Example Dataset',
'columns': [
{
'column-name': 'feature_1',
'data-type': 'float'
},
{
'column-name': 'feature_2',
'data-type': 'int'
},
{
'column-name': 'feature_3',
'data-type': 'bool'
},
{
'column-name': 'output_column',
'data-type': 'float'
},
{
'column-name': 'target_column',
'data-type': 'int'
}
],
'files': []
}
fdl.ModelInfo
Input Parameters | Type | Default | Description |
---|---|---|---|
display_name | str | A display name for the model. | |
input_type | fdl.ModelInputType | A ModelInputType object containing the input type of the model. | |
model_task | fdl.ModelTask | A ModelTask object containing the model task. | |
inputs | list | A list of Column objects corresponding to the inputs (features) of the model. | |
outputs | list | A list of Column objects corresponding to the outputs (predictions) of the model. | |
metadata | Optional [list] | None | A list of Column objects corresponding to any metadata fields. |
decisions | Optional [list] | None | A list of Column objects corresponding to any decision fields (post-prediction business decisions). |
targets | Optional [list] | None | A list of Column objects corresponding to the targets (ground truth) of the model. |
framework | Optional [str] | None | A string providing information about the software library and version used to train and run this model. |
description | Optional [str] | None | A description of the model. |
datasets | Optional [list] | None | A list of the dataset IDs used by the model. |
mlflow_params | Optional [fdl.MLFlowParams] | None | A MLFlowParams object containing information about MLFlow parameters. |
model_deployment_params | Optional [fdl.ModelDeploymentParams] | None | A ModelDeploymentParams object containing information about model deployment. |
artifact_status | Optional [fdl.ArtifactStatus] | None | An ArtifactStatus object containing information about the model artifact. |
preferred_explanation_method | Optional [fdl.ExplanationMethod] | None | An ExplanationMethod object that specifies the default explanation algorithm to use for the model. |
custom_explanation_names | Optional [list] | [ ] | A list of names that can be passed to the explanation_name argument of the optional user-defined explain_custom method of the model object defined in package.py. |
binary_classification_threshold | Optional [float] | .5 | The threshold used for classifying inferences for binary classifiers. |
ranking_top_k | Optional [int] | 50 | Used only for ranking models. Sets the top k results to take into consideration when computing performance metrics like MAP and NDCG. |
group_by | Optional [str] | None | Used only for ranking models. The column by which to group events for certain performance metrics like MAP and NDCG. |
fall_back | Optional [dict] | None | A dictionary mapping a column name to custom missing value encodings for that column. |
target_class_order | Optional [list] | None | A list denoting the order of classes in the target. This parameter is required in the following cases: - Binary classification tasks: If the target is of type string, you must tell Fiddler which class is considered the positive class for your output column. You need to provide a list with two elements. The 0th element by convention is considered the negative class, and the 1st element is considered the positive class. When your target is boolean, you don't need to specify this argument. By default Fiddler considers True as the positive class. In case your target is numerical, you don't need to specify this argument, by default Fiddler considers the higher of the two possible values as the positive class.- Multi-class classification tasks: You must tell Fiddler which class corresponds to which output by giving an ordered list of classes. This order should be the same as the order of the outputs. - Ranking tasks: If the target is of type string, you must provide a list of all the possible target values in the order of relevance. The first element will be considered as the least relevant grade and the last element from the list will be considered the most relevant grade. In the case your target is numerical, Fiddler considers the smallest value to be the least relevant grade and the biggest value from the list will be considered the most relevant grade. |
**kwargs | Additional arguments to be passed. |
inputs = [
fdl.Column(
name='feature_1',
data_type=fdl.DataType.FLOAT
),
fdl.Column(
name='feature_2',
data_type=fdl.DataType.INTEGER
),
fdl.Column(
name='feature_3',
data_type=fdl.DataType.BOOLEAN
)
]
outputs = [
fdl.Column(
name='output_column',
data_type=fdl.DataType.FLOAT
)
]
targets = [
fdl.Column(
name='target_column',
data_type=fdl.DataType.INTEGER
)
]
model_info = fdl.ModelInfo(
display_name='Example Model',
input_type=fdl.ModelInputType.TABULAR,
model_task=fdl.ModelTask.BINARY_CLASSIFICATION,
inputs=inputs,
outputs=outputs,
targets=targets
)
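The ordering convention described for `target_class_order` above (0th element is the negative class, 1st element is the positive class) can be illustrated with plain Python. The class names and threshold logic here are hypothetical, purely to show the convention:

```python
# By convention: [negative_class, positive_class]
target_class_order = ['Rejected', 'Approved']
negative_class, positive_class = target_class_order

def predicted_label(score, threshold=0.5):
    """Map a binary classifier score to a label using the ordering convention."""
    return positive_class if score >= threshold else negative_class

print(predicted_label(0.8))  # Approved
print(predicted_label(0.2))  # Rejected
```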
fdl.ModelInfo.from_dataset_info
Input Parameters | Type | Default | Description |
---|---|---|---|
dataset_info | fdl.DatasetInfo() | The DatasetInfo object from which to construct the ModelInfo object. | |
target | str | The column to be used as the target (ground truth). | |
model_task | fdl.ModelTask | None | A ModelTask object containing the model task. |
dataset_id | Optional [str] | None | The unique identifier for the dataset. |
features | Optional [list] | None | A list of columns to be used as features. |
custom_features | Optional[List[CustomFeature]] | None | List of Custom Features definitions for a model. Objects of type Multivariate, Vector, ImageEmbedding or TextEmbedding derived from CustomFeature can be provided. |
metadata_cols | Optional [list] | None | A list of columns to be used as metadata fields. |
decision_cols | Optional [list] | None | A list of columns to be used as decision fields. |
display_name | Optional [str] | None | A display name for the model. |
description | Optional [str] | None | A description of the model. |
input_type | Optional [fdl.ModelInputType] | fdl.ModelInputType.TABULAR | A ModelInputType object containing the input type of the model. |
outputs | Optional [list] | A list of Column objects corresponding to the outputs (predictions) of the model. | |
targets | Optional [list] | None | A list of Column objects corresponding to the targets (ground truth) of the model. |
model_deployment_params | Optional [fdl.ModelDeploymentParams] | None | A ModelDeploymentParams object containing information about model deployment. |
framework | Optional [str] | None | A string providing information about the software library and version used to train and run this model. |
datasets | Optional [list] | None | A list of the dataset IDs used by the model. |
mlflow_params | Optional [fdl.MLFlowParams] | None | A MLFlowParams object containing information about MLFlow parameters. |
preferred_explanation_method | Optional [fdl.ExplanationMethod] | None | An ExplanationMethod object that specifies the default explanation algorithm to use for the model. |
custom_explanation_names | Optional [list] | [ ] | A list of names that can be passed to the explanation_name argument of the optional user-defined explain_custom method of the model object defined in package.py. |
binary_classification_threshold | Optional [float] | .5 | The threshold used for classifying inferences for binary classifiers. |
ranking_top_k | Optional [int] | 50 | Used only for ranking models. Sets the top k results to take into consideration when computing performance metrics like MAP and NDCG. |
group_by | Optional [str] | None | Used only for ranking models. The column by which to group events for certain performance metrics like MAP and NDCG. |
fall_back | Optional [dict] | None | A dictionary mapping a column name to custom missing value encodings for that column. |
categorical_target_class_details | Optional [Union[list, int, str]] | None | A list denoting the order of classes in the target. This parameter is required in the following cases: - Binary classification tasks: If the target is of type string, you must tell Fiddler which class is considered the positive class for your output column. If you provide a single element, it is considered the positive class. Alternatively, you can provide a list with two elements. The 0th element by convention is considered the negative class, and the 1st element is considered the positive class. When your target is boolean, you don't need to specify this argument. By default Fiddler considers True as the positive class. In case your target is numerical, you don't need to specify this argument, by default Fiddler considers the higher of the two possible values as the positive class.- Multi-class classification tasks: You must tell Fiddler which class corresponds to which output by giving an ordered list of classes. This order should be the same as the order of the outputs. - Ranking tasks: If the target is of type string, you must provide a list of all the possible target values in the order of relevance. The first element will be considered as the least relevant grade and the last element from the list will be considered the most relevant grade. In the case your target is numerical, Fiddler considers the smallest value to be the least relevant grade and the biggest value from the list will be considered the most relevant grade. |
import pandas as pd
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(
df=df
)
model_info = fdl.ModelInfo.from_dataset_info(
dataset_info=dataset_info,
features=[
'feature_1',
'feature_2',
'feature_3'
],
outputs=[
'output_column'
],
target='target_column',
input_type=fdl.ModelInputType.TABULAR,
model_task=fdl.ModelTask.BINARY_CLASSIFICATION
)
Return Type | Description |
---|---|
fdl.ModelInfo | A fdl.ModelInfo() object constructed from the fdl.DatasetInfo() object provided. |
fdl.ModelInfo.from_dict
Input Parameters | Type | Default | Description |
---|---|---|---|
deserialized_json | dict | The dictionary object to be converted |
import pandas as pd
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(
df=df
)
model_info = fdl.ModelInfo.from_dataset_info(
dataset_info=dataset_info,
features=[
'feature_1',
'feature_2',
'feature_3'
],
outputs=[
'output_column'
],
target='target_column',
input_type=fdl.ModelInputType.TABULAR,
model_task=fdl.ModelTask.BINARY_CLASSIFICATION
)
model_info_dict = model_info.to_dict()
new_model_info = fdl.ModelInfo.from_dict(
deserialized_json={
'model': model_info_dict
}
)
Return Type | Description |
---|---|
fdl.ModelInfo | A fdl.ModelInfo() object constructed from the dictionary. |
fdl.ModelInfo.to_dict
Return Type | Description |
---|---|
dict | A dictionary containing information from the fdl.ModelInfo() object. |
import pandas as pd
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(
df=df
)
model_info = fdl.ModelInfo.from_dataset_info(
dataset_info=dataset_info,
features=[
'feature_1',
'feature_2',
'feature_3'
],
outputs=[
'output_column'
],
target='target_column',
input_type=fdl.ModelInputType.TABULAR,
model_task=fdl.ModelTask.BINARY_CLASSIFICATION
)
model_info_dict = model_info.to_dict()
{
'name': 'Example Model',
'input-type': 'structured',
'model-task': 'binary_classification',
'inputs': [
{
'column-name': 'feature_1',
'data-type': 'float'
},
{
'column-name': 'feature_2',
'data-type': 'int'
},
{
'column-name': 'feature_3',
'data-type': 'bool'
},
{
'column-name': 'target_column',
'data-type': 'int'
}
],
'outputs': [
{
'column-name': 'output_column',
'data-type': 'float'
}
],
'datasets': [],
'targets': [
{
'column-name': 'target_column',
'data-type': 'int'
}
],
'custom-explanation-names': []
}