Fiddler Objects

fdl.DatasetInfo

For information on how to customize these objects, see Customizing Your Dataset Schema.

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| display_name | str | None | A display name for the dataset. |
| columns | list | None | A list of fdl.Column objects containing information about the columns. |
| files | Optional[list] | None | A list of strings pointing to CSV files to use. |
| dataset_id | Optional[str] | None | The unique identifier for the dataset. |
| **kwargs | | | Additional arguments to be passed. |

columns = [
    fdl.Column(
        name='feature_1',
        data_type=fdl.DataType.FLOAT
    ),
    fdl.Column(
        name='feature_2',
        data_type=fdl.DataType.INTEGER
    ),
    fdl.Column(
        name='feature_3',
        data_type=fdl.DataType.BOOLEAN
    ),
    fdl.Column(
        name='output_column',
        data_type=fdl.DataType.FLOAT
    ),
    fdl.Column(
        name='target_column',
        data_type=fdl.DataType.INTEGER
    )
]

dataset_info = fdl.DatasetInfo(
    display_name='Example Dataset',
    columns=columns
)

fdl.DatasetInfo.from_dataframe

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| df | Union[pd.DataFrame, list] | | Either a single pandas DataFrame or a list of DataFrames. If a list is given, all DataFrames must have the same columns. |
| display_name | str | ' ' | A display name for the dataset. |
| max_inferred_cardinality | Optional[int] | 100 | If specified, any string column containing fewer than max_inferred_cardinality unique values will be converted to a categorical data type. |
| dataset_id | Optional[str] | None | The unique identifier for the dataset. |

import pandas as pd

df = pd.read_csv('example_dataset.csv')

dataset_info = fdl.DatasetInfo.from_dataframe(df=df, max_inferred_cardinality=100)
| Return Type | Description |
| --- | --- |
| fdl.DatasetInfo | A fdl.DatasetInfo() object constructed from the pandas DataFrame provided. |
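
Per the df parameter description above, a list of DataFrames with identical columns can also be passed. A minimal sketch, assuming hypothetical files example_dataset_1.csv and example_dataset_2.csv:

import pandas as pd

df_1 = pd.read_csv('example_dataset_1.csv')
df_2 = pd.read_csv('example_dataset_2.csv')

# All DataFrames in the list must share the same columns
dataset_info = fdl.DatasetInfo.from_dataframe(df=[df_1, df_2], max_inferred_cardinality=100)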


fdl.DatasetInfo.from_dict

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| deserialized_json | dict | | The dictionary object to be converted. |

import pandas as pd

df = pd.read_csv('example_dataset.csv')

dataset_info = fdl.DatasetInfo.from_dataframe(df=df, max_inferred_cardinality=100)

dataset_info_dict = dataset_info.to_dict()

new_dataset_info = fdl.DatasetInfo.from_dict(
    deserialized_json={
        'dataset': dataset_info_dict
    }
)
| Return Type | Description |
| --- | --- |
| fdl.DatasetInfo | A fdl.DatasetInfo() object constructed from the dictionary. |


fdl.DatasetInfo.to_dict

| Return Type | Description |
| --- | --- |
| dict | A dictionary containing information from the fdl.DatasetInfo() object. |

import pandas as pd

df = pd.read_csv('example_dataset.csv')

dataset_info = fdl.DatasetInfo.from_dataframe(df=df, max_inferred_cardinality=100)

dataset_info_dict = dataset_info.to_dict()
{
    'name': 'Example Dataset',
    'columns': [
        {
            'column-name': 'feature_1',
            'data-type': 'float'
        },
        {
            'column-name': 'feature_2',
            'data-type': 'int'
        },
        {
            'column-name': 'feature_3',
            'data-type': 'bool'
        },
        {
            'column-name': 'output_column',
            'data-type': 'float'
        },
        {
            'column-name': 'target_column',
            'data-type': 'int'
        }
    ],
    'files': []
}


fdl.ModelInfo

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| display_name | str | | A display name for the model. |
| input_type | fdl.ModelInputType | | A ModelInputType object containing the input type of the model. |
| model_task | fdl.ModelTask | | A ModelTask object containing the model task. |
| inputs | list | | A list of Column objects corresponding to the inputs (features) of the model. |
| outputs | list | | A list of Column objects corresponding to the outputs (predictions) of the model. |
| metadata | Optional[list] | None | A list of Column objects corresponding to any metadata fields. |
| decisions | Optional[list] | None | A list of Column objects corresponding to any decision fields (post-prediction business decisions). |
| targets | Optional[list] | None | A list of Column objects corresponding to the targets (ground truth) of the model. |
| framework | Optional[str] | None | A string providing information about the software library and version used to train and run this model. |
| description | Optional[str] | None | A description of the model. |
| datasets | Optional[list] | None | A list of the dataset IDs used by the model. |
| mlflow_params | Optional[fdl.MLFlowParams] | None | An MLFlowParams object containing information about MLflow parameters. |
| model_deployment_params | Optional[fdl.ModelDeploymentParams] | None | A ModelDeploymentParams object containing information about model deployment. |
| artifact_status | Optional[fdl.ArtifactStatus] | None | An ArtifactStatus object containing information about the model artifact. |
| preferred_explanation_method | Optional[fdl.ExplanationMethod] | None | An ExplanationMethod object that specifies the default explanation algorithm to use for the model. |
| custom_explanation_names | Optional[list] | [ ] | A list of names that can be passed to the explanation_name argument of the optional user-defined explain_custom method of the model object defined in package.py. |
| binary_classification_threshold | Optional[float] | 0.5 | The threshold used for classifying inferences for binary classifiers. |
| ranking_top_k | Optional[int] | 50 | Used only for ranking models. Sets the top k results to take into consideration when computing performance metrics like MAP and NDCG. |
| group_by | Optional[str] | None | Used only for ranking models. The column by which to group events for certain performance metrics like MAP and NDCG. |
| fall_back | Optional[dict] | None | A dictionary mapping a column name to custom missing value encodings for that column. |
| target_class_order | Optional[list] | None | A list denoting the order of classes in the target. This parameter is required in the following cases. Binary classification tasks: if the target is of type string, you must tell Fiddler which class is considered the positive class for your output column by providing a list with two elements; by convention, the 0th element is the negative class and the 1st element is the positive class. If your target is boolean, you don't need to specify this argument; by default Fiddler considers True the positive class. If your target is numerical, you don't need to specify this argument; by default Fiddler considers the higher of the two possible values the positive class. Multiclass classification tasks: you must tell Fiddler which class corresponds to which output by giving an ordered list of classes; this order should be the same as the order of the outputs. Ranking tasks: if the target is of type string, you must provide a list of all possible target values in order of relevance, where the first element is the least relevant grade and the last element is the most relevant grade; if the target is numerical, Fiddler considers the smallest value the least relevant grade and the largest value the most relevant grade. |
| **kwargs | | | Additional arguments to be passed. |

inputs = [
    fdl.Column(
        name='feature_1',
        data_type=fdl.DataType.FLOAT
    ),
    fdl.Column(
        name='feature_2',
        data_type=fdl.DataType.INTEGER
    ),
    fdl.Column(
        name='feature_3',
        data_type=fdl.DataType.BOOLEAN
    )
]

outputs = [
    fdl.Column(
        name='output_column',
        data_type=fdl.DataType.FLOAT
    )
]

targets = [
    fdl.Column(
        name='target_column',
        data_type=fdl.DataType.INTEGER
    )
]

model_info = fdl.ModelInfo(
    display_name='Example Model',
    input_type=fdl.ModelInputType.TABULAR,
    model_task=fdl.ModelTask.BINARY_CLASSIFICATION,
    inputs=inputs,
    outputs=outputs,
    targets=targets
)
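
If the target column holds string labels, the class order must be given explicitly via target_class_order (per the table above). A minimal sketch, assuming hypothetical label values 'no' and 'yes':

model_info = fdl.ModelInfo(
    display_name='Example Model',
    input_type=fdl.ModelInputType.TABULAR,
    model_task=fdl.ModelTask.BINARY_CLASSIFICATION,
    inputs=inputs,
    outputs=outputs,
    targets=[
        fdl.Column(
            name='target_column',
            data_type=fdl.DataType.STRING
        )
    ],
    target_class_order=['no', 'yes']  # [negative class, positive class]
)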

fdl.ModelInfo.from_dataset_info

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| dataset_info | fdl.DatasetInfo | | The DatasetInfo object from which to construct the ModelInfo object. |
| target | str | | The column to be used as the target (ground truth). |
| model_task | fdl.ModelTask | None | A ModelTask object containing the model task. |
| dataset_id | Optional[str] | None | The unique identifier for the dataset. |
| features | Optional[list] | None | A list of columns to be used as features. |
| custom_features | Optional[List[CustomFeature]] | None | A list of CustomFeature definitions for the model. Objects of type Multivariate, VectorFeature, ImageEmbedding, or TextEmbedding derived from CustomFeature can be provided. |
| metadata_cols | Optional[list] | None | A list of columns to be used as metadata fields. |
| decision_cols | Optional[list] | None | A list of columns to be used as decision fields. |
| display_name | Optional[str] | None | A display name for the model. |
| description | Optional[str] | None | A description of the model. |
| input_type | Optional[fdl.ModelInputType] | fdl.ModelInputType.TABULAR | A ModelInputType object containing the input type of the model. |
| outputs | Optional[list] | | A list of Column objects corresponding to the outputs (predictions) of the model. |
| targets | Optional[list] | None | A list of Column objects corresponding to the targets (ground truth) of the model. |
| model_deployment_params | Optional[fdl.ModelDeploymentParams] | None | A ModelDeploymentParams object containing information about model deployment. |
| framework | Optional[str] | None | A string providing information about the software library and version used to train and run this model. |
| datasets | Optional[list] | None | A list of the dataset IDs used by the model. |
| mlflow_params | Optional[fdl.MLFlowParams] | None | An MLFlowParams object containing information about MLflow parameters. |
| preferred_explanation_method | Optional[fdl.ExplanationMethod] | None | An ExplanationMethod object that specifies the default explanation algorithm to use for the model. |
| custom_explanation_names | Optional[list] | [ ] | A list of names that can be passed to the explanation_name argument of the optional user-defined explain_custom method of the model object defined in package.py. |
| binary_classification_threshold | Optional[float] | 0.5 | The threshold used for classifying inferences for binary classifiers. |
| ranking_top_k | Optional[int] | 50 | Used only for ranking models. Sets the top k results to take into consideration when computing performance metrics like MAP and NDCG. |
| group_by | Optional[str] | None | Used only for ranking models. The column by which to group events for certain performance metrics like MAP and NDCG. |
| fall_back | Optional[dict] | None | A dictionary mapping a column name to custom missing value encodings for that column. |
| categorical_target_class_details | Optional[Union[list, int, str]] | None | A list denoting the order of classes in the target. This parameter is required in the following cases. Binary classification tasks: if the target is of type string, you must tell Fiddler which class is considered the positive class for your output column. If you provide a single element, it is considered the positive class; alternatively, you can provide a list with two elements, where by convention the 0th element is the negative class and the 1st element is the positive class. If your target is boolean, you don't need to specify this argument; by default Fiddler considers True the positive class. If your target is numerical, you don't need to specify this argument; by default Fiddler considers the higher of the two possible values the positive class. Multiclass classification tasks: you must tell Fiddler which class corresponds to which output by giving an ordered list of classes; this order should be the same as the order of the outputs. Ranking tasks: if the target is of type string, you must provide a list of all possible target values in order of relevance, where the first element is the least relevant grade and the last element is the most relevant grade; if the target is numerical, Fiddler considers the smallest value the least relevant grade and the largest value the most relevant grade. |

import pandas as pd

df = pd.read_csv('example_dataset.csv')

dataset_info = fdl.DatasetInfo.from_dataframe(
    df=df
)

model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    features=[
        'feature_1',
        'feature_2',
        'feature_3'
    ],
    outputs=[
        'output_column'
    ],
    target='target_column',
    input_type=fdl.ModelInputType.TABULAR,
    model_task=fdl.ModelTask.BINARY_CLASSIFICATION
)
| Return Type | Description |
| --- | --- |
| fdl.ModelInfo | A fdl.ModelInfo() object constructed from the fdl.DatasetInfo() object provided. |


fdl.ModelInfo.from_dict

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| deserialized_json | dict | | The dictionary object to be converted. |

import pandas as pd

df = pd.read_csv('example_dataset.csv')

dataset_info = fdl.DatasetInfo.from_dataframe(
    df=df
)

model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    features=[
        'feature_1',
        'feature_2',
        'feature_3'
    ],
    outputs=[
        'output_column'
    ],
    target='target_column',
    input_type=fdl.ModelInputType.TABULAR,
    model_task=fdl.ModelTask.BINARY_CLASSIFICATION
)

model_info_dict = model_info.to_dict()

new_model_info = fdl.ModelInfo.from_dict(
    deserialized_json={
        'model': model_info_dict
    }
)
| Return Type | Description |
| --- | --- |
| fdl.ModelInfo | A fdl.ModelInfo() object constructed from the dictionary. |


fdl.ModelInfo.to_dict

| Return Type | Description |
| --- | --- |
| dict | A dictionary containing information from the fdl.ModelInfo() object. |

import pandas as pd

df = pd.read_csv('example_dataset.csv')

dataset_info = fdl.DatasetInfo.from_dataframe(
    df=df
)

model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    features=[
        'feature_1',
        'feature_2',
        'feature_3'
    ],
    outputs=[
        'output_column'
    ],
    target='target_column',
    input_type=fdl.ModelInputType.TABULAR,
    model_task=fdl.ModelTask.BINARY_CLASSIFICATION
)

model_info_dict = model_info.to_dict()
{
    'name': 'Example Model',
    'input-type': 'structured',
    'model-task': 'binary_classification',
    'inputs': [
        {
            'column-name': 'feature_1',
            'data-type': 'float'
        },
        {
            'column-name': 'feature_2',
            'data-type': 'int'
        },
        {
            'column-name': 'feature_3',
            'data-type': 'bool'
        },
        {
            'column-name': 'target_column',
            'data-type': 'int'
        }
    ],
    'outputs': [
        {
            'column-name': 'output_column',
            'data-type': 'float'
        }
    ],
    'datasets': [],
    'targets': [
        {
            'column-name': 'target_column',
            'data-type': 'int'
        }
    ],
    'custom-explanation-names': []
}


fdl.WeightingParams

Holds weighting information for class-imbalanced models, which can then be passed into a fdl.ModelInfo object. Please note that the use of weighting parameters requires the presence of model outputs in the baseline dataset.

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| class_weight | List[float] | None | A list of floats representing weights for each of the classes. The length must equal the number of classes. |
| weighted_reference_histograms | bool | True | Flag indicating whether baseline histograms should be weighted when calculating drift metrics. |
| weighted_surrogate_training | bool | True | Flag indicating whether the weighting scheme should be used when training the surrogate model. |

import pandas as pd
import numpy as np
import sklearn.utils
import fiddler as fdl

TARGET_COLUMN = 'target_column'  # name of the target column in the dataset

df = pd.read_csv('example_dataset.csv')
computed_weight = sklearn.utils.class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(df[TARGET_COLUMN]),
    y=df[TARGET_COLUMN]
).tolist()
weighting_params = fdl.WeightingParams(class_weight=computed_weight)
dataset_info = fdl.DatasetInfo.from_dataframe(df=df)

model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    features=[
        'feature_1',
        'feature_2',
        'feature_3'
    ],
    outputs=['output_column'],
    target='target_column',
    weighting_params=weighting_params,
    input_type=fdl.ModelInputType.TABULAR,
    model_task=fdl.ModelTask.BINARY_CLASSIFICATION
)


fdl.ModelInputType

| Enum Value | Description |
| --- | --- |
| fdl.ModelInputType.TABULAR | For tabular models. |
| fdl.ModelInputType.TEXT | For text models. |

model_input_type = fdl.ModelInputType.TABULAR


fdl.ModelTask

Represents supported model tasks.

| Enum Value | Description |
| --- | --- |
| fdl.ModelTask.REGRESSION | For regression models. |
| fdl.ModelTask.BINARY_CLASSIFICATION | For binary classification models. |
| fdl.ModelTask.MULTICLASS_CLASSIFICATION | For multiclass classification models. |
| fdl.ModelTask.RANKING | For ranking models. |
| fdl.ModelTask.LLM | For LLM models. |
| fdl.ModelTask.NOT_SET | For other model tasks, or when no model task is specified. |

model_task = fdl.ModelTask.BINARY_CLASSIFICATION


fdl.DataType

Represents supported data types.

| Enum Value | Description |
| --- | --- |
| fdl.DataType.FLOAT | For floats. |
| fdl.DataType.INTEGER | For integers. |
| fdl.DataType.BOOLEAN | For booleans. |
| fdl.DataType.STRING | For strings. |
| fdl.DataType.CATEGORY | For categorical types. |
| fdl.DataType.VECTOR | For vector types. |

data_type = fdl.DataType.FLOAT


fdl.Column

Represents a column of a dataset.

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | None | The name of the column. |
| data_type | fdl.DataType | None | The fdl.DataType object corresponding to the data type of the column. |
| possible_values | Optional[list] | None | A list of unique values used for categorical columns. |
| is_nullable | Optional[bool] | None | If True, missing values are expected in the column. |
| value_range_min | Optional[float] | None | The minimum value used for numeric columns. |
| value_range_max | Optional[float] | None | The maximum value used for numeric columns. |

column = fdl.Column(
    name='feature_1',
    data_type=fdl.DataType.FLOAT,
    value_range_min=0.0,
    value_range_max=80.0
)
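
For a categorical, nullable column, a minimal sketch using the remaining parameters (the column name and values here are illustrative):

category_column = fdl.Column(
    name='feature_4',
    data_type=fdl.DataType.CATEGORY,
    possible_values=['low', 'medium', 'high'],  # unique values the column can take
    is_nullable=True                            # missing values are expected
)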


fdl.DeploymentParams

Supported from server version 23.1 and above with Model Deployment feature enabled.

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| image_uri | Optional[str] | md-base/python/machine-learning:1.0.1 | Reference to the Docker image used to create a new runtime to serve the model. Check the available images on the Model Deployment page. |
| replicas | Optional[int] | 1 | The number of replicas running the model. Minimum value: 1. Maximum value: 10. |
| memory | Optional[int] | 256 | The amount of memory (mebibytes) reserved per replica. Minimum value: 150. Maximum value: 16384 (16 GiB). |
| cpu | Optional[int] | 100 | The amount of CPU (milli CPUs) reserved per replica. Minimum value: 10. Maximum value: 4000 (4 vCPUs). |

deployment_params = fdl.DeploymentParams(
    image_uri="md-base/python/machine-learning:1.1.0",
    cpu=250,
    memory=512,
    replicas=1,
)

📘 What parameters should I set for my model?

Setting the right parameters might not be straightforward, and Fiddler is here to help you.

The parameters might vary depending on the number of input features used, the pre-processing steps used, and the model itself.

The tables below help you define the right parameters.

  1. Surrogate Models guide

| Number of input features | Memory (mebibytes) | CPU (milli CPUs) |
| --- | --- | --- |
| < 10 | 250 (default) | 100 (default) |
| < 20 | 400 | 300 |
| < 50 | 600 | 400 |
| < 100 | 850 | 900 |
| < 200 | 1600 | 1200 |
| < 300 | 2000 | 1200 |
| < 400 | 2800 | 1300 |
| < 500 | 2900 | 1500 |

  2. User Uploaded guide

For uploading your own model artifact, refer to the table above and increase the memory value depending on your model framework and complexity. Surrogate models use the LightGBM framework.

For example, an NLP model with TEXT input might need memory set at 1024 or higher and CPU at 1000, as in the sketch below.
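
A minimal sketch of deployment parameters sized for such a text model, reusing the fdl.DeploymentParams example shown earlier (the numbers are a starting point, not a prescription):

deployment_params = fdl.DeploymentParams(
    image_uri="md-base/python/machine-learning:1.1.0",
    cpu=1000,     # milli CPUs, per the guidance above for NLP models
    memory=1024,  # mebibytes, per the guidance above for NLP models
    replicas=1,
)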

📘 Usage Reference

See the usage of fdl.DeploymentParams in the client methods that accept it, for example when adding or updating a model artifact.

Check more about the Model Deployment feature set.
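
As an illustration only, a hedged sketch of passing deployment parameters when uploading a model artifact; the method name and argument names below (client.add_model_artifact, PROJECT_ID, MODEL_ID, MODEL_DIR) are assumptions and may differ across client versions:

client.add_model_artifact(
    project_id=PROJECT_ID,   # hypothetical placeholders for your project and model
    model_id=MODEL_ID,
    model_dir=MODEL_DIR,     # local directory containing package.py and the model artifact
    deployment_params=deployment_params
)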



fdl.ComparePeriod

Required when compare_to = CompareTo.TIME_PERIOD. This field sets how far back to look when comparing against the same bin from a previous time period. Choose from the following:

| Enum | Value |
| --- | --- |
| fdl.ComparePeriod.ONE_DAY | 86400000 milliseconds, i.e., 1 day |
| fdl.ComparePeriod.SEVEN_DAYS | 604800000 milliseconds, i.e., 7 days |
| fdl.ComparePeriod.ONE_MONTH | 2629743000 milliseconds, i.e., 30 days |
| fdl.ComparePeriod.THREE_MONTHS | 7776000000 milliseconds, i.e., 90 days |

import fiddler as fdl

client.add_alert_rule(
    name = "perf-gt-5prec-1hr-1d-ago",
    project_name = 'project-a',
    model_name = 'model-a',
    alert_type = fdl.AlertType.PERFORMANCE, 
    metric = fdl.Metric.PRECISION,
    bin_size = fdl.BinSize.ONE_HOUR, 
    compare_to = fdl.CompareTo.TIME_PERIOD,
    compare_period = fdl.ComparePeriod.ONE_DAY, <----
    warning_threshold = 0.05,
    critical_threshold = 0.1,
    condition = fdl.AlertCondition.GREATER,
    priority = fdl.Priority.HIGH,
    notifications_config = notifications_config
)
[AlertRule(alert_rule_uuid='9b8711fa-735e-4a72-977c-c4c8b16543ae',
           organization_name='some_org_name',
           project_id='project-a',
           model_id='model-a',
           name='perf-gt-5prec-1hr-1d-ago',
           alert_type=AlertType.PERFORMANCE, 
           metric=Metric.PRECISION,
           priority=Priority.HIGH,
           compare_to=CompareTo.TIME_PERIOD,
           compare_period=ComparePeriod.ONE_DAY, <----
           compare_threshold=None,
           raw_threshold=None,
           warning_threshold=0.05,
           critical_threshold=0.1,
           condition=AlertCondition.GREATER,
           bin_size=BinSize.ONE_HOUR)]


fdl.AlertCondition

If condition = fdl.AlertCondition.GREATER or fdl.AlertCondition.LESSER is specified, an alert is triggered every time the metric value is greater or lesser than the specified threshold.

| Enum | Value |
| --- | --- |
| fdl.AlertCondition.GREATER | greater |
| fdl.AlertCondition.LESSER | lesser |

import fiddler as fdl

client.add_alert_rule(
    name = "perf-gt-5prec-1hr-1d-ago",
    project_name = 'project-a',
    model_name = 'model-a',
    alert_type = fdl.AlertType.PERFORMANCE, 
    metric = fdl.Metric.PRECISION,
    bin_size = fdl.BinSize.ONE_HOUR, 
    compare_to = fdl.CompareTo.TIME_PERIOD,
    compare_period = fdl.ComparePeriod.ONE_DAY,
    warning_threshold = 0.05,
    critical_threshold = 0.1,
    condition = fdl.AlertCondition.GREATER, <-----
    priority = fdl.Priority.HIGH,
    notifications_config = notifications_config
)
[AlertRule(alert_rule_uuid='9b8711fa-735e-4a72-977c-c4c8b16543ae',
           organization_name='some_org_name',
           project_id='project-a',
           model_id='model-a',
           name='perf-gt-5prec-1hr-1d-ago',
           alert_type=AlertType.PERFORMANCE, <---
           metric=Metric.PRECISION,
           priority=Priority.HIGH,
           compare_to=CompareTo.TIME_PERIOD,
           compare_period=ComparePeriod.ONE_DAY,
           compare_threshold=None,
           raw_threshold=None,
           warning_threshold=0.05,
           critical_threshold=0.1,
           condition=AlertCondition.GREATER, <-----
           bin_size=BinSize.ONE_HOUR)]


fdl.CompareTo

Whether the metric value is to be compared against a static value or against the same time bin from a previous time period (set using compare_period [ComparePeriod]).

| Enum | Value |
| --- | --- |
| fdl.CompareTo.RAW_VALUE | When comparing to an absolute value |
| fdl.CompareTo.TIME_PERIOD | When comparing to the same bin size from a previous time period |

import fiddler as fdl

client.add_alert_rule(
    name = "perf-gt-5prec-1hr-1d-ago",
    project_name = 'project-a',
    model_name = 'binary_classification_model-a',
    alert_type = fdl.AlertType.PERFORMANCE,
    metric = fdl.Metric.PRECISION,
    bin_size = fdl.BinSize.ONE_HOUR, 
    compare_to = fdl.CompareTo.TIME_PERIOD, <----
    compare_period = fdl.ComparePeriod.ONE_DAY,
    warning_threshold = 0.05,
    critical_threshold = 0.1,
    condition = fdl.AlertCondition.GREATER,
    priority = fdl.Priority.HIGH,
    notifications_config = notifications_config
)
[AlertRule(alert_rule_uuid='9b8711fa-735e-4a72-977c-c4c8b16543ae',
           organization_name='some_org_name',
           project_id='project-a',
           model_id='binary_classification_model-a',
           name='perf-gt-5prec-1hr-1d-ago',
           alert_type=AlertType.PERFORMANCE,
           metric=Metric.PRECISION,
           priority=Priority.HIGH,
           compare_to=CompareTo.TIME_PERIOD, <---
           compare_period=ComparePeriod.ONE_DAY,
           compare_threshold=None,
           raw_threshold=None,
           warning_threshold=0.05,
           critical_threshold=0.1,
           condition=AlertCondition.GREATER,
           bin_size=BinSize.ONE_HOUR)]


fdl.BinSize

**This field signifies the duration for which Fiddler monitoring calculates the metric values.**

| Enum | Value |
| --- | --- |
| fdl.BinSize.ONE_HOUR | 3600 * 1000 milliseconds, i.e., one hour |
| fdl.BinSize.ONE_DAY | 86400 * 1000 milliseconds, i.e., one day |
| fdl.BinSize.SEVEN_DAYS | 604800 * 1000 milliseconds, i.e., seven days |

import fiddler as fdl

client.add_alert_rule(
    name = "perf-gt-5prec-1hr-1d-ago",
    project_name = 'project-a',
    model_name = 'model-a',
    alert_type = fdl.AlertType.PERFORMANCE, 
    metric = fdl.Metric.PRECISION,
    bin_size = fdl.BinSize.ONE_HOUR, <----
    compare_to = fdl.CompareTo.TIME_PERIOD,
    compare_period = fdl.ComparePeriod.ONE_DAY,
    warning_threshold = 0.05,
    critical_threshold = 0.1,
    condition = fdl.AlertCondition.GREATER,
    priority = fdl.Priority.HIGH,
    notifications_config = notifications_config
)
[AlertRule(alert_rule_uuid='9b8711fa-735e-4a72-977c-c4c8b16543ae',
           organization_name='some_org_name',
           project_id='project-a',
           model_id='model-a',
           name='perf-gt-5prec-1hr-1d-ago',
           alert_type=AlertType.PERFORMANCE, 
           metric=Metric.PRECISION,
           priority=Priority.HIGH,
           compare_to=CompareTo.TIME_PERIOD,
           compare_period=ComparePeriod.ONE_DAY,
           compare_threshold=None,
           raw_threshold=None,
           warning_threshold=0.05,
           critical_threshold=0.1,
           condition=AlertCondition.GREATER,
           bin_size=BinSize.ONE_HOUR)] <-----


fdl.Priority

This field can be used to prioritize alert rules by adding an identifier (low, medium, or high) that helps users categorize them on the basis of their importance. The Priority enums are as follows:

| Enum | Value |
| --- | --- |
| fdl.Priority.HIGH | HIGH |
| fdl.Priority.MEDIUM | MEDIUM |
| fdl.Priority.LOW | LOW |

import fiddler as fdl

client.add_alert_rule(
    name = "perf-gt-5prec-1hr-1d-ago",
    project_name = 'project-a',
    model_name = 'model-a',
    alert_type = fdl.AlertType.PERFORMANCE, 
    metric = fdl.Metric.PRECISION,
    bin_size = fdl.BinSize.ONE_HOUR, 
    compare_to = fdl.CompareTo.TIME_PERIOD,
    compare_period = fdl.ComparePeriod.ONE_DAY,
    warning_threshold = 0.05,
    critical_threshold = 0.1,
    condition = fdl.AlertCondition.GREATER,
    priority = fdl.Priority.HIGH, <---
    notifications_config = notifications_config
)
[AlertRule(alert_rule_uuid='9b8711fa-735e-4a72-977c-c4c8b16543ae',
           organization_name='some_org_name',
           project_id='project-a',
           model_id='model-a',
           name='perf-gt-5prec-1hr-1d-ago',
           alert_type=AlertType.PERFORMANCE, 
           metric=Metric.PRECISION,
           priority=Priority.HIGH, <----
           compare_to=CompareTo.TIME_PERIOD,
           compare_period=ComparePeriod.ONE_DAY,
           compare_threshold=None,
           raw_threshold=None,
           warning_threshold=0.05,
           critical_threshold=0.1,
           condition=AlertCondition.GREATER,
           bin_size=BinSize.ONE_HOUR)]


fdl.Metric

The following is the list of metrics, with the corresponding alert type and model task, for which an alert rule can be created.

| Enum | Supported Alert Types (ModelTask restriction, if any) | Description |
| --- | --- | --- |
| fdl.Metric.SUM | fdl.AlertType.STATISTIC | Sum of all values of a column across all events |
| fdl.Metric.AVERAGE | fdl.AlertType.STATISTIC | Average value of a column across all events |
| fdl.Metric.FREQUENCY | fdl.AlertType.STATISTIC | Frequency count of a specific value in a categorical column |
| fdl.Metric.PSI | fdl.AlertType.DATA_DRIFT | Population Stability Index |
| fdl.Metric.JSD | fdl.AlertType.DATA_DRIFT | Jensen–Shannon divergence |
| fdl.Metric.MISSING_VALUE | fdl.AlertType.DATA_INTEGRITY | Missing value |
| fdl.Metric.TYPE_VIOLATION | fdl.AlertType.DATA_INTEGRITY | Type violation |
| fdl.Metric.RANGE_VIOLATION | fdl.AlertType.DATA_INTEGRITY | Range violation |
| fdl.Metric.TRAFFIC | fdl.AlertType.SERVICE_METRICS | Traffic count |
| fdl.Metric.ACCURACY | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION, fdl.ModelTask.MULTICLASS_CLASSIFICATION) | Accuracy |
| fdl.Metric.RECALL | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | Recall |
| fdl.Metric.FPR | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | False Positive Rate |
| fdl.Metric.PRECISION | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | Precision |
| fdl.Metric.TPR | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | True Positive Rate |
| fdl.Metric.AUC | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | Area under the ROC curve |
| fdl.Metric.F1_SCORE | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | F1 score |
| fdl.Metric.ECE | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | Expected Calibration Error |
| fdl.Metric.R2 | fdl.AlertType.PERFORMANCE (fdl.ModelTask.REGRESSION) | R squared |
| fdl.Metric.MSE | fdl.AlertType.PERFORMANCE (fdl.ModelTask.REGRESSION) | Mean Squared Error |
| fdl.Metric.MAPE | fdl.AlertType.PERFORMANCE (fdl.ModelTask.REGRESSION) | Mean Absolute Percentage Error |
| fdl.Metric.WMAPE | fdl.AlertType.PERFORMANCE (fdl.ModelTask.REGRESSION) | Weighted Mean Absolute Percentage Error |
| fdl.Metric.MAE | fdl.AlertType.PERFORMANCE (fdl.ModelTask.REGRESSION) | Mean Absolute Error |
| fdl.Metric.LOG_LOSS | fdl.AlertType.PERFORMANCE (fdl.ModelTask.MULTICLASS_CLASSIFICATION) | Log loss |
| fdl.Metric.MAP | fdl.AlertType.PERFORMANCE (fdl.ModelTask.RANKING) | Mean Average Precision |
| fdl.Metric.MEAN_NDCG | fdl.AlertType.PERFORMANCE (fdl.ModelTask.RANKING) | Normalized Discounted Cumulative Gain |

import fiddler as fdl

client.add_alert_rule(
    name = "perf-gt-5prec-1hr-1d-ago",
    project_name = 'project-a',
    model_name = 'binary_classification_model-a',
    alert_type = fdl.AlertType.PERFORMANCE,
    metric = fdl.Metric.PRECISION, <----
    bin_size = fdl.BinSize.ONE_HOUR, 
    compare_to = fdl.CompareTo.TIME_PERIOD,
    compare_period = fdl.ComparePeriod.ONE_DAY,
    warning_threshold = 0.05,
    critical_threshold = 0.1,
    condition = fdl.AlertCondition.GREATER,
    priority = fdl.Priority.HIGH,
    notifications_config = notifications_config
)
[AlertRule(alert_rule_uuid='9b8711fa-735e-4a72-977c-c4c8b16543ae',
           organization_name='some_org_name',
           project_id='project-a',
           model_id='binary_classification_model-a',
           name='perf-gt-5prec-1hr-1d-ago',
           alert_type=AlertType.PERFORMANCE,
           metric=Metric.PRECISION, <---
           priority=Priority.HIGH,
           compare_to=CompareTo.TIME_PERIOD,
           compare_period=ComparePeriod.ONE_DAY,
           compare_threshold=None,
           raw_threshold=None,
           warning_threshold=0.05,
           critical_threshold=0.1,
           condition=AlertCondition.GREATER,
           bin_size=BinSize.ONE_HOUR)]


fdl.AlertType

| Enum Value | Description |
| --- | --- |
| fdl.AlertType.DATA_DRIFT | For drift alert type |
| fdl.AlertType.PERFORMANCE | For performance alert type |
| fdl.AlertType.DATA_INTEGRITY | For data integrity alert type |
| fdl.AlertType.SERVICE_METRICS | For service metrics alert type |
| fdl.AlertType.STATISTIC | For statistics of a feature |

client.add_alert_rule(
    name = "perf-gt-5prec-1hr-1d-ago",
    project_name = 'project-a',
    model_name = 'model-a',
    alert_type = fdl.AlertType.PERFORMANCE, <---
    metric = fdl.Metric.PRECISION,
    bin_size = fdl.BinSize.ONE_HOUR, 
    compare_to = fdl.CompareTo.TIME_PERIOD,
    compare_period = fdl.ComparePeriod.ONE_DAY,
    warning_threshold = 0.05,
    critical_threshold = 0.1,
    condition = fdl.AlertCondition.GREATER,
    priority = fdl.Priority.HIGH,
    notifications_config = notifications_config
)
[AlertRule(alert_rule_uuid='9b8711fa-735e-4a72-977c-c4c8b16543ae',
           organization_name='some_org_name',
           project_id='project-a',
           model_id='model-a',
           name='perf-gt-5prec-1hr-1d-ago',
           alert_type=AlertType.PERFORMANCE, <---
           metric=Metric.PRECISION,
           priority=Priority.HIGH,
           compare_to=CompareTo.TIME_PERIOD,
           compare_period=ComparePeriod.ONE_DAY,
           compare_threshold=None,
           raw_threshold=None,
           warning_threshold=0.05,
           critical_threshold=0.1,
           condition=AlertCondition.GREATER,
           bin_size=BinSize.ONE_HOUR)]


fdl.WindowSize

| Enum | Value |
| --- | --- |
| fdl.WindowSize.ONE_HOUR | 3600 |
| fdl.WindowSize.ONE_DAY | 86400 |
| fdl.WindowSize.ONE_WEEK | 604800 |
| fdl.WindowSize.ONE_MONTH | 2592000 |

from fiddler import BaselineType, WindowSize

PROJECT_NAME = 'example_project'
BASELINE_NAME = 'example_rolling'
DATASET_NAME = 'example_validation'
MODEL_NAME = 'example_model'

client.add_baseline(
  project_id=PROJECT_NAME,
  model_id=MODEL_NAME,
  baseline_id=BASELINE_NAME,
  type=BaselineType.ROLLING_PRODUCTION,
  offset=WindowSize.ONE_MONTH, # How far back to set our window
  window_size=WindowSize.ONE_WEEK, # Size of the sliding window
)


fdl.CustomFeatureType

| Enum Value | Description |
| --- | --- |
| FROM_COLUMNS | Represents custom features derived directly from columns. |
| FROM_VECTOR | Represents custom features derived from a vector column. |
| FROM_TEXT_EMBEDDING | Represents custom features derived from text embeddings. |
| FROM_IMAGE_EMBEDDING | Represents custom features derived from image embeddings. |
| ENRICHMENT | Represents an enrichment custom feature. |
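
Like the other enums above, a value can be referenced directly, for example:

custom_feature_type = fdl.CustomFeatureType.FROM_COLUMNS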



fdl.CustomFeature

This is the base class that all other custom features inherit from. It's flexible enough to accommodate different types of derived features. Note: All of the derived feature classes (e.g., Multivariate, VectorFeature, etc.) inherit from CustomFeature and thus have its properties, in addition to their specific ones.

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | None | The name of the custom feature. |
| type | CustomFeatureType | None | The type of custom feature. Must be one of the CustomFeatureType enum values. |
| n_clusters | Optional[int] | 5 | The number of clusters. |
| centroids | Optional[List] | None | Centroids of the clusters in the embedded space. The number of centroids equals n_clusters. |
| columns | Optional[List[str]] | None | For the FROM_COLUMNS type, the original columns from which the feature is derived. |
| column | Optional[str] | None | Used for vector-derived features; the original vector column name. |
| source_column | Optional[str] | None | Specifies the original column name for embedding-derived features. |
| n_tags | Optional[int] | 5 | For the FROM_TEXT_EMBEDDING type, the number of tags for each cluster in the TF-IDF summarization used in drift computation. |

# use from_columns helper function to generate a custom feature combining multiple numeric columns

feature = fdl.CustomFeature.from_columns(
    name='my_feature',
    columns=['column_1', 'column_2'],
    n_clusters=5
)

fdl.Multivariate

Represents custom features derived from multiple columns.

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| columns | List[str] | None | List of original columns from which this feature is derived. |
| n_clusters | Optional[int] | 5 | The number of clusters. |
| centroids | Optional[List] | None | Centroids of the clusters in the embedded space. The number of centroids equals n_clusters. |
| monitor_components | bool | False | Whether to monitor each column in columns as an individual feature. If set to True, components are monitored and drift will be available. |

multivariate_feature = fdl.Multivariate(
    name='multi_feature',
    columns=['column_1', 'column_2']
)

fdl.VectorFeature

Represents custom features derived from a single vector column.

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| source_column | Optional[str] | None | Specifies the original column if this feature is derived from an embedding. |
| column | str | None | The vector column name. |
| n_clusters | Optional[int] | 5 | The number of clusters. |
| centroids | Optional[List[List[float]]] | None | Centroids of the clusters in the embedded space. The number of centroids equals n_clusters. |

vector_feature = fdl.VectorFeature(
    name='vector_feature',
    column='vector_column'
)

fdl.TextEmbedding

Represents custom features derived from text embeddings.

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| source_column | str | Required | Specifies the column name where the text data (e.g., LLM prompts) is stored. |
| column | str | Required | Specifies the column name where the embeddings corresponding to source_column are stored. |
| n_tags | Optional[int] | 5 | The number of tags (tokens) used for each cluster in the TF-IDF summarization during drift computation. |
| n_clusters | Optional[int] | 5 | The number of clusters. |
| centroids | Optional[List] | None | Centroids of the clusters in the embedded space. The number of centroids equals n_clusters. |

text_embedding_feature = fdl.TextEmbedding(
    name='text_custom_feature',
    source_column='text_column',
    column='text_embedding',
    n_tags=10
)

fdl.ImageEmbedding

Represents custom features derived from image embeddings.

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| source_column | str | Required | The column name where the URL of the image data is stored. |
| column | str | Required | Specifies the column name where the embeddings corresponding to source_column are stored. |
| n_clusters | Optional[int] | 5 | The number of clusters. |
| centroids | Optional[List] | None | Centroids of the clusters in the embedded space. The number of centroids equals n_clusters. |

image_embedding_feature = fdl.ImageEmbedding(
    name='image_feature',
    source_column='image_url',
    column='image_embedding',
)


fdl.Enrichment (beta)

  • Enrichments are custom features designed to augment data provided in events.

  • They add new computed columns to your published data automatically whenever defined.

  • The new columns generated are available for querying in analyze, charting, and alerting, similar to any other column.

| Input Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | | The name of the custom feature to generate. |
| enrichment | str | | The enrichment operation to be applied. |
| columns | List[str] | | The column names on which the enrichment depends. |
| config | Optional[dict] | {} | Optional configuration specific to an enrichment operation, which controls the behavior of the enrichment. |

# Automatically generating embedding for a column named “question”

fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    display_name='llm_model',
    model_task=fdl.core_objects.ModelTask.LLM,
    custom_features=[
        fdl.Enrichment(
            name='question_embedding',
            enrichment='embedding',
            columns=['question'],
        ),
        fdl.TextEmbedding(
            name='question_cf',
            source_column='question',
            column='question_embedding',
        ),
    ]
)

Note

Enrichments are disabled by default. To enable them, contact your administrator. Failing to do so will result in an error during the add_model call.


Embedding (beta)

  • Create an embedding for a string column using an embedding model.

  • Supports Sentence Transformers and encoder/decoder NLP transformers from Hugging Face.

  • To enable, set the enrichment parameter to embedding.

  • For each embedding enrichment, if you want to monitor the embedding vector in Fiddler, you MUST create a corresponding TextEmbedding using the enrichment's output column.

Requirements:

  • Access to the Hugging Face inference endpoint: https://api-inference.huggingface.co

  • A Hugging Face API token

Supported Models:

| model_name | size | Type | pooling_method | Notes |
| --- | --- | --- | --- | --- |
| BAAI/bge-small-en-v1.5 | small | Sentence Transformer | | |
| sentence-transformers/all-MiniLM-L6-v2 | med | Sentence Transformer | | |
| thenlper/gte-base | med | Sentence Transformer | | (default) |
| gpt2 | med | Encoder NLP Transformer | last_token | |
| distilgpt2 | small | Encoder NLP Transformer | last_token | |
| EleutherAI/gpt-neo-125m | med | Encoder NLP Transformer | last_token | |
| google/bert_uncased_L-4_H-256_A-4 | small | Decoder NLP Transformer | first_token | Smallest BERT |
| bert-base-cased | med | Decoder NLP Transformer | first_token | |
| distilroberta-base | med | Decoder NLP Transformer | first_token | |
| xlm-roberta-large | large | Decoder NLP Transformer | first_token | Multilingual |
| roberta-large | large | Decoder NLP Transformer | first_token | |

fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    display_name='llm_model',
    model_task=fdl.core_objects.ModelTask.LLM,
    custom_features = [
      fdl.Enrichment(
          name='Question Embedding', # name of the enrichment; this will be the vector column
          enrichment='embedding',
          columns=['question'], # only one column allowed per embedding enrichment; must be a text column in the dataframe
          config={ # optional
            'model_name': ..., # default: 'thenlper/gte-base'
            'pooling_method': ..., # choose from '{first/last/mean}_token'. Only required if NOT using a sentence transformer
          }
      ),
      fdl.TextEmbedding(
        name='question_cf', # name of the text embedding custom feature
        source_column='question', # source - raw text
        column='Question Embedding', # the name of the vector - output of the embedding enrichment
      ),
    ]
)

The above example will lead to the generation of a new column:

  • FDL Question Embedding (vector): embeddings corresponding to the string column question

Note

In the context of Hugging Face models, particularly transformer-based models used for generating embeddings, the pooling_method determines how the model processes the output of its layers to produce a single vector representation for input sequences (like sentences or documents). This is crucial when using these models for tasks like sentence or document embedding, where you need a fixed-size vector representation regardless of the input length.
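
For example, a sketch of an embedding enrichment that selects a non-sentence-transformer model from the table above, where pooling_method must be supplied:

fdl.Enrichment(
    name='Question Embedding',
    enrichment='embedding',
    columns=['question'],
    config={
        'model_name': 'gpt2',           # an Encoder NLP Transformer from the table above
        'pooling_method': 'last_token'  # required because gpt2 is not a sentence transformer
    }
)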


Centroid Distance (beta)

  • Fiddler uses a KMeans-based system to determine which cluster a particular CustomFeature belongs to.

  • The Centroid Distance enrichment calculates the distance from the closest centroid calculated by model monitoring.

  • A new numeric column with distances to the closest centroid is added to the events table.

  • To enable, set the enrichment parameter to centroid_distance.

fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    display_name='llm_model',
    model_task=fdl.core_objects.ModelTask.LLM,
    custom_features = [
      fdl.Enrichment(
        name='question_embedding',
        enrichment='embedding',
        columns=['question'],
      ),
      fdl.TextEmbedding(
          name='question_cf',
          source_column='question',
          column='question_embedding',
      ),
      fdl.Enrichment(
        name='Centroid Distance',
        enrichment='centroid_distance',
        columns=['question_cf'],
      ),
    ]
)

The above example will lead to the generation of a new column:

  • FDL Centroid Distance (question_embedding) (float): distance from the nearest K-Means centroid present in question_embedding

Note

Does not calculate membership for preproduction data, so you cannot calculate drift.


Personally Identifiable Information (beta)

The PII (Personally Identifiable Information) enrichment is a critical tool designed to detect and flag the presence of sensitive information within textual data. Whether user-entered or system-generated, this enrichment aims to identify instances where PII might be exposed, helping to prevent privacy breaches and the potential misuse of personal data. In an era where digital privacy concerns are paramount, mishandling or unintentionally leaking PII can have serious repercussions, including privacy violations, identity theft, and significant legal and reputational damage.

Regulatory frameworks such as the General Data Protection Regulation (GDPR) in the European Union and the Health Insurance Portability and Accountability Act (HIPAA) in the United States underscore the necessity of safeguarding PII. These laws enforce strict guidelines on the collection, storage, and processing of personal data, emphasizing the need for robust measures to protect sensitive information.

The inadvertent inclusion of PII in datasets used for training or interacting with large language models (LLMs) can exacerbate the risks associated with data privacy. Once exposed to an LLM, sensitive information can be inadvertently learned by the model, potentially leading to wider dissemination of this data beyond intended confines. This scenario underscores the importance of preemptively identifying and removing PII from data before it is processed or shared, particularly in contexts involving AI and machine learning.

To mitigate the risks associated with PII exposure, organizations and developers can integrate the PII enrichment into their data processing workflows. This enrichment operates by scanning text for patterns and indicators of personal information, flagging potentially sensitive data for review or anonymization. By proactively identifying PII, stakeholders can take necessary actions to comply with privacy laws, safeguard individuals' data, and prevent the unintended spread of personal information through AI models and other digital platforms. Implementing PII detection and management practices is not just a legal obligation but a critical component of responsible data stewardship in the digital age.

  • To enable, set the enrichment parameter to pii.

Requirements

  • Reachability to https://github.com/explosion/spacy-models/releases/download/ to download spacy models as required

List of PII entities

| Entity Type | Description | Detection Method | Example |
| --- | --- | --- | --- |
| CREDIT_CARD | A credit card number is between 12 to 19 digits. https://en.wikipedia.org/wiki/Payment_card_number | Pattern match and checksum | 4111111111111111, 378282246310005 (American Express) |
| CRYPTO | A crypto wallet number. Currently only Bitcoin addresses are supported. | Pattern match, context and checksum | 1BoatSLRHtKNngkdXEeobR76b53LETtpyT |
| DATE_TIME | Absolute or relative dates or periods or times smaller than a day. | Pattern match and context | ../2024 |
| EMAIL_ADDRESS | An email address identifies an email box to which email messages are delivered. | Pattern match, context and RFC-822 validation | trust@fiddler.ai |
| IBAN_CODE | The International Bank Account Number (IBAN) is an internationally agreed system of identifying bank accounts across national borders to facilitate the communication and processing of cross-border transactions with a reduced risk of transcription errors. | Pattern match, context and checksum | DE89 3704 0044 0532 0130 00 |
| IP_ADDRESS | An Internet Protocol (IP) address (either IPv4 or IPv6). | Pattern match, context and checksum | 1.2.3.4, 127.0.0.12/16, 1234:BEEF:3333:4444:5555:6666:7777:8888 |
| LOCATION | Name of a politically or geographically defined location (cities, provinces, countries, international regions, bodies of water, mountains). | Custom logic and context | PALO ALTO, Japan |
| PERSON | A full person name, which can include first names, middle names or initials, and last names. | Custom logic and context | Joanna Doe |
| PHONE_NUMBER | A telephone number. | Custom logic, pattern match and context | 5556667890 |
| URL | A URL (Uniform Resource Locator), a unique identifier used to locate a resource on the Internet. | Pattern match, context and top-level URL validation | www.fiddler.ai |
| US SSN | A US Social Security Number (SSN) with 9 digits. | Pattern match and context | 1234-00-5678 |
| US_DRIVER_LICENSE | A US driver license according to https://ntsi.com/drivers-license-format/ | Pattern match and context | |
| US_ITIN | US Individual Taxpayer Identification Number (ITIN). Nine digits that start with a "9" and contain a "7" or "8" as the 4th digit. | Pattern match and context | 912-34-1234 |
| US_PASSPORT | A US passport number begins with a letter, followed by eight numbers. | Pattern match and context | L12345678 |
| US_SSN | A US Social Security Number (SSN) with 9 digits. | Pattern match and context | 001-12-1234 |

fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    display_name='llm_model',
    model_task=fdl.core_objects.ModelTask.LLM,
    custom_features = [
      fdl.Enrichment(
        name='Rag PII',
        enrichment='pii',
        columns=['question'], # one or more columns
        allow_list=['fiddler'], # Optional: list of strings that are white listed
        score_threshold=0.85, # Optional: float value for minimum possible confidence 
      ),
    ]
)