Fiddler Objects
fdl.DatasetInfo
For information on how to customize these objects, see Customizing Your Dataset Schema.
Input Parameters | Type | Default | Description |
---|---|---|---|
display_name | str | None | A display name for the dataset. |
columns | list | None | A list of fdl.Column objects containing information about the columns. |
files | Optional [list] | None | A list of strings pointing to CSV files to use. |
dataset_id | Optional [str] | None | The unique identifier for the dataset. |
**kwargs | | | Additional arguments to be passed. |
fdl.DatasetInfo.from_dataframe
Input Parameters | Type | Default | Description |
---|---|---|---|
df | Union [pd.DataFrame, list] | | Either a single pandas DataFrame or a list of DataFrames. If a list is given, all DataFrames must have the same columns. |
display_name | str | ' ' | A display name for the dataset. |
max_inferred_cardinality | Optional [int] | 100 | If specified, any string column containing fewer than max_inferred_cardinality unique values will be converted to a categorical data type. |
dataset_id | Optional [str] | None | The unique identifier for the dataset. |
Return Type | Description |
---|---|
fdl.DatasetInfo | A fdl.DatasetInfo() object constructed from the pandas DataFrame provided. |
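The following sketch illustrates the max_inferred_cardinality rule by deciding whether a string column would become categorical. infer_string_column_type is a hypothetical helper and is not part of the Fiddler client. It also assumes that missing values are ignored when counting unique values.

```python
from typing import Optional, Sequence


def infer_string_column_type(
    values: Sequence[Optional[str]], max_inferred_cardinality: int = 100
) -> str:
    """Return "category" or "string" for a column of string values.

    Hypothetical helper: a string column with fewer than
    max_inferred_cardinality unique values is treated as categorical.
    """
    # Assumption: missing values (None) are not counted as a distinct value.
    unique_values = {v for v in values if v is not None}
    if len(unique_values) < max_inferred_cardinality:
        return "category"
    return "string"


colors = ["red", "green", "blue", "red"]      # 3 unique values
user_ids = [f"user-{i}" for i in range(500)]  # 500 unique values

print(infer_string_column_type(colors))    # category
print(infer_string_column_type(user_ids))  # string
# 3 unique values is not fewer than 3, so the column stays a string.
print(infer_string_column_type(colors, max_inferred_cardinality=3))  # string
```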
fdl.DatasetInfo.from_dict
Input Parameters | Type | Default | Description |
---|---|---|---|
deserialized_json | dict | | The dictionary object to be converted. |
Return Type | Description |
---|---|
fdl.DatasetInfo | A fdl.DatasetInfo() object constructed from the dictionary. |
fdl.DatasetInfo.to_dict
Return Type | Description |
---|---|
dict | A dictionary containing information from the fdl.DatasetInfo() object. |
fdl.ModelInfo
Input Parameters | Type | Default | Description |
---|---|---|---|
display_name | str | | A display name for the model. |
input_type | fdl.ModelInputType | | A ModelInputType object containing the input type of the model. |
model_task | fdl.ModelTask | | A ModelTask object containing the model task. |
inputs | list | | A list of Column objects corresponding to the inputs (features) of the model. |
outputs | list | | A list of Column objects corresponding to the outputs (predictions) of the model. |
metadata | Optional [list] | None | A list of Column objects corresponding to any metadata fields. |
decisions | Optional [list] | None | A list of Column objects corresponding to any decision fields (post-prediction business decisions). |
targets | Optional [list] | None | A list of Column objects corresponding to the targets (ground truth) of the model. |
framework | Optional [str] | None | A string providing information about the software library and version used to train and run this model. |
description | Optional [str] | None | A description of the model. |
datasets | Optional [list] | None | A list of the dataset IDs used by the model. |
mlflow_params | Optional [fdl.MLFlowParams] | None | An MLFlowParams object containing information about MLFlow parameters. |
model_deployment_params | Optional [fdl.ModelDeploymentParams] | None | A ModelDeploymentParams object containing information about model deployment. |
artifact_status | Optional [fdl.ArtifactStatus] | None | An ArtifactStatus object containing information about the model artifact. |
preferred_explanation_method | Optional [fdl.ExplanationMethod] | None | An ExplanationMethod object that specifies the default explanation algorithm to use for the model. |
custom_explanation_names | Optional [list] | [ ] | A list of names that can be passed to the explanation_name argument of the optional user-defined explain_custom method of the model object defined in package.py. |
binary_classification_threshold | Optional [float] | .5 | The threshold used for classifying inferences for binary classifiers. |
ranking_top_k | Optional [int] | 50 | Used only for ranking models. Sets the top k results to take into consideration when computing performance metrics like MAP and NDCG. |
group_by | Optional [str] | None | Used only for ranking models. The column by which to group events for certain performance metrics like MAP and NDCG. |
fall_back | Optional [dict] | None | A dictionary mapping a column name to custom missing value encodings for that column. |
target_class_order | Optional [list] | None | A list denoting the order of classes in the target. Required in the cases listed below this table. |
**kwargs | | | Additional arguments to be passed. |
target_class_order is required in the following cases:
- Binary classification tasks: If the target is of type string, you must tell Fiddler which class is considered the positive class for your output column. Provide a list with two elements. By convention, the 0th element is the negative class and the 1st element is the positive class. When your target is boolean, you don't need to specify this argument, because by default Fiddler considers True the positive class. When your target is numerical, you don't need to specify this argument either, because by default Fiddler considers the higher of the two possible values the positive class.
- Multi-class classification tasks: You must tell Fiddler which class corresponds to which output by giving an ordered list of classes. This order should be the same as the order of the outputs.
- Ranking tasks: If the target is of type string, you must provide a list of all the possible target values in order of relevance. The first element is considered the least relevant grade and the last element the most relevant grade. If your target is numerical, Fiddler considers the smallest value the least relevant grade and the largest value the most relevant grade.
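The defaults described for target_class_order can be summarized in a small sketch. binary_class_order and relevance_grades are hypothetical helpers written for illustration. They follow the rules stated above: the 0th element is negative, True is positive for boolean targets, the higher numeric value is positive, and the first ranking value is least relevant. They are not how the Fiddler client implements these rules.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def binary_class_order(
    target_values: Sequence[Hashable],
    target_class_order: Optional[List[Hashable]] = None,
) -> List[Hashable]:
    """Return [negative_class, positive_class] for a binary target (hypothetical helper)."""
    if target_class_order is not None:
        if len(target_class_order) != 2:
            raise ValueError("binary classification needs exactly two classes")
        return list(target_class_order)  # index 0 = negative, index 1 = positive
    if all(isinstance(v, bool) for v in target_values):
        return [False, True]  # boolean targets: True is the positive class
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in target_values):
        classes = sorted(set(target_values))
        if len(classes) != 2:
            raise ValueError("expected exactly two distinct numeric values")
        return classes  # numeric targets: the higher value is the positive class
    raise ValueError("string targets need an explicit target_class_order")


def relevance_grades(target_class_order: List[Hashable]) -> Dict[Hashable, int]:
    """Map ranking labels to grades: first element least relevant, last most relevant."""
    return {label: grade for grade, label in enumerate(target_class_order)}


print(binary_class_order([True, False, True]))                   # [False, True]
print(binary_class_order([0, 1, 1]))                             # [0, 1]
print(binary_class_order(["churn", "stay"], ["stay", "churn"]))  # ['stay', 'churn']
print(relevance_grades(["bad", "ok", "great"]))                  # {'bad': 0, 'ok': 1, 'great': 2}
```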
fdl.ModelInfo.from_dataset_info
Input Parameters | Type | Default | Description |
---|---|---|---|
dataset_info | fdl.DatasetInfo | | The DatasetInfo object from which to construct the ModelInfo object. |
target | str | | The column to be used as the target (ground truth). |
model_task | fdl.ModelTask | None | A ModelTask object containing the model task. |
dataset_id | Optional[str] | None | The unique identifier for the dataset. |
features | Optional[list] | None | A list of columns to be used as features. |
custom_features | Optional[List[CustomFeature]] | None | List of Custom Features definitions for a model. Objects of type Multivariate, Vector, ImageEmbedding or TextEmbedding derived from CustomFeature can be provided. |
metadata_cols | Optional[list] | None | A list of columns to be used as metadata fields. |
decision_cols | Optional[list] | None | A list of columns to be used as decision fields. |
display_name | Optional[str] | None | A display name for the model. |
description | Optional[str] | None | A description of the model. |
input_type | Optional[fdl.ModelInputType] | fdl.ModelInputType.TABULAR | A ModelInputType object containing the input type of the model. |
outputs | Optional[list] | | A list of Column objects corresponding to the outputs (predictions) of the model. |
targets | Optional[list] | None | A list of Column objects corresponding to the targets (ground truth) of the model. |
model_deployment_params | Optional[fdl.ModelDeploymentParams] | None | A ModelDeploymentParams object containing information about model deployment. |
framework | Optional[str] | None | A string providing information about the software library and version used to train and run this model. |
datasets | Optional[list] | None | A list of the dataset IDs used by the model. |
mlflow_params | Optional[fdl.MLFlowParams] | None | A MLFlowParams object containing information about MLFlow parameters. |
preferred_explanation_method | Optional[fdl.ExplanationMethod] | None | An ExplanationMethod object that specifies the default explanation algorithm to use for the model. |
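custom_explanation_names | Optional[list] | [ ] | A list of names that can be passed to the explanation_name argument of the optional user-defined explain_custom method of the model object defined in package.py. |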
binary_classification_threshold | Optional[float] | .5 | The threshold used for classifying inferences for binary classifiers. |
ranking_top_k | Optional[int] | 50 | Used only for ranking models. Sets the top k results to take into consideration when computing performance metrics like MAP and NDCG. |
group_by | Optional[str] | None | Used only for ranking models. The column by which to group events for certain performance metrics like MAP and NDCG. |
fall_back | Optional[dict] | None | A dictionary mapping a column name to custom missing value encodings for that column. |
categorical_target_class_details | Optional[Union[list, int, str]] | None | A list denoting the order of classes in the target. This parameter is required in several cases, including binary classification tasks. If the target is of type string, you must tell Fiddler which class is considered the positive class for your output column. If you provide a single element, it is considered the positive class. Alternatively, you can provide a list with two elements. By convention, the 0th element is the negative class and the 1st element is the positive class. When your target is boolean, you don't need to specify this argument, because by default Fiddler considers True the positive class. |
Return Type | Description |
---|---|
fdl.ModelInfo | A fdl.ModelInfo() object constructed from the fdl.DatasetInfo() object provided. |
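In categorical_target_class_details, a single value names the positive class. The sketch below shows one way to expand that value into the two-element [negative, positive] form. normalize_class_details is hypothetical. It also assumes a binary target, so the other observed class can be taken as the negative class.

```python
from typing import Hashable, List, Sequence, Union


def normalize_class_details(
    details: Union[Hashable, List[Hashable]], observed_classes: Sequence[Hashable]
) -> List[Hashable]:
    """Expand categorical_target_class_details into [negative, positive] (hypothetical).

    A two-element list is returned unchanged; a single value is taken as the
    positive class and the other observed class becomes the negative class.
    """
    if isinstance(details, list):
        if len(details) != 2:
            raise ValueError("expected a two-element list for a binary target")
        return list(details)
    # Distinct observed classes in first-seen order, excluding the positive class.
    others = [c for c in dict.fromkeys(observed_classes) if c != details]
    if details not in observed_classes or len(others) != 1:
        raise ValueError("the positive class must be one of exactly two observed classes")
    return [others[0], details]


print(normalize_class_details("yes", ["no", "yes", "no"]))    # ['no', 'yes']
print(normalize_class_details(["no", "yes"], ["yes", "no"]))  # ['no', 'yes']
```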
fdl.ModelInfo.from_dict
Input Parameters | Type | Default | Description |
---|---|---|---|
deserialized_json | dict | | The dictionary object to be converted. |
Return Type | Description |
---|---|
fdl.ModelInfo | A fdl.ModelInfo() object constructed from the dictionary. |
fdl.ModelInfo.to_dict
Return Type | Description |
---|---|
dict | A dictionary containing information from the fdl.ModelInfo() object. |
fdl.WeightingParams
Holds weighting information for class-imbalanced models, which can then be passed into a fdl.ModelInfo object. Note that using weighting params requires model outputs to be present in the baseline dataset.
Input Parameters | Type | Default | Description |
---|---|---|---|
class_weight | List[float] | None | List of floats representing the weight of each class. The length must equal the number of classes. |
weighted_reference_histograms | bool | True | Flag indicating if baseline histograms must be weighted or not when calculating drift metrics. |
weighted_surrogate_training | bool | True | Flag indicating if weighting scheme should be used when training the surrogate model. |
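The sketch below shows what a weighted reference histogram means in practice: each event counts with the weight of its class, as class_weight suggests. weighted_histogram is a hypothetical helper. It assumes that labels are integer class indices in the same order as class_weight. Fiddler's internal computation is not described here.

```python
from collections import Counter
from typing import Dict, List, Sequence


def weighted_histogram(labels: Sequence[int], class_weight: List[float]) -> Dict[int, float]:
    """Normalized class histogram in which each event counts with its class weight.

    Hypothetical sketch: labels are assumed to be integer class indices
    0..len(class_weight)-1, in the same order as class_weight.
    """
    n_classes = len(class_weight)
    counts = Counter(labels)
    if any(label < 0 or label >= n_classes for label in counts):
        raise ValueError("class_weight must have one entry per class")
    weighted = {c: counts.get(c, 0) * class_weight[c] for c in range(n_classes)}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


# 90 negatives and 10 positives: upweighting the minority class by 9x balances them.
labels = [0] * 90 + [1] * 10
print(weighted_histogram(labels, [1.0, 1.0]))  # {0: 0.9, 1: 0.1}
print(weighted_histogram(labels, [1.0, 9.0]))  # {0: 0.5, 1: 0.5}
```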
fdl.ModelInputType
Enum Value | Description |
---|---|
fdl.ModelInputType.TABULAR | For tabular models. |
fdl.ModelInputType.TEXT | For text models. |
fdl.ModelTask
Represents supported model tasks.
Enum Value | Description |
---|---|
fdl.ModelTask.REGRESSION | For regression models. |
fdl.ModelTask.BINARY_CLASSIFICATION | For binary classification models. |
fdl.ModelTask.MULTICLASS_CLASSIFICATION | For multiclass classification models. |
fdl.ModelTask.RANKING | For ranking models. |
fdl.ModelTask.LLM | For LLM models. |
fdl.ModelTask.NOT_SET | For other model tasks or no model task specified. |
fdl.DataType
Represents supported data types.
Enum Value | Description |
---|---|
fdl.DataType.FLOAT | For floats. |
fdl.DataType.INTEGER | For integers. |
fdl.DataType.BOOLEAN | For booleans. |
fdl.DataType.STRING | For strings. |
fdl.DataType.CATEGORY | For categorical types. |
fdl.DataType.VECTOR | For vector types. |
fdl.Column
Represents a column of a dataset.
Input Parameter | Type | Default | Description |
---|---|---|---|
name | str | None | The name of the column. |
data_type | fdl.DataType | None | The fdl.DataType object corresponding to the data type of the column. |
possible_values | Optional [list] | None | A list of unique values used for categorical columns. |
is_nullable | Optional [bool] | None | If True, missing values are expected in the column. |
value_range_min | Optional [float] | None | The minimum value used for numeric columns. |
value_range_max | Optional [float] | None | The maximum value used for numeric columns. |
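The fdl.Column fields map naturally onto the data integrity metrics listed further down: missing value, type violation and range violation. The sketch below uses a hypothetical ColumnSpec dataclass as a stand-in for fdl.Column. It makes two assumptions. First, a categorical value outside possible_values counts as a range violation. Second, missing values are flagged only when is_nullable is not True.

```python
import math
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ColumnSpec:
    """Hypothetical stand-in for the fields of fdl.Column (not the Fiddler class)."""

    name: str
    data_type: str  # assumed labels: "float", "int", "bool", "str", "category"
    possible_values: Optional[List[Any]] = None
    is_nullable: Optional[bool] = None
    value_range_min: Optional[float] = None
    value_range_max: Optional[float] = None


def integrity_issue(spec: ColumnSpec, value: Any) -> Optional[str]:
    """Return "missing_value", "type_violation", "range_violation", or None."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        # Assumption: missing values are only a problem when not declared nullable.
        return None if spec.is_nullable else "missing_value"
    if spec.data_type in ("float", "int"):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return "type_violation"
        if spec.value_range_min is not None and value < spec.value_range_min:
            return "range_violation"
        if spec.value_range_max is not None and value > spec.value_range_max:
            return "range_violation"
    elif spec.data_type == "category" and spec.possible_values is not None:
        # Assumption: an unseen category counts as a range violation.
        if value not in spec.possible_values:
            return "range_violation"
    return None


age = ColumnSpec("age", "int", value_range_min=18, value_range_max=99)
for v in [42, None, "n/a", 130]:
    print(v, integrity_issue(age, v))
# 42 None
# None missing_value
# n/a type_violation
# 130 range_violation
```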
fdl.DeploymentParams
Supported from server version 23.1 and above, with the Model Deployment feature enabled.
Input Parameter | Type | Default | Description |
---|---|---|---|
image_uri | Optional[str] | md-base/python/machine-learning:1.0.1 | Reference to the Docker image used to create a new runtime to serve the model. Check the available images on the Model Deployment page. |
replicas | Optional[int] | 1 | The number of replicas running the model. Minimum: 1. Maximum: 10. |
memory | Optional[int] | 256 | The amount of memory (mebibytes) reserved per replica. Minimum: 150. Maximum: 16384 (16 GiB). |
cpu | Optional[int] | 100 | The amount of CPU (millicpus) reserved per replica. Minimum: 10. Maximum: 4000 (4 vCPUs). |
📘 What parameters should I set for my model?
Setting the right parameters might not be straightforward, and Fiddler is here to help.
The right values vary depending on the number of input features, the pre-processing steps, and the model itself.
Use the following table as a starting point for choosing memory and CPU.
Surrogate Models guide
Number of input features | Memory (mebibytes) | CPU (milli cpus) |
---|---|---|
< 10 | 250 (default) | 100 (default) |
< 20 | 400 | 300 |
< 50 | 600 | 400 |
<100 | 850 | 900 |
<200 | 1600 | 1200 |
<300 | 2000 | 1200 |
<400 | 2800 | 1300 |
<500 | 2900 | 1500 |
User Uploaded guide
When uploading your own model artifact, start from the table above and increase the memory depending on your model's framework and complexity. Surrogate models use the LightGBM framework.
For example, an NLP model for a TEXT input might need memory set at 1024 or higher and CPU at 1000.
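The sizing table and the documented bounds fit in a few lines of code, for example to pre-check values before building fdl.DeploymentParams. suggest_resources and validate_params are hypothetical helpers and are not part of the Fiddler client. The numbers are copied from the surrogate sizing table and the parameter descriptions above.

```python
from typing import Tuple

# (exclusive upper bound on input features, memory in MiB, CPU in millicpus),
# copied from the surrogate-model sizing table.
SURROGATE_SIZING = [
    (10, 250, 100),
    (20, 400, 300),
    (50, 600, 400),
    (100, 850, 900),
    (200, 1600, 1200),
    (300, 2000, 1200),
    (400, 2800, 1300),
    (500, 2900, 1500),
]

# Documented (minimum, maximum) bounds for the deployment parameters.
REPLICA_RANGE = (1, 10)
MEMORY_RANGE = (150, 16384)
CPU_RANGE = (10, 4000)


def suggest_resources(n_features: int) -> Tuple[int, int]:
    """Return (memory_mib, cpu_millicpus) for a surrogate model with n_features inputs."""
    for upper_bound, memory, cpu in SURROGATE_SIZING:
        if n_features < upper_bound:
            return memory, cpu
    raise ValueError("no guidance for 500 or more input features; size manually")


def validate_params(replicas: int, memory: int, cpu: int) -> None:
    """Raise ValueError if a value falls outside the documented bounds."""
    for label, value, (low, high) in [
        ("replicas", replicas, REPLICA_RANGE),
        ("memory", memory, MEMORY_RANGE),
        ("cpu", cpu, CPU_RANGE),
    ]:
        if not low <= value <= high:
            raise ValueError(f"{label}={value} is outside [{low}, {high}]")


memory, cpu = suggest_resources(35)
print(memory, cpu)  # 600 400
validate_params(replicas=1, memory=memory, cpu=cpu)
validate_params(replicas=1, memory=1024, cpu=1000)  # the NLP example is within bounds
```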
📘 Usage Reference
See the usage with:
Learn more about the Model Deployment feature set.
fdl.ComparePeriod
Required when compare_to = fdl.CompareTo.TIME_PERIOD. This field sets how far back to look when comparing a bin against the same bin from a previous time period. Choose from the following:
Enum | Value |
---|---|
fdl.ComparePeriod.ONE_DAY | 86400000 milliseconds, i.e. 1 day |
fdl.ComparePeriod.SEVEN_DAYS | 604800000 milliseconds, i.e. 7 days |
fdl.ComparePeriod.ONE_MONTH | 2629743000 milliseconds, i.e. about 30.44 days (one average month) |
fdl.ComparePeriod.THREE_MONTHS | 7776000000 milliseconds, i.e. 90 days |
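The ComparePeriod values are plain millisecond durations. The sketch below converts them to days and subtracts one from a bin start. The dictionary is a stand-in for the enum, and comparison_bin_start is hypothetical. It assumes the comparison bin is simply offset by the period. The conversion also shows that ONE_MONTH works out to about 30.44 days rather than exactly 30.

```python
from datetime import datetime, timezone

# Millisecond values from the ComparePeriod table (stand-in for the enum).
COMPARE_PERIOD_MS = {
    "ONE_DAY": 86_400_000,
    "SEVEN_DAYS": 604_800_000,
    "ONE_MONTH": 2_629_743_000,
    "THREE_MONTHS": 7_776_000_000,
}

MS_PER_DAY = 86_400_000


def comparison_bin_start(bin_start_ms: int, period: str) -> int:
    """Start of the bin one compare period earlier (assumes a plain millisecond offset)."""
    return bin_start_ms - COMPARE_PERIOD_MS[period]


for name, ms in COMPARE_PERIOD_MS.items():
    print(name, round(ms / MS_PER_DAY, 2), "days")
# ONE_DAY 1.0 days / SEVEN_DAYS 7.0 days / ONE_MONTH 30.44 days / THREE_MONTHS 90.0 days

bin_start = int(datetime(2023, 3, 8, tzinfo=timezone.utc).timestamp() * 1000)
previous = comparison_bin_start(bin_start, "SEVEN_DAYS")
print(datetime.fromtimestamp(previous / 1000, tz=timezone.utc).date())  # 2023-03-01
```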
fdl.AlertCondition
If condition = fdl.AlertCondition.GREATER or fdl.AlertCondition.LESSER is specified, an alert is triggered every time the metric value is greater or less than the specified threshold, respectively.
Enum | Value |
---|---|
fdl.AlertCondition.GREATER | greater |
fdl.AlertCondition.LESSER | lesser |
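The sketch below shows the threshold check for the two conditions. It assumes strict comparisons, so a value exactly equal to the threshold does not trigger an alert. alert_fires is hypothetical and only covers comparison against a raw value.

```python
import operator
from typing import Callable, Dict

# Assumption: strict comparisons, so a value equal to the threshold does not alert.
CONDITIONS: Dict[str, Callable[[float, float], bool]] = {
    "greater": operator.gt,  # fdl.AlertCondition.GREATER
    "lesser": operator.lt,   # fdl.AlertCondition.LESSER
}


def alert_fires(metric_value: float, threshold: float, condition: str) -> bool:
    """Hypothetical check of one metric value against a raw-value threshold."""
    return CONDITIONS[condition](metric_value, threshold)


print(alert_fires(0.25, 0.2, "greater"))  # True: drift above 0.2
print(alert_fires(0.2, 0.2, "greater"))   # False: equal is not greater
print(alert_fires(42, 100, "lesser"))     # True: traffic below 100 events
```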
fdl.CompareTo
Specifies whether the metric value is compared against a static value or against the same time bin from a previous time period (set using compare_period; see fdl.ComparePeriod).
Enum | Description |
---|---|
fdl.CompareTo.RAW_VALUE | When comparing to an absolute value |
fdl.CompareTo.TIME_PERIOD | When comparing to the same bin size from a previous time period |
fdl.BinSize
This field specifies the duration of each time bin for which Fiddler monitoring calculates metric values.
Enum | Value |
---|---|
fdl.BinSize.ONE_HOUR | 3600 * 1000 milliseconds, i.e. one hour |
fdl.BinSize.ONE_DAY | 86400 * 1000 milliseconds, i.e. one day |
fdl.BinSize.SEVEN_DAYS | 604800 * 1000 milliseconds, i.e. seven days |
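The sketch below makes the bin durations concrete. It groups timestamped events into bins and averages a value per bin, which is similar in spirit to the AVERAGE statistic. bin_averages is hypothetical. It assumes bins are aligned to the Unix epoch in UTC, which may not match how Fiddler aligns bins.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

ONE_HOUR_MS = 3600 * 1000
ONE_DAY_MS = 86400 * 1000


def bin_averages(events: Iterable[Tuple[int, float]], bin_size_ms: int) -> Dict[int, float]:
    """Average a value per bin; keys are bin start timestamps in milliseconds.

    Assumes bins are aligned to the Unix epoch (UTC).
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for timestamp_ms, value in events:
        bin_start = timestamp_ms - timestamp_ms % bin_size_ms
        sums[bin_start] += value
        counts[bin_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


events = [
    (0, 1.0),                # 00:00 UTC, 1 Jan 1970
    (30 * 60 * 1000, 3.0),   # 00:30
    (90 * 60 * 1000, 10.0),  # 01:30
]
print(bin_averages(events, ONE_HOUR_MS))  # {0: 2.0, 3600000: 10.0}
print(bin_averages(events, ONE_DAY_MS))   # {0: 4.666666666666667}
```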
fdl.Priority
This field can be used to prioritize alert rules by assigning them a priority of low, medium, or high, which helps users categorize them by importance. The Priority enums are:
Enum | Value |
---|---|
fdl.Priority.HIGH | HIGH |
fdl.Priority.MEDIUM | MEDIUM |
fdl.Priority.LOW | LOW |
fdl.Metric
Following is the list of metrics, with corresponding alert type and model task, for which an alert rule can be created.
Enum Values | Supported for Alert Types (ModelTask restriction if any) | Description |
---|---|---|
fdl.Metric.SUM | fdl.AlertType.STATISTIC | Sum of all values of a column across all events |
fdl.Metric.AVERAGE | fdl.AlertType.STATISTIC | Average value of a column across all events |
fdl.Metric.FREQUENCY | fdl.AlertType.STATISTIC | Frequency count of a specific value in a categorical column |
fdl.Metric.PSI | fdl.AlertType.DATA_DRIFT | Population Stability Index |
fdl.Metric.JSD | fdl.AlertType.DATA_DRIFT | Jensen–Shannon divergence |
fdl.Metric.MISSING_VALUE | fdl.AlertType.DATA_INTEGRITY | Missing Value |
fdl.Metric.TYPE_VIOLATION | fdl.AlertType.DATA_INTEGRITY | Type Violation |
fdl.Metric.RANGE_VIOLATION | fdl.AlertType.DATA_INTEGRITY | Range violation |
fdl.Metric.TRAFFIC | fdl.AlertType.SERVICE_METRICS | Traffic Count |
fdl.Metric.ACCURACY | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION, fdl.ModelTask.MULTICLASS_CLASSIFICATION) | Accuracy |
fdl.Metric.RECALL | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | Recall |
fdl.Metric.FPR | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | False Positive Rate |
fdl.Metric.PRECISION | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | Precision |
fdl.Metric.TPR | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | True Positive Rate |
fdl.Metric.AUC | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | Area under the ROC Curve |
fdl.Metric.F1_SCORE | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | F1 score |
fdl.Metric.ECE | fdl.AlertType.PERFORMANCE (fdl.ModelTask.BINARY_CLASSIFICATION) | Expected Calibration Error |
fdl.Metric.R2 | fdl.AlertType.PERFORMANCE (fdl.ModelTask.REGRESSION) | R Squared |
fdl.Metric.MSE | fdl.AlertType.PERFORMANCE (fdl.ModelTask.REGRESSION) | Mean squared error |
fdl.Metric.MAPE | fdl.AlertType.PERFORMANCE (fdl.ModelTask.REGRESSION) | Mean Absolute Percentage Error |
fdl.Metric.WMAPE | fdl.AlertType.PERFORMANCE (fdl.ModelTask.REGRESSION) | Weighted Mean Absolute Percentage Error |
fdl.Metric.MAE | fdl.AlertType.PERFORMANCE (fdl.ModelTask.REGRESSION) | Mean Absolute Error |
fdl.Metric.LOG_LOSS | fdl.AlertType.PERFORMANCE (fdl.ModelTask.MULTICLASS_CLASSIFICATION) | Log Loss |
fdl.Metric.MAP | fdl.AlertType.PERFORMANCE (fdl.ModelTask.RANKING) | Mean Average Precision |
fdl.Metric.MEAN_NDCG | fdl.AlertType.PERFORMANCE (fdl.ModelTask.RANKING) | Mean Normalized Discounted Cumulative Gain |
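For reference, the sketch below implements the textbook formulas behind a few of these metrics: PSI and JSD for drift, and MAPE and WMAPE for regression. The functions are illustrative only. This section does not specify several implementation details, so the sketch makes its own choices: empty bins are floored at a small epsilon, JSD uses log base 2, and percentages are returned as fractions. How Fiddler bins values and scales percentages is not described here.

```python
import math
from typing import Sequence

EPS = 1e-6  # floor for empty bins to avoid log(0); an assumption, not a Fiddler constant


def psi(expected: Sequence[float], actual: Sequence[float]) -> float:
    """Population Stability Index between two binned distributions (bin fractions)."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, EPS), max(a, EPS)
        total += (a - e) * math.log(a / e)
    return total


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits (log base 2), so 0 <= JSD <= 1."""

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x_i == 0 contribute nothing; y_i > 0 wherever x_i > 0 here.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction; rows with y_true == 0 are skipped."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    return sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


def wmape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Weighted MAPE: total absolute error divided by total absolute actual value."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / sum(abs(t) for t in y_true)


baseline, production = [0.25, 0.75], [0.5, 0.5]
print(round(psi(baseline, production), 4))    # 0.2747
print(0.0 < jsd(baseline, production) < 1.0)  # True

y_true, y_pred = [100, 200, 10], [110, 180, 20]
print(round(mape(y_true, y_pred), 4))   # 0.4
print(round(wmape(y_true, y_pred), 4))  # 0.129
```

The regression example shows why both MAPE and WMAPE exist. The single small actual value (10) dominates MAPE, while WMAPE weights each error by the size of the actual values.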
fdl.AlertType
Enum Value | Description |
---|---|
fdl.AlertType.DATA_DRIFT | For drift alert type |
fdl.AlertType.PERFORMANCE | For performance alert type |
fdl.AlertType.DATA_INTEGRITY | For data integrity alert type |
fdl.AlertType.SERVICE_METRICS | For service metrics alert type |
fdl.AlertType.STATISTIC | For statistics of a feature |
fdl.WindowSize
Enum | Value (seconds) |
---|---|
fdl.WindowSize.ONE_HOUR | 3600 |
fdl.WindowSize.ONE_DAY | 86400 |
fdl.WindowSize.ONE_WEEK | 604800 |
fdl.WindowSize.ONE_MONTH | 2592000 |
fdl.CustomFeatureType
Enum | Description |
---|---|
fdl.CustomFeatureType.FROM_COLUMNS | Represents custom features derived directly from columns. |
fdl.CustomFeatureType.FROM_VECTOR | Represents custom features derived from a vector column. |
fdl.CustomFeatureType.FROM_TEXT_EMBEDDING | Represents custom features derived from text embeddings. |
fdl.CustomFeatureType.FROM_IMAGE_EMBEDDING | Represents custom features derived from image embeddings. |
fdl.CustomFeatureType.ENRICHMENT | Represents an enrichment custom feature. |
fdl.CustomFeature
This is the base class that all other custom features inherit from. It's flexible enough to accommodate different types of derived features. Note: All of the derived feature classes (e.g., Multivariate, VectorFeature, etc.) inherit from CustomFeature and thus have its properties, in addition to their specific ones.
Input Parameter | Type | Default | Description |
---|---|---|---|
name | str | None | The name of the custom feature. |
type | fdl.CustomFeatureType | None | The type of custom feature. Must be one of the fdl.CustomFeatureType enum values. |
n_clusters | Optional[int] | 5 | The number of clusters. |
centroids | Optional[List] | None | Centroids of the clusters in the embedded space. Number of centroids equal to n_clusters. |
columns | Optional[List[str]] | None | For Multivariate features, the list of original columns from which the feature is derived. |
column | Optional[str] | None | For vector-derived features, the original vector column name. |
source_column | Optional[str] | None | Specifies the original column name for embedding-derived features. |
n_tags | Optional[int] | 5 | For TextEmbedding features, the number of tags (tokens) used in each cluster. |
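A common way to monitor vector-valued features, and a plausible reading of n_clusters and centroids, is to assign each vector to its nearest centroid and track the share of events in each cluster over time. The sketch below illustrates that general technique. It is an assumption and not Fiddler's documented algorithm. In practice the centroids would typically come from a clustering such as k-means on baseline data.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def nearest_centroid(vector: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to vector (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))


def cluster_histogram(vectors: List[Vector], centroids: List[Vector]) -> List[float]:
    """Fraction of vectors assigned to each centroid (one bin per cluster)."""
    counts = Counter(nearest_centroid(v, centroids) for v in vectors)
    return [counts.get(i, 0) / len(vectors) for i in range(len(centroids))]


# Two clusters (n_clusters = 2) in a toy 2-D embedding space.
centroids = [[0.0, 0.0], [10.0, 10.0]]
baseline = [[0.1, 0.2], [0.3, -0.1], [9.8, 10.1], [10.2, 9.9]]
production = [[9.9, 10.0], [10.1, 10.3], [10.0, 9.7], [0.2, 0.1]]

print(cluster_histogram(baseline, centroids))    # [0.5, 0.5]
print(cluster_histogram(production, centroids))  # [0.25, 0.75]
```

The two cluster histograms can then be compared with a drift metric such as JSD.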
fdl.Multivariate
Represents custom features derived from multiple columns.
Input Parameter | Type | Default | Description |
---|---|---|---|
columns | List[str] | None | List of original columns from which this feature is derived. |
n_clusters | Optional[int] | 5 | The number of clusters. |
centroids | Optional[List] | None | Centroids of the clusters in the embedded space. Number of centroids equal to n_clusters. |
monitor_components | bool | False | Whether to monitor each column in columns individually. |
fdl.VectorFeature
Represents custom features derived from a single vector column.
Input Parameter | Type | Default | Description |
---|---|---|---|
source_column | Optional[str] | None | Specifies the original column if this feature is derived from an embedding. |
column | str | None | The vector column name. |
n_clusters | Optional[int] | 5 | The number of clusters. |
centroids | Optional[List[List[float]]] | None | Centroids of the clusters in the embedded space. Number of centroids equal to n_clusters. |
fdl.TextEmbedding
Represents custom features derived from text embeddings.
Input Parameter | Type | Default | Description |
---|---|---|---|
source_column | str | Required | Specifies the column name where text data (e.g. LLM prompts) is stored. |
column | str | Required | Specifies the column name where the embeddings corresponding to source_column are stored. |
n_tags | Optional[int] | 5 | The number of tags (tokens) from the text embedding used in each cluster. |
n_clusters | Optional[int] | 5 | The number of clusters. |
centroids | Optional[List] | None | Centroids of the clusters in the embedded space. Number of centroids equal to n_clusters. |
fdl.ImageEmbedding
Represents custom features derived from image embeddings.
Input Parameter | Type | Default | Description |
---|---|---|---|
source_column | str | Required | Specifies the column name where the URLs of the image data are stored. |
column | str | Required | Specifies the column name where the embeddings corresponding to source_column are stored. |
n_clusters | Optional[int] | 5 | The number of clusters. |
centroids | Optional[List] | None | Centroids of the clusters in the embedded space. Number of centroids equal to n_clusters. |
fdl.Enrichment (beta)
Enrichments are custom features designed to augment data provided in events.
They add new computed columns to your published data automatically whenever defined.
The new columns generated are available for querying in analyze, charting, and alerting, similar to any other column.
Input Parameter | Type | Default | Description |
---|---|---|---|
name | str | | The name of the custom feature to generate. |
enrichment | str | | The enrichment operation to be applied. |
columns | List[str] | | The column names on which the enrichment depends. |
config | Optional[dict] | {} | Configuration specific to an enrichment operation, which controls the behavior of the enrichment. |
Note
Enrichments are disabled by default. To enable them, contact your administrator. Failing to do so will result in an error during the add_model call.
Embedding (beta)
Create an embedding for a string column using an embedding model.
Supports sentence transformers and encoder/decoder NLP transformers from Hugging Face.
To enable, set the enrichment parameter to embedding. For each embedding enrichment, if you want to monitor the embedding vector in Fiddler, you MUST create a corresponding TextEmbedding using the enrichment's output column.
Requirements:
- Access to the Hugging Face inference endpoint: https://api-inference.huggingface.co
- A Hugging Face API token
Supported Models:
model_name | size | Type | pooling_method | Notes |
---|---|---|---|---|
BAAI/bge-small-en-v1.5 | small | Sentence Transformer | | |
sentence-transformers/all-MiniLM-L6-v2 | med | Sentence Transformer | | |
thenlper/gte-base | med | Sentence Transformer | | (default) |
gpt2 | med | Decoder NLP Transformer | last_token | |
distilgpt2 | small | Decoder NLP Transformer | last_token | |
EleutherAI/gpt-neo-125m | med | Decoder NLP Transformer | last_token | |
google/bert_uncased_L-4_H-256_A-4 | small | Encoder NLP Transformer | first_token | Smallest BERT |
bert-base-cased | med | Encoder NLP Transformer | first_token | |
distilroberta-base | med | Encoder NLP Transformer | first_token | |
xlm-roberta-large | large | Encoder NLP Transformer | first_token | Multilingual |
roberta-large | large | Encoder NLP Transformer | first_token | |
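The pooling_method column describes how per-token vectors are reduced to a single embedding. The sketch below shows first-token and last-token pooling on plain lists. BERT-style models are trained to summarize the sequence in their first ([CLS]) token. In a left-to-right GPT-style model, only the last token has attended to the whole input. Mean pooling is included only as one common choice; sentence transformers actually apply whatever pooling their own configuration defines. pool is a hypothetical helper.

```python
from typing import List, Sequence

# One vector per token, in sequence order (toy 2-D values below).
TokenEmbeddings = Sequence[Sequence[float]]


def pool(token_embeddings: TokenEmbeddings, method: str) -> List[float]:
    """Reduce per-token vectors to a single embedding (hypothetical helper)."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    if method == "first_token":
        # e.g. the [CLS] position in BERT-style models
        return list(token_embeddings[0])
    if method == "last_token":
        # the only position that has attended to every token in a GPT-style model
        return list(token_embeddings[-1])
    if method == "mean":
        n = len(token_embeddings)
        dim = len(token_embeddings[0])
        return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]
    raise ValueError(f"unknown pooling method: {method}")


tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(pool(tokens, "first_token"))  # [1.0, 0.0]
print(pool(tokens, "last_token"))   # [2.0, 2.0]
print(pool(tokens, "mean"))         # [1.0, 1.0]
```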