Uploading a Baseline Dataset

To upload a baseline dataset to Fiddler, use the client.upload_dataset() API. The steps below walk through a simple example.


First, load your baseline dataset into a pandas DataFrame.

import pandas as pd
import fiddler as fdl  # Fiddler client, referred to as fdl in the steps that follow

df = pd.read_csv('example_dataset.csv')
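Because the schema in the next step is inferred from the DataFrame, it can help to check what pandas actually loaded before going further. The following is a minimal sketch using only pandas; the CSV contents and column names are hypothetical stand-ins for example_dataset.csv.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for example_dataset.csv
csv_text = """age,state,income
34,CA,52000.0
45,NY,
29,CA,58000.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Shape as (rows, columns)
print(df.shape)

# Data type pandas assigned to each column
print(df.dtypes)

# Number of missing values in each column
missing = df.isna().sum()
print(missing)
```

Unexpected types or missing values are usually easier to fix here, in the DataFrame, than after the schema has been inferred.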

Creating a DatasetInfo object

Next, create an fdl.DatasetInfo() object, which defines the schema for your dataset.

The schema can be inferred from your DataFrame using the fdl.DatasetInfo.from_dataframe() function.

dataset_info = fdl.DatasetInfo.from_dataframe(df)

📘 Info

If your dataset has categorical columns that are encoded as strings, you can use the max_inferred_cardinality argument.

This argument sets a threshold on the number of unique values in a column: any column with fewer than max_inferred_cardinality unique values is converted to the fdl.DataType.CATEGORY type.

dataset_info = fdl.DatasetInfo.from_dataframe(
    df=df,
    max_inferred_cardinality=1000
)
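To get a feel for how a given threshold would play out on your data, you can count the unique values in each string column yourself. This sketch uses only pandas and approximates the rule described above; it is not Fiddler's own inference logic, and the data and the threshold variable are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for your baseline dataset
df = pd.DataFrame({
    "state": ["CA", "NY", "CA", "TX"],
    "customer_id": ["a1", "b2", "c3", "d4"],
    "income": [52000.0, 61000.0, 58000.0, 47000.0],
})

# Plays the same role as max_inferred_cardinality (assumed value for illustration)
threshold = 4

# Keep only the columns whose values are all Python strings
string_cols = [c for c in df.columns if all(isinstance(v, str) for v in df[c])]

# Number of distinct values in each string column
unique_counts = df[string_cols].nunique()

# Columns with strictly fewer unique values than the threshold
below_threshold = unique_counts[unique_counts < threshold].index.tolist()
print(below_threshold)
```

In this toy data, state has three distinct values and falls below the threshold, while customer_id has four and does not. An identifier-like column such as customer_id is a typical case where you would not want a categorical type, so it is worth checking the counts before picking a threshold.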

Uploading your dataset

Once you have your fdl.DatasetInfo() object, you can make any necessary adjustments before upload (see Customizing Your Dataset Schema).

When you're ready, the dataset can be uploaded using client.upload_dataset().

# Assumes `client` is an already initialized Fiddler API client
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'

client.upload_dataset(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    dataset={
        'baseline': df
    },
    info=dataset_info
)