Dataset

Represents a dataset containing data published to a Fiddler model. A Dataset is a collection of data records that have been published to a specific model in the Fiddler platform. Datasets are automatically created when data is published using Model.publish() and serve as the foundation for monitoring, drift detection, and baseline creation.

Key Features:

Data Collection: Organized storage of model input/output data
Environment Separation: Distinct handling of production vs. pre-production data
Baseline Source: Reference data for drift detection and monitoring
Analysis Support: Data download and statistical analysis capabilities
Model Integration: Tight coupling with specific models for context

Dataset Characteristics:

Automatic Creation: Created by Model.publish() operations
Model-Scoped: Each dataset belongs to exactly one model
Named Collections: Unique names within a model for identification
Row Tracking: Automatic counting of data records
Environment Typed: Classified as production or pre-production data

Example

# Retrieve a specific dataset
dataset = Dataset.from_name(
    name="training_data_v1",
    model_id=model.id
)
print(f"Dataset: {dataset.name}")
print(f"Rows: {dataset.row_count}")
print(f"Model: {dataset.model_id}")

# List all datasets for a model
datasets = list(Dataset.list(model_id=model.id))
print(f"Found {len(datasets)} datasets")

# Find datasets by characteristics
large_datasets = [
    ds for ds in Dataset.list(model_id=model.id)
    if ds.row_count and ds.row_count > 10000
]
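The filter above checks row_count for truthiness before comparing it, which suggests the attribute may be unset. The following is a minimal sketch of why that guard matters. It uses plain hypothetical values rather than real Dataset objects, and the assumption that row_count can be None is inferred from the example, not confirmed here.

```python
from typing import List, Optional

# Hypothetical row counts; None models a dataset whose count is not available.
row_counts: List[Optional[int]] = [25000, None, 0, 10000, 10001]

# Truthiness check first: None and 0 are skipped before the comparison runs.
large = [n for n in row_counts if n and n > 10000]

# Without the guard, comparing None with an int raises TypeError in Python 3.
try:
    unguarded = [n for n in row_counts if n > 10000]
except TypeError:
    unguarded = None
```

The same idea appears later as `ds.row_count or 0` when summing rows.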

Datasets cannot be created directly through the Dataset class. They are automatically created when data is published to models using Model.publish(). Use the Dataset class for retrieval, listing, and analysis operations.

Initialize a Dataset instance. Creates a dataset object representing data published to a model. This constructor is typically used internally when deserializing API responses rather than for direct dataset creation.

Parameters

name (str, required) – Dataset name; must be unique within the model. Should be descriptive of the data contents or purpose.
model_id (str | UUID, required) – Identifier of the model this dataset belongs to. Can be provided as a UUID object or its string representation.
project_id (UUID | str, required) – Identifier of the parent project. Can be provided as a UUID object or its string representation.
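Identifiers are accepted either as UUID objects or as strings. As a rough illustration of what that means for calling code, the sketch below normalises both forms with the standard library. The helper to_uuid is hypothetical and not part of the Fiddler client; how the client converts identifiers internally is not described here.

```python
from typing import Union
from uuid import UUID


def to_uuid(value: Union[str, UUID]) -> UUID:
    """Return value as a UUID, parsing it first when given as a string."""
    if isinstance(value, UUID):
        return value
    return UUID(value)  # raises ValueError on a malformed string


model_id_str = "550e8400-e29b-41d4-a716-446655440000"
model_id = to_uuid(model_id_str)   # parsed from the string form
same_id = to_uuid(model_id)        # UUID objects pass through unchanged
```

Validating an identifier this way before a call can surface a malformed value locally instead of as an API error.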

Example

# Internal usage - typically not called directly
dataset = Dataset(
    name="training_baseline_v1",
    model_id="550e8400-e29b-41d4-a716-446655440000",
    project_id="660e8400-e29b-41d4-a716-446655440000"
)

Datasets are typically retrieved using Dataset.get(), Dataset.from_name(), or Dataset.list() rather than created directly. Direct creation is mainly used internally by the Fiddler client.

classmethod get()

Retrieve a dataset by its unique identifier. Fetches a dataset from the Fiddler platform using its UUID. This is the most direct way to retrieve a dataset when you know its ID.

Parameters

id_ (UUID | str, required) – The unique identifier (UUID) of the dataset to retrieve. Can be provided as a UUID object or its string representation.

Returns

The dataset instance with all metadata and row count information.

Raises

NotFound – If no dataset exists with the specified ID.
ApiError – If there’s an error communicating with the Fiddler API.

Example

# Get dataset by UUID
dataset = Dataset.get(id_="550e8400-e29b-41d4-a716-446655440000")
print(f"Retrieved dataset: {dataset.name}")
print(f"Rows: {dataset.row_count}")
print(f"Model: {dataset.model_id}")

# Use dataset for analysis
if dataset.row_count and dataset.row_count > 1000:
    print("Large dataset suitable for baseline creation")

This method makes an API call to fetch the latest dataset state from the server. The returned dataset instance reflects the current state in Fiddler.
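Because get() raises NotFound for an unknown ID, callers that treat a missing dataset as a normal outcome may want to catch that error separately from other failures. The sketch below shows one such pattern. NotFound, fake_get, and get_or_none are local stand-ins defined only for this illustration, so the snippet runs without a Fiddler connection; in real code the fetch would be Dataset.get and the exception would come from the client.

```python
from typing import Callable, Dict, Optional


class NotFound(Exception):
    """Local stand-in for the client's NotFound error (assumption)."""


# Hypothetical in-memory store mapping dataset IDs to dataset names.
_STORE: Dict[str, str] = {
    "550e8400-e29b-41d4-a716-446655440000": "training_data_v1",
}


def fake_get(id_: str) -> str:
    """Stand-in for Dataset.get(): raises NotFound for unknown IDs."""
    try:
        return _STORE[id_]
    except KeyError:
        raise NotFound(f"no dataset with id {id_}") from None


def get_or_none(fetch: Callable[[str], str], id_: str) -> Optional[str]:
    """Return the fetched value, or None when the lookup raises NotFound."""
    try:
        return fetch(id_)
    except NotFound:
        return None


found = get_or_none(fake_get, "550e8400-e29b-41d4-a716-446655440000")
missing = get_or_none(fake_get, "00000000-0000-0000-0000-000000000000")
```

Only NotFound is swallowed here; any other error, such as an ApiError from a failed request, would still propagate to the caller.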

classmethod from_name()

Retrieve a dataset by name within a specific model. Finds and returns a dataset using its name and model context. Dataset names are unique within a model, making this a reliable lookup method when you know both the dataset name and model ID.

Parameters

name (str, required) – The name of the dataset to retrieve. Dataset names are unique within a model and are case-sensitive.
model_id (UUID | str, required) – The identifier of the model containing the dataset. Can be provided as a UUID object or its string representation.

Returns

The dataset instance matching the specified name and model.

Raises

NotFound – If no dataset exists with the specified name in the given model.
ApiError – If there’s an error communicating with the Fiddler API.

Example

# Get dataset by name for a specific model
dataset = Dataset.from_name(
    name="training_baseline",
    model_id=model.id
)
print(f"Found dataset: {dataset.name}")
print(f"Rows: {dataset.row_count}")

# Get validation dataset
val_dataset = Dataset.from_name(
    name="validation_set_v2",
    model_id=model.id
)

# Use for baseline creation
baseline = Baseline.create_from_dataset(
    dataset_id=dataset.id,
    name="training_baseline"
)

Dataset names are case-sensitive and must match exactly. This method is useful when you know the dataset name from configuration or when working with named datasets created during model training workflows.
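To make the lookup rules concrete, the sketch below models them with a plain dictionary keyed by model and name. The index, the model labels, and find_rows are all hypothetical and only mimic the documented behaviour: names are unique per model, and a name that differs only in case does not match.

```python
from typing import Dict, Optional, Tuple

# Hypothetical index keyed by (model_id, dataset_name), storing row counts.
index: Dict[Tuple[str, str], int] = {
    ("model-a", "training_baseline"): 12000,
    ("model-b", "training_baseline"): 500,  # same name under a different model
}


def find_rows(model_id: str, name: str) -> Optional[int]:
    """Exact, case-sensitive lookup; returns None when nothing matches."""
    return index.get((model_id, name))


exact = find_rows("model-a", "training_baseline")
wrong_case = find_rows("model-a", "Training_Baseline")  # differs only in case
other_model = find_rows("model-b", "training_baseline")
```

In the real client a failed lookup raises NotFound rather than returning None; the None here is only a convenience of the stand-in.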

classmethod list()

List all pre-production datasets for a specific model. Retrieves all datasets that have been published to a model in the pre-production environment. These datasets are typically used for baselines, training data analysis, and validation purposes.

Parameters

model_id (UUID | str, required) – The identifier of the model to list datasets for. Can be provided as a UUID object or its string representation.

Yields

Dataset – Dataset instances for all pre-production datasets in the model.

Raises

ApiError – If there’s an error communicating with the Fiddler API.

Returns

Iterator[Dataset]

Example

# List all datasets for a model
for dataset in Dataset.list(model_id=model.id):
    print(f"Dataset: {dataset.name}")
    print(f"  Rows: {dataset.row_count}")
    print(f"  ID: {dataset.id}")

# Convert to list for analysis
datasets = list(Dataset.list(model_id=model.id))
print(f"Found {len(datasets)} datasets")

# Find datasets by characteristics
large_datasets = [
    ds for ds in Dataset.list(model_id=model.id)
    if ds.row_count and ds.row_count > 10000
]
print(f"Large datasets: {len(large_datasets)}")

# Get dataset summary statistics
total_rows = sum(
    ds.row_count or 0
    for ds in Dataset.list(model_id=model.id)
)
print(f"Total rows across all datasets: {total_rows}")

This method returns an iterator for memory efficiency and only includes pre-production datasets. Production data is handled separately through the monitoring system. Convert to a list with list(Dataset.list(…)) if you need to iterate multiple times.
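The single-pass behaviour described above can be demonstrated with an ordinary generator. In this sketch FakeDataset and fake_list are hypothetical stand-ins for Dataset and Dataset.list(), used so the snippet runs without a Fiddler connection; it assumes the real iterator is exhausted after one pass, as the note implies.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class FakeDataset:
    """Hypothetical stand-in for a Dataset; not part of the Fiddler client."""
    name: str
    row_count: Optional[int]


def fake_list() -> Iterator[FakeDataset]:
    """Stand-in for Dataset.list(): yields records lazily, one at a time."""
    yield FakeDataset("training_baseline", 12000)
    yield FakeDataset("validation_set_v2", 800)


# A generator can be consumed only once.
lazy = fake_list()
first_pass = [ds.name for ds in lazy]
second_pass = [ds.name for ds in lazy]  # already exhausted, so this is empty

# Materialising the results into a list allows repeated passes.
cached = list(fake_list())
names = [ds.name for ds in cached]
total_rows = sum(ds.row_count or 0 for ds in cached)
```

Caching the list once also avoids repeating the listing call for every filter or aggregate computed over the same datasets.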
