Experiment
API reference for Experiment
Experiment
Represents an Experiment for tracking evaluation runs and results.
An Experiment is a single evaluation run of a test suite against a specific application/LLM/Agent version and evaluators. Experiments provide comprehensive tracking, monitoring, and result management for GenAI evaluation workflows, enabling systematic testing and performance analysis.
Key Features:
Evaluation Tracking: Complete lifecycle tracking of evaluation runs
Status Management: Real-time status updates (PENDING, IN_PROGRESS, COMPLETED, etc.)
Dataset Integration: Linked to specific datasets for evaluation
Result Storage: Comprehensive storage of results, metrics, and error information
Error Handling: Detailed error tracking with traceback information
Experiment Lifecycle:
Creation: Create experiment with dataset and application references
Execution: Experiment runs evaluation against the dataset
Monitoring: Track status and progress in real-time
Completion: Retrieve results, metrics, and analysis
Cleanup: Archive or delete completed experiments
Example
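A minimal end-to-end sketch (the `fiddler_evals` import path and all names/UUIDs are assumptions; adjust to your SDK installation):

```python
# NOTE: import path is an assumption; adjust to your Fiddler SDK install.
from fiddler_evals import Experiment

# Idempotent setup: reuse the experiment if it already exists.
exp = Experiment.get_or_create(
    name="chatbot-eval-v2",               # hypothetical experiment name
    application_id="<APPLICATION_UUID>",  # hypothetical UUID
    dataset_id="<DATASET_UUID>",          # hypothetical UUID
    description="Nightly regression run",
)
print(exp.get_app_url())
```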
Attributes
description : str | None = None
error_reason : str | None = None
error_message : str | None = None
traceback : str | None = None
duration_ms : int | None = None
get_app_url()
Get the application URL for this experiment.
Return type: str
classmethod get_by_id(id_)
Retrieve an experiment by its unique identifier.
Fetches an experiment from the Fiddler platform using its UUID. This is the most direct way to retrieve an experiment when you know its ID.
Parameters
id_ (UUID | str) – The unique identifier (UUID) of the experiment to retrieve. Can be provided as a UUID object or string representation.
Returns
The experiment instance with all metadata and configuration.
Return type: Experiment
Raises
NotFound – If no experiment exists with the specified ID.
ApiError – If there’s an error communicating with the Fiddler API.
Example
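Both UUID objects and string representations are accepted (the `fiddler_evals` import path is an assumption):

```python
from uuid import UUID
from fiddler_evals import Experiment  # import path is an assumption

exp = Experiment.get_by_id(UUID("123e4567-e89b-12d3-a456-426614174000"))
# Equivalent string form:
exp = Experiment.get_by_id("123e4567-e89b-12d3-a456-426614174000")
print(exp.description)
```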
classmethod get_by_name(name, application_id)
Retrieve an experiment by name within an application.
Finds and returns an experiment using its name within the specified application. This is useful when you know the experiment name and application but not its UUID. Experiment names are unique within an application, making this a reliable lookup method.
Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| name | str | The name of the experiment to retrieve. Experiment names are unique within an application and are case-sensitive. |
| application_id | UUID \| str | The UUID of the application containing the experiment. Can be provided as a UUID object or string representation. |
Returns
The experiment instance matching the specified name.
Return type: Experiment
Raises
NotFound – If no experiment exists with the specified name in the application.
ApiError – If there’s an error communicating with the Fiddler API.
Example
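A lookup-by-name sketch with the documented NotFound handling (import paths are assumptions):

```python
from fiddler_evals import Experiment, NotFound  # import paths are assumptions

try:
    exp = Experiment.get_by_name(
        name="chatbot-eval-v2",               # case-sensitive, unique per application
        application_id="<APPLICATION_UUID>",  # hypothetical UUID
    )
except NotFound:
    print("No experiment with that name in this application")
```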
classmethod list(application_id, dataset_id=None)
List all experiments in an application.
Retrieves all experiments that the current user has access to within the specified application. Returns an iterator for memory efficiency when dealing with many experiments.
Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| application_id | UUID \| str | The UUID of the application to list experiments from. Can be provided as a UUID object or string representation. |
| dataset_id | UUID \| str \| None | The UUID of the dataset to list experiments from. Can be provided as a UUID object or string representation. |
Yields
Experiment – Experiment instances for all accessible experiments in the application.
Raises
ApiError – If there’s an error communicating with the Fiddler API.
Return type: Iterator[Experiment]
Example
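An iteration sketch (import path and UUIDs are assumptions); the iterator fetches lazily, keeping memory use flat even for applications with many experiments:

```python
from fiddler_evals import Experiment  # import path is an assumption

# All experiments in the application.
for exp in Experiment.list(application_id="<APPLICATION_UUID>"):
    print(exp.description)

# Restrict to experiments that ran against one dataset.
for exp in Experiment.list(
    application_id="<APPLICATION_UUID>",
    dataset_id="<DATASET_UUID>",
):
    print(exp.description)
```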
classmethod create(name, application_id, dataset_id, description=None, metadata=None)
Create a new experiment in an application.
Creates a new experiment within the specified application on the Fiddler platform. The experiment must have a unique name within the application and will be linked to the specified dataset for evaluation.
Note: It is not recommended to use this method directly; use the evaluate method instead. Creating and managing an experiment without the evaluate wrapper is an advanced use case and should be avoided.
Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| name | str | Experiment name; must be unique within the application. |
| application_id | UUID \| str | The UUID of the application to create the experiment in. Can be provided as a UUID object or string representation. |
| dataset_id | UUID \| str | The UUID of the dataset to use for evaluation. Can be provided as a UUID object or string representation. |
| description | str \| None | Optional human-readable description of the experiment. |
| metadata | dict \| None | Optional custom metadata dictionary for additional experiment information. |
Returns
The newly created experiment instance with server-assigned fields.
Return type: Experiment
Raises
Conflict – If an experiment with the same name already exists in the application.
ValidationError – If the experiment configuration is invalid (e.g., invalid name format).
ApiError – If there’s an error communicating with the Fiddler API.
Example
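A direct-creation sketch; prefer the evaluate wrapper in normal use (import path, names, and metadata keys are assumptions):

```python
from fiddler_evals import Experiment  # import path is an assumption

exp = Experiment.create(
    name="chatbot-eval-v2",               # must be unique within the application
    application_id="<APPLICATION_UUID>",  # hypothetical UUID
    dataset_id="<DATASET_UUID>",          # hypothetical UUID
    description="Manual run",
    metadata={"model": "gpt-4o", "prompt_version": 3},  # hypothetical keys
)
```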
classmethod get_or_create(name, application_id, dataset_id, description=None, metadata=None)
Get an existing experiment by name or create a new one if it doesn’t exist.
This is a convenience method that attempts to retrieve an experiment by name within an application, and if not found, creates a new experiment with that name. Useful for idempotent experiment setup in automation scripts and deployment pipelines.
Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| name | str | The name of the experiment to retrieve or create. |
| application_id | UUID \| str | The UUID of the application to search/create the experiment in. Can be provided as a UUID object or string representation. |
| dataset_id | UUID \| str | The UUID of the dataset to use for evaluation. Can be provided as a UUID object or string representation. |
| description | str \| None | Optional human-readable description of the experiment. |
| metadata | dict \| None | Optional custom metadata dictionary for additional experiment information. |
Returns
Either the existing experiment with the specified name, or a newly created experiment if none existed.
Return type: Experiment
Raises
ValidationError – If the experiment name format is invalid.
ApiError – If there’s an error communicating with the Fiddler API.
Example
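An idempotent-setup sketch for automation scripts; running it twice returns the same experiment (import path and values are assumptions):

```python
from fiddler_evals import Experiment  # import path is an assumption

exp = Experiment.get_or_create(
    name="nightly-eval",                  # hypothetical name
    application_id="<APPLICATION_UUID>",  # hypothetical UUID
    dataset_id="<DATASET_UUID>",          # hypothetical UUID
    metadata={"pipeline": "ci"},          # hypothetical metadata
)
```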
update(description=None, metadata=None, status=None, error_reason=None, error_message=None, traceback=None, duration_ms=None)
Update experiment description, metadata, and status.
Updates the experiment’s description, metadata, and/or status. This method allows you to modify the experiment’s configuration after creation, including updating the experiment status and error information for failed experiments.
Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| description | str \| None | Optional new description for the experiment. If provided, replaces the existing description. Set to an empty string to clear. |
| metadata | dict \| None | Optional new metadata dictionary for the experiment. If provided, replaces the existing metadata completely. Use an empty dict to clear. |
| status | ExperimentStatus \| None | Optional new status for the experiment (e.g., PENDING, RUNNING, COMPLETED, FAILED). |
| error_reason | str \| None | Required when status is FAILED. The reason for the experiment failure. |
| error_message | str \| None | Required when status is FAILED. Detailed error message for the failure. |
| traceback | str \| None | Required when status is FAILED. Stack trace information for debugging. |
| duration_ms | int \| None | Optional duration in milliseconds for the experiment execution. |
Returns
The updated experiment instance with new metadata and configuration.
Return type: Experiment
Raises
ValueError – If no update parameters are provided (all are None) or if status is FAILED but error_reason, error_message, or traceback are missing.
ValidationError – If the update data is invalid (e.g., invalid metadata format).
ApiError – If there’s an error communicating with the Fiddler API.
Example
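An update sketch, including the FAILED case where all three error fields must be supplied together (import paths and values are assumptions):

```python
from fiddler_evals import Experiment, ExperimentStatus  # import paths are assumptions

exp = Experiment.get_by_id("<EXPERIMENT_UUID>")  # hypothetical UUID

# Replace description and metadata.
exp = exp.update(description="Re-run after prompt fix", metadata={"rev": 2})

# Marking a failure requires error_reason, error_message, and traceback.
exp = exp.update(
    status=ExperimentStatus.FAILED,
    error_reason="EvaluatorTimeout",
    error_message="Evaluator exceeded the 60s limit",
    traceback="Traceback (most recent call last): ...",
)
```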
delete()
Delete the experiment.
Permanently deletes the experiment and all associated data from the Fiddler platform. This action cannot be undone and will remove all experiment results, metrics, and metadata.
Raises
ApiError – If there’s an error communicating with the Fiddler API.
Return type: None
Example
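A deletion sketch (import path and UUID are assumptions); remember that deletion is permanent:

```python
from fiddler_evals import Experiment  # import path is an assumption

exp = Experiment.get_by_id("<EXPERIMENT_UUID>")  # hypothetical UUID
exp.delete()  # irreversible: removes all results, metrics, and metadata
```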
add_items(items)
Add outputs of LLM/Agent/Application against dataset items to the experiment.
Adds outputs of LLM/Agent/Application (task or target function) against dataset items to the experiment, representing individual test case outcomes. Each item contains the outputs of LLM/Agent/Application results, timing information, and status for a specific dataset item.
Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| items | list[NewExperimentItem] | List of NewExperimentItem instances containing outputs of the LLM/Agent/Application against dataset items. Each item should include: dataset_item_id (UUID of the dataset item being evaluated); outputs (dictionary containing the outputs of the task function against the dataset item); duration_ms (duration of the execution in milliseconds); status (status of the task-function output/scoring against the dataset item, e.g. PENDING, COMPLETED, FAILED); error_reason (reason for failure, if applicable); error_message (detailed error message, if applicable). |
Returns
List of UUIDs for the newly created experiment items.
Return type: builtins.list[UUID]
Raises
ValueError – If the items list is empty.
ValidationError – If any item data is invalid (e.g., missing required fields).
ApiError – If there’s an error communicating with the Fiddler API.
Example
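A sketch of recording task-function outputs (import paths, UUIDs, and output keys are assumptions):

```python
from fiddler_evals import Experiment, NewExperimentItem  # import paths are assumptions

exp = Experiment.get_by_id("<EXPERIMENT_UUID>")  # hypothetical UUID
items = [
    NewExperimentItem(
        dataset_item_id="<DATASET_ITEM_UUID>",  # hypothetical UUID
        outputs={"answer": "Paris"},            # task-function output
        duration_ms=840,
        status="COMPLETED",
    ),
]
item_ids = exp.add_items(items)  # list[UUID] of the created items
```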
get_items()
Retrieve all experiment result items from the experiment.
Fetches all experiment result items (outputs, timing, status) that were generated by the task function against dataset items. Returns an iterator for memory efficiency when dealing with large experiments containing many result items.
Returns
Iterator of ExperimentItem instances for all result items in the experiment.
Return type: Iterator[ExperimentItem]
Raises
ApiError – If there’s an error communicating with the Fiddler API.
Example
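A retrieval sketch (import path and attribute names are assumptions); the iterator keeps memory use flat for large experiments:

```python
from fiddler_evals import Experiment  # import path is an assumption

exp = Experiment.get_by_id("<EXPERIMENT_UUID>")  # hypothetical UUID
for item in exp.get_items():
    print(item.status, item.duration_ms, item.outputs)
```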
add_results(items)
Add evaluation results to the experiment.
Adds complete evaluation results to the experiment, including both the experiment item data (outputs, timing, status) and all associated evaluator scores. This method is typically used after running evaluations to store the complete results of the evaluation process for a batch of dataset items.
This method will only append the results to the experiment.
Note: It is not recommended to use this method directly; use the evaluate method instead. Creating and managing an experiment without the evaluate wrapper is an advanced use case and should be avoided.
Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| items | list[ExperimentItemResult] | List of ExperimentItemResult instances, each containing: experiment_item (NewExperimentItem with outputs, timing, and status); scores (list of Score objects from evaluators for this item). |
Returns
Results are added to the experiment on the server.
Return type: None
Raises
ValueError – If the items list is empty.
ValidationError – If any item data is invalid (e.g., missing required fields).
ApiError – If there’s an error communicating with the Fiddler API.
Example
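A sketch of appending complete results (import paths, UUIDs, and the Score constructor fields are assumptions):

```python
from fiddler_evals import (  # import paths are assumptions
    Experiment, ExperimentItemResult, NewExperimentItem, Score,
)

exp = Experiment.get_by_id("<EXPERIMENT_UUID>")  # hypothetical UUID
result = ExperimentItemResult(
    experiment_item=NewExperimentItem(
        dataset_item_id="<DATASET_ITEM_UUID>",  # hypothetical UUID
        outputs={"answer": "Paris"},
        duration_ms=900,
        status="COMPLETED",
    ),
    scores=[Score(name="faithfulness", value=0.92)],  # Score fields are assumptions
)
exp.add_results([result])  # appends to the experiment; returns None
```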