# Fiddler Evals SDK

An LLM experiments framework with pre-built evaluators and support for custom metrics.
## What You'll Need

- A Python environment where you can install packages with `pip`
- Your Fiddler deployment URL (for example, `https://your-org.fiddler.ai`)
- A Fiddler access token

## Quick Start
```shell
# Step 1: Install
pip install fiddler-evals
```
```python
# Step 2: Initialize connection
from fiddler_evals import init

init(
    url='https://your-org.fiddler.ai',
    token='your-access-token',
)
```
```python
# Step 3: Create project and application
from fiddler_evals import Project, Application, Dataset

project = Project.get_or_create(name='my_eval_project')
application = Application.get_or_create(
    name='my_llm_app',
    project_id=project.id,
)
```
```python
# Step 4: Create dataset and add test cases
from fiddler_evals.pydantic_models.dataset import NewDatasetItem

dataset = Dataset.create(
    name='experiment_dataset',
    application_id=application.id,
    description='Test cases for LLM experiments',
)

test_cases = [
    NewDatasetItem(
        inputs={"question": "What is the capital of France?"},
        expected_outputs={"answer": "Paris is the capital of France"},
        metadata={"type": "Factual", "category": "Geography"},
    ),
]
dataset.insert(test_cases)
```
```python
# Step 5: Run evaluation
from fiddler_evals import evaluate
from fiddler_evals.evaluators import AnswerRelevance, Conciseness, Coherence

MODEL = "openai/gpt-4o"
CREDENTIAL = "your-credential-name"

def my_llm_task(inputs, extras, metadata):
    """Your LLM application logic."""
    question = inputs.get("question", "")
    # Call your LLM here
    answer = f"Mock response to: {question}"
    return {"answer": answer}

results = evaluate(
    dataset=dataset,
    task=my_llm_task,
    evaluators=[
        AnswerRelevance(model=MODEL, credential=CREDENTIAL),
        Conciseness(model=MODEL, credential=CREDENTIAL),
        Coherence(model=MODEL, credential=CREDENTIAL),
    ],
    name_prefix="my_experiment",
    score_fn_kwargs_mapping={
        "user_query": lambda x: x["inputs"]["question"],
        "rag_response": "answer",
        "response": "answer",
    },
)
```
```python
# Step 6: Analyze results in Fiddler UI
print(f"✅ Evaluated {len(results.results)} test cases")
```

## Pre-Built Evaluators
### Safety & Trust

### Quality & Accuracy

### Content Analysis
## Example Usage

### Batch Experiment with Multiple Evaluators
## Custom Evaluators
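The SDK's evaluator interface is not shown in this section, but at its core a custom evaluator is a scoring function. As an SDK-independent sketch (the actual `fiddler-evals` base class and registration hooks may differ), here is a simple keyword-coverage metric:

```python
def keyword_coverage_score(response: str, required_keywords: list[str]) -> float:
    """Fraction of required keywords present in the response (case-insensitive).

    Illustrative only: a real fiddler-evals custom evaluator would wrap a
    scoring function like this in the SDK's evaluator interface.
    """
    if not required_keywords:
        return 1.0
    text = response.lower()
    hits = sum(1 for kw in required_keywords if kw.lower() in text)
    return hits / len(required_keywords)
```

For example, `keyword_coverage_score("Paris is the capital of France", ["paris", "france"])` returns `1.0`, since both keywords appear in the response.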
## Importing Test Cases from Files
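One common approach is to load rows with the standard library's `csv` module and reshape them into the `inputs` / `expected_outputs` / `metadata` fields that `NewDatasetItem` takes in the Quick Start. A minimal sketch, assuming hypothetical column names `question`, `answer`, and `category` (adapt these to your file):

```python
import csv
import io


def rows_to_items(csv_text: str) -> list[dict]:
    """Reshape CSV rows into dicts matching NewDatasetItem's fields.

    Column names here are illustrative; the dicts can then be passed as
    keyword arguments to NewDatasetItem before calling dataset.insert().
    """
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        items.append({
            "inputs": {"question": row["question"]},
            "expected_outputs": {"answer": row["answer"]},
            "metadata": {"category": row.get("category", "")},
        })
    return items
```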
## Viewing Results

### Programmatic Analysis
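The `evaluate()` call in the Quick Start returns an object with a `.results` collection. Its exact per-result schema is not shown here, so this aggregation sketch assumes each result can be reduced to an evaluator name and a numeric score:

```python
from collections import defaultdict
from statistics import mean


def summarize_scores(results: list[dict]) -> dict[str, float]:
    """Average score per evaluator across all test cases.

    Assumes each result is a mapping with 'evaluator' and 'score' keys;
    the actual result objects returned by evaluate() may expose these
    as attributes instead.
    """
    by_evaluator = defaultdict(list)
    for r in results:
        by_evaluator[r["evaluator"]].append(r["score"])
    return {name: mean(scores) for name, scores in by_evaluator.items()}
```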
## Advanced Configuration

### Parallel Processing
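If you need to fan out your own task calls, for example while pre-computing outputs outside `evaluate()`, the standard library's `ThreadPoolExecutor` works well for I/O-bound LLM calls. Check first whether `evaluate()` already exposes a concurrency option, since that would be preferable to rolling your own:

```python
from concurrent.futures import ThreadPoolExecutor


def run_task_concurrently(task, items, max_workers=4):
    """Apply a task function to dataset items in parallel threads.

    Results are returned in input order. Illustrative sketch only; it is
    independent of the fiddler-evals API.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))
```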
### Experiment Metadata and Organization

### Custom Parameter Mapping
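In the Quick Start's `score_fn_kwargs_mapping`, a string value appears to select a key from the task's output dict, while a callable receives the whole item. A pure-Python sketch of that assumed resolution logic (the SDK's actual semantics and item shape may differ):

```python
def resolve_kwargs(mapping, item):
    """Resolve evaluator kwargs from a mapping of name -> string key or callable.

    Mirrors the assumed semantics of score_fn_kwargs_mapping: a string
    selects a key from the task's output dict, a callable is applied to the
    whole item. The item shape below is an illustrative assumption.
    """
    resolved = {}
    for name, spec in mapping.items():
        resolved[name] = spec(item) if callable(spec) else item["outputs"][spec]
    return resolved
```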
## Troubleshooting

### Connection Issues
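Before debugging deeper, it helps to fail fast on missing settings. A small sketch that reads the URL and token from environment variables before calling `init()` — note that `FIDDLER_URL` and `FIDDLER_TOKEN` are illustrative names here, not documented SDK settings:

```python
import os


def load_connection_settings():
    """Read Fiddler connection settings from the environment.

    FIDDLER_URL and FIDDLER_TOKEN are hypothetical variable names; use
    whatever your deployment defines. Failing fast here surfaces
    misconfiguration before init() raises a less obvious connection error.
    """
    url = os.environ.get("FIDDLER_URL")
    token = os.environ.get("FIDDLER_TOKEN")
    missing = [n for n, v in (("FIDDLER_URL", url), ("FIDDLER_TOKEN", token)) if not v]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return url, token
```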
### Import Errors

### Experiment Failures

### Performance Issues
## Related Integrations

## Next Steps