Fiddler Evals SDK
LLM evaluation framework with pre-built evaluators and custom metrics
What You'll Need
Quick Start
# Step 1: Install
pip install fiddler-evals
# Step 2: Initialize connection
from fiddler_evals import init
init(
    url='https://your-org.fiddler.ai',
    token='your-access-token'
)
# Step 3: Create project and application
from fiddler_evals import Project, Application, Dataset
project = Project.get_or_create(name='my_eval_project')
application = Application.get_or_create(
    name='my_llm_app',
    project_id=project.id
)
# Step 4: Create dataset and add test cases
from fiddler_evals.pydantic_models.dataset import NewDatasetItem
dataset = Dataset.create(
    name='evaluation_dataset',
    application_id=application.id,
    description='Test cases for LLM evaluation'
)
test_cases = [
    NewDatasetItem(
        inputs={"question": "What is the capital of France?"},
        expected_outputs={"answer": "Paris is the capital of France"},
        metadata={"type": "Factual", "category": "Geography"}
    ),
]
dataset.insert(test_cases)
# Step 5: Run evaluation
from fiddler_evals import evaluate
from fiddler_evals.evaluators import AnswerRelevance, Conciseness, Toxicity
def my_llm_task(inputs, extras, metadata):
    """Your LLM application logic"""
    question = inputs.get("question", "")
    # Call your LLM here
    answer = f"Mock response to: {question}"
    return {"answer": answer}
results = evaluate(
    dataset=dataset,
    task=my_llm_task,
    evaluators=[
        AnswerRelevance(),
        Conciseness(),
        Toxicity()
    ],
    name_prefix="my_evaluation",
    score_fn_kwargs_mapping={
        "response": "answer",
        "text": "answer",
        "prompt": lambda x: x["inputs"]["question"]
    }
)
# Step 6: Analyze results in Fiddler UI
print(f"✅ Evaluated {len(results.results)} test cases")
Pre-Built Evaluators
Safety & Trust
Quality & Accuracy
Content Analysis
Example Usage
Batch Evaluation with Multiple Evaluators
Custom Evaluators
Importing Test Cases from Files
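As a minimal sketch of one way to prepare test cases from a file, the snippet below reads CSV text with Python's standard library and builds dictionaries with the same three fields used by NewDatasetItem in the Quick Start (inputs, expected_outputs, metadata). The column names (question, answer, type, category) and the helper rows_to_items are assumptions for illustration; they are not part of the SDK.

```python
import csv
import io


def rows_to_items(csv_text):
    """Hypothetical helper: turn CSV text into dicts shaped like the
    NewDatasetItem arguments shown in the Quick Start."""
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        items.append({
            "inputs": {"question": row["question"]},
            "expected_outputs": {"answer": row["answer"]},
            "metadata": {"type": row["type"], "category": row["category"]},
        })
    return items


# Assumed column layout; adapt the names to your own file.
SAMPLE_CSV = """question,answer,type,category
What is the capital of France?,Paris is the capital of France,Factual,Geography
"""

items = rows_to_items(SAMPLE_CSV)
print(items[0]["inputs"])
```

Each dictionary could then be passed as NewDatasetItem(**item) and the resulting list given to dataset.insert(...), as in Step 4 of the Quick Start. For a file on disk, replace io.StringIO(csv_text) with a file opened via open(path, newline="").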
Viewing Results
Programmatic Analysis
Advanced Configuration
Parallel Processing
Experiment Metadata and Organization
Custom Parameter Mapping
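The Quick Start passes score_fn_kwargs_mapping with two kinds of values: strings such as "answer" and a callable that reads x["inputs"]["question"]. The sketch below illustrates one plausible reading of that mapping, assuming a string names a key in the task's output and a callable receives a dictionary that contains the test case inputs. The helper resolve_kwargs and the "outputs" key are hypothetical and written only for illustration; the SDK's actual resolution logic may differ.

```python
def resolve_kwargs(mapping, inputs, outputs):
    """Hypothetical helper: build evaluator keyword arguments from a mapping.

    Assumption: a string value names a key in the task outputs, and a
    callable value is invoked with a context dict holding inputs and outputs.
    """
    context = {"inputs": inputs, "outputs": outputs}
    resolved = {}
    for param, source in mapping.items():
        if callable(source):
            resolved[param] = source(context)
        else:
            resolved[param] = outputs[source]
    return resolved


# Same mapping as in the Quick Start.
mapping = {
    "response": "answer",
    "text": "answer",
    "prompt": lambda x: x["inputs"]["question"],
}
inputs = {"question": "What is the capital of France?"}
outputs = {"answer": "Paris"}

kwargs = resolve_kwargs(mapping, inputs, outputs)
print(kwargs)
```

Under these assumptions, an evaluator expecting response would receive the task's answer, and one expecting prompt would receive the original question.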
Troubleshooting
Connection Issues
Import Errors
Evaluation Failures
Performance Issues
Related Integrations
Next Steps