LLM Evaluation Quick Start
This quick start guide shows you how to evaluate Large Language Model (LLM) applications, RAG systems, and AI agents using the Fiddler Evals SDK.
Time to complete: ~20 minutes
What You'll Learn
Connect to Fiddler and set up your evaluation environment
Create projects, applications, and datasets for organizing evaluations
Build evaluation datasets with test cases
Use built-in evaluators for common AI evaluation tasks
Create custom evaluators for domain-specific requirements
Run comprehensive evaluation experiments
Analyze results with detailed metrics and insights
📚 Complete Guide
For the full step-by-step walkthrough with code examples and detailed instructions, see:
→ Evals SDK Quick Start (complete guide)
This guide includes:
Prerequisites and setup
Step-by-step code examples
Built-in evaluator examples
Custom evaluator creation
Results analysis
Best practices
Quick Overview
The Fiddler Evals SDK allows you to:
1. Set Up Evaluation Environment
from fiddler import Fiddler
# Initialize Fiddler client
fiddler_client = Fiddler(
    api_key="your-api-key",
    url="https://your-instance.fiddler.ai"
)

2. Create Evaluation Datasets
# Define test cases
test_cases = [
{"input": "Your test input", "expected_output": "Expected result"},
# ... more test cases
]3. Use Built-In Evaluators
Accuracy & Quality: Exact match, semantic similarity, relevance
Safety & Ethics: Toxicity, bias detection, PII detection
RAG-Specific: Answer correctness, faithfulness, context relevance
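In practice, you instantiate the evaluators you want and pass them to the evaluation run. The snippet below is a minimal sketch: the import path and class names (fiddler.evals, ExactMatch, Faithfulness) are illustrative assumptions, not confirmed SDK identifiers, so check the Evals SDK reference for the actual names.

# Hypothetical imports -- verify the real module and class names in the Evals SDK docs
from fiddler.evals import ExactMatch, Faithfulness

# Accuracy check: the output must match the expected result exactly
exact_match_evaluator = ExactMatch()

# RAG-specific check: the answer must be grounded in the retrieved context
faithfulness_evaluator = Faithfulness()

# Pass these in place of evaluator1 / evaluator2 in the run_evaluation call below
evaluators = [exact_match_evaluator, faithfulness_evaluator]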
4. Run Evaluations
# Run evaluation experiment
experiment = fiddler_client.run_evaluation(
    dataset=test_cases,
    evaluators=[evaluator1, evaluator2],
    application_id="your-app-id"
)

5. Analyze Results
View detailed metrics, scores, and insights in the Fiddler UI or programmatically via the SDK.
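As a rough sketch of programmatic access, the loop below assumes the object returned by run_evaluation exposes per-test-case results with named scores; the attribute names (results, scores, evaluator_name, value) are assumptions rather than documented SDK fields, so confirm them against the Evals SDK reference.

# Hypothetical result shape -- confirm attribute names in the Evals SDK docs
for result in experiment.results:        # one entry per test case (assumed)
    for score in result.scores:          # one entry per evaluator (assumed)
        print(f"{score.evaluator_name}: {score.value:.2f}")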
Next Steps
Full Guide: Evals SDK Quick Start - Complete tutorial with code
Advanced: Evals SDK Advanced Guide - Production patterns
Concepts: Evaluations Overview - Why and when to evaluate
Reference: Fiddler Evals SDK Documentation - Complete API docs