LLM Evaluation Quick Start

This quick start guide shows you how to evaluate Large Language Model (LLM) applications, retrieval-augmented generation (RAG) systems, and AI agents using the Fiddler Evals SDK.

Time to complete: ~20 minutes

What You'll Learn

  • Connect to Fiddler and set up your evaluation environment

  • Create projects, applications, and datasets for organizing evaluations

  • Build evaluation datasets with test cases

  • Use built-in evaluators for common AI evaluation tasks

  • Create custom evaluators for domain-specific requirements

  • Run comprehensive evaluation experiments

  • Analyze results with detailed metrics and insights


📚 Complete Guide

For the full step-by-step quick start with code examples and detailed instructions, see:

Evals SDK Quick Start (complete guide)

This guide includes:

  • Prerequisites and setup

  • Step-by-step code examples

  • Built-in evaluator examples

  • Custom evaluator creation

  • Results analysis

  • Best practices


Quick Overview

The Fiddler Evals SDK allows you to:

1. Set Up Evaluation Environment

from fiddler import Fiddler

# Initialize Fiddler client
fiddler_client = Fiddler(
    api_key="your-api-key",                   # replace with your Fiddler API key
    url="https://your-instance.fiddler.ai"    # replace with your Fiddler instance URL
)
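
If you prefer to keep credentials out of source code, a common pattern is to read them from environment variables. The sketch below reuses the same constructor shown above; the variable names FIDDLER_API_KEY and FIDDLER_URL are just examples, not names the SDK requires.

import os
from fiddler import Fiddler

# Read credentials from the environment (variable names are your choice)
fiddler_client = Fiddler(
    api_key=os.environ["FIDDLER_API_KEY"],
    url=os.environ["FIDDLER_URL"]
)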

2. Create Evaluation Datasets

# Define test cases
test_cases = [
    {"input": "Your test input", "expected_output": "Expected result"},
    # ... more test cases
]
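
Test cases here are plain Python dictionaries, so for a RAG system you will typically also want the retrieved context available to the RAG-specific evaluators. The field names below are illustrative only; see the complete guide for the schema the SDK expects.

# Illustrative RAG test case -- field names are examples, not a required schema
rag_test_case = {
    "input": "What is our refund policy?",
    "context": "Refunds are available within 30 days of purchase.",
    "expected_output": "Refunds can be requested within 30 days of purchase."
}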

3. Use Built-In Evaluators

  • Accuracy & Quality: Exact match, semantic similarity, relevance

  • Safety & Ethics: Toxicity, bias detection, PII detection

  • RAG-Specific: Answer correctness, faithfulness, context relevance
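
Beyond the built-ins, you can define custom evaluators for domain-specific requirements (covered in the complete guide). Conceptually, a custom evaluator is just a scoring function. The sketch below is SDK-agnostic plain Python and does not reflect the exact evaluator interface, which is documented in the complete guide.

# Conceptual sketch only -- the real evaluator interface is described
# in the complete guide; this is plain Python with no SDK dependencies.
def keyword_coverage(output: str, expected_output: str) -> dict:
    """Score what fraction of expected keywords appear in the model output."""
    expected_keywords = expected_output.lower().split()
    found = sum(1 for word in expected_keywords if word in output.lower())
    score = found / len(expected_keywords) if expected_keywords else 0.0
    return {"score": score, "passed": score >= 0.8}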

4. Run Evaluations

# Run an evaluation experiment over your dataset with the chosen evaluators
experiment = fiddler_client.run_evaluation(
    dataset=test_cases,
    evaluators=[evaluator1, evaluator2],   # built-in and/or custom evaluators
    application_id="your-app-id"           # replace with your application ID
)

5. Analyze Results

Once an experiment completes, view detailed metrics, scores, and insights in the Fiddler UI, or retrieve them programmatically via the SDK.
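
For programmatic analysis, the exact shape of the experiment results object depends on your SDK version, so the sketch below works on plain per-test-case score dictionaries (an assumed export format, not the documented results API) to show the kind of summary you might compute.

# Assumed export format: one dict of evaluator scores per test case.
# This is illustrative aggregation logic, not the SDK's results API.
results = [
    {"exact_match": 1.0, "toxicity": 0.02},
    {"exact_match": 0.0, "toxicity": 0.01},
]

def summarize(results: list) -> dict:
    """Average each evaluator's score across all test cases."""
    summary = {}
    for row in results:
        for name, score in row.items():
            summary.setdefault(name, []).append(score)
    return {name: sum(scores) / len(scores) for name, scores in summary.items()}

print(summarize(results))  # e.g. {'exact_match': 0.5, 'toxicity': 0.015}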


Next Steps