Fiddler Evals SDK

LLM experiments framework with pre-built evaluators and custom metrics

GA | 🏆 Native SDK

Evaluate LLM application quality with Fiddler's evaluation framework. Run batch experiments with 13 pre-built evaluators or create custom metrics for domain-specific quality assessment.

What You'll Need

  • Fiddler account

  • Python 3.10 or higher

  • Fiddler API key and access token

  • Dataset for experiments

Quick Start

# Step 1: Install
pip install fiddler-evals

# Step 2: Initialize connection
from fiddler_evals import init

init(
    url='https://your-org.fiddler.ai',
    token='your-access-token'
)

# Step 3: Create project and application
from fiddler_evals import Project, Application, Dataset

project = Project.get_or_create(name='my_eval_project')
application = Application.get_or_create(
    name='my_llm_app',
    project_id=project.id
)

# Step 4: Create dataset and add test cases
from fiddler_evals.pydantic_models.dataset import NewDatasetItem

dataset = Dataset.create(
    name='experiment_dataset',
    application_id=application.id,
    description='Test cases for LLM experiments'
)

test_cases = [
    NewDatasetItem(
        inputs={"question": "What is the capital of France?"},
        expected_outputs={"answer": "Paris is the capital of France"},
        metadata={"type": "Factual", "category": "Geography"}
    ),
]
dataset.insert(test_cases)

# Step 5: Run evaluation
from fiddler_evals import evaluate
from fiddler_evals.evaluators import AnswerRelevance, Conciseness, Coherence

MODEL = "openai/gpt-4o"
CREDENTIAL = "your-credential-name"

def my_llm_task(inputs, extras, metadata):
    """Your LLM application logic"""
    question = inputs.get("question", "")
    # Call your LLM here
    answer = f"Mock response to: {question}"
    return {"answer": answer}

results = evaluate(
    dataset=dataset,
    task=my_llm_task,
    evaluators=[
        AnswerRelevance(model=MODEL, credential=CREDENTIAL),
        Conciseness(model=MODEL, credential=CREDENTIAL),
        Coherence(model=MODEL, credential=CREDENTIAL)
    ],
    name_prefix="my_experiment",
    score_fn_kwargs_mapping={
        "user_query": lambda x: x["inputs"]["question"],
        "rag_response": "answer",
        "response": "answer",
    }
)

# Step 6: Analyze results in Fiddler UI
print(f"✅ Evaluated {len(results.results)} test cases")

Pre-Built Evaluators

Safety & Trust

  • FTLPromptSafety - Detect prompt injection, jailbreaks, and unsafe prompts (runs on Fiddler Trust Models)

Quality & Accuracy

  • AnswerRelevance - Assess how well responses address user queries (High / Medium / Low)

  • ContextRelevance - Evaluate whether retrieved documents are relevant to the query (High / Medium / Low). Available in Agentic Monitoring and Experiments only

  • RAGFaithfulness - Check if responses are grounded in retrieved documents (Yes / No)

  • FTLResponseFaithfulness - Fast Trust Model faithfulness for low-latency guardrails

  • Coherence - Measure logical flow and consistency

  • Conciseness - Evaluate response brevity and efficiency

Content Analysis

  • Sentiment - Analyze emotional tone

  • TopicClassification - Categorize content by topic

  • RegexSearch / RegexMatch - Custom pattern-based evaluation

  • EvalFn - Wrap any Python function as an evaluator

Example Usage

Batch Experiment with Multiple Evaluators
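A sketch of combining safety and quality evaluators in one run, mirroring the `evaluate()` call from the Quick Start. The no-argument `FTLPromptSafety()` constructor is an assumption (it runs on Fiddler Trust Models rather than an external LLM); check the SDK API reference for its exact parameters. Imports are deferred into the function so the mapping can be sanity-checked on its own.

```python
# Maps each evaluator parameter to a field from the dataset item or task output,
# following the same pattern as the Quick Start.
KWARGS_MAPPING = {
    "user_query": lambda x: x["inputs"]["question"],
    "rag_response": "answer",
    "response": "answer",
}

def run_batch_experiment(dataset, task, model, credential):
    """Run one experiment with several evaluators over the whole dataset."""
    from fiddler_evals import evaluate
    from fiddler_evals.evaluators import (
        AnswerRelevance,
        Coherence,
        Conciseness,
        FTLPromptSafety,
    )

    return evaluate(
        dataset=dataset,
        task=task,
        evaluators=[
            FTLPromptSafety(),  # constructor args assumed; see SDK reference
            AnswerRelevance(model=model, credential=credential),
            Coherence(model=model, credential=credential),
            Conciseness(model=model, credential=credential),
        ],
        name_prefix="batch_quality_safety",
        score_fn_kwargs_mapping=KWARGS_MAPPING,
    )
```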

Custom Evaluators
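The evaluator list above includes EvalFn for wrapping any Python function as an evaluator. A minimal sketch: write a plain scoring function, then hand it to EvalFn. The exact EvalFn constructor arguments (`fn=`, `name=`) are an assumption here; confirm them in the SDK API reference.

```python
def keyword_coverage(response: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords that appear in the response (case-insensitive)."""
    if not expected_keywords:
        return 1.0
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)

# Wrap the plain function as an evaluator (constructor args assumed):
# from fiddler_evals.evaluators import EvalFn
# keyword_eval = EvalFn(fn=keyword_coverage, name="keyword_coverage")
```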

Importing Test Cases from Files
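Test cases often live in a CSV or JSON file rather than inline. A sketch of loading a CSV with the standard library and shaping each row into the `inputs` / `expected_outputs` / `metadata` structure that NewDatasetItem takes in the Quick Start (the inline CSV text stands in for a real file):

```python
import csv
import io

# Example CSV content; in practice, open() your own file instead.
CSV_TEXT = """question,answer,category
What is the capital of France?,Paris is the capital of France,Geography
Who wrote Hamlet?,William Shakespeare wrote Hamlet,Literature
"""

def load_rows(fileobj):
    """Parse CSV rows into the inputs/expected_outputs/metadata shape
    used by NewDatasetItem."""
    rows = []
    for row in csv.DictReader(fileobj):
        rows.append({
            "inputs": {"question": row["question"]},
            "expected_outputs": {"answer": row["answer"]},
            "metadata": {"category": row["category"]},
        })
    return rows

rows = load_rows(io.StringIO(CSV_TEXT))

# Convert to dataset items and insert (same classes as the Quick Start):
# from fiddler_evals.pydantic_models.dataset import NewDatasetItem
# dataset.insert([NewDatasetItem(**r) for r in rows])
```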

Viewing Results

Results are automatically tracked in the Fiddler UI. Navigate to your application to:

  • View experiment results with detailed scores

  • Compare experiments side-by-side

  • Filter and analyze by metadata

  • Export results for further analysis

Programmatic Analysis
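Beyond the UI, the object returned by `evaluate()` exposes per-test-case results via `results.results` (as printed in the Quick Start). The attribute names on each individual result object are not shown in this document, so this sketch aggregates plain `(evaluator_name, label)` pairs that you extract from `results.results` yourself:

```python
from collections import Counter

def summarize_scores(score_records):
    """Aggregate (evaluator_name, label) pairs into label counts per evaluator."""
    summary = {}
    for name, label in score_records:
        summary.setdefault(name, Counter())[label] += 1
    return summary

# Example with mock records standing in for extracted results:
records = [
    ("answer_relevance", "High"),
    ("answer_relevance", "High"),
    ("answer_relevance", "Low"),
    ("conciseness", "Medium"),
]
summary = summarize_scores(records)
# summary["answer_relevance"]["High"] == 2
```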

Advanced Configuration

Parallel Processing
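Whether `evaluate()` exposes its own concurrency settings is not shown in this document (check the SDK API reference). Independently of that, if your task function fans out several I/O-bound calls per test case (LLM providers, retrievers, vector stores), you can parallelize those inside the task with the standard library; the backends below are mocks:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_contexts(question, retrievers):
    """Call several independent retrieval backends concurrently.

    Threads suit I/O-bound calls (HTTP requests to LLM or vector-store
    backends); pool.map returns results in the original order.
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda r: r(question), retrievers))

# Mock "retrievers" standing in for real backend calls:
retrievers = [
    lambda q: f"docs about {q}",
    lambda q: f"faq entry for {q}",
]
contexts = fetch_contexts("capital of France", retrievers)
```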

Experiment Metadata and Organization
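The Quick Start shows the two organizational hooks: `name_prefix` on `evaluate()` and the `metadata` dict on each NewDatasetItem. A sketch of a naming convention that makes experiments sort and filter cleanly in the UI (the scheme itself is a suggestion, not an SDK requirement):

```python
from datetime import datetime, timezone

def experiment_name_prefix(app: str, model: str, tag: str) -> str:
    """Build a consistent name_prefix: app, model slug, tag, and UTC date."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    model_slug = model.replace("/", "-")
    return f"{app}_{model_slug}_{tag}_{stamp}"

prefix = experiment_name_prefix("my_llm_app", "openai/gpt-4o", "baseline")
# e.g. "my_llm_app_openai-gpt-4o_baseline_20250101"

# Pass it through to evaluate(..., name_prefix=prefix), and put stable
# attributes (type, category, difficulty) in each NewDatasetItem's
# metadata dict so you can filter results by them later.
```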

Custom Parameter Mapping
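`score_fn_kwargs_mapping` (as used in the Quick Start) routes values into each evaluator's parameters. From that example, a string value appears to name a key in the task's output dict, while a callable receives the full record and computes the value; confirm the exact contract in the SDK API reference. The `documents` parameter name below is hypothetical, for illustration only:

```python
# Strings appear to name keys in the task's output dict; callables receive
# the full record (exact contract: see the SDK API reference).
SCORE_FN_KWARGS_MAPPING = {
    "user_query": lambda x: x["inputs"]["question"],
    "response": "answer",
    "rag_response": "answer",
    "documents": lambda x: x["inputs"].get("context", []),  # hypothetical parameter
}

# Quick sanity check of the callables before launching a full experiment:
record = {"inputs": {"question": "What is RAG?", "context": ["doc1"]}}
assert SCORE_FN_KWARGS_MAPPING["user_query"](record) == "What is RAG?"
assert SCORE_FN_KWARGS_MAPPING["documents"](record) == ["doc1"]
```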

Troubleshooting

Connection Issues

Problem: Cannot connect to Fiddler instance

Solution:

  1. Verify your URL is correct (e.g., https://your-org.fiddler.ai)

  2. Ensure your access token is valid and not expired

  3. Check network connectivity: curl -I https://your-org.fiddler.ai

  4. Regenerate token from Fiddler UI: Settings > Credentials

Import Errors

Problem: ModuleNotFoundError: No module named 'fiddler_evals'

Solution:

  1. Install the SDK: pip install fiddler-evals

  2. Confirm the package is installed in the active environment: pip show fiddler-evals

Experiment Failures

Problem: Evaluators failing with parameter errors

Solution:

  1. Check score_fn_kwargs_mapping matches evaluator requirements

  2. Verify task output format matches expected structure

  3. Test evaluators individually:
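One way to isolate a failing evaluator is to rerun the experiment with a single evaluator at a time, reusing the same `evaluate()` surface as the Quick Start (import deferred so the helper can be defined anywhere):

```python
def run_single_evaluator(dataset, task, evaluator, mapping):
    """Run one evaluator in isolation so a parameter error points at a
    single score_fn_kwargs_mapping entry rather than the whole batch."""
    from fiddler_evals import evaluate

    return evaluate(
        dataset=dataset,
        task=task,
        evaluators=[evaluator],
        name_prefix="debug_single_evaluator",
        score_fn_kwargs_mapping=mapping,
    )

# Usage: run_single_evaluator(dataset, my_llm_task,
#            AnswerRelevance(model=MODEL, credential=CREDENTIAL),
#            {"user_query": lambda x: x["inputs"]["question"], "response": "answer"})
```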

Performance Issues

Problem: Experiment running slowly

Solution:

  1. Iterate on a small slice of the dataset before running the full set

  2. Reduce the number of LLM-based evaluators per experiment; each one adds calls per test case

  3. Parallelize I/O-bound work inside your task (see Parallel Processing above)

Next Steps

  1. Quick Start Guide - Complete tutorial with working examples

  2. Getting Started with Experiments - Understand experiment concepts and best practices

  3. SDK API Reference - Explore all available classes and methods
