RAG Health Metrics Tutorial

Evaluate your RAG application using the RAG Health Metrics diagnostic triad to pinpoint whether issues originate in retrieval, generation, or query understanding.

Time to complete: ~30 minutes

What You'll Learn

  • Set up a RAG experiment pipeline with the Fiddler Evals SDK

  • Use Answer Relevance, Context Relevance, and RAG Faithfulness together

  • Interpret diagnostic results to identify pipeline failures

  • Distinguish between retrieval and generation problems

Prerequisites

  • Access to a Fiddler instance (the URL and access token used in Step 1)

  • The fiddler_evals Python package installed in your environment


Step 1: Connect and Set Up

from fiddler_evals import init, Project, Application, Dataset
from fiddler_evals.pydantic_models.dataset import NewDatasetItem

# Initialize connection
init(
    url='https://your-org.fiddler.ai',
    token='your-access-token'
)

# Create organizational structure
project = Project.get_or_create(name='rag_health_experiments')
application = Application.get_or_create(
    name='my_rag_app',
    project_id=project.id
)

Step 2: Create a RAG Experiment Dataset

Create test cases that include user queries and retrieved documents. The quality of your evaluation depends on realistic, representative test cases.
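
As a minimal sketch of what such test cases might look like, the snippet below uses plain dictionaries rather than the SDK's `NewDatasetItem`, whose fields are not shown in this tutorial. The key names `user_query` and `retrieved_documents`, the sample content, and the `validate_case` helper are all illustrative assumptions.

```python
# Hypothetical test cases: each pairs a user query with the documents
# a retriever returned for it. Key names are illustrative assumptions.
test_cases = [
    {
        "user_query": "What is the refund window for annual plans?",
        "retrieved_documents": [
            "Annual plans can be refunded within 30 days of purchase.",
            "Monthly plans renew automatically on the billing date.",
        ],
    },
    {
        "user_query": "How do I rotate an API key?",
        "retrieved_documents": [
            "Open Settings, choose API Keys, then select Rotate.",
        ],
    },
]


def validate_case(case: dict) -> list:
    """Return a list of problems found in one test case (empty if none)."""
    problems = []
    query = case.get("user_query")
    if not isinstance(query, str) or not query.strip():
        problems.append("missing or empty user_query")
    docs = case.get("retrieved_documents")
    if not isinstance(docs, list) or not docs:
        problems.append("no retrieved_documents")
    elif not all(isinstance(d, str) and d.strip() for d in docs):
        problems.append("retrieved_documents contains an empty or non-string entry")
    return problems


# Check every case before loading it into a dataset.
invalid = {}
for index, case in enumerate(test_cases):
    problems = validate_case(case)
    if problems:
        invalid[index] = problems

print(f"{len(test_cases) - len(invalid)} of {len(test_cases)} test cases are valid")
```

Validating cases up front keeps malformed rows (an empty query, no documents) from skewing the evaluator scores later.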

Step 3: Define Your RAG Task

The task function represents your RAG application. It receives inputs and returns the generated response.
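
A minimal sketch of such a task function follows. The input key names, the returned `response` key, and the `generate_answer` stand-in (which you would replace with a call to your own LLM) are assumptions for illustration, not the SDK's required signature.

```python
def generate_answer(query: str, documents: list) -> str:
    """Placeholder for an LLM call; here it just echoes the top document."""
    if not documents:
        return "I could not find any information about that."
    return f"Based on the documentation: {documents[0]}"


def rag_task(inputs: dict) -> dict:
    """Take one test case's inputs and return the generated response."""
    query = inputs["user_query"]
    documents = inputs.get("retrieved_documents", [])
    answer = generate_answer(query, documents)
    # Return the context alongside the response so that evaluators can
    # compare the answer against the documents it was generated from.
    return {"response": answer, "retrieved_documents": documents}


output = rag_task(
    {
        "user_query": "How do I rotate an API key?",
        "retrieved_documents": ["Open Settings, choose API Keys, then select Rotate."],
    }
)
print(output["response"])
```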

Step 4: Run the RAG Health Experiment

Use all three evaluators together for comprehensive diagnostics:
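
The SDK's own experiment call is not reproduced here, so the following is only a library-free sketch of the idea: run the task on each test case and score every output with all three evaluators on the same row. The `run_experiment` function, the row layout, and the stub evaluators are hypothetical; the stubs return fixed scores and merely stand in for the real Answer Relevance, Context Relevance, and RAG Faithfulness evaluators.

```python
from typing import Callable, Dict, List

# Assumed evaluator shape: (query, documents, response) -> score.
Evaluator = Callable[[str, List[str], str], float]


def run_experiment(
    cases: List[dict],
    task: Callable[[dict], dict],
    evaluators: Dict[str, Evaluator],
) -> List[dict]:
    """Run the task on every case and score each output with every evaluator."""
    rows = []
    for case in cases:
        output = task(case)
        row = {"user_query": case["user_query"], "response": output["response"]}
        for name, evaluator in evaluators.items():
            row[name] = evaluator(
                case["user_query"], case["retrieved_documents"], output["response"]
            )
        rows.append(row)
    return rows


# Stub evaluators with fixed scores, standing in for the real ones.
stub_evaluators = {
    "answer_relevance": lambda query, docs, response: 1.0,
    "context_relevance": lambda query, docs, response: 1.0,
    "rag_faithfulness": lambda query, docs, response: 1.0,
}

cases = [
    {
        "user_query": "How do I rotate an API key?",
        "retrieved_documents": ["Open Settings, choose API Keys, then select Rotate."],
    }
]

# A trivial task that answers with the first retrieved document.
rows = run_experiment(
    cases,
    lambda case: {"response": case["retrieved_documents"][0]},
    stub_evaluators,
)
print(rows[0])
```

Keeping all three scores on one row per test case is what makes the later diagnosis possible, since each pattern depends on the combination of scores rather than on any single one.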

Step 5: Analyze Diagnostic Results

Examine the results to identify which pipeline stage is causing issues:
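
One way to get a first overview is to count, per evaluator, how many rows scored low. The sketch below assumes results are available as a list of dictionaries with numeric `answer_relevance` and `context_relevance` scores in [0, 1] and a boolean `rag_faithfulness`. The row layout, the sample scores, and the 0.5 threshold are illustrative assumptions, not SDK output.

```python
THRESHOLD = 0.5  # hypothetical cut-off between "low" and "high"

# Made-up scores for illustration only.
results = [
    {"answer_relevance": 0.9, "context_relevance": 0.8, "rag_faithfulness": True},
    {"answer_relevance": 0.9, "context_relevance": 0.9, "rag_faithfulness": False},
    {"answer_relevance": 0.2, "context_relevance": 0.1, "rag_faithfulness": False},
    {"answer_relevance": 0.3, "context_relevance": 0.9, "rag_faithfulness": True},
]


def failure_rates(rows: list) -> dict:
    """Return the fraction of rows that fail each evaluator."""
    total = len(rows)
    if total == 0:
        return {}
    return {
        "answer_relevance": sum(r["answer_relevance"] < THRESHOLD for r in rows) / total,
        "context_relevance": sum(r["context_relevance"] < THRESHOLD for r in rows) / total,
        "rag_faithfulness": sum(not r["rag_faithfulness"] for r in rows) / total,
    }


rates = failure_rates(results)
for evaluator, rate in rates.items():
    print(f"{evaluator}: {rate:.0%} of rows failed")
```

A high Context Relevance failure rate points toward retrieval, while RAG Faithfulness failures on rows with relevant context point toward generation.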

Step 6: Compare RAG Configurations

Use experiments to compare different RAG configurations:
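
As a rough sketch of such a comparison, the snippet below averages each metric per configuration so the two can be read side by side. The configuration names `config_a` and `config_b`, the row layout, and all scores are made up for illustration; in practice the rows would come from your own experiment runs.

```python
from statistics import mean

# Made-up per-row scores for two hypothetical configurations.
experiments = {
    "config_a": [
        {"answer_relevance": 0.8, "context_relevance": 0.4, "rag_faithfulness": True},
        {"answer_relevance": 0.6, "context_relevance": 0.6, "rag_faithfulness": False},
    ],
    "config_b": [
        {"answer_relevance": 0.9, "context_relevance": 0.8, "rag_faithfulness": True},
        {"answer_relevance": 0.7, "context_relevance": 0.8, "rag_faithfulness": True},
    ],
}


def summarize(rows: list) -> dict:
    """Average each metric over the rows of one experiment."""
    return {
        "answer_relevance": mean(r["answer_relevance"] for r in rows),
        "context_relevance": mean(r["context_relevance"] for r in rows),
        # Share of rows whose response was judged faithful to the context.
        "faithful_rate": mean(1.0 if r["rag_faithfulness"] else 0.0 for r in rows),
    }


summaries = {name: summarize(rows) for name, rows in experiments.items()}
for name, summary in summaries.items():
    print(name, {metric: round(value, 2) for metric, value in summary.items()})
```

Comparing per-metric averages rather than a single combined score shows which stage a configuration change actually improved.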


Understanding the Results

Score Interpretation

| Evaluator | High Score | Low Score |
| --- | --- | --- |
| Answer Relevance | Response directly addresses the query | Response misses the point or is off-topic |
| Context Relevance | Retrieved documents support the query | Retrieved documents are irrelevant |
| RAG Faithfulness | Response is grounded in context | Response contains unsupported claims |

Common Diagnostic Patterns

| Answer Relevance | Context Relevance | RAG Faithfulness | Diagnosis |
| --- | --- | --- | --- |
| High | High | Yes | Healthy RAG pipeline |
| High | High | No | Hallucination: fix generation |
| Low | High | Yes | Query misunderstanding: fix prompt |
| Low | Low | - | Bad retrieval: fix retrieval |
| High | Low | Yes | Lucky generation: retrieval needs work |
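
The diagnostic patterns can be encoded as a small lookup, sketched below. The `diagnose` function and the 0.5 threshold separating "low" from "high" are assumptions for illustration. Score combinations that the patterns do not list are reported as unmatched rather than guessed.

```python
THRESHOLD = 0.5  # hypothetical cut-off between "low" and "high"

# (answer relevance high?, context relevance high?, faithful?) -> diagnosis
PATTERNS = {
    (True, True, True): "Healthy RAG pipeline",
    (True, True, False): "Hallucination: fix generation",
    (False, True, True): "Query misunderstanding: fix prompt",
    (True, False, True): "Lucky generation: retrieval needs work",
}


def diagnose(answer_relevance: float, context_relevance: float, faithful: bool) -> str:
    """Map one row of scores to a diagnosis from the common patterns."""
    answer_high = answer_relevance >= THRESHOLD
    context_high = context_relevance >= THRESHOLD
    # When both relevance scores are low, faithfulness does not matter.
    if not answer_high and not context_high:
        return "Bad retrieval: fix retrieval"
    return PATTERNS.get(
        (answer_high, context_high, faithful),
        "No matching pattern",
    )


print(diagnose(0.9, 0.9, False))
print(diagnose(0.2, 0.1, True))
```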


Next Steps