What You’ll Learn
- Set up a RAG experiment pipeline with the Fiddler Evals SDK
- Use Answer Relevance, Context Relevance, and RAG Faithfulness together
- Interpret diagnostic results to identify pipeline failures
- Distinguish between retrieval and generation problems
Prerequisites
- Fiddler Account: Active account with API access
- Python 3.10+
- Fiddler Evals SDK: `pip install fiddler-evals`
- Familiarity with: Experiments Getting Started
Step 1: Connect and Set Up
Step 2: Create a RAG Experiment Dataset
Create test cases that include user queries and retrieved documents. The quality of your evaluation depends on realistic, representative test cases.
Step 3: Define Your RAG Task
The task function represents your RAG application. It receives inputs and returns the generated response.
Step 4: Run the RAG Health Experiment
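Before running the experiment, it can help to see the inputs from Steps 2 and 3 in concrete form. The following is a minimal sketch in plain Python, not the Fiddler Evals SDK API: the `RagTestCase` class, the field names (`question`, `retrieved_documents`), and the `rag_task` stub are all hypothetical, and a real task would call your retriever and LLM instead of echoing the first document.

```python
from dataclasses import dataclass, field


@dataclass
class RagTestCase:
    """One test case: a user query plus the documents retrieved for it.

    Hypothetical structure for illustration; not an SDK type.
    """
    question: str
    retrieved_documents: list = field(default_factory=list)


# Illustrative test cases with placeholder content.
test_cases = [
    RagTestCase(
        question="What is the refund window?",
        retrieved_documents=["Refunds are accepted within 30 days of purchase."],
    ),
    RagTestCase(
        question="How do I reset my password?",
        retrieved_documents=[],  # simulates a retrieval miss
    ),
]


def rag_task(inputs: dict) -> dict:
    """Stand-in for a RAG application: receives inputs, returns a response.

    Assumption: inputs carry the query and its retrieved documents.
    A real implementation would prompt an LLM with this context.
    """
    docs = inputs.get("retrieved_documents", [])
    if not docs:
        answer = "I could not find relevant information."
    else:
        answer = docs[0]
    return {"response": answer}


# Run the stub over every test case to produce responses for evaluation.
outputs = [
    rag_task({"question": tc.question, "retrieved_documents": tc.retrieved_documents})
    for tc in test_cases
]
```

Including at least one case with weak or missing context, as the second test case does here, gives the evaluators something to distinguish between retrieval and generation problems.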
Use all three evaluators together for comprehensive diagnostics.
Step 5: Analyze Diagnostic Results
Examine the results to identify which pipeline stage is causing issues.
Step 6: Compare RAG Configurations
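One way to compare configurations is to summarize each run's evaluator outputs side by side. The sketch below is plain Python rather than SDK code: the configuration names (`config_a`, `config_b`) and the per-test-case results are made-up placeholders, and the result format (High/Low labels plus a boolean faithfulness flag) is an assumption based on the tables in this guide.

```python
# Placeholder results: one (answer_relevance, context_relevance, faithful)
# tuple per test case. These values are invented for illustration only.
results_by_config = {
    "config_a": [("High", "High", True), ("High", "Low", True), ("Low", "Low", False)],
    "config_b": [("High", "High", True), ("High", "High", True), ("High", "High", False)],
}


def summarize(results):
    """Return the fraction of test cases rated High or faithful per evaluator."""
    n = len(results)
    if n == 0:
        raise ValueError("no results to summarize")
    return {
        "answer_relevance_high": sum(a == "High" for a, _, _ in results) / n,
        "context_relevance_high": sum(c == "High" for _, c, _ in results) / n,
        "faithful": sum(bool(f) for _, _, f in results) / n,
    }


summaries = {name: summarize(results) for name, results in results_by_config.items()}

for name, summary in summaries.items():
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Reading the three rates together, rather than a single aggregate, keeps the retrieval signal (Context Relevance) separate from the generation signals (Answer Relevance and RAG Faithfulness).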
Use experiments to compare different RAG configurations.
Understanding the Results
Score Interpretation
| Evaluator | High Score | Low Score |
|---|---|---|
| Answer Relevance | Response directly addresses the query | Response misses the point or is off-topic |
| Context Relevance | Retrieved documents support the query | Retrieved documents are irrelevant |
| RAG Faithfulness | Response is grounded in context | Response contains unsupported claims |
Common Diagnostic Patterns
| Answer Relevance | Context Relevance | RAG Faithfulness | Diagnosis |
|---|---|---|---|
| High | High | Yes | Healthy RAG pipeline |
| High | High | No | Hallucination — fix generation |
| Low | High | Yes | Query misunderstanding — fix prompt |
| Low | Low | - | Bad retrieval — fix retrieval |
| High | Low | Yes | Lucky generation — retrieval needs work |
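The patterns in the table above can be encoded as a small lookup. This is a sketch in plain Python under stated assumptions: the `diagnose` function and `PATTERNS` mapping are hypothetical helpers, not part of the Fiddler Evals SDK, and the inputs are assumed to be "High"/"Low" labels plus a boolean for RAG Faithfulness. Combinations the table does not list are reported as unmatched rather than guessed.

```python
from typing import Optional

# Keys are (answer relevance, context relevance, faithful), taken from the table.
PATTERNS = {
    ("High", "High", True): "Healthy RAG pipeline",
    ("High", "High", False): "Hallucination: fix generation",
    ("Low", "High", True): "Query misunderstanding: fix prompt",
    ("High", "Low", True): "Lucky generation: retrieval needs work",
}


def diagnose(answer_relevance: str, context_relevance: str, faithful: Optional[bool]) -> str:
    """Map one set of evaluator results to the diagnosis listed in the table."""
    # When both relevance scores are low, the table ignores faithfulness.
    if answer_relevance == "Low" and context_relevance == "Low":
        return "Bad retrieval: fix retrieval"
    return PATTERNS.get(
        (answer_relevance, context_relevance, faithful),
        "No matching pattern in the table",
    )


print(diagnose("High", "High", False))
print(diagnose("Low", "Low", None))
```

Applying a function like this to each test case and counting the diagnoses shows whether failures cluster in retrieval or in generation.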
Next Steps
- RAG Health Diagnostics — Conceptual deep-dive into the diagnostic framework
- Evals SDK Advanced Guide — Advanced evaluation patterns
- Evaluator Rules — Set up continuous RAG monitoring in production