Detecting Hallucinations in RAG
The Two-Layer Approach

| Layer | Tool | Purpose |
| --- | --- | --- |
| Layer 1: Pre-Deployment Evaluation | fiddler_evals | Evaluate curated hallucination scenarios before release |
| Layer 2: Production Monitoring | | Detect hallucinations in live traffic |
Layer 1: Pre-Deployment Evaluation
1. Set Up and Connect
import pandas as pd
from fiddler_evals import init, evaluate, Project, Application, Dataset
from fiddler_evals.evaluators import (
    AnswerRelevance,
    ContextRelevance,
    RAGFaithfulness,
)

URL = 'https://your-org.fiddler.ai'
TOKEN = 'your-access-token'
LLM_CREDENTIAL_NAME = 'your-llm-credential'
LLM_MODEL_NAME = 'openai/gpt-4o'

init(url=URL, token=TOKEN)

project = Project.get_or_create(name='hallucination_detection')
app = Application.get_or_create(
    name='rag-hallucination-test',
    project_id=project.id,
)
dataset = Dataset.get_or_create(
    name='hallucination-scenarios',
    application_id=app.id,
)
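Hard-coded credentials are shown above for clarity; in practice you would typically read them from the environment instead. A minimal sketch (the variable names `FIDDLER_URL` and `FIDDLER_TOKEN` are illustrative, not part of the SDK):

```python
import os

# Illustrative environment variable names (not defined by the Fiddler SDK);
# fall back to placeholders when the variables are unset.
URL = os.environ.get('FIDDLER_URL', 'https://your-org.fiddler.ai')
TOKEN = os.environ.get('FIDDLER_TOKEN', 'your-access-token')
```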
2. Create Hallucination-Focused Test Cases
hallucination_scenarios = pd.DataFrame(
    [
        {
            'scenario': 'Grounded response',
            'user_query': 'What is the return policy?',
            'retrieved_documents': [
                'Returns accepted within 30 days with receipt.',
            ],
            'rag_response': 'You can return items within 30 days '
            'if you have a receipt.',
        },
        {
            'scenario': 'Fabricated details',
            'user_query': 'What is the return policy?',
            'retrieved_documents': [
                'Returns accepted within 30 days with receipt.',
            ],
            'rag_response': 'You can return items within 60 days. '
            'No receipt needed. We also offer free shipping on returns.',
        },
        {
            'scenario': 'Insufficient context',
            'user_query': 'What are the shipping costs?',
            'retrieved_documents': [
                'We ship to all 50 US states.',
            ],
            'rag_response': 'Standard shipping is $5.99 and express '
            'shipping is $12.99.',
        },
    ]
)
dataset.insert_from_pandas(
    df=hallucination_scenarios,
    input_columns=['user_query', 'retrieved_documents', 'rag_response'],
    metadata_columns=['scenario'],
)
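Before uploading anything, it can help to see what faithfulness scoring is getting at. The toy heuristic below (not the Fiddler evaluator, just a lexical sketch) computes the fraction of a response's content words that appear in the retrieved documents; the fabricated scenario scores visibly lower than the grounded one:

```python
import re

# A crude lexical proxy for groundedness. Real faithfulness evaluators
# use an LLM judge; this only checks word overlap with the sources.
STOPWORDS = {'you', 'can', 'if', 'a', 'an', 'the', 'we', 'no', 'on',
             'also', 'have', 'is', 'are', 'to', 'and', 'with'}

def naive_faithfulness(response: str, documents: list[str]) -> float:
    tokens = [t for t in re.findall(r'[a-z0-9]+', response.lower())
              if t not in STOPWORDS]
    source = ' '.join(documents).lower()
    if not tokens:
        return 1.0
    return sum(t in source for t in tokens) / len(tokens)

docs = ['Returns accepted within 30 days with receipt.']
grounded = 'You can return items within 30 days if you have a receipt.'
fabricated = ('You can return items within 60 days. No receipt needed. '
              'We also offer free shipping on returns.')
```

Substring matching makes this lenient ('return' matches 'returns'), which is fine for building intuition but not for production use.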
3. Run the Diagnostic Evaluation
def passthrough_task(inputs, extras, metadata):
    # The responses are precomputed in the dataset, so the task
    # simply passes them through to the evaluators.
    return {
        'rag_response': inputs['rag_response'],
        'retrieved_documents': inputs['retrieved_documents'],
    }

result = evaluate(
    dataset=dataset,
    task=passthrough_task,
    evaluators=[
        RAGFaithfulness(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME),
        AnswerRelevance(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME),
        ContextRelevance(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME),
    ],
    score_fn_kwargs_mapping={
        'user_query': lambda x: x['inputs']['user_query'],
        'retrieved_documents': 'retrieved_documents',
        'rag_response': 'rag_response',
    },
)
4. Interpret Results
for r in result.results:
    scores = {s.evaluator_name: s for s in r.scores}
    scenario = r.dataset_item.metadata.get('scenario', 'unknown')

    faithfulness = scores.get('rag_faithfulness')
    relevance = scores.get('answer_relevance')
    context = scores.get('context_relevance')

    # Classify the failure mode
    if faithfulness and faithfulness.value == 0:
        diagnosis = 'HALLUCINATION'
    elif context and context.value < 0.5:
        diagnosis = 'BAD RETRIEVAL'
    elif relevance and relevance.value < 0.5:
        diagnosis = 'OFF-TOPIC'
    else:
        diagnosis = 'HEALTHY'

    print(f'{scenario}: {diagnosis}')
    if faithfulness:
        print(f'  Faithfulness: {faithfulness.label} — {faithfulness.reasoning}')

Grounded response: HEALTHY
  Faithfulness: yes — The response accurately reflects the return policy
  stated in the retrieved document.
Fabricated details: HALLUCINATION
  Faithfulness: no — The response claims a 60-day return window and no
  receipt requirement, but the source document states 30 days with receipt.
Insufficient context: HALLUCINATION
  Faithfulness: no — The response provides specific prices ($5.99, $12.99)
  that are not supported by the retrieved document.

Layer 2: Production Monitoring
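Layer 2 applies the same triage to live traffic, so it is worth factoring the Interpret Results decision chain into one function that both layers can call. A sketch; the 0.5 thresholds and the hallucination-first ordering mirror the loop above:

```python
from typing import Optional

def triage(faithfulness: Optional[float],
           context_relevance: Optional[float],
           answer_relevance: Optional[float]) -> str:
    """Classify one RAG interaction: check grounding first,
    then retrieval quality, then answer relevance."""
    if faithfulness is not None and faithfulness == 0:
        return 'HALLUCINATION'
    if context_relevance is not None and context_relevance < 0.5:
        return 'BAD RETRIEVAL'
    if answer_relevance is not None and answer_relevance < 0.5:
        return 'OFF-TOPIC'
    return 'HEALTHY'
```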
Combining Both Layers

| Stage | What to Do | Tool |
| --- | --- | --- |
| Pre-deployment | Evaluate curated hallucination scenarios | fiddler_evals |
| Production | Monitor live traffic for hallucinations | |
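One concrete way to connect the layers is to gate deployment on the Layer 1 results: count the HALLUCINATION diagnoses from step 4 and block the release when the rate exceeds a budget. A hypothetical helper, not part of fiddler_evals:

```python
def deployment_gate(diagnoses: list[str],
                    max_hallucination_rate: float = 0.0) -> bool:
    """Return True when the evaluated set is within the hallucination
    budget and therefore safe to promote to production."""
    if not diagnoses:
        return False  # no evidence is not a pass
    rate = diagnoses.count('HALLUCINATION') / len(diagnoses)
    return rate <= max_hallucination_rate
```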