Evaluator Rules
Evaluator Rules define how automated evaluations are applied to your application's spans. They connect evaluators (LLM-based or rule-based functions) with span data, specify what inputs to use, and determine which spans qualify for evaluation.
Overview
Evaluator Rules provide the configuration layer between your evaluators and your application's telemetry data. When properly configured, they automatically assess the quality, safety, and performance of your GenAI application based on real-time span data.
What Are Evaluator Rules?
An Evaluator Rule determines how and when an evaluator runs against your application's spans. Each rule consists of four key components:
Evaluator Configuration - The evaluator definition, including provider, model, and prompt
Input Field Mapping - How span data is passed to the evaluator's input variables
Application Rules - Conditions that determine which spans qualify for evaluation
Backfill Configuration - Whether to apply evaluations to historical data
How Evaluator Rules Work
When a new span is created in your application:
The system checks all active Evaluator Rules
Each rule evaluates whether its Application Rules match the span's attributes
If a match is found, the system extracts data from the span using Input Field Mappings
The evaluator runs with the mapped data as input
Results are stored and made available in dashboards and analytics
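For intuition, here is a minimal sketch of this matching-and-evaluation flow. The shapes and names used (on_span_created, application_rules, the rule dict layout) are illustrative assumptions, not Fiddler's internal API:

```python
def on_span_created(span: dict, rules: list) -> list:
    """Run every active rule whose conditions match the incoming span."""
    results = []
    for rule in rules:
        if not rule["active"]:
            continue  # Step 1: only active Evaluator Rules are considered
        # Step 2: AND across rule categories, OR within a category
        conditions = rule["application_rules"]
        if not all(span.get(cat) in vals for cat, vals in conditions.items()):
            continue
        # Step 3: extract mapped span attributes into evaluator input variables
        inputs = {var: span.get(path) for var, path in rule["input_mappings"].items()}
        # Step 4: run the evaluator with the mapped data
        score = rule["evaluator"](inputs)
        # Step 5: store the result for dashboards and analytics
        results.append({"rule": rule["name"], "result": score})
    return results
```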
Key Concepts
Evaluators
An Evaluator is a configured model or function that performs analysis over spans. It can classify, score, or assess the quality of data generated by your application.
Evaluators are defined by:
Provider - The LLM provider (OpenAI, Anthropic, Gemini, Fiddler)
Model - The specific model to use for evaluation
Credentials - Authentication to the provider (configured via LLM Gateway)
Prompt or Logic - The evaluation instructions or function
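For illustration, an evaluator definition could be represented as below. The field names and values are assumptions for the sketch, not a Fiddler schema:

```python
# Hypothetical evaluator definition covering the four components above.
evaluator = {
    "provider": "openai",                   # LLM provider
    "model": "gpt-4o-mini",                 # model used for evaluation
    "credential": "my-gateway-credential",  # authentication via LLM Gateway
    "prompt": "Is this answer relevant to {{input}}? Reply 'yes' or 'no'.",
}
```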
Input Mappings
Input Mappings define how data flows from spans into evaluators. Each variable used in an evaluator's prompt (such as {{input}} or {{context}}) must be mapped to a field or attribute in the span data.
For example, if your evaluator prompt includes {{puppynoises}}, you must map that variable to a span attribute like fiddler.contents.gen_ai.llm.input.user.
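Continuing that example, here is a minimal sketch of the mapping and the substitution it drives at evaluation time (the dict layout and the sample span value are assumptions for illustration):

```python
# Map each prompt variable to the span attribute that supplies its value.
input_mappings = {
    "puppynoises": "fiddler.contents.gen_ai.llm.input.user",
}

# At evaluation time the mapped value is substituted into the template.
prompt_template = "sad {{puppynoises}}"
span_value = "whimper"  # value read from the mapped span attribute
rendered = prompt_template.replace("{{puppynoises}}", span_value)
# rendered == "sad whimper"
```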
Application Rules
Application Rules specify filtering conditions that determine which spans qualify for evaluation. Rules use AND/OR logic:
AND condition across categories - A span must match ALL rule categories
OR condition within a category - A span can match ANY value within a single category
Example:
Rule 1: SpanType = llm
Rule 2: Region = us-east OR us-west
Result: Evaluates spans that are type "llm" AND in either "us-east" or "us-west"
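The sketch below shows how this example behaves as a predicate (illustrative only, not Fiddler internals):

```python
# AND across categories, OR within a category.
rules = {
    "SpanType": {"llm"},               # Rule 1
    "Region": {"us-east", "us-west"},  # Rule 2
}

def span_matches(span: dict) -> bool:
    # Every category must be satisfied (AND); any listed value
    # within a category is acceptable (OR).
    return all(span.get(category) in values for category, values in rules.items())

print(span_matches({"SpanType": "llm", "Region": "us-west"}))   # True
print(span_matches({"SpanType": "tool", "Region": "us-east"}))  # False
```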
Backfill
Backfill controls whether evaluations apply retroactively to existing historical data or only to spans created after the rule is configured.
The backfill runtime depends on the volume of historical data, so enable backfill only when you need it.
Create an Evaluator Rule
Prerequisites
Before creating an Evaluator Rule, ensure you have:
Active Application - A GenAI application with span data
Configured Evaluators - Organization-level evaluators ready to use
LLM Gateway Credentials - If using custom LLM-based evaluators (see LLM Gateway Configuration)
Step-by-Step Guide
Select an Evaluator
Navigate to your application in the Fiddler UI and access the evaluator configuration:
Click the Evaluator Rules tab
Click Add Rule in the top-right corner
The Add Evaluator Rule dialog opens with available evaluators
Choose an evaluator from the list:
Fiddler-Provided Evaluators:
Topic Classification
Embedding
Token Count
Answer Relevance
Coherence
Conciseness
RAG Faithfulness
PII Detection
Sentiment Analysis
Prompt Safety
Response Faithfulness
Llm As A Judge (custom evaluator)

Configure Custom Evaluator (Llm As A Judge)
If you select Llm As A Judge, you'll need to configure the evaluator:
a. Evaluator Name
Enter a descriptive name (e.g., saddestpuppynoises)
b. Provider
Select the LLM provider (e.g., fiddler)
c. Credential
Choose the API credential for authentication (e.g., dummy)
d. Model
Select the specific model (e.g., llama3.1-8b)
e. Prompt Template
Enter evaluation instructions with input variables using curly braces:
{{variableName}}
Example: sad {{puppynoises}}
f. Outputs
Define the expected response format in JSON
Example Output Configuration:
{
  "name": "sadnoises",
  "description": "sad puppy noises",
  "type": "categorical",
  "choices": ["sad", "not sad"]
}
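To show how a categorical output constrains a judge's response, the snippet below parses a hypothetical raw response and checks it against this configuration. The response shape and the check are assumptions for illustration, not Fiddler's validation logic:

```python
import json

output_config = {
    "name": "sadnoises",
    "description": "sad puppy noises",
    "type": "categorical",
    "choices": ["sad", "not sad"],
}

raw_response = '{"sadnoises": "sad"}'  # hypothetical judge output
value = json.loads(raw_response)[output_config["name"]]
if output_config["type"] == "categorical":
    # A categorical result must be one of the configured choices.
    assert value in output_config["choices"], f"unexpected label: {value!r}"
```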
Tip: For Fiddler-provided evaluators, the evaluation method and fields are predetermined. You only need to map inputs and configure application rules.
Click Next to continue.
Map Input Fields
Map each evaluator input variable to a span attribute.
In the Map Evaluator step, you'll see all required input variables
For each variable (e.g., puppynoises):
Click the Select an attribute or enter a custom path dropdown
Choose from available span attributes or enter a custom path manually
Common Span Attributes:
fiddler.span.user.pirate_completion_score
fiddler.contents.gen_ai.llm.context
fiddler.span.system.gen_ai.usage.output_tokens
fiddler.session.user.region
fiddler.span.system.gen_ai.usage.input_tokens
fiddler.session.user.max_conversation_turns
gen_ai.system
fiddler.contents.gen_ai.llm.input.user
fiddler.contents.gen_ai.tool.input
And many more...

Repeat for all input variables
Click Next to continue
Important: All required input variables must be mapped. The evaluator cannot run without complete input mappings.
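As intuition for how a dotted attribute path reaches a value, here is a minimal sketch of resolving a path against nested span data. The nesting layout is an assumption; actual span storage may differ:

```python
def resolve(path: str, span: dict):
    """Walk a dotted path through nested span data; return None if any key is missing."""
    node = span
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # missing fields should be caught before evaluating
        node = node[key]
    return node

span = {
    "gen_ai": {"system": "openai"},
    "fiddler": {"contents": {"gen_ai": {"llm": {"input": {"user": "why do puppies cry?"}}}}},
}
print(resolve("fiddler.contents.gen_ai.llm.input.user", span))  # "why do puppies cry?"
print(resolve("fiddler.session.user.region", span))             # None
```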
Define Application Rules
Specify which spans should be evaluated by setting filter conditions.
In the Apply Rules step, you'll see the current rule conditions
The info box shows: "This evaluator will apply to spans that match ALL of the following conditions:"
Click Add Rule to add a new condition category
For each rule category:
a. Rule Category
Select the attribute type (e.g.,
Span Type)
b. Values
Choose which values to match:
chain
llm ✓
tool
c. Custom Values
(Optional) Add specific custom values to match

Understanding Rule Logic
As described under Key Concepts, conditions combine with an AND across categories and an OR within a category: a span must match all rule categories, and within each category it can match any listed value (e.g., SpanType = llm AND Region = us-east OR us-west).
Add multiple rule categories as needed
Click Next to continue
Configure Backfill and Review
Determine whether to apply the evaluator to existing historical data and review your configuration.
Backfill Configuration
Choose one of three options:
Option 1: Apply to all past data
Evaluates all existing spans in the dataset
Use when: You need complete historical coverage
Warning: May take significant time for large datasets
Option 2: Apply from a specific past date
Evaluates spans created after a chosen date
Use when: You want partial historical coverage
Select the start date using the date picker
Option 3: No backfill (Default)
Evaluates only new spans created after activation
Use when: You only need forward-looking evaluation
Best for: Testing new evaluators or reducing processing time
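To make Option 2 concrete, here is an illustrative date filter. The field names and dates are assumptions for the sketch (Option 1 corresponds to a predicate that is always true; Option 3 skips historical spans entirely):

```python
from datetime import datetime, timezone

backfill_start = datetime(2025, 1, 1, tzinfo=timezone.utc)  # chosen past date

def in_backfill_scope(span: dict) -> bool:
    """Option 2: evaluate only spans created on or after the start date."""
    return span["created_at"] >= backfill_start

spans = [
    {"id": "a", "created_at": datetime(2024, 12, 31, tzinfo=timezone.utc)},
    {"id": "b", "created_at": datetime(2025, 3, 1, tzinfo=timezone.utc)},
]
to_evaluate = [s for s in spans if in_backfill_scope(s)]  # only span "b"
```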

Review Configuration Summary
Evaluator Configuration
Evaluator name, model, provider, credential
Prompt template and expected outputs
Input Field Mapping
Variable → Span attribute mappings
Application Rules
Span matching conditions
Performance Tip: Start with "No backfill" to test your evaluator configuration. Once validated, you can create a new rule with backfill enabled.
Save and Activate
Complete the configuration and activate your evaluator rule.
Configuration Name
Enter a descriptive name for this evaluator rule (e.g., puppyjudge)
This name identifies the rule in your application's Evaluator Rules list
Finalize:
Click Save to activate the rule
Or click Back to modify any settings
Or click Cancel to discard the configuration

Once saved, the evaluator rule becomes active and begins evaluating spans that match your criteria.
Manage Evaluator Rules
View Active Rules
Navigate to the Evaluator Rules tab in your application to see all configured rules.
The Evaluator Rules table displays:
Rule Name
The configuration name you assigned
Rule
Span-matching conditions (e.g., "SpanName: ChatOpenAI")
Input Mappings
Mapped input fields (e.g., "CONTEXT: gen_ai.llm.con...")
Outputs
Expected output fields (e.g., "faithful_prob", "spans")
Status
Active or Inactive
Created At
Date the rule was created
Activate or Deactivate a Rule
Toggle a rule's status without deleting it:
Locate the rule in the Evaluator Rules table
Click the Status toggle to activate or deactivate
Active - Rule is running on matching spans
Inactive - Rule is paused and not evaluating new spans
Delete a Rule
Remove a rule permanently:
Locate the rule in the Evaluator Rules table
Click the delete icon (trash can) at the end of the row
Confirm the deletion when prompted
Warning: Deleting a rule does not remove evaluation results already generated. Historical evaluation data remains in your analytics.
Best Practices
Evaluator Configuration
Use Descriptive Names - Name evaluators and rules clearly (e.g., rag_faithfulness_prod instead of rule1)
Test Before Backfill - Create rules without backfill first, validate results, then create a new rule with backfill if needed
Version Your Prompts - Include version identifiers in custom judge names (e.g., topic_classifier_v2)
Input Mapping
Validate Paths - Ensure span attributes exist before mapping
Use Consistent Paths - Standardize attribute naming across your application
Document Custom Paths - Keep a reference of custom attribute paths for your team
Application Rules
Start Broad, Refine Later - Begin with simple rules, add complexity as needed
Avoid Over-Filtering - Don't create rules so specific that they match too few spans
Test Rule Logic - Verify spans are matching as expected using span search
Performance Optimization
Limit Backfill Scope - Use date-based backfill instead of "all past data" for large datasets
Monitor Evaluation Latency - Track how long evaluations take and optimize prompts if needed
Batch Similar Rules - Group related evaluations to reduce overhead
Troubleshooting
Evaluator Not Running
Issue: Rule is active but not producing results.
Solutions:
Verify Application Rules match actual span attributes
Check that all input mappings point to valid span fields
Ensure LLM Gateway credentials are valid and not expired
Review span data to confirm matching spans exist
Missing Input Data
Issue: Evaluator fails due to missing input values.
Solutions:
Verify the span attribute path is correct
Check that the attribute exists in your span schema
Ensure spans contain data for the mapped field
Use a different attribute or add the field to your instrumentation
Backfill Taking Too Long
Issue: Historical evaluation is processing slowly.
Solutions:
Use date-based backfill instead of all past data
Start with recent data and expand the date range gradually
Consider creating multiple rules for different time periods
Deactivate unnecessary rules to free up processing capacity
Unexpected Evaluation Results
Issue: Evaluator produces unexpected scores or classifications.
Solutions:
Review the evaluator prompt template for clarity
Verify input mappings are passing the correct data
Test the evaluator with sample data outside Fiddler
Check for prompt ambiguity or missing context
Adjust the prompt and create a new rule version
Related Documentation
LLM Gateway Configuration - Configure LLM provider credentials
Fiddler Evals SDK - Create and manage evaluators programmatically
Custom Evaluators - Build custom evaluation logic
Application Monitoring - Monitor your GenAI applications