Evaluator Rules

Evaluator Rules define how automated evaluations are applied to your application's spans. They connect evaluators (LLM-based or rule-based functions) with span data, specify what inputs to use, and determine which spans qualify for evaluation.


Overview

Evaluator Rules provide the configuration layer between your evaluators and your application's telemetry data. When properly configured, they automatically assess the quality, safety, and performance of your GenAI application based on real-time span data.

What Are Evaluator Rules?

An Evaluator Rule determines how and when an evaluator runs against your application's spans. Each rule consists of four key components:

  1. Evaluator Configuration - The evaluator definition, including provider, model, and prompt

  2. Input Field Mapping - How span data is passed to the evaluator's input variables

  3. Application Rules - Conditions that determine which spans qualify for evaluation

  4. Backfill Configuration - Whether to apply evaluations to historical data

How Evaluator Rules Work

When a new span is created in your application, the following sequence runs (a conceptual sketch follows this list):

  1. The system checks all active Evaluator Rules

  2. Each rule evaluates whether its Application Rules match the span's attributes

  3. If a match is found, the system extracts data from the span using Input Field Mappings

  4. The evaluator runs with the mapped data as input

  5. Results are stored and made available in dashboards and analytics
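The same flow can be expressed as a rough sketch. All names below are illustrative only, not the Fiddler SDK:

from dataclasses import dataclass
from typing import Callable


@dataclass
class EvaluatorRule:
    """The four components described above, modeled as plain data."""
    evaluator: Callable[..., dict]           # LLM-based or rule-based function
    input_mapping: dict[str, str]            # evaluator variable -> span attribute path
    application_rules: dict[str, set[str]]   # rule category -> allowed values
    backfill: bool = False                   # whether historical spans are also evaluated


def evaluate_span(rule: EvaluatorRule, span_attrs: dict):
    """Steps 2-5 of the flow: match, map inputs, run the evaluator, return the result."""
    # AND across rule categories, OR within a category
    if not all(span_attrs.get(cat) in allowed for cat, allowed in rule.application_rules.items()):
        return None
    # Pull each mapped attribute out of the span and hand it to the evaluator
    inputs = {var: span_attrs.get(path) for var, path in rule.input_mapping.items()}
    return rule.evaluator(**inputs)          # the result is then stored for dashboards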


Key Concepts

Evaluators

An Evaluator is a configured model or function that performs analysis over spans. It can classify, score, or assess the quality of data generated by your application.

Evaluators are defined by:

  • Provider - The LLM provider (OpenAI, Anthropic, Gemini, Fiddler)

  • Model - The specific model to use for evaluation

  • Credentials - Authentication to the provider (configured via LLM Gateway)

  • Prompt or Logic - The evaluation instructions or function

Note: Evaluators are defined at the organization level and shared across all projects in your organization.

Input Mappings

Input Mappings define how data flows from spans into evaluators. Each variable used in an evaluator's prompt (such as {{input}} or {{context}}) must be mapped to a field or attribute in the span data.

For example, if your evaluator prompt includes {{puppynoises}}, you must map that variable to a span attribute like fiddler.contents.gen_ai.llm.input.user.
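A minimal sketch of what this mapping means in practice, reusing the hypothetical {{puppynoises}} example above (values are illustrative only):

import re

# Hypothetical values: the evaluator prompt and one input mapping from the example above.
prompt_template = "sad {{puppynoises}}"
input_mapping = {"puppynoises": "fiddler.contents.gen_ai.llm.input.user"}

# Attributes captured on a single span (illustrative value only).
span_attributes = {"fiddler.contents.gen_ai.llm.input.user": "Where is my ball?"}


def render(template: str, mapping: dict, attrs: dict) -> str:
    """Replace each {{variable}} with the value of its mapped span attribute."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(attrs.get(mapping[m.group(1)], "")),
        template,
    )


print(render(prompt_template, input_mapping, span_attributes))  # -> "sad Where is my ball?"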

Application Rules

Application Rules specify filtering conditions that determine which spans qualify for evaluation. Rules use AND/OR logic, illustrated in the example and sketch below:

  • AND condition across categories - A span must match ALL rule categories

  • OR condition within a category - A span can match ANY value within a single category

Example:

Rule 1: SpanType = llm
Rule 2: Region = us-east OR us-west

Result: Evaluates spans that are type "llm" AND in either "us-east" or "us-west"
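The same example, expressed as a small illustrative snippet (category names and values are hypothetical):

# Illustrative only: the example rules above expressed as category -> allowed values.
rules = {
    "SpanType": {"llm"},
    "Region": {"us-east", "us-west"},
}

spans = [
    {"SpanType": "llm",  "Region": "us-east"},   # matches: llm AND an allowed region
    {"SpanType": "llm",  "Region": "eu-west"},   # skipped: region not in the OR list
    {"SpanType": "tool", "Region": "us-west"},   # skipped: wrong span type
]

for span in spans:
    # AND across categories, OR within a category
    qualifies = all(span.get(cat) in allowed for cat, allowed in rules.items())
    print(span, "->", "evaluate" if qualifies else "skip")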

Backfill

Backfill controls whether evaluations apply retroactively to existing historical data or only to spans created after the rule is configured.


Create an Evaluator Rule

Prerequisites

Before creating an Evaluator Rule, ensure you have:

  • Active Application - A GenAI application with span data

  • Configured Evaluators - Organization-level evaluators ready to use

  • LLM Gateway Credentials - If using custom LLM-based evaluators (see LLM Gateway Configuration)


Step-by-Step Guide

Step 1: Select an Evaluator

Navigate to your application in the Fiddler UI and access the evaluator configuration:

  1. Click the Evaluator Rules tab

  2. Click Add Rule in the top-right corner

  3. The Add Evaluator Rule dialog opens with available evaluators

Choose an evaluator from the list:

Fiddler-Provided Evaluators:

  • Topic Classification

  • Embedding

  • Token Count

  • Answer Relevance

  • Coherence

  • Conciseness

  • RAG Faithfulness

  • PII Detection

  • Sentiment Analysis

  • F# Prompt Safety

  • F# Response Faithfulness

  • Llm As A Judge (custom evaluator)

Step 1: Select an evaluator

Configure Custom Evaluator (Llm As A Judge)

If you select Llm As A Judge, you'll need to configure the evaluator:

a. Evaluator Name

  • Enter a descriptive name (e.g., saddestpuppynoises)

b. Provider

  • Select the LLM provider (e.g., fiddler)

c. Credential

  • Choose the API credential for authentication (e.g., dummy)

d. Model

  • Select the specific model (e.g., llama3.1-8b)

e. Prompt Template

  • Enter evaluation instructions with input variables using curly braces: {{variableName}}

  • Example: sad {{puppynoises}}

f. Outputs

  • Define the expected response format in JSON

Example Output Configuration:

{
  "name": "sadnoises",
  "description": "sad puppy noises",
  "type": "categorical",
  "choices": ["sad", "not sad"]
}
Configure evaluator settings, including provider, model, prompt, and outputs
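Putting the pieces above together, an Llm As A Judge evaluation conceptually amounts to: render the prompt template, call the configured model, and validate the JSON response against the declared output. A rough sketch, where call_llm is a hypothetical stand-in for the configured provider call, not a real API:

import json

# Hypothetical sketch: `call_llm` stands in for the configured provider/model/credential.
output_schema = {
    "name": "sadnoises",
    "description": "sad puppy noises",
    "type": "categorical",
    "choices": ["sad", "not sad"],
}


def call_llm(prompt: str) -> str:
    """Placeholder for the gateway call to the configured model (e.g., llama3.1-8b)."""
    return json.dumps({"sadnoises": "sad"})


def judge(rendered_prompt: str) -> dict:
    result = json.loads(call_llm(rendered_prompt))
    # A categorical output is only valid if it is one of the declared choices.
    if result[output_schema["name"]] not in output_schema["choices"]:
        raise ValueError(f"Unexpected judge output: {result}")
    return result


print(judge("sad Where is my ball?"))  # -> {'sadnoises': 'sad'}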

Tip: For Fiddler-provided evaluators, the evaluation method and fields are predetermined. You only need to map inputs and configure application rules.

Click Next to continue.

Step 2: Map Input Fields

Map each evaluator input variable to a span attribute.

  1. In the Map Evaluator step, you'll see all required input variables

  2. For each variable (e.g., puppynoises):

    • Click the Select an attribute or enter a custom path dropdown

    • Choose from available span attributes or enter a custom path manually

Common Span Attributes:

  • fiddler.span.user.pirate_completion_score

  • fiddler.contents.gen_ai.llm.context

  • fiddler.span.system.gen_ai.usage.output_tokens

  • fiddler.session.user.region

  • fiddler.span.system.gen_ai.usage.input_tokens

  • fiddler.session.user.max_conversation_turns

  • gen_ai.system

  • fiddler.contents.gen_ai.llm.input.user

  • fiddler.contents.gen_ai.tool.input

  • And many more...

Step 2: Map input fields to span data

  3. Repeat for all input variables

  4. Click Next to continue

Important: All required input variables must be mapped. The evaluator cannot run without complete input mappings. If a field you need is not yet on your spans, it can be added in your application's instrumentation, as sketched below.
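For example, a custom attribute such as fiddler.span.user.pirate_completion_score (taken from the list above) would typically be set when the span is created. A minimal sketch, assuming your application is instrumented with OpenTelemetry:

from opentelemetry import trace

tracer = trace.get_tracer("my-app")  # hypothetical tracer name

with tracer.start_as_current_span("score_completion") as span:
    # ... application logic that produces the score ...
    # Once spans carry this attribute, it becomes selectable in the input-mapping dropdown.
    span.set_attribute("fiddler.span.user.pirate_completion_score", 0.87)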

Step 3: Define Application Rules

Specify which spans should be evaluated by setting filter conditions.

  1. In the Apply Rules step, you'll see the current rule conditions

  2. The info box shows: "This evaluator will apply to spans that match ALL of the following conditions:"

  3. Click Add Rule to add a new condition category

For each rule category:

a. Rule Category

  • Select the attribute type (e.g., Span Type)

b. Values

  • Choose which values to match:

    • chain

    • llm

    • tool

c. Custom Values

  • (Optional) Add specific custom values to match

Step 3: Configure application rules

Understanding Rule Logic

  • AND condition across categories - A span must match ALL rule categories

  • OR condition within a category - A span can match ANY value within a single category

Example:

Rule 1: SpanType = llm
Rule 2: Region = us-east OR us-west

Result: Evaluates spans that are type "llm" AND in either "us-east" or "us-west"
  4. Add multiple rule categories as needed

  5. Click Next to continue

Step 4: Configure Backfill and Review

Determine whether to apply the evaluator to existing historical data and review your configuration.

Backfill Configuration

Choose one of three options (a small sketch of the underlying date logic follows the list):

Option 1: Apply to all past data

  • Evaluates all existing spans in the dataset

  • Use when: You need complete historical coverage

  • Warning: May take significant time for large datasets

Option 2: Apply from a specific past date

  • Evaluates spans created after a chosen date

  • Use when: You want partial historical coverage

  • Select the start date using the date picker

Option 3: No backfill (Default)

  • Evaluates only new spans created after activation

  • Use when: You only need forward-looking evaluation

  • Best for: Testing new evaluators or reducing processing time

Step 4: Configure backfill and review settings
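Conceptually, the three options differ only in which span timestamps qualify. An illustrative sketch (dates and option names are hypothetical):

from datetime import datetime, timezone

# Illustrative only: how the three backfill options translate into a date check.
rule_activated_at = datetime(2025, 6, 1, tzinfo=timezone.utc)
backfill_from = datetime(2025, 3, 1, tzinfo=timezone.utc)   # only used for option 2


def should_evaluate(span_started_at: datetime, option: str) -> bool:
    if option == "all_past_data":        # option 1: every existing span qualifies
        return True
    if option == "from_date":            # option 2: spans on or after the chosen date
        return span_started_at >= backfill_from
    return span_started_at >= rule_activated_at   # option 3 (default): new spans only


old_span = datetime(2025, 1, 15, tzinfo=timezone.utc)
print(should_evaluate(old_span, "no_backfill"))    # False -- predates rule activation
print(should_evaluate(old_span, "all_past_data"))  # True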

Review Configuration Summary

Evaluator Configuration

  • Evaluator name, model, provider, credential

  • Prompt template and expected outputs

Input Field Mapping

  • Variable → Span attribute mappings

Application Rules

  • Span matching conditions

Performance Tip: Start with "No backfill" to test your evaluator configuration. Once validated, you can create a new rule with backfill enabled.

Step 5: Save and Activate

Complete the configuration and activate your evaluator rule.

  1. Configuration Name

    • Enter a descriptive name for this evaluator rule (e.g., puppyjudge)

    • This name identifies the rule in your application's Evaluator Rules list

  2. Finalize:

    • Click Save to activate the rule

    • Or click Back to modify any settings

    • Or click Cancel to discard the configuration

Step 5: Name and save your evaluator rule

Once saved, the evaluator rule becomes active and begins evaluating spans that match your criteria.


Manage Evaluator Rules

View Active Rules

Navigate to the Evaluator Rules tab in your application to see all configured rules.

The Evaluator Rules table displays:

  • Rule Name - The configuration name you assigned

  • Rule - Span-matching conditions (e.g., "SpanName undefined: ChatOpenAI")

  • Input Mappings - Mapped input fields (e.g., "CONTEXT: gen_ai.llm.con...")

  • Outputs - Expected output fields (e.g., "faithful_prob", "spans")

  • Status - Active or Inactive

  • Created At - Date the rule was created

Activate or Deactivate a Rule

Toggle a rule's status without deleting it:

  1. Locate the rule in the Evaluator Rules table

  2. Click the Status toggle to activate or deactivate

    • Active - Rule is running on matching spans

    • Inactive - Rule is paused and not evaluating new spans

Delete a Rule

Remove a rule permanently:

  1. Locate the rule in the Evaluator Rules table

  2. Click the delete icon (trash can) at the end of the row

  3. Confirm the deletion when prompted

Warning: Deleting a rule does not remove evaluation results already generated. Historical evaluation data remains in your analytics.


Best Practices

Evaluator Configuration

  • Use Descriptive Names - Name evaluators and rules clearly (e.g., rag_faithfulness_prod instead of rule1)

  • Test Before Backfill - Create rules without backfill first, validate results, then create a new rule with backfill if needed

  • Version Your Prompts - Include version identifiers in custom judge names (e.g., topic_classifier_v2)

Input Mapping

  • Validate Paths - Ensure span attributes exist before mapping

  • Use Consistent Paths - Standardize attribute naming across your application

  • Document Custom Paths - Keep a reference of custom attribute paths for your team

Application Rules

  • Start Broad, Refine Later - Begin with simple rules, add complexity as needed

  • Avoid Over-Filtering - Don't create rules so specific that they match too few spans

  • Test Rule Logic - Verify spans are matching as expected using span search

Performance Optimization

  • Limit Backfill Scope - Use date-based backfill instead of "all past data" for large datasets

  • Monitor Evaluation Latency - Track how long evaluations take and optimize prompts if needed

  • Batch Similar Rules - Group related evaluations to reduce overhead


Troubleshooting

Evaluator Not Running

Issue: Rule is active but not producing results.

Solutions:

  • Verify Application Rules match actual span attributes

  • Check that all input mappings point to valid span fields

  • Ensure LLM Gateway credentials are valid and not expired

  • Review span data to confirm matching spans exist

Missing Input Data

Issue: Evaluator fails due to missing input values.

Solutions:

  • Verify the span attribute path is correct

  • Check that the attribute exists in your span schema

  • Ensure spans contain data for the mapped field

  • Use a different attribute or add the field to your instrumentation
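For the first three checks, a small stand-alone script can confirm that every mapped attribute actually appears, and is non-empty, in a sample of your span data. A hypothetical sketch; the mapping and attribute names are taken from the examples in this guide:

# Hypothetical helper: verify each mapped attribute exists on a sample of span attribute dicts.
input_mapping = {"puppynoises": "fiddler.contents.gen_ai.llm.input.user"}

sample_spans = [
    {"fiddler.contents.gen_ai.llm.input.user": "Where is my ball?"},
    {"gen_ai.system": "openai"},   # missing the mapped field
]

for i, attrs in enumerate(sample_spans):
    for variable, path in input_mapping.items():
        value = attrs.get(path)
        status = "ok" if value not in (None, "") else "MISSING"
        print(f"span {i}: {variable} <- {path}: {status}")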

Backfill Taking Too Long

Issue: Historical evaluation is processing slowly.

Solutions:

  • Use date-based backfill instead of all past data

  • Start with recent data and expand the date range gradually

  • Consider creating multiple rules for different time periods

  • Deactivate unnecessary rules to free up processing capacity

Unexpected Evaluation Results

Issue: Evaluator produces unexpected scores or classifications.

Solutions:

  • Review the evaluator prompt template for clarity

  • Verify input mappings are passing the correct data

  • Test the evaluator with sample data outside Fiddler

  • Check for prompt ambiguity or missing context

  • Adjust the prompt and create a new rule version



