# Evaluator Rules

<figure><img src="https://3170638587-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F82RHcnYWV62fvrxMeeBB%2Fuploads%2Fgit-blob-55dac804207931386120b0811f1becf4ba1c454c%2Fevaluator-rules-hero%20(1).svg?alt=media" alt="Evaluator Rules workflow: Configure, Filter, Evaluate"><figcaption></figcaption></figure>

Evaluator Rules define how automated evaluations are applied to your application's spans. They connect evaluators (LLM-based or rule-based functions) with span data, specify what inputs to use, and determine which spans qualify for evaluation.

***

## Overview

Evaluator Rules provide the configuration layer between your evaluators and your application's telemetry data. When properly configured, they automatically assess the quality, safety, and performance of your GenAI application based on real-time span data.

### What Are Evaluator Rules?

An **Evaluator Rule** determines how and when an evaluator runs against your application's spans. Each rule consists of four key components:

1. **Evaluator Configuration** - The evaluator definition, including provider, model, and prompt
2. **Input Field Mapping** - How span data is passed to the evaluator's input variables
3. **Application Rules** - Conditions that determine which spans qualify for evaluation
4. **Backfill Configuration** - Whether to apply evaluations to historical data

### How Evaluator Rules Work

When a new span is created in your application:

1. The system checks all active Evaluator Rules
2. Each rule evaluates whether its Application Rules match the span's attributes
3. If a match is found, the system extracts data from the span using Input Field Mappings
4. The evaluator runs with the mapped data as input
5. Results are stored and made available in dashboards and analytics
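
The flow above can be expressed as a short Python sketch. This is illustrative only; every function and field name here (`matches`, `extract_inputs`, `evaluate`, the rule dictionary shape) is hypothetical and does not reflect Fiddler's internal implementation.

```python
# Illustrative sketch of the evaluation flow above; all names are
# hypothetical and do not reflect Fiddler's internals.

def matches(conditions: dict[str, list[str]], span: dict) -> bool:
    # AND across rule categories, OR within a category.
    return all(span.get(cat) in values for cat, values in conditions.items())

def extract_inputs(mappings: dict[str, str], span: dict) -> dict:
    # Pull each evaluator variable from its mapped span attribute.
    return {var: span.get(path) for var, path in mappings.items()}

def on_span_created(span: dict, rules: list[dict]) -> None:
    for rule in rules:                                            # 1. check all rules
        if rule["active"] and matches(rule["conditions"], span):  # 2. apply rules
            inputs = extract_inputs(rule["input_mappings"], span) # 3. map inputs
            result = rule["evaluate"](inputs)                     # 4. run evaluator
            print(f"{rule['name']}: {result}")                    # 5. store/surface
```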

***

## Key Concepts

### Evaluators

An **Evaluator** is a configured model or function that performs analysis over spans. It can classify, score, or assess the quality of data generated by your application.

Evaluators are defined by:

* **Provider** - The LLM provider (OpenAI, Anthropic, Gemini, Fiddler)
* **Model** - The specific model to use for evaluation
* **Credentials** - Authentication to the provider (configured via [LLM Gateway](https://docs.fiddler.ai/reference/settings/llm-gateway))
* **Prompt or Logic** - The evaluation instructions or function

{% hint style="info" %}
**Note:** Evaluators are defined at the organization level and shared across all projects in your organization.
{% endhint %}
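
As a rough mental model, an evaluator definition bundles these four pieces. The following dictionary is a sketch only; the field names and values are illustrative and do not mirror the Fiddler API.

```python
# Hypothetical shape of an evaluator definition; field names and
# values are illustrative, not the Fiddler API.
evaluator = {
    "provider": "openai",                  # LLM provider
    "model": "gpt-4o-mini",                # model used for evaluation
    "credential": "my-llm-gateway-cred",   # LLM Gateway credential name
    "prompt": "Rate the relevance of {{response}} to {{input}}.",
}
```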

### Input Mappings

**Input Mappings** define how data flows from spans into evaluators. Each variable used in an evaluator's prompt (such as `{{input}}` or `{{context}}`) must be mapped to a field or attribute in the span data.

For example, if your evaluator prompt includes `{{puppynoises}}`, you must map that variable to a span attribute like `fiddler.contents.gen_ai.llm.input.user`.
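
A minimal sketch of what that mapping does at evaluation time is shown below. The substitution logic is ours for illustration; Fiddler performs this rendering internally.

```python
import re

# Illustrative template substitution; Fiddler performs this internally.
prompt_template = "sad {{puppynoises}}"
mappings = {"puppynoises": "fiddler.contents.gen_ai.llm.input.user"}
span = {"fiddler.contents.gen_ai.llm.input.user": "a whimpering golden retriever"}

rendered = re.sub(
    r"\{\{(\w+)\}\}",
    lambda m: str(span[mappings[m.group(1)]]),
    prompt_template,
)
print(rendered)  # -> "sad a whimpering golden retriever"
```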

### Application Rules

**Application Rules** specify filtering conditions that determine which spans qualify for evaluation. Rules use AND/OR logic:

* **AND condition across categories** - A span must match ALL rule categories
* **OR condition within a category** - A span can match ANY value within a single category

**Example:**

```
Rule 1: SpanType = llm
Rule 2: Region = us-east OR us-west

Result: Evaluates spans that are type "llm" AND in either "us-east" or "us-west"
```
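
In code, the same logic reads as an AND across categories with an OR within each category. A minimal sketch, not Fiddler's implementation:

```python
# AND across categories, OR within each category (illustrative only).
conditions = {
    "SpanType": ["llm"],
    "Region": ["us-east", "us-west"],
}

def qualifies(span: dict) -> bool:
    return all(span.get(category) in allowed
               for category, allowed in conditions.items())

print(qualifies({"SpanType": "llm", "Region": "us-west"}))   # True
print(qualifies({"SpanType": "tool", "Region": "us-east"}))  # False
```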

### Backfill

**Backfill** controls whether evaluations apply retroactively to existing historical data or only to spans created after the rule is configured.

{% hint style="warning" %}
Backfill runtime depends on the volume of historical data in your application. Enable backfill only when you need it.
{% endhint %}

***

## Create an Evaluator Rule

### Prerequisites

Before creating an Evaluator Rule, ensure you have:

* **Active Application** - A GenAI application with span data
* **Configured Evaluators** - Organization-level evaluators ready to use
* **LLM Gateway Credentials** - If using custom LLM-based evaluators (see [LLM Gateway Configuration](https://docs.fiddler.ai/reference/settings/llm-gateway))

***

### Step-by-Step Guide

{% stepper %}
{% step %}
**Select an Evaluator**

Navigate to your application in the Fiddler UI and access the evaluator configuration:

1. Click the **Evaluator Rules** tab
2. Click **Add Rule** in the top-right corner
3. The **Add Evaluator Rule** dialog opens with available evaluators

Choose an evaluator from the list:

**Fiddler-Provided Evaluators:**

* Topic Classification
* Embedding
* Token Count
* Answer Relevance
* Coherence
* Conciseness
* Context Relevance
* RAG Faithfulness
* PII Detection
* Sentiment Analysis
* F# Prompt Safety
* F# Response Faithfulness
* **Llm As A Judge** (custom evaluator)

<figure><img src="https://3170638587-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F82RHcnYWV62fvrxMeeBB%2Fuploads%2Fgit-blob-5e68da84fa3e3a681e1458e751ec43ca45db8c19%2Fevaluator-rules-step1-select%20(1).png?alt=media" alt="Select an evaluator from the available list"><figcaption><p>Step 1: Select an evaluator</p></figcaption></figure>

**Configure Custom Evaluator (Llm As A Judge)**

If you select **Llm As A Judge**, you'll need to configure the evaluator:

**a. Evaluator Name**

* Enter a descriptive name (e.g., `saddestpuppynoises`)

**b. Provider**

* Select the LLM provider (e.g., `fiddler`)

**c. Credential**

* Choose the API credential for authentication (e.g., `dummy`)

**d. Model**

* Select the specific model (e.g., `llama3.1-8b`)

**e. Prompt Template**

* Enter evaluation instructions, referencing input variables in double curly braces: `{{variableName}}`
* **Example:** `sad {{puppynoises}}`

**f. Outputs**

* Define the expected response format in JSON

**Example Output Configuration:**

```json
{
  "name": "sadnoises",
  "description": "sad puppy noises",
  "type": "categorical",
  "choices": ["sad", "not sad"]
}
```
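
As a quick sanity check, you can verify that a judge's JSON response conforms to an output definition like the one above. A minimal sketch, assuming the judge returns a JSON object keyed by the output name; the validation helper is ours, not a Fiddler API.

```python
import json

# Hypothetical check of a judge response against the output definition
# above; this helper is illustrative, not a Fiddler API.
output_def = {
    "name": "sadnoises",
    "type": "categorical",
    "choices": ["sad", "not sad"],
}

def validate(raw_response: str) -> str:
    value = json.loads(raw_response)[output_def["name"]]
    if output_def["type"] == "categorical" and value not in output_def["choices"]:
        raise ValueError(f"unexpected value: {value!r}")
    return value

print(validate('{"sadnoises": "sad"}'))  # -> "sad"
```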

<figure><img src="https://3170638587-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F82RHcnYWV62fvrxMeeBB%2Fuploads%2Fgit-blob-e402213ffd7b4853204975722bebf1cfd3b5a6e8%2Fevaluator-rules-step1-configure%20(1).png?alt=media" alt="Configure custom Llm As A Judge evaluator"><figcaption><p>Configure evaluator settings including provider, model, prompt, and outputs</p></figcaption></figure>

{% hint style="info" %}
**Tip:** For Fiddler-provided evaluators, the evaluation method and fields are predetermined. You only need to map inputs and configure application rules.
{% endhint %}

Click **Next** to continue.
{% endstep %}

{% step %}
**Map Input Fields**

Map each evaluator input variable to a span attribute.

1. In the **Map Evaluator** step, you'll see all required input variables
2. For each variable (e.g., `puppynoises`):
   * Click the **Select an attribute or enter a custom path** dropdown
   * Choose from available span attributes or enter a custom path manually

**Common Span Attributes:**

* `fiddler.span.user.pirate_completion_score`
* `fiddler.contents.gen_ai.llm.context`
* `fiddler.span.system.gen_ai.usage.output_tokens`
* `fiddler.session.user.region`
* `fiddler.span.system.gen_ai.usage.input_tokens`
* `fiddler.session.user.max_conversation_turns`
* `gen_ai.system`
* `fiddler.contents.gen_ai.llm.input.user`
* `fiddler.contents.gen_ai.tool.input`
* And many more...

<figure><img src="https://3170638587-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F82RHcnYWV62fvrxMeeBB%2Fuploads%2Fgit-blob-b2c5fb2de5321a55718abdeb5cecde64311a6ccd%2Fevaluator-rules-step2-mapping%20(1).png?alt=media" alt="Map evaluator input variables to span attributes"><figcaption><p>Step 2: Map input fields to span data</p></figcaption></figure>

3. Repeat for all input variables
4. Click **Next** to continue

{% hint style="info" %}
All required input variables must be mapped. The evaluator cannot run without complete input mappings.
{% endhint %}
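
If you are unsure whether a custom path resolves, a quick local check against an exported span payload can help. A sketch only: the traversal logic is ours, and attributes may be stored flat (one dotted key) or nested depending on your exporter.

```python
# Illustrative lookup of a dotted attribute path against span data;
# attributes may be stored flat ("a.b.c" as one key) or nested.
def resolve(span: dict, path: str):
    if path in span:                      # flat key, e.g. "gen_ai.system"
        return span[path]
    node = span
    for part in path.split("."):          # nested fallback
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

span = {"gen_ai.system": "openai",
        "fiddler": {"session": {"user": {"region": "us-east"}}}}
print(resolve(span, "gen_ai.system"))                # -> "openai"
print(resolve(span, "fiddler.session.user.region"))  # -> "us-east"
```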
{% endstep %}

{% step %}
**Define Application Rules**

Specify which spans to evaluate by setting filter conditions.

1. In the **Apply Rules** step, you'll see the current rule conditions
2. The info box shows: **"This evaluator will apply to spans that match ALL of the following conditions:"**
3. Click **Add Rule** to add a new condition category

For each rule category:

**a. Rule Category**

* Select the attribute type (e.g., `Span Type`)

**b. Values**

* Choose which values to match:
  * `chain`
  * `llm` ✓
  * `tool`

**c. Custom Values**

* (Optional) Add specific custom values to match

<figure><img src="https://3170638587-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F82RHcnYWV62fvrxMeeBB%2Fuploads%2Fgit-blob-8bdf6e0035f3d9be0cc8d0a079fc45056d08cb5a%2Fevaluator-rules-step3-rules%20(1).png?alt=media" alt="Define application rules to filter which spans are evaluated"><figcaption><p>Step 3: Configure application rules</p></figcaption></figure>

**Understanding Rule Logic**

* **AND condition across categories** - A span must match ALL rule categories
* **OR condition within a category** - A span can match ANY value within a single category

**Example:**

```
Rule 1: SpanType = llm
Rule 2: Region = us-east OR us-west

Result: Evaluates spans that are type "llm" AND in either "us-east" or "us-west"
```

4. Add multiple rule categories as needed
5. Click **Next** to continue
{% endstep %}

{% step %}
**Configure Backfill and Review**

Determine whether to apply the evaluator to existing historical data and review your configuration.

**Backfill Configuration**

Choose one of three options:

**Option 1: Apply to all past data**

* Evaluates all existing spans in the dataset
* Use when: You need complete historical coverage
* Warning: May take significant time for large datasets

**Option 2: Apply from a specific past date**

* Evaluates spans created after a chosen date
* Use when: You want partial historical coverage
* Select the start date using the date picker

**Option 3: No backfill** (Default)

* Evaluates only new spans created after activation
* Use when: You only need a forward-looking evaluation
* Best for: Testing new evaluators or reducing processing time
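
The three options reduce to a simple time filter over spans. A sketch of the selection logic under each option (illustrative only; not Fiddler's implementation):

```python
from datetime import datetime, timezone

# Illustrative backfill scoping: which spans qualify under each option.
def in_scope(span_created_at: datetime, activated_at: datetime,
             backfill_from: datetime | None, backfill_all: bool) -> bool:
    if span_created_at >= activated_at:   # new spans always qualify
        return True
    if backfill_all:                      # option 1: all past data
        return True
    if backfill_from is not None:         # option 2: from a past date
        return span_created_at >= backfill_from
    return False                          # option 3: no backfill

now = datetime.now(timezone.utc)
old = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(in_scope(old, now, None, backfill_all=False))  # False (no backfill)
```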

<figure><img src="https://3170638587-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F82RHcnYWV62fvrxMeeBB%2Fuploads%2Fgit-blob-a18522aa8296b1ca528ab969248e960c91fda445%2Fevaluator-rules-step4-backfill%20(1).png?alt=media" alt="Configure backfill options and review configuration"><figcaption><p>Step 4: Configure backfill and review settings</p></figcaption></figure>

**Review Configuration Summary**

**Evaluator Configuration**

* Evaluator name, model, provider, credential
* Prompt template and expected outputs

**Input Field Mapping**

* Variable → Span attribute mappings

**Application Rules**

* Span matching conditions

{% hint style="info" %}
**Performance Tip:** Start with "No backfill" to test your evaluator configuration. Once validated, you can create a new rule with backfill enabled.
{% endhint %}
{% endstep %}

{% step %}
**Save and Activate**

Complete the configuration and activate your evaluator rule.

1. **Configuration Name**
   * Enter a descriptive name for this evaluator rule (e.g., `puppyjudge`)
   * This name identifies the rule in your application's Evaluator Rules list
2. **Finalize:**
   * Click **Save** to activate the rule
   * Or click **Back** to modify any settings
   * Or click **Cancel** to discard the configuration

<figure><img src="https://3170638587-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F82RHcnYWV62fvrxMeeBB%2Fuploads%2Fgit-blob-d1d2f6054bf300c22a19e603466526a5478077e2%2Fevaluator-rules-step5-save%20(1).png?alt=media" alt="Save configuration and activate the evaluator rule"><figcaption><p>Step 5: Name and save your evaluator rule</p></figcaption></figure>

Once saved, the evaluator rule becomes active and begins evaluating spans that match your criteria.
{% endstep %}
{% endstepper %}

***

## Manage Evaluator Rules

### View Active Rules

Navigate to the **Evaluator Rules** tab in your application to see all configured rules.

The Evaluator Rules table displays:

| Column             | Description                                                       |
| ------------------ | ----------------------------------------------------------------- |
| **Rule Name**      | The configuration name you assigned                               |
| **Rule**           | Span-matching conditions (e.g., "SpanName undefined: ChatOpenAI") |
| **Input Mappings** | Mapped input fields (e.g., "CONTEXT: gen\_ai.llm.con...")         |
| **Outputs**        | Expected output fields (e.g., "faithful\_prob", "spans")          |
| **Status**         | Active or Inactive                                                |
| **Created At**     | Date the rule was created                                         |

### Activate or Deactivate a Rule

Toggle a rule's status without deleting it:

1. Locate the rule in the Evaluator Rules table
2. Click the **Status** toggle to activate or deactivate
   * **Active** - Rule is running on matching spans
   * **Inactive** - Rule is paused and not evaluating new spans

### Delete a Rule

Remove a rule permanently:

1. Locate the rule in the Evaluator Rules table
2. Click the **delete** icon (trash can) at the end of the row
3. Confirm the deletion when prompted

{% hint style="warning" %}
Deleting a rule does not remove evaluator results already generated. Historical evaluator data remains in your analytics.
{% endhint %}

***

## Best Practices

### Evaluator Configuration

* **Use Descriptive Names** - Name evaluators and rules clearly (e.g., `rag_faithfulness_prod` instead of `rule1`)
* **Test Before Backfill** - Create rules without backfill first, validate results, then create a new rule with backfill if needed
* **Version Your Prompts** - Include version identifiers in custom judge names (e.g., `topic_classifier_v2`)

### Input Mapping

* **Validate Paths** - Ensure span attributes exist before mapping
* **Use Consistent Paths** - Standardize attribute naming across your application
* **Document Custom Paths** - Keep a reference of custom attribute paths for your team

### Application Rules

* **Start Broad, Refine Later** - Begin with simple rules, add complexity as needed
* **Avoid Over-Filtering** - Don't create rules so specific that they match too few spans
* **Test Rule Logic** - Verify spans are matching as expected using span search

### Performance Optimization

* **Limit Backfill Scope** - Use date-based backfill instead of "all past data" for large datasets
* **Monitor Evaluation Latency** - Track how long evaluations take and optimize prompts if needed
* **Batch Similar Rules** - Group related evaluations to reduce overhead

***

## Troubleshooting

### Evaluator Not Running

**Issue:** Rule is active but not producing results.

**Solutions:**

* Verify Application Rules match actual span attributes
* Check that all input mappings point to valid span fields
* Ensure LLM Gateway credentials are valid and not expired
* Review span data to confirm matching spans exist

### Missing Input Data

**Issue:** Evaluator fails due to missing input values.

**Solutions:**

* Verify the span attribute path is correct
* Check that the attribute exists in your span schema
* Ensure spans contain data for the mapped field
* Use a different attribute or add the field to your instrumentation

### Backfill Taking Too Long

**Issue:** Historical evaluation is processing slowly.

**Solutions:**

* Use date-based backfill instead of all past data
* Start with recent data and expand the date range gradually
* Consider creating multiple rules for different time periods
* Deactivate unnecessary rules to free up processing capacity

### Unexpected Evaluator Results

**Issue:** Evaluator produces unexpected scores or classifications.

**Solutions:**

* Review the evaluator prompt template for clarity
* Verify input mappings are passing the correct data
* Test the evaluator with sample data outside Fiddler
* Check for prompt ambiguity or missing context
* Adjust the prompt and create a new rule version
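
To test a custom judge outside Fiddler, render the prompt with sample span data and send it to your provider directly. A sketch using the OpenAI Python client; the model name and sample values are placeholders.

```python
from openai import OpenAI

# Render the evaluator prompt with sample span data, then call the
# provider directly; model and sample values are placeholders.
template = "sad {{puppynoises}}"
sample = {"puppynoises": "a whimpering golden retriever"}
prompt = template.replace("{{puppynoises}}", sample["puppynoises"])

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```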

***

## Related Documentation

* [**LLM Gateway Configuration**](https://docs.fiddler.ai/reference/settings/llm-gateway) - Configure LLM provider credentials
* [**Fiddler Evals SDK**](https://app.gitbook.com/s/rsvU8AIQ2ZL9arerribd/fiddler-evals-sdk) - Create and manage evaluators programmatically
* [**Custom Evaluators**](https://docs.fiddler.ai/evaluate-and-test/overview) - Build custom evaluation logic
* [**Application Monitoring**](https://docs.fiddler.ai/getting-started/agentic-monitoring) - Monitor your GenAI applications

***

:question: Questions? [Talk](https://www.fiddler.ai/contact-sales) to a product expert or [request](https://www.fiddler.ai/demo) a demo.

:bulb: Need help? Contact us at <support@fiddler.ai>.
