Use CustomJudge to encode business rules, quality criteria, or classification tasks that built-in evaluators don’t cover.
Use this cookbook when: You need evaluation criteria specific to your domain, such as topic classification, brand voice matching, compliance checking, or custom quality rubrics.
Time to complete: ~20 minutes
Prerequisites
- Fiddler account with API access
- LLM credential configured in Settings > LLM Gateway
pip install fiddler-evals pandas
Connect to Fiddler
Replace URL, TOKEN, and credential names with your Fiddler account details. Find your credentials in Settings > Access Tokens and Settings > LLM Gateway.
Prepare Test Data
This example classifies news summaries into topics — Sci/Tech, Sports, Business, or World:
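A minimal stand-in for the test data, assuming a handful of made-up summaries. The rows and the column names `news_summary` and `expected_topic` are hypothetical and are not taken from the source notebook:

```python
# Hypothetical test rows: the texts and column names are illustrative assumptions.
TOPICS = ["Sci/Tech", "Sports", "Business", "World"]

test_rows = [
    {"news_summary": "A chipmaker unveiled a processor aimed at on-device AI.",
     "expected_topic": "Sci/Tech"},
    {"news_summary": "The home side won the cup final in extra time.",
     "expected_topic": "Sports"},
    {"news_summary": "The retailer reported quarterly profit above forecasts.",
     "expected_topic": "Business"},
    {"news_summary": "Leaders from several nations met to discuss a ceasefire.",
     "expected_topic": "World"},
]

# Sanity check: every expected label must be one of the four known topics.
unknown = [row for row in test_rows if row["expected_topic"] not in TOPICS]
print(f"{len(test_rows)} rows, {len(unknown)} with unknown labels")
```

With pandas installed, `pd.DataFrame(test_rows)` turns the same rows into a table.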
Create a CustomJudge
Define your evaluation criteria using a prompt_template with {{ placeholder }} markers and output_fields that define the structured response:
How It Works
- prompt_template: Your evaluation prompt with {{ placeholder }} markers (Jinja syntax). Placeholders are filled from the inputs dict passed to .score().
- output_fields: Schema defining the expected outputs. Each field specifies a type (string, boolean, integer, number) and optional choices or description.
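To make the placeholder step concrete, here is a rough standard-library sketch of how `{{ placeholder }}` markers could be filled from an inputs dict. It is not Fiddler's implementation and does not call the SDK. `fill_template` is a hypothetical helper, and real Jinja supports far more (filters, control flow) than this simple substitution:

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def fill_template(template: str, inputs: dict) -> str:
    """Replace each {{ name }} marker with inputs[name]; raise if a key is missing."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"no input supplied for placeholder '{name}'")
        return str(inputs[name])
    return PLACEHOLDER.sub(substitute, template)

# Illustrative template; the wording is an assumption, not the notebook's prompt.
prompt_template = (
    "Classify the news summary into one topic: Sci/Tech, Sports, Business, or World.\n"
    "Summary: {{ news_summary }}"
)

filled = fill_template(
    prompt_template,
    {"news_summary": "The central bank held rates steady."},
)
print(filled)
```

Raising on a missing key is a deliberate choice in this sketch: a prompt sent with an unfilled marker tends to produce misleading scores, so failing early is safer.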
Output Field Types
CustomJudge supports four output field types:
| Type | Description | Example Use |
|---|---|---|
| string | Free-form text or categorical (with choices) | Topic classification, reasoning |
| boolean | True/False | Compliance checks, binary quality gates |
| integer | Whole numbers | 1-5 rating scales |
| number | Floating-point | 0.0-1.0 confidence scores |
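One way to reason about these four types is to check a judge's response against the schema yourself. The sketch below is plain Python and does not call the Fiddler SDK. The helpers `matches_type` and `validate_output` are hypothetical, and the mapping from type names to Python types, along with the field-spec dict shape, is an assumption:

```python
def matches_type(value, type_name: str) -> bool:
    """Check a value against one of the four documented type names."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"unsupported type: {type_name}")

def validate_output(output: dict, output_fields: dict) -> list:
    """Return a list of problems; an empty list means the output conforms."""
    problems = []
    for name, spec in output_fields.items():
        if name not in output:
            problems.append(f"{name}: missing")
            continue
        value = output[name]
        if not matches_type(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
        elif "choices" in spec and value not in spec["choices"]:
            problems.append(f"{name}: {value!r} is not one of {spec['choices']}")
    return problems

# Assumed schema shape: field name -> {"type": ..., optional "choices": [...]}.
output_fields = {
    "topic": {"type": "string", "choices": ["Sci/Tech", "Sports", "Business", "World"]},
    "is_compliant": {"type": "boolean"},
    "rating": {"type": "integer"},
    "confidence": {"type": "number"},
}

good = {"topic": "Sports", "is_compliant": True, "rating": 4, "confidence": 0.9}
bad = {"topic": "Weather", "is_compliant": "yes", "rating": 4.5}

print(validate_output(good, output_fields))
for problem in validate_output(bad, output_fields):
    print(problem)
```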
Using choices for Categorical Output
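A string field with choices restricts the judge to a fixed label set. The field spec below is a sketch: the dict shape (field name mapped to a spec with `type` and `choices` keys) is assumed from the description of output_fields above, so check it against the SDK reference before relying on it:

```python
# Assumed shape: field name -> spec dict with "type" and an optional "choices" list.
output_fields = {
    "topic": {
        "type": "string",
        "choices": ["Sci/Tech", "Sports", "Business", "World"],
    },
}

# Listing the same labels in the prompt keeps the template and schema in sync.
allowed = ", ".join(output_fields["topic"]["choices"])
print(f"Allowed topics: {allowed}")
```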
Using description to Guide the LLM
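A description tells the LLM what each field should contain. The spec below is illustrative only: the field names `reasoning` and `confidence`, the description wording, and the dict shape are assumptions and are not taken from the source notebook:

```python
# Assumed shape: field name -> spec dict with "type" and an optional "description".
output_fields = {
    "reasoning": {
        "type": "string",
        "description": "One or two sentences citing the phrases that determined the topic.",
    },
    "confidence": {
        "type": "number",
        "description": "Confidence in the chosen topic, from 0.0 (guess) to 1.0 (certain).",
    },
}

# Print each field with its guidance to review the schema at a glance.
for name, spec in output_fields.items():
    print(f"{name} ({spec['type']}): {spec['description']}")
```

Descriptions that state the expected range or length tend to be more useful than ones that only restate the field name.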
Real-World Examples
Brand Voice Match
Evaluate whether generated content adheres to brand guidelines:
Compliance Checking
Verify responses meet regulatory requirements:
Next Steps
- Running RAG Experiments at Scale — Use CustomJudge evaluators in structured experiments
- Monitoring Agentic Content Generation — Combine built-in evaluators with custom Brand Voice judges
- Evaluator Rules — Deploy custom evaluators in production monitoring
Source notebook: Fiddler Cookbook: Custom Judge Evaluators