CustomJudge evaluators for domain-specific governance.
Use this cookbook when: You have content generation agents (writing reports, customer communications, marketing copy) and need automated quality gates to replace manual review of every draft.
Time to complete: ~20 minutes
Prerequisites
- Fiddler account with API access
- LLM credential configured in Settings > LLM Gateway
pip install fiddler-evals pandas
The Content Generation Challenge
Enterprise content generation agents produce volume that exceeds human review capacity. Without automated quality gates, teams face:
- Reviewer fatigue: manually reviewing hundreds of drafts per day
- Inconsistent quality: different reviewers apply different standards
- Brand drift: subtle changes in tone or style go undetected
Recommended Evaluators
Built-In Evaluators (Baseline Quality)
| Evaluator | What It Measures | Value |
|---|---|---|
| Answer Relevance | Does the output address the input instruction? | Instruction adherence |
| Coherence | Logical flow and clarity | Narrative quality |
| Conciseness | Brevity without losing meaning | Message clarity |
| Sentiment | Positive, negative, or neutral tone | Brand alignment |
| Prompt Safety | 11 safety dimensions (toxicity, bias, etc.) | Risk mitigation |
Custom Evaluators (Domain-Specific Governance)
| Evaluator | What It Measures | Value |
|---|---|---|
| Brand Voice Match | Adherence to company style guide | Automated brand governance |
| Bias Detection | Potential bias across multiple dimensions | Compliance and risk mitigation |
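One way to turn evaluator results into an automated quality gate is to compare each score against a minimum threshold and hold back any draft that falls short. The sketch below is plain Python, not the Fiddler SDK. The score names, the 0 to 1 scale, and the threshold values are all assumptions for illustration; label-style evaluators such as Sentiment or Prompt Safety would need their own rules.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical minimum scores on an assumed 0-1 scale.
# Tune these against drafts your reviewers have already graded.
THRESHOLDS = {
    "answer_relevance": 0.7,
    "coherence": 0.7,
    "conciseness": 0.6,
    "brand_voice_match": 0.8,
}


@dataclass
class GateResult:
    passed: bool
    failures: list[str]


def quality_gate(scores: dict[str, float],
                 thresholds: dict[str, float] = THRESHOLDS) -> GateResult:
    """Pass a draft only if every required score meets its threshold."""
    failures = []
    for name, minimum in thresholds.items():
        score = scores.get(name)
        if score is None:
            # A missing score is treated as a failure, not a silent pass.
            failures.append(f"{name}: missing score")
        elif score < minimum:
            failures.append(f"{name}: {score:.2f} < {minimum:.2f}")
    return GateResult(passed=not failures, failures=failures)


draft_scores = {
    "answer_relevance": 0.92,
    "coherence": 0.88,
    "conciseness": 0.55,
    "brand_voice_match": 0.81,
}
result = quality_gate(draft_scores)
print(result.passed, result.failures)
```

Drafts that fail the gate can be routed to a human reviewer along with the failure list, so manual review is spent only on the drafts that need it.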
Set Up Built-In Evaluators
Replace URL, TOKEN, and credential names with your Fiddler account details. Find your credentials in Settings > Access Tokens and Settings > LLM Gateway.
Create a Brand Voice Match Judge
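Before wiring up a judge, it can help to draft the rubric as plain data. The sketch below does not call the Fiddler SDK. It uses only the Python standard library to show the two pieces a judge needs, a prompt template with placeholders and a set of named output fields, and how a JSON reply could be validated against those fields. The style rules, placeholder names, field names, and score range are hypothetical.

```python
import json
from string import Template

# Hypothetical style guide rules; replace with your own.
STYLE_GUIDE = "\n".join([
    "1. Use plain, direct language; avoid jargon.",
    "2. Address the reader as 'you'.",
    "3. No exclamation marks or hype words.",
])

# Prompt template: $placeholders are filled in for each draft.
PROMPT_TEMPLATE = Template(
    "You are a brand editor. Judge the draft against the style guide.\n"
    "Style guide:\n$style_guide\n\n"
    "Draft:\n$draft\n\n"
    "Reply with JSON containing: $field_names."
)

# Output fields: each name maps to a validator for the judge's reply.
OUTPUT_FIELDS = {
    "verdict": lambda v: v in ("on_brand", "off_brand"),
    "score": lambda v: (isinstance(v, (int, float))
                        and not isinstance(v, bool)
                        and 1 <= v <= 5),
    "reasoning": lambda v: isinstance(v, str) and v.strip() != "",
}


def build_prompt(draft: str) -> str:
    """Fill the template with the style guide, the draft, and the field list."""
    return PROMPT_TEMPLATE.substitute(
        style_guide=STYLE_GUIDE,
        draft=draft,
        field_names=", ".join(OUTPUT_FIELDS),
    )


def parse_verdict(raw_reply: str) -> dict:
    """Parse a JSON reply and keep only the declared, validated fields."""
    data = json.loads(raw_reply)
    for name, is_valid in OUTPUT_FIELDS.items():
        if name not in data or not is_valid(data[name]):
            raise ValueError(f"invalid or missing field: {name}")
    return {name: data[name] for name in OUTPUT_FIELDS}
```

Constraining the verdict to a fixed set of labels and requiring a reasoning string keeps judge output easy to aggregate and easy to audit when a score looks wrong.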
Use CustomJudge to evaluate content against your company’s style guide. See Building Custom Judge Evaluators for a deep dive into prompt_template, output_fields, and iterative prompt improvement.
Production Monitoring
To deploy these evaluators in production:
- Evaluator Rules: Configure built-in evaluators (Answer Relevance, Coherence, Conciseness) as Evaluator Rules in your Agentic Monitoring application. See Evaluator Rules.
- Custom Judges in Experiments: Run the Brand Voice Match judge as a recurring experiment against sampled production outputs to track brand compliance over time.
- Alerting: Set up alerts on evaluator score degradation to catch systemic quality drift after model updates or prompt changes.
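The sampling and alerting steps above can be prototyped offline before configuring them in the platform. The following is a minimal sketch in plain Python, not Fiddler's alerting feature. The function names, the 0.05 maximum drop, and the 20-sample minimum are assumptions chosen for illustration.

```python
import random
from statistics import mean


def sample_outputs(outputs, k, seed=0):
    """Draw a reproducible random sample of up to k production outputs."""
    rng = random.Random(seed)
    return rng.sample(outputs, min(k, len(outputs)))


def score_degraded(baseline, recent, max_drop=0.05, min_samples=20):
    """Return True when the recent mean score falls more than max_drop
    below the baseline mean. Windows smaller than min_samples never alert."""
    if len(baseline) < min_samples or len(recent) < min_samples:
        return False  # too little data to judge
    return mean(baseline) - mean(recent) > max_drop


# Hypothetical brand voice scores before and after a prompt change.
baseline_scores = [0.90] * 30
recent_scores = [0.80] * 30
if score_degraded(baseline_scores, recent_scores):
    print("Brand voice score dropped; review recent prompt or model changes.")
```

Comparing window means is a deliberately simple check. A fixed seed makes each sampled batch reproducible, which helps when a reviewer needs to inspect the exact drafts behind an alert.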
Next Steps
- Building Custom Judge Evaluators: Deep dive into CustomJudge capabilities
- Evaluator Rules: Deploy evaluators in production
- Evals SDK Integration: Integration patterns for agentic workflows
Related: Evaluator Rules — Configure evaluators for production monitoring