Monitoring Agentic Content Generation

Ensure quality, safety, and brand compliance in content generation agents by combining Fiddler's built-in evaluators for baseline quality with CustomJudge evaluators for domain-specific governance.

Use this cookbook when: You have content generation agents (writing reports, customer communications, marketing copy) and need automated quality gates to replace manual review of every draft.

Time to complete: ~20 minutes

Prerequisites

Fiddler account with API access
LLM credential configured in Settings > LLM Gateway
pip install fiddler-evals pandas

The Content Generation Challenge

Enterprise content generation agents produce volume that exceeds human review capacity. Without automated quality gates, teams face:

Reviewer fatigue — manually reviewing hundreds of drafts per day
Inconsistent quality — different reviewers apply different standards
Brand drift — subtle changes in tone or style go undetected

The solution: combine Fiddler's built-in evaluators (quality, safety) with custom LLM-as-a-Judge evaluators (brand voice, compliance) for automated governance.

Recommended Evaluators

Built-In Evaluators (Baseline Quality)

| Evaluator | What It Measures | Value |
|---|---|---|
| Answer Relevance | Does the output address the input instruction? | Instruction adherence |
| Coherence | Logical flow and clarity | Narrative quality |
| Conciseness | Brevity without losing meaning | Message clarity |
| Sentiment | Positive, negative, or neutral tone | Brand alignment |
| Prompt Safety | 11 safety dimensions (toxicity, bias, etc.) | Risk mitigation |

Custom Evaluators (Domain-Specific Governance)

| Evaluator | What It Measures | Value |
|---|---|---|
| Brand Voice Match | Adherence to company style guide | Automated brand governance |
| Bias Detection | Potential bias across multiple dimensions | Compliance and risk mitigation |

Set Up Built-In Evaluators

Replace URL, TOKEN, and credential names with your Fiddler account details. Find your credentials in Settings > Access Tokens and Settings > LLM Gateway.

import pandas as pd
from fiddler_evals import init
from fiddler_evals.evaluators import (
    AnswerRelevance,
    Coherence,
    Conciseness,
    CustomJudge,
)

URL = 'https://your-org.fiddler.ai'
TOKEN = 'your-access-token'
LLM_CREDENTIAL_NAME = 'your-llm-credential'
LLM_MODEL_NAME = 'openai/gpt-4o'

init(url=URL, token=TOKEN)

# Built-in evaluators for baseline quality
relevance = AnswerRelevance(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME)
coherence = Coherence(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME)
conciseness = Conciseness(model=LLM_MODEL_NAME, credential=LLM_CREDENTIAL_NAME)
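
Hard-coding the token is convenient for a first run, but for anything shared you may prefer to read the settings from the environment. The following is a minimal sketch under that assumption; the environment variable names (`FIDDLER_URL`, `FIDDLER_TOKEN`, `FIDDLER_LLM_CREDENTIAL`, `FIDDLER_LLM_MODEL`) and the `FiddlerSettings` / `load_settings` helpers are hypothetical and not part of the Fiddler SDK.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FiddlerSettings:
    url: str
    token: str
    llm_credential_name: str
    llm_model_name: str


def load_settings(env=None):
    """Read connection settings from environment variables.

    The variable names are hypothetical; use whatever your
    deployment tooling already provides.
    """
    env = os.environ if env is None else env
    required = ("FIDDLER_URL", "FIDDLER_TOKEN", "FIDDLER_LLM_CREDENTIAL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return FiddlerSettings(
        url=env["FIDDLER_URL"],
        token=env["FIDDLER_TOKEN"],
        llm_credential_name=env["FIDDLER_LLM_CREDENTIAL"],
        # Falls back to the model used throughout this cookbook.
        llm_model_name=env.get("FIDDLER_LLM_MODEL", "openai/gpt-4o"),
    )
```

With this in place, the setup above would become `settings = load_settings()` followed by `init(url=settings.url, token=settings.token)`, and the evaluators would take `settings.llm_model_name` and `settings.llm_credential_name`.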

Create a Brand Voice Match Judge

Use CustomJudge to evaluate content against your company's style guide:

brand_voice_judge = CustomJudge(
    prompt_template="""
        Determine whether the provided content adheres to the provided
        brand guidelines.

        Content: {{ content }}
        Brand Guidelines: {{ brand_guidelines }}
    """,
    output_fields={
        'voice_match_score': {
            'type': 'string',
            'choices': ['Perfect Match', 'Minor Deviations', 'Off-Brand'],
        },
        'reasoning': {'type': 'string'},
    },
    model=LLM_MODEL_NAME,
    credential=LLM_CREDENTIAL_NAME,
)

See Building Custom Judge Evaluators for a deep-dive into prompt_template, output_fields, and iterative prompt improvement.
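
The Bias Detection evaluator listed under Custom Evaluators is not built in this cookbook. One way it might look, following the same CustomJudge pattern as the brand voice judge, is sketched below. The prompt wording, the `bias_level` field and its choices, the `dimensions` template variable, and the `build_bias_judge` helper are all assumptions for illustration; only the `CustomJudge` arguments (`prompt_template`, `output_fields`, `model`, `credential`) come from the example above.

```python
BIAS_PROMPT_TEMPLATE = """
    Assess whether the provided content shows bias along any of the
    listed dimensions.

    Content: {{ content }}
    Dimensions: {{ dimensions }}
"""

# Hypothetical output schema, mirroring the brand voice judge.
BIAS_OUTPUT_FIELDS = {
    'bias_level': {
        'type': 'string',
        'choices': ['None', 'Possible', 'Clear'],
    },
    'reasoning': {'type': 'string'},
}

# Hypothetical list; replace with the dimensions your compliance team tracks.
BIAS_DIMENSIONS = 'gender, age, ethnicity, disability, socioeconomic status'


def build_bias_judge(model, credential):
    """Construct the judge. The SDK is imported lazily so the
    configuration above can be inspected without it installed."""
    from fiddler_evals.evaluators import CustomJudge

    return CustomJudge(
        prompt_template=BIAS_PROMPT_TEMPLATE,
        output_fields=BIAS_OUTPUT_FIELDS,
        model=model,
        credential=credential,
    )
```

Usage would mirror the brand voice judge: `bias_judge = build_bias_judge(LLM_MODEL_NAME, LLM_CREDENTIAL_NAME)`, then `bias_judge.score(inputs={'content': text, 'dimensions': BIAS_DIMENSIONS})`.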

Evaluate Generated Content

# Example: your brand guidelines
brand_guidelines = (
    "Use professional, approachable tone. "
    "Address customers as 'you'. "
    "Avoid jargon, slang, and exclamation marks. "
    "Keep sentences under 25 words."
)

# Sample content from your agent
generated_content = [
    {
        'instruction': 'Write a welcome email for new customers',
        'content': 'Welcome to our platform. We are glad you chose us. '
            'Your account is ready and you can start exploring features '
            'right away.',
    },
    {
        'instruction': 'Write a welcome email for new customers',
        'content': 'OMG WELCOME!!! You are going to LOVE this!! '
            'Our platform is literally the BEST thing ever!!!',
    },
    {
        'instruction': 'Explain the refund process',
        'content': 'To request a refund, navigate to your order history, '
            'select the item, and click Request Refund. Processing takes '
            '3-5 business days.',
    },
]

# Evaluate each piece of content
for item in generated_content:
    # Built-in evaluators
    rel_score = relevance.score(
        user_query=item['instruction'],
        rag_response=item['content'],
    )
    coh_score = coherence.score(prompt=item['instruction'], response=item['content'])
    con_score = conciseness.score(response=item['content'])

    # Custom brand voice judge
    brand_scores = brand_voice_judge.score(inputs={
        'content': item['content'],
        'brand_guidelines': brand_guidelines,
    })
    brand_dict = {s.name: s for s in brand_scores}

    print(f"\nInstruction: {item['instruction'][:50]}...")
    print(f"  Relevance:  {rel_score.label} ({rel_score.value})")
    print(f"  Coherence:  {coh_score.label}")
    print(f"  Conciseness: {con_score.label}")
    print(f"  Brand Voice: {brand_dict['voice_match_score'].label}")
    print(f"    Reason: {brand_dict['reasoning'].label}")

Expected output (exact labels and reasoning may vary with the judge model):

Instruction: Write a welcome email for new customers...
  Relevance:  high (1.0)
  Coherence:  high
  Conciseness: high
  Brand Voice: Perfect Match
    Reason: Professional tone, addresses customer directly, no jargon or
    exclamation marks, sentences are concise.

Instruction: Write a welcome email for new customers...
  Relevance:  medium (0.5)
  Coherence:  low
  Conciseness: low
  Brand Voice: Off-Brand
    Reason: Uses all-caps, multiple exclamation marks, slang ("OMG",
    "literally"), and informal tone — violates all brand guidelines.

Instruction: Explain the refund process...
  Relevance:  high (1.0)
  Coherence:  high
  Conciseness: high
  Brand Voice: Perfect Match
    Reason: Clear, professional instructions with appropriate tone and
    sentence length.
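
Two of the sample brand guidelines (no exclamation marks, sentences under 25 words) are mechanical enough to check without an LLM call. The sketch below shows one possible pre-check that could run before the brand voice judge; the `mechanical_brand_issues` helper and its sentence-splitting heuristic are illustrative assumptions, not part of the Fiddler SDK, and tone or jargon checks still need the judge.

```python
import re

# From the sample guideline "Keep sentences under 25 words".
MAX_WORDS_PER_SENTENCE = 25


def mechanical_brand_issues(content):
    """Return rule violations that can be detected without an LLM call."""
    issues = []
    if '!' in content:
        issues.append('Contains exclamation marks')

    # Naive split on whitespace that follows ., ! or ?; adequate for a pre-check,
    # but abbreviations such as "e.g." will be treated as sentence ends.
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', content.strip()) if s]
    longest = max((len(s.split()) for s in sentences), default=0)
    if longest >= MAX_WORDS_PER_SENTENCE:
        issues.append(
            f'Sentence with {longest} words (limit is under {MAX_WORDS_PER_SENTENCE})'
        )
    return issues
```

Drafts that fail these checks could be rejected immediately, which saves judge calls on content that is clearly off-brand.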

Build a Quality Gate

Combine evaluator scores into an automated quality gate that flags content for human review:

def quality_gate(instruction, content, brand_guidelines):
    """Automated quality gate for content generation agents.

    Returns a (status, issues) tuple, where status is 'APPROVED',
    'REVIEW', or 'REJECTED' and issues lists the reasons.
    """
    blocking = []  # issues that reject the draft outright
    advisory = []  # issues that only need a human look

    # Check relevance
    rel = relevance.score(user_query=instruction, rag_response=content)
    if rel.value < 0.5:
        blocking.append(f'Low relevance ({rel.label})')

    # Check coherence
    coh = coherence.score(prompt=instruction, response=content)
    if coh.value < 0.5:
        blocking.append(f'Low coherence ({coh.label})')

    # Check brand voice
    brand = brand_voice_judge.score(inputs={
        'content': content,
        'brand_guidelines': brand_guidelines,
    })
    brand_dict = {s.name: s for s in brand}
    voice = brand_dict['voice_match_score'].label
    if voice == 'Off-Brand':
        blocking.append('Off-brand content')
    elif voice == 'Minor Deviations':
        advisory.append('Minor brand deviations')

    # Decide on severity directly rather than by matching issue text
    if blocking:
        return 'REJECTED', blocking + advisory
    if advisory:
        return 'REVIEW', advisory
    return 'APPROVED', []


# Run the quality gate
for item in generated_content:
    status, issues = quality_gate(
        item['instruction'], item['content'], brand_guidelines,
    )
    print(f"{status}: {item['content'][:60]}...")
    if issues:
        print(f"  Issues: {', '.join(issues)}")

Expected output (results may vary with the judge model):

APPROVED: Welcome to our platform. We are glad you chose us. Your ...
REJECTED: OMG WELCOME!!! You are going to LOVE this!! Our platform...
  Issues: Low coherence (low), Off-brand content
APPROVED: To request a refund, navigate to your order history, sel...
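
The decision step of the gate can be separated from the scoring calls, which makes the thresholds testable without LLM access. The sketch below assumes the scores have already been computed; the `decide` function and the constant names are hypothetical, and the 0.5 thresholds are simply the ones used in `quality_gate` above.

```python
RELEVANCE_MIN = 0.5  # same threshold as quality_gate above
COHERENCE_MIN = 0.5


def decide(rel_value, coh_value, voice_label):
    """Map already-computed scores to a gate decision.

    Hard failures (low scores, off-brand) reject the draft; soft
    issues (minor brand deviations) route it to human review.
    """
    hard, soft = [], []
    if rel_value < RELEVANCE_MIN:
        hard.append('Low relevance')
    if coh_value < COHERENCE_MIN:
        hard.append('Low coherence')
    if voice_label == 'Off-Brand':
        hard.append('Off-brand content')
    elif voice_label == 'Minor Deviations':
        soft.append('Minor brand deviations')

    if hard:
        return 'REJECTED', hard + soft
    if soft:
        return 'REVIEW', soft
    return 'APPROVED', []
```

Keeping severity in two separate lists means the outcome does not depend on the wording of the issue messages, so messages can be reworded or localized without changing gate behavior.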

Production Monitoring

To deploy these evaluators in production:

Evaluator Rules: Configure built-in evaluators (Answer Relevance, Coherence, Conciseness) as Evaluator Rules in your Agentic Monitoring application. See Evaluator Rules.
Custom Judges in Experiments: Run the Brand Voice Match judge as a recurring experiment against sampled production outputs to track brand compliance over time.
Alerting: Set up alerts on evaluator score degradation to catch systemic quality drift after model updates or prompt changes.
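
Alerts themselves are configured in Fiddler, but the idea behind a score-degradation alert can be illustrated in a few lines: compare a recent window of evaluator scores against a baseline window and flag a drop beyond a tolerance. This is a standalone sketch, not Fiddler's alerting API; the `score_degraded` function and the `max_drop=0.1` default are assumptions to be tuned for your own data.

```python
from statistics import mean


def score_degraded(baseline, recent, max_drop=0.1):
    """Return True when the recent mean score falls more than
    max_drop below the baseline mean."""
    if not baseline or not recent:
        raise ValueError('Both windows need at least one score')
    return mean(baseline) - mean(recent) > max_drop
```

For example, a baseline of relevance scores collected before a prompt change can be compared with scores sampled after it, and a `True` result would prompt a closer look at recent drafts.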

Next Steps

Building Custom Judge Evaluators — Deep-dive into CustomJudge capabilities
Evaluator Rules — Deploy evaluators in production
Evals SDK Integration — Integration patterns for agentic workflows
