For Agentic Monitoring and Experiments, use the CustomJudge class from the Fiddler Evals SDK instead of Prompt Specs. CustomJudge provides prompt_template (Jinja syntax) and output_fields for structured evaluation. See the Custom Judge Evaluators Cookbook for examples.
Prerequisites
- Understanding of Prompt Specs fundamentals
- Completion of the LLM Evaluation Quickstart
- Familiarity with LLM evaluation concepts
Download this tutorial directly from GitHub or run it in Google Colab
Inspect the Results
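One way to spot systematic errors is to tally the (expected, predicted) label pairs that disagree and then read the reasoning for the most frequent pair. The snippet below is a minimal sketch under stated assumptions: the rows are hypothetical illustrative data, and the key names `expected`, `topic`, and `reasoning` are placeholders for whatever columns your evaluation output actually contains.

```python
from collections import Counter

# Hypothetical evaluation rows (illustrative only, not real results).
# "expected" is the ground-truth label; "topic" and "reasoning" stand in
# for the outputs produced by the prompt spec.
results = [
    {"expected": "Sci/Tech", "topic": "World", "reasoning": "Mentions a government agency."},
    {"expected": "Sci/Tech", "topic": "Sci/Tech", "reasoning": "Describes a new chip."},
    {"expected": "World", "topic": "World", "reasoning": "Covers an election."},
    {"expected": "Sci/Tech", "topic": "World", "reasoning": "Focuses on an international treaty."},
]


def confusion_pairs(rows):
    """Count (expected, predicted) pairs for rows where the labels differ."""
    return Counter(
        (row["expected"], row["topic"])
        for row in rows
        if row["expected"] != row["topic"]
    )


def reasoning_for(rows, expected, predicted):
    """Collect the reasoning text for one specific kind of misclassification."""
    return [
        row["reasoning"]
        for row in rows
        if row["expected"] == expected and row["topic"] == predicted
    ]


pairs = confusion_pairs(results)
for (expected, predicted), count in pairs.most_common():
    print(f"{expected} -> {predicted}: {count}")
    for text in reasoning_for(results, expected, predicted):
        print(f"  reasoning: {text}")
```

Reading the reasoning grouped by error type makes it easier to see which wording in the prompt spec needs tightening.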
Note that several Sci/Tech articles were misclassified as World. The reasoning field helps identify trends in these errors. We’ll use this to update our prompt spec in the next section.
Improve the Accuracy with Descriptions
Descriptive field names can help improve model performance, and so can a task instruction and field descriptions. Here, we will add a description to topic to help with classifying Sci/Tech articles. Note the improved results.
Deploying Your Evaluation to Production
Once you see the results you expect with your test data, deploy the custom evaluation to production and monitor your production application:
Update the DataFrame Schema Names
Recall we used news_summary in our prompt. Let’s make our dataframe match this and add some metadata.
Add the Prediction as a Fiddler GenAI Enrichment
- name will be used as part of the generated column name; set it to something meaningful for your use case.
- enrichment must always be llm_as_a_judge.
- columns matches all the input columns your prompt spec uses.
- config must set the prompt spec.
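The four settings above can be sketched as a plain dictionary before handing them to the Fiddler client. This is an illustration under stated assumptions, not the client's confirmed API: the prompt spec layout (`input_fields`, `output_fields`, `choices`), the `prompt_spec` key inside `config`, and the list of topic choices are all assumed for the example, and `missing_columns` is a hypothetical helper. Consult the Enrichments documentation for the exact object to construct.

```python
# Assumed prompt spec layout (hypothetical keys; the choices are illustrative).
prompt_spec = {
    "input_fields": {"news_summary": {"type": "string"}},
    "output_fields": {
        "topic": {
            "type": "string",
            "choices": ["World", "Sports", "Business", "Sci/Tech"],
        },
        "reasoning": {"type": "string"},
    },
}

# The four settings described in the list above, as a plain dict.
enrichment_settings = {
    "name": "news_topic",                    # becomes part of the generated column names
    "enrichment": "llm_as_a_judge",          # must always be this value
    "columns": ["news_summary"],             # every input column the prompt spec uses
    "config": {"prompt_spec": prompt_spec},  # assumed key name for the prompt spec
}


def missing_columns(settings):
    """Return prompt spec input fields that are not listed in `columns`."""
    inputs = settings["config"]["prompt_spec"]["input_fields"]
    return sorted(set(inputs) - set(settings["columns"]))


print(missing_columns(enrichment_settings))
```

A check like `missing_columns` catches the common mistake of renaming a dataframe column without updating the enrichment's `columns` list.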
Publish Data to Simulate LLM Activity
Our prediction will add two columns: FDL news_topic (topic) and FDL news_topic (reasoning).
Note: The column names follow the pattern: FDL {enrichment name} ({prompt spec output column}), using values as specified.
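The naming pattern can be expressed as a small helper, which is handy when building charts or queries that reference the generated columns. The function name `enrichment_column_names` is hypothetical; only the `FDL {enrichment name} ({prompt spec output column})` pattern comes from the note above.

```python
def enrichment_column_names(enrichment_name, output_fields):
    """Build generated column names using the pattern
    'FDL {enrichment name} ({prompt spec output column})'."""
    return [f"FDL {enrichment_name} ({field})" for field in output_fields]


columns = enrichment_column_names("news_topic", ["topic", "reasoning"])
print(columns)
```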
Advanced Prompt Specs Configuration
Schema Design Patterns
Multi-Output Evaluation
Domain-Specific Classification
Performance Optimization Techniques
Field Description Best Practices
- Be Specific: Use concrete examples rather than abstract descriptions
- Avoid Ambiguity: Define edge cases and boundary conditions
- Include Context: Reference domain-specific knowledge when needed
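As a rough illustration of these practices, the sketch below adds a specific, edge-case-aware description to the topic field without mutating the original spec, so both versions can be evaluated side by side. The spec layout, the `description` key name, the choice list, and the description wording are assumptions made for the example; `with_description` is a hypothetical helper.

```python
import copy

# Assumed prompt spec layout (hypothetical keys; the choices are illustrative).
base_spec = {
    "input_fields": {"news_summary": {"type": "string"}},
    "output_fields": {
        "topic": {
            "type": "string",
            "choices": ["World", "Sports", "Business", "Sci/Tech"],
        },
        "reasoning": {"type": "string"},
    },
}


def with_description(spec, field, description):
    """Return a copy of `spec` with a description set on one output field."""
    updated = copy.deepcopy(spec)
    updated["output_fields"][field]["description"] = description
    return updated


# Specific, names an edge case, and includes domain context.
improved_spec = with_description(
    base_spec,
    "topic",
    "The article's primary subject. Use 'Sci/Tech' for science, computing, "
    "or technology products, even when the story involves governments or "
    "international organizations.",
)

print(improved_spec["output_fields"]["topic"]["description"])
```

Keeping the base spec unchanged makes it straightforward to rerun the same test data against both versions and compare accuracy.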
Bring-Your-Own-Prompt
For maximum customization, Fiddler supports custom prompt templates with multiple output format options.
Free-Form Output
Best for open-ended evaluations where structure is less important.
Guided Choice Output
For single categorical outputs with high accuracy requirements.
Guided JSON Output
For complex structured outputs with validation.
Additional Documentation
- LLM Observability Overview: Understanding Fiddler’s broader LLM monitoring capabilities
- Enrichments: Technical details on Fiddler’s evaluation infrastructure