CustomJudge
Create a fully customizable LLM-as-a-Judge evaluator with your own prompt and output schema.
The CustomJudge evaluator allows you to define arbitrary evaluation criteria by specifying a custom prompt template and structured output fields. This is the most flexible evaluator in the Fiddler Evals SDK, enabling you to build domain-specific evaluation logic without writing custom code.
Key Features:
Custom Prompts: Define your own evaluation prompt with {{ placeholder }} syntax
Structured Outputs: Specify typed output fields (string, boolean, integer, number)
Categorical Choices: Constrain string outputs to predefined categories
Multi-Field Outputs: Return multiple scores/labels from a single evaluation
Field Descriptions: Guide the LLM with descriptions for each output field
Use Cases:
Domain-Specific Evaluation: Create evaluators tailored to your industry or use case
Custom Rubrics: Implement grading rubrics with specific criteria
Multi-Aspect Scoring: Evaluate multiple dimensions (e.g., tone, accuracy, helpfulness)
Classification Tasks: Categorize responses into predefined labels
Compliance Checking: Verify responses meet specific guidelines or policies
Output Field Types:
string: Free-form text output, or categorical if choices is specified
boolean: True/False classification
integer: Whole number scores (e.g., 1-5 rating scale)
number: Floating-point scores (e.g., 0.0-1.0 confidence)
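For illustration, an output_fields schema covering each of the four types might look like the sketch below. The import path and the OutputField constructor arguments (type, choices, description) are assumptions based on the parameter descriptions in this page; adjust them to your installation of the Fiddler Evals SDK.

```python
# Hypothetical output schema covering each supported field type.
# The import path below is an assumption; OutputField arguments follow
# the "Output Field Types" description above.
from fiddler_evals.evaluators import OutputField  # assumed import path

output_fields = {
    "sentiment": OutputField(
        type="string",
        choices=["positive", "neutral", "negative"],  # categorical string
        description="Overall sentiment of the response",
    ),
    "is_on_topic": OutputField(
        type="boolean",
        description="True if the response addresses the question",
    ),
    "rating": OutputField(
        type="integer",
        description="Quality rating from 1 (poor) to 5 (excellent)",
    ),
    "confidence": OutputField(
        type="number",
        description="Confidence in the rating, from 0.0 to 1.0",
    ),
}
```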
Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt_template | str | ✗ | None | The evaluation prompt with {{ placeholder }} markers for dynamic content. Placeholders are filled from the inputs dict passed to the score() method. |
| output_fields | Dict[str, OutputField] | ✗ | None | Schema defining the expected outputs. Each field has: type (one of 'string', 'boolean', 'integer', 'number'); choices (optional): list of allowed values for categorical string fields; description (optional): instructions for the LLM about this field. |
| model | str | ✗ | None | LLM Gateway model name in {provider}/{model} format, e.g. openai/gpt-4o, anthropic/claude-3-sonnet. |
| credential | str, optional | ✗ | None | Name of the LLM Gateway credential for the provider. |
Returns
A list of Score objects, one for each output field defined. Each Score contains:
name: The output field name (e.g., "sentiment", "confidence")
value: The numeric value (for number/integer/boolean fields)
label: The string label (for string/categorical fields)
reasoning: Always None for CustomJudge (reasoning is returned as a field)
Return type: list[Score]
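As a sketch of how the returned scores might be consumed, assuming judge is a configured CustomJudge instance (see the examples below): numeric fields populate value, string/categorical fields populate label.

```python
# Iterate the returned Score objects; `reasoning` is always None for CustomJudge.
scores = judge.score(inputs={"question": question, "response": response})
for s in scores:
    print(s.name, s.value, s.label)
```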
Example
Basic sentiment analysis with categorical output:
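A minimal sketch of such an evaluator is shown below. The import path, the OutputField constructor, and the credential name are assumptions; adapt them to your setup.

```python
# Sentiment classification with a single categorical string output.
# Import path and constructor details are assumptions, not the exact SDK API.
from fiddler_evals.evaluators import CustomJudge, OutputField  # assumed import path

judge = CustomJudge(
    prompt_template=(
        "Classify the sentiment of the following response.\n\n"
        "Response: {{ response }}"
    ),
    output_fields={
        "sentiment": OutputField(
            type="string",
            choices=["positive", "neutral", "negative"],
            description="Overall sentiment of the response",
        ),
    },
    model="openai/gpt-4o",        # {provider}/{model} format
    credential="my-openai-cred",  # hypothetical credential name
)

scores = judge.score(inputs={"response": "I love this product!"})
# e.g. [Score(name="sentiment", label="positive", value=None, reasoning=None)]
```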
Example
Multi-criteria response quality evaluation:
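A sketch of a multi-field evaluator scoring several dimensions at once, continuing with the assumed imports from the previous example:

```python
# Multi-aspect quality judge: several output fields from one evaluation.
# Constructor details are assumptions based on the parameter table above.
judge = CustomJudge(
    prompt_template=(
        "Evaluate the assistant's answer to the user's question.\n\n"
        "Question: {{ question }}\n"
        "Answer: {{ answer }}"
    ),
    output_fields={
        "accuracy": OutputField(
            type="integer",
            description="Factual accuracy from 1 (wrong) to 5 (fully correct)",
        ),
        "helpfulness": OutputField(
            type="integer",
            description="How helpful the answer is, from 1 to 5",
        ),
        "tone": OutputField(
            type="string",
            choices=["professional", "casual", "inappropriate"],
            description="Tone of the answer",
        ),
        "reasoning": OutputField(
            type="string",
            description="Brief justification for the scores",
        ),
    },
    model="openai/gpt-4o",
)

scores = judge.score(inputs={"question": "...", "answer": "..."})
```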
Example
Code review evaluator:
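A sketch of a domain-specific judge for reviewing code changes; the placeholder and field names here are illustrative assumptions:

```python
# Simple code review judge: boolean, categorical, and numeric outputs together.
judge = CustomJudge(
    prompt_template=(
        "Review the following code change for the stated requirement.\n\n"
        "Requirement: {{ requirement }}\n"
        "Diff:\n{{ diff }}"
    ),
    output_fields={
        "meets_requirement": OutputField(
            type="boolean",
            description="True if the change satisfies the requirement",
        ),
        "risk": OutputField(
            type="string",
            choices=["low", "medium", "high"],
            description="Risk level of merging this change",
        ),
        "quality": OutputField(
            type="number",
            description="Overall code quality from 0.0 to 1.0",
        ),
    },
    model="anthropic/claude-3-sonnet",
)
```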
Placeholder names in {{ }} must exactly match keys in the inputs dict
The LLM is instructed to return JSON matching your output schema
For best results, include clear descriptions for each output field
Use choices for categorical fields to ensure consistent outputs
This evaluator requires an active connection to the Fiddler API
name = 'custom_judge'
score()
Score using the Custom Judge.
Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| inputs | Dict[str, Any] | ✗ | None | Values for the {{ placeholders }} in your prompt_template. Keys must match placeholder names exactly. |
Returns
A list of Score objects, one for each output field defined.
Return type: list[Score]
Raises
ValueError -- If inputs is empty.
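For example, assuming a judge whose prompt template uses {{ question }} and {{ answer }} placeholders (as in the multi-criteria example above):

```python
# Keys in `inputs` must match the {{ placeholders }} exactly; an empty
# inputs dict raises ValueError, per the Raises section above.
scores = judge.score(
    inputs={
        "question": "What is the refund policy?",
        "answer": "Refunds are available within 30 days of purchase.",
    }
)
for s in scores:
    print(f"{s.name}: value={s.value} label={s.label}")
```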