CustomJudge
Create a fully customizable LLM-as-a-Judge evaluator with your own prompt and output schema.
The CustomJudge evaluator allows you to define arbitrary evaluation criteria by specifying a custom prompt template and structured output fields. This is the most flexible evaluator in the Fiddler Evals SDK, enabling you to build domain-specific evaluation logic without writing custom code.
Key Features:
Custom Prompts: Define your own evaluation prompt with {{ placeholder }} syntax
Structured Outputs: Specify typed output fields (string, boolean, integer, number)
Categorical Choices: Constrain string outputs to predefined categories
Multi-Field Outputs: Return multiple scores/labels from a single evaluation
Field Descriptions: Guide the LLM with descriptions for each output field
Use Cases:
Domain-Specific Evaluation: Create evaluators tailored to your industry or use case
Custom Rubrics: Implement grading rubrics with specific criteria
Multi-Aspect Scoring: Evaluate multiple dimensions (e.g., tone, accuracy, helpfulness)
Classification Tasks: Categorize responses into predefined labels
Compliance Checking: Verify responses meet specific guidelines or policies
Output Field Types:
string: Free-form text output, or categorical if choices is specified
boolean: True/False classification
integer: Whole number scores (e.g., 1-5 rating scale)
number: Floating-point scores (e.g., 0.0-1.0 confidence)
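As a sketch, an output_fields schema covering all four types might look like the following. The dict-of-dicts shape mirrors the OutputField description in the parameter table below; the field names themselves are illustrative, and the exact OutputField class in the SDK may differ.

```python
# Hypothetical output_fields schema illustrating each supported type.
# Field names are examples only; the shape follows this page's description.
output_fields = {
    "category": {                      # string + choices -> categorical
        "type": "string",
        "choices": ["positive", "neutral", "negative"],
        "description": "Overall sentiment of the response",
    },
    "is_compliant": {                  # boolean -> True/False classification
        "type": "boolean",
        "description": "Does the response follow the policy?",
    },
    "rating": {                        # integer -> whole-number score
        "type": "integer",
        "description": "Quality rating on a 1-5 scale",
    },
    "confidence": {                    # number -> floating-point score
        "type": "number",
        "description": "Confidence in the rating, 0.0-1.0",
    },
}
```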
Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt_template | str | ✗ | None | The evaluation prompt with {{ placeholder }} markers for dynamic content. Placeholders are filled from the inputs dict passed to the score() method. |
| output_fields | Dict[str, OutputField] | ✗ | None | Schema defining the expected outputs. Each field has: type (one of 'string', 'boolean', 'integer', 'number'); choices (optional): list of allowed values for categorical string fields; description (optional): instructions for the LLM about this field. |
| model | str | ✗ | None | LLM Gateway model name in {provider}/{model} format, e.g. openai/gpt-4o, anthropic/claude-3-sonnet. |
| credential | str, optional | ✗ | None | Name of the LLM Gateway credential for the provider. |
Returns
A list of Score objects, one for each output field defined. Each Score contains:
name: The output field name (e.g., "sentiment", "confidence")
value: The numeric value (for number/integer/boolean fields)
label: The string label (for string/categorical fields)
reasoning: Always None for CustomJudge (reasoning is returned as a field)
Return type: list[Score]
Example
Basic sentiment analysis with categorical output:
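A minimal sketch of this configuration, using the prompt_template and output_fields shapes described above. The placeholder-rendering helper is our own illustration of how {{ placeholder }} markers are filled from the inputs dict; the actual CustomJudge call is shown only in comments because the SDK's exact constructor signature is assumed here, not verified.

```python
import re

# Hypothetical sentiment-analysis judge configuration (shapes follow this page).
prompt_template = (
    "Classify the sentiment of the following response.\n\n"
    "Response: {{ response }}\n\n"
    "Answer with one of: positive, neutral, negative."
)

output_fields = {
    "sentiment": {
        "type": "string",
        "choices": ["positive", "neutral", "negative"],
        "description": "Overall sentiment of the response",
    },
}

def render(template: str, inputs: dict) -> str:
    """Fill {{ key }} markers with values from `inputs` (illustrative only)."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(inputs[m.group(1)]),
        template,
    )

prompt = render(prompt_template, {"response": "I love this product!"})

# The evaluator itself would be constructed and invoked roughly like:
#   judge = CustomJudge(prompt_template=prompt_template,
#                       output_fields=output_fields,
#                       model="openai/gpt-4o")
#   scores = judge.score(inputs={"response": "I love this product!"})
```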
Example
Multi-criteria response quality evaluation:
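A sketch of a multi-criteria rubric: one evaluation returns several scores, one per output field. The rubric field names and the schema-validation helper below are illustrative additions, not SDK code; they show how a structured LLM output maps onto the documented field types.

```python
# Hypothetical multi-criteria quality rubric (field names are examples).
output_fields = {
    "tone":        {"type": "integer", "description": "Tone, 1 (poor) to 5 (excellent)"},
    "accuracy":    {"type": "integer", "description": "Factual accuracy, 1-5"},
    "helpfulness": {"type": "integer", "description": "Helpfulness, 1-5"},
    "reasoning":   {"type": "string",  "description": "Brief justification for the scores"},
}

_PY_TYPES = {"string": str, "boolean": bool, "integer": int, "number": float}

def validate(output: dict, schema: dict) -> bool:
    """Check a structured LLM output against an output_fields schema."""
    for name, spec in schema.items():
        if name not in output:
            return False
        if not isinstance(output[name], _PY_TYPES[spec["type"]]):
            return False
        if spec.get("choices") and output[name] not in spec["choices"]:
            return False
    return True

mock_output = {"tone": 4, "accuracy": 5, "helpfulness": 4,
               "reasoning": "Clear and correct, slightly terse."}
assert validate(mock_output, output_fields)
```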
Example
Code review evaluator:
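A sketch of a code-review judge, plus how its structured output maps to the Score objects described in the Returns section (name/value/label, with numeric types going to value and string/categorical types to label). The Score dataclass and conversion helper here are our own illustration, not the SDK's classes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical code-review judge configuration.
prompt_template = (
    "Review the following code change.\n\n"
    "Code:\n{{ code }}\n\n"
    "Rate maintainability 1-5 and flag whether it introduces a bug."
)

output_fields = {
    "maintainability": {"type": "integer",
                        "description": "1 (poor) to 5 (excellent)"},
    "has_bug": {"type": "boolean",
                "description": "True if the change likely introduces a bug"},
    "verdict": {"type": "string",
                "choices": ["approve", "request_changes"],
                "description": "Overall review verdict"},
}

@dataclass
class Score:
    """Illustrative stand-in for the SDK's Score (name/value/label/reasoning)."""
    name: str
    value: Optional[float] = None
    label: Optional[str] = None
    reasoning: Optional[str] = None

def to_scores(output: dict, schema: dict) -> list:
    """Map a structured output onto Score records, one per output field."""
    scores = []
    for name, spec in schema.items():
        if spec["type"] == "string":
            scores.append(Score(name=name, label=output[name]))
        else:  # boolean/integer/number -> numeric value
            scores.append(Score(name=name, value=float(output[name])))
    return scores

scores = to_scores(
    {"maintainability": 4, "has_bug": False, "verdict": "approve"},
    output_fields,
)
```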