- Custom Prompts: Define your own evaluation prompt with
{{ placeholder }}syntax - Structured Outputs: Specify typed output fields (string, boolean, integer, number)
- Categorical Choices: Constrain string outputs to predefined categories
- Multi-Field Outputs: Return multiple scores/labels from a single evaluation
- Field Descriptions: Guide the LLM with descriptions for each output field
- Numeric Constraints: Set minimum/maximum bounds on numeric output fields
- Multi-Message Prompts: Use structured message lists with system/user/assistant roles
- Input Metadata: Define input field requirements and documentation
- Output Transforms: Map LLM response fields to final output fields with value mapping
- Intermediate Response Schema: Define a separate LLM response schema with transforms
- CustomJudgeSpec Object: Bundle prompt, inputs, and outputs into a reusable
CustomJudgeSpec
- Domain-Specific Evaluation: Create evaluators tailored to your industry or use case
- Custom Rubrics: Implement grading rubrics with specific criteria
- Multi-Aspect Scoring: Evaluate multiple dimensions (e.g., tone, accuracy, helpfulness)
- Classification Tasks: Categorize responses into predefined labels
- Compliance Checking: Verify responses meet specific guidelines or policies
- string: Free-form text output, or categorical if
choicesis specified - boolean: True/False classification
- integer: Whole number scores (e.g., 1-5 rating scale)
- number: Floating-point scores (e.g., 0.0-1.0 confidence)
Parameters
The evaluation prompt. Can be
either a plain string with
{{ placeholder }} markers (wrapped in a single
user message automatically) or a list of Message dicts for multi-message
prompts. Required unless prompt_spec is provided.Schema defining the expected
outputs. Required unless
prompt_spec is provided. Each field has:type: One of ‘string’, ‘boolean’, ‘integer’, ‘number’choices(optional): List of allowed values for categorical string fieldsdescription(optional): Instructions for the LLM about this fieldtitle(optional): Human-readable title for the fieldtransform(optional): Transform from LLM response field to output fielddefault(optional): Default value if field is missing from LLM responseminimum(optional): Minimum allowed value for numeric fieldsmaximum(optional): Maximum allowed value for numeric fields
A
CustomJudgeSpec object bundling
prompt_template, output_fields, inputs, and llm_response_fields into a
single reusable specification. Mutually exclusive with providing
prompt_template and output_fields directly.LLM Gateway model name in
{provider}/{model} format.
E.g., openai/gpt-4o, anthropic/claude-3-sonnetName of the LLM Gateway credential for the provider.
Metadata for template variables.
Keys must match
{{ placeholder }} names in the prompt template. Each value
can specify:title(optional): Human-readable titledescription(optional): Description of the inputrequired(optional): Whether this input must be provided (default: False)
Schema for the LLM
response before transformation. When provided, the LLM is instructed to
return fields matching this schema, and
output_fields with transform
specs define how to map the response to final outputs. Required when any
output field uses a transform.Returns
A list of Score objects, one for each output field defined.
Each Score contains:
- name: The output field name (e.g., “sentiment”, “confidence”)
- value: The numeric value (for number/integer/boolean fields)
- label: The string label (for string/categorical fields)
- reasoning: Always None for CustomJudge (reasoning is returned as a field)
Example
Basic sentiment analysis with categorical output:Example
Multi-criteria response quality evaluation:Example
Code review evaluator:Example
Using llm_response_fields with transforms for value mapping:Example
Using a reusable CustomJudgeSpec object:- Placeholder names in
{{ }}must exactly match keys in theinputsdict - The LLM is instructed to return JSON matching your output schema
- For best results, include clear descriptions for each output field
- Use
choicesfor categorical fields to ensure consistent outputs - Use
minimum/maximumfor numeric fields to constrain values - Use
CustomJudgeSpecto bundle prompt configuration into a reusable object - This evaluator requires an active connection to the Fiddler API
name = ‘custom_judge’
score()
Score using the Custom Judge.Parameters
Values for the {{ placeholders }} in your prompt_template.
Keys must match placeholder names exactly.
Returns
A list of Score objects, one for each output field defined.