CustomJudge

Create a fully customizable LLM-as-a-Judge evaluator with your own prompt and output schema.

The CustomJudge evaluator allows you to define arbitrary evaluation criteria by specifying a custom prompt template and structured output fields. This is the most flexible evaluator in the Fiddler Evals SDK, enabling you to build domain-specific evaluation logic without writing custom code.

Key Features:

  • Custom Prompts: Define your own evaluation prompt with {{ placeholder }} syntax

  • Structured Outputs: Specify typed output fields (string, boolean, integer, number)

  • Categorical Choices: Constrain string outputs to predefined categories

  • Multi-Field Outputs: Return multiple scores/labels from a single evaluation

  • Field Descriptions: Guide the LLM with descriptions for each output field

Use Cases:

  • Domain-Specific Evaluation: Create evaluators tailored to your industry or use case

  • Custom Rubrics: Implement grading rubrics with specific criteria

  • Multi-Aspect Scoring: Evaluate multiple dimensions (e.g., tone, accuracy, helpfulness)

  • Classification Tasks: Categorize responses into predefined labels

  • Compliance Checking: Verify responses meet specific guidelines or policies

Output Field Types:

  • string: Free-form text output, or categorical if choices is specified

  • boolean: True/False classification

  • integer: Whole number scores (e.g., 1-5 rating scale)

  • number: Floating-point scores (e.g., 0.0-1.0 confidence)
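For illustration, here is a minimal sketch of an output_fields schema that uses all four field types. It assumes OutputField accepts type, choices, and description keyword arguments as described above; the import path is a placeholder and may differ in your installed version of the Fiddler Evals SDK.

```python
# Sketch only: the import path and exact OutputField signature are assumptions
# based on the field descriptions above.
from fiddler_evals import OutputField

output_fields = {
    # Categorical string: constrained to the listed choices
    "category": OutputField(
        type="string",
        choices=["billing", "technical", "other"],
        description="Which team should handle this request",
    ),
    # Boolean: True/False classification
    "is_resolved": OutputField(
        type="boolean",
        description="True if the response fully resolves the issue",
    ),
    # Integer: whole-number rating scale
    "rating": OutputField(
        type="integer",
        description="Overall quality rating on a 1-5 scale",
    ),
    # Number: floating-point score
    "confidence": OutputField(
        type="number",
        description="Confidence in the rating, from 0.0 to 1.0",
    ),
}
```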

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt_template | str | None | The evaluation prompt with {{ placeholder }} markers for dynamic content. Placeholders are filled from the inputs dict passed to the score() method. |
| output_fields | Dict[str, OutputField] | None | Schema defining the expected outputs. Each field has a type (one of 'string', 'boolean', 'integer', 'number'), optional choices (a list of allowed values for categorical string fields), and an optional description (instructions for the LLM about the field). |
| model | str | None | LLM Gateway model name in {provider}/{model} format, e.g. openai/gpt-4o or anthropic/claude-3-sonnet. |
| credential | str, optional | None | Name of the LLM Gateway credential for the provider. |

Returns

A list of Score objects, one for each output field defined. Each Score contains:

  • name: The output field name (e.g., "sentiment", "confidence")

  • value: The numeric value (for number/integer/boolean fields)

  • label: The string label (for string/categorical fields)

  • reasoning: Always None for CustomJudge (to capture model reasoning, define it as one of your output fields instead)

Return type: list[Score]

Example

Basic sentiment analysis with categorical output:
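A minimal sketch of this example. The import path, constructor call, and credential name are assumptions based on the parameters documented above; running it requires an active Fiddler API connection and a configured LLM Gateway credential.

```python
# Sketch only: import path and credential name ("openai-key") are assumptions.
from fiddler_evals import CustomJudge, OutputField

sentiment_judge = CustomJudge(
    prompt_template=(
        "Classify the sentiment of the following customer message.\n\n"
        "Message: {{ message }}"
    ),
    output_fields={
        "sentiment": OutputField(
            type="string",
            choices=["positive", "neutral", "negative"],
            description="Overall sentiment of the message",
        ),
    },
    model="openai/gpt-4o",
    credential="openai-key",
)

scores = sentiment_judge.score(
    inputs={"message": "Support resolved my issue in minutes. Great service!"}
)
print(scores[0].name, scores[0].label)  # e.g. "sentiment positive"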

Example

Multi-criteria response quality evaluation:
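A sketch along the same lines, returning several scores from a single evaluation (same assumptions about the import path and constructor as above):

```python
# Sketch only: import path is an assumption.
from fiddler_evals import CustomJudge, OutputField

quality_judge = CustomJudge(
    prompt_template=(
        "You are grading an assistant's answer.\n\n"
        "Question: {{ question }}\n"
        "Answer: {{ answer }}\n\n"
        "Rate the answer on the criteria below."
    ),
    output_fields={
        "accuracy": OutputField(
            type="integer",
            description="Factual accuracy on a 1-5 scale",
        ),
        "helpfulness": OutputField(
            type="integer",
            description="How well the answer addresses the question, on a 1-5 scale",
        ),
        "professional_tone": OutputField(
            type="boolean",
            description="True if the tone is professional and courteous",
        ),
    },
    model="openai/gpt-4o",
)

scores = quality_judge.score(
    inputs={
        "question": "How do I reset my password?",
        "answer": "Open Settings > Security and click 'Reset password'.",
    }
)
for score in scores:
    print(score.name, score.value)  # one Score per output field
```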

Example

Code review evaluator:
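A sketch of a code review judge that mixes categorical, numeric, and free-form outputs (same assumptions about the import path and constructor as above):

```python
# Sketch only: import path is an assumption.
from fiddler_evals import CustomJudge, OutputField

code_review_judge = CustomJudge(
    prompt_template=(
        "Review the following code change for correctness and risk.\n\n"
        "Diff:\n{{ diff }}"
    ),
    output_fields={
        "verdict": OutputField(
            type="string",
            choices=["approve", "request_changes"],
            description="Overall review verdict",
        ),
        "risk": OutputField(
            type="number",
            description="Estimated risk of the change, from 0.0 (safe) to 1.0 (risky)",
        ),
        "summary": OutputField(
            type="string",
            description="One-sentence summary of the main concern, if any",
        ),
    },
    model="anthropic/claude-3-sonnet",
)

scores = code_review_judge.score(inputs={"diff": "<unified diff text>"})
```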

Note:
  • Placeholder names in {{ }} must exactly match keys in the inputs dict

  • The LLM is instructed to return JSON matching your output schema

  • For best results, include clear descriptions for each output field

  • Use choices for categorical fields to ensure consistent outputs

  • This evaluator requires an active connection to the Fiddler API

name = 'custom_judge'

score()

Score using the Custom Judge.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| inputs | Dict[str, Any] | None | Values for the {{ placeholders }} in your prompt_template. Keys must match placeholder names exactly. |

Returns

A list of Score objects, one for each output field defined.

Return type: list[Score]

Raises

ValueError: If inputs is empty.
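As a sketch of the error path, calling score() with an empty inputs dict raises ValueError (sentiment_judge here refers to the instance from the first example above):

```python
# Sketch only: reuses the sentiment_judge instance from the first example.
try:
    sentiment_judge.score(inputs={})
except ValueError as exc:
    print(f"Rejected empty inputs: {exc}")
```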
