Reusable prompt specification for CustomJudge evaluators.
Provides a structured, validated way to define evaluation prompts with
input/output schemas, transforms, and multi-message templates. A
CustomJudgeSpec can be defined once and reused across multiple evaluator
instances or shared across a codebase.
Parameters
prompt_template
str | list[Message]
required
The evaluation prompt. Can be a plain string (wrapped
in a single user message) or a list of Message dicts.
output_fields
Dict[str, OutputField]
required
Schema defining the expected output fields.
inputs
Dict[str, InputFieldSpec] | None
default:"None"
Optional metadata for template variables.
llm_response_fields
Dict[str, OutputField] | None
default:"None"
Optional schema for the LLM response before
transformation. Required when output fields use transform.
Example
Defining a reusable faithfulness evaluator spec:
from fiddler_evals.evaluators.custom_judge import (
CustomJudge, CustomJudgeSpec, Message, InputFieldSpec,
OutputFieldTransform,
)
FAITHFULNESS_SPEC = CustomJudgeSpec(
prompt_template=[
Message(role='system', content='Judge faithfulness.'),
Message(role='user', content=(
'Question: {{ query }}\n'
'Documents: {{ docs }}\n'
'Answer: {{ answer }}'
)),
],
inputs={
'query': InputFieldSpec(required=True),
'docs': InputFieldSpec(required=True),
'answer': InputFieldSpec(required=True),
},
llm_response_fields={
'is_faithful': {
'type': 'string',
'choices': ['faithful', 'not_faithful'],
},
},
output_fields={
'label': {
'type': 'string',
'choices': ['yes', 'no'],
'transform': OutputFieldTransform(
source_field='is_faithful',
value_map={
'faithful': 'yes',
'not_faithful': 'no',
},
),
},
},
)
evaluator = CustomJudge(
prompt_spec=FAITHFULNESS_SPEC,
model='openai/gpt-4o',
)
prompt_template
output_fields
llm_response_fields
model_config
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].