Mapping Priority (highest to lowest):
- Evaluator-level score_fn_kwargs_mapping (set in constructor)
- Evaluation-level kwargs_mapping (passed to evaluate function)
- Default parameter resolution
This allows evaluators to define sensible defaults while still permitting customization at the evaluation level.
Creating Custom Evaluators:
To create a custom evaluator, inherit from this class and implement the score method with parameters specific to your evaluation needs:
Example - Custom evaluator with parameter mapping:

    from typing import Optional

    # Evaluator and Score are provided by this framework.
    class ExactMatchEvaluator(Evaluator):
        def __init__(self, output_key: str = "answer", score_name_prefix: Optional[str] = None):
            super().__init__(
                score_name_prefix=score_name_prefix,
                # Feed the data field named by output_key into score()'s "output" parameter.
                score_fn_kwargs_mapping={"output": output_key},
            )

        def score(self, output: str, expected_output: str) -> Score:
            # Compare while ignoring case and surrounding whitespace.
            is_match = output.strip().lower() == expected_output.strip().lower()
            return Score(
                name=f"{self.score_name_prefix}exact_match",
                value=1.0 if is_match else 0.0,
                reasoning=f"Match: {is_match}",
            )
Parameters
- score_name_prefix – Optional prefix to prepend to score names. Useful for
distinguishing scores when using multiple instances of the same evaluator
on different fields or with different configurations.
- score_fn_kwargs_mapping – Optional mapping for parameter transformation.
Maps score method parameter names to either string keys or transformation functions.
This mapping takes precedence over evaluation-level mappings when running
the evaluate method.
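The precedence rule and the two kinds of mapping values can be sketched as follows. This is an illustration only, not the framework's implementation: `resolve_kwargs` and its arguments are hypothetical names, the transformation function is assumed to receive the whole data record, and "default parameter resolution" is assumed to mean looking up a field with the same name as the parameter.

```python
from typing import Any, Callable, Dict, List, Optional, Union

# Hypothetical alias: a mapping value is either a key to look up in the record
# or a function that derives the argument from the record (an assumption).
MappingValue = Union[str, Callable[[Dict[str, Any]], Any]]


def resolve_kwargs(
    record: Dict[str, Any],
    param_names: List[str],
    evaluator_mapping: Optional[Dict[str, MappingValue]] = None,
    evaluation_mapping: Optional[Dict[str, MappingValue]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a score method from one data record."""
    # Evaluator-level entries are merged last, so they override evaluation-level ones.
    merged = {**(evaluation_mapping or {}), **(evaluator_mapping or {})}
    kwargs: Dict[str, Any] = {}
    for name in param_names:
        # Assumed default: fall back to the field named like the parameter.
        source = merged.get(name, name)
        kwargs[name] = source(record) if callable(source) else record[source]
    return kwargs


record = {"answer": " Paris ", "raw": "PARIS", "expected_output": "paris"}
kwargs = resolve_kwargs(
    record,
    ["output", "expected_output"],
    evaluator_mapping={"output": "answer"},
    evaluation_mapping={
        "output": "raw",  # overridden by the evaluator-level entry
        "expected_output": lambda r: r["expected_output"].upper(),
    },
)
print(kwargs)
```

Here `output` is taken from the `answer` field because the evaluator-level entry wins, while `expected_output` comes from the evaluation-level transformation function, so the script prints `{'output': ' Paris ', 'expected_output': 'PARIS'}`.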
The score method signature is intentionally flexible using *args and **kwargs
to allow each evaluator to define its own parameter requirements. This design
enables maximum flexibility while maintaining a consistent interface across
all evaluators in the framework.
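Because each evaluator declares its own score parameters, mapping keys can be checked against that signature by introspection. The sketch below shows one way such a check might work using the standard `inspect` module. `check_mapping`, `InvalidMappingError`, and `ExactMatch` are hypothetical stand-ins, and the framework's own validation may differ.

```python
import inspect


class InvalidMappingError(ValueError):
    """Hypothetical stand-in for the framework's ScoreFunctionInvalidArgs."""


def check_mapping(score_fn, mapping):
    """Raise if a mapping key is not a named parameter of score_fn."""
    named_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {
        name
        for name, param in inspect.signature(score_fn).parameters.items()
        if name != "self" and param.kind in named_kinds
    }
    unknown = sorted(set(mapping) - accepted)
    if unknown:
        raise InvalidMappingError(f"Unknown score parameters: {unknown}")


class ExactMatch:
    # Returns a float instead of a Score to keep the sketch self-contained.
    def score(self, output: str, expected_output: str) -> float:
        return float(output.strip().lower() == expected_output.strip().lower())


check_mapping(ExactMatch.score, {"output": "answer"})  # valid key, no error

try:
    check_mapping(ExactMatch.score, {"outptu": "answer"})  # misspelled key
except InvalidMappingError as exc:
    message = str(exc)
    print(message)
```

Checking at construction time, rather than at the first call to score, surfaces a misspelled mapping key before any data is evaluated.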
Raises
ScoreFunctionInvalidArgs – If the mapping contains invalid parameter names that don’t match the evaluator’s score method signature.
property name
abstractmethod score()
Evaluate inputs and return a score or list of scores.
This method must be implemented by all concrete evaluator classes. Each evaluator can define its own parameter signature based on what it needs for evaluation.
Common parameter patterns:
- Output-only: score(self, output: str) -> Score
- Input-Output: score(self, input: str, output: str) -> Score
- Comparison: score(self, output: str, expected_output: str) -> Score
- All parameters: score(self, input: str, output: str, context: list[str]) -> Score
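The patterns above can be illustrated with a minimal abstract base class whose subclasses each choose their own score signature. `BaseEvaluator`, `NonEmpty`, and `EchoCheck` are hypothetical names used only for this sketch, and the methods return plain floats rather than Score objects so the example runs without the framework.

```python
from abc import ABC, abstractmethod


class BaseEvaluator(ABC):
    """Hypothetical stand-in for the framework's Evaluator base class."""

    @abstractmethod
    def score(self, *args, **kwargs):
        """Each concrete evaluator narrows this to the parameters it needs."""


class NonEmpty(BaseEvaluator):
    # Output-only pattern.
    def score(self, output: str) -> float:
        return 1.0 if output.strip() else 0.0


class EchoCheck(BaseEvaluator):
    # Input-Output pattern: penalize answers that merely repeat the input.
    def score(self, input: str, output: str) -> float:
        return 0.0 if output.strip().lower() == input.strip().lower() else 1.0


non_empty = NonEmpty().score(output="Paris")
echo = EchoCheck().score(input="Capital of France?", output="Paris")

try:
    BaseEvaluator()  # abstract classes cannot be instantiated
except TypeError as exc:
    abstract_error = str(exc)
```

Python rejects instantiation of a class that leaves an abstract method unimplemented, which is how the requirement that every concrete evaluator implement score is enforced in this sketch.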
Parameters
*args – Positional arguments specific to the evaluator’s needs.
Returns
A single Score object or list of Score objects
representing the evaluation results. Each Score should include:
- name: The score name (e.g., “has_zipcode”)
- evaluator_name: The evaluator name (e.g., “RegexMatch”)
- value: The score value (typically 0.0 to 1.0)
- status: SUCCESS, FAILED, or SKIPPED
- reasoning: Optional explanation of the score
- error: Optional error information if evaluation failed
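To make the field list concrete, here is a sketch of a score carrying those fields, built by a regex check for a ZIP code. The `Score` dataclass and `Status` enum below are hypothetical stand-ins that mirror the fields listed above; the framework's real classes may use different types, defaults, or constructors.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"


@dataclass
class Score:
    # Hypothetical stand-in, not the framework's Score class.
    name: str
    evaluator_name: str
    value: float
    status: Status = Status.SUCCESS
    reasoning: Optional[str] = None
    error: Optional[str] = None


# Five digits, optionally followed by a four-digit extension (US ZIP format).
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def has_zipcode(output: str) -> Score:
    """Output-only check: does the text contain a US ZIP code?"""
    found = ZIP_RE.search(output)
    return Score(
        name="has_zipcode",
        evaluator_name="RegexMatch",
        value=1.0 if found else 0.0,
        reasoning=f"Matched {found.group()!r}" if found else "No ZIP code found",
    )


score = has_zipcode("Ship to Springfield, IL 62704")
print(score.value, score.status.name, score.reasoning)
```

An evaluator that checks several things at once could return a list of such objects, one per score name.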
Raises
- ValueError – If required parameters are missing or invalid.
- TypeError – If parameters have incorrect types.
- Exception – Any other evaluation-specific errors.