Evaluator
API reference for Evaluator
Abstract base class for creating custom evaluators in Fiddler Evals.
The Evaluator class provides a flexible framework for creating builtin and custom evaluators that can assess LLM outputs against various criteria. Each evaluator is responsible for a single, specific evaluation task (e.g., hallucination detection, answer relevance, exact match, etc.).
Parameter Mapping: Evaluators can define their own parameter mappings using `score_fn_kwargs_mapping` in the constructor. These mappings specify how data from the evaluation context (inputs, outputs, expected_outputs) should be passed to the evaluator's `score` method.

Mapping priority (highest to lowest):

1. Evaluator-level `score_fn_kwargs_mapping` (set in the constructor)
2. Evaluation-level `kwargs_mapping` (passed to the `evaluate` function)
3. Default parameter resolution
This allows evaluators to define sensible defaults while still permitting customization at the evaluation level.
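The priority rules above can be sketched as a small resolution helper. This is a hypothetical illustration (the function name `resolve_kwargs` and the flat `data` dict are assumptions, not the library's internals):

```python
def resolve_kwargs(params, data, evaluator_mapping=None, evaluation_mapping=None):
    """Sketch of mapping-priority resolution: an evaluator-level mapping wins
    over an evaluation-level mapping; otherwise the parameter name itself is
    used as the lookup key. `data` is a flat dict of evaluation-context fields."""
    evaluator_mapping = evaluator_mapping or {}
    evaluation_mapping = evaluation_mapping or {}
    resolved = {}
    for param in params:
        # Highest-priority mapping that defines this parameter wins.
        spec = evaluator_mapping.get(param, evaluation_mapping.get(param, param))
        # A spec is either a transformation function or a string key into `data`.
        resolved[param] = spec(data) if callable(spec) else data.get(spec)
    return resolved

data = {"answer": "42", "inputs": {"question": "What is 6 x 7?"}}
# The evaluator-level mapping takes precedence over the evaluation-level one:
kwargs = resolve_kwargs(["output"], data,
                        evaluator_mapping={"output": "answer"},
                        evaluation_mapping={"output": "inputs"})
```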
Creating Custom Evaluators: To create a custom evaluator, inherit from this class and implement the `score` method with parameters specific to your evaluation needs.

Example - Custom evaluator with parameter mapping:

```python
class ExactMatchEvaluator(Evaluator):
    def __init__(self, output_key: str = "answer", score_name_prefix: str = None):
        super().__init__(
            score_name_prefix=score_name_prefix,
            score_fn_kwargs_mapping={"output": output_key},
        )

    def score(self, output: str, expected_output: str) -> Score:
        is_match = output.strip().lower() == expected_output.strip().lower()
        return Score(
            name=f"{self.score_name_prefix}exact_match",
            value=1.0 if is_match else 0.0,
            reasoning=f"Match: {is_match}",
        )
```
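The class above depends on fiddler-evals' `Evaluator` and `Score`. As a runnable illustration of the same flow, here is a self-contained sketch with minimal stand-ins for both (the stand-ins are hypothetical simplifications, not the library's real classes, and the underscore-appending prefix behavior is an assumption based on the naming shown in the Example section):

```python
from dataclasses import dataclass

@dataclass
class Score:  # minimal stand-in for the library's Score
    name: str
    value: float
    reasoning: str = ""

class Evaluator:  # minimal stand-in for the library's Evaluator base class
    def __init__(self, score_name_prefix=None, score_fn_kwargs_mapping=None):
        # Assumption: the prefix is normalized with a trailing underscore.
        self.score_name_prefix = f"{score_name_prefix}_" if score_name_prefix else ""
        self.score_fn_kwargs_mapping = score_fn_kwargs_mapping or {}

class ExactMatchEvaluator(Evaluator):
    def __init__(self, output_key="answer", score_name_prefix=None):
        super().__init__(score_name_prefix=score_name_prefix,
                         score_fn_kwargs_mapping={"output": output_key})

    def score(self, output: str, expected_output: str) -> Score:
        # Case- and whitespace-insensitive comparison.
        is_match = output.strip().lower() == expected_output.strip().lower()
        return Score(name=f"{self.score_name_prefix}exact_match",
                     value=1.0 if is_match else 0.0,
                     reasoning=f"Match: {is_match}")

evaluator = ExactMatchEvaluator(score_name_prefix="answer")
result = evaluator.score("Paris", "  paris ")
```

With the prefix set, the resulting score is named `answer_exact_match`, mirroring the prefixed naming convention used by the built-in evaluators.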
#### Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `score_name_prefix` | `str \| None` | ✗ | `None` | Optional prefix to prepend to score names. Useful for distinguishing scores when using multiple instances of the same evaluator on different fields or with different configurations. |
| `score_fn_kwargs_mapping` | `ScoreFnKwargsMappingType \| None` | ✗ | `None` | Optional mapping for parameter transformation. Maps parameter names to either string keys or transformation functions. This mapping takes precedence over evaluation-level mappings when running the `evaluate` method. |
Initialize the evaluator with parameter mapping configuration.
#### Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `score_name_prefix` | `str \| None` | ✗ | `None` | Optional prefix to prepend to score names. Useful for distinguishing scores when using multiple instances of the same evaluator on different fields or with different configurations. |
| `score_fn_kwargs_mapping` | `Dict[str, str \| Callable[[Dict[str, Any]], Any]] \| None` | ✗ | `None` | Optional mapping for parameter transformation. Maps parameter names to either string keys or transformation functions. This mapping takes precedence over evaluation-level mappings when running the `evaluate` method. |
Example
```python
>>> # Simple string mapping
>>> evaluator = MyEvaluator(score_fn_kwargs_mapping={"output": "answer"})
>>>
>>> # Complex transformation function
>>> evaluator = MyEvaluator(score_fn_kwargs_mapping={
...     "question": lambda x: x["inputs"]["question"],
...     "response": "answer"
... })
>>>
>>> # Using score name prefix for multiple instances
>>> evaluator1 = RegexSearch(r"\d+", score_name_prefix="question")
>>> evaluator2 = RegexSearch(r"\d+", score_name_prefix="answer")
>>> # Results in scores named "question_has_number" and "answer_has_number"
```
#### Raises
**ScoreFunctionInvalidArgs** – If the mapping contains invalid parameter names
that don’t match the evaluator’s score method signature.
**Return type:** None
#### *property* name *: str*
#### *abstractmethod* score(\*args, \*\*kwargs)
Evaluate inputs and return a score or list of scores.
This method must be implemented by all concrete evaluator classes.
Each evaluator can define its own parameter signature based on what
it needs for evaluation.
Common parameter patterns:
- Output-only: score(self, output: str) -> Score
- Input-Output: score(self, input: str, output: str) -> Score
- Comparison: score(self, output: str, expected_output: str) -> Score
- All parameters: score(self, input: str, output: str, context: list[str]) -> Score
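As a minimal runnable illustration of the output-only pattern, here is a hypothetical length-limit check. The `Score` dataclass below is a stand-in for the library's class, and the `length_score` function and its naming are illustrative, not part of fiddler-evals:

```python
from dataclasses import dataclass

@dataclass
class Score:  # stand-in for the library's Score, for illustration only
    name: str
    value: float
    reasoning: str = ""

def length_score(output: str, max_chars: int = 500) -> Score:
    """Output-only pattern: the score depends on the model output alone."""
    within = len(output) <= max_chars
    return Score(name="within_length",
                 value=1.0 if within else 0.0,
                 reasoning=f"{len(output)} chars (limit {max_chars})")

short = length_score("A concise answer.")
```

The input-output, comparison, and all-parameter patterns differ only in which evaluation-context fields the signature declares; the parameter-mapping machinery described above decides how each is filled.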
#### Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `*args` | `Any` | ✗ | `None` | Positional arguments specific to the evaluator’s needs. |
| `**kwargs` | `Any` | ✗ | `None` | Keyword arguments specific to the evaluator’s needs. |
#### Returns
A single `Score` object or a list of `Score` objects representing the evaluation results. Each `Score` should include:
- name: The score name (e.g., “has_zipcode”)
- evaluator_name: The evaluator name (e.g., “RegexMatch”)
- value: The score value (typically 0.0 to 1.0)
- status: SUCCESS, FAILED, or SKIPPED
- reasoning: Optional explanation of the score
- error: Optional error information if evaluation failed
**Return type:** Score | list[Score]
#### Raises
* **ValueError** – If required parameters are missing or invalid.
* **TypeError** – If parameters have incorrect types.
* **Exception** – Any other evaluation-specific errors.