Key Features
- Multi-Dimensional Safety Assessment: Scores text across 10 safety categories, plus an aggregate maximum (11 scores in total)
- Probability-Based Scoring: Returns a probability score (0.0-1.0) for each risk category
- Comprehensive Risk Coverage: Covers illegal, hateful, harassing, and other harmful content
- Centor Prompt Safety: Uses Fiddler's proprietary safety evaluation model
- Multi-Score Output: Returns one score per category from a single call, supporting a comprehensive safety analysis
Score Categories
- illegal_prob: Probability of containing illegal content or activities
- hateful_prob: Probability of containing hate speech or discriminatory language
- harassing_prob: Probability of containing harassing or threatening content
- racist_prob: Probability of containing racist language or content
- sexist_prob: Probability of containing sexist language or content
- violent_prob: Probability of containing violent or graphic content
- sexual_prob: Probability of containing inappropriate sexual content
- harmful_prob: Probability of containing content that could cause harm
- unethical_prob: Probability of containing unethical or manipulative content
- jailbreaking_prob: Probability of containing prompt injection or jailbreaking attempts
- max_risk_prob: The highest probability across the ten categories above (an aggregate score, not a separate category)
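The following sketch shows how max_risk_prob relates to the ten category scores. It assumes, as the description above suggests, that max_risk_prob is simply the largest category probability. The helper max_risk and the CATEGORY_NAMES tuple are hypothetical illustrations and are not part of Fiddler's API. The helper also reports which category produced the maximum.

```python
# The ten per-category scores listed above; max_risk_prob is an aggregate.
CATEGORY_NAMES = (
    "illegal_prob", "hateful_prob", "harassing_prob", "racist_prob",
    "sexist_prob", "violent_prob", "sexual_prob", "harmful_prob",
    "unethical_prob", "jailbreaking_prob",
)


def max_risk(category_probs: dict[str, float]) -> tuple[str, float]:
    """Return (category name, probability) for the riskiest category.

    Hypothetical helper: assumes max_risk_prob equals the largest
    category probability.
    """
    missing = [n for n in CATEGORY_NAMES if n not in category_probs]
    if missing:
        raise KeyError(f"missing categories: {missing}")
    # On ties, max() keeps the first category in CATEGORY_NAMES order.
    top = max(CATEGORY_NAMES, key=lambda n: category_probs[n])
    return top, category_probs[top]
```

Returning the category name together with the value makes it easier to explain why a prompt was flagged.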
Use Cases
- Content Moderation: Filtering user-generated content for safety
- Prompt Validation: Ensuring user prompts are safe before processing
- AI Safety: Protecting AI systems from harmful or manipulative inputs
- Compliance: Meeting regulatory requirements for content safety
- Risk Assessment: Evaluating potential risks in text content
Score Interpretation
- 0.0-0.3: Low risk (safe content)
- 0.3-0.7: Medium risk (requires review)
- 0.7-1.0: High risk (likely unsafe content)
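The interpretation bands above can be expressed as a small classifier. The ranges as written share their endpoints, so this sketch makes an assumption: a value of exactly 0.3 or 0.7 goes to the higher band. The function risk_band is hypothetical and is not a Fiddler API.

```python
def risk_band(prob: float) -> str:
    """Map a probability in [0.0, 1.0] to the low/medium/high bands.

    Assumption: boundary values 0.3 and 0.7 fall into the higher band.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    if prob < 0.3:
        return "low"
    if prob < 0.7:
        return "medium"
    return "high"
```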
Parameters
- text (str) – The text prompt to evaluate for safety.
- score_name_prefix (str | None)
- score_fn_kwargs_mapping (ScoreFnKwargsMappingType | None)
Returns
A list of Score objects, one for each score listed above (including max_risk_prob). Each Score has:
- name: The score name (e.g., "illegal_prob")
- evaluator_name: "FTLPromptSafety"
- value: Probability score (0.0-1.0) for that category
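Returned scores are often easier to work with as a name-to-value mapping. This sketch defines a stand-in Score dataclass that carries only the three documented fields. It is not Fiddler's Score class. The hypothetical scores_to_dict helper keeps only scores whose evaluator_name matches, which is useful when results from several evaluators are mixed in one list.

```python
from dataclasses import dataclass


@dataclass
class Score:
    """Stand-in with the three documented fields; not Fiddler's class."""
    name: str
    evaluator_name: str
    value: float


def scores_to_dict(
    scores: list[Score], evaluator_name: str = "FTLPromptSafety"
) -> dict[str, float]:
    """Map score names to values for a single evaluator's scores."""
    return {s.name: s.value for s in scores if s.evaluator_name == evaluator_name}
```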
Raises
ValueError – If the text is empty or None.
Note
This evaluator is designed for prompt safety assessment and should be used
as one part of a broader content moderation strategy. Interpret the
probability scores in context, and combine them with other safety
measures for robust content filtering.
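The following sketch shows one way to combine the evaluator's scores with other safety measures. It blocks a prompt when max_risk_prob meets a threshold or when any additional check flags the text. The should_block function, the 0.7 default threshold (taken from the high-risk band above), and the keyword check contains_blocked_term are all hypothetical illustrations, not a recommended policy or a Fiddler API.

```python
from typing import Callable, Mapping

# A check takes the prompt text and returns True if it flags the text.
Check = Callable[[str], bool]


def contains_blocked_term(text: str) -> bool:
    """Hypothetical secondary check: a tiny keyword blocklist."""
    blocked = {"forbidden"}
    return any(term in text.lower() for term in blocked)


def should_block(
    text: str,
    probs: Mapping[str, float],
    threshold: float = 0.7,
    extra_checks: tuple[Check, ...] = (),
) -> bool:
    """Block if the model's max risk meets the threshold or any check fires."""
    if probs["max_risk_prob"] >= threshold:
        return True
    return any(check(text) for check in extra_checks)
```

Keeping the extra checks as plain callables lets you add or remove them without touching the score-based rule.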
name = 'ftl_prompt_safety'
score()
Score the safety of a text prompt.
Parameters
The text prompt to evaluate for safety.
Returns
A list of Score objects, one for each safety category.