Overview
Fiddler Guardrails provide real-time protection for GenAI applications, including LLM-powered systems and agentic AI workflows, by detecting and preventing harmful content, PII leaks, and hallucinations before they reach your users. Built on Fiddler Centor Models, Fiddler’s proprietary small language models (SLMs), Guardrails deliver enterprise-grade security with low-latency, high-throughput performance optimized for production environments.
Use Fiddler Guardrails to:
- Detect and block harmful or inappropriate content across 11 safety dimensions
- Prevent personally identifiable information (PII) leaks in user inputs and model outputs
- Identify hallucinations in retrieval-augmented generation (RAG) applications
- Protect against prompt injection and jailbreaking attempts
Available Guardrail Types
Fiddler offers three specialized guardrail types, each powered by Fiddler Centor Models:
- Centor Safety Guardrails - Detect harmful, toxic, or jailbreaking content
- Centor Faithfulness Guardrails - Identify hallucinations in RAG applications
- Centor PII Guardrails - Detect and redact sensitive information
Guardrails are designed for real-time content blocking with more sensitive thresholds than enrichments used for monitoring and analytics. See the Enrichments guide for batch processing and monitoring use cases.
Getting Started with Fiddler Guardrails
Prerequisites
- Fiddler Guardrails Account - Sign up for Free Guardrails or use your enterprise Fiddler account
- API Key - Generate your API key from Settings → Credentials
- HTTP Client - Python 3.8+ with the requests library, cURL, or any HTTP client
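With the prerequisites in place, a guardrail call is an authenticated JSON POST. The sketch below only builds such a request, using the standard library rather than requests so that it has no dependencies. The URL, the payload shape, and the bearer-token header scheme are all placeholder assumptions and are not taken from this guide; check the API Reference for the actual endpoint paths and request format.

```python
import json
import urllib.request


def build_guardrail_request(url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request that carries the API key."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth. Confirm the scheme in the API Reference.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Placeholder values for illustration only.
request = build_guardrail_request(
    url="https://example.invalid/guardrails/safety",  # hypothetical URL
    api_key="YOUR_API_KEY",
    payload={"input": "Text to evaluate"},  # hypothetical payload shape
)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

Keeping the API key out of source code (for example, reading it from an environment variable) is generally preferable to hard-coding it as shown here.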
Centor Safety Guardrails
The Centor Safety model evaluates the safety of text along eleven different dimensions: illegal, hateful, harassing, racist, sexist, violent, sexual, harmful, unethical, jailbreaking, roleplaying.
This model requires a single string input for evaluation and outputs 11 distinct scores (floats between 0 and 1). Use a detection threshold of 0.1: any score above 0.1 indicates unsafe content.
Threshold Guidance: For real-time guardrails, a threshold of 0.1 provides sufficient sensitivity for blocking potentially harmful content. For monitoring use cases with enrichments, higher thresholds (0.7+) reduce false positives. See Centor Safety Enrichment for monitoring thresholds.
Centor Safety Guardrails Example Code
- Closer to 0 - Safe content
- Closer to 1 - Unsafe content
- > 0.1 - Exceeds recommended threshold for real-time blocking
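As a minimal sketch of applying the 0.1 threshold, the helper below takes a mapping of dimension name to score and returns the dimensions that exceed it. The dictionary keys and score values are hypothetical; the actual response field names are not shown in this guide, so adapt the lookup to the real response.

```python
from typing import Dict, List

SAFETY_THRESHOLD = 0.1  # recommended real-time blocking threshold from this guide


def unsafe_dimensions(scores: Dict[str, float], threshold: float = SAFETY_THRESHOLD) -> List[str]:
    """Return the names of the safety dimensions whose score exceeds the threshold."""
    return sorted(name for name, score in scores.items() if score > threshold)


# Hypothetical scores for illustration; a real response carries all 11 dimensions.
example_scores = {"harmful": 0.02, "jailbreaking": 0.87, "violent": 0.10, "hateful": 0.31}

flagged = unsafe_dimensions(example_scores)
should_block = bool(flagged)  # block if any dimension is over the threshold
print(flagged, should_block)
```

Blocking on any single flagged dimension is one possible policy; an application could also choose to act on only a subset of dimensions.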
Centor Faithfulness Guardrails
The Centor Faithfulness model evaluates the accuracy and reliability of facts presented in AI-generated text responses by comparing them to provided context documents. This uses Fiddler’s proprietary Centor Model with response and context inputs.
Not to be confused with RAG Faithfulness. Centor Faithfulness Guardrails use Fiddler’s proprietary Centor Model (ftl_response_faithfulness) optimized for real-time blocking. RAG Faithfulness is a separate LLM-as-a-Judge evaluator available in Agentic Monitoring and Experiments for diagnostic evaluation. See RAG Health Diagnostics for details.
Threshold Guidance: A score closer to 0 means unfaithful (the LLM hallucinated relative to the provided context), while a score closer to 1 means faithful (the LLM output did not hallucinate and is well-grounded in the provided context). For real-time guardrails, a threshold of 0.5 strikes a balance between sensitivity and accuracy.
Centor Faithfulness Guardrails Example Code
- 0.0 - 0.49 - Unfaithful (likely hallucination - block or flag for review)
- 0.5 - 1.0 - Faithful (response is well-supported by the provided context)
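The score bands above translate into a simple gate. The sketch below assumes the faithfulness score has already been extracted from the API response as a float; the function names and the fallback message are illustrative and not part of the Fiddler API.

```python
FAITHFULNESS_THRESHOLD = 0.5  # recommended real-time threshold from this guide

FALLBACK_MESSAGE = "I could not verify that answer against the provided sources."


def is_faithful(score: float, threshold: float = FAITHFULNESS_THRESHOLD) -> bool:
    """Scores at or above the threshold count as faithful; lower scores as likely hallucination."""
    return score >= threshold


def guard_response(response_text: str, score: float, fallback: str = FALLBACK_MESSAGE) -> str:
    """Return the model response if it is faithful, otherwise a fallback message."""
    return response_text if is_faithful(score) else fallback


# Hypothetical scores for illustration.
print(guard_response("Paris is the capital of France.", 0.93))
print(guard_response("The warranty lasts ten years.", 0.12))
```

Returning a fallback is only one option; flagging the response for human review, as the list above suggests, fits the same structure.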
Centor PII Guardrails
The Centor PII model detects, flags, and redacts PII leakage in both user inputs and model responses. Centor PII Guardrails support a comprehensive set of label types, including: person, address, email, email address, credit card number, credit card expiration date, cvv, cvc, bank account number, iban, social security number, date of birth, ip address, phone number, mobile phone number, landline phone number, passport number, driver's license number, tax identification number, cpf, cnpj, account number, license plate number, fax number, website, digital signature, postal code. See the PII & PHI Tutorial for the full entity list.
Centor PII Guardrails use Fiddler’s proprietary Centor Models and support a different entity set than the PII Enrichment (which uses Presidio). For monitoring and batch processing, see the PII Enrichment documentation.
PHI Detection also supported. Centor PII Guardrails also detect Protected Health Information (PHI) for HIPAA compliance, including: medication, medical condition, health insurance number, health insurance id number, national health insurance number, birth certificate number, serial number. Pass "entity_categories": "PHI" in your request body. See the PII & PHI Tutorial for full entity lists and example code.
Centor PII Guardrails Example Code
- score - Confidence score (0.0 to 1.0)
- label - Entity type (e.g., “email”, “social security number”)
- text - The detected sensitive information
- start/end - Character positions in the input text
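Given entities with the fields listed above, redaction can be done client-side by replacing each detected span with a placeholder. The sketch below rests on several assumptions: end is treated as an exclusive offset, spans do not overlap, and the 0.5 minimum confidence is an arbitrary illustrative value rather than a documented recommendation. The make_entity helper exists only to build sample data.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], min_score: float = 0.5) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Work from right to left so earlier character offsets stay valid.
    for entity in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = "[" + entity["label"].upper().replace(" ", "_") + "]"
        text = text[: entity["start"]] + placeholder + text[entity["end"]:]
    return text


def make_entity(text: str, value: str, label: str, score: float) -> Dict:
    """Build a sample entity by locating value in text (illustration only)."""
    start = text.index(value)
    return {"score": score, "label": label, "text": value,
            "start": start, "end": start + len(value)}


message = "Contact jane@example.com or 555-0100."
detected = [
    make_entity(message, "jane@example.com", "email", 0.98),
    make_entity(message, "555-0100", "phone number", 0.91),
]
redacted = redact(message, detected)
print(redacted)  # Contact [EMAIL] or [PHONE_NUMBER].
```

If the API reports inclusive end offsets instead, the slice would need to be `text[entity["end"] + 1:]`.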
Summary
Fiddler Guardrails provide real-time protection for GenAI applications through three specialized guardrail types powered by Fiddler Centor Models:
- Centor Safety Guardrails - Detect harmful content across 11 safety dimensions with a recommended threshold of > 0.1
- Centor Faithfulness Guardrails - Identify hallucinations in RAG applications with a recommended threshold of < 0.5
- Centor PII Guardrails - Detect and redact PII and PHI across a comprehensive set of entity types
Next Steps
- Quick Start - Get started with Fiddler Guardrails in 15 minutes
- API Reference - Complete Guardrails API documentation
- Tutorials - Explore detailed tutorials for Safety, PII, and Faithfulness
- Concepts - Understand Fiddler Centor Models and enrichments
- Monitoring - Integrate guardrails with LLM monitoring