Guardrails - Fiddler Documentation

Overview

Fiddler Guardrails provide real-time protection for GenAI applications—including LLM-powered systems and agentic AI workflows—by detecting and preventing harmful content, PII leaks, and hallucinations before they reach your users. Built on Fiddler Centor Models—Fiddler’s proprietary small language models (SLMs)—Guardrails deliver enterprise-grade security with low-latency, high-throughput performance optimized for production environments. Use Fiddler Guardrails to:

Detect and block harmful or inappropriate content across 11 safety dimensions
Prevent personally identifiable information (PII) leaks in user inputs and model outputs
Identify hallucinations in retrieval-augmented generation (RAG) applications
Protect against prompt injection and jailbreaking attempts

What Fiddler Guardrails Can Moderate

Fiddler Guardrails are powered by Fiddler Centor Models, and you can apply them to moderate or block three categories of risk:

Safety - Detect harmful, toxic, or jailbreaking content
Hallucination (faithfulness) - Identify hallucinations in RAG applications
PII/PHI - Detect and redact sensitive information

Guardrails are designed for real-time content blocking and use more sensitive thresholds than the enrichments used for monitoring and analytics. See the Enrichments guide for batch processing and monitoring use cases.

Getting Started with Fiddler Guardrails

Prerequisites

Fiddler Environment - Access to a Fiddler environment with Guardrails enabled
API Key - Generate your API key from Settings → Credentials
HTTP Client - Python 3.8+ with the requests library, cURL, or any other HTTP client

Guardrails can be invoked directly via REST API from any programming language. The examples below demonstrate usage with cURL and Python.
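Every guardrail in this guide is a POST to /v3/guardrails/<name> with a Bearer token and a JSON body whose fields sit inside a top-level "data" object, so the call can be factored into one helper. The following is a minimal sketch that uses only the Python standard library (urllib.request) instead of requests; the function names build_guardrail_request and call_guardrail and the 30-second timeout are illustrative choices, not part of the Fiddler API.

```python
import json
import urllib.request


def build_guardrail_request(base_url: str, token: str, guardrail: str, data: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Fiddler guardrail endpoint.

    base_url  -- base URL of the Fiddler environment, including https://
    guardrail -- endpoint suffix, e.g. "ftl-safety" or "sensitive-information"
    data      -- fields that go inside the top-level "data" object
    """
    url = f"{base_url.rstrip('/')}/v3/guardrails/{guardrail}"
    body = json.dumps({"data": data}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def call_guardrail(base_url: str, token: str, guardrail: str, data: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_guardrail_request(base_url, token, guardrail, data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, call_guardrail("https://your-fiddler-endpoint", "YOUR_FIDDLER_TOKEN_HERE", "ftl-safety", {"input": "Hello!"}) would return the decoded safety scores. Splitting request construction from sending keeps the payload shape testable without network access.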

Safety

For safety moderation, Fiddler Guardrails use the Centor Model for Safety, which evaluates text along eleven dimensions: illegal, hateful, harassing, racist, sexist, violent, sexual, harmful, unethical, jailbreaking, and roleplaying. The model takes a single input string and returns 11 scores, one per dimension, each a float between 0 and 1. Starting in release 26.13, the scores are calibrated so that 0.5 is the default decision threshold for every dimension: a score of 0.5 or above indicates unsafe content.

Threshold Guidance: Starting in release 26.13, the Centor Model for Safety is calibrated so a single decision threshold of 0.5 applies across all 11 dimensions — no per-dimension tuning required. Lower the threshold to increase sensitivity (at the cost of more false positives), or raise it to reduce false positives. For monitoring use cases with enrichments, see Safety Enrichment for monitoring thresholds.

Migrating from earlier releases: If you previously used a threshold of 0.1 (calibrated for the older, uncalibrated model), adopt 0.5 for the 26.13 calibrated model — keeping 0.1 will over-block benign content. Choosing a stricter (lower) threshold remains a deliberate option for high-sensitivity use cases.

Safety Example Code

cURL

curl --location 'https://{fiddler_endpoint}/v3/guardrails/ftl-safety' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {token}' \
--data '{
    "data": {
        "input": "I am a dangerous person who will be wreaking havoc upon the world!!!"
    }
}'

Python

import requests

token = "YOUR_FIDDLER_TOKEN_HERE"
url = "https://FIDDLER_ENDPOINT_HERE"  # base URL of your Fiddler environment, including https://

payload = {
    "data": {
        "input": "I am a dangerous person who will be wreaking havoc upon the world!!!"
    }
}
headers = {"Authorization": f"Bearer {token}"}

# json= serializes the payload and sets the Content-Type: application/json header
response = requests.post(
    f"{url}/v3/guardrails/ftl-safety", headers=headers, json=payload, timeout=30
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body

print(response.json())

Sample Response

{
  "fdl_harmful": 0.119,
  "fdl_violent": 0.073,
  "fdl_unethical": 0.043,
  "fdl_illegal": 0.016,
  "fdl_sexual": 0.005,
  "fdl_racist": 0.003,
  "fdl_jailbreaking": 0.002,
  "fdl_harassing": 0.001,
  "fdl_hateful": 0.001,
  "fdl_sexist": 0.001,
  "fdl_roleplaying": 0.051
}

Interpreting Safety Scores: Each dimension returns a score between 0 and 1:

Closer to 0 - Safe content
Closer to 1 - Unsafe content
0.5 or above - Meets or exceeds the calibrated default decision threshold (tunable for your use case)
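Applying the threshold is a per-dimension comparison. A minimal sketch, assuming the response is the flat JSON object of fdl_* scores shown in the sample response; flag_unsafe is a hypothetical helper name. On the sample scores it illustrates the migration note: at the calibrated default of 0.5 nothing is flagged, while the older 0.1 threshold flags fdl_harmful (0.119).

```python
def flag_unsafe(scores: dict, threshold: float = 0.5) -> list:
    """Return the dimensions whose score meets or exceeds the threshold,
    highest score first. An empty list means the text passes."""
    flagged = [(name, score) for name, score in scores.items() if score >= threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in flagged]


# Scores copied from the sample response.
sample = {
    "fdl_harmful": 0.119, "fdl_violent": 0.073, "fdl_unethical": 0.043,
    "fdl_illegal": 0.016, "fdl_sexual": 0.005, "fdl_racist": 0.003,
    "fdl_jailbreaking": 0.002, "fdl_harassing": 0.001, "fdl_hateful": 0.001,
    "fdl_sexist": 0.001, "fdl_roleplaying": 0.051,
}

print(flag_unsafe(sample))                 # []
print(flag_unsafe(sample, threshold=0.1))  # ['fdl_harmful']
```

Returning the flagged dimension names, rather than a single boolean, lets an application log why content was blocked.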

Hallucination (faithfulness)

For hallucination moderation, Fiddler Guardrails use the Centor Model for Faithfulness, which evaluates whether the facts in an AI-generated response are supported by the provided context documents.

Not to be confused with RAG Faithfulness. For real-time blocking, Fiddler Guardrails use the Centor Model for Faithfulness (ftl_response_faithfulness). RAG Faithfulness is a separate LLM-as-a-Judge evaluator available in Agentic Monitoring and Experiments for diagnostic evaluation. See RAG Health Diagnostics for details.

The model takes a response string and a context string containing the source documents, and outputs a single faithfulness score (a float between 0 and 1). A score below 0.5 indicates unfaithful content.

Threshold Guidance: A score closer to 0 means unfaithful (the LLM hallucinated relative to the provided context), while a score closer to 1 means faithful (the LLM output did not hallucinate and is well-grounded in the provided context). For real-time guardrails, a threshold of 0.5 strikes a balance between sensitivity and accuracy.

Faithfulness Example Code

cURL

curl --location 'https://{fiddler_endpoint}/v3/guardrails/ftl-response-faithfulness' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {token}' \
--data '{
    "data": {
        "response": "The Yorkshire Terrier and the Cavalier King Charles Spaniel are both small breeds of companion dogs.",
        "context": "The Yorkshire Terrier is a small dog breed of terrier type, developed during the 19th century in Yorkshire, England, to catch rats in clothing mills. The Cavalier King Charles Spaniel is a small spaniel classed as a toy dog by The Kennel Club and the American Kennel Club"
    }
}'

Python

import requests

token = "YOUR_FIDDLER_TOKEN_HERE"
url = "https://FIDDLER_ENDPOINT_HERE"  # base URL of your Fiddler environment, including https://

payload = {
    "data": {
        "response": "The Yorkshire Terrier and the Cavalier King Charles Spaniel are both small breeds of companion dogs.",
        "context": "The Yorkshire Terrier is a small dog breed of terrier type, developed during the 19th century in Yorkshire, England, to catch rats in clothing mills. The Cavalier King Charles Spaniel is a small spaniel classed as a toy dog by The Kennel Club and the American Kennel Club",
    }
}
headers = {"Authorization": f"Bearer {token}"}

response = requests.post(
    f"{url}/v3/guardrails/ftl-response-faithfulness",
    headers=headers,
    json=payload,  # serialized as JSON; Content-Type is set automatically
    timeout=30,
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body

print(response.json())

Sample Response

{
  "fdl_faithful_score": 0.194
}

Interpreting Faithfulness Scores:

Below 0.5 - Unfaithful (likely hallucination; block or flag for review)
0.5 or above - Faithful (response is well supported by the provided context)

The example above shows a score of 0.194, which is below the 0.5 threshold, indicating the response may contain hallucinated information not supported by the context.
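Note that the comparison runs in the opposite direction from the safety scores: here a low score is the problem. A minimal sketch of the blocking decision, assuming the response JSON contains fdl_faithful_score as in the sample; the function name faithfulness_decision and the "block"/"allow" labels are illustrative.

```python
def faithfulness_decision(response_json: dict, threshold: float = 0.5) -> str:
    """Return "block" when the response is likely a hallucination, else "allow".

    Scores below the threshold are treated as unfaithful; scores at or
    above it are treated as faithful.
    """
    score = response_json["fdl_faithful_score"]
    return "block" if score < threshold else "allow"


print(faithfulness_decision({"fdl_faithful_score": 0.194}))  # block
print(faithfulness_decision({"fdl_faithful_score": 0.5}))    # allow
```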

PII/PHI

For PII/PHI moderation, Fiddler Guardrails use the Centor Model for PII/PHI, which detects, flags, and redacts PII leakage in both user inputs and model responses. PII/PHI moderation supports a comprehensive set of label types, including:

person, address, email, email address, credit card number, credit card expiration date, cvv, cvc, bank account number, iban, social security number, date of birth, ip address, phone number, mobile phone number, landline phone number, passport number, driver's license number, tax identification number, cpf, cnpj, account number, license plate number, fax number, website, digital signature, postal code.

See the PII & PHI Tutorial for the full entity list.

The Centor Model for PII/PHI supports a different entity set than the PII Enrichment (which uses Presidio). For monitoring and batch processing, see the PII Enrichment documentation.

PHI detection is also supported. Fiddler Guardrails also detect Protected Health Information (PHI) for HIPAA compliance, including:

medication, medical condition, health insurance number, health insurance id number, national health insurance number, birth certificate number, serial number.

To detect PHI, pass "entity_categories": "PHI" in your request body. See the PII & PHI Tutorial for full entity lists and example code.

This model accepts a single text string and returns all detected PII spans with their labels, confidence scores, and character offsets.

PII/PHI Example Code

cURL

curl --location 'https://{fiddler_endpoint}/v3/guardrails/sensitive-information' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {token}' \
--data '{
    "data": {
        "input": "Some of my colleagues share their contact info as well. Jane Smith'\''s email is jane.smith@company.com, and her office is located at 432 Oak Avenue, Suite 210, Chicago, IL 60611. You can call her mobile at 312-555-7890."
    }
}'

Python

import requests

token = "YOUR_FIDDLER_TOKEN_HERE"
url = "https://FIDDLER_ENDPOINT_HERE"  # base URL of your Fiddler environment, including https://

payload = {
    "data": {
        "input": "Some of my colleagues share their contact info as well. Jane Smith's email is jane.smith@company.com, and her office is located at 432 Oak Avenue, Suite 210, Chicago, IL 60611. You can call her mobile at 312-555-7890."
    }
}
headers = {"Authorization": f"Bearer {token}"}

# json= serializes the payload and sets the Content-Type: application/json header
response = requests.post(
    f"{url}/v3/guardrails/sensitive-information", headers=headers, json=payload, timeout=30
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body

print(response.json())

Sample Response

{
  "fdl_sensitive_information_scores": [
    {
      "score": 0.987,
      "label": "email",
      "start": 78,
      "end": 100,
      "text": "jane.smith@company.com"
    },
    {
      "score": 0.945,
      "label": "address",
      "start": 131,
      "end": 175,
      "text": "432 Oak Avenue, Suite 210, Chicago, IL 60611"
    },
    {
      "score": 0.987,
      "label": "mobile phone number",
      "start": 204,
      "end": 216,
      "text": "312-555-7890"
    }
  ]
}

Response Fields:

score - Confidence score (0.0 to 1.0)
label - Entity type (e.g., “email”, “social security number”)
text - The detected sensitive information
start / end - Character positions in the input text
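Because start and end are character offsets into the original input, the returned spans can be used to redact the text on the client. A minimal sketch, assuming end is exclusive (consistent with the sample, where 78 to 100 covers the 22-character email address) and that spans do not overlap; redact, its min_score parameter, and the [LABEL] placeholder format are illustrative choices.

```python
def redact(text: str, spans: list, min_score: float = 0.0) -> str:
    """Replace each detected span with an uppercase [LABEL] placeholder.

    Spans are applied from the end of the string backwards, so earlier
    offsets stay valid even though each replacement changes the length.
    """
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        if span["score"] < min_score:
            continue  # leave low-confidence detections untouched
        placeholder = f"[{span['label'].upper()}]"
        text = text[: span["start"]] + placeholder + text[span["end"]:]
    return text


text = (
    "Some of my colleagues share their contact info as well. Jane Smith's email is "
    "jane.smith@company.com, and her office is located at 432 Oak Avenue, Suite 210, "
    "Chicago, IL 60611. You can call her mobile at 312-555-7890."
)
# Spans copied from the sample response (the "text" field is omitted).
spans = [
    {"score": 0.987, "label": "email", "start": 78, "end": 100},
    {"score": 0.945, "label": "address", "start": 131, "end": 175},
    {"score": 0.987, "label": "mobile phone number", "start": 204, "end": 216},
]

# Prints the sentence with [EMAIL], [ADDRESS], and [MOBILE PHONE NUMBER] in place of the spans.
print(redact(text, spans))
```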

Summary

Fiddler Guardrails provide real-time protection for GenAI applications, powered by Fiddler Centor Models, across three categories of risk:

Safety - Detect harmful content across 11 safety dimensions with a calibrated default decision threshold of 0.5 (tunable)
Hallucination (faithfulness) - Identify hallucinations in RAG applications with a recommended threshold of < 0.5
PII/PHI - Detect and redact PII and PHI across a comprehensive set of entity types

All guardrails use Fiddler Centor Models—Fiddler’s proprietary small language models—optimized for sub-second latency in production environments.
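In a RAG application the three guardrails typically wrap a single turn: check the user input for safety and PII, generate an answer, then check the answer's faithfulness to the retrieved context. A minimal sketch of that flow, assuming a call(guardrail, data) function that returns the decoded JSON (for example, a thin wrapper around the requests calls shown earlier) and a safety response containing only the fdl_* scores; guard_rag_turn, generate, and the returned dict shape are hypothetical. Blocking on any detected PII is one policy choice; redacting the input and continuing is another.

```python
from typing import Callable

SAFETY_THRESHOLD = 0.5        # calibrated default for the safety model
FAITHFULNESS_THRESHOLD = 0.5  # below this, treat the answer as a hallucination


def guard_rag_turn(
    user_input: str,
    context: str,
    generate: Callable[[str, str], str],
    call: Callable[[str, dict], dict],
) -> dict:
    """Check the input for safety and PII, generate an answer, then check
    the answer's faithfulness to the retrieved context."""
    safety = call("ftl-safety", {"input": user_input})
    if any(score >= SAFETY_THRESHOLD for score in safety.values()):
        return {"status": "blocked", "reason": "unsafe input"}

    pii = call("sensitive-information", {"input": user_input})
    if pii.get("fdl_sensitive_information_scores"):
        return {"status": "blocked", "reason": "PII in input"}

    answer = generate(user_input, context)
    faithfulness = call(
        "ftl-response-faithfulness", {"response": answer, "context": context}
    )
    if faithfulness["fdl_faithful_score"] < FAITHFULNESS_THRESHOLD:
        return {"status": "blocked", "reason": "unfaithful answer"}
    return {"status": "ok", "answer": answer}
```

Passing call and generate in as functions keeps the decision logic independent of the HTTP client and the LLM, so it can be exercised with canned responses.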

Next Steps

Quick Start - Get started with Fiddler Guardrails in 15 minutes
API Reference - Complete Guardrails API documentation
Tutorials - Explore detailed tutorials for Safety, PII, and Faithfulness
Concepts - Understand Fiddler Centor Models and enrichments
Monitoring - Integrate guardrails with LLM monitoring