Guardrails Quick Start

Fiddler Guardrails provide real-time protection for your LLM applications by detecting and preventing harmful content, PII leaks, and hallucinations before they reach your users.

Time to complete: ~15 minutes

What You’ll Learn

How to set up Fiddler Guardrails
How to use the four main guardrail types (Safety, PII, Secret Detection, Faithfulness)
How to interpret risk scores
How to integrate guardrails into your LLM application

Prerequisites

Fiddler Environment: Access to a Fiddler environment with Guardrails enabled
API Key: Generated from your Fiddler environment (Settings → Credentials)
Python 3.8+ (or any HTTP client)

Quick Start: Setting Up Guardrails

Step 1: Get Your API Key

Sign in to your organization’s Fiddler environment
Go to Settings → Credentials
Generate a Fiddler API key and copy it for use in the requests below

If you are not sure whether Guardrails is enabled for your environment, contact your Fiddler representative. For access setup, see the Guardrails Quick Start.

Step 2: Install Required Libraries (Optional)

# For Python
pip install requests

# Or use any HTTP client in your preferred language

Step 3: Configure Your Connection

import requests
import json

# Your API credentials
FIDDLER_URL = "https://your-instance.fiddler.ai"  # Replace with your Fiddler instance URL
API_KEY = "your-api-key-here"

# Standard headers for all guardrail requests
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}
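Hard-coding the key as above is fine for a first test, but you may prefer to keep it out of source files. The following is a minimal sketch that reads the key from an environment variable. The variable name FIDDLER_API_KEY and the helper build_headers are assumptions made for this illustration, not names defined by Fiddler.

```python
import os


def build_headers(env=os.environ):
    """Build the guardrail request headers from an environment mapping.

    FIDDLER_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = env.get("FIDDLER_API_KEY")
    if not api_key:
        raise RuntimeError("Set FIDDLER_API_KEY before calling the guardrail endpoints.")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

With the variable exported in your shell, HEADERS = build_headers() can replace the literal dictionary above.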

Guardrail Types and Usage

Each guardrail type has its own endpoint and request/response format. Choose the appropriate guardrail based on your protection needs.

🛡️ Safety Guardrails

Detect harmful, toxic, or jailbreaking content across 11 safety dimensions.

Endpoint: /v3/guardrails/ftl-safety

Use cases:

Content moderation
Jailbreak prevention
Toxic content detection

Example: Check for Harmful Content

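def check_safety(text):
    """Check text for safety violations across 11 dimensions."""
    response = requests.post(
        f"{FIDDLER_URL}/v3/guardrails/ftl-safety",
        headers=HEADERS,
        json={"data": {"input": text}},
        timeout=10,  # illustrative value; requests waits indefinitely without a timeout
    )
    # Raise requests.exceptions.HTTPError on 4xx/5xx instead of parsing an error body as scores
    response.raise_for_status()
    return response.json()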

# Test with potentially harmful content
test_input = "I want to cause harm to others"
safety_scores = check_safety(test_input)

print(json.dumps(safety_scores, indent=2))

Response Format:

{
  "fdl_harmful": 0.92,
  "fdl_violent": 0.78,
  "fdl_unethical": 0.65,
  "fdl_illegal": 0.45,
  "fdl_sexual": 0.12,
  "fdl_racist": 0.08,
  "fdl_jailbreaking": 0.23,
  "fdl_harassing": 0.54,
  "fdl_hateful": 0.71,
  "fdl_sexist": 0.15,
  "fdl_roleplaying": 0.05
}

Safety Dimensions:

fdl_harmful - General harmful content
fdl_violent - Violence and threats
fdl_unethical - Unethical behavior
fdl_illegal - Illegal activities
fdl_sexual - Sexual content
fdl_racist - Racist content
fdl_jailbreaking - Prompt manipulation attempts
fdl_harassing - Harassment
fdl_hateful - Hateful content
fdl_sexist - Sexist content
fdl_roleplaying - Prompting persona change

Interpreting Safety Scores

Starting in release 26.13, 0.5 is the calibrated default decision threshold for the Centor Model for Safety. The threshold values in these examples are illustrative — tune them for your data and risk tolerance. See safety threshold guidance.

Each dimension returns a score between 0 and 1:

0.0 - 0.3: Low risk (safe to proceed)
0.3 - 0.5: Elevated risk, below the calibrated 0.5 decision threshold (review recommended)
0.5 - 1.0: At or above the calibrated 0.5 decision threshold (block or flag for review)

def is_safe(safety_scores, threshold=0.5):
    """
    Check if content passes safety check.
    Returns True if ALL dimensions are below threshold.
    """
    dangerous_dimensions = [
        dim for dim, score in safety_scores.items()
        if score >= threshold
    ]

    if dangerous_dimensions:
        print(f"⚠️ Safety violations: {dangerous_dimensions}")
        return False

    print("✅ Content passed safety check")
    return True

# Use in your application
if is_safe(safety_scores):
    # Proceed with LLM processing
    pass
else:
    # Block or flag content
    print("Content blocked due to safety concerns")
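The three score ranges listed above can also be applied per dimension, so that a review queue and a hard block are handled differently. This is a sketch only: the function names risk_band and summarize_bands are hypothetical, and the 0.3 and 0.5 cut-offs simply mirror the illustrative ranges in this guide.

```python
def risk_band(score, review_at=0.3, block_at=0.5):
    """Map a single 0-1 score to 'low', 'review', or 'block'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "low"


def summarize_bands(safety_scores):
    """Group dimension names by the band their score falls into."""
    bands = {"low": [], "review": [], "block": []}
    for dim, score in safety_scores.items():
        bands[risk_band(score)].append(dim)
    return bands
```

Calling summarize_bands(safety_scores) on a response lets you block on any entry in "block" while only logging the entries in "review".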

→ Safety Guardrails Tutorial

🔒 PII Detection

Detect personally identifiable information (PII), protected health information (PHI), and custom sensitive data.

Endpoint: /v3/guardrails/sensitive-information

Use cases:

Data privacy compliance
GDPR/CCPA protection
Sensitive data redaction

Example 1: Detect PII

def detect_pii(text, entity_categories="PII"):
    """
    Detect sensitive information in text.

    Args:
        text: Input text to analyze
        entity_categories: "PII", "PHI", "Custom Entities", or list like ["PII", "PHI"]
    """
    payload = {
        "data": {
            "input": text,
            "entity_categories": entity_categories
        }
    }

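    response = requests.post(
        f"{FIDDLER_URL}/v3/guardrails/sensitive-information",
        headers=HEADERS,
        json=payload,
        timeout=10,  # illustrative value; requests waits indefinitely without a timeout
    )
    # Raise requests.exceptions.HTTPError on 4xx/5xx responses
    response.raise_for_status()
    return response.json()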

# Test with PII data
test_text = """
Contact John Doe at john.doe@email.com or call (555) 123-4567.
SSN: 123-45-6789. Credit card: 4111-1111-1111-1111.
"""

pii_results = detect_pii(test_text)
print(json.dumps(pii_results, indent=2))

Response Format:

{
  "fdl_sensitive_information_scores": [
    {
      "score": 0.987,
      "label": "person",
      "text": "John Doe",
      "start": 8,
      "end": 16
    },
    {
      "score": 0.998,
      "label": "email",
      "text": "john.doe@email.com",
      "start": 20,
      "end": 38
    },
    {
      "score": 0.991,
      "label": "social security number",
      "text": "123-45-6789",
      "start": 72,
      "end": 83
    }
  ]
}

Response Fields:

score - Confidence score (0.0 to 1.0)
label - Entity type (e.g., “email”, “social security number”)
text - The detected sensitive information
start / end - Character positions in the input text

Example 2: Detect PHI (Healthcare Data)

# Detect protected health information
healthcare_text = """
Patient John Smith prescribed metformin for diabetes.
Insurance number: HI-987654321.
"""

phi_results = detect_pii(healthcare_text, entity_categories="PHI")

# Display detected PHI entities
for entity in phi_results.get("fdl_sensitive_information_scores", []):
    print(f"Found {entity['label']}: '{entity['text']}' (confidence: {entity['score']:.3f})")

Example 3: Custom Entity Detection

# Detect organization-specific sensitive data
custom_text = "Employee ID: EMP-2024-001, API key: sk-abc123xyz789"

custom_results = detect_pii(
    custom_text,
    entity_categories="Custom Entities"
)

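# Note: For custom entities, you can also specify the entity types.
# detect_pii() above does not accept this field, so send the payload directly:
payload = {
    "data": {
        "input": custom_text,
        "entity_categories": "Custom Entities",
        "custom_entities": ["employee id", "api key", "project code"]
    }
}

custom_results = requests.post(
    f"{FIDDLER_URL}/v3/guardrails/sensitive-information",
    headers=HEADERS,
    json=payload
).json()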

Supported Entity Categories:

PII: comprehensive coverage including names, addresses, SSN, credit cards, emails, and phone numbers
PHI: healthcare-specific types including medication, medical conditions, and health insurance numbers
Custom Entities: Define your own sensitive data patterns

Processing PII Results

def redact_pii(text, pii_results):
    """Redact detected PII from text."""
    entities = pii_results.get("fdl_sensitive_information_scores", [])

    # Sort by position in reverse to maintain correct offsets
    entities_sorted = sorted(entities, key=lambda x: x['start'], reverse=True)

    redacted_text = text
    for entity in entities_sorted:
        redacted_text = (
            redacted_text[:entity['start']] +
            f"[REDACTED {entity['label'].upper()}]" +
            redacted_text[entity['end']:]
        )

    return redacted_text

# Use in your application
if pii_results.get("fdl_sensitive_information_scores"):
    clean_text = redact_pii(test_text, pii_results)
    print(f"Redacted: {clean_text}")
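Because every detected entity carries a confidence score and a label, you may want to redact only high-confidence hits or only certain entity types. The sketch below assumes the response shape shown above. The helper name filter_entities and the 0.9 default are illustrative choices, not recommended values.

```python
def filter_entities(pii_results, min_score=0.9, labels=None):
    """Keep entities at or above min_score, optionally restricted to given labels."""
    entities = pii_results.get("fdl_sensitive_information_scores", [])
    return [
        entity for entity in entities
        if entity["score"] >= min_score
        and (labels is None or entity["label"] in labels)
    ]
```

The filtered list can be wrapped back into the same shape, for example {"fdl_sensitive_information_scores": filter_entities(pii_results)}, and passed to redact_pii.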

→ PII Detection Tutorial

🔑 Secret Detection

Detect credentials, API keys, and tokens across ~42 known formats plus high-entropy unknown secrets.

Endpoint: /v3/guardrails/secret-detection

Use cases:

Prevent credentials from leaking through LLM prompts or responses
Detect and redact secrets before they are logged or forwarded

Example: Detect Secrets

def detect_secrets(text):
    """Detect credentials and API keys in text."""
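    response = requests.post(
        f"{FIDDLER_URL}/v3/guardrails/secret-detection",
        headers=HEADERS,
        json={"data": {"input": text}},
        timeout=10,  # illustrative value; requests waits indefinitely without a timeout
    )
    # Raise requests.exceptions.HTTPError on 4xx/5xx responses
    response.raise_for_status()
    return response.json()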

# Test with an API key in the prompt
test_input = "Use this key to call the API: sk-ant-api03-abcdefghijklmnopqrstu"
secret_results = detect_secrets(test_input)

print(json.dumps(secret_results, indent=2))

Response Format:

{
  "fdl_secret_detection_scores": [
    {
      "label": "Anthropic API Key",
      "start": 30,
      "end": 64
    }
  ]
}

Response Fields:

label - Secret type (e.g., "Anthropic API Key", "AWS Access Key ID", or "Possible Secret" for entropy-based detections)
start / end - Character positions for precise redaction

Processing Secret Results

def redact_secrets(text, secret_results):
    """Redact detected secrets from text."""
    secrets = secret_results.get("fdl_secret_detection_scores", [])

    # Apply right-to-left to preserve offsets
    for secret in sorted(secrets, key=lambda s: s["start"], reverse=True):
        label = secret["label"].upper().replace(" ", "_")
        text = (
            text[: secret["start"]] +
            f"[REDACTED {label}]" +
            text[secret["end"] :]
        )

    return text

# Use in your application
if secret_results.get("fdl_secret_detection_scores"):
    clean_text = redact_secrets(test_input, secret_results)
    print(f"Redacted: {clean_text}")
    # Redacted: Use this key to call the API: [REDACTED ANTHROPIC_API_KEY]
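Right-to-left replacement keeps offsets valid only when spans do not overlap. If you run several detectors on the same original text and combine their results, two detections may cover the same characters. The following sketch merges overlapping spans before redacting. It assumes each entity has "start", "end", and "label" keys as in the responses above; merge_spans and redact_spans are hypothetical helper names.

```python
def merge_spans(entities):
    """Merge overlapping spans, collecting the labels that contributed to each."""
    merged = []
    for ent in sorted(entities, key=lambda e: (e["start"], e["end"])):
        if merged and ent["start"] < merged[-1]["end"]:
            # Overlaps the previous span: extend it instead of adding a new one
            last = merged[-1]
            last["end"] = max(last["end"], ent["end"])
            if ent["label"] not in last["labels"]:
                last["labels"].append(ent["label"])
        else:
            merged.append(
                {"start": ent["start"], "end": ent["end"], "labels": [ent["label"]]}
            )
    return merged


def redact_spans(text, entities, placeholder="[REDACTED]"):
    """Replace each merged span with a placeholder, working right to left."""
    for span in sorted(merge_spans(entities), key=lambda s: s["start"], reverse=True):
        text = text[: span["start"]] + placeholder + text[span["end"] :]
    return text
```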

→ Secret Detection Tutorial

✅ Faithfulness Detection

Detect hallucinations and unsupported claims by comparing LLM outputs to source context (for RAG applications) using Fiddler Centor Models.

Endpoint: /v3/guardrails/ftl-response-faithfulness

This guardrail uses the Centor Model for Faithfulness for real-time content blocking. For RAG pipeline diagnostics using the LLM-as-a-Judge approach, see RAG Health Metrics.

Use cases:

RAG application accuracy
Fact-checking
Hallucination prevention

Example: Check Response Faithfulness

def check_faithfulness(llm_response, source_context):
    """
    Check if LLM response is faithful to the provided context.

    Args:
        llm_response: The text generated by your LLM
        source_context: The reference text from your knowledge base/retrieval
    """
    payload = {
        "data": {
            "response": llm_response,
            "context": source_context
        }
    }

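    response = requests.post(
        f"{FIDDLER_URL}/v3/guardrails/ftl-response-faithfulness",
        headers=HEADERS,
        json=payload,
        timeout=10,  # illustrative value; requests waits indefinitely without a timeout
    )
    # Raise requests.exceptions.HTTPError on 4xx/5xx responses
    response.raise_for_status()
    return response.json()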

# Test with RAG example
retrieved_context = """
The Eiffel Tower is located in Paris, France. It was completed in 1889
and stands 330 meters tall. It was designed by Gustave Eiffel.
"""

llm_response_correct = "The Eiffel Tower in Paris is 330 meters tall and was completed in 1889."
llm_response_hallucinated = "The Eiffel Tower in Paris is 450 meters tall and was completed in 1895."

# Check faithful response
faithful_score = check_faithfulness(llm_response_correct, retrieved_context)
print(f"Faithful response score: {faithful_score}")

# Check hallucinated response
hallucinated_score = check_faithfulness(llm_response_hallucinated, retrieved_context)
print(f"Hallucinated response score: {hallucinated_score}")

Response Format:

{
  "fdl_faithful_score": 0.92
}

Score Interpretation:

0.0 - 0.3: Low faithfulness (likely hallucination)
0.3 - 0.7: Medium faithfulness (review recommended)
0.7 - 1.0: High faithfulness (response is well-supported by context)

def is_faithful(faithfulness_result, threshold=0.7):
    """Check if response is faithful to context."""
    score = faithfulness_result.get("fdl_faithful_score", 0.0)

    if score >= threshold:
        print(f"✅ Response is faithful (score: {score:.3f})")
        return True
    else:
        print(f"⚠️ Possible hallucination detected (score: {score:.3f})")
        return False

# Use in your RAG application
if not is_faithful(faithful_score):
    print("Warning: LLM response may contain unsupported claims")

→ Faithfulness Tutorial

Common Integration Patterns

Pattern 1: Pre-Processing (Input Guardrails)

Check user input before sending to your LLM:

def process_user_input(user_message):
    """Process and validate user input before LLM processing."""

    # Step 1: Check safety
    safety_scores = check_safety(user_message)

    # Block if any safety dimension meets or exceeds the 0.5 threshold
    max_safety_score = max(safety_scores.values())
    if max_safety_score >= 0.5:
        return {
            "error": "Your message contains inappropriate content.",
            "blocked": True
        }

    # Step 2: Check for secrets and redact if found
    secret_results = detect_secrets(user_message)

    if secret_results.get("fdl_secret_detection_scores"):
        user_message = redact_secrets(user_message, secret_results)
        print("⚠️ Secret detected and redacted")

    # Step 3: Check for PII and redact if needed
    pii_results = detect_pii(user_message)

    if pii_results.get("fdl_sensitive_information_scores"):
        # Redact PII before sending to LLM
        user_message = redact_pii(user_message, pii_results)
        print("⚠️ PII detected and redacted")

    # Step 4: Proceed with LLM call
    return {
        "message": user_message,
        "blocked": False
    }

# Example usage
user_input = "My SSN is 123-45-6789. Can you help me?"
result = process_user_input(user_input)

if not result.get("blocked"):
    # Safe to send to LLM
    llm_response = call_your_llm(result["message"])

Pattern 2: Post-Processing (Output Guardrails)

Check LLM output before returning to user:

def validate_llm_output(llm_response, retrieval_context=None):
    """Validate LLM output before returning to user."""

    # Step 1: Check for PII in output
    pii_results = detect_pii(llm_response)

    if pii_results.get("fdl_sensitive_information_scores"):
        # Redact any PII in the response
        llm_response = redact_pii(llm_response, pii_results)
        print("⚠️ PII detected in LLM output and redacted")

    # Step 2: Check faithfulness (for RAG applications)
    if retrieval_context:
        faithfulness_result = check_faithfulness(llm_response, retrieval_context)

        if not is_faithful(faithfulness_result, threshold=0.7):
            return {
                "response": llm_response,
                "warning": "This response may contain information not supported by source documents."
            }

    return {
        "response": llm_response,
        "warning": None
    }

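# Example usage in RAG application
def answer_with_guardrails(user_query):
    """Retrieve context, generate a response, and validate it before returning."""
    # retrieve_from_knowledge_base and generate_llm_response are placeholders for your own code
    context = retrieve_from_knowledge_base(user_query)
    llm_output = generate_llm_response(user_query, context)
    validated = validate_llm_output(llm_output, context)

    if validated.get("warning"):
        print(f"⚠️ {validated['warning']}")

    return validated["response"]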

Pattern 3: Complete LLM Pipeline with Multiple Guardrails

def safe_llm_pipeline(user_input, use_rag=True):
    """Complete LLM pipeline with comprehensive guardrails."""

    # === INPUT GUARDRAILS ===

    # 1. Safety check
    safety_scores = check_safety(user_input)
    if max(safety_scores.values()) >= 0.5:
        return {"error": "Inappropriate content detected", "blocked": True}

    # 2. Secret detection and redaction
    secret_input = detect_secrets(user_input)
    if secret_input.get("fdl_secret_detection_scores"):
        user_input = redact_secrets(user_input, secret_input)

    # 3. PII detection and redaction
    pii_input = detect_pii(user_input)
    if pii_input.get("fdl_sensitive_information_scores"):
        user_input = redact_pii(user_input, pii_input)

    # === LLM PROCESSING ===

    context = None
    if use_rag:
        context = retrieve_from_knowledge_base(user_input)

    llm_response = generate_llm_response(user_input, context)

    # === OUTPUT GUARDRAILS ===

    # 4. PII detection in output
    pii_output = detect_pii(llm_response)
    if pii_output.get("fdl_sensitive_information_scores"):
        llm_response = redact_pii(llm_response, pii_output)

    # 5. Faithfulness check (for RAG)
    warning = None
    if use_rag and context:
        faithfulness = check_faithfulness(llm_response, context)
        if faithfulness.get("fdl_faithful_score", 0) < 0.7:
            warning = "Response may contain unsupported claims"

    return {
        "response": llm_response,
        "warning": warning,
        "blocked": False
    }

Best Practices

Layer Multiple Guardrails: Use safety + secret detection + PII for inputs, faithfulness + PII for outputs
Set Appropriate Thresholds: Adjust risk score thresholds based on your use case sensitivity
Log All Checks: Track guardrail results for monitoring and continuous improvement
Handle Gracefully: Provide helpful user-facing messages when content is blocked
Monitor Performance: Track false positives/negatives and adjust thresholds accordingly
Consider Latency: Guardrail checks add ~100-300 ms per call; use async calls when possible
Mind deployment capacity: Guardrail throughput is governed by the resources provisioned for your Fiddler deployment; batch and parallelize within that capacity
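
One way to act on the latency and capacity points is to run independent checks on the same text concurrently rather than one after another. This is a minimal sketch using the standard library thread pool. The function name run_checks_parallel and the max_workers default are assumptions for illustration; size the pool to what your deployment can serve.

```python
from concurrent.futures import ThreadPoolExecutor


def run_checks_parallel(text, checks, max_workers=4):
    """Run independent checks on the same text concurrently.

    checks maps a name to a callable that takes the text.
    Returns a dict of name -> result; an exception raised by any
    check is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in checks.items()}
        return {name: future.result() for name, future in futures.items()}
```

For example, run_checks_parallel(user_input, {"safety": check_safety, "secrets": detect_secrets, "pii": detect_pii}) issues the three requests at once. Note that this detects on the original text, so redaction has to be applied afterwards from the combined results.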

Error Handling

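def safe_guardrail_check(guardrail_func, *args, **kwargs):
    """Wrapper for safe guardrail execution with error handling.

    The wrapped function must call response.raise_for_status() and pass a
    timeout to requests for the HTTPError and Timeout branches to trigger.
    """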
    try:
        response = guardrail_func(*args, **kwargs)
        return response, None

    except requests.exceptions.HTTPError as e:
        if e.response.status_code == 401:
            error = "Authentication failed. Check your API key."
        elif e.response.status_code == 413:
            error = "Input exceeds token length limit."
        elif e.response.status_code == 429:
            error = "Rate limit exceeded. Please retry later."
        else:
            error = f"HTTP error: {e.response.status_code}"

        return None, error

    except requests.exceptions.Timeout:
        return None, "Request timed out."

    except Exception as e:
        return None, f"Unexpected error: {str(e)}"

# Usage
safety_result, error = safe_guardrail_check(check_safety, user_input)
if error:
    print(f"Guardrail check failed: {error}")
    # Fallback behavior
else:
    # Process safety_result
    pass
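
For transient failures such as rate limiting, a retry with exponential backoff is a common complement to the wrapper above. The sketch below is generic and makes no claim about Fiddler's actual rate-limit behavior. The helper name retry_with_backoff, the retry count, and the delays are all illustrative assumptions.

```python
import time


def retry_with_backoff(func, *args, retries=3, base_delay=0.5,
                       should_retry=lambda exc: True, sleep=time.sleep, **kwargs):
    """Call func, retrying on exceptions with exponentially growing delays.

    should_retry decides whether a given exception is worth retrying;
    sleep is injectable so the delays can be observed in tests.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Give up on the last attempt or on non-retryable errors
            if attempt == retries or not should_retry(exc):
                raise
            sleep(base_delay * (2 ** attempt))
```

To retry only on HTTP 429, you could pass should_retry=lambda exc: getattr(getattr(exc, "response", None), "status_code", None) == 429, which relies on the guardrail function raising requests.exceptions.HTTPError for error responses.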

Next Steps

API Reference: Complete Guardrails API Documentation
Setup Guide: Complete Guardrails Setup
Concepts: Guardrails Overview
Tutorials:
FAQ: Guardrails Frequently Asked Questions
Monitoring: Integrate Guardrails with LLM Monitoring

Summary

You’ve learned how to:

✅ Use Safety Guardrails to detect harmful content across 11 dimensions
✅ Detect and redact PII, PHI, and custom sensitive information
✅ Detect and redact credentials, API keys, and tokens
✅ Check response faithfulness to prevent hallucinations in RAG applications
✅ Integrate multiple guardrails into your LLM pipeline
✅ Handle errors and respect rate limits

Each guardrail type uses a different endpoint and response format optimized for its specific protection purpose. Combine multiple guardrails for comprehensive LLM application safety.
