Guardrails Quick Start

Fiddler Guardrails provide real-time protection for your LLM applications by detecting and preventing harmful content, PII leaks, and hallucinations before they reach your users.

Time to complete: ~15 minutes

What You'll Learn

How to set up Fiddler Guardrails
The common execution pattern for all guardrail types
How to interpret risk scores
How to integrate guardrails into your LLM application

Prerequisites

Fiddler Guardrails Account: Sign up for Free Guardrails
API Key: Generated from your Fiddler Guardrails dashboard
Python 3.8+ (or any HTTP client)

Quick Start: Common Execution Pattern

All Fiddler Guardrails follow the same execution pattern, making it easy to protect your application with multiple guardrail types.

Step 1: Get Your API Key

Sign up at fiddler.ai/free-guardrails
Activate your account via email
Generate your API key from the dashboard

For detailed setup instructions, see the Guardrails Setup Guide.

Step 2: Install Required Libraries (Optional)

# For Python
pip install requests

# Or use any HTTP client in your preferred language

Step 3: Make a Guardrail Request

The execution pattern is the same for all guardrail types:

import requests
import json

# Your API credentials
API_KEY = "your-api-key-here"
API_URL = "https://api.fiddler.ai/guardrails/v1"

# Content to check
content_to_check = {
    "inputs": ["What is the capital of France?"],
    # For faithfulness, include context:
    # "context": ["Paris is the capital of France..."]
}

# Choose your guardrail type:
# - "safety" - Detect harmful, toxic, or jailbreaking content
# - "pii" - Detect personally identifiable information
# - "faithfulness" - Detect hallucinations and unsupported claims

guardrail_type = "safety"  # Change this to test different guardrails

# Make API request
response = requests.post(
    f"{API_URL}/{guardrail_type}",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json=content_to_check
)

# Parse results
results = response.json()
print(json.dumps(results, indent=2))

Step 4: Interpret Risk Scores

All guardrails return risk scores between 0 and 1:

0.0 - 0.3: Low risk (safe to proceed)
0.3 - 0.7: Medium risk (review recommended)
0.7 - 1.0: High risk (block or flag for review)

# Example response
{
  "scores": [0.15],  # Low risk - content is safe
  "threshold": 0.5,
  "passed": [True]   # Content passed the guardrail check
}

Step 5: Integrate into Your Application

Add guardrails as a protective layer before LLM inference:

def check_guardrail(content, guardrail_type="safety"):
    """Check content against Fiddler Guardrails"""
    response = requests.post(
        f"{API_URL}/{guardrail_type}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"inputs": [content]}
    )
    result = response.json()
    return result["passed"][0], result["scores"][0]

# In your LLM application
user_input = "User's message here..."

# Check input safety
is_safe, risk_score = check_guardrail(user_input, "safety")

if not is_safe:
    return "I'm sorry, I can't process that request."

# Proceed with LLM inference only if content is safe
llm_response = call_your_llm(user_input)

# Optionally, check output for PII or hallucinations
has_pii, pii_score = check_guardrail(llm_response, "pii")
if has_pii:
    llm_response = redact_pii(llm_response)

return llm_response

Available Guardrail Types

🛡️ Safety Guardrails

Detect harmful, toxic, or jailbreaking content in user inputs and LLM outputs.

Use cases:

Content moderation
Jailbreak prevention
Toxic content detection

→ Safety Guardrails Tutorial

🔒 PII Detection

Identify and prevent personally identifiable information (PII) leaks.

Use cases:

Data privacy compliance
GDPR/CCPA protection
Sensitive data redaction

→ PII Detection Tutorial

✅ Faithfulness Detection

Detect hallucinations and unsupported claims by comparing outputs to source context.

Use cases:

RAG application accuracy
Fact-checking
Hallucination prevention

→ Faithfulness Tutorial

Common Use Cases

Pre-Processing (Input Guardrails)

# Check user input before sending to LLM
user_input = request.get("user_message")

# Safety check
is_safe, _ = check_guardrail(user_input, "safety")
if not is_safe:
    return {"error": "Inappropriate content detected"}

# PII check
has_pii, _ = check_guardrail(user_input, "pii")
if has_pii:
    user_input = redact_pii(user_input)

# Now safe to process with LLM
response = llm.generate(user_input)

Post-Processing (Output Guardrails)

# Check LLM output before returning to user
llm_output = llm.generate(user_input)

# Check for hallucinations
is_faithful, _ = check_guardrail(
    llm_output,
    "faithfulness",
    context=retrieval_context
)

if not is_faithful:
    return {"warning": "Response may contain unsupported claims"}

# Check for PII in output
has_pii, _ = check_guardrail(llm_output, "pii")
if has_pii:
    llm_output = redact_pii(llm_output)

return {"response": llm_output}

Best Practices

Layer Multiple Guardrails: Use safety + PII for inputs, faithfulness for outputs
Set Appropriate Thresholds: Adjust risk score thresholds based on your use case
Log All Checks: Track guardrail results for monitoring and improvement
Handle Gracefully: Provide helpful error messages when content is blocked
Monitor Performance: Track false positives/negatives and adjust as needed

Next Steps

Setup: Complete Guardrails Setup Guide
Concepts: Guardrails Overview
Tutorials:
FAQ: Guardrails Frequently Asked Questions
API Reference: Guardrails API Documentation

PreviousLLM Evaluation Quick Start NextAgentic & LLM Monitoring