Guardrails Quick Start

Fiddler Guardrails provide real-time protection for your LLM applications by detecting and preventing harmful content, PII leaks, and hallucinations before they reach your users.

Time to complete: ~15 minutes

What You'll Learn

  • How to set up Fiddler Guardrails

  • The common execution pattern for all guardrail types

  • How to interpret risk scores

  • How to integrate guardrails into your LLM application

Prerequisites

  • Fiddler Guardrails Account: Sign up for Free Guardrails

  • API Key: Generated from your Fiddler Guardrails dashboard

  • Python 3.8+ (or any HTTP client)


Quick Start: Common Execution Pattern

All Fiddler Guardrails follow the same execution pattern, making it easy to protect your application with multiple guardrail types.

Step 1: Get Your API Key

  1. Activate your account via email

  2. Generate your API key from the dashboard

For detailed setup instructions, see the Guardrails Setup Guide.

Step 2: Install Required Libraries (Optional)

# For Python
pip install requests

# Or use any HTTP client in your preferred language

Step 3: Make a Guardrail Request

The execution pattern is the same for all guardrail types:

import requests
import json

# Your API credentials
API_KEY = "your-api-key-here"
API_URL = "https://api.fiddler.ai/guardrails/v1"

# Content to check
content_to_check = {
    "inputs": ["What is the capital of France?"],
    # For faithfulness, include context:
    # "context": ["Paris is the capital of France..."]
}

# Choose your guardrail type:
# - "safety" - Detect harmful, toxic, or jailbreaking content
# - "pii" - Detect personally identifiable information
# - "faithfulness" - Detect hallucinations and unsupported claims

guardrail_type = "safety"  # Change this to test different guardrails

# Make API request
response = requests.post(
    f"{API_URL}/{guardrail_type}",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json=content_to_check
)

# Parse results
results = response.json()
print(json.dumps(results, indent=2))

Step 4: Interpret Risk Scores

All guardrails return risk scores between 0 and 1:

  • 0.0 - 0.3: Low risk (safe to proceed)

  • 0.3 - 0.7: Medium risk (review recommended)

  • 0.7 - 1.0: High risk (block or flag for review)

# Example response
{
  "scores": [0.15],  # Low risk - content is safe
  "threshold": 0.5,
  "passed": [True]   # Content passed the guardrail check
}

Step 5: Integrate into Your Application

Add guardrails as a protective layer before LLM inference:

def check_guardrail(content, guardrail_type="safety"):
    """Check content against Fiddler Guardrails"""
    response = requests.post(
        f"{API_URL}/{guardrail_type}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"inputs": [content]}
    )
    result = response.json()
    return result["passed"][0], result["scores"][0]

# In your LLM application
user_input = "User's message here..."

# Check input safety
is_safe, risk_score = check_guardrail(user_input, "safety")

if not is_safe:
    return "I'm sorry, I can't process that request."

# Proceed with LLM inference only if content is safe
llm_response = call_your_llm(user_input)

# Optionally, check output for PII or hallucinations
has_pii, pii_score = check_guardrail(llm_response, "pii")
if has_pii:
    llm_response = redact_pii(llm_response)

return llm_response

Available Guardrail Types

🛡️ Safety Guardrails

Detect harmful, toxic, or jailbreaking content in user inputs and LLM outputs.

Use cases:

  • Content moderation

  • Jailbreak prevention

  • Toxic content detection

Safety Guardrails Tutorial

🔒 PII Detection

Identify and prevent personally identifiable information (PII) leaks.

Use cases:

  • Data privacy compliance

  • GDPR/CCPA protection

  • Sensitive data redaction

PII Detection Tutorial

✅ Faithfulness Detection

Detect hallucinations and unsupported claims by comparing outputs to source context.

Use cases:

  • RAG application accuracy

  • Fact-checking

  • Hallucination prevention

Faithfulness Tutorial


Common Use Cases

Pre-Processing (Input Guardrails)

# Check user input before sending to LLM
user_input = request.get("user_message")

# Safety check
is_safe, _ = check_guardrail(user_input, "safety")
if not is_safe:
    return {"error": "Inappropriate content detected"}

# PII check
has_pii, _ = check_guardrail(user_input, "pii")
if has_pii:
    user_input = redact_pii(user_input)

# Now safe to process with LLM
response = llm.generate(user_input)

Post-Processing (Output Guardrails)

# Check LLM output before returning to user
llm_output = llm.generate(user_input)

# Check for hallucinations
is_faithful, _ = check_guardrail(
    llm_output,
    "faithfulness",
    context=retrieval_context
)

if not is_faithful:
    return {"warning": "Response may contain unsupported claims"}

# Check for PII in output
has_pii, _ = check_guardrail(llm_output, "pii")
if has_pii:
    llm_output = redact_pii(llm_output)

return {"response": llm_output}

Best Practices

  1. Layer Multiple Guardrails: Use safety + PII for inputs, faithfulness for outputs

  2. Set Appropriate Thresholds: Adjust risk score thresholds based on your use case

  3. Log All Checks: Track guardrail results for monitoring and improvement

  4. Handle Gracefully: Provide helpful error messages when content is blocked

  5. Monitor Performance: Track false positives/negatives and adjust as needed


Next Steps