Guardrails Quick Start
Fiddler Guardrails provide real-time protection for your LLM applications by detecting and preventing harmful content, PII leaks, and hallucinations before they reach your users.
Time to complete: ~15 minutes
What You'll Learn
How to set up Fiddler Guardrails
The common execution pattern for all guardrail types
How to interpret risk scores
How to integrate guardrails into your LLM application
Prerequisites
Fiddler Guardrails Account: Sign up for Free Guardrails
API Key: Generated from your Fiddler Guardrails dashboard
Python 3.8+ with the requests library (or any other language with an HTTP client)
Quick Start: Common Execution Pattern
All Fiddler Guardrails follow the same execution pattern, making it easy to protect your application with multiple guardrail types.
Step 1: Get Your API Key
Sign up at fiddler.ai/free-guardrails
Activate your account via email
Generate your API key from the dashboard
For detailed setup instructions, see the Guardrails Setup Guide.
Step 2: Install Required Libraries (Optional)
# For Python
pip install requests
# Or use any HTTP client in your preferred language
Step 3: Make a Guardrail Request
The execution pattern is the same for all guardrail types:
import requests
import json
# Your API credentials
API_KEY = "your-api-key-here"
API_URL = "https://api.fiddler.ai/guardrails/v1"
# Content to check
content_to_check = {
    "inputs": ["What is the capital of France?"],
    # For faithfulness, include context:
    # "context": ["Paris is the capital of France..."]
}
# Choose your guardrail type:
# - "safety" - Detect harmful, toxic, or jailbreaking content
# - "pii" - Detect personally identifiable information
# - "faithfulness" - Detect hallucinations and unsupported claims
guardrail_type = "safety" # Change this to test different guardrails
# Make API request
response = requests.post(
    f"{API_URL}/{guardrail_type}",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json=content_to_check
)
# Parse results
results = response.json()
print(json.dumps(results, indent=2))
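If the request fails (for example, an invalid API key or rate limiting), the response body may not contain scores. A minimal defensive sketch using the requests library's built-in status check; the exact error codes returned by the service are not specified in this guide:
try:
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    results = response.json()
    print(json.dumps(results, indent=2))
except requests.HTTPError as err:
    print(f"Guardrail request failed: {err}")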
Step 4: Interpret Risk Scores
All guardrails return risk scores between 0 and 1:
0.0 - 0.3: Low risk (safe to proceed)
0.3 - 0.7: Medium risk (review recommended)
0.7 - 1.0: High risk (block or flag for review)
# Example response
{
    "scores": [0.15],  # Low risk - content is safe
    "threshold": 0.5,
    "passed": [True]  # Content passed the guardrail check
}
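One way to act on these scores is a small routing helper built around the bands above; the cutoffs are the illustrative values from this guide and should be tuned to your own use case:
def route_by_risk(score):
    """Map a guardrail risk score (0-1) to an action using the bands above."""
    if score < 0.3:
        return "proceed"  # low risk
    if score < 0.7:
        return "review"   # medium risk - flag for human review
    return "block"        # high risk

print(route_by_risk(0.15))  # proceed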
Step 5: Integrate into Your Application
Add guardrails as a protective layer before LLM inference:
def check_guardrail(content, guardrail_type="safety", context=None):
    """Check content against Fiddler Guardrails; returns (passed, score)."""
    payload = {"inputs": [content]}
    if context is not None:
        # Faithfulness checks compare the content against source context
        payload["context"] = context
    response = requests.post(
        f"{API_URL}/{guardrail_type}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload
    )
    result = response.json()
    return result["passed"][0], result["scores"][0]
# In your LLM application
def handle_user_message(user_input):  # e.g. "User's message here..."
    # Check input safety
    is_safe, risk_score = check_guardrail(user_input, "safety")
    if not is_safe:
        return "I'm sorry, I can't process that request."
    # Proceed with LLM inference only if content is safe
    llm_response = call_your_llm(user_input)
    # Optionally, check output for PII (passed=True means no PII was found)
    pii_passed, pii_score = check_guardrail(llm_response, "pii")
    if not pii_passed:
        llm_response = redact_pii(llm_response)
    return llm_response
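Note that call_your_llm and redact_pii above are placeholders for your own application code, not part of the Guardrails API. If you need a starting point, a deliberately rough redaction sketch might look like the following; real redaction should be driven by the PII guardrail's findings and your compliance requirements:
import re

def redact_pii(text):
    """Illustrative placeholder: mask simple email and phone number patterns."""
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[REDACTED EMAIL]", text)
    text = re.sub(r"\+?\d[\d\s().-]{7,}\d", "[REDACTED PHONE]", text)
    return text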
Available Guardrail Types
🛡️ Safety Guardrails
Detect harmful, toxic, or jailbreaking content in user inputs and LLM outputs.
Use cases:
Content moderation
Jailbreak prevention
Toxic content detection
🔒 PII Detection
Identify and prevent personally identifiable information (PII) leaks.
Use cases:
Data privacy compliance
GDPR/CCPA protection
Sensitive data redaction
✅ Faithfulness Detection
Detect hallucinations and unsupported claims by comparing outputs to source context (see the request sketch after the use cases below).
Use cases:
RAG application accuracy
Fact-checking
Hallucination prevention
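For example, a faithfulness request pairs the generated answer with the source passages it should be grounded in, using the same endpoint pattern and the context field shown in Step 3; the answer and context strings below are illustrative:
# Check whether a generated answer is supported by the retrieved context
faithfulness_payload = {
    "inputs": ["Paris is the capital of France."],
    "context": ["Paris is the capital and most populous city of France."]
}
response = requests.post(
    f"{API_URL}/faithfulness",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=faithfulness_payload
)
print(response.json())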
Common Use Cases
Pre-Processing (Input Guardrails)
# Check user input before sending to LLM
user_input = request.get("user_message")
# Safety check
is_safe, _ = check_guardrail(user_input, "safety")
if not is_safe:
    return {"error": "Inappropriate content detected"}
# PII check (passed=True means no PII was found)
pii_passed, _ = check_guardrail(user_input, "pii")
if not pii_passed:
    user_input = redact_pii(user_input)
# Now safe to process with LLM
response = llm.generate(user_input)
Post-Processing (Output Guardrails)
# Check LLM output before returning to user
llm_output = llm.generate(user_input)
# Check for hallucinations against the retrieved source passages
is_faithful, _ = check_guardrail(
    llm_output,
    "faithfulness",
    context=retrieval_context
)
if not is_faithful:
    return {"warning": "Response may contain unsupported claims"}
# Check for PII in output (passed=True means no PII was found)
pii_passed, _ = check_guardrail(llm_output, "pii")
if not pii_passed:
    llm_output = redact_pii(llm_output)
return {"response": llm_output}
Best Practices
Layer Multiple Guardrails: Use safety + PII for inputs, faithfulness for outputs
Set Appropriate Thresholds: Adjust risk score thresholds based on your use case (see the sketch after this list)
Log All Checks: Track guardrail results for monitoring and improvement
Handle Gracefully: Provide helpful error messages when content is blocked
Monitor Performance: Track false positives/negatives and adjust as needed
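As a sketch of the threshold and logging practices above, a thin wrapper around the check_guardrail helper from Step 5 can apply your own cutoff and record every check; the guarded_check name, logger name, and default threshold are illustrative, not part of the Fiddler API:
import logging

logger = logging.getLogger("guardrails")

def guarded_check(content, guardrail_type="safety", threshold=0.7):
    """Apply a custom risk threshold and log the outcome of every check."""
    _, score = check_guardrail(content, guardrail_type)
    passed = score < threshold
    logger.info("guardrail=%s score=%.3f passed=%s", guardrail_type, score, passed)
    return passed, score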
Next Steps
Concepts: Guardrails Overview
API Reference: Guardrails API Documentation