Guardrails Quick Start

Fiddler Guardrails provide real-time protection for your LLM applications by detecting and preventing harmful content, PII leaks, and hallucinations before they reach your users.

Time to complete: ~15 minutes

What You'll Learn

  • How to set up Fiddler Guardrails

  • How to use the three main guardrail types (Safety, PII, Faithfulness)

  • How to interpret risk scores

  • How to integrate guardrails into your LLM application

Prerequisites

  • Fiddler Guardrails Account: Sign up for Free Guardrails

  • API Key: Generated from your Fiddler Guardrails dashboard

  • Python 3.8+ (or any HTTP client)


Quick Start: Setting Up Guardrails

Step 1: Get Your API Key

  1. Activate your account via email

  2. Generate your API key from the dashboard

For detailed setup instructions, see the Guardrails Getting Started Guide.

Step 2: Install Required Libraries (Optional)

Step 3: Configure Your Connection


Guardrail Types and Usage

Each guardrail type has its own endpoint and request/response format. Choose the appropriate guardrail based on your protection needs.

🛡️ Safety Guardrails

Detect harmful, toxic, or jailbreaking content across 10 safety dimensions.

Endpoint: /v3/guardrails/ftl-safety

Use cases:

  • Content moderation

  • Jailbreak prevention

  • Toxic content detection

Example: Check for Harmful Content

Response Format:

Safety Dimensions:

  • fdl_harmful - General harmful content

  • fdl_violent - Violence and threats

  • fdl_unethical - Unethical behavior

  • fdl_illegal - Illegal activities

  • fdl_sexual - Sexual content

  • fdl_racist - Racist content

  • fdl_jailbreaking - Prompt manipulation attempts

  • fdl_harassing - Harassment

  • fdl_hateful - Hateful content

  • fdl_sexist - Sexist content

Interpreting Safety Scores

Each dimension returns a score between 0 and 1:

  • 0.0 - 0.3: Low risk (safe to proceed)

  • 0.3 - 0.7: Medium risk (review recommended)

  • 0.7 - 1.0: High risk (block or flag for review)

Safety Guardrails Tutorial


🔒 PII Detection

Detect personally identifiable information (PII), protected health information (PHI), and custom sensitive data.

Endpoint: /v3/guardrails/sensitive-information

Use cases:

  • Data privacy compliance

  • GDPR/CCPA protection

  • Sensitive data redaction

Example 1: Detect PII

Response Format:

Response Fields:

  • score - Confidence score (0.0 to 1.0)

  • label - Entity type (e.g., "email", "social_security_number")

  • text - The detected sensitive information

  • start / end - Character positions in the input text

Example 2: Detect PHI (Healthcare Data)

Example 3: Custom Entity Detection

Supported Entity Categories:

  • PII: 35+ types including names, addresses, SSN, credit cards, emails, phone numbers

  • PHI: 7 healthcare-specific types (medication, medical conditions, health insurance numbers)

  • Custom Entities: Define your own sensitive data patterns

Processing PII Results

PII Detection Tutorial


✅ Faithfulness Detection

Detect hallucinations and unsupported claims by comparing LLM outputs to source context (for RAG applications).

Endpoint: /v3/guardrails/ftl-response-faithfulness

Use cases:

  • RAG application accuracy

  • Fact-checking

  • Hallucination prevention

Example: Check Response Faithfulness

Response Format:

Score Interpretation:

  • 0.0 - 0.3: Low faithfulness (likely hallucination)

  • 0.3 - 0.7: Medium faithfulness (review recommended)

  • 0.7 - 1.0: High faithfulness (response is well-supported by context)

Faithfulness Tutorial


Common Integration Patterns

Pattern 1: Pre-Processing (Input Guardrails)

Check user input before sending to your LLM:

Pattern 2: Post-Processing (Output Guardrails)

Check LLM output before returning to user:

Pattern 3: Complete LLM Pipeline with Multiple Guardrails


Best Practices

  1. Layer Multiple Guardrails: Use safety + PII for inputs, faithfulness + PII for outputs

  2. Set Appropriate Thresholds: Adjust risk score thresholds based on your use case sensitivity

  3. Log All Checks: Track guardrail results for monitoring and continuous improvement

  4. Handle Gracefully: Provide helpful user-facing messages when content is blocked

  5. Monitor Performance: Track false positives/negatives and adjust thresholds accordingly

  6. Consider Latency: Guardrail checks add ~100-300ms - use async calls when possible

  7. Respect Rate Limits: Free tier has limits (2 req/s, 70 req/hr, 200 req/day)


Error Handling


Next Steps


Summary

You've learned how to:

  • ✅ Use Safety Guardrails to detect harmful content across 10 dimensions

  • ✅ Detect and redact PII, PHI, and custom sensitive information

  • ✅ Check response faithfulness to prevent hallucinations in RAG applications

  • ✅ Integrate multiple guardrails into your LLM pipeline

  • ✅ Handle errors and respect rate limits

Each guardrail type uses a different endpoint and response format optimized for its specific protection purpose. Combine multiple guardrails for comprehensive LLM application safety.