PII

Get your sensitive information detection running in minutes with Fiddler's Fast PII Guardrails. This guide walks you through detecting PII, PHI, and custom entities to protect sensitive data across your applications.

What You'll Build

In this quick start, you'll implement a sensitive information detection system that:

  • Detects 35+ types of personally identifiable information (PII)

  • Identifies 7 types of protected health information (PHI)

  • Configures custom entity detection for organization-specific data

  • Provides real-time detection with sub-second latency

Interactive Tutorial

For more advanced examples, including batch processing, performance optimization, and production deployment patterns:

Open the Complete Sensitive Information Guardrail Notebook in Google Colab →

Or download the notebook from GitHub →

Prerequisites

  • Fiddler account with access token

  • Python 3.10+ environment

  • Basic understanding of data privacy concepts

Overview

Fiddler's Fast PII and PHI detection provides enterprise-grade protection against data leakage by automatically detecting sensitive information across multiple categories. These guardrails integrate seamlessly with Fiddler's AI Observability platform, enabling continuous monitoring and automated compliance reporting.

Key Capabilities

  • PII Detection: 35+ entity types including names, addresses, SSN, credit cards, emails, phone numbers

  • PHI Detection: 7 healthcare-specific entity types for HIPAA compliance

  • Custom Entities: Define organization-specific sensitive data patterns

  • Real-time Processing: Sub-second latency for production applications

1

Set Up Your Environment

Connect to Fiddler and configure the Sensitive Information Guardrail API:

import json
import pandas as pd
import requests
import time
import fiddler as fdl

# Replace with your actual values
URL = 'https://your_company.fiddler.ai'
TOKEN = 'your_token_here'

# API Configuration
SENSITIVE_INFORMATION_URL = f"{URL}/v3/guardrails/sensitive-information"
FIDDLER_HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

# Connect to Fiddler
fdl.init(url=URL, token=TOKEN)
print("✅ Connected to Fiddler successfully!")
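Hard-coding a token in a notebook or script makes it easy to leak. As a minimal sketch, the same configuration can be read from environment variables instead. The variable names FIDDLER_URL and FIDDLER_TOKEN and the helper load_fiddler_config are assumptions made for this example; they are not defined by Fiddler.

```python
import os


def load_fiddler_config(env=None):
    """Return (endpoint URL, request headers) built from environment variables.

    FIDDLER_URL and FIDDLER_TOKEN are hypothetical names chosen for this
    sketch; use whatever names fit your deployment.
    """
    env = os.environ if env is None else env
    url = env.get("FIDDLER_URL", "").rstrip("/")
    token = env.get("FIDDLER_TOKEN", "")
    if not url or not token:
        raise RuntimeError("Set FIDDLER_URL and FIDDLER_TOKEN before running.")

    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return f"{url}/v3/guardrails/sensitive-information", headers
```

With both variables exported, `SENSITIVE_INFORMATION_URL, FIDDLER_HEADERS = load_fiddler_config()` can stand in for the hard-coded values above.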
2

Define Helper Functions

Create reusable functions for interacting with the API:

def get_sensitive_information_response(
    text: str,
    entity_categories: str | list[str] = 'PII',
    custom_entities: list[str] | None = None,
):
    """
    Detect sensitive information in text.

    Args:
        text: Input text to analyze
        entity_categories: 'PII', 'PHI', 'Custom Entities', or list
        custom_entities: Custom entity patterns (when using 'Custom Entities')

    Returns:
        Tuple of (API response dict, latency in seconds)
    """
    data = {'input': text}

    # Add entity configuration if specified
    if entity_categories != 'PII' or custom_entities:
        data['entity_categories'] = entity_categories
        if custom_entities:
            data['custom_entities'] = custom_entities

    start_time = time.monotonic()

    try:
        response = requests.post(
            SENSITIVE_INFORMATION_URL,
            headers=FIDDLER_HEADERS,
            json={'data': data},
            timeout=30,  # fail instead of hanging indefinitely on network problems
        )
        response.raise_for_status()
        return response.json(), (time.monotonic() - start_time)

    except requests.exceptions.RequestException as e:
        print(f'❌ API call failed: {e}')
        return {}, (time.monotonic() - start_time)


def print_detection_results(response, latency):
    """Display detection results in a formatted way."""
    entities = response.get('fdl_sensitive_information_scores', [])

    print(f"\n🔍 Detection Results (⏱️ {latency:.3f}s)")
    print(f"📊 Total Entities Found: {len(entities)}\n")

    if not entities:
        print("✅ No sensitive information detected.")
        return

    # Group by entity type
    by_type = {}
    for entity in entities:
        label = entity.get('label', 'unknown')
        if label not in by_type:
            by_type[label] = []
        by_type[label].append(entity)

    # Display grouped results
    for label, group in sorted(by_type.items()):
        print(f"🏷️  {label.upper()} ({len(group)} found):")
        for entity in group:
            print(f"   • '{entity['text']}' (confidence: {entity['score']:.3f})")
        print()
3

Example 1: PII Detection

Detect common personally identifiable information:

# Sample text with various PII types
sample_text = """
I'm John Doe and I live at 1234 Maple Street, Springfield, IL 62704.
You can reach me at john.doe@example.com or call me at (217) 555-1234.
My social security number is 123-45-6789, and I was born on January 15, 1987.
My credit card number is 4111 1111 1111 1111 with CVV 123.
"""

print("🧪 Testing PII Detection")
print("📄 Input Text:")
print(sample_text)

# Call the API with default PII configuration
response, latency = get_sensitive_information_response(sample_text)

# Display results
print_detection_results(response, latency)

Expected Output:

🔍 Detection Results (⏱️ 0.125s)
📊 Total Entities Found: 8

🏷️  ADDRESS (1 found):
   • '1234 Maple Street, Springfield, IL 62704' (confidence: 0.945)

🏷️  CREDIT CARD NUMBER (1 found):
   • '4111 1111 1111 1111' (confidence: 0.989)

🏷️  CVV (1 found):
   • '123' (confidence: 0.892)

🏷️  DATE OF BIRTH (1 found):
   • 'January 15, 1987' (confidence: 0.923)

🏷️  EMAIL (1 found):
   • 'john.doe@example.com' (confidence: 0.998)

🏷️  PERSON (1 found):
   • 'John Doe' (confidence: 0.987)

🏷️  PHONE NUMBER (1 found):
   • '(217) 555-1234' (confidence: 0.976)

🏷️  SOCIAL SECURITY NUMBER (1 found):
   • '123-45-6789' (confidence: 0.991)
4

Example 2: PHI Detection for Healthcare

Detect protected health information in medical contexts:

# Sample text with PHI information
healthcare_text = """
Patient report: John Smith was prescribed metformin for his diabetes condition.
His health insurance number is HI-987654321, and medical record shows
serial number MED-2024-001 for his glucose monitor device.
Birth certificate number is BC-IL-1987-001234.
Current medication includes aspirin and lisinopril for blood pressure management.
"""

print("🏥 Testing PHI Detection for Healthcare Data")
print("📄 Input Text:")
print(healthcare_text)

# Call the API with PHI configuration
response, latency = get_sensitive_information_response(
    healthcare_text,
    entity_categories="PHI"
)

# Display results
print_detection_results(response, latency)

Expected Output:

🔍 Detection Results (⏱️ 0.098s)
📊 Total Entities Found: 5

🏷️  HEALTH INSURANCE NUMBER (1 found):
   • 'HI-987654321' (confidence: 0.887)

🏷️  MEDICATION (3 found):
   • 'metformin' (confidence: 0.945)
   • 'aspirin' (confidence: 0.932)
   • 'lisinopril' (confidence: 0.928)

🏷️  PERSON (1 found):
   • 'John Smith' (confidence: 0.976)
5

Example 3: Custom Entity Detection

Define and detect organization-specific sensitive data:

# Sample text with custom entities
custom_text = """
Employee ID: EMP-2024-001, Badge Number: BD-789456
Project code: PROJ-AI-2024, Server hostname: srv-prod-01
API key: sk-abc123xyz789
Internal ticket: TICK-2024-5678
"""

# Define custom entities for your organization
custom_entities = [
    'employee id',
    'badge number',
    'project code',
    'api key',
    'server hostname',
    'ticket number'
]

print("🎯 Testing Custom Entity Detection")
print(f"🏷️ Custom entities: {custom_entities}")

# Call the API with custom entity configuration
response, latency = get_sensitive_information_response(
    custom_text,
    entity_categories='Custom Entities',
    custom_entities=custom_entities
)

# Display results
print_detection_results(response, latency)

Expected Output:

🔍 Detection Results (⏱️ 0.112s)
📊 Total Entities Found: 6

🏷️  API KEY (1 found):
   • 'sk-abc123xyz789' (confidence: 0.945)

🏷️  BADGE NUMBER (1 found):
   • 'BD-789456' (confidence: 0.911)

🏷️  EMPLOYEE ID (1 found):
   • 'EMP-2024-001' (confidence: 0.923)

🏷️  PROJECT CODE (1 found):
   • 'PROJ-AI-2024' (confidence: 0.897)

🏷️  SERVER HOSTNAME (1 found):
   • 'srv-prod-01' (confidence: 0.878)

🏷️  TICKET NUMBER (1 found):
   • 'TICK-2024-5678' (confidence: 0.902)

API Reference

Endpoint

POST /v3/guardrails/sensitive-information

Request Format

{
  "data": {
    "input": "Text to analyze for sensitive information",
    "entity_categories": "PII" | "PHI" | "Custom Entities" | ["PII", "PHI"],
    "custom_entities": ["api key", "employee id", "custom pattern"]
  }
}

Request Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| input | string | Text to analyze for sensitive information | Required |
| entity_categories | string or array | Detection mode(s) to use | "PII" |
| custom_entities | array | Custom entity patterns (required when using "Custom Entities") | None |
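The rules in the parameter table can be checked on the client before a request is sent. The sketch below is one way to do that. The helper name build_payload and the validation itself are assumptions for illustration; this guide does not document how the server responds to invalid input, nor whether "Custom Entities" may be combined with other categories in a list.

```python
VALID_CATEGORIES = {"PII", "PHI", "Custom Entities"}


def build_payload(text, entity_categories="PII", custom_entities=None):
    """Build the request body, enforcing the rules from the parameter table."""
    # Accept either a single category string or a list of categories.
    if isinstance(entity_categories, str):
        categories = [entity_categories]
    else:
        categories = list(entity_categories)

    unknown = set(categories) - VALID_CATEGORIES
    if unknown:
        raise ValueError(f"Unknown entity categories: {sorted(unknown)}")
    if "Custom Entities" in categories and not custom_entities:
        raise ValueError("custom_entities is required with 'Custom Entities'")

    data = {"input": text, "entity_categories": entity_categories}
    if custom_entities:
        data["custom_entities"] = list(custom_entities)
    return {"data": data}
```

For example, `build_payload(text, ["PII", "PHI"])` produces a body that requests both categories in one call, matching the list form shown in the request format.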

Response Format

{
  "fdl_sensitive_information_scores": [
    {
      "score": 0.987,
      "label": "email",
      "start": 78,
      "end": 100,
      "text": "john.smith@example.com"
    }
  ]
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| score | float | Confidence score (0.0 to 1.0) |
| label | string | Entity type identifier |
| start | integer | Character position where entity starts |
| end | integer | Character position where entity ends |
| text | string | The detected entity text |
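A common use of the start and end fields is to mask detected spans before text is logged or forwarded. The sketch below assumes that start and end index into the original input string and that end is exclusive; this guide does not state either, so the function checks each span against the reported text and skips any that do not match. The name redact and the 0.5 threshold are illustrative choices.

```python
def redact(text, entities, min_score=0.5):
    """Replace each detected span with a [LABEL] placeholder.

    Assumes `start`/`end` are offsets into `text` with `end` exclusive.
    Spans whose slice does not equal the reported `text` are skipped.
    """
    valid = [
        e for e in entities
        if e["score"] >= min_score and text[e["start"]:e["end"]] == e["text"]
    ]

    redacted = text
    next_start = len(text)  # start offset of the most recently replaced span
    # Work right to left so earlier offsets stay valid after each replacement.
    for e in sorted(valid, key=lambda e: (e["start"], e["end"]), reverse=True):
        if e["end"] > next_start:
            continue  # overlaps a span that was already replaced
        placeholder = f"[{e['label'].upper()}]"
        redacted = redacted[:e["start"]] + placeholder + redacted[e["end"]:]
        next_start = e["start"]
    return redacted
```

Calling `redact(sample_text, response.get("fdl_sensitive_information_scores", []))` would return the input with each confident detection replaced by its label.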

Supported Entity Types

PII Entities (35+ types)

  • Personal: person, date_of_birth

  • Contact: email, email_address, phone_number, mobile_phone_number, landline_phone_number, address, postal_code

  • Financial: credit_card_number, credit_card_expiration_date, cvv, cvc, bank_account_number, iban

  • Government IDs: social_security_number, passport_number, drivers_license_number, tax_identification_number, cpf, cnpj, national_health_insurance_number

  • Digital: ip_address, digital_signature

  • And more...

PHI Entities (7 types)

  • Medical: medication, medical_condition, medical_record_number

  • Insurance: health_insurance_number, health_plan_id

  • Identifiers: birth_certificate_number, device_serial_number
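The label lists above can also be used to narrow results on the client, for example to keep only high-confidence PHI findings. The sketch below is a hedged example: filter_entities and normalize_label are hypothetical helpers, and the normalization exists only because the lists here use underscores while the sample output in this guide shows spaces.

```python
# The seven PHI labels listed in this guide.
PHI_LABELS = {
    "medication", "medical_condition", "medical_record_number",
    "health_insurance_number", "health_plan_id",
    "birth_certificate_number", "device_serial_number",
}


def normalize_label(label):
    """Lower-case a label and use underscores so spelling variants compare equal."""
    return label.strip().lower().replace(" ", "_")


def filter_entities(entities, labels=None, min_score=0.0):
    """Keep entities at or above min_score, optionally restricted to `labels`."""
    wanted = None if labels is None else {normalize_label(name) for name in labels}
    return [
        e for e in entities
        if e.get("score", 0.0) >= min_score
        and (wanted is None or normalize_label(e.get("label", "")) in wanted)
    ]
```

For instance, `filter_entities(entities, PHI_LABELS, min_score=0.9)` keeps only PHI detections scored 0.9 or higher; the threshold is arbitrary and should be tuned on your own data.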

Code Examples

import requests

url = "https://your_company.fiddler.ai/v3/guardrails/sensitive-information"
headers = {
    "Authorization": "Bearer YOUR_TOKEN",
    "Content-Type": "application/json"
}

payload = {
    "data": {
        "input": "Contact John at john.doe@example.com",
        "entity_categories": "PII"
    }
}

response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
entities = response.json().get("fdl_sensitive_information_scores", [])

for entity in entities:
    print(f"Found {entity['label']}: {entity['text']} (confidence: {entity['score']})")

Summary

You've learned how to:

  • Detect 35+ types of PII in text data

  • Identify PHI for healthcare compliance

  • Configure custom entities for your organization

  • Integrate the Fast PII Guardrails API into your applications

The Fast PII Guardrails offer enterprise-grade protection for sensitive information with sub-second latency, making them ideal for real-time applications while ensuring compliance with privacy regulations such as GDPR, HIPAA, and CCPA.