Personally Identifiable Information (beta)

  • The PII enrichment helps identify PII leaks in text that is either entered by the user or produced by the system.
  • Mishandling or leaking PII can lead to severe issues like privacy violations and identity theft.
  • Laws such as GDPR and HIPAA highlight the importance of protecting PII with stringent guidelines.
  • Accidentally sending PII to large language models (LLMs) can spread this sensitive information widely, increasing the risks.
  • To enable it, set the enrichment parameter to 'pii'.

Requirements

  • Network access to https://github.com/explosion/spacy-models/releases/download/ so that spaCy models can be downloaded as required
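
The requirement above can be checked before the enrichment is enabled. The following is only a minimal sketch, assuming a plain HTTP HEAD request is a fair stand-in for the model download; the function name is_reachable and its opener parameter are hypothetical helpers and are not part of the Fiddler client.

```python
import urllib.error
import urllib.request

# URL taken from the requirement above.
SPACY_MODELS_URL = "https://github.com/explosion/spacy-models/releases/download/"


def is_reachable(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the server answers at all, even with an HTTP error status.

    `opener` is a hypothetical injection point so the check can be exercised
    without real network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example with 404 for the bare directory
        # URL), so the network path itself works.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or a blocking proxy.
        return False
```

Calling is_reachable(SPACY_MODELS_URL) from the host that runs the enrichment returns False when the download location is blocked, which suggests the spaCy models cannot be fetched from there.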

List of PII entities

| Entity Type | Description | Detection Method | Example |
| --- | --- | --- | --- |
| CREDIT_CARD | A credit card number is between 12 and 19 digits long. https://en.wikipedia.org/wiki/Payment_card_number | Pattern match and checksum | 4111111111111111, 378282246310005 (American Express) |
| CRYPTO | A crypto wallet address. Currently only Bitcoin addresses are supported. | Pattern match, context and checksum | 1BoatSLRHtKNngkdXEeobR76b53LETtpyT |
| DATE_TIME | Absolute or relative dates or periods or times smaller than a day. | Pattern match and context | 01/01/2024 |
| EMAIL_ADDRESS | An email address identifies an email box to which email messages are delivered. | Pattern match, context and RFC-822 validation | [email protected] |
| IBAN_CODE | The International Bank Account Number (IBAN) is an internationally agreed system of identifying bank accounts across national borders to facilitate the communication and processing of cross-border transactions with a reduced risk of transcription errors. | Pattern match, context and checksum | DE89 3704 0044 0532 0130 00 |
| IP_ADDRESS | An Internet Protocol (IP) address (either IPv4 or IPv6). | Pattern match, context and checksum | 1.2.3.4, 127.0.0.12/16, 1234:BEEF:3333:4444:5555:6666:7777:8888 |
| LOCATION | Name of a politically or geographically defined location (cities, provinces, countries, international regions, bodies of water, mountains). | Custom logic and context | PALO ALTO, Japan |
| PERSON | A full person name, which can include first names, middle names or initials, and last names. | Custom logic and context | Joanna Doe |
| PHONE_NUMBER | A telephone number. | Custom logic, pattern match and context | 5556667890 |
| URL | A URL (Uniform Resource Locator), a unique identifier used to locate a resource on the Internet. | Pattern match, context and top-level URL validation | www.fiddler.ai |
| US_SSN | A US Social Security Number (SSN) with 9 digits. | Pattern match and context | 1234-00-5678 |
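
Several rows in the table list "checksum" as part of the detection method. For payment card numbers the standard checksum is the Luhn algorithm, and the sketch below shows how such a check can rule out digit strings that merely look like card numbers. It is an illustration only, not the enrichment's actual implementation; the function name luhn_is_valid is hypothetical.

```python
def luhn_is_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum.

    Illustrative sketch only; the enrichment's own recognizer may differ.
    """
    # Drop the separators commonly used when card numbers are written out.
    cleaned = number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    # Length bounds taken from the CREDIT_CARD row in the table.
    if not 12 <= len(cleaned) <= 19:
        return False

    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

Both example numbers from the CREDIT_CARD row pass this check, while changing any single digit makes it fail, which is why a checksum cuts down false positives compared with pattern matching alone.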
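
To apply the enrichment, add it to the model's custom_features:

import fiddler as fdl

# dataset_info is assumed to be an existing dataset info object
model_info = fdl.ModelInfo.from_dataset_info(
    dataset_info=dataset_info,
    display_name='llm_model',
    model_task=fdl.core_objects.ModelTask.LLM,
    custom_features=[
        fdl.Enrichment(
            name='Rag PII',
            enrichment='pii',
            columns=['question'],  # one or more columns to scan for PII
        ),
    ],
)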

The above example generates three new columns:

  1. FDL Rag PII (question) (bool): whether any PII was detected
  2. FDL Rag PII (question) Matches (str): which matches in the raw text were flagged as potential PII (e.g. 'Douglas MacArthur,Korean')
  3. FDL Rag PII (question) Entities (str): which entities these matches were tagged as (e.g. 'PERSON')
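
Downstream code usually needs these string columns as lists. The sketch below assumes that the Matches and Entities values are comma-separated, as the examples above suggest; that format is an assumption, and the helper name split_pii_field and the sample row are hypothetical. It does not pair matches with entities, since the examples do not show a one-to-one correspondence.

```python
def split_pii_field(value):
    """Split a comma-separated PII column value into a list of strings.

    Assumes comma separation, inferred from the examples above.
    """
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


# Illustrative row built from the example values quoted above.
row = {
    "FDL Rag PII (question)": True,
    "FDL Rag PII (question) Matches": "Douglas MacArthur,Korean",
    "FDL Rag PII (question) Entities": "PERSON",
}

if row["FDL Rag PII (question)"]:
    matches = split_pii_field(row["FDL Rag PII (question) Matches"])
    entities = split_pii_field(row["FDL Rag PII (question) Entities"])
    print(f"flagged text: {matches}; entity types: {entities}")
```

A match that itself contains a comma would be split incorrectly under this assumption, so treat the result as a convenience view rather than an exact reconstruction.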

Note

The PII enrichment is integrated with Presidio.