Enrichments (beta)


  • Enrichments are custom features designed to augment data provided in events.
  • When defined, they automatically add new computed columns to your published data.
  • The generated columns are available for querying in Analyze, charting, and alerting, just like any other column.
  • See Enrichments for API methods.
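
For orientation, here is a minimal sketch of how an enrichment might be declared with the Fiddler Python client and supplied as a custom feature when the model is added. The import name, the Enrichment constructor arguments, and the column name are assumptions based on the Enrichments API reference; check that reference for the exact signatures.

```python
import fiddler as fdl  # Fiddler Python client (import name assumed)

# Declare an enrichment as a custom feature over a string column.
# 'question' is a hypothetical event column; 'embedding' is one of the
# enrichment types described in the sections below.
custom_features = [
    fdl.Enrichment(
        name='Question Embedding',   # base name for the generated column(s)
        enrichment='embedding',      # which enrichment to run
        columns=['question'],        # source column(s) to enrich
    ),
]

# These custom features are passed in when the model is onboarded
# (the add_model call mentioned in the note below); the generated
# columns then appear automatically on published events.
```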

📘

Note

Enrichments are disabled by default. To enable them, contact your administrator. Failing to do so will result in an error during the add_model call.



Embedding (beta)

  • Create an embedding for a string column using an embedding model.
  • Supports Sentence Transformers and encoder/decoder NLP transformers from Hugging Face.
  • To enable, set the enrichment parameter to embedding.
  • For each embedding enrichment, if you want to monitor the embedding vector in Fiddler, you must create a corresponding TextEmbedding using the enrichment’s output column (see the sketch below).

Requirements:

  • Access to the Hugging Face inference endpoint: https://api-inference.huggingface.co
  • A Hugging Face API token

For supported models and a usage example, see here.
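
As a hedged illustration, the sketch below pairs an embedding enrichment with the TextEmbedding custom feature that monitors its output. The column names and the TextEmbedding argument names are assumptions; confirm them against the linked supported models and usage example.

```python
import fiddler as fdl

embedding_features = [
    # Generate an embedding column from the raw 'question' text column.
    fdl.Enrichment(
        name='Question Embedding',
        enrichment='embedding',
        columns=['question'],
    ),
    # Required to monitor the embedding vector in Fiddler: a TextEmbedding
    # custom feature that points at the enrichment's output column.
    fdl.TextEmbedding(
        name='Question TextEmbedding',
        source_column='question',
        column='Question Embedding',
    ),
]
```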



Centroid Distance (beta)

  • Fiddler uses a K-Means-based system to determine which cluster a particular CustomFeature value belongs to.
  • This Centroid Distance enrichment calculates the distance from the closest centroid calculated by model monitoring.
  • A new numeric column with distances to the closest centroid is added to the events table.
  • To enable, set the enrichment parameter to centroid_distance.

For a usage example, see here.
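
A minimal sketch, assuming a clustering-backed custom feature (such as the TextEmbedding from the Embedding section) has already been declared; the referenced feature name is hypothetical.

```python
import fiddler as fdl

# Adds a numeric column with the distance to the closest centroid computed
# by model monitoring for the referenced custom feature.
centroid_distance = fdl.Enrichment(
    name='Question Centroid Distance',
    enrichment='centroid_distance',
    columns=['Question TextEmbedding'],  # clustering-backed feature (assumed name)
)
```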



Personally Identifiable Information (beta)

The PII (Personally Identifiable Information) enrichment is a critical tool designed to detect and flag the presence of sensitive information within textual data. Whether user-entered or system-generated, this enrichment aims to identify instances where PII might be exposed, helping to prevent privacy breaches and the potential misuse of personal data. In an era where digital privacy concerns are paramount, mishandling or unintentionally leaking PII can have serious repercussions, including privacy violations, identity theft, and significant legal and reputational damage.

Regulatory frameworks such as the General Data Protection Regulation (GDPR) in the European Union and the Health Insurance Portability and Accountability Act (HIPAA) in the United States underscore the necessity of safeguarding PII. These laws enforce strict guidelines on the collection, storage, and processing of personal data, emphasizing the need for robust measures to protect sensitive information.

The inadvertent inclusion of PII in datasets used for training or interacting with large language models (LLMs) can exacerbate the risks associated with data privacy. Once exposed to an LLM, sensitive information can be inadvertently learned by the model, potentially leading to wider dissemination of this data beyond intended confines. This scenario underscores the importance of preemptively identifying and removing PII from data before it is processed or shared, particularly in contexts involving AI and machine learning.

To mitigate the risks associated with PII exposure, organizations and developers can integrate the PII enrichment into their data processing workflows. This enrichment operates by scanning text for patterns and indicators of personal information, flagging potentially sensitive data for review or anonymization. By proactively identifying PII, stakeholders can take necessary actions to comply with privacy laws, safeguard individuals' data, and prevent the unintended spread of personal information through AI models and other digital platforms. Implementing PII detection and management practices is not just a legal obligation but a critical component of responsible data stewardship in the digital age.

  • To enable, set the enrichment parameter to pii.

Requirements

  • Reachability to https://github.com/explosion/spacy-models/releases/download/ to download spaCy models as required

For a list of PII Entities and usage, see here.
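
A minimal sketch of enabling the PII enrichment on a single string column; the column name is hypothetical, and entity-level options are covered in the linked entity list.

```python
import fiddler as fdl

# Scan the raw 'question' column and flag any detected PII entities.
pii = fdl.Enrichment(
    name='Question PII',
    enrichment='pii',
    columns=['question'],
)
```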



Evaluate (beta)

  • Calculates classic metrics for evaluating QA results, such as BLEU, ROUGE, and METEOR.
  • To enable, set the enrichment parameter to evaluate.
  • Make sure reference_col and prediction_col are set in the config of the Enrichment.

For the types of evaluation metrics and usage, see here.
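
A minimal sketch using the reference_col and prediction_col config keys noted above; the column names themselves are hypothetical.

```python
import fiddler as fdl

evaluate = fdl.Enrichment(
    name='QA Evaluate',
    enrichment='evaluate',
    columns=['response'],
    config={
        'reference_col': 'gt_answer',   # column holding the reference answer
        'prediction_col': 'response',   # column holding the generated answer
    },
)
```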



Textstat (beta)

The Textstat enrichment generates text statistics on string columns.

  • To enable, set the enrichment parameter to textstat.

For supported statistics and usage, see here.
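
A minimal sketch; the statistics config key and the statistic names shown are assumptions, so check the linked list of supported statistics.

```python
import fiddler as fdl

textstat = fdl.Enrichment(
    name='Question Textstat',
    enrichment='textstat',
    columns=['question'],
    config={'statistics': ['char_count', 'flesch_reading_ease']},  # assumed key and values
)
```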



Sentiment (beta)

Sentiment Analysis enrichment employs advanced natural language processing (NLP) techniques to gauge the emotional tone behind a body of text. This enrichment is designed to determine whether the sentiment of textual content is positive, negative, or neutral, providing valuable insights into the emotions and opinions expressed within. By analyzing the sentiment, this tool offers a powerful means to understand user feedback, market research responses, social media commentary, and any textual data where opinion and mood are significant.

Implementing Sentiment Analysis into your data processing allows for a nuanced understanding of how your audience feels about a product, service, or topic, enabling informed decision-making and strategy development. It's particularly useful in customer service and brand management, where gauging customer sentiment is crucial for addressing concerns, improving user experience, and building brand reputation.

The Sentiment enrichment uses NLTK's VADER lexicon to generate a score and corresponding sentiment for all specified columns. For each string column on which the sentiment enrichment is enabled, two additional columns are added. To enable, set the enrichment parameter to sentiment.

Requirements

  • Reachability to www.nltk.org/nltk_data to download the latest VADER lexicon.

For a usage example, see here.
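
A minimal sketch; 'response' is a hypothetical column name, and the enrichment adds the two columns (score and sentiment) described above.

```python
import fiddler as fdl

sentiment = fdl.Enrichment(
    name='Response Sentiment',
    enrichment='sentiment',
    columns=['response'],
)
```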



Profanity (beta)

The Profanity enrichment is designed to detect and flag the use of offensive or inappropriate language within textual content. This enrichment is essential for maintaining the integrity and professionalism of digital platforms, forums, social media, and any user-generated content areas. It helps ensure that conversations and interactions remain respectful and free from language that could be considered harmful or offensive to users.

In the digital space, where diverse audiences come together, the presence of profanity can lead to negative user experiences, damage brand reputation, and create an unwelcoming environment. Implementing a profanity filter is a proactive measure to prevent such outcomes, promoting a positive and inclusive online community.

Beyond maintaining community standards, the Profanity enrichment has practical implications for compliance with platform guidelines and legal regulations concerning hate speech and online conduct. Many digital platforms have strict policies against the use of profane or offensive language, making it crucial for content creators and moderators to actively monitor and manage such language.

By integrating the Profanity enrichment into their content moderation workflow, businesses and content managers can automate the detection of inappropriate language, significantly reducing manual review efforts. This enrichment not only helps in upholding community guidelines and legal standards but also supports the creation of safer and more respectful online spaces for all users.



Answer Relevance (beta)

The Answer Relevance enrichment is designed to evaluate the pertinence of AI-generated responses to their corresponding prompts. This enrichment operates by assessing whether the content of a response accurately addresses the question or topic posed by the initial prompt, providing a simple yet effective binary outcome: relevant or not relevant. Its primary function is to ensure that the output of AI systems, such as chatbots, virtual assistants, and content generation models, remains aligned with the user's informational needs and intentions.

In the context of AI-generated content, ensuring relevance is crucial for maintaining user engagement and trust. Irrelevant or tangentially related responses can lead to user frustration, decreased satisfaction, and diminished trust in the AI's capabilities. The Answer Relevance metric serves as a critical checkpoint, verifying that interactions and content deliveries meet the expected standards of accuracy and pertinence.

This enrichment finds its application across a wide range of AI-driven platforms and services where the quality of the response directly impacts the user experience. From customer service bots answering inquiries to educational tools providing study assistance, the ability to automatically gauge the relevance of responses enhances the effectiveness and reliability of these services.

Incorporating the Answer Relevance enrichment into the development and refinement of AI models enables creators to iteratively improve their systems based on relevant feedback. By identifying instances where the model generates non-relevant responses, developers can adjust and fine-tune their models to better meet user expectations. This continuous improvement cycle is essential for advancing the quality and utility of AI-generated content, ensuring that it remains focused, accurate, and highly relevant to users' needs.

  • To enable, set the enrichment parameter to answer_relevance.

Requirements:

  • This enrichment requires access to the OpenAI API, which may introduce latency due to network communication and processing time. Learn more about LLM-based enrichments.
  • An OpenAI API access token must be provided by the user.

For a usage example, see here.
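
A minimal sketch; the prompt and response column names are hypothetical, and any additional config mapping of columns to prompt and response roles is left to the Enrichments API reference rather than guessed here.

```python
import fiddler as fdl

# LLM-based: calls the OpenAI API at enrichment time, so expect added latency.
answer_relevance = fdl.Enrichment(
    name='Answer Relevance',
    enrichment='answer_relevance',
    columns=['question', 'response'],
)
```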


Faithfulness (beta)

The Faithfulness (Groundedness) enrichment is a binary indicator designed to evaluate the accuracy and reliability of facts presented in AI-generated text responses. It specifically assesses whether the information used in the response correctly aligns with and is grounded in the provided context, often in the form of referenced documents or data. This enrichment plays a critical role in ensuring that the AI's outputs are not only relevant but also factually accurate, based on the context it was given.

In practical applications, such as automated content creation, customer support, and informational queries, the Faithfulness (Groundedness) metric serves as a safeguard against the dissemination of misinformation. It verifies that the AI system's responses are not only generated with a high level of linguistic fluency but also reflect a true and correct use of the available information (retrieved documents).

This enrichment is particularly important in fields where accuracy is paramount, such as in educational content, medical advice, or factual reporting. By implementing the Faithfulness (Groundedness) metric, developers and researchers can enhance the trustworthiness of AI-generated content, ensuring that users receive responses that are not only contextually relevant but also factually sound. The effectiveness of this enrichment hinges on its ability to critically analyze the alignment between the generated content and the context provided, promoting a higher standard of reliability in AI-generated outputs.

  • To enable, set the enrichment parameter to faithfulness.

Requirements:

  • This enrichment requires access to the OpenAI API, which may introduce latency due to network communication and processing time. Learn more about LLM-based enrichments.
  • An OpenAI API access token must be provided by the user.

For a usage example, see here.
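
A minimal sketch; 'context' stands in for the column holding the retrieved documents the response should be grounded in, and both the column names and any required config mapping are assumptions.

```python
import fiddler as fdl

# LLM-based: checks whether facts in 'response' are grounded in 'context'.
faithfulness = fdl.Enrichment(
    name='Faithfulness',
    enrichment='faithfulness',
    columns=['context', 'response'],
)
```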



Coherence (beta)

The Coherence enrichment assesses the logical flow and clarity of AI-generated text responses, ensuring they are structured in a way that makes sense from start to finish. This enrichment is crucial for evaluating whether the content produced by AI maintains a consistent theme, argument, or narrative, without disjointed thoughts or abrupt shifts in topic. Coherence is key to making AI-generated content not only understandable but also engaging and informative for the reader.

In applications ranging from storytelling and article generation to customer service interactions, coherence determines the effectiveness of communication. A coherent response builds trust in the AI's capabilities, as it demonstrates an understanding of not just language, but also context and the natural progression of ideas. This enrichment encourages AI systems to produce content that flows naturally, mimicking the way a knowledgeable human would convey information or tell a story.

For developers, integrating the Coherence enrichment into AI evaluation processes is essential for achieving outputs that resonate with human readers. It helps in fine-tuning AI models to produce content that not only answers questions or provides information but does so in a way that is logical and easy to follow. By prioritizing coherence, AI-generated texts can better serve their intended purpose, whether to inform, persuade, or entertain, enhancing the overall quality and impact of AI communications.

  • To enable, set the enrichment parameter to coherence.

Requirements:

  • This enrichment requires access to the OpenAI API, which may introduce latency due to network communication and processing time. Learn more about LLM-based enrichments.
  • An OpenAI API access token must be provided by the user.

For a usage example, see here.
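
A minimal sketch, following the same pattern as the other LLM-based enrichments; 'response' is a hypothetical column name.

```python
import fiddler as fdl

coherence = fdl.Enrichment(
    name='Response Coherence',
    enrichment='coherence',
    columns=['response'],
)
```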



Conciseness (beta)

The Conciseness enrichment evaluates the brevity and clarity of AI-generated text responses, ensuring that the information is presented in a straightforward and efficient manner. This enrichment identifies and rewards responses that effectively communicate their message without unnecessary elaboration or redundancy. In the realm of AI-generated content, where verbosity can dilute the message's impact or confuse the audience, maintaining conciseness is crucial for enhancing readability and user engagement.

Implementing the Conciseness metric can significantly improve the user experience across various applications, from chatbots and virtual assistants to summarization tools and content generation platforms. It encourages the AI to distill information down to its essence, providing users with clear, to-the-point answers that satisfy their queries or needs without overwhelming them with superfluous details.

For developers and content creators, the Conciseness enrichment serves as a valuable tool for refining the output of AI systems, aligning them more closely with human preferences for communication that is both efficient and effective. By prioritizing conciseness, AI-generated content can become more accessible and useful, meeting the high standards of users who value quick and accurate information delivery. This enrichment, therefore, plays a pivotal role in the ongoing effort to enhance the quality and utility of AI-generated text, making it an indispensable component of AI evaluation frameworks.

  • To enable, set the enrichment parameter to conciseness.

Requirements:

  • This enrichment requires access to the OpenAI API, which may introduce latency due to network communication and processing time. Learn more about LLM-based enrichments.
  • An OpenAI API access token must be provided by the user.

For a usage example, see here.
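
A minimal sketch, again following the LLM-based pattern above; 'response' is a hypothetical column name.

```python
import fiddler as fdl

conciseness = fdl.Enrichment(
    name='Response Conciseness',
    enrichment='conciseness',
    columns=['response'],
)
```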



Toxicity (beta)

The Toxicity enrichment classifies whether a piece of text is toxic or not. A RoBERTa-based model, fine-tuned on a mix of toxic and non-toxic data, predicts a score between 0 and 1, where scores closer to 1 indicate toxicity.

For enrichment performance and a usage example, see here.



Regex Match (beta)

The Regex Match enrichment is designed to evaluate text responses or content based on their adherence to specific patterns defined by regular expressions (regex). By accepting a regex as input, this metric offers a highly customizable way to check if a string column in the dataset matches the given pattern. This functionality is essential for scenarios requiring precise formatting, specific keyword inclusion, or adherence to particular linguistic structures.

In practical applications, the Regex Match enrichment can be instrumental in validating data entries, ensuring compliance with formatting standards, or verifying the presence of required terms or codes within AI-generated content. Whether it's checking for email addresses, phone numbers, specific terminologies, or coding patterns, this metric provides a straightforward and efficient method for assessing the conformance of text to predefined patterns.

For developers and data analysts, the Regex Match enrichment is a powerful tool for automating the quality control process of textual data. It enables the swift identification of entries that fail to meet the necessary criteria, thereby streamlining the process of refining and improving the dataset or AI-generated content. This enrichment not only saves time but also enhances the reliability of data-driven applications by ensuring that the text outputs adhere closely to the desired specifications or standards.

Implementing the Regex Match enrichment into the evaluation and production framework of AI systems allows for a level of precision in text analysis that is crucial for applications demanding high accuracy and specificity. This metric is invaluable for maintaining the integrity and usefulness of textual data, making it a key component in the toolkit of anyone working with AI-generated content or large text datasets.

  • To enable, set the enrichment parameter to regex_match.

For a usage example, see here.
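
A minimal sketch that flags rows containing an email-like pattern; the regex config key name is an assumption, so verify it against the linked usage example.

```python
import fiddler as fdl

regex_match = fdl.Enrichment(
    name='Contains Email',
    enrichment='regex_match',
    columns=['response'],
    config={'regex': r'[\w.+-]+@[\w-]+\.[\w.-]+'},  # config key name assumed
)
```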



Topic (beta)

The Topic enrichment leverages the capabilities of zero-shot classifier models to categorize textual inputs into a predefined list of topics, even without having been explicitly trained on those topics. This approach to text classification is known as zero-shot learning, a groundbreaking method in natural language processing (NLP) that allows models to intelligently classify text into categories they haven't encountered during training. It's particularly useful for applications requiring the ability to understand and organize content dynamically across a broad range of subjects or themes.

By utilizing zero-shot classification, the Topic enrichment provides a flexible and powerful tool for automatically sorting and labeling text according to relevant topics. This is invaluable for content management systems, recommendation engines, and any application needing to quickly and accurately understand the thematic content of large volumes of text.

The enrichment works by evaluating the semantic similarity between the textual input and potential topic labels, assigning the most appropriate topic based on the content. This process enables the handling of diverse and evolving content types without the need for continual retraining or manual classification, significantly reducing the effort and complexity involved in content categorization.

Implementing the Topic enrichment into your data processing workflow can dramatically enhance the organization and accessibility of textual content, making it easier to deliver relevant, targeted information to users or to analyze content themes at scale. This enrichment taps into the advanced capabilities of zero-shot classification to provide a nuanced, efficient, and adaptable tool for text categorization, essential for anyone working with diverse and dynamic textual datasets.

Requirements

For a usage example, see here.
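
A minimal sketch; the enrichment value topic, the topics config key, and the candidate labels are assumptions to be checked against the linked usage example.

```python
import fiddler as fdl

topic = fdl.Enrichment(
    name='Question Topic',
    enrichment='topic',
    columns=['question'],
    config={'topics': ['billing', 'technical support', 'general inquiry']},  # assumed key
)
```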



Banned Keyword Detector (beta)

The Banned Keyword Detector enrichment is designed to scrutinize textual inputs for the presence of specified terms, particularly focusing on identifying content that includes potentially undesirable or restricted keywords. This enrichment operates based on a list of terms defined in its configuration, making it highly adaptable to various content moderation, compliance, and content filtering needs.

By specifying a list of terms to be flagged, the Banned Keyword Detector provides a straightforward yet powerful mechanism for automatically scanning and flagging content that contains certain keywords. This capability is crucial for platforms seeking to maintain high standards of content quality, adhere to regulatory requirements, or ensure community guidelines are followed. It's particularly valuable in environments where content is user-generated.

  • To enable, set the enrichment parameter to banned_keywords and specify a list of terms in the banned_keywords config parameter.

For a usage example, see here.
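
A minimal sketch using the banned_keywords config parameter described above; the column name and the terms are illustrative.

```python
import fiddler as fdl

banned_keywords = fdl.Enrichment(
    name='Response Banned Keywords',
    enrichment='banned_keywords',
    columns=['response'],
    config={'banned_keywords': ['confidential', 'internal use only']},
)
```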