LLM Monitoring

Monitoring Large Language Model (LLM) applications shares the same mechanics as monitoring traditional predictive ML models. The Fiddler platform still requires publication of the application's inputs and outputs in order to calculate metrics that represent performance. In the world of LLM applications, these inputs and outputs are the prompts, prompt context, responses, and the source documents retrieved (in the case of a RAG-based application).

What differs in LLM monitoring, compared to traditional ML model monitoring, is the set of metrics used to measure performance and how those metrics are calculated. Fiddler is a pioneer in the AI Trust domain and, as such, offers the most extensive set of AI safety and trust metrics available today.

The Enrichment Framework

By leveraging the Fiddler platform's enrichment framework, application owners can calculate several LLM trust and safety metrics that help them monitor the health of their LLM applications. At the time of model onboarding, LLM application owners must instruct Fiddler which enrichment services to use for their monitoring needs. Then, as LLM application "inferences" are published to the Fiddler platform, the enrichment pipeline scores each inference's inputs and outputs with the requested enrichments.
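For example, a batch of LLM inferences might be published with the Fiddler Python client along the lines below. This is a minimal sketch assuming the 3.x client; the URL, token, project and model names, and the column set are all illustrative.

import pandas as pd
import fiddler as fdl

# Connect to the Fiddler platform (URL and token are illustrative)
fdl.init(url='https://your_company.fiddler.ai', token='YOUR_TOKEN')

# Look up the onboarded LLM application (names are illustrative)
project = fdl.Project.from_name(name='llm_monitoring')
model = fdl.Model.from_name(name='llm_app', project_id=project.id)

# One row per LLM "inference": prompt, retrieved context, and response
events = pd.DataFrame({
    'question': ['What is your refund policy?'],
    'source_docs': ['Refunds are accepted within 30 days of purchase...'],
    'response': ['You can request a refund within 30 days of purchase.'],
})

# Publish the events; the enrichment pipeline scores them as they arrive
model.publish(source=events)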

Figure 1. The Fiddler Enrichment Framework

As Figure 1 depicts, after the raw, unstructured inputs and outputs of the LLM application are published to Fiddler, the enrichment framework routes the data through its enrichment pipeline, which scores each inference on a variety of AI trust and safety metrics. These metrics can then be used to monitor the overall health of the LLM application and alert stakeholders to any degradation in performance.

Figure 2. A Fiddler dashboard showing LLM application performance

As seen in Figure 2, with the metrics produced by the enrichment framework, stakeholders can monitor LLM application performance over time and conduct root cause analysis when alerts flag problematic trends.
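As an illustration of such alerting, a drift alert on an enriched question embedding might be declared as below. This is a sketch, assuming the 3.x Fiddler Python client's fdl.AlertRule interface; the metric identifier, threshold, and column name are illustrative and may differ by client version.

import fiddler as fdl

# Sketch: alert when drift of the 'question_cf' embedding (defined later
# on this page) exceeds a threshold, comparing each day to the prior day.
# Field names follow the 3.x client as an assumption.
alert_rule = fdl.AlertRule(
    name='question_embedding_drift',
    model_id=model.id,
    metric_id='jsd',                       # Jensen-Shannon distance drift metric
    columns=['question_cf'],
    bin_size=fdl.BinSize.DAY,
    compare_to=fdl.CompareTo.TIME_PERIOD,  # compare against the previous period
    compare_bin_delta=1,
    condition=fdl.AlertCondition.GREATER,
    threshold=0.05,
    priority=fdl.Priority.HIGH,
).create()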

At the time of model onboarding, application owners can opt in to the various and ever-expanding Fiddler enrichments by specifying fdl.Enrichment objects as custom features in the Fiddler ModelSpec object.

import fiddler as fdl

# Automatically generate an embedding for a column named "question"
fiddler_custom_features = [
    # Request the embedding enrichment on the raw 'question' text
    fdl.Enrichment(
        name='question_embedding',
        enrichment='embedding',
        columns=['question'],
    ),
    # Declare a custom feature backed by the generated embedding column
    fdl.TextEmbedding(
        name='question_cf',
        source_column='question',
        column='question_embedding',
    ),
]

model_spec = fdl.ModelSpec(
    inputs=['question'],
    custom_features=fiddler_custom_features,
)

The code snippet above illustrates how the ModelSpec object is configured to opt in to the embedding enrichment, which is then used to create an fdl.TextEmbedding input. This input enables drift detection and embedding visualizations with UMAP.
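Other enrichments are opted into in the same way. As an illustration, the sketch below appends a sentiment enrichment on the same column; the 'sentiment' enrichment name and the column choice are assumptions here, and the full list of supported enrichments is referenced below.

# Illustrative only: score the 'question' column for sentiment.
# The 'sentiment' enrichment name is an assumption; see fdl.Enrichment
# below for the supported values.
fiddler_custom_features.append(
    fdl.Enrichment(
        name='question_sentiment',
        enrichment='sentiment',
        columns=['question'],
    )
)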

Enrichments Available

As of this release, please reference fdl.Enrichment for a list of available enrichments.