Product Concepts

What is ML Observability?

ML observability is the modern practice of gaining comprehensive insight into your AI application's performance throughout its lifecycle. It goes beyond simple indicators of good and bad performance by empowering all model stakeholders to understand why a model behaves in a certain manner and how to improve it. ML observability starts with monitoring and alerting on performance issues, but goes much deeper, guiding model owners toward the root cause of those issues.

What is LLM Observability?

LLM observability is the practice of evaluating, monitoring, analyzing, and improving generative AI or LLM-based applications across their lifecycle. Fiddler provides real-time monitoring of safety metrics like toxicity, bias, and PII, and correctness metrics like hallucination, faithfulness, and relevancy.

Projects

A project within Fiddler represents your organization's distinct AI applications or use cases. It houses all of the model schemas that have been onboarded to Fiddler for the purpose of AI observability. Projects are typically specific to a given business unit, ML application, or use case. They serve as a jumping-off point for Fiddler's model monitoring and explainability features.

Additionally, Fiddler projects serve as the primary access control or authorization mechanism. Different users and teams are given access to view model data within Fiddler at the project level.

Models

Model Schemas

In Fiddler, a model schema is the metadata about a model that is being observed. Model schemas are onboarded to Fiddler so that Fiddler understands the data under observation. Fiddler does not require the model artifact itself to properly observe the performance of the model; adequate information about the model's schema, or the model's specification, is enough to monitor it. Model artifacts can, however, be uploaded to Fiddler to unlock advanced explainability features.
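To make the idea concrete, here is a minimal sketch (plain Python, not the Fiddler SDK) of how a schema might be inferred from a sample of model data at onboarding time; the type names are assumptions for illustration only.

```python
# Conceptual sketch (not the Fiddler SDK): inferring a model schema
# from a sample of records, as an observability platform might do
# when a model is onboarded.

def infer_schema(rows):
    """Infer a column -> type mapping from a list of record dicts."""
    schema = {}
    for row in rows:
        for col, value in row.items():
            # Check bool before int: bool is a subclass of int in Python.
            if isinstance(value, bool):
                inferred = "category"
            elif isinstance(value, (int, float)):
                inferred = "numeric"
            else:
                inferred = "category"
            # Widen to "category" if rows disagree on a column's type.
            if schema.get(col, inferred) != inferred:
                inferred = "category"
            schema[col] = inferred
    return schema

sample = [
    {"age": 34, "state": "CA", "approved": True},
    {"age": 51, "state": "NY", "approved": False},
]
print(infer_schema(sample))
# {'age': 'numeric', 'state': 'category', 'approved': 'category'}
```

In practice the schema also carries value ranges and the model specification (which columns are inputs, outputs, and targets), but the core idea is the same: metadata about the data, not the model artifact itself.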

📘 Working with Model Artifacts

You can upload your model artifacts to Fiddler to unlock high-fidelity explainability for your model. However, it is not required. If you do not wish to upload your artifact but want to explore explainability with Fiddler, we can build a surrogate model on the backend to be used in place of your artifact.

Model Versions

Fiddler offers model versions to organize related models, streamlining processes such as model retraining or conducting champion vs. challenger analyses. When retraining a model, instead of uploading a new model instance, you can create a new version of the existing model, retaining its core structure while accommodating necessary updates. These updates can include modifications to the schema, such as adding or removing columns, modifying data types, adjusting value ranges, updating the model specifications, and even refining task parameters or Explainable AI (XAI) settings.

Data

Environments

Environments designate different types of data in the Fiddler platform. There are two types of data, or environments, in Fiddler:

Pre-production Environments

Data designated as pre-production contains non-time series data, which is uploaded to Fiddler in a single batch and referred to as a dataset. Datasets can be training datasets, validation datasets, or other static data meant to be evaluated as a whole without the dimension of trends over time.

Production Environments

Data designated as production contains time series data such as production or inference logs, which are the digital exhaust emitted by each decision a model makes. This time series data provides the inputs and outputs of each model inference/decision, which is what Fiddler analyzes and compares against the pre-production data to determine if the model's performance is degrading.

Datasets

Datasets within Fiddler are static sets of uploaded data, typically training or validation datasets meant to be analyzed as a whole. Datasets can also be used to define baselines.

Baselines

Baselines are data that serve as a point of reference for calculating data drift. When determining if drift has occurred, Fiddler must compare the distribution of production data (at a point in time) to reference data. Baselines serve as this reference.

Most commonly, training data is used to establish a model's baseline: the training data is uploaded as a dataset, and Fiddler creates a static pre-production baseline of the same name. A model can also have multiple baselines. It is common to define additional baselines not from training data but from static sets of historical inferences, or from rolling baselines that look back weeks, months, or quarters.
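As a rough illustration of the comparison a baseline enables, the sketch below computes a population stability index (PSI), one common drift score, between a baseline and a production window. The binning and smoothing choices here are assumptions, not Fiddler's actual drift implementation.

```python
# Illustrative sketch (assumed formula, not Fiddler's implementation):
# PSI compares the binned distribution of a production window against
# the baseline distribution. 0 means identical; larger means more drift.
import math

def psi(baseline, production, bins=4):
    lo, hi = min(baseline), max(baseline)

    def fractions(values):
        counts = [0] * bins
        for v in values:
            if hi > lo:
                # Clamp out-of-range production values into the edge bins.
                idx = max(0, min(int((v - lo) / (hi - lo) * bins), bins - 1))
            else:
                idx = 0
            counts[idx] += 1
        # Tiny smoothing constant keeps empty bins from dividing by zero.
        return [(c + 1e-6) / (len(values) + bins * 1e-6) for c in counts]

    b, p = fractions(baseline), fractions(production)
    return sum((pi - bi) * math.log(pi / bi) for bi, pi in zip(b, p))

baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
shifted = [v + 0.4 for v in baseline]
print(psi(baseline, baseline))  # identical distributions score 0
print(psi(baseline, shifted))   # shifted distribution scores higher
```

A rolling baseline works the same way, except the reference distribution is recomputed from a trailing window of historical inferences instead of a fixed dataset.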

Segments

Segments (also called Cohorts) are subsets of the inference logs defined by custom filters. Segments allow users to analyze metrics for specific subsets of your data, for example, "People under 50". Segments help you focus on interesting cohorts or areas of model underperformance for more precise insights. Additionally, you can set alerts on these Segments to stay informed about important changes or trends within these defined subsets.
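The sketch below shows the concept with plain Python: a segment is a reusable filter over inference logs, and any metric can then be computed over just that cohort. The records and metric here are hypothetical examples, not Fiddler's API.

```python
# Conceptual sketch: a segment as a filter over inference logs,
# with a metric computed on just that cohort.
inferences = [
    {"age": 34, "predicted": 1, "actual": 1},
    {"age": 62, "predicted": 0, "actual": 1},
    {"age": 45, "predicted": 1, "actual": 1},
    {"age": 71, "predicted": 0, "actual": 0},
]

def under_50(row):
    """Segment definition: 'People under 50'."""
    return row["age"] < 50

def accuracy(rows, segment=None):
    """Accuracy over all rows, or only the rows matching a segment."""
    rows = [r for r in rows if segment is None or segment(r)]
    return sum(r["predicted"] == r["actual"] for r in rows) / len(rows)

print(accuracy(inferences))            # overall: 0.75
print(accuracy(inferences, under_50))  # segment only: 1.0
```

Comparing a metric segment-by-segment like this is how pockets of underperformance are isolated; alert rules can then target the segment rather than the whole population.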

Metrics

Metrics are computations performed on data received by Fiddler. Fiddler supports user-defined Custom Metrics in addition to five out-of-the-box metric types:

  • Data Integrity

  • Data Drift

  • Performance

  • Statistics

  • Traffic
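To illustrate what a user-defined metric might express beyond the built-in types, here is a hypothetical Custom Metric sketched in plain Python: a business-cost score that weights false positives and false negatives differently. The cost values and field names are invented for the example.

```python
# Hypothetical Custom Metric sketch: a dollar-cost score where a missed
# positive (false negative) is far more expensive than a false alarm.
def cost_metric(rows, fp_cost=10.0, fn_cost=100.0):
    cost = 0.0
    for r in rows:
        if r["predicted"] == 1 and r["actual"] == 0:
            cost += fp_cost   # false positive
        elif r["predicted"] == 0 and r["actual"] == 1:
            cost += fn_cost   # false negative
    return cost

rows = [
    {"predicted": 1, "actual": 0},
    {"predicted": 0, "actual": 1},
    {"predicted": 1, "actual": 1},
]
print(cost_metric(rows))  # 10 + 100 = 110.0
```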

Alerts

Alerts are user-specified rules which trigger when some condition is met by production data received in Fiddler. Alert rule notifications are sent via email, Slack, custom webhooks, or any combination thereof.
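Conceptually, an alert rule is a condition evaluated against each new metric value; the sketch below shows that shape in plain Python. The rule fields and condition names are hypothetical, not Fiddler's alert configuration schema.

```python
# Conceptual sketch (field names are hypothetical): an alert rule as a
# threshold condition evaluated against incoming metric values.
def evaluate_alert(rule, value):
    """Return True when the rule's condition is met and it should fire."""
    if rule["condition"] == "greater_than":
        return value > rule["threshold"]
    if rule["condition"] == "less_than":
        return value < rule["threshold"]
    raise ValueError(f"unknown condition: {rule['condition']}")

drift_rule = {"metric": "drift_score", "condition": "greater_than", "threshold": 0.2}
print(evaluate_alert(drift_rule, 0.05))  # False: no notification sent
print(evaluate_alert(drift_rule, 0.31))  # True: notify email/Slack/webhook
```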

Trust Scores / Enrichments

Enrichments augment existing columns with additional metrics to monitor different aspects of LLM applications. The new metrics are available for use within Fiddler's analysis, charting, and alerting functionality.

Dashboards and Charts

Fiddler uses customizable Dashboards for monitoring and sharing model behavior. A Dashboard is composed of Charts, which provide three distinct types of visualizations: monitoring charts, embedding visualizations, and performance analytics. Dashboards consolidate visualizations in one place, offering a detailed overview of model performance as well as an entry point for deeper data analysis and root cause identification.

Monitoring Charts

Monitoring charts provide a comprehensive view of model performance and support model-to-model performance analysis. With intuitive displays for data drift, performance metrics, data integrity, traffic patterns, and more, monitoring charts empower users to maintain model accuracy and reliability with ease.

Embedding Visualizations

Embedding visualization is a powerful charting tool used to understand and interpret complex relationships in high-dimensional data. Reducing the dimensionality of custom features into a 2D or 3D space makes it easier to identify patterns, clusters, and outliers.
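The core mechanism is dimensionality reduction. The minimal sketch below projects high-dimensional embeddings to 2-D with a random projection; the reduction method Fiddler actually uses is not stated here, so treat this purely as an illustration of the idea.

```python
# Minimal sketch: reducing high-dimensional embeddings to 2-D points
# via random projection. Real embedding visualizations typically use
# structure-preserving methods (e.g. UMAP); this only shows the concept.
import random

random.seed(7)  # deterministic axes for the example

def project_2d(vectors):
    dim = len(vectors[0])
    # Two random axes; dot products give each vector an (x, y) position.
    axes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2)]
    return [
        tuple(sum(a * v for a, v in zip(axis, vec)) for axis in axes)
        for vec in vectors
    ]

embeddings = [[random.random() for _ in range(128)] for _ in range(5)]
points = project_2d(embeddings)
print(len(points), len(points[0]))  # 5 points, each with an (x, y) pair
```

Once embeddings become 2-D or 3-D points, clusters and outliers that are invisible in 128 dimensions can be spotted directly on a chart.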

Jobs

Fiddler Jobs are a feature used to track operations such as data publishing or adding model assets such as user artifacts. Jobs are created automatically and can be observed both in the Fiddler UI and polled using the API. Upon successful job completion, the new data or model asset is available for use in the monitoring, alerting, and charting functionalities. If the job fails, users can navigate to the Jobs page and click on "failure" for more details on the job failure. Users can then remediate the cause of error or contact the Fiddler team for more support.
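Polling a job via an API typically looks like the loop sketched below. `StubJob` is a stand-in object invented for this example, not the Fiddler client; only the polling pattern is the point.

```python
# Hypothetical sketch (StubJob is a stand-in, not the Fiddler API):
# polling a job until it reaches a terminal state or times out.
import time

class StubJob:
    """Fake job that reports SUCCESS after a few polls."""
    def __init__(self):
        self._polls = 0

    def status(self):
        self._polls += 1
        return "SUCCESS" if self._polls >= 3 else "RUNNING"

def wait_for_job(job, interval=0.01, timeout=5.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = job.status()
        if state in ("SUCCESS", "FAILURE"):
            return state
        time.sleep(interval)
    raise TimeoutError("job did not finish in time")

print(wait_for_job(StubJob()))  # SUCCESS
```

On SUCCESS the published data or model asset becomes available for monitoring, alerting, and charting; on FAILURE the Jobs page holds the error details.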

Bookmarks

Bookmarking enables quick access to projects, models, charts, and dashboards. The comprehensive bookmark page enhances your convenience and efficiency in navigating Fiddler.


© 2024 Fiddler Labs, Inc.