LLM-Based Metrics

LLM-based metrics use large language models to evaluate the quality of text generated by AI. This approach is much closer to how humans judge text, making these metrics particularly useful for evaluating AI-generated content for use cases such as chatbots, writing assistants, or content creation tools.

LLM-based metrics can adapt to different topics and types of text because LLMs have been trained on a wide range of information, making them a valuable tool for developers and researchers looking to enhance the quality of AI-generated text.

Currently, Fiddler supports three types of LLM-based metrics: LLM-as-a-Judge evaluators (including RAG Health Metrics), OpenAI-based enrichments, and Fiddler Fast Trust Model metrics.

RAG Health Metrics (LLM-as-a-Judge Evaluators)

RAG Health Metrics are a purpose-built diagnostic triad for evaluating RAG applications. These evaluators use LLM-as-a-Judge approaches and are available in Agentic Monitoring and Experiments:

  • Answer Relevance 2.0 — Ordinal scoring (High/Medium/Low = 1.0/0.5/0.0) measuring how well the response addresses the query. Also available in LLM Observability.

  • Context Relevance — Ordinal scoring measuring whether retrieved documents support the query. Available in Agentic Monitoring and Experiments only.

  • RAG Faithfulness — Binary scoring (Yes/No = 1/0) assessing whether the response is grounded in retrieved documents. Also available in LLM Observability.

See RAG Health Diagnostics for a conceptual guide to using these evaluators together for root cause analysis.
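Below is a minimal sketch of how the documented label-to-score mappings (High/Medium/Low = 1.0/0.5/0.0 for the relevance evaluators, Yes/No = 1/0 for RAG Faithfulness) can be expressed in code. The function and variable names are illustrative only and are not part of the Fiddler API.

```python
# Minimal sketch: converting LLM-as-a-Judge labels into the numeric
# RAG Health scores described above. Helper names are illustrative,
# not part of the Fiddler API.

ORDINAL_SCORES = {"High": 1.0, "Medium": 0.5, "Low": 0.0}  # Answer/Context Relevance
BINARY_SCORES = {"Yes": 1, "No": 0}                        # RAG Faithfulness

def score_rag_health(answer_relevance: str, context_relevance: str, faithfulness: str) -> dict:
    """Map raw judge labels to numeric RAG Health scores."""
    return {
        "answer_relevance": ORDINAL_SCORES[answer_relevance],
        "context_relevance": ORDINAL_SCORES[context_relevance],
        "rag_faithfulness": BINARY_SCORES[faithfulness],
    }

# Example: a grounded response built on weakly relevant retrieved context.
print(score_rag_health("Medium", "Low", "Yes"))
# {'answer_relevance': 0.5, 'context_relevance': 0.0, 'rag_faithfulness': 1}
```

Looking at the three scores together is what enables root cause analysis: in the example above, low context relevance points to a retrieval problem rather than a generation problem.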


OpenAI-based metrics

  • These metrics are generated through the OpenAI API, which may introduce latency due to network communication and processing time.

  • An OpenAI API access token must be provided by the user and is configured during onboarding.

  • The specific OpenAI model used for these metrics is also chosen during onboarding (see the sketch below).
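The following is a minimal sketch of what an OpenAI-backed judge call can look like, using the official `openai` Python client. The prompt, model name, and environment variable are assumptions for illustration; it is not Fiddler's implementation, but it shows why a user-provided token and model choice are needed and why network latency applies.

```python
# Minimal sketch of an OpenAI-based judge call, assuming the official
# `openai` Python client. Prompt, model, and env var are illustrative.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # user-provided token

def judge_answer_relevance(query: str, response: str, model: str = "gpt-4o-mini") -> str:
    """Ask the configured OpenAI model to grade relevance as High/Medium/Low."""
    completion = client.chat.completions.create(
        model=model,  # model chosen during onboarding
        messages=[
            {"role": "system",
             "content": "Grade how well the response answers the query. "
                        "Reply with exactly one word: High, Medium, or Low."},
            {"role": "user", "content": f"Query: {query}\nResponse: {response}"},
        ],
    )
    return completion.choices[0].message.content.strip()
```

Because each evaluation is a round trip to the OpenAI API, scoring latency depends on network conditions and model processing time.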

The following metrics are currently OpenAI-based:

Fiddler Fast Trust metrics

  • These metrics are generated through Fiddler's in-house, purpose-built small language models (SLMs).

  • These metrics can be generated in air-gapped environments and do not rely on any network connection to produce scores (see the sketch after this list).

The following metrics are currently based on Fiddler Fast Trust Models:

  • Fast Safety — Evaluates safety across 11 dimensions including jailbreaking, toxicity, and harmful content.

  • Fast Faithfulness (FTL) — Proprietary Fast Trust Model for hallucination detection. Not to be confused with RAG Faithfulness above.
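To contrast with the OpenAI-based path, here is a minimal sketch of local, air-gapped scoring with a small model loaded from disk. The model path, label scheme, and scoring logic are hypothetical; Fiddler's Fast Trust models are proprietary and are not invoked this way. The sketch only illustrates that no network call is involved once the weights are available locally.

```python
# Minimal sketch of local, air-gapped scoring with a small language model.
# Model path and label scheme are hypothetical, not Fiddler's Fast Trust API.
from transformers import pipeline

# Load a locally stored classifier; no network access is needed once the
# weights are on disk.
classifier = pipeline("text-classification", model="/models/local-faithfulness-slm")

def fast_faithfulness(context: str, response: str) -> float:
    """Return a 0-1 faithfulness-style score from the local model."""
    result = classifier(f"{context} [SEP] {response}")[0]
    # Hypothetical label scheme: "faithful" vs. "hallucinated".
    return result["score"] if result["label"] == "faithful" else 1.0 - result["score"]
```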