Trust Score
Trust Scores (also known as Enrichments) are quantitative measurements and evaluations that assess various quality and safety dimensions of Large Language Model (LLM) outputs. These scores are generated by Fiddler's Trust Models to provide objective metrics for monitoring, evaluating, and governing LLM/GenAI systems.
When LLM inputs and outputs are processed through Fiddler's platform, Trust Scores are automatically calculated to evaluate dimensions such as safety, toxicity, hallucination, relevance, coherence, and other critical aspects of LLM performance. These metrics serve as key indicators that help organizations understand how their LLM systems are performing in real-world scenarios and identify potential issues before they impact users or business outcomes.
Trust Scores function as both monitoring metrics within Fiddler's observability platform and as evaluation signals that power Fiddler Guardrails' real-time content filtering capabilities. They translate complex qualitative judgments about LLM outputs into quantifiable measurements that can be tracked, analyzed, and used to trigger alerts or actions.
Fiddler leverages Trust Scores as the foundation of its LLM monitoring and governance capabilities. When inference data is published to the Fiddler platform, the Fiddler Trust Service automatically generates these scores using specialized Trust Models that are optimized for efficient, accurate evaluation.
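For concreteness, a minimal sketch of this publishing flow with the Fiddler Python client is shown below (fiddler-client 3.x is assumed; the project, model, and column names are placeholders):

```python
# Minimal sketch: publishing LLM inference events so the Fiddler Trust
# Service can score them (assumes fiddler-client 3.x; names are placeholders).
import pandas as pd
import fiddler as fdl

fdl.init(url='https://your_instance.fiddler.ai', token='YOUR_ACCESS_TOKEN')

# Look up an existing project and onboarded LLM application by name.
project = fdl.Project.from_name(name='llm_apps')
model = fdl.Model.from_name(name='chatbot', project_id=project.id)

# Each row is one inference: the prompt sent to the LLM and its response.
events = pd.DataFrame({
    'prompt': ['How do I reset my password?'],
    'response': ['You can reset it from the account settings page.'],
})

# Publishing triggers any Trust Score enrichments configured on the model.
model.publish(source=events)
```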
In the observability context, Trust Scores appear as metrics in monitoring dashboards, providing visibility into LLM output quality over time. Organizations can track trends, set thresholds, and configure alerts based on these scores to detect degradation or anomalies in their LLM applications.
For real-time protection through Fiddler Guardrails, Trust Scores serve as the decision signals that determine whether content should be filtered. When evaluations indicate that content violates safety policies, Guardrails can block the output or provide explanations of the specific violations detected.
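As an illustration only, a pre-response safety check might look like the sketch below. The endpoint path, payload shape, and response format here are assumptions, not the documented contract; consult the Fiddler Guardrails API reference for the actual interface:

```python
# Hypothetical sketch of gating an LLM response on a Guardrails-style
# safety evaluation. Endpoint, payload, and response shape are assumptions.
import requests

GUARDRAILS_URL = 'https://your_instance.fiddler.ai/v3/guardrails/ftl-safety'  # placeholder

def is_safe(text: str, token: str, threshold: float = 0.5) -> bool:
    resp = requests.post(
        GUARDRAILS_URL,
        headers={'Authorization': f'Bearer {token}'},
        json={'data': {'input': text}},  # payload shape is an assumption
        timeout=2,  # real-time use demands a tight latency budget
    )
    resp.raise_for_status()
    scores = resp.json()  # assumed: a dict of per-dimension safety scores
    # Allow the output only if every safety dimension stays below threshold.
    return all(v < threshold for v in scores.values() if isinstance(v, (int, float)))
```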
Fiddler's approach to Trust Scores emphasizes both efficiency and accuracy, using purpose-built models that deliver quality assessments comparable to those of general-purpose LLMs, but with significantly lower latency and computational requirements.
Trust Scores address a fundamental challenge in LLM governance: how to objectively measure the quality, safety, and reliability of generative AI outputs. Unlike traditional ML models where performance can be evaluated through clear accuracy metrics, LLM outputs require more nuanced evaluation across multiple dimensions.
By providing quantifiable measurements of LLM output characteristics, Trust Scores enable organizations to maintain visibility into their generative AI systems, detect potential issues, and ensure outputs meet quality and safety standards. This capability is essential for responsible AI deployment, especially as LLM applications scale across enterprise environments and serve diverse user populations.
Trust Scores also bridge the gap between qualitative human judgments about content and the quantitative metrics needed for systematic monitoring and governance, enabling more consistent, scalable approaches to LLM quality assurance.
Objective Measurement: Trust Scores provide quantifiable, consistent evaluations of LLM outputs, transforming subjective qualities like "harmfulness" or "faithfulness" into measurable metrics that can be tracked and analyzed.
Comprehensive Evaluation: By assessing multiple dimensions of LLM performance, Trust Scores offer a holistic view of output quality beyond simple binary judgments like "correct" or "incorrect."
Early Warning System: Changes in Trust Scores can serve as early indicators of model degradation or emerging issues before they become significant problems that impact users.
Governance Support: Tracking Trust Scores over time provides evidence of ongoing monitoring and quality control for regulatory compliance and internal governance requirements.
Efficient Filtering: As signals for content filtering, Trust Scores enable real-time protection without introducing prohibitive latency or computational overhead.
Performance Benchmarking: Trust Scores allow organizations to compare different LLM systems, prompting strategies, or configurations based on objective quality measurements.
Continuous Improvement: By identifying patterns in lower-scoring outputs, organizations can refine their models, prompts, and application design to improve overall quality and safety.
Safety Scores: Evaluations that assess whether content contains harmful, toxic, illegal, or otherwise inappropriate material across multiple safety dimensions including violence, hate speech, explicit content, and more.
Faithfulness Scores: Measurements that evaluate how factually accurate and reliable LLM outputs are, detecting hallucinations, fabrications, and factual inconsistencies.
Coherence Scores: Metrics that assess the logical flow, consistency, and overall readability of generated content, identifying outputs that are disjointed or confusing.
Relevance Scores: Evaluations of how well LLM outputs address the specific query or prompt, detecting off-topic or tangential responses.
Toxicity Scores: Specialized safety metrics focused specifically on detecting offensive, toxic, or harmful language across various categories.
Sentiment Scores: Measurements of the emotional tone and sentiment expressed in LLM outputs, useful for detecting inappropriately negative or emotional responses.
Bias Scores: Evaluations that detect potential biases in LLM outputs, including gender, racial, political, or other forms of bias that might impact fairness.
While Trust Scores provide essential visibility into LLM quality and safety, implementing effective scoring systems involves several technical and practical challenges.
Subjectivity Management: Many aspects of content evaluation involve inherently subjective judgments, making it challenging to create metrics that consistently align with diverse human perspectives on qualities like "harmfulness" or "quality."
Latency Requirements: Generating comprehensive Trust Scores without introducing significant latency requires highly optimized models and efficient processing pipelines, especially for real-time applications.
Cultural Context: Content appropriateness often depends on cultural, regional, or industry-specific contexts, requiring Trust Scores to account for these variations when possible.
Domain Adaptation: Trust Scores may need adjustment for specific domains or applications, as evaluation criteria can vary significantly between use cases like customer service, creative writing, or technical documentation.
Score Interpretation: Translating numerical scores into actionable insights requires clear guidelines and thresholds that may vary by organization or application context.
Evaluation Evolution: As language usage and societal standards evolve, Trust Score models need periodic updates to maintain alignment with current expectations and norms.
Multi-dimensional Analysis: Effectively monitoring multiple score dimensions simultaneously requires sophisticated dashboarding and prioritization to avoid information overload.
Determine Relevant Dimensions
Identify which aspects of LLM output quality and safety are most important for your specific use cases.
Prioritize score dimensions based on your application context and risk profile.
Configure Enrichments
When setting up your LLM model in Fiddler, enable appropriate Trust Score enrichments.
Select Fast Safety, Fast Faithfulness, or other relevant enrichments based on monitoring needs.
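A minimal sketch of this step, assuming the fiddler-client 3.x Enrichment interface (the enrichment identifiers and column names are illustrative; verify them against the Enrichments reference for your platform version):

```python
# Sketch: enabling Fast Safety and Fast Faithfulness enrichments when
# onboarding an LLM application (identifiers are illustrative).
import fiddler as fdl

enrichments = [
    # Fast Safety: scores the prompt across multiple safety dimensions.
    fdl.Enrichment(
        name='Prompt Safety',
        enrichment='ftl_prompt_safety',
        columns=['prompt'],
    ),
    # Fast Faithfulness: checks the response against its source context
    # for hallucinations and factual inconsistencies.
    fdl.Enrichment(
        name='Response Faithfulness',
        enrichment='ftl_response_faithfulness',
        columns=['context', 'response'],
    ),
]

# Attach the enrichments to the model definition via its spec.
model_spec = fdl.ModelSpec(
    inputs=['prompt', 'context', 'response'],
    custom_features=enrichments,
)
```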
Establish Baselines
Collect initial score distributions to understand normal performance patterns.
Determine appropriate threshold ranges for acceptable score values.
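For example, a simple baselining pass over an initial sample of scores might look like the following (the file and column names are hypothetical):

```python
# Sketch: deriving threshold candidates from an initial Trust Score sample.
import pandas as pd

scores = pd.read_csv('initial_trust_scores.csv')  # e.g. exported inference data

# Summarize the observed distribution of a faithfulness score.
print(scores['faithfulness_score'].describe())

# One common starting point: flag anything below the 5th percentile
# of the baseline period for review.
threshold = scores['faithfulness_score'].quantile(0.05)
print(f'Alert threshold candidate: {threshold:.3f}')
```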
Set Up Monitoring
Create dashboards that display relevant Trust Scores alongside other monitoring metrics.
Configure alerts for score anomalies or threshold violations.
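A hedged sketch of an alert rule on a Trust Score column follows; the metric, enum, and column names are assumptions based on the fiddler-client 3.x alerting interface, so check them against the alerting reference:

```python
# Sketch: alerting when the average faithfulness score in an hourly bin
# drops below a threshold (field and enum names are assumptions).
import fiddler as fdl

fdl.init(url='https://your_instance.fiddler.ai', token='YOUR_ACCESS_TOKEN')
project = fdl.Project.from_name(name='llm_apps')
model = fdl.Model.from_name(name='chatbot', project_id=project.id)

rule = fdl.AlertRule(
    name='Low faithfulness alert',
    model_id=model.id,
    metric_id='average',                    # track the average score per bin
    columns=['Response Faithfulness'],      # enrichment output column (illustrative)
    priority=fdl.Priority.HIGH,
    compare_to=fdl.CompareTo.RAW_VALUE,
    condition=fdl.AlertCondition.LESSER,    # fire when the value falls below...
    bin_size=fdl.BinSize.HOUR,
    critical_threshold=0.4,                 # ...a threshold from baselining
)
rule.create()
```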
Implement Response Protocols
Define escalation procedures for different types of score anomalies.
Establish review processes for investigating outputs flagged by low Trust Scores.
Continuously Refine
Regularly analyze score patterns to identify opportunities for model or prompt improvements.
Adjust thresholds and alert settings based on observed performance and feedback.
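As one illustration, a periodic analysis pass might mine the lowest-scoring outputs for recurring patterns (file, column, and tag names are hypothetical):

```python
# Sketch: looking for patterns in low-scoring outputs.
import pandas as pd

df = pd.read_csv('scored_inferences.csv')

# Inspect the lowest-scoring responses to spot recurring failure modes.
worst = df.nsmallest(20, 'faithfulness_score')[
    ['prompt', 'response', 'faithfulness_score']
]
print(worst.to_string(index=False))

# Group by an application-specific tag (e.g. user intent) to see where
# low scores concentrate, guiding prompt or retrieval fixes.
print(df.groupby('intent')['faithfulness_score'].mean().sort_values().head())
```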
Q: How are Trust Scores calculated?
Trust Scores are generated by Fiddler's specialized Trust Models, which are purpose-built LLMs optimized for efficient evaluation tasks. These models analyze LLM inputs and outputs to assess various quality and safety dimensions, producing quantitative scores that reflect the degree to which content exhibits specific characteristics or concerns.
Q: What's the difference between Trust Scores and traditional metrics?
Unlike traditional ML metrics like accuracy or precision that measure clear right/wrong outcomes, Trust Scores evaluate more nuanced, multidimensional aspects of LLM outputs such as safety, factuality, coherence, and relevance. These dimensions often require more sophisticated evaluation approaches than simple binary assessments.
Q: How reliable are Trust Scores compared to human evaluation?
Fiddler's Trust Scores are designed to correlate strongly with human judgments while providing the consistency and scalability of automated systems. While no automated evaluation can perfectly match human assessment in all cases, Trust Scores provide reliable signals that align well with human evaluations across most common use cases.
Q: Can I customize how Trust Scores are calculated?
While the underlying evaluation models are standardized, Fiddler allows organizations to customize how scores are interpreted and applied through configurable thresholds, weights, and alert settings. This enables adaptation to specific organizational needs and risk tolerances.
Q: How do Trust Scores relate to Guardrails?
Trust Scores serve as the evaluation signals that power Fiddler Guardrails' decision-making. When content is evaluated for safety or quality concerns, the resulting Trust Scores are compared against policy thresholds to determine whether the content should be filtered or allowed.
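A toy illustration of that comparison, with made-up dimension names and thresholds:

```python
# Toy example: comparing Trust Scores against policy thresholds to decide
# whether to block content (all names and numbers are made up).
def guardrail_decision(scores, max_risk=0.3, min_faithfulness=0.5):
    violations = []
    if scores['safety_risk'] > max_risk:           # higher risk score = worse
        violations.append('safety')
    if scores['faithfulness'] < min_faithfulness:  # lower faithfulness = worse
        violations.append('faithfulness')
    return ('block' if violations else 'allow', violations)

print(guardrail_decision({'safety_risk': 0.7, 'faithfulness': 0.9}))
# -> ('block', ['safety'])
```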