LLM-based metrics

LLM-based metrics use the power of large language models to evaluate the quality of text generated by AI. These metrics go beyond basic checks, understanding the context and nuances of language to assess how relevant, coherent, or creative the text is. This approach is much closer to how humans judge text, making these metrics particularly useful for improving AI-generated content, whether it's for chatbots, writing assistants, or content creation tools. They are an excellent tool for detecting hallucinations.

One of the best things about LLM-based metrics is their flexibility. They can adapt to different topics and types of text because they've been trained on a wide range of information. This adaptability makes them a valuable tool for developers and researchers looking to enhance the quality of AI-generated text, ensuring it meets high standards of clarity, relevance, and engagement. However, choosing the right model for the job is crucial, as it can significantly affect the metrics' effectiveness in providing useful feedback.

Fiddler comes with LLM-based enrichments such as Answer Relevance, Faithfulness, Coherence, and Conciseness, and this list of LLM-based enrichments will keep expanding.
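To illustrate how an LLM-based metric works in general, the sketch below builds a grading prompt for an answer-relevance style check and parses the judge model's numeric verdict. The prompt wording, the 1–5 scale, and the function names are illustrative assumptions only, not Fiddler's actual enrichment implementation; the call to the judge LLM itself is omitted so the sketch runs without network access.

```python
# Hypothetical sketch of an LLM-as-judge relevance metric.
# Prompt template, scale, and names are assumptions, not Fiddler's internals.
import re
from typing import Optional

PROMPT_TEMPLATE = (
    "On a scale of 1 (irrelevant) to 5 (fully relevant), rate how well "
    "the answer addresses the question. Reply with a single number.\n\n"
    "Question: {question}\nAnswer: {answer}\nScore:"
)

def build_relevance_prompt(question: str, answer: str) -> str:
    """Fill the grading template with the question/answer pair to evaluate."""
    return PROMPT_TEMPLATE.format(question=question, answer=answer)

def parse_score(reply: str) -> Optional[int]:
    """Extract the first digit 1-5 from the judge model's reply, if any."""
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else None

# Build the prompt that would be sent to the judge model,
# then parse a sample reply as that model might return it.
prompt = build_relevance_prompt(
    "What is the capital of France?", "Paris is the capital of France."
)
score = parse_score("5 - the answer directly addresses the question")
```

In practice the prompt is sent to the judge model (e.g. via the OpenAI API) and the parsed score is recorded as the metric value for that question/answer pair.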

Requirements:

  • This enrichment requires access to the OpenAI API, which may introduce latency due to network communication and processing time.
  • An OpenAI API access token MUST be provided by the user.
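One common way to supply the token is through the `OPENAI_API_KEY` environment variable, which the OpenAI SDK reads by default. The key value below is a placeholder; substitute your own token.

```shell
# Placeholder value shown; replace with your actual OpenAI API key.
export OPENAI_API_KEY="sk-your-key-here"
```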
Model                  Context Window (tokens)
gpt-3.5-turbo          16,385
gpt-4                  8,192
gpt-4-turbo-preview    128,000
gpt-4-0613             8,192
gpt-4-32k              32,768
gpt-4-32k-0613         32,768
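Because each model enforces a hard context-window limit, oversized inputs are rejected by the API. A minimal pre-check sketch, using the limits from the table above and a rough 4-characters-per-token estimate (an assumption; use a real tokenizer such as tiktoken for exact counts):

```python
# Context-window limits (tokens) taken from the table above.
CONTEXT_WINDOWS = {
    "gpt-3.5-turbo": 16_385,
    "gpt-4": 8_192,
    "gpt-4-turbo-preview": 128_000,
    "gpt-4-0613": 8_192,
    "gpt-4-32k": 32_768,
    "gpt-4-32k-0613": 32_768,
}

def fits_context(model: str, text: str, reserve_for_output: int = 500) -> bool:
    """Rough check that `text` fits the model's context window.

    Assumes ~4 characters per token, a common English approximation;
    reserves some tokens for the model's response.
    """
    estimated_tokens = len(text) / 4
    return estimated_tokens + reserve_for_output <= CONTEXT_WINDOWS[model]
```

This kind of guard helps avoid both API errors and the extra latency of a doomed request when an enrichment input is too large for the chosen model.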

Reference

https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo