Tag: GPT-4

Hallucination Leaderboard (GPT-4, Llama, Claude 2)

Finally, we have a hallucination leaderboard! Key Takeaways: It is really cool that we are beginning to run these evaluations and capture them in leaderboards! Hallucination Comparison Table: The leaderboard of publicly available language models was built using Vectara’s Hallucination Evaluation Model. This tool assesses how often a language model generates…
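A hallucination "frequency" ranking can be illustrated with a short sketch. It assumes each model summary already has a factual-consistency score in [0, 1] from some evaluator, such as an evaluation model like the one named above. It also assumes that any score below a hypothetical 0.5 threshold counts as a hallucination. The threshold, the function names, and the model names and scores in the demo are all illustrative assumptions. They are not the leaderboard's actual code or data.

```python
# Minimal sketch: ranking models by hallucination rate.
# ASSUMPTION: every summary already has a consistency score in [0, 1]
# from some evaluator; scores below a hypothetical threshold (0.5 here)
# are counted as hallucinated. All names and numbers are illustrative.


def hallucination_rate(scores: list[float], threshold: float = 0.5) -> float:
    """Return the fraction of summaries whose score falls below `threshold`."""
    if not scores:
        raise ValueError("need at least one score")
    flagged = sum(1 for s in scores if s < threshold)
    return flagged / len(scores)


def leaderboard(results: dict[str, list[float]],
                threshold: float = 0.5) -> list[tuple[str, float]]:
    """Rank models by hallucination rate, lowest (best) first."""
    rates = {model: hallucination_rate(s, threshold)
             for model, s in results.items()}
    return sorted(rates.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical scores for two hypothetical models; not real leaderboard data.
    demo = {
        "model-a": [0.9, 0.8, 0.3, 0.95],
        "model-b": [0.4, 0.7, 0.2, 0.6],
    }
    for model, rate in leaderboard(demo):
        print(f"{model}: {rate:.0%}")  # prints "model-a: 25%" then "model-b: 50%"
```

A lower rate ranks higher. Changing the assumed threshold changes which summaries are flagged, so rates are only comparable when every model is scored with the same evaluator and threshold.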

15 November 2023
Llama 2 As Accurate as GPT-4 for Summaries

Llama 2 produces summaries with factual accuracy comparable to GPT-4's, but at 1/30th of the cost. In this experiment, the Anyscale team found that Llama-2-70b is almost as strong at factuality as gpt-4 and considerably better than gpt-3.5-turbo. The team used Anyscale Endpoints to compare Llama 2 7b, 13b, and 70b (chat-hf fine-tuned) against OpenAI gpt-3.5-turbo…

12 October 2023
