In April 2024, Meta launched Llama 3, the latest generation of its advanced, open-source large language models. The initial release featured Llama 3 8B and Llama 3 70B, both setting new performance benchmarks for LLMs of their size. Within three months, however, several other models had surpassed those results, highlighting the rapid pace of advancement in artificial intelligence.
Meta has announced that the most ambitious model in the Llama 3 series will feature over 400 billion parameters, a significant leap in scale; that model is still in training. In a dramatic development, early benchmark data for the forthcoming Llama 3.1 models, including the 8B, 70B, and the colossal 405B, were leaked on the LocalLLaMA subreddit today. Preliminary results suggest that the Llama 3.1 405B model could outperform the current industry leader, OpenAI’s GPT-4o, on several critical AI benchmarks.
Llama 3.1 vs GPT-4o
If the Llama 3.1 405B model indeed surpasses GPT-4o, it would mark the first time an open-source model has eclipsed a leading closed-source LLM.
| Benchmark | GPT-4o | Meta Llama-3.1-405B | Meta Llama-3.1-70B | Meta Llama-3-70B | Meta Llama-3.1-8B | Meta Llama-3-8B |
|---|---|---|---|---|---|---|
| boolq | 0.905 | 0.921 | 0.909 | 0.892 | 0.871 | 0.82 |
| gsm8k | 0.942 | 0.968 | 0.948 | 0.833 | 0.844 | 0.572 |
| hellaswag | 0.891 | 0.92 | 0.908 | 0.874 | 0.768 | 0.462 |
| human_eval | 0.921 | 0.854 | 0.793 | 0.39 | 0.683 | 0.341 |
| mmlu_humanities | 0.802 | 0.818 | 0.795 | 0.706 | 0.619 | 0.56 |
| mmlu_other | 0.872 | 0.875 | 0.852 | 0.825 | 0.74 | 0.709 |
| mmlu_social_sciences | 0.913 | 0.898 | 0.878 | 0.872 | 0.761 | 0.741 |
| mmlu_stem | 0.696 | 0.831 | 0.771 | 0.696 | 0.595 | 0.561 |
| openbookqa | 0.882 | 0.908 | 0.936 | 0.928 | 0.852 | 0.802 |
| piqa | 0.844 | 0.874 | 0.862 | 0.894 | 0.801 | 0.764 |
| social_iqa | 0.79 | 0.797 | 0.813 | 0.789 | 0.734 | 0.667 |
| truthfulqa_mc1 | 0.825 | 0.8 | 0.769 | 0.52 | 0.606 | 0.327 |
| winogrande | 0.822 | 0.867 | 0.845 | 0.776 | 0.65 | 0.56 |
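To make the head-to-head comparison easier to read, the short Python snippet below simply transcribes the GPT-4o and Llama-3.1-405B columns from the leaked table and counts on how many benchmarks the 405B numbers come out ahead. The values are copied verbatim from the table; averaging raw scores across different benchmarks is only a rough summary, not an official metric.

```python
# Transcription of the leaked benchmark table above, so the comparison
# can be reproduced with plain Python (no external libraries needed).
gpt4o = {
    "boolq": 0.905, "gsm8k": 0.942, "hellaswag": 0.891, "human_eval": 0.921,
    "mmlu_humanities": 0.802, "mmlu_other": 0.872, "mmlu_social_sciences": 0.913,
    "mmlu_stem": 0.696, "openbookqa": 0.882, "piqa": 0.844, "social_iqa": 0.790,
    "truthfulqa_mc1": 0.825, "winogrande": 0.822,
}
llama_405b = {
    "boolq": 0.921, "gsm8k": 0.968, "hellaswag": 0.920, "human_eval": 0.854,
    "mmlu_humanities": 0.818, "mmlu_other": 0.875, "mmlu_social_sciences": 0.898,
    "mmlu_stem": 0.831, "openbookqa": 0.908, "piqa": 0.874, "social_iqa": 0.797,
    "truthfulqa_mc1": 0.800, "winogrande": 0.867,
}

# Benchmarks where the leaked 405B score beats GPT-4o, plus the average gap.
wins = sorted(b for b in gpt4o if llama_405b[b] > gpt4o[b])
avg_margin = sum(llama_405b[b] - gpt4o[b] for b in gpt4o) / len(gpt4o)

print(f"Llama-3.1-405B leads on {len(wins)}/{len(gpt4o)} benchmarks: {wins}")
print(f"Average score difference vs GPT-4o: {avg_margin:+.3f}")
```

On these figures, the 405B model leads on 10 of the 13 benchmarks (by roughly +0.02 on average), trailing GPT-4o only on human_eval, mmlu_social_sciences, and truthfulqa_mc1.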
The impressive performance of Llama 3.1 against GPT-4o in these leaked figures highlights the potential of open-source AI to compete with, and even surpass, proprietary models. This shift could have significant implications for the AI industry, promoting greater transparency, collaboration, and innovation.
Furthermore, the rapid advancements and competitive edge shown by the Llama 3.1 models suggest that open-source initiatives can drive progress at a pace comparable to, if not faster than, their closed-source counterparts. This dynamic could lead to more accessible and adaptable AI technologies, benefiting a broader range of applications and users.
Conclusion
As Meta continues to develop and refine the Llama 3.1 series, the AI community will be keenly observing how these models perform in real-world applications and against future benchmarks. The anticipated release of the Instruct versions of Llama 3.1 will likely further enhance their capabilities, setting new standards in AI performance and usability.
The ongoing rivalry between Meta’s Llama models and OpenAI’s GPT series underscores the importance of continuous innovation and the need to push the boundaries of what is possible with AI. With the potential arrival of GPT-5 and further iterations of Llama models, the landscape of AI development promises to remain dynamic and highly competitive, driving the technology forward in unprecedented ways.
Download
Llama 3.1 download: Meta’s state-of-the-art open-source large language model, featuring:
- 405B, the largest openly available model
- 128K context length, improved reasoning & coding capabilities
- Improved and upgraded multilingual 8B and 70B models
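For readers who want to try the weights locally once they are released, a minimal sketch using the Hugging Face transformers library is shown below. The repository id meta-llama/Meta-Llama-3.1-8B, the gated-access requirement, and the hardware notes are assumptions not stated in this article; check Meta’s official download page for the exact instructions.

```python
# Minimal sketch: load the 8B base model with Hugging Face transformers.
# Assumes you have accepted Meta's license for the gated repo and have
# the `accelerate` package installed (needed for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B"  # assumed repo id; swap in the 70B/405B variants as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 8B weights around ~16 GB
    device_map="auto",           # spread layers across available GPUs/CPU automatically
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The 405B checkpoint will need multi-GPU or heavily quantized setups; the 8B and 70B models are the realistic starting points for most local experiments.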