In April 2024, Meta launched Llama 3, the latest generation of its advanced, open-source large language models. The initial release featured Llama 3 8B and Llama 3 70B, both setting new performance benchmarks for LLMs of their size. Within three months, however, several other models had surpassed those results, highlighting the rapid pace of advancement in artificial intelligence.
Meta has announced that the most ambitious model in the Llama 3 series will feature over 400 billion parameters, a significant leap in scale; that model is still in training. In a dramatic development, early benchmark data for the forthcoming Llama 3.1 models, including the 8B, 70B, and the colossal 405B, were leaked on the LocalLLaMA subreddit today. Preliminary results suggest that the Llama 3.1 405B model could outperform the current industry leader, OpenAI’s GPT-4o, on several critical AI benchmarks.
Llama 3.1 vs GPT-4o
If the Llama 3.1 405B model indeed surpasses GPT-4o, it would mark the first time an open-source model has eclipsed a leading closed-source LLM.
| Benchmark | GPT-4o | Meta Llama-3.1-405B | Meta Llama-3.1-70B | Meta Llama-3-70B | Meta Llama-3.1-8B | Meta Llama-3-8B |
|---|---|---|---|---|---|---|
| boolq | 0.905 | 0.921 | 0.909 | 0.892 | 0.871 | 0.82 |
| gsm8k | 0.942 | 0.968 | 0.948 | 0.833 | 0.844 | 0.572 |
| hellaswag | 0.891 | 0.92 | 0.908 | 0.874 | 0.768 | 0.462 |
| human_eval | 0.921 | 0.854 | 0.793 | 0.39 | 0.683 | 0.341 |
| mmlu_humanities | 0.802 | 0.818 | 0.795 | 0.706 | 0.619 | 0.56 |
| mmlu_other | 0.872 | 0.875 | 0.852 | 0.825 | 0.74 | 0.709 |
| mmlu_social_sciences | 0.913 | 0.898 | 0.878 | 0.872 | 0.761 | 0.741 |
| mmlu_stem | 0.696 | 0.831 | 0.771 | 0.696 | 0.595 | 0.561 |
| openbookqa | 0.882 | 0.908 | 0.936 | 0.928 | 0.852 | 0.802 |
| piqa | 0.844 | 0.874 | 0.862 | 0.894 | 0.801 | 0.764 |
| social_iqa | 0.79 | 0.797 | 0.813 | 0.789 | 0.734 | 0.667 |
| truthfulqa_mc1 | 0.825 | 0.8 | 0.769 | 0.52 | 0.606 | 0.327 |
| winogrande | 0.822 | 0.867 | 0.845 | 0.776 | 0.65 | 0.56 |
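To make the head-to-head comparison easier to read, the short Python snippet below simply transcribes the GPT-4o and Llama-3.1-405B columns from the leaked table and counts on how many benchmarks the 405B numbers come out ahead. The values are copied verbatim from the table; averaging raw scores across different benchmarks is only a rough summary, not an official metric.

```python
# Transcription of the leaked benchmark table above, so the comparison
# can be reproduced with plain Python (no external libraries needed).
gpt4o = {
    "boolq": 0.905, "gsm8k": 0.942, "hellaswag": 0.891, "human_eval": 0.921,
    "mmlu_humanities": 0.802, "mmlu_other": 0.872, "mmlu_social_sciences": 0.913,
    "mmlu_stem": 0.696, "openbookqa": 0.882, "piqa": 0.844, "social_iqa": 0.790,
    "truthfulqa_mc1": 0.825, "winogrande": 0.822,
}
llama_405b = {
    "boolq": 0.921, "gsm8k": 0.968, "hellaswag": 0.920, "human_eval": 0.854,
    "mmlu_humanities": 0.818, "mmlu_other": 0.875, "mmlu_social_sciences": 0.898,
    "mmlu_stem": 0.831, "openbookqa": 0.908, "piqa": 0.874, "social_iqa": 0.797,
    "truthfulqa_mc1": 0.800, "winogrande": 0.867,
}

# Benchmarks where the leaked 405B score beats GPT-4o, plus the average gap.
wins = sorted(b for b in gpt4o if llama_405b[b] > gpt4o[b])
avg_margin = sum(llama_405b[b] - gpt4o[b] for b in gpt4o) / len(gpt4o)

print(f"Llama-3.1-405B leads on {len(wins)}/{len(gpt4o)} benchmarks: {wins}")
print(f"Average score difference vs GPT-4o: {avg_margin:+.3f}")
```

On these figures, the 405B model leads on 10 of the 13 benchmarks (by roughly +0.02 on average), trailing GPT-4o only on human_eval, mmlu_social_sciences, and truthfulqa_mc1.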
The impressive performance of Llama 3.1 against GPT-4o in these leaked figures highlights the potential of open-source AI to compete with, and even surpass, proprietary models. This shift could have significant implications for the AI industry, promoting greater transparency, collaboration, and innovation.
Furthermore, the rapid advancements and competitive edge shown by the Llama 3.1 models suggest that open-source initiatives can drive progress at a pace comparable to, if not faster than, their closed-source counterparts. This dynamic could lead to more accessible and adaptable AI technologies, benefiting a broader range of applications and users.
Conclusion
As Meta continues to develop and refine the Llama 3.1 series, the AI community will be keenly observing how these models perform in real-world applications and against future benchmarks. The anticipated release of the Instruct versions of Llama 3.1 will likely further enhance their capabilities, setting new standards in AI performance and usability.
The ongoing rivalry between Meta’s Llama models and OpenAI’s GPT series underscores the importance of continuous innovation and the need to push the boundaries of what is possible with AI. With the potential arrival of GPT-5 and further iterations of Llama models, the landscape of AI development promises to remain dynamic and highly competitive, driving the technology forward in unprecedented ways.
Download
Llama 3.1 download: Meta’s state-of-the-art open-source large language model, featuring:
- 405B, the largest openly available model
- 128K context length, improved reasoning & coding capabilities
- Improved and upgraded multilingual 8B and 70B models
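For readers who want to try the weights locally once they are released, a minimal sketch using the Hugging Face transformers library is shown below. The repository id meta-llama/Meta-Llama-3.1-8B, the gated-access requirement, and the hardware notes are assumptions not stated in this article; check Meta’s official download page for the exact instructions.

```python
# Minimal sketch: load the 8B base model with Hugging Face transformers.
# Assumes you have accepted Meta's license for the gated repo and have
# the `accelerate` package installed (needed for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B"  # assumed repo id; swap in the 70B/405B variants as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 8B weights around ~16 GB
    device_map="auto",           # spread layers across available GPUs/CPU automatically
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The 405B checkpoint will need multi-GPU or heavily quantized setups; the 8B and 70B models are the realistic starting points for most local experiments.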