Why Llama 2 Matters

In the paper detailing the creation of Llama 2, the researchers evaluated its performance on a range of benchmarks, comparing it with both open- and closed-source models such as GPT-3.5, GPT-4, PaLM, and PaLM 2. In short, Llama 2’s 70B versions outperformed other open-source LLMs, matched GPT-3.5 and PaLM on many benchmarks, but lagged behind GPT-4 and PaLM 2.

Notable LLMs like OpenAI’s GPT-3 and GPT-4, Google’s PaLM series, and Anthropic’s Claude are all closed source. While businesses and researchers can access and even fine-tune these models via their official APIs, they cannot inspect the models themselves or study their inner workings.

Llama 2

However, Llama 2 is different. Its research paper provides an in-depth look at how the model was developed and trained. If you’re technically inclined, you can download Llama 2 (note that even its smallest version exceeds 13 GB), run it on your own machine, and inspect its code. It is also available on cloud platforms such as Microsoft Azure, Amazon Web Services, and Hugging Face, where users can fine-tune it on custom datasets. Be sure to follow Meta’s guidelines on using Llama responsibly.

How does Llama 2 work?

Llama 2’s neural network was trained on 2 trillion “tokens” sourced from public datasets such as Common Crawl (which archives countless webpages), Wikipedia, and classic literature from Project Gutenberg. A token can be a whole word or a fragment of a word (a subword unit), and predicting tokens in context lets the model learn which words belong together. For instance, if “Apple” and “iPhone” often appear together, the model learns to associate them, distinguishing that sense of “Apple” from the fruit that appears alongside words like “apple,” “banana,” and “fruit.”
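The idea of turning text into tokens can be sketched with a toy word-level tokenizer. This is only an illustration: Llama 2’s actual tokenizer uses subword units (byte-pair encoding), and all names below are made up for the sketch.

```python
# Toy illustration of tokenization: mapping text to integer token IDs.
# Real LLM tokenizers split on subword units, not whole words; this
# word-level version only sketches the idea.

def build_vocab(corpus: str) -> dict:
    """Assign each unique word in the corpus an integer ID."""
    vocab = {}
    for word in corpus.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def tokenize(text: str, vocab: dict) -> list:
    """Convert text to a list of token IDs (unknown words -> -1)."""
    return [vocab.get(word, -1) for word in text.split()]

corpus = "the iPhone is made by Apple the iPhone runs iOS"
vocab = build_vocab(corpus)
ids = tokenize("Apple iPhone", vocab)
```

A model trained on such IDs never sees raw text, only sequences of integers; the statistical associations between IDs are what it learns.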

However, training AI on vast swathes of the internet inherently risks absorbing biases and harmful content. To counteract this, the developers added further training stages such as reinforcement learning from human feedback (RLHF). In RLHF, human evaluators rank the AI’s responses, and those rankings guide the model towards safer and more relevant answers. To enhance its conversational abilities, the chat-oriented versions were further refined on specialized dialogue datasets to emulate human-like interactions.
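The ranking step in RLHF is typically turned into a training signal via a pairwise preference objective: a reward model is trained so that the response a human preferred scores higher than the one they rejected. A minimal sketch of that loss (the reward scores here are made-up numbers, and this is a generic Bradley–Terry-style formulation, not Llama 2’s exact training code):

```python
import math

# Pairwise preference loss used in RLHF reward modelling:
# minimise -log(sigmoid(r_chosen - r_rejected)), which is small
# when the human-preferred response outscores the rejected one.

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style loss for one human preference judgment."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

good = preference_loss(r_chosen=2.0, r_rejected=-1.0)  # ranking respected
bad = preference_loss(r_chosen=-1.0, r_rejected=2.0)   # ranking violated
```

Minimising this loss over many human judgments yields a reward model, which is then used to steer the language model’s outputs during reinforcement learning.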

Yet these models serve primarily as foundations. If you want an LLM that produces article summaries in your company’s unique tone, Llama 2 can be fine-tuned on dozens to thousands of examples to meet that need. Likewise, the chat-centric versions can be adapted to handle customer inquiries by training on data sources like FAQs and past chat interactions.
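Preparing such customer-support data usually means converting each FAQ entry into a prompt/completion pair for supervised fine-tuning. A hypothetical sketch (the field names and the chat template below are assumptions for illustration, not a specific Llama 2 format):

```python
# Hypothetical sketch: turning FAQ entries into prompt/completion
# pairs for supervised fine-tuning of a chat model. Field names and
# the "Customer:/Agent:" template are illustrative assumptions.

faqs = [
    {"question": "What are your support hours?",
     "answer": "We are available 9am-5pm, Monday to Friday."},
    {"question": "How do I reset my password?",
     "answer": "Use the 'Forgot password' link on the login page."},
]

def to_training_example(faq: dict) -> dict:
    """Format one FAQ entry as a prompt/completion pair."""
    return {
        "prompt": f"Customer: {faq['question']}\nAgent:",
        "completion": f" {faq['answer']}",
    }

examples = [to_training_example(f) for f in faqs]
```

The resulting examples would then be fed to whatever fine-tuning pipeline you use (e.g. on one of the cloud platforms mentioned above).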

Conclusion

By releasing Llama in such an open manner, Meta has made it easier for other firms to build AI-driven applications with greater autonomy. The only significant licensing condition is that companies with more than 700 million monthly active users need special authorization from Meta to deploy Llama, effectively putting it out of reach for giants like Apple, Google, and Amazon.

This move is genuinely exciting. Most major tech advances of the last seven decades have come from open research and experimentation, and AI appears to be following that tradition. While companies like Google and OpenAI will remain prominent in the field, they are unlikely to dominate the commercial landscape or achieve the kind of user lock-in that Google enjoys in search and advertising.

Releasing Llama to the public ensures that there will always be a viable alternative to proprietary AIs.
