Llama 2 encompasses a series of generative text models that have been both pretrained and fine-tuned, with sizes varying from 7 billion up to 70 billion parameters. This specific repository is dedicated to the 7B version. For other model links, please refer to the index provided below.
Details on the Model
Please note: Access to this model is governed by the Meta license. To obtain the model weights and tokenizer, visit Meta’s website, accept the License, and then request access here on Hugging Face.
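As an illustration, here is a minimal sketch of authenticating and downloading the weights with the huggingface_hub library once access has been approved. The token value is a placeholder you must replace with your own; the rest uses only standard huggingface_hub calls.

```python
# Minimal sketch: download the gated 7B weights after access approval.
# Requires `pip install huggingface_hub`; the token below is a placeholder.
from huggingface_hub import login, snapshot_download

login(token="hf_XXXX")  # placeholder: substitute your own Hugging Face access token
local_dir = snapshot_download("meta-llama/Llama-2-7b-hf")
print("Weights downloaded to:", local_dir)
```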
Meta has developed and publicly released the Llama 2 suite of large language models (LLMs). These models, both pretrained and fine-tuned, span from 7 billion to 70 billion parameters. Meta also fine-tuned versions of these LLMs for dialogue-centric tasks, naming them Llama-2-Chat. These models outperform most open-source conversational models on the majority of benchmarks tested. Furthermore, when judged for helpfulness and safety in human evaluations, they are on par with renowned proprietary models such as ChatGPT and PaLM.
The Minds Behind the Model: Meta
Diverse Offerings: Llama 2 is available in various sizes, including 7B, 13B, and 70B, with both pretrained and refined versions.
Input: The models only accept text.
Output: They only produce text.
Technological Blueprint
Llama 2 is an auto-regressive language model built on an optimized transformer architecture. The fine-tuned versions incorporate supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
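To make the auto-regressive framing concrete, here is a minimal greedy decoding-loop sketch using the Hugging Face transformers API. It assumes gated access has been granted and is purely illustrative; it is not the inference code Meta used.

```python
# Illustrative sketch of auto-regressive decoding: the model repeatedly
# predicts the next token and appends it to the running sequence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumes gated access has been granted
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

ids = tokenizer("Llama 2 is", return_tensors="pt").input_ids
for _ in range(20):  # generate 20 tokens greedily
    logits = model(ids).logits          # (batch, seq_len, vocab)
    next_id = logits[0, -1].argmax()    # most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tokenizer.decode(ids[0]))
```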
| | Training Data | Params | Context Length | GQA | Tokens | LR |
|---|---|---|---|---|---|---|
| Llama 2 | A new mix of publicly available online data | 7B | 4k | ✗ | 2.0T | 3.0 × 10⁻⁴ |
| Llama 2 | A new mix of publicly available online data | 13B | 4k | ✗ | 2.0T | 3.0 × 10⁻⁴ |
| Llama 2 | A new mix of publicly available online data | 70B | 4k | ✔ | 2.0T | 1.5 × 10⁻⁴ |
Training Period: Llama 2 underwent training from January 2023 through to July 2023.
Current State: This is a static model trained on an offline dataset. Updated versions of the refined models will be released as model safety is improved with community feedback.
Licensing: For a specialized commercial license, visit: https://ai.meta.com/resources/models-and-libraries/llama-downloads/
Associated Study: “Llama 2: Open Foundation and Fine-Tuned Chat Models”
Intended Use
Primary Applications: Llama 2 is designed for both commercial and research applications, specifically in English. The fine-tuned models are crafted for chatbot-like interactions, while the pretrained ones offer flexibility for diverse natural language generation activities.
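For example, a minimal text-generation sketch with the Hugging Face transformers library is shown below. It assumes transformers (plus accelerate for device placement) is installed and that access to the gated checkpoint has been granted; the prompt and sampling settings are illustrative.

```python
# Minimal sketch: text generation with the pretrained 7B checkpoint.
# Assumes `transformers` and `accelerate` are installed and gated access is approved.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Large language models are", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```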
For optimal chat functionality, a specific prompt format must be followed, including the [INST] and <<SYS>> tags as well as the BOS and EOS tokens. Users should also be attentive to whitespace and line breaks, removing stray spaces from input data with strip(). For a comprehensive understanding, please consult the reference code on GitHub: chat_completion.
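To make the format concrete, here is a small sketch that assembles a single-turn prompt with the tags described above. The tag strings follow the documented template; the build_prompt helper and example messages are illustrative, not part of the reference code.

```python
# Minimal sketch of the single-turn Llama-2-Chat prompt template.
# The tag strings follow the documented format; build_prompt is an illustrative helper.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_prompt(system_prompt: str, user_message: str) -> str:
    # The BOS token (<s>) is normally prepended by the tokenizer, so it is omitted here.
    return f"{B_INST} {B_SYS}{system_prompt}{E_SYS}{user_message.strip()} {E_INST}"

print(build_prompt(
    "You are a helpful, respectful and honest assistant.",
    "  What is the capital of France?  ",  # stray whitespace removed with strip(), per the guidance above
))
```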
Activities Beyond Scope: Users should abstain from deploying Llama 2 in ways that breach relevant laws or guidelines, such as trade compliance regulations. Additionally, Llama 2 shouldn’t be utilized for non-English languages or any applications outside the stipulations of the Acceptable Use Policy and the Licensing Agreement pertaining to Llama 2.
Training Data
Summary: Llama 2 underwent pretraining on a massive 2 trillion tokens, sourced from publicly accessible data. The refining process incorporated publicly available instructional datasets and over a million fresh human-labeled examples. It’s crucial to note that data from Meta users isn’t included in either the pretraining or the refining phases.
Data Recency: While the pretraining data was capped in September 2022, some of the refining data extends to a more recent date, reaching July 2023.
Assessment Outcomes
In the following segment, we present the performance of both Llama 1 and Llama 2 on recognized academic benchmarks. All of these evaluations were run with Meta’s internal evaluations library.
| Model | Size | Code | Commonsense Reasoning | World Knowledge | Reading Comprehension | Math | MMLU | BBH | AGI Eval |
|---|---|---|---|---|---|---|---|---|---|
| Llama 1 | 7B | 14.1 | 60.8 | 46.2 | 58.5 | 6.95 | 35.1 | 30.3 | 23.9 |
| Llama 1 | 13B | 18.9 | 66.1 | 52.6 | 62.3 | 10.9 | 46.9 | 37.0 | 33.9 |
| Llama 1 | 33B | 26.0 | 70.0 | 58.4 | 67.6 | 21.4 | 57.8 | 39.8 | 41.7 |
| Llama 1 | 65B | 30.7 | 70.7 | 60.5 | 68.6 | 30.8 | 63.4 | 43.5 | 47.6 |
| Llama 2 | 7B | 16.8 | 63.9 | 48.9 | 61.3 | 14.6 | 45.3 | 32.6 | 29.3 |
| Llama 2 | 13B | 24.5 | 66.9 | 55.4 | 65.8 | 28.7 | 54.8 | 39.4 | 39.1 |
| Llama 2 | 70B | 37.5 | 71.9 | 63.6 | 69.4 | 35.2 | 68.9 | 51.2 | 54.2 |
Safety
Evaluation of the pretrained models on automatic safety benchmarks (for TruthfulQA, higher is better; for Toxigen, which measures the percentage of toxic generations, lower is better):

| Model | Size | TruthfulQA | Toxigen |
|---|---|---|---|
| Llama 1 | 7B | 27.42 | 23.00 |
| Llama 1 | 13B | 41.74 | 23.08 |
| Llama 1 | 33B | 44.19 | 22.57 |
| Llama 1 | 65B | 48.71 | 21.77 |
| Llama 2 | 7B | 33.29 | 21.25 |
| Llama 2 | 13B | 41.86 | 26.10 |
| Llama 2 | 70B | 50.18 | 24.60 |
Evaluation of the fine-tuned Llama-2-Chat models on the same safety benchmarks:

| Model | Size | TruthfulQA | Toxigen |
|---|---|---|---|
| Llama-2-Chat | 7B | 57.04 | 0.00 |
| Llama-2-Chat | 13B | 62.18 | 0.00 |
| Llama-2-Chat | 70B | 64.14 | 0.01 |
Ethical Aspects and Constraints
Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English and cannot cover every conceivable scenario. For this reason, as with all LLMs, the responses Llama 2 generates cannot be fully predicted in advance.
There may be instances where it offers incorrect, prejudiced, or potentially controversial answers to user queries. Consequently, developers aiming to integrate Llama 2 should first conduct rigorous safety assessments and make necessary adjustments, ensuring the model aligns with their specific application requirements.