Yarn Llama 2 has been trained and released on HuggingFace. The model was trained with a 128k-token context window.
The Flash Attention 2 library is required for the model to work as intended. Refer to the Installation section below for setup instructions.
Team Contributors
bloc97: Role – Methods, Publication, and Evaluations
@theemozilla: Role – Methods, Publication, and Evaluations
@EnricoShippole: Role – Model Development and Data Processes
honglu2875: Role – Publication and Evaluations
The team extends its gratitude to Stability AI, Carper AI, and Eleuther AI for providing the compute that was pivotal both for training these models and for completing the research. Special thanks go to Jonathan Tow and Dakota Mahan for their help with the Stability AI compute cluster. The team also thanks a16z and PygmalionAI for providing the resources used to run tests and analyses on the models.
Installation
Install FA2 and Rotary Extensions:
pip install flash-attn --no-build-isolation
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
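After installation, a quick sanity check (a minimal sketch) can confirm that both packages import cleanly. The module names flash_attn and rotary_emb are the ones published by the flash-attention repository; adjust them if your installed version differs:

# Verify that Flash Attention 2 and the compiled rotary extension are importable.
import flash_attn
import rotary_emb  # built from csrc/rotary by the second pip command above

print("flash-attn version:", flash_attn.__version__)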
Yarn Llama 2 Model Usage
The team also trained a set of models at a 64k context length. Both the Yarn-Llama-2-13b-64k and Yarn-Llama-2-13b-128k models can be downloaded from HuggingFace; a short loading sketch follows.
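A minimal usage sketch with Hugging Face transformers is given below. The repository id NousResearch/Yarn-Llama-2-13b-128k is an assumption; substitute the id shown on the actual model page. trust_remote_code=True is passed because the checkpoint ships custom modeling code for its scaled attention.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Yarn-Llama-2-13b-128k"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps memory usage manageable at long context
    device_map="auto",           # spread layers across available GPUs
    trust_remote_code=True,      # load the custom modeling code shipped with the checkpoint
)

prompt = "The YaRN method extends the context window of Llama 2 by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))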