Yarn Llama 2 has been trained and released on HuggingFace. The model was trained with a 128k-token context window.
The Flash Attention 2 library is required for the model to work as intended. Refer to the Installation section below for setup instructions.
Team Contributors
bloc97: Role – Methods, Publication, and Evaluations
@theemozilla: Role – Methods, Publication, and Evaluations
@EnricoShippole: Role – Model Development and Data Processes
honglu2875: Role – Publication and Evaluations
The team extends its gratitude to Stability AI, Carper AI, and Eleuther AI for providing the compute that was pivotal both for training these models and for completing the research. Special thanks go to Jonathan Tow and Dakota Mahan for their help with the Stability AI compute cluster. The team also thanks a16z and PygmalionAI for providing the resources used to run tests and analyses on the models.
Installation
Install FA2 and Rotary Extensions:
pip install flash-attn --no-build-isolation
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
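After installation, a quick sanity check (a minimal sketch) can confirm that both packages import cleanly. The module names flash_attn and rotary_emb are the ones published by the flash-attention repository; adjust them if your installed version differs:

# Verify that Flash Attention 2 and the compiled rotary extension are importable.
import flash_attn
import rotary_emb  # built from csrc/rotary by the second pip command above

print("flash-attn version:", flash_attn.__version__)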
Yarn Llama 2 Model Usage
The team also trained a set of models at a 64k context length. Both the Yarn-Llama-2-13b-64k and Yarn-Llama-2-13b-128k models can be downloaded from HuggingFace; a short loading sketch follows.
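A minimal usage sketch with Hugging Face transformers is given below. The repository id NousResearch/Yarn-Llama-2-13b-128k is an assumption; substitute the id shown on the actual model page. trust_remote_code=True is passed because the checkpoint ships custom modeling code for its scaled attention.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Yarn-Llama-2-13b-128k"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps memory usage manageable at long context
    device_map="auto",           # spread layers across available GPUs
    trust_remote_code=True,      # load the custom modeling code shipped with the checkpoint
)

prompt = "The YaRN method extends the context window of Llama 2 by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))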