Meta has unveiled Llama 2, a model trained on 2 trillion tokens with a default context length of 4,096 tokens. The Llama 2-Chat models, fine-tuned using over a million human annotations, are optimized for chat applications. In this article, we explain how to run Llama 2 locally using Ollama.
Training for Llama 2 spanned from January 2023 to July 2023.
Dive into Llama 2
In the examples below, we use the default Llama 2 chat model, which has 7B parameters and functions as a chat/instruct model.
API Guide
First, initiate the Ollama server:
```shell
ollama serve
```
To use the model:
```shell
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```
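Note that /api/generate streams its reply as a series of JSON objects by default. If you would rather receive one complete JSON response, the API accepts a stream flag; a minimal sketch:

```shell
# Request a single JSON response instead of the default token-by-token stream
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```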
Command-Line Interface (CLI)
- First, download Ollama.
- Launch the terminal and input: ollama run llama2
Tip: If ‘ollama run’ detects that the model hasn’t been downloaded yet, it will run ‘ollama pull’ automatically. To download the model without starting a session, use: ollama pull llama2.
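Putting the two commands together, a first session might look like the sketch below (the download size depends on which variant you pull):

```shell
# Fetch the default model without starting a session
ollama pull llama2

# Then open an interactive chat prompt with it
ollama run llama2
```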
Memory Considerations
- 7b models: Minimum 8GB RAM
- 13b models: Minimum 16GB RAM
- 70b models: Minimum 64GB RAM
- If a higher-precision quantization is too demanding for your machine, fall back to a q4 variant or close other memory-intensive applications (see the example after this list).
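For example, on a machine with limited RAM you can pin the run to the 4-bit 7B chat build explicitly, using one of the q4_0 tags listed later in this article:

```shell
# Explicitly run the 4-bit quantized 7B chat model (fits in roughly 8GB RAM)
ollama run llama2:7b-chat-q4_0
```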
Variants of the Model
Ollama offers several Llama 2 model variants, quantized and packaged from the official Meta models for efficient local use.
- Chat Variant: Tailored for chat or dialogue purposes. This is the Ollama default and is indicated with the -chat tag in the tags section. Usage: ollama run llama2
- Pre-trained Variant: This is the basic version without any chat-specific fine-tuning and is marked with the -text tag. Usage: ollama run llama2:text
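For instance, the pre-trained variant is suited to raw text completion; the CLI also accepts a one-shot prompt as an argument:

```shell
# One-shot completion with the pre-trained (-text) variant:
# the model continues the prompt rather than answering it as a chat turn
ollama run llama2:text "The three primary colors are"
```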
Ollama uses 4-bit quantization by default. To experiment with other quantization levels, try the other available tags. The digit following ‘q’ indicates the quantization bit count; for example, q4 denotes 4-bit quantization. A higher number means greater model precision, at the cost of slower inference and higher memory usage.
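To check which tags you have downloaded and how much disk space each occupies, Ollama ships a list command. And assuming the library publishes a higher-precision tag such as 7b-chat-q8_0 for this model (check the tags page to confirm), you could trade memory and speed for precision like so:

```shell
# Show locally available models with their tags and on-disk sizes
ollama list

# Hypothetical higher-precision pull: q8_0 means 8-bit quantization,
# larger and slower than the default q4_0 but more precise
ollama pull llama2:7b-chat-q8_0
```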
| Aliases |
|---|
| latest, 7b, 7b-chat, 7b-chat-q4_0, chat |
| 7b-text, 7b-text-q4_0, text |
| 13b, 13b-chat, 13b-chat-q4_0 |
| 13b-text, 13b-text-q4_0 |
| 70b, 70b-chat, 70b-chat-q4_0 |
| 70b-text, 70b-text-q4_0 |
Model Sources
Sources for the chat models on Ollama:
- 7b parameters source: The Bloke
- 7b parameters original source: Meta
- 13b parameters source: The Bloke
- 13b parameters original source: Meta
- 70b parameters source: The Bloke
- 70b parameters original source: Meta