Llemma LLM – a Language Model for Mathematics. The model was initialized from the weights of Code Llama 7B and then trained on Proof-Pile-2 for 200B tokens. A larger 34B-parameter variant, Llemma 34B, is also available.
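As a quick usage sketch (not taken from the paper), the snippet below loads the 7B checkpoint with Hugging Face `transformers` and greedily decodes a completion for a math prompt. It assumes the weights are hosted on the Hub under the `EleutherAI/llemma_7b` repo id and that a GPU with enough memory is available.

```python
# Minimal sketch: load Llemma 7B and generate a solution to a math prompt.
# Assumes the checkpoint is published on the Hub as "EleutherAI/llemma_7b".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/llemma_7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
    device_map="auto",
)

prompt = "Problem: What is the sum of the first 100 positive integers?\nSolution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```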
Performance Insights
Llemma models excel at chain-of-thought mathematical reasoning and at using computational tools such as Python and formal theorem provers.
Chain-of-Thought Mathematical Reasoning
On chain-of-thought mathematical reasoning benchmarks, the Llemma models outperform Llama 2 and Code Llama, and at comparable parameter counts they also surpass Minerva.
Model | Size | GSM8k | OCW | MMLU-STEM | SAT | MATH |
---|---|---|---|---|---|---|
Llama 2 | 7B | 11.8% | 3.7% | 29.9% | 25.0% | 3.2% |
Code Llama | 7B | 10.5% | 4.4% | 25.1% | 9.4% | 4.5% |
LLEMMA | 7B | 36.4% | 7.7% | 37.7% | 53.1% | 18.0% |
Minerva | 8B | 16.2% | 7.7% | 35.6% | – | 14.1% |
Code Llama | 34B | 29.6% | 7.0% | 40.5% | 40.6% | 12.2% |
LLEMMA | 34B | 51.5% | 11.8% | 49.0% | 71.9% | 25.0% |
Minerva | 62B | 52.4% | 12.0% | 53.9% | – | 27.6% |
Minerva | 540B | 58.8% | 17.6% | 63.9% | – | 33.6% |
Further gains come from majority voting (maj@k): sample k solutions per problem and score the most common final answer. A minimal sketch of the procedure follows the table.
Model | Size | GSM8k maj@100 | OCW maj@100 | MMLU-STEM maj@16 | SAT maj@16 | MATH maj@256 |
---|---|---|---|---|---|---|
LLEMMA | 7B | 54.0% | 14.3% | 49.9% | 78.1% | 33.5% |
Minerva | 8B | 28.4% | 12.5% | 43.4% | – | 25.4% |
LLEMMA | 34B | 69.3% | 18.4% | 59.7% | 81.3% | 43.1% |
Minerva | 62B | 68.5% | 23.5% | 63.5% | – | 43.4% |
Minerva | 540B | 78.5% | 30.8% | 75.0% | – | 50.3% |
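As a rough illustration of maj@k (a sketch, not the paper's evaluation harness): `sample` below stands in for any hypothetical function that draws one sampled chain-of-thought solution from the model, e.g. a wrapper around `model.generate` with `do_sample=True`.

```python
# Minimal sketch of maj@k self-consistency voting: sample k solutions,
# extract each final answer, and return the most common one.
import re
from collections import Counter
from typing import Callable, Optional

def extract_answer(solution: str) -> Optional[str]:
    """Pull the last number from a solution string (simplistic heuristic)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", solution)
    return numbers[-1] if numbers else None

def majority_vote(problem: str, sample: Callable[[str], str], k: int = 16) -> Optional[str]:
    """maj@k: draw k sampled solutions and vote on the final answer.

    `sample` is a hypothetical function mapping a problem to one
    sampled chain-of-thought solution string.
    """
    answers = [
        a for a in (extract_answer(sample(problem)) for _ in range(k))
        if a is not None
    ]
    return Counter(answers).most_common(1)[0][0] if answers else None
```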
Tool Use and Theorem Proving
In addition to chain-of-thought reasoning, Llemma performs strongly on computational mathematics tasks. For the full tool-use and formal theorem proving evaluations, see the paper.
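As a hedged sketch of the Python tool-use setting (not the paper's exact harness), the helper below asks the model to emit a program and executes it to obtain the answer. `generate` is a hypothetical wrapper that maps a prompt to the model's text completion.

```python
# Minimal sketch of Python tool use: the model writes a program,
# and we run it in a subprocess to get the printed answer.
import subprocess
import sys
import tempfile
from typing import Callable

FENCE = "`" * 3  # markdown code fence, built here to avoid literal backticks

def solve_with_python(problem: str, generate: Callable[[str], str]) -> str:
    """Ask the model for a Python program, execute it, return its stdout.

    `generate` is a hypothetical function mapping a prompt to the
    model's completion text.
    """
    prompt = (
        f"Write a Python program that prints the answer to this problem:\n"
        f"{problem}\n{FENCE}python\n"
    )
    code = generate(prompt).split(FENCE)[0]  # keep code up to the closing fence
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    # Execute with a timeout; real deployments should sandbox
    # model-written code properly, since it is untrusted.
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=10
    )
    return result.stdout.strip()
```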
Citation
```bibtex
@misc{azerbayev2023llemma,
  title={Llemma: An Open Language Model For Mathematics},
  author={Zhangir Azerbayev and Hailey Schoelkopf and Keiran Paster and Marco Dos Santos and Stephen McAleer and Albert Q. Jiang and Jia Deng and Stella Biderman and Sean Welleck},
  year={2023},
  eprint={2310.10631},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```