Llemma LLM

Llemma is a language model for mathematics. It was initialized with the weights of Code Llama 7B and further trained on Proof-Pile-2 for 200B tokens. A 34B-parameter variant, Llemma 34B, is also available.

Performance Insights

Llemma models excel at sequential mathematical reasoning and at using computational tools, such as Python interpreters and formal theorem provers.

Sequential Mathematical Analysis

On tasks requiring sequential mathematical reasoning, the Llemma models outperform Llama 2 and Code Llama, and they also surpass Minerva when compared at similar model sizes.

| Model      | Size | GSM8k | OCW   | MMLU-STEM | SAT   | MATH  |
|------------|------|-------|-------|-----------|-------|-------|
| Llama 2    | 7B   | 11.8% | 3.7%  | 29.9%     | 25%   | 3.2%  |
| Code Llama | 7B   | 10.5% | 4.4%  | 25.1%     | 9.4%  | 4.5%  |
| LLEMMA     | 7B   | 36.4% | 7.7%  | 37.7%     | 53.1% | 18.0% |
| Minerva    | 8B   | 16.2% | 7.7%  | 35.6%     | n/a   | 14.1% |
| Code Llama | 34B  | 29.6% | 7.0%  | 40.5%     | 40.6% | 12.2% |
| LLEMMA     | 34B  | 51.5% | 11.8% | 49.0%     | 71.9% | 25.0% |
| Minerva    | 62B  | 52.4% | 12.0% | 53.9%     | n/a   | 27.6% |
| Minerva    | 540B | 58.8% | 17.6% | 63.9%     | n/a   | 33.6% |

Performance improves further with majority voting, where the most common answer among many sampled solutions is taken as the final answer:

| Model   | Size | GSM8k maj@100 | OCW maj@100 | MMLU-STEM maj@16 | SAT maj@16 | MATH maj@256 |
|---------|------|---------------|-------------|------------------|------------|--------------|
| LLEMMA  | 7B   | 54.0%         | 14.3%       | 49.9%            | 78.1%      | 33.5%        |
| Minerva | 8B   | 28.4%         | 12.5%       | 43.4%            | n/a        | 25.4%        |
| LLEMMA  | 34B  | 69.3%         | 18.4%       | 59.7%            | 81.3%      | 43.1%        |
| Minerva | 62B  | 68.5%         | 23.5%       | 63.5%            | n/a        | 43.4%        |
| Minerva | 540B | 78.5%         | 30.8%       | 75.0%            | n/a        | 50.3%        |
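The maj@k aggregation above can be sketched in a few lines: sample k solutions, extract each final answer, and return the most frequent one. This is a minimal illustration, not the paper's evaluation harness; the sample answers are hypothetical.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common final answer among k sampled solutions (maj@k).

    `answers` holds the final answer extracted from each sampled solution;
    None marks a sample where no answer could be parsed.
    """
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical final answers extracted from 8 sampled chain-of-thought solutions:
samples = ["12", "12", "15", "12", None, "12", "8", "12"]
print(majority_vote(samples))  # prints 12, the most frequent answer
```

Because correct answers tend to recur across samples while errors scatter, voting over more samples (maj@100, maj@256) lifts accuracy well above single-sample decoding.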

Tool Use and Theorem Proving

In addition to chain-of-thought reasoning, Llemma has strong capabilities in computational mathematics tasks. For the tool use and formal theorem proving evaluations, see the paper.
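A tool-use evaluation of this kind typically has the model emit a short Python program, executes it, and checks the result against the reference answer. The sketch below, with a hypothetical model completion, shows the execution step only; a real harness would sandbox and time-limit the generated code.

```python
def run_generated_code(code):
    """Execute model-generated Python in a fresh namespace and return its `answer` variable."""
    namespace = {}
    exec(code, namespace)  # a real evaluation harness would sandbox and time-limit this
    return namespace.get("answer")

# Hypothetical model completion for "compute the sum of the squares of 1 through 10":
generated = "answer = sum(n**2 for n in range(1, 11))"
print(run_generated_code(generated))  # prints 385
```

Scoring is then a simple comparison of the returned value against the gold answer, which sidesteps the arithmetic slips that pure chain-of-thought reasoning is prone to.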

Citation

@misc{azerbayev2023llemma,
title={Llemma: An Open Language Model For Mathematics},
author={Zhangir Azerbayev and Hailey Schoelkopf and Keiran Paster and Marco Dos Santos and Stephen McAleer and Albert Q. Jiang and Jia Deng and Stella Biderman and Sean Welleck},
year={2023},
eprint={2310.10631},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
