LLama

Tag: Models

Llama-4

Ahmad Al-Dahle shared a glimpse into Meta’s massive AI project—training Llama 4 on a cluster with over 100,000 H100 GPUs! This scale is pushing AI boundaries and advancing both product capabilities and open-source contributions. Great to visit one of our data centers where we’re training Llama 4 models on a cluster bigger than 100K H100’s!…

31 October 2024
WordLLama

WordLlama is a utility for NLP and word embedding that repurposes components from large language models (LLMs) to generate efficient and compact word representations, similar to GloVe, Word2Vec, or FastText. It starts by extracting the token embedding codebook from a state-of-the-art LLM (e.g., LLaMA3 70B) and trains a small, context-free model within a general-purpose embedding…

17 September 2024
Llama 3.1 405B

In April 2024, Meta launched Llama 3, the latest generation of advanced, open-source large language models. The initial release featured Llama 3 8B and Llama 3 70B, both setting new performance benchmarks for LLMs in their respective sizes. However, within three months, several other models surpassed these benchmarks, highlighting the rapid advancements in artificial intelligence.…

23 July 2024