Trustbit's score for Llama 2

The Trustbit benchmarks assess how well models fit digital product development. A higher score indicates better performance. The benchmarks distinguish two groups of models:

  • Proprietary cloud models.
  • Open source models with permissive licenses that are suitable for local operation.

The Trustbit LLM Leaderboard offers a monthly updated comparison of various large language models, including ChatGPT, to determine their potential for product development.

From a strategic perspective, the GPT/LLM field is advancing steadily. Some projects are expected to move from cloud-based solutions to in-house models by 2024.

Key enhancements observed

The launch of Llama 2 by Meta has catalyzed further innovation thanks to its permissive license. The introductions of Code Llama and the language-agnostic model “Sonar” stand out. While these aren’t part of the benchmark, they complement general LLMs and are especially effective in custom support systems for specific domains.

Unexpected findings from Llama 2 and Nous Hermes 70B benchmarking

When newer models were evaluated, it was unexpectedly found that Llama 2 may not be the best choice for LLM-driven product development because of its verbosity. However, some Llama 2 fine-tunes, especially Nous Hermes 70B, have shown promise. Nous Hermes 70B is the first open model to surpass GPT-3.5 in Trustbit's assessments.

Cost vs. Performance: An Overview of the Llama 2 70B Model

Running the 70B model can be somewhat costly. A workload comparable to this benchmark would cost roughly 5.75 EUR on a GPU machine rented from a major cloud provider. Yet this is only the start of a trend toward better quality at a reasonable price. The 13B fine-tune of Llama 2 is making progress, and the newly introduced Falcon 180B model is challenging the leaders.

Furthermore, smaller models remain worth considering, especially as new guidance methods and domain-specific beam searches emerge.
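A cost figure like the one quoted for the 70B model comes down to rental price multiplied by runtime. The following is a minimal sketch of that arithmetic, not Trustbit's method: the function names and the hourly rate and runtime used in the example are hypothetical placeholders, since the source states only the approximate total.

```python
def run_cost_eur(hourly_rate_eur: float, runtime_hours: float) -> float:
    """Estimated cost of one run: rental price per hour times hours used."""
    if hourly_rate_eur < 0 or runtime_hours < 0:
        raise ValueError("rate and runtime must be non-negative")
    return hourly_rate_eur * runtime_hours


def hours_within_budget(budget_eur: float, hourly_rate_eur: float) -> float:
    """How many rental hours a given budget buys at a given hourly rate."""
    if hourly_rate_eur <= 0:
        raise ValueError("hourly rate must be positive")
    return budget_eur / hourly_rate_eur


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; they are not from the benchmark.
    example_rate = 4.0   # EUR per hour for a rented GPU machine (assumed)
    example_hours = 1.5  # runtime of one benchmark-sized workload (assumed)
    print(f"Estimated run cost: {run_cost_eur(example_rate, example_hours):.2f} EUR")
    print(f"Hours bought by 5.75 EUR: {hours_within_budget(5.75, example_rate):.2f}")
```

Plugging in a real provider's hourly price and a measured runtime for each model size makes it easy to compare the 70B model against smaller fine-tunes on the same workload.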

