In addition to other improvements, the current release enables running Meta Llama 2 7B efficiently on devices like the iPhone 15 Pro and Samsung Galaxy S24, as well as other edge devices, and includes early support for Llama 3 8B. More details on ExecuTorch Alpha below.
ExecuTorch
ExecuTorch Alpha is focused on deploying large language models and large ML models to the edge, stabilizing the API surface, and improving installation processes.
Mobile devices face significant limitations in computing power, memory, and energy. To adapt large language models (LLMs) to these devices, we rely extensively on quantization and other techniques to reduce model size and memory footprint.
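To see why quantization is essential here, a back-of-the-envelope calculation of the weight storage for a 7B-parameter model at different precisions is instructive. This is illustrative arithmetic only; real on-device memory usage also includes activations, the KV cache, and runtime overhead.

```python
def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store just the model weights."""
    return num_params * bits_per_weight / 8

params = 7e9  # Llama 2 7B

# Weight storage at common precisions:
fp32 = weight_bytes(params, 32)  # 28.0 GB -- far beyond phone RAM
fp16 = weight_bytes(params, 16)  # 14.0 GB -- still too large
int4 = weight_bytes(params, 4)   #  3.5 GB -- feasible on modern flagships

for name, nbytes in [("fp32", fp32), ("fp16", fp16), ("int4", int4)]:
    print(f"{name}: {nbytes / 1e9:.1f} GB")
```

At 4 bits per weight, the weights of a 7B model fit in roughly 3.5 GB, which is what makes phones like the iPhone 15 Pro viable targets.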
ExecuTorch Alpha now offers 4-bit post-training quantization through GPTQ. We’ve expanded compatibility with various devices, supporting dynamic shapes and new data types in XNNPACK on CPUs. We’ve enhanced model exporting and optimization, reduced memory requirements, and boosted execution speed.
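The storage scheme behind 4-bit quantization can be sketched in a few lines. This is a minimal group-wise round-to-nearest illustration, not the GPTQ algorithm itself: GPTQ additionally uses approximate second-order (Hessian) information to choose quantized values that minimize output error. The group size and symmetric [-8, 7] range here are illustrative assumptions.

```python
def quantize_group(weights, qmin=-8, qmax=7):
    """Quantize one group of floats to 4-bit ints plus a shared fp scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest representable 4-bit level.
    q = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from 4-bit ints and the scale."""
    return [v * scale for v in q]

group = [0.12, -0.07, 0.31, -0.29]
q, scale = quantize_group(group)
restored = dequantize_group(q, scale)
max_err = max(abs(a - b) for a, b in zip(group, restored))
```

Each group of weights stores only one floating-point scale plus 4 bits per weight, and the round-to-nearest error stays within half a quantization step (`scale / 2`) for in-range values.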
This allows the Llama 2 7B model to run effectively on devices like the iPhone 15 Pro, iPhone 15 Pro Max, Samsung Galaxy S22, S23, and S24, along with other edge devices. We also provide preliminary support for the Llama 3 8B model. Continuous enhancements are being made to improve tokens per second across different edge devices, with the latest performance updates available on GitHub.
We are collaborating closely with partners like Apple, Arm, and Qualcomm Technologies to utilize GPU and NPU enhancements for improved performance via Core ML, MPS, TOSA, and the Qualcomm AI Stack backends.
Productivity
Deploying high-performance models optimized for specific devices often requires detailed analysis of on-device runtime data to make appropriate adjustments to the original PyTorch model. ExecuTorch Alpha provides a robust SDK that offers comprehensive visibility from model creation to deployment, including insights into delegate and hardware-specific details.
We have upgraded the ExecuTorch SDK to improve debugging and profiling tools. Leveraging PyTorch’s capabilities, ExecuTorch’s debugging features allow developers to trace operator nodes back to the original Python source code, facilitating quicker troubleshooting and performance optimization for both delegated and non-delegated model scenarios. More information about the ExecuTorch SDK is available here.
By the PyTorch Dev Team