Open-source LLM compiler for models on Hugging Face: 152 tok/s, 11.3 W, 5.3B CPU instructions, versus mlx-lm at 113 tok/s, 14.1 W, 31.4B CPU instructions, on a MacBook M1 Pro.

Discussed on r/LocalLLM and r/LocalLLaMA

HuggingFace transformer compiler for optimised native inference binaries - pacifio/unc
