100+ t/s on Qwen3.6-27B Q8 across a 5090 + 3090 Ti: switching to tensor split-mode got me from 70 to 100+
ggml-org/llama.cpp: LLM inference in C/C++.