100+ t/s on Qwen3.6-27B Q8 across a 5090 + 3090 Ti: switching to tensor split-mode got me from 70 to 100+

Covered by imil.net, NVIDIA Technical Blog. Discussed on r/LocalLLaMA.

LLM inference in C/C++. Contribute to ggml-org/llama.cpp development by creating an account on GitHub.

Covered in 2 articles

RTX 5080 + RTX 3090 Setup: 80+ Tok/s on Qwen 3.6 27B Q8

Discussed on Hacker News and r/LocalLLaMA

NVIDIA Technical Blog

Build Personal AI Agents on Windows PCs with New Tools from Microsoft and Nvidia

Discussed on Hacker News