How fast is LlamaStash? Overhead, throughput, and a fair comparison with Ollama and LM Studio
Originally published at deepu.tech. In my release post for LlamaStash I made a claim I need to back up: the wrapper adds zero overhead compared with running llama-server directly. That is the kind of claim that should not appear in a blog post without numbers behind it, so here are the numbers. LlamaStash spawns the unmodified upstream llama-server, which raises three different questions, and there is a benchmark suite for each. Suite A: overhead regression. Does llamastash start add any m...