If you’re just getting started with running local LLMs, it’s likely that you’ve been eyeing or have opted for LM Studio and Ollama. These GUI-based tools are the defaults for a reason. They make hosting and connecting to local AI models extremely easy, and it’s how I supercharged my Raycast experience with AI. Recently, however, I’ve made the decision to move to llama.cpp for my local AI setup. Yes, LM Studio and Ollama offered everything I needed, including a polished interface and one-click model loading. But those conveniences come with trade-offs: extra layers of abstraction, slower startup times, and less control over how the models actually run. Switching to llama.cpp strips all that away and gives you direct access, efficiency, and flexibility. It’s now my go-to recommendation for anyone with more than a passing interest in controlling their local AI models, or in learning how they actually work.
LM Studio and Ollama are great entry points
The terminal approach strips it down to the essentials
There’s no denying that LM Studio and Ollama are excellent tools for getting started. The GUI is intuitive and perfectly designed for anyone who wants to quickly test out models and chat with them. Ollama goes a step further by making AI models developer-friendly. But that ease can come at the cost of control. After a while, I wanted finer control over memory constraints and higher token speeds on my admittedly anemic hardware. That’s what led me to llama.cpp.
Llama.cpp strips away the GUI, but in return, you get full control over how everything runs. You still choose the essentials, like which model to load; from there, you can pick what precision to use and how much memory to allocate. It’s a bare-bones experience as far as ease of use goes, but it’s also deeply satisfying in terms of what you can pull off with just a few commands in the terminal.
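To give a sense of what that tuning looks like, here’s a minimal sketch that launches llama.cpp’s llama-server from Python with explicit context, GPU-offload, and thread settings. The model path and numbers are placeholders, and flag names can vary slightly between llama.cpp releases, so check `llama-server --help` on your build before copying anything.

```python
import subprocess

# Hypothetical model path for illustration -- point this at your own GGUF file.
MODEL_PATH = "models/mistral-7b-instruct-q4_k_m.gguf"

# Launch llama.cpp's built-in HTTP server with explicit resource limits:
#   -c      context window size in tokens
#   -ngl    number of layers to offload to the GPU (0 = CPU only)
#   -t      CPU threads to use for generation
#   --mlock keep the model pinned in RAM so it never gets swapped out
server = subprocess.Popen([
    "llama-server",
    "-m", MODEL_PATH,
    "-c", "4096",
    "-ngl", "20",
    "-t", "8",
    "--mlock",
    "--port", "8080",
])

try:
    server.wait()
except KeyboardInterrupt:
    # Shut the server down cleanly when you stop the script.
    server.terminate()
```

Every one of those knobs is something LM Studio either hides behind a settings panel or doesn’t expose at all, which is exactly the point of going terminal-first.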
There are also performance gains to be had. Startup is quicker, resource utilization is lower, and you can tune everything to your liking. That makes sense: the terminal-based approach eliminates multiple layers of abstraction, so there are fewer background processes and no GUI overhead. In many ways, it’s closer to building your own AI stack than using an off-the-shelf AI tool, and that control lets you understand and optimize your entire workflow.
The advantages of taking a terminal-first approach
Speed, efficiency gains, and open-source
There are real advantages to taking a terminal-based approach to running a local LLM. Once you’re comfortable on the command line, llama.cpp starts to outshine GUI tools in several ways. For starters, it’s lean, portable, and incredibly fast. The app is written in C++, which lets it run efficiently even on modest hardware. This makes it perfect for lower-powered computers, embedded systems, or a home server.
While LM Studio also uses llama.cpp under the hood, it only gives you access to pre-quantized models. With llama.cpp, you can quantize your models on-device, trim memory usage, and tailor performance to your device’s capabilities instead of settling for a one-size-fits-all approach.
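As a rough illustration of that workflow, the sketch below shells out to llama.cpp’s llama-quantize tool to produce a smaller Q4_K_M build from a full-precision GGUF file. The filenames are placeholders, and the binary was simply called quantize in older releases, so treat this as a sketch rather than a copy-paste recipe.

```python
import subprocess

# Placeholder filenames -- point these at your own GGUF files.
SOURCE_GGUF = "models/my-model-f16.gguf"       # full-precision export
TARGET_GGUF = "models/my-model-q4_k_m.gguf"    # smaller quantized output

# Q4_K_M is a common balance of size and quality; running llama-quantize
# without arguments prints the full list of quantization types your build supports.
subprocess.run(
    ["llama-quantize", SOURCE_GGUF, TARGET_GGUF, "Q4_K_M"],
    check=True,
)
```

Being able to pick the quantization level yourself is how you squeeze a bigger model onto limited RAM, or trade a little quality for a lot of speed on weaker hardware.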
This setup also makes llama.cpp highly portable. You can run the same setup on macOS, Linux, or an SBC like a Raspberry Pi without significant reconfiguration. That flexibility isn’t really a thing on LM Studio, which is tied to desktop environments. With llama.cpp, your entire AI stack can move with you to whatever your platform of choice is.
While we’re at it, there’s the open-source advantage too. Even though llama.cpp is the foundation for several popular GUIs, like LM Studio, LM Studio itself isn’t open-source. By switching over to llama.cpp, you get the powerful base that popular GUIs rely on without any of the middle layers. For developers, this means you can integrate llama.cpp directly into scripts, use it as a backend for apps like chatbots, or automate tasks across your setup. It opens up a lot of flexibility, including driving the model directly through API calls, as shown in the sketch below. For example, you could spin up a model in a Docker container, call it from the command line, and have it run as part of a local productivity pipeline. Basically, llama.cpp lets you run things how you want, not just how a GUI wrapper allows.
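For instance, once llama-server is running (locally or inside a Docker container), any script can talk to it through its OpenAI-compatible HTTP API. Here’s a minimal sketch using Python’s requests library; the port matches the server launch example earlier, and the prompt is just an illustration.

```python
import requests

# Assumes llama-server is already running on localhost:8080
# (e.g. started with the flags shown earlier, or inside a Docker container).
API_URL = "http://localhost:8080/v1/chat/completions"

def ask_local_model(prompt: str) -> str:
    """Send a single chat request to the local llama.cpp server."""
    response = requests.post(
        API_URL,
        json={
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
            "temperature": 0.7,
        },
        timeout=120,
    )
    response.raise_for_status()
    # The response follows the OpenAI chat-completions schema.
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_local_model("Summarize today's to-do list in one sentence."))
```

Because the endpoint speaks the same schema as OpenAI’s API, you can point existing tools and libraries at your local server with little more than a base-URL change.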
Switching over to llama.cpp is about more than just speed
Switching over from LM Studio or Ollama to llama.cpp has its speed advantages, but speed isn’t the sole reason to switch. More than that, it gives you control and the ability to understand and shape your local AI setup. Compared to LM Studio, llama.cpp lets you build a framework that fits your specific needs. Even though it requires more effort and has a steeper learning curve than the standard GUI-based approaches, it’s hard to give up the performance, portability, and control you get from running the open-source llama.cpp.
llama.cpp
Llama.cpp is an open-source framework that runs large language models locally on your computer.