When artificial intelligence tools started making the rounds a few years ago, I figured most of them would end up as marketing gimmicks without any tangible use cases. Even now, that seems to be the case for most products that have the word AI tacked on them, be it a random application advertising the integration of some useless AI feature or hardware manufacturers looking to stuff as many buzzwords into their devices as possible.
However, I must admit that large language models can have some neat benefits in certain scenarios. Since I’ve got a few GPUs lying around, I figured I could try running some local LLMs on them. Turns out, certain home lab services and everyday utilities work pretty well with self-hosted LLMs – to the point where I’d actually recommend repurposing old graphics cards as AI-hosting platforms.
Home Assistant benefits tremendously from self-hosted LLMs
Even the smaller models work really well with HASS
Home Assistant is known for many things, including its terrific compatibility with popular smart home devices, amazing automation provisions, and a plethora of useful add-ons. Despite the “assistant” in its name, many smart home enthusiasts still overlook the fact that you can have full-blown conversations with HASS.
That’s a real shame, because Home Assistant pairs really well with self-hosted LLMs. With the right configuration, you can not only query HASS about your smart home setup but also issue commands to manage all your IoT devices. When I first connected my Home Assistant server to an LLM, I used a mere GTX 1080 with small 4b models from Ollama, and the performance and accuracy were pretty decent.
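Before pointing Home Assistant at the model, it’s worth confirming that your GPU-hosted Ollama instance responds quickly enough for conversational use. Here’s a minimal Python sketch, assuming Ollama is listening on its default port and that the model name is just a placeholder for whichever small 4b model you’ve pulled:

```python
# Quick sanity check against a local Ollama server before wiring it
# into Home Assistant. Assumes Ollama runs on its default port (11434)
# and that a small model has already been pulled.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.2:3b"  # placeholder: use whichever small model you pulled

payload = {
    "model": MODEL,
    "prompt": "In one sentence, suggest an automation for a smart bulb "
              "that should turn off when nobody is home.",
    "stream": False,  # return the full response as a single JSON object
}

response = requests.post(OLLAMA_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["response"])
```

If the reply comes back within a couple of seconds, the model is snappy enough to act as a conversation agent; the Home Assistant side is then handled through its Ollama integration rather than any code.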
And that’s before you throw voice assistants into the mix. With my Ollama models acting as the conversation agents, faster-whisper handling speech-to-text, and a text-to-speech model turning the responses back into audio, I’ve got a full-on voice assistant that accepts vocal input and responds accordingly. The speech models run fine on the server’s processor, so the conversation agent is the only aspect that requires the extra horsepower of my GPU.
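To illustrate how those pieces fit together, here’s a rough Python sketch of the same flow outside of Home Assistant: faster-whisper transcribes a recording on the CPU, and the transcription is handed to a local Ollama model for the reply. The audio file and model names are placeholders, and this isn’t how the HASS Assist pipeline is implemented internally – it’s just the shape of the pipeline:

```python
# Illustrative sketch of the voice pipeline: speech-to-text on the CPU,
# then the transcription goes to a local Ollama model for a response.
# Model names and the audio file are placeholders, not a HASS config.
from faster_whisper import WhisperModel
import ollama

# Small Whisper model, CPU-only and quantized, so it runs without a GPU
stt = WhisperModel("small", device="cpu", compute_type="int8")

segments, _info = stt.transcribe("voice_command.wav")
transcript = " ".join(segment.text.strip() for segment in segments)
print("Heard:", transcript)

# The GPU-hosted conversation agent handles the actual reply
reply = ollama.chat(
    model="llama3.2:3b",  # placeholder for your conversation model
    messages=[{"role": "user", "content": transcript}],
)
print("Assistant:", reply["message"]["content"])
```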
Ollama models provide solid privacy for my Open Notebook tasks
I don’t want to send my research data to Google’s servers
Google’s NotebookLM is one of the most popular AI-centric tools out there, and it deserves all the praise it gets. Since it pulls information from your own sources, it’s more reliable than most LLMs, and you can use reports, audio overviews, and flashcards to get concise summaries of your source documents. That said, privacy can be an issue with NotebookLM, as all the processing happens on Google’s servers. Likewise, Google’s proprietary AI may refuse to produce results on sensitive research topics.
Luckily, Open Notebook is a third-party implementation of the same concept that lets you use other AI models, including self-hosted LLMs. I’ve paired my Open Notebook server with Ollama, and it works pretty well once I add enough sources and fine-tune the reports in my notebooks. I tend to use larger 7b (and sometimes even 12b) LLMs. They take longer than the 4b models paired with my HASS server, but since I use Open Notebook for research rather than quick queries, the extra wait isn’t a dealbreaker – I usually go brew coffee while it finishes processing the sources.
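For a sense of what that workload looks like under the hood, the sketch below summarizes a source document with a larger local model. It’s a conceptual illustration of the kind of job Open Notebook hands off to Ollama, not its actual implementation, and the model name and file path are placeholders:

```python
# Conceptual sketch: summarize a research source with a larger local model.
# This illustrates the kind of work Open Notebook offloads to Ollama;
# it is not Open Notebook's actual implementation.
import ollama

MODEL = "mistral:7b"  # placeholder for whichever 7b/12b model you prefer

with open("source_document.txt", encoding="utf-8") as handle:
    source_text = handle.read()

prompt = (
    "Summarize the following research source in five bullet points, "
    "sticking strictly to what the text says:\n\n" + source_text
)

response = ollama.generate(model=MODEL, prompt=prompt)
print(response["response"])
```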
I even use local LLMs with VS Code
Thanks to some useful extensions
Before I begin, let me add that AI-generated code is nowhere near as good as what even an amateur programmer could write. So, rather than relying on LLMs to generate code snippets for me, I use them to review my painstakingly crafted programs and help with troubleshooting if (or rather, when) things go wrong with my nested statements.
VS Code supports a plethora of extensions that add AI tools to the mix, and well, GitHub Copilot and Claude Code are surprisingly effective coding companions. But since I’m a stalwart member of the self-hosting camp, I prefer using private LLMs running on local hardware to aid my programming escapades. Continue.Dev works well with my Ollama model collection. Sure, I can’t rely on it for everything, but its auto-completion and troubleshooting prowess is more than enough for my coding tasks.
Local LLMs are even more useful for obscure self-hosted services
If you’re into running applications inside container environments, you’ll find LLMs quite handy. Paperless-ngx, for example, is an amazing tool for managing documents, and the Paperless-GPT companion app can use LLMs to enhance OCR operations and perform automatic tagging. Heck, it can even improve the search functionality to pull documents using their context instead of just relying on keywords.
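As a rough illustration of what that tagging step boils down to, the snippet below asks a local model to propose tags for a document’s OCR text. It’s a simplified stand-in for what Paperless-GPT automates – not the tool’s real code – and the sample text and model name are made up for the example:

```python
# Simplified stand-in for LLM-assisted tagging: ask a local model to
# propose tags for a document's OCR text. Paperless-GPT automates this
# kind of step; this is not its actual implementation.
import ollama

# Made-up OCR output used purely for illustration
ocr_text = "Invoice #4821 from Contoso Energy, due 2024-03-15, amount $84.20"

prompt = (
    "Suggest three to five short tags for the following document. "
    "Reply with a comma-separated list only:\n\n" + ocr_text
)

response = ollama.generate(model="llama3.2:3b", prompt=prompt)
tags = [tag.strip() for tag in response["response"].split(",")]
print(tags)
```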
Likewise, Karakeep is an amazing web page archival utility that also supports images, RSS feeds, and PDF files. Pairing it with an LLM lets Karakeep generate tags and short summaries for my archived content, so I don’t have to waste time manually organizing my ever-expanding bookmark collection.