When AI platforms started mushrooming out of nowhere, I wasn’t very fond of using them in my workflows. That sentiment still hasn’t changed, especially with most software and hardware manufacturers slapping artificial intelligence onto their products. However, I wouldn’t group large language models and image generators under the same umbrella as the AI apps that big corporations try to shove down everybody’s throats at every chance they get.
That’s because LLMs (and by extension, image generation tools) have their use cases, no matter how niche they may be. The problem is that online AI-hosting platforms are rife with privacy issues, and most of them lock essential features behind recurring subscriptions. Self-hosting my own AI tools turned out to be a neat solution, and here’s a collection of utilities I run on local servers (and even my daily driver) to integrate LLMs and image generation models into my everyday tasks.
Ollama
It aids plenty of services in my self-hosted stack
If you’ve read my articles on XDA, you may already know that I love running a bajillion services on my home lab. While most of them are fairly self-sufficient and run happily inside containers on minimal resources, others mesh well with external services – including LLMs.
While we’re on the subject, I use Ollama to deploy and manage the majority of my self-hosted LLMs. If you haven’t heard of it, Ollama is a FOSS tool that can harness the processing capabilities of your local hardware to run LLMs without involving external cloud platforms. Ollama supports a ton of models, ranging from simple 0.7B and 1B variants that can even run on SBCs and CPUs to hardcore 70B+ LLMs that can only be tamed by cutting-edge graphics cards. And the best part? Ollama’s API is compatible with many home lab services, so I can directly integrate it with my arsenal of containers and virtual machines.
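If you’re curious what talking to that API actually looks like, here’s a bare-bones sketch of querying a local Ollama instance from Python. It assumes Ollama is listening on its default port (11434) and that you’ve already pulled a model; the llama3.2 tag is just a stand-in for whatever you have installed.

```python
import requests

# Assumes a local Ollama instance on its default port (11434) and a model
# you've already pulled, e.g. with `ollama pull llama3.2`.
OLLAMA_URL = "http://localhost:11434/api/generate"

response = requests.post(
    OLLAMA_URL,
    json={
        "model": "llama3.2",  # stand-in: use any model you've pulled
        "prompt": "Summarize why self-hosting an LLM helps with privacy.",
        "stream": False,      # return the whole reply as a single JSON object
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```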
Home Assistant is one such service: it supports AI-powered queries, and even something as lightweight as a 3B model served through Ollama can answer most of my questions and carry out the right smart home operations without issues. Likewise, Paperless-GPT uses my Ollama models to improve the document manager’s OCR and search capabilities. Then there’s Karakeep, which relies on LLMs to generate tags and summaries for my bookmarks. Heck, I’ve even armed the VS Code instances running on my dev VMs with LLMs via the Continue.Dev extension. Not for generating code, mind you; the models instead provide autocompletion, code review, and debugging support for my programming tasks. And I haven’t even talked about using Ollama models with Open Notebook.
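Under the hood, many of these integrations simply point an OpenAI-style client at Ollama’s OpenAI-compatible endpoint, so wiring up your own scripts works much the same way. Here’s a rough sketch using the openai Python package; the model name is again a placeholder, and the API key can be any non-empty string, since Ollama ignores it.

```python
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API under /v1, which is how many
# third-party tools hook into it. The key is ignored but must be non-empty.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

reply = client.chat.completions.create(
    model="llama3.2",  # placeholder: any locally pulled model
    messages=[
        {"role": "system", "content": "You are a bookmark-tagging assistant."},
        {"role": "user", "content": "Suggest three tags for an article about home labs."},
    ],
)
print(reply.choices[0].message.content)
```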
KoboldCPP
To run LLMs on my daily drivers
As much as I love my Ollama instance, it runs inside a VM on my home server, and there are times when I can’t access its LLMs from my everyday machine. That’s because my old RTX 3080 Ti can only handle so many LLM workloads before utilization spikes, and I’d rather not tax it with extra tasks.
That’s where my KoboldCPP instances fit into the equation. I’ve got it running on both my MacBook and (bare-metal) Windows 11 PC. Coming from Ollama, it took a little longer to get used to KoboldCPP, but I use it for everything from my D&D escapades to quickly checking the feasibility of my programming algorithms.
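KoboldCPP ships with its own web UI, but it also exposes a KoboldAI-style HTTP API, which is handy for quick scripted checks like the algorithm questions I mentioned. The sketch below assumes the default port (5001); the sampling parameters are purely illustrative.

```python
import requests

# Assumes KoboldCPP is running locally on its default port (5001) with a
# GGUF model already loaded.
KOBOLD_URL = "http://localhost:5001/api/v1/generate"

payload = {
    "prompt": "Explain the time complexity of binary search in two sentences.",
    "max_length": 160,   # cap on generated tokens
    "temperature": 0.7,  # illustrative sampling settings
    "top_p": 0.9,
}

response = requests.post(KOBOLD_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["results"][0]["text"])
```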
Automatic1111
Or rather, Stable Diffusion
Image generators may still produce inconsistencies when creating illustrations, but they’ve come a long way since the early days, when their creations were pretty much nightmare fuel. Similar to Ollama and KoboldCPP, Automatic1111 is a tool (a web UI for Stable Diffusion, to be precise) where I can load model files and play around with prompts to generate images.
Personally, I tend to avoid using Automatic1111 (or rather, Stable Diffusion models) to generate illustrations; creating custom character portraits for my D&D campaigns is the farthest I’m willing to go with it. Instead, I mostly use it to upscale low-resolution photos from my childhood. The only caveat is that it requires way too much processing power – to the point where I’m already planning to upgrade to a used RTX 4090. And as weird as it may sound, I also pair Automatic1111 with GIMP to add AI-powered inpainting and X/Y plotting support to the king of FOSS image editors.
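For what it’s worth, Automatic1111 can also be launched with its --api flag, which exposes REST endpoints alongside the web UI. Here’s a rough txt2img sketch for something like a character portrait; the prompt and sampling settings are just examples, and the default port (7860) is assumed.

```python
import base64
import requests

# Assumes the Automatic1111 web UI was launched with the --api flag on its
# default port (7860) and a Stable Diffusion checkpoint is loaded.
A1111_URL = "http://localhost:7860/sdapi/v1/txt2img"

payload = {
    "prompt": "portrait of a dwarven cleric, fantasy illustration",  # example prompt
    "negative_prompt": "blurry, low quality",
    "steps": 25,
    "width": 512,
    "height": 512,
}

response = requests.post(A1111_URL, json=payload, timeout=300)
response.raise_for_status()

# The API returns generated images as base64-encoded strings.
image_b64 = response.json()["images"][0]
with open("portrait.png", "wb") as handle:
    handle.write(base64.b64decode(image_b64))
```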
Faster-whisper
For my audio transcription tasks
Faster-whisper is something I hadn’t really used until recently, but I’m glad I came across this handy tool. At first glance, audio transcription may seem like a niche utility, but faster-whisper is more than worth the complicated setup. Running it on meeting recordings makes it much easier to take notes, since I don’t have to constantly jump back and forth between different sections of the recording.
The same holds true for podcasts and interviews, and it can even generate subtitles for YouTube videos. Sure, it’s a bit of a pain to configure, but being able to turn long audio clips into searchable text archives is quite handy – especially for content creators like yours truly.
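If the setup talk sounds scarier than it is, the actual transcription step boils down to a few lines of Python once faster-whisper is installed. The model size, device, and file name below are placeholders for whatever suits your hardware.

```python
from faster_whisper import WhisperModel

# Placeholders: pick a model size and device that fit your hardware, e.g.
# device="cuda" with compute_type="float16" if you have a capable GPU.
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("meeting.mp3")  # placeholder file name
print(f"Detected language: {info.language}")

for segment in segments:
    print(f"[{segment.start:7.1f}s -> {segment.end:7.1f}s] {segment.text}")
```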
You’ll need fairly decent systems to self-host AI tools
Although LLM-hosting utilities and image generators have become a lot more accessible, they need plenty of horsepower for the best results. Sure, low-parameter models may run on budget hardware, but their utility is rather limited – especially for image upscaling and other tasks that demand better accuracy. Their high-capacity counterparts can produce significantly better results, though you’ll need to sacrifice your wallet for a cutting-edge graphics card if you don’t want to run into performance issues.
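As a rough rule of thumb, a model’s weights alone occupy about (parameter count x bits per weight) / 8 bytes, before you account for the context window and runtime overhead. The back-of-envelope sketch below shows why 70B-class models push you toward high-VRAM graphics cards; the numbers are weights-only estimates, not exact requirements.

```python
def weights_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough memory footprint of the model weights alone, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Weights-only estimates at 4-bit quantization; real usage adds KV cache
# and runtime overhead on top of these figures.
for params in (1, 7, 13, 70):
    print(f"{params}B @ 4-bit ~ {weights_memory_gb(params, 4):.1f} GB")
```

In other words, a quantized 7B model sits comfortably on a mid-range GPU, while the 70B-class heavyweights remain firmly in enthusiast territory.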