🏠 Local LLM Deployment - masterdev

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

🖥️Self-hosted apps Code

github.com··DEV

Qwen 3.6 27B AutoRound GGUF, need your feedback

🗃️SQLite

huggingface.co··r/LocalLLaMA

On-device AI is a margin decision

🖥️Self-hosted apps Blog

ziraph.com··Hacker News

Unsloth Gemma 4 QAT

🪟Awesome windows command-line

unsloth.ai·

GPUsnek is Python on nVidia’s CUDA

🪟Awesome windows command-line Blog

blog.adafruit.com·

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

🖥️Self-hosted apps Academic

arxiv.org·

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🖥️Self-hosted apps News Blog

blog.google··Hacker News

I've tested so many desktop AI tools, but Hermes with Ollama is my new favorite - here's why

🖥️Self-hosted apps News Tutorial

zdnet.com·

Fixing a stuck Ollama runner and building a GPU watchdog

🖥️Self-hosted apps

patrickmccanna.net··Hacker News

Microsoft is killing the Copilot+ PC advantage, brings Windows 11’s local AI to RTX 30+ PCs with 6GB vRAM

🖥️Self-hosted apps

windowslatest.com·

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

🖥️Self-hosted apps

phoronix.com·

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

🖥️Self-hosted apps

deemwar-products.github.io··Hacker News

iOS 27's most advanced on-device AI needs 12GB of RAM – and most iPhones don't have it

🖥️Self-hosted apps News

techspot.com·

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

🖥️Self-hosted apps

everylocalai.com··DEV

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support

What Ollama Reveals About Local AI, Agents, and Open Models

lightmetal: GPU LLM Inference From a Single Java 25 JAR

Improved performance and model support with GGUF