🏠 Local LLM Deployment - masterdev

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

🖥️Self-hosted apps Code

github.com··DEV

lightmetal: GPU LLM Inference From a Single Java 25 JAR

🗃️SQLite Blog

adambien.blog·

Improved performance and model support with GGUF

🖥️Self-hosted apps Blog

ollama.com·

On-device AI is a margin decision

🖥️Self-hosted apps Blog

ziraph.com··Hacker News

Qwen 3.6 27B AutoRound GGUF, need your feedback

🗃️SQLite

huggingface.co··r/LocalLLaMA

What Ollama Reveals About Local AI, Agents, and Open Models

🖥️Self-hosted apps Blog

odsc.medium.com·

Unsloth Gemma 4 QAT

🪟Awesome windows command-line

unsloth.ai·

I've tested so many desktop AI tools, but Hermes with Ollama is my new favorite - here's why

🖥️Self-hosted apps News Tutorial

zdnet.com·

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

🖥️Self-hosted apps Academic

arxiv.org·

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

🖥️Self-hosted apps

phoronix.com·

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🖥️Self-hosted apps News Blog

blog.google··Hacker News

Fixing a stuck Ollama runner and building a GPU watchdog

🖥️Self-hosted apps

patrickmccanna.net··Hacker News

Microsoft is killing the Copilot+ PC advantage, brings Windows 11’s local AI to RTX 30+ PCs with 6GB vRAM

🖥️Self-hosted apps

windowslatest.com·

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

🖥️Self-hosted apps

deemwar-products.github.io··Hacker News

iOS 27's most advanced on-device AI needs 12GB of RAM – and most iPhones don't have it

🖥️Self-hosted apps News

techspot.com·

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

🖥️Self-hosted apps

everylocalai.com··DEV

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

🪟Awesome windows command-line

local-llm.utop.workers.dev··Hacker News

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support
