🏠 Local LLM Deployment - masterdev

Less-relevant results

Apple WWDC On-Device AI Deep Dive - Google Docs

🖥️Self-hosted apps

gist.is · Hacker News

Putting a datacenter GPU in a gaming PC for £200 ($268)

🖥Home Lab Setup Blog

blog.adafruit.com

Apple's most advanced on-device AI features will only work on select devices

🖥️Self-hosted apps News

gsmarena.com

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

🖥️Self-hosted apps Blog

adambien.blog

Using Scikit-LLM with Open-Source LLMs

🖥️Self-hosted apps

machinelearningmastery.com

Nvidia GeForce RTX 50 Super GPUs may launch in early 2027 with 50% more VRAM

🗃️SQLite

club386.com

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

🗃️SQLite News Blog

kaitchup.substack.com · r/LocalLLaMA

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

🖥️Self-hosted apps Blog

dnhkng.github.io

Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good

🖥️Self-hosted apps Blog

towardsai.net

The latest Gemma 4 models use a training trick to slash their on-device memory footprint

🗃️SQLite

androidauthority.com

Quality Is Not a Safety Proxy Under Quantization

🗃️SQLite Academic

arxiv.org

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

🖥️Self-hosted apps

huggingface.co · Hacker News

KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.

🗃️SQLite Code

github.com · Hacker News

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

🖥️Self-hosted apps Blog

ziraph.com · Hacker News

Qualcomm Announces On-Device AI Claw Ecosystem Plan

🖥️Self-hosted apps

autonews.gasgoo.com

NVIDIA's RTX 5060 May Finally Get The VRAM Upgrade Gamers Wanted

🖥Home Lab Setup News

hothardware.com

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

Tales of an Ollama Honeypot (Part 3): More Traffic, More Findings

iOS 27’s most powerful on-device AI requires iPhone 17 Pro, iPhone Air

local llm on laptop 780M GPU using llama + gemma 4 qat