🏠 Local LLMs - kudolink

Neo-X7/Neo-AI: A fully offline AI assistant powered by Ollama. Stores and retrieves conversations using SQLite + LanceDB vector search. No cloud. No API keys. Runs entirely on your machine.

🧠LLMs Blog

iankduncan.com

Optimal Post-Training Quantization Scales and Where to Find Them

🧠LLMs Academic

arxiv.org

The biggest local LLM on your machine is useless if it can't call a single tool, no matter how many parameters it has

🤗Open Source AI

xda-developers.com

lightmetal: GPU LLM Inference From a Single Java 25 JAR

🤗Open Source AI Blog

adambien.blog

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🤗Open Source AI News Blog

blog.google · Hacker News

Less-relevant results

On-device AI is a margin decision

🤗Open Source AI Blog

ziraph.com · Hacker News

"AI" Is Eating Platform Monopolist Free Cash Flow, Not the World: CHART OF THE DAY

🤗Open Source AI News Blog

braddelong.substack.com · Substack

LM Studio now lets you use your iPhone to talk to local models on your Mac

🤗Open Source AI

9to5mac.com · r/apple

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

🌐Web Dev

huggingface.co · Hacker News

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

🧠LLMs News Blog

kaitchup.substack.com · r/LocalLLaMA

Previewing nAnalyst, the layer that finally explains your network

🤖AI Coding

ntop.org

Using local LLMs for agentic coding

🤗Open Source AI Blog

blog.alexewerlof.com

Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good

🤗Open Source AI Blog

towardsai.net

martidu4/honey-ai: 🍯 All-in-one AI honeypot powered by local LLMs. SSH, HTTP, FTP, Telnet, SMTP, MySQL, Redis, Git, VNC, RDP — with canary tokens, tarpits, GZIP bombs, and threat intel reporting.

🤗Open Source AI Code

github.com · Hacker News

WWDC 2026: Foundation Models (& Anarlog)

🤗Open Source AI

skushagra.com

The latest Gemma 4 models use a training trick to slash their on-device memory footprint

🤗Open Source AI

androidauthority.com

Re-quantizing a local LLM 14x faster by skipping the tensors that didn't change

🤗Open Source AI News Blog

andreaborio.substack.com · Substack

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

🤗Open Source AI Blog

ziraph.com · Hacker News

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

What's in the Box? A Field Guide to AI Models
