🧠 Local LLM - akapaka

martidu4/honey-ai: 🍯 All-in-one AI honeypot powered by local LLMs. SSH, HTTP, FTP, Telnet, SMTP, MySQL, Redis, Git, VNC, RDP — with canary tokens, tarpits, GZIP bombs, and threat intel reporting.

🤖Qwen Code

github.com··Hacker News

Qwen 3.6 27B AutoRound GGUF, need your feedback

⚡LLM Quantization

huggingface.co··r/LocalLLaMA

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

⚡LLM Quantization

vettedconsumer.com··Hacker News

On-device AI is a margin decision

🧠LLM Inference Blog

ziraph.com··Hacker News

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🧠LLM Inference News Blog

blog.google··Hacker News

A system programmer’s guide to LLM inference

🧠LLM Inference Blog

blog.xiangpeng.systems··Hacker News

Token4Token — pay-per-token inference on Gnosis + Swarm

🧠LLM Inference

t4t.eth.link··Hacker News

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

🧠LLM Inference News Blog

kaitchup.substack.com··r/LocalLLaMA

KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.

🧠LLM Inference Code

github.com··Hacker News

local AI agents for Cursor with pre-tuned marketplace/commu

🔌Model Context Protocol

locaible.com··Hacker News

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

🧠LLM Inference

local-llm.utop.workers.dev··Hacker News

Here's a llama.cpp CLI Command builder.

🧠LLM Inference

llamabuilding.com··r/LocalLLaMA

Purpose-built local AI agents

🤖Qwen Blog

samihonkonen.com··Hacker News

Run (your largest) local models from your iPhone

🧠LLM Inference Blog

lmstudio.ai··Hacker News, r/LocalLLaMA

Evaluating bigaspv2-5, a Flow Matching Alternative to SDXL

⚡LLM Quantization

hackernoon.com

DeskDash - a free Windows tool to easily manage your GGUF files

⚡LLM Quantization

gerry7.itch.io··r/LocalLLaMA

Remove padding and multiple D2D copies for MTP by gaugarg-nv · Pull Request #24086 · ggml-org/llama.cpp

🧠LLM Inference Code

github.com··r/LocalLLaMA

Omnifs: APIs and data sources as files you can ls, cat, grep, and pipe

🕸️WebAssembly

omnifs.dev··Hacker News

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

Fixing a stuck Ollama runner and building a GPU watchdog
