🦙 Simple finetuning LLMs - autocole

martidu4/honey-ai: 🍯 All-in-one AI honeypot powered by local LLMs. SSH, HTTP, FTP, Telnet, SMTP, MySQL, Redis, Git, VNC, RDP — with canary tokens, tarpits, GZIP bombs, and threat intel reporting.

🧩WASI Code

github.com··Hacker News

Fixing a stuck Ollama runner and building a GPU watchdog

📚Monorepo Patterns

patrickmccanna.net··Hacker News

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

⚙️Finetuning LLMs faster with less memory News Blog

kaitchup.substack.com··r/LocalLLaMA

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

⚙️Finetuning LLMs faster with less memory News Blog

blog.google··Hacker News

Less-relevant results

On-device AI is a margin decision

🔄AI Pipeline design and techniques Blog

ziraph.com··Hacker News

DeskDash - a free Windows tool to easily manage your GGUF files

🤖Coding Automation

gerry7.itch.io··r/LocalLLaMA

Omnifs: APIs and data sources as files you can ls, cat, grep, and pipe

🧩WASI

omnifs.dev··Hacker News

local AI agents for Cursor with pre-tuned marketplace/commu

🤖Coding Automation

locaible.com··Hacker News

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

⚙️Finetuning LLMs faster with less memory

deemwar-products.github.io··Hacker News

Token4Token — pay-per-token inference on Gnosis + Swarm

🔵LLM frameworks and AI libraries for TypeScript

t4t.eth.link··Hacker News

A system programmer’s guide to LLM inference

⚙️Finetuning LLMs faster with less memory Blog

blog.xiangpeng.systems··Hacker News

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

⚙️Finetuning LLMs faster with less memory

smolhub.com··r/LocalLLaMA

DiffusionGemma: The Developer Guide - Google Developers Blog

⚙️Finetuning LLMs faster with less memory Blog

developers.googleblog.com··r/LocalLLaMA

NexusOS v2.0 – A zero-dependency pipeline streaming server chaos to Parquet

🪟Tauri

huggingface.co··Hacker News

Fine-tuning vs RAG vs MeMo: Where should LLM Knowledge Live?

⚙️Finetuning LLMs faster with less memory

pub.towardsai.net

Launch HN: General Instinct (YC P26) – Frontier models on edge devices

🔥Svelte Discussion

news.ycombinator.com··Hacker News

DiffusionGemma: 4x Faster Text Generation

⚙️Finetuning LLMs faster with less memory News Blog

blog.google··Hacker News, r/LocalLLaMA, r/singularity

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

⚙️Finetuning LLMs faster with less memory

local-llm.utop.workers.dev··Hacker News

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

Qwen 3.6 27B AutoRound GGUF, need your feedback
