💬 LLMs - zhang · Scour

Improved performance and model support with GGUF

🔧AI Tools Blog

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

🔧AI Tools Code

github.com · Hacker News

Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support

alternativeto.net

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

🔧AI Tools Academic

lightmetal: GPU LLM Inference From a Single Java 25 JAR

🔌Embedded Systems Blog

adambien.blog

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

🔧AI Tools Blog

ziraph.com · Hacker News

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

zozo123.github.io · Hacker News

Qwen 3.6 27B AutoRound GGUF, need your feedback

🔌Embedded Systems

huggingface.co · r/LocalLLaMA

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

vettedconsumer.com · Hacker News

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

🔧AI Tools Blog

blogs.nvidia.com

Fixing a stuck Ollama runner and building a GPU watchdog

patrickmccanna.net · Hacker News

I've tested so many desktop AI tools, but Hermes with Ollama is my new favorite - here's why

🔧AI Tools News Tutorial

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🔧AI Tools News Blog

blog.google · Hacker News

A system programmer’s guide to LLM inference

🔌Embedded Systems Blog

blog.xiangpeng.systems · Hacker News

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

deemwar-products.github.io · Hacker News

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

🔧AI Tools Blog

dnhkng.github.io

AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support

Using Scikit-LLM with Open-Source LLMs

machinelearningmastery.com

What's in the Box? A Field Guide to AI Models

🔧AI Tools Blog

iankduncan.com

Oil-impregnated densified wood veneer with high electrical insulation enabled by nanosized oil channels

🔌Embedded Systems