⚡ Quantization - zongyuzhang · Scour

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

🔓Open-source Models

local-llm.utop.workers.dev · Hacker News

Less-relevant results

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

🧠LLMs Blog

ziraph.com · Hacker News

Show HN: Ext-Infer

infer.displace.tech · Hacker News

Gemma 4 12B: A unified, encoder-free multimodal model

🔓Open-source Models Discussion

news.ycombinator.com · Hacker News

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

🔧Tool Use Blog

dnhkng.github.io

Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs

🔓Open-source Models Academic

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

🧠LLMs Code

github.com · Hacker News

BeeLlama.cpp DFlash on Strix Halo: 2.7x Gemma 31B, But MTP Is Still Faster

🔓Open-source Models

sleepingrobots.com

Google DeepMind releases Gemma 4 QAT, but Unsloth developer Daniel Han warns naive llama.cpp conversions suffer accuracy loss

🧠LLMs News

Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good

🧠LLMs Blog

towardsai.net

Ideogram4 GGUF is out!

🔓Open-source Models

huggingface.co · r/StableDiffusion

Apple rebuilt its on-device AI stack at WWDC 2026

🎭Multimodal AI Blog

ziraph.com · Hacker News

I wired a fully offline voice loop to Ollama + LM Studio — 100% CPU, no GPU, nothing leaves your machine (Silero VAD + Parakeet STT + Supertonic TTS 3)

🕵️AI Agents Code

github.com · r/LocalLLaMA

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🔓Open-source Models News Blog

blog.google · Hacker News

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

🔓Open-source Models News

digg.com · Hacker News

Launch HN: General Instinct (YC P26) – Frontier models on edge devices

🔓Open-source Models Discussion

news.ycombinator.com · Hacker News

not much happened today | AINews

🕵️AI Agents

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

🔓Open-source Models Code

github.com · Hacker News

FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model

👁️VLMs Academic

The latest Gemma 4 models use a training trick to slash their on-device memory footprint

🔓Open-source Models

androidauthority.com
