Quantization


Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

 🔓Open-source Models
Less-relevant results

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

 🧠LLMs  Content type: Blog
ziraph.com · Hacker News

Show HN: Ext-Infer

 🔧Tool Use

Gemma 4 12B: A unified, encoder-free multimodal model

 🔓Open-source Models  Content type: Discussion

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

 🔧Tool Use  Content type: Blog
dnhkng.github.io

Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs

 🔓Open-source Models  Content type: Academic
arxiv.org

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

 🧠LLMs  Content type: Code
github.com · Hacker News

BeeLlama.cpp DFlash on Strix Halo: 2.7x Gemma 31B, But MTP Is Still Faster

 🔓Open-source Models
sleepingrobots.com

Google DeepMind releases Gemma 4 QAT, but Unsloth developer Daniel Han warns naive llama.cpp conversions suffer accuracy loss

 🧠LLMs  Content type: News
digg.com

Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good

 🧠LLMs  Content type: Blog
towardsai.net

Ideogram4 GGUF is out!

 🔓Open-source Models

Apple rebuilt its on-device AI stack at WWDC 2026

 🎭Multimodal AI  Content type: Blog
ziraph.com · Hacker News

I wired a fully offline voice loop to Ollama + LM Studio — 100% CPU, no GPU, nothing leaves your machine (Silero VAD + Parakeet STT + Supertonic TTS 3)

 🕵️AI Agents  Content type: Code
github.com · r/LocalLLaMA

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

 🔓Open-source Models  Content type: News  Content type: Blog
blog.google · Hacker News

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

 🔓Open-source Models  Content type: News
digg.com · Hacker News

Launch HN: General Instinct (YC P26) – Frontier models on edge devices

 🔓Open-source Models  Content type: Discussion

not much happened today | AINews

 🕵️AI Agents
news.smol.ai

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

 🔓Open-source Models  Content type: Code
github.com · Hacker News

FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model

 👁️VLMs  Content type: Academic
arxiv.org

The latest Gemma 4 models use a training trick to slash their on-device memory footprint

 🔓Open-source Models
androidauthority.com
