Quantization

Scoured 85 posts in 7.3 ms

Sigma-Branch: Hierarchical Single-Path Network Reconstruction for Dynamic Inference with Reduced Active Parameters

 🧠Inference Engineering  Content type: Academic
arxiv.org

Google DeepMind releases Gemma 4 QAT, but Unsloth developer Daniel Han warns naive llama.cpp conversions suffer accuracy loss

 💰Inference Cost  Content type: News
digg.com

OpenAI govt stake 🇺🇸, Google compute deal 🚀, Microsoft Scout launch 🤖

 💰Inference Cost
tldr.tech

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

 💰Inference Cost

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

 💰Inference Cost  Content type: Blog
ziraph.com · Hacker News

Building & Benchmarking: LLMs on a 16GB Jetson Orin NX for Hermes Agent

 🎮GPU Computing  Content type: Blog
dnhkng.github.io

Gemma 4 12B: A unified, encoder-free multimodal model

 FlashAttention  Content type: Discussion

BeeLlama.cpp DFlash on Strix Halo: 2.7x Gemma 31B, But MTP Is Still Faster

 🚀Speculative Decoding
sleepingrobots.com

FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model

 💰Inference Cost  Content type: Academic
arxiv.org

Remove padding and multiple D2D copies for MTP by gaugarg-nv · Pull Request #24086 · ggml-org/llama.cpp

 🟢CUDA  Content type: Code
github.com · r/LocalLLaMA

Ideogram4 GGUF is out!

 🚀Speculative Decoding

Apple rebuilt its on-device AI stack at WWDC 2026

 🔢GEMM Optimization  Content type: Blog
ziraph.com · Hacker News

Dew Drop - June 8, 2026 (#4685)

 🧠Inference Engineering
alvinashcraft.com

stable-diffusion.cpp/docs/quantization_and_gguf.md at master · leejet/stable-diffusion.cpp

 💰Inference Cost  Content type: Code

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

 🧠Inference Engineering  Content type: Blog
dnhkng.github.io

Recover-LoRA for Aggressive Quantization: Reclaiming Accuracy in 2-Bit Language Models via Low-Rank Adaptation with Knowledge Distillation on Synthetic Data

 💰Inference Cost  Content type: Academic
arxiv.org

iChristGit/comfyui-llamacpp-ideogram: ComfyUI Prompt enhancer for ideogram4 powered by llama cpp

 🚀Speculative Decoding  Content type: Code

not much happened today | AINews

 🧠Inference Engineering
news.smol.ai

alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.

 🧠Inference Engineering  Content type: Code
github.com

Launch HN: General Instinct (YC P26) – Frontier models on edge devices

 💰Inference Cost  Content type: Discussion
