Model optimizations in LLMs


Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression

 📊AI Performance Profiling  Content type: Academic
arxiv.org

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

 🚀LLM serving frameworks  Content type: News  Content type: Blog
blog.google · Hacker News

Orchestrate your LLM pipeline. Locally

 🧠Large Language Models (LLMs)
llmforge.app · Hacker News

Qwen 3.6 27B AutoRound GGUF, need your feedback

 🔢Quantization of LLMs

Google's DiffusionGemma generates 256 tokens in parallel and self-corrects as it goes

 🚀LLM serving frameworks
venturebeat.com

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

 🔧Systems-level optimizations for LLM serving  Content type: Code
github.com · Hacker News, r/LLM

Friday Five — June 12, 2026

 🔧Systems-level optimizations for LLM serving
redhat.com

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

 🚀LLM serving frameworks  Content type: News

DiffusionGemma: Discrete diffusion in a large language model

 🧠Large Language Models (LLMs)

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

 🔢Quantization of LLMs

Model2vec-zig: static text embeddings in pure Zig, in a single binary

 🔢Quantization of LLMs
ziggit.dev

Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks

 Real-time AI Systems

Pruned YOLOv8 ONNX INT8 Fails: 3 Fixes That Work

 🔢Quantization of LLMs  Content type: Blog  Content type: Discussion
tildalice.io

Quantization Was Never About the Bits

 🔢Quantization of LLMs  Content type: Blog
medium.com

Unsloth Gemma 4 QAT

 🚀LLM serving frameworks
unsloth.ai

Domain-Specific Small Language Models (Manning)

 🧠Large Language Models (LLMs)
i-programmer.info

The latest Gemma 4 models use a training trick to slash their on-device memory footprint

 🔢Quantization of LLMs
androidauthority.com

The Quantization Error of the Soul: Why Silicon Valley is Inverting the Promethean Fire

 🔢Quantization of LLMs  Content type: Blog
medium.com

Re-quantizing a local LLM 14x faster by skipping the tensors that didn't change

 🧠Large Language Models (LLMs)  Content type: News  Content type: Blog

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

 🔢Quantization of LLMs  Content type: News  Content type: Blog
