Model optimizations in LLMs
Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression
📊 AI Performance Profiling
Content type: Academic
arxiv.org · 3 days ago
Qwen 3.6 27B AutoRound GGUF, need your feedback
🔢 Quantization of LLMs
huggingface.co · 2 days ago · r/LocalLLaMA
How Does Attention Work in LLMs? 2026 Deep Dive
🧠 Large Language Models (LLMs)
Content type: Blog
medium.com · 1 hour ago
Orchestrate your LLM pipeline. Locally
🧠 Large Language Models (LLMs)
llmforge.app · 11 hours ago · Hacker News
DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200
🚀 LLM serving frameworks
Content type: News
newsletter.semianalysis.com · 2 days ago · Hacker News
harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.
🔧 Systems-level optimizations for LLM serving
Content type: Code
github.com · 5 days ago · Hacker News, r/LLM
Google's DiffusionGemma generates 256 tokens in parallel and self-corrects as it goes
🚀 LLM serving frameworks
venturebeat.com · 13 hours ago
Friday Five — June 12, 2026
🔧 Systems-level optimizations for LLM serving
redhat.com · 4 hours ago
DiffusionGemma: Discrete diffusion in a large language model
🧠 Large Language Models (LLMs)
idlemachines.co.uk · 5 hours ago · Hacker News
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency
🚀 LLM serving frameworks
Content type: News, Blog
blog.google · 6 days ago · Hacker News
Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks
⚡ Real-time AI Systems
aarushgupta.io · 2 days ago · Lobsters, Hacker News
Model2vec-zig: static text embeddings in pure Zig, in a single binary
🔢 Quantization of LLMs
ziggit.dev · 8 hours ago
GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)
🔢 Quantization of LLMs
vettedconsumer.com · 5 days ago · Hacker News
Quantization Was Never About the Bits
🔢 Quantization of LLMs
Content type: Blog
medium.com · 7 hours ago
Pruned YOLOv8 ONNX INT8 Fails: 3 Fixes That Work
🔢 Quantization of LLMs
Content type: Blog, Discussion
tildalice.io · 6 days ago
Domain-Specific Small Language Models (Manning)
🧠 Large Language Models (LLMs)
i-programmer.info · 1 day ago
Re-quantizing a local LLM 14x faster by skipping the tensors that didn't change
🧠 Large Language Models (LLMs)
Content type: News, Blog
andreaborio.substack.com · 1 day ago · Substack
Unsloth Gemma 4 QAT
🚀 LLM serving frameworks
unsloth.ai · 6 days ago
The Quantization Error of the Soul: Why Silicon Valley is Inverting the Promethean Fire
🔢 Quantization of LLMs
Content type: Blog
medium.com · 14 hours ago
The latest Gemma 4 models use a training trick to slash their on-device memory footprint
🔢 Quantization of LLMs
androidauthority.com · 6 days ago