ML Inference
Specific
model inference, inference optimization, TensorRT, ONNX
Scoured 315 posts in 10.0 ms

UniSVQ: 2-bit Unified Scalar-Vector Quantization
⚙️ ML Systems · Content type: Academic
arxiv.org · 3 days ago · Cited by 1 article

Less-relevant results
DiffusionGemma: 4x Faster Text Generation
🖥️ GPU Computing · Content type: News, Blog · 22 articles covering this post
blog.google · 3 days ago · Hacker News, r/LocalLLaMA, r/singularity · Cited by 22 articles

12B Gemma 4 QAT Deployment with NVIDIA L4, Cloud Run, MCP, and Antigravity CLI
🧠 Deep Learning · Content type: Blog
medium.com · 1 day ago

European Sovereign AI. Breakthrough Performance
⚡ Query Engines
infercom.ai · 9 hours ago · Hacker News

massimo92/spark: CLI tool for serving LLMs with vLLM on NVIDIA DGX Spark. One file, zero friction.
🖥️ GPU Computing · Content type: Code
github.com · 2 days ago · Hacker News

Intelligent inference scheduling with llm-d on Red Hat AI
⚡ Query Engines
developers.redhat.com · 2 days ago

All sorts of famous Attention Layers
🧠 Deep Learning · Content type: Blog
harsh-ps-2003.bearblog.dev · 6 hours ago

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200
🖥️ GPU Computing · Content type: News
newsletter.semianalysis.com · 4 days ago · Hacker News · Cited by 1 article

Unsloth Kimi-K2.7-Code-GGUF
🛠️ Compilers
huggingface.co · 11 hours ago · r/LocalLLaMA

Making FlashAttention-4 faster for inference
🖥️ GPU Computing · Content type: Blog
modal.com · 2 days ago · Hacker News

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%
⚡ Query Engines
zozo123.github.io · 3 days ago · Hacker News

Friday Five — June 12, 2026
🧠 Memory Management
redhat.com · 1 day ago · Cited by 1 article

Unlocking AI flexibility in Europe: A guide to cross-region inference for EU data processing and model access
⚡ Query Engines · Content type: Blog
aws.amazon.com · 5 days ago

Kimi K2.7-Code: open-source coding model with better token efficiency
⚙️ ML Systems · 8 articles covering this post
huggingface.co · 1 day ago · Hacker News, r/LocalLLaMA · Cited by 8 articles

Show HN: Quant Picker – which GGUF file fits your model and machine
📄 Systems Papers
vettedconsumer.com · 20 hours ago · Hacker News

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS
⚙️ ML Systems · Content type: Blog · 10 articles covering this post
mimo.xiaomi.com · 5 days ago · Hacker News, r/LocalLLaMA · Cited by 10 articles

Metrics that Matter with Serverless Inference
⚙️ ML Systems
digitalocean.com · 1 day ago

NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark
🖥️ GPU Computing · Content type: Blog
blogs.nvidia.com · 23 hours ago

A system programmer’s guide to LLM inference
🖥️ GPU Computing · Content type: Blog
blog.xiangpeng.systems · 5 days ago · Hacker News

Model2vec-zig: static text embeddings in pure Zig, in a single binary
⚙️ ML Systems
ziggit.dev · 2 days ago