Scour
ML Inference
inference engine, model serving, inference optimization, runtime
151 posts in 7.2 ms
PagedAttention vs Traditional KV Cache: How vLLM Reinvented GPU Memory for LLM Inference
🖥️ Systems ML · Blog · medium.com · 2 days ago
AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support
🎮 GPU Programming · phoronix.com · 12 hours ago · r/artificial
No Token Left Behind: Demystifying Token-in-Token-Out in Miles
🧠 Deep Learning · Blog · lmsys.org · 1 day ago · Hacker News
2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP
🔗 Distributed Training · Blog · dnhkng.github.io · 3 days ago
magenta/magenta-realtime: Magenta RealTime 2: An Open-Weights Live Music Model
🧠 Deep Learning · Code · github.com · 22 hours ago
google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation
🧠 Deep Learning · huggingface.co · 2 days ago · r/LocalLLaMA
Vadzo Imaging Introduces HDR MIPI CSI-2 Embedded Cameras Recommended for Drone and UAV Applications
🔄 MLOps · News · einpresswire.com · 22 hours ago
A system programmer’s guide to LLM inference
🧠 Deep Learning · Blog · blog.xiangpeng.systems · 3 days ago · Hacker News
DiffusionGemma: The Developer Guide - Google Developers Blog
🎮 GPU Programming · Blog · developers.googleblog.com · 1 day ago · r/LocalLLaMA
Build a Medical Report Analyzer on Dedicated Inference with Python
🧠 Deep Learning · digitalocean.com · 6 days ago
Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation
🗜️ Quantization · Academic · arxiv.org · 1 day ago
For Robotaxis, Safety Must Be Built In, Not Bolted On
🎮 GPU Programming · Blog · blogs.nvidia.com · 10 hours ago
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency
🗜️ Quantization · News, Blog · blog.google · 5 days ago · Hacker News
China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude (4 minute read)
🗜️ Quantization · News · decrypt.co · 2 days ago · Hacker News
Google's new open model DiffusionGemma generates text from noise instead of word by word
🧠 Deep Learning · the-decoder.com · 10 hours ago
GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)
🗜️ Quantization · vettedconsumer.com · 4 days ago · Hacker News
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
🎮 GPU Programming · Academic · arxiv.org · 2 days ago
OpenCV 5 Debuts with Improved ONNX Support and Native AI Upgrades
🧠 Deep Learning · News · hackster.io · 14 hours ago
The Death of the Four Golden Signals: Designing Telemetry for Non-Deterministic Infrastructure
🔄 MLOps · devops.com · 5 days ago
Latest technical articles & videos.
⚙️ Systems Programming · certdepot.net · 4 days ago