Scour
🚀 ML Inference
Specific
model inference, inference optimization, TensorRT, ONNX
Scoured 314 posts in 15.9 ms
[AINews] Fable and Mythos officially too dangerous to release
📄 Systems Papers · Content type: News · latent.space · 14 hours ago
Token4Token — pay-per-token inference on Gnosis + Swarm
⚡ Query Engines · t4t.eth.link · 4 days ago · Hacker News
vLLM Transformers Backend: Bridging Hugging Face Compatibility and High-Performance Inference
⚙️ ML Systems · Content type: Blog · odsc.medium.com · 1 day ago
2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP
🖥️ GPU Computing · Content type: Blog · dnhkng.github.io · 5 days ago
DiffusionGemma: Discrete diffusion in a large language model
🧠 Deep Learning · idlemachines.co.uk · 1 day ago · Hacker News
Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!
⚙️ ML Systems · gizchina.com · 4 days ago
OpenCV 5.0 Computer Vision Library Released with Rewritten DNN Engine
🎥 Video Analytics · linuxiac.com · 5 days ago
Why are cached input tokens cheaper with AI services?
⚙️ ML Systems · xeiaso.net · 1 day ago
The economics of speculative decoding
⚙️ ML Systems · Content type: Blog · fergusfinn.com · 5 days ago · Hacker News
vicharak-in/Gati: Gati Accelerates Your CNN Algorithms!
🧠 Deep Learning · Content type: Code · github.com · 1 day ago · Hacker News
HNSW vs LSH: How Elasticsearch hits 0.99 recall@10 at 15,000 QPS — and what it costs
⚡ Query Engines · Content type: Blog · elastic.co · 4 days ago · Cited by 1 article
ReSET: Accurate Latency-Critical NVFP4 Reasoning via Step-Aware Temperature Scaling
🖥️ GPU Computing · Content type: Academic · arxiv.org · 1 day ago
OpenCV Introduces New DNN Inference Engine
🎥 Video Analytics · i-programmer.info · 5 days ago
How to Setup a Local Coding Agent on macOS
🦀 Rust · Content type: Blog · ikyle.me · 1 day ago · Hacker News · Cited by 2 articles
Kimi K2.7-Code cuts thinking tokens 30% — but practitioners say the benchmarks don't check out
🖥️ GPU Computing · venturebeat.com · 21 hours ago
Quantization Was Never About the Bits
⚙️ ML Systems · Content type: Blog · medium.com · 1 day ago
The Inference Alpha: Maximizing Frontier Models on AMD
🖥️ GPU Computing · Content type: Blog · digitalocean.com · 3 days ago
Lowest-Cost LLM Inference: The Complete OpenRouter Guide
⚡ Query Engines · Content type: Blog, Discussion, Tutorial · openrouter.ai · 1 day ago
TFLite Edge Model Quantizer Snippet
🧠 Deep Learning · itsevilduck.gumroad.com · 5 days ago · DEV
Ollama's highest performance on Apple Silicon yet with MLX
⚡ Query Engines · Content type: Blog · ollama.com · 2 days ago