ML Inference

[AINews] Fable and Mythos officially too dangerous to release

 📄Systems Papers  Content type: News
latent.space·

Token4Token — pay-per-token inference on Gnosis + Swarm

 Query Engines
t4t.eth.link··Hacker News

vLLM Transformers Backend: Bridging Hugging Face Compatibility and High-Performance Inference

 ⚙️ML Systems  Content type: Blog
odsc.medium.com·

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

 🖥️GPU Computing  Content type: Blog
dnhkng.github.io·

DiffusionGemma: Discrete diffusion in a large language model

 🧠Deep Learning

Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!

 ⚙️ML Systems
gizchina.com·

OpenCV 5.0 Computer Vision Library Released with Rewritten DNN Engine

 🎥Video Analytics
linuxiac.com·

Why are cached input tokens cheaper with AI services?

 ⚙️ML Systems
xeiaso.net·

The economics of speculative decoding

 ⚙️ML Systems  Content type: Blog

vicharak-in/Gati: Gati Accelerates Your CNN Algorithms!

 🧠Deep Learning  Content type: Code
github.com··Hacker News

HNSW vs LSH: How Elasticsearch hits 0.99 recall@10 at 15,000 QPS — and what it costs

 Query Engines  Content type: Blog

ReSET: Accurate Latency-Critical NVFP4 Reasoning via Step-Aware Temperature Scaling

 🖥️GPU Computing  Content type: Academic
arxiv.org·

OpenCV Introduces New DNN Inference Engine

 🎥Video Analytics
i-programmer.info·

How to Setup a Local Coding Agent on macOS

 🦀Rust  Content type: Blog

Kimi K2.7-Code cuts thinking tokens 30% — but practitioners say the benchmarks don't check out

 🖥️GPU Computing
venturebeat.com·

Quantization Was Never About the Bits

 ⚙️ML Systems  Content type: Blog
medium.com·

The Inference Alpha: Maximizing Frontier Models on AMD

 🖥️GPU Computing  Content type: Blog
digitalocean.com·

Lowest-Cost LLM Inference: The Complete OpenRouter Guide

 Query Engines  Content type: Blog  Content type: Discussion  Content type: Tutorial
openrouter.ai·

TFLite Edge Model Quantizer Snippet

 🧠Deep Learning

Ollama's highest performance on Apple Silicon yet with MLX

 Query Engines  Content type: Blog
ollama.com·
