Inference

The Bill Arrives: How to Manage Agentic AI Costs at Scale

 🧠AI  Content type: Blog
cockroachlabs.com·

Two Leaps to 1000 Tokens/s on a 1T-Parameter Model: On Inference Systems, Execution Boundaries, and Co-Design

 🕵️AI Agents  Content type: Blog
tilert.ai··Hacker News

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

 🧠AI  Content type: News  Content type: Blog
blog.google··Hacker News

Token4Token — pay-per-token inference on Gnosis + Swarm

 🔀LoRA

Ask HN: Is software engineering still a good career choice for new students?

 🤖Machine Learning  Content type: Discussion

Magenta RealTime 2: Open and Local Live Music Models

 🤖Machine Learning

Mobile AI Compute Engine (MACE) inference framework — Vision SDK

 Transformers  Content type: Blog
mapbox.com·

On-device AI is a margin decision

 📊AI Evals  Content type: Blog
ziraph.com··Hacker News

OpenCV Introduces New DNN Inference Engine

 🧠AI
i-programmer.info·

146th airhacks tv: Rust, Java 25, AI Agents, BCE, Web Components, zunit, zb

 🤖Machine Learning  Content type: Blog
adambien.blog·

Pruned YOLOv8 ONNX INT8 Fails: 3 Fixes That Work

 🧠AI  Content type: Blog  Content type: Discussion
tildalice.io·

UniSVQ: 2-bit Unified Scalar-Vector Quantization

 🔀LoRA  Content type: Academic
arxiv.org·

HNSW vs LSH: How Elasticsearch hits 0.99 recall@10 at 15,000 QPS — and what it costs

 📐Embeddings  Content type: Blog
elastic.co·

DiffusionGemma: 4x Faster Text Generation

 🧠AI  Content type: News  Content type: Blog

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

 🧠AI

China's Xiaomi MiMo Is Now 15X Faster Than ChatGPT and Claude (4 minute read)

 🧠AI  Content type: News
decrypt.co·

No Token Left Behind: Demystifying Token-in-Token-Out in Miles

 💬LLMs  Content type: Blog
lmsys.org··Hacker News

Speculators v0.5.0: DFlash support and online training

 🔀LoRA
developers.redhat.com·

Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!

 🧠AI
gizchina.com·

Build a Medical Report Analyzer on Dedicated Inference with Python

 🧠AI
digitalocean.com·
