LLM Inference


MLPerf and the rise of latency-aware LLM benchmarking

 🧠KV Cache
edn.com

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

 🧠KV Cache  Content type: Academic
arxiv.org

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

 📦Parquet  Content type: Blog
ziraph.com · Hacker News
Less-relevant results

🇳🇱 Go/Golang job: Senior Backend Engineer (Go) | Studio AI at Creative Fabrica (Amsterdam, Netherlands)

 🕸️Distributed Systems
golangprojects.com

1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM

 🧠KV Cache
smolhub.com · r/LocalLLaMA

NVIDIA releases Nemotron 3 Ultra, claiming five times the speed and 30 percent lower costs than prior models
The model delivers 300 tokens per second on benchmar...

 vLLM
digg.com

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

 vLLM

Vadzo Imaging Introduces HDR MIPI CSI-2 Embedded Cameras Recommended for Drone and UAV Applications

 🌊Stream Processing  Content type: News
einpresswire.com

Nemotron 3 Ultra now available on AI Gateway

 vLLM
vercel.com

Google open-sources speedy DiffusionGemma text diffusion model

 vLLM
siliconangle.com

Mobile AI Compute Engine (MACE) inference framework — Vision SDK

 🧠KV Cache  Content type: Blog
mapbox.com

BeeLlama.cpp DFlash on Strix Halo: 2.7x Gemma 31B, But MTP Is Still Faster

 🧠KV Cache
sleepingrobots.com

No Token Left Behind: Demystifying Token-in-Token-Out in Miles

 🌊Stream Processing  Content type: Blog
lmsys.org · Hacker News

Google’s DiffusionGemma is 4x faster than its other Gemma models

 🌲LSM Trees
thenewstack.io

Making LLMs faster and more efficient across multiple languages

 vLLM
techxplore.com

Which is faster: Gemini 3.5 Flash or Kimi K2.6 on Cerebras

 🌊Stream Processing  Content type: Blog
cerebras.ai

146th airhacks tv: Rust, Java 25, AI Agents, BCE, Web Components, zunit, zb

 🌊Stream Processing  Content type: Blog
adambien.blog

Why I care so much about energy per token

 🧠KV Cache  Content type: Blog
ziraph.com · Hacker News

libertywing/FlashMemory-Deepseek-V4: FlashMemory DS-V4 Retriever: a lightweight retriever that sparsifies DeepSeek-V4 CSA KV-cache. Weights available on Hugging Face.

 🧠KV Cache  Content type: Code
github.com

3x Faster Search: Parallel Test-Time Scaling with Instructed-Retriever-1

 🧠KV Cache  Content type: Blog
databricks.com