LLM Inference

The Edge LLM Offload Story

 🧠LLM Training
semiengineering.com·
Less-relevant results

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

 Zig  Content type: Blog

A field journal on Ray Data and Daft for multimodal data lake (14 minute read)

 🕸️axum  Content type: Blog
mehulbatra.medium.com·

Why I care so much about energy per token

 Zig  Content type: Blog
ziraph.com··Hacker News

MLPerf and the rise of latency-aware LLM benchmarking

 🧠LLM Training
edn.com·

Intro — Sehastrajit

 🖥️Self-Hosting  Content type: Blog
medium.com·

Show HN: Ext-Infer

 🦀Rust

fix(gateway): fail closed for unknown model auth · openclaw/openclaw@85343ea

 🦀Rust  Content type: Code
github.com·

STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control

 ⚙️Systems Programming  Content type: Academic
arxiv.org·

Latest technical articles & videos.

 ⚙️Systems Programming
certdepot.net·

TGI(SG)F.

 🔀Session Types  Content type: News
theverge.com·

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

 Zig  Content type: Code
github.com··Hacker News

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

 🧠LLM Training

Nvidia Nemotron 3 Ultra

 🧠LLM Training

Where to Host Your Open-Source Model (Under 10B Parameters)

 🖥️Self-Hosting
digitalocean.com·

Breaking the Ice: Analyzing Cold Start Latency in vLLM

 ⚙️Systems Programming  Content type: Academic
arxiv.org·

mirkolenz/llmhop: Tiny, stateless Go router that dispatches OpenAI-compatible requests to single-model vLLM and sglang backends with zero external dependencies

 🖥️Self-Hosting  Content type: Code
github.com··Hacker News

How to Run Gemma 4 12B Locally - The Best AI For Consumer Laptops

 🖥️Self-Hosting  Content type: Video
youtube.com·

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

 🧠LLM Training  Content type: News  Content type: Blog
blog.google··Hacker News

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

 ⚙️Systems Programming  Content type: Academic
arxiv.org·