LLM Inference

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

 🪟Context Windows

Token4Token — pay-per-token inference on Gnosis + Swarm

 🧠LLMs
t4t.eth.link · Hacker News

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

 🤖Data science  Content type: News  Content type: Blog

Apple WWDC On-Device AI Deep Dive - Google Docs

 🧠LLMs
gist.is · Hacker News

Characterizing Software Aging in GPU-Based LLM Serving Systems

 🔬Deep Learning  Content type: Academic
arxiv.org

DiffusionGemma 26B A4B results on my 5090

 🧠LLMs

gist:5b74b8c31e934ff50ce57aa653a343d5

 🔤Tokenization
gist.github.com · r/LocalLLaMA

The economics of speculative decoding

 🤖LLM  Content type: Blog

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

 🎯Fine-tuning  Content type: Blog

On-device AI is a margin decision

 🔬Deep Learning  Content type: Blog
ziraph.com · Hacker News

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

 🔬Deep Learning  Content type: Academic
arxiv.org

Show HN: Ext-Infer

 🪟Context Windows

KaiFelixBennett/gemma4-turboquant-rdna4: Run Gemma-4-31B at full 256K context on a $1,400 AMD RDNA4 GPU (gfx1201): TurboQuant KV cache + HIP-graph-safe Flash-Attention for llama.cpp, fully measured on real hardware.

 🔬Deep Learning  Content type: Code
github.com · Hacker News

Omnifs: APIs and data sources as files you can ls, cat, grep, and pipe

 🔍Information Retrieval
omnifs.dev · Hacker News

local AI agents for Cursor with pre-tuned marketplace/commu

 🎯Fine-tuning

Youssof Altoukhi (@Youssofal_)

 🧠LLMs
xcancel.com · r/LocalLLaMA

VIA-SD: Verification via Intra-Model Routing for Speculative Decoding

 🧠LLMs  Content type: Academic
arxiv.org

DiffusionGemma: 4x Faster Text Generation

 🤖Data science  Content type: News  Content type: Blog

How to Measure Time To First Token (TTFT) in AI Systems

 💬Natural Language Processing

Ask HN: Is software engineering still a good career choice for new students?

 🤖Machine Learning  Content type: Discussion
