LLM Inference

All sorts of famous Attention Layers

 💬LLMs  Content type: Blog
Less-relevant results

Deploying NVIDIA Nemotron-3 Ultra 550B, with B200 GPUs, vLLM on Google Kubernetes Engine — Football…

 KV Cache  Content type: Blog
ammettw.medium.com

Google's DiffusionGemma generates 256 tokens in parallel and self-corrects as it goes

 KV Cache

Is anyone else not finding the Web UI on latest (b9680) of llama.cpp?

 💬LLMs  Content type: Discussion  Content type: Code
github.com · r/LocalLLaMA

How Public AI delivers sovereign LLM inference on AWS and Intel

 KV Cache  Content type: Blog

How to Setup a Local Coding Agent on macOS

 🔧MLOps  Content type: Blog  3 articles covering this post
ikyle.me · Hacker News · Cited by 3 articles · Covers 6 stories

DiffusionGemma: Discrete diffusion in a large language model

 KV Cache

Friday Five — June 12, 2026

 KV Cache
redhat.com

[AINews] Satya on Loopcraft: Building Frontier Ecosystems

 💬LLMs  Content type: News
latent.space

New comment by Greenpants in "Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"

 💬LLMs  Content type: Discussion

SwiftCache: Efficient LLM Serving for Multi-turn Conversations with Heterogeneous KV Cache Sharing

 KV Cache  Content type: Academic
arxiv.org

Speculative Decoding: How to Get Free Tokens

 💬LLMs  Content type: Blog
medium.com

Rust port of transformers (1M lines of code)

 💬LLMs  Content type: Code
github.com · Hacker News

Built Uber aggregator that tracks top AI researchers and leaders

 💬LLMs
brightray.ai · Hacker News

12B Gemma 4 QAT Deployment with NVIDIA L4, Cloud Run, MCP, and Antigravity CLI

 KV Cache  Content type: Blog
medium.com

How to fit Qwen 3.6 35B A3B into 16GB of VRAM, & run it with Llama.cpp on an RTX 3080

 🗄️Storage Engines

Coordinated Scheduling for MoE LLM Serving

 KV Cache  Content type: Academic
arxiv.org

I restarted a 10 year old Xeon 174 times to delete twelve flags and gain four tokens a second

 🗄️Storage Engines  Content type: Blog
point.free · Hacker News
