AI Infrastructure

Making FlashAttention-4 faster for inference

 💬LLMs  Content type: Blog
modal.com··Hacker News

Breaking free of a single datacenter: Practical geo-distributed AI operations with the k0smos platforms

 ☁️Cloud Computing  Content type: Blog
cncf.io·

DiffusionGemma: The Developer Guide

 🤖AI  Content type: Blog

AI Serving Platform That Adapts to Your Model

 ☸️K8S  Content type: Blog
databricks.com·

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

 📦Containerization

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

 💬LLMs  Content type: Academic
arxiv.org·

CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?

 💬LLMs

Monitor Nebius AI Cloud with Datadog

 ☁️Cloud Computing  Content type: Blog
datadoghq.com·

Token4Token — pay-per-token inference on Gnosis + Swarm

 ☁️Cloud Computing
t4t.eth.link··Hacker News

Google's new open-weights model brings image-generation tricks to AI text generation

 🤖AI  Content type: News
theregister.com·

[eCHO News] Episode #104: mTLS for Cilium. Lisp for eBPF

 ☁️Cloud Computing

How we fight GPU scarcity without compromise

 🔒Cybersecurity  Content type: Blog
equixly.com··Hacker News

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

 🤖AI  Content type: Code
github.com··Hacker News

Cloud: 10 companies that raised the most in 2025

 ☁️Cloud Computing  Content type: News
tech.eu·

What Network Data Can and Can’t Tell Us About AI Infrastructure

 🔗Networking  Content type: Blog
backblaze.com·

What AI benchmarks miss about real-world performance

 ☁️Cloud Computing
venturebeat.com·

Build a local voice agent with Red Hat OpenShift AI

 🤖AI
developers.redhat.com·

DiffusionGemma: 4x Faster Text Generation

 🤖AI  Content type: News  Content type: Blog

PagedAttention vs Traditional KV Cache: How vLLM Reinvented GPU Memory for LLM Inference

 💬LLMs  Content type: Blog
medium.com·

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

 💬LLMs  Content type: Academic
arxiv.org·
