Scour
🔧 AI Infrastructure (Specific): AI compute, GPU cluster, inference, model deployment
Scoured 162 posts in 7.5 ms
Making FlashAttention-4 faster for inference
💬 LLMs · Blog · modal.com · 21 hours ago · Hacker News
Breaking free of a single datacenter: Practical geo-distributed AI operations with the k0smos platforms
☁️ Cloud Computing · Blog · cncf.io · 3 days ago
DiffusionGemma: The Developer Guide
🤖 AI · Blog · developers.googleblog.com · 2 days ago · Hacker News
AI Serving Platform That Adapts to Your Model
☸️ K8S · Blog · databricks.com · 1 day ago
google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation
📦 Containerization · huggingface.co · 3 days ago · r/LocalLLaMA
RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference
💬 LLMs · Academic · arxiv.org · 2 days ago
CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?
💬 LLMs · uccl-project.github.io · 1 day ago · Hacker News
Monitor Nebius AI Cloud with Datadog
☁️ Cloud Computing · Blog · datadoghq.com · 3 days ago
Token4Token — pay-per-token inference on Gnosis + Swarm
☁️ Cloud Computing · t4t.eth.link · 2 days ago · Hacker News
Google's new open-weights model brings image-generation tricks to AI text generation
🤖 AI · News · theregister.com · 14 hours ago
[eCHO News] Episode #104: mTLS for Cilium. Lisp for eBPF
☁️ Cloud Computing · isovalent-9197153.hs-sites.com · 6 days ago
How we fight GPU scarcity without compromise
🔒 Cybersecurity · Blog · equixly.com · 6 days ago · Hacker News
KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
🤖 AI · Code · github.com · 5 days ago · Hacker News
Cloud: 10 companies that raised the most in 2025
☁️ Cloud Computing · News · tech.eu · 1 day ago
What Network Data Can and Can’t Tell Us About AI Infrastructure
🔗 Networking · Blog · backblaze.com · 1 day ago
What AI benchmarks miss about real-world performance
☁️ Cloud Computing · venturebeat.com · 17 hours ago
Build a local voice agent with Red Hat OpenShift AI
🤖 AI · developers.redhat.com · 4 days ago
DiffusionGemma: 4x Faster Text Generation
🤖 AI · News, Blog · blog.google · 1 day ago · Hacker News, r/LocalLLaMA, r/singularity
PagedAttention vs Traditional KV Cache: How vLLM Reinvented GPU Memory for LLM Inference
💬 LLMs · Blog · medium.com · 3 days ago
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
💬 LLMs · Academic · arxiv.org · 3 days ago