🏗️ AI Infrastructure
AI hardware, ML infrastructure, AI accelerator, inference server
Running LLM Inference on Kubernetes: What It Actually Takes
💬 LLMs · Content type: Blog · fairwinds.com · 5 days ago
Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA
🟢 NVIDIA · everylocalai.com · 11 hours ago · DEV
Infrastructure Options for Scalable AI Inference
🖧 Dedicated Servers · Content type: Blog · mirantis.com · 1 day ago
Intelligent inference scheduling with llm-d on Red Hat AI
☁️ Cloud Computing · developers.redhat.com · 8 hours ago
Resource-aware Computation-Communication Overlap for multi-GPU ML Workloads
🟢 NVIDIA · Content type: Academic · arxiv.org · 2 days ago
Introducing Piper: A Programmable Distributed Training System
🔀 Distributed Training · Content type: Academic, Blog · syfi.cs.washington.edu · 5 hours ago · Hacker News
GPU Servers for Best Performance
🟢 NVIDIA · leaseweb.com · 6 days ago · DEV
Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%
🔧 MLOps · zozo123.github.io · 21 hours ago · Hacker News
Fixing a stuck Ollama runner and building a GPU watchdog
🟢 NVIDIA · patrickmccanna.net · 2 days ago · Hacker News
CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?
⚡ GPU Computing · uccl-project.github.io · 1 hour ago · Hacker News
AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support
⚡ GPU Computing · phoronix.com · 15 hours ago · r/artificial
KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
🟢 NVIDIA · Content type: Code · github.com · 4 days ago · Hacker News
DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200
🟢 NVIDIA · Content type: News · newsletter.semianalysis.com · 1 day ago · Hacker News
WEKA software speeds long context AI inferencing on Oracle’s public cloud
🟢 NVIDIA · Content type: News · blocksandfiles.com · 17 hours ago
From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure
⚡ GPU Computing · Content type: Blog · jimmysong.io · 2 days ago
Where to Host Your Open-Source Model (Under 10B Parameters)
⚡ GPU Computing · digitalocean.com · 6 days ago
NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI
🟢 NVIDIA · Content type: Blog · blogs.nvidia.com · 15 hours ago
Token4Token — pay-per-token inference on Gnosis + Swarm
💬 LLMs · t4t.eth.link · 1 day ago · Hacker News
What Ollama Reveals About Local AI, Agents, and Open Models
💬 LLMs · Content type: Blog · odsc.medium.com · 9 hours ago
Improved performance and model support with GGUF
💬 LLMs · Content type: Blog · ollama.com · 6 days ago