⚙️ AI Infrastructure
AI stack, model serving, inference, ML infrastructure
Scoured 219 posts in 7.1 ms
harshuljain13/llm-inference-at-scale: A practitioner handbook for production LLM serving.
🧠 LLMs · Content type: Code · github.com · 4 days ago · Hacker News, r/LLM
DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200
📊 AI Monitoring · Content type: News · newsletter.semianalysis.com · 1 day ago · Hacker News
Intelligent inference scheduling with llm-d on Red Hat AI
📊 AI Monitoring · developers.redhat.com · 11 hours ago
Infrastructure Options for Scalable AI Inference
🔍 GEO · Content type: Blog · mirantis.com · 1 day ago
From Explicit Elements to Implicit Intent: A Predefined Library for Auditable Behavioral Inference
🧠 LLMs · Content type: Academic · arxiv.org · 7 hours ago
Running LLM Inference on Kubernetes: What It Actually Takes
🧠 LLMs · Content type: Blog · fairwinds.com · 5 days ago
Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%
📊 AI Monitoring · zozo123.github.io · 1 day ago · Hacker News
AI Serving Platform That Adapts to Your Model
📊 AI Monitoring · Content type: Blog · databricks.com · 19 hours ago
From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure
📊 AI Monitoring · Content type: Blog · jimmysong.io · 2 days ago
The Inference Alpha: Maximizing Frontier Models on AMD
🧠 LLMs · Content type: Blog · digitalocean.com · 21 hours ago
google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation
🔍 GEO · huggingface.co · 3 days ago · r/LocalLLaMA
Token4Token — pay-per-token inference on Gnosis + Swarm
🧠 LLMs · t4t.eth.link · 2 days ago · Hacker News
CommBench: Can LLMs Write Correct and Efficient GPU Communication Code?
🧠 LLMs · uccl-project.github.io · 4 hours ago · Hacker News
NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI
🔍 GEO · Content type: Blog · blogs.nvidia.com · 19 hours ago
Magenta RealTime 2: Open and Local Live Music Models
🧠 LLMs · magenta.withgoogle.com · 6 days ago · Hacker News (×2), r/LocalLLaMA
Mobile AI Compute Engine (MACE) inference framework — Vision SDK
🔎 AI Search · Content type: Blog · mapbox.com · 2 days ago
Location: Lubbock, TX, USA Remote: Yes (Remote-friendly, US-based) Technologies:...
📡 Information Retrieval · Content type: Discussion · news.ycombinator.com · 20 hours ago · Hacker News
OpenCV Introduces New DNN Inference Engine
🔍 GEO · i-programmer.info · 2 days ago
2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP
💡 Framework Thinking · Content type: Blog · dnhkng.github.io · 3 days ago
I Processed 2.4 Billion Tokens Across 52 AI Models for $0.52. Here's the Full Breakdown.
🛠️ Developer Tools · saintlex.sbs · 8 hours ago · DEV