🤖 AI Inference
Model Serving, Inference Optimization, ONNX, Model Deployment
Scoured 182,714 posts in 51.2 ms

Really excellent work by the inference team to serve this model so efficiently!
⚡ Inference · twitter.macworks.dev · 3d

The Silent Versioning Problem in AI Inference
🤖 LLM Inference · digitalocean.com · 2d

Can IBM’s RITS Platform and vLLM Reset the Bar for Enterprise AI Access?
🔄 AI Workflows · futurumgroup.com · 1d

How to Explain AI to a Friend Who Doesn’t Follow Tech
🤖 GenAI · hongkiat.com · 5d

Building a Local LLM Server with Raspberry Pi 5, Ollama, Tailscale and Chatbox
🍓 Raspberry Pi · woliveiras.com · 1d · r/LLM

Move voxcpm to AI and Agents > Pre-trained Models and Inference · vinta/awesome-python@c08b123
🤖 GenAI · github.com · 5d

vLLM-Lens: Fast Interpretability Tooling That Scales to Trillion-Parameter Models
🤖 LLM Inference · lesswrong.com · 3d

New Google TPUs multiply AI infrastructure efficiency
☁️ GCP · techtarget.com · 4d

an open-source runtime for reliable on-device AI agents
🏛 Sovereign AI Infrastructure · mirrorneuron.io · 3d · Hacker News

Red Hat Performance and Scale Engineering
🔄 AI Workflows · redhat.com · 5d

NVIDIA and Google infrastructure cuts AI inference costs
📊 Compute Markets · artificialintelligence-news.com · 3d

a16z: Large Model Deployment = Forgetting—Can “Continual Learning” Break This Vicious Cycle?
🤖 LLM Inference · techflowpost.com · 3d

not much happened today
✍️ Prompt Engineering · news.smol.ai · 5d

dunetrace/dunetrace: Runtime observability for AI agents. Privacy-safe by design.
📦 Sandboxing · github.com · 6d · Hacker News

Ship AI-powered Products Faster (Website)
⚙️ AI Automation · 21st.dev · 4d

Google is in talks with Marvell to build custom AI inference chips as it diversifies beyond Broadcom
🖥️ Local AI · oodaloop.com · 6d

Prax: An agent runtime that learns from past mistakes and fixes code in a loop
🧠 Context Engineering · github.com · 3d · Hacker News

The Hidden Bottlenecks in LLM Inference and How to Fix Them
🤖 LLM Inference · digitalocean.com · 4d

llmrb/llm.rb: Ruby's most capable AI runtime
🧠 Context Engineering · github.com · 3d · Lobsters

The LLM Inference Trilemma: Throughput, Latency, Cost
⚡ Inference · digitalocean.com · 4d