⚡ AI Inference
Specific
model serving, inference optimization, vLLM, quantization
Scoured 149488 posts in 12.9 ms
Overcoming inference challenges
🧠 LLMs · redhat.com · 3d
Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC
🧠 LLMs · arxiv.org · 6h
Redefining AI Inference With New Silicon Architecture
✍️ Prompt Engineering · semiengineering.com · 1d
The Engine Behind Modern LLM Inference, Part 1: Continuous Batching, PagedAttention, and the End of…
🧠 LLMs · medium.com · 17h
RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference
🧠 LLMs · vldb.org · 1d
Claude Managed Agents: The Infrastructure Abstraction That Changes How You Ship AI in Production
🤖 LLM Agents · medium.com · 4h
LLM inference engine from scratch in C++
🧠 LLMs · anirudhsathiya.com · 4d · Hacker News
I Ran My KYB Engine at Three Quantization Levels. Accuracy Didn't Move. Cost Dropped 6x.
🧠 LLMs · walsenburgtech.com · 17h · Hacker News
Inside LLM Inference: KV Cache, Prefill, and the Decode Bottleneck
🧠 LLMs · pub.towardsai.net · 1d
Guardrails at the gateway: Securing AI inference on GKE with Model Armor
🤖 LLM Agents · cloud.google.com · 10h
Compare TEE-Based AI Providers
🤖 LLM Agents · confidentialinference.net · 1d · Hacker News
TurboQuant Explained: Extreme AI Compression for Faster, Cheaper LLM Inference and Vector Search
🧠 LLMs · medium.com · 5d
Prediction: The "Inference Supercycle" Could Be Bigger Than the Training Boom. 1 Growth Stock to Own.
🧠 LLMs · finance.yahoo.com · 17h
kymuco/codex-dispatcher: Telegram bot for running local Codex workflows from chat with session continuity, diagnostics, and runtime controls.
✍️ Prompt Engineering · github.com · 3h · r/SideProject
Google TurboQuant Explained: The 6x Memory Compression That Crashed Chip Stocks
🧠 LLMs · medium.com · 2d
What Is AI Inference?
🤖 LLM Agents · sambanova.ai · 3d
Data Orchestration in the Age of Autonomous Agents: Architectural Patterns Building on NemoClaw & OpenClaw
🤖 LLM Agents · backblaze.com · 18h
Deep Dive into Google Cloud Pub/Sub Single Message Transforms and AI Inference
🧠 LLMs · medium.com · 2d
The case for Model-as-a-Service over self-managed inference
🧠 LLMs · news.ycombinator.com · 3d · Hacker News
Attn-QAT: Making 4-Bit Attention Actually Work
🧠 LLMs · haoailab.com · 1d