Scour
🏗️ AI Infrastructure
Model Serving, GPU Clusters, Inference Optimization, MLOps
Scoured 5757 posts in 14.9 ms
Inference Arena – new benchmark of local inference and training
📱 Edge AI · kvark.github.io · 3d · Hacker News
Safeguarded AI
🤖 AI Coding Tools · aria.org.uk · 1d · Hacker News
EU's Exposed AI Infrastructure
🏠 Self-hosted AI · insecurestack.substack.com · 19h · Substack
Optimizing our inference back end with custom load balancing
🧩 Nomad · photoroom.com · 8h · Hacker News
The case for Model-as-a-Service over self-managed inference
🏠 Self-hosted AI · news.ycombinator.com · 1d · Hacker News
AI Infrastructure Roadmap: Five frontiers for 2026
🤖 AI Coding Tools · nextbigteng.substack.com · 6d · Substack
dvelton/ai-pixel: One pixel. Three weights. Real inference. AI model that fits in a single pixel.
📱 Edge AI · github.com · 23h · Hacker News
Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers
🔥 Burn · arxiv.org · 2d · Hacker News
Benchmarking inference of popular models on consumer hardware
🔥 PyTorch · inferena.tech · 3d · Hacker News
UCCL-EP: Portable Expert-Parallel Communication
🔍 eBPF · uccl-project.github.io · 1d · Hacker News
KernelEvolve: How Meta's Ranking Engineer Agent Optimizes AI Infrastructure
📱 Edge AI · engineering.fb.com · 6d · Hacker News
LLM inference engine from scratch in C++
💻 Local LLMs · anirudhsathiya.com · 3d · Hacker News
PacifAIst/Quansloth: Based on the implementation of Google's TurboQuant (ICLR 2026), Quansloth brings elite KV cache compression to local LLM inference. Quansloth is a fully private, air-gapped AI server that runs massive context models natively on consumer hardware with ease
💻 Local LLMs · github.com · 1d · Hacker News
The Long-Term Memory Layer for AI Systems by the Creator of Apache Cassandra
🌊 Event Streaming · cortexdb.ai · 6d · Hacker News
Unlocking cloud inference compute for OpenClaw
🏠 Self-hosted AI · news.ycombinator.com · 18h · Hacker News
Meshllm – Pool compute to run powerful open models
☁️ Serverless Rust · docs.anarchai.org · 6d · Hacker News
Trinity-Large-Thinking: the open source brain your AI agents have been missing
🤖 AI agents · firethering.com · 4d · Hacker News
AI Assistance Reduces Persistence and Hurts Independent Performance
🤖 AI Coding Tools · arxiv.org · 1d · Hacker News
wf802222/loqi: Loqi is an experimental memory architecture for AI systems.
🤖 AI Coding Tools · github.com · 3d · Hacker News
vLLM introduces memory optimizations for long-context inference
⚙️ LLVM · github.com · 4d · Hacker News