🏗️ AI Infrastructure
Model Serving, GPU Clusters, Inference Optimization, MLOps
Cerebras IPO Signals Growing Pressure on the GPU Scaling Model
⚡ Hardware Acceleration · hpcwire.com · 6d · Hacker News
InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents
🤖 AI Inference · inferencebench.ai · 5h · Hacker News
Artain-AI/ignite-ms: Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.
🔥 Burn · github.com · 11h · Hacker News
SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips
🔁 Cache Coherence · supercomputing-system-ai-lab.github.io · 2d · Hacker News
DeepSeek V4 Flash: Bringing Frontier AI to the Home
⚡ Hardware Acceleration · blog.jonathanpage.com · 2d · Hacker News
GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU
⚙️ Performance Profiling · theahmadosman.substack.com · 7h · Substack, r/LocalLLaMA
Let AI Agents Write Your Serving Stack with VibeServe
🤖 AI agents · syfi.cs.washington.edu · 6d · Hacker News
Command A+: Making sovereign agentic capabilities available to all
🤖 AI Coding Tools · cohere.com · 12h · Hacker News
ImpactArbiter – A PyTorch autograd trap for LLM memory bugs
∀ Lean4 · github.com · 2d · Hacker News
A cheap fix that saves the AI $400M dollars a year and brings 4B people online
☁️ Serverless Rust · codecai.net · 3d · Hacker News
I tried 4 LLM speedup techniques on CPU. Three made it slower.
⚙️ Performance Profiling · deemwar-products.github.io · 9h · Hacker News
The Best Open Source and Open-Weight LLM Models to Run Locally in 2026
💻 Local LLMs · huggingface.co · 2d
Ollama Doesn't Know Its GPU Is on Another Machine
⚡ Hardware Acceleration · loopholelabs.io · 14h · Hacker News
2.3x KV Cache Compression at 32k Context
🏗 Computer Architecture · github.com · 6d · Hacker News
The Oats Protocol – Open Agent Tools for Local Coding Agents
🧩 Nomad · news.ycombinator.com · 2d · Hacker News
I replaced GitHub Copilot with a self-hosted AI and I won’t go back
🤖 AI Coding Tools · xda-developers.com · 9h
Mistral SDK
🎨 Design Systems · dsebastien.net · 2d
My RTX 5090 can't keep up with Apple Silicon on the biggest local LLMs, and I hate to admit it
🖥 computers · xda-developers.com · 6d
zero-intelligence/zero-intel: Every codebase has a confession. Most people never ask it the right question.
🔍 Code Review · github.com · 12h · Hacker News
You don't need a Macbook to do applied AI engineering
💻 Systems Programming · sammai.bearblog.dev · 11h