Scour
🤖 AI Inference
Model Serving, Inference Optimization, ONNX, Model Deployment
InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents
🏗️ AI Infrastructure
inferencebench.ai · 5h · Hacker News
A cheap fix that saves the AI $400M dollars a year and brings 4B people online
🏗️ AI Infrastructure
codecai.net · 3d · Hacker News
SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips
🔁 Cache Coherence
supercomputing-system-ai-lab.github.io · 2d · Hacker News
Command A+: Making sovereign agentic capabilities available to all
🤖 AI Coding Tools
cohere.com · 12h · Hacker News
Unleashing the Power of ONNX for Speedier SBERT Inference
🏗️ AI Infrastructure
pub.towardsai.net · 2d
Artain-AI/ignite-ms: Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.
🔥 Burn
github.com · 12h · Hacker News
Let AI Agents Write Your Serving Stack with VibeServe
🏗️ AI Infrastructure
syfi.cs.washington.edu · 6d · Hacker News
Training a 22MB prompt injection classifier
🏗️ AI Infrastructure
stackone.com · 14h · Hacker News
DeepSeek V4 Flash: Bringing Frontier AI to the Home
⚡ Hardware Acceleration
blog.jonathanpage.com · 2d · Hacker News
kouhxp/yapsnap: Snap any video URL or audio file into plaintext. No GPU. No cloud. One command.
🎚️ Audio Codecs
github.com · 7h · Hacker News
The Best Open Source and Open-Weight LLM Models to Run Locally in 2026
💻 Local LLMs
huggingface.co · 2d
GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU
🏗️ AI Infrastructure
theahmadosman.substack.com · 8h · Substack, r/LocalLLaMA
Show HN: GPT-2 inference in pure C#, 0 bytes allocated per token
🔥 Burn
github.com · 3d · Hacker News
The Oats Protocol – Open Agent Tools for Local Coding Agents
🧩 Nomad
news.ycombinator.com · 2d · Hacker News
I tried 4 LLM speedup techniques on CPU. Three made it slower.
⚙️ Performance Profiling
deemwar-products.github.io · 10h · Hacker News
Ollama Doesn't Know Its GPU Is on Another Machine
⚡ Hardware Acceleration
loopholelabs.io · 15h · Hacker News
Mistral SDK
🎨 Design Systems
dsebastien.net · 2d
A VERY lightweight open web-search tool for smaller local LLMs
⚙️ DataFusion
github.com · 6d · Hacker News, r/LocalLLaMA
I replaced GitHub Copilot with a self-hosted AI and I won’t go back
🤖 AI Coding Tools
xda-developers.com · 10h
With Its IPO Done, Cerebras Can Get Back To Pushing The AI Envelope
🧠 Neuromorphic Chips
nextplatform.com · 5d · Hacker News