🤖 AI Inference
Model Serving, Inference Optimization, ONNX, Model Deployment
Scoured 72 posts in 14.7 ms
CPritch/shiftpaper: Parallax wallpaper for Wayland with depth estimation, written in Rust + WGSL
📈
Grafana
github.com
·
2d
·
Hacker News
imec IC-Link and TSMC 3DFabric Alliance Expansion Signals New Era of System-Level Scaling
⚡
Hardware Acceleration
semiwiki.com
·
1d
KV Cache and Flash Attention with interactive diagrams
💾
Cache Optimization
kvcache.cobanov.dev
·
9h
·
Hacker News
I ran this bulky LLM on an SBC cluster, and it's the most unhinged setup I've ever built
⚙️
LLVM
xda-developers.com
·
6d
zero-intelligence/zero-intel: Every codebase has a confession. Most people never ask it the right question.
🔍
Code Review
github.com
·
13h
·
Hacker News
The AI Inference Supercycle Is Here. These 2 Stocks Will Be the Biggest Winners of This Megatrend (Hint: It's Not Broadcom or Intel)
🏗️
AI Infrastructure
fool.com
·
2d
Show HN: Marlin-2B: a tiny VLM to extract structured information from videos
🏗️
AI Infrastructure
huggingface.co
·
2d
·
Hacker News
Flash Getting Stacked High-Bandwidth Version
🔁
Cache Coherence
semiengineering.com
·
6d
wojciechowskiapp/Kaption: Real-time in-game subtitle translation for Hoyoverse titles like Genshin Impact, Honkai: Star Rail on Windows
🗣️
Voice Coding
github.com
·
2d
·
Hacker News
AI Inference Costs: The Wake-Up Call for 2026 and 2027
🏗️
AI Infrastructure
blog.herlein.com
·
1d
·
Hacker News
What's in a GGUF, besides the weights - and what's still missing?
🤖
LLMs
nobodywho.ooo
·
6d
·
Hacker News, r/LocalLLaMA
BuffaloTechRider/Autodidact: Self-learning AI agent that gets smarter and cheaper over time. Routes between local and cloud LLMs, learns from every interaction, remembers everything.
🤖
AI agents
github.com
·
1d
·
Hacker News
2.3x KV Cache Compression at 32k Context
🏗
Computer Architecture
github.com
·
6d
·
Hacker News
Software 3.0
🔧
Software Engineering
dsebastien.net
·
2d
codexstar69/pi-listen: Hold-to-talk voice input for Pi CLI — Deepgram streaming STT with live transcription, voice commands, and cross-platform hold detection
🎙️
Whisper
github.com
·
1d
·
Hacker News
PyTorch, rewritten from scratch in pure Rust
🔥
Burn
github.com
·
6d
·
Hacker News
ImpactArbiter – A PyTorch autograd trap for LLM memory bugs
∀
Lean4
github.com
·
2d
·
Hacker News
kouhxp/cheap-im: CPU-only voice agent approximating Thinking Machines' Interaction Models demo
🎚️
Voice AI Systems
github.com
·
3d
·
Hacker News
chiennv2000/orthrus: Fast, lossless LLM inference via dual-view diffusion decoding.
💻
Local LLMs
github.com
·
5d
·
Hacker News
LocalVibe – Pure-Rust local AI stack with MCP, in one binary (Apple Silicon)
☁️
Serverless Rust
github.com
·
4d
·
Hacker News