Scour
🤖 AI Inference
Model Serving, Inference Optimization, ONNX, Model Deployment
Scoured 5782 posts in 12.1 ms
The case for Model-as-a-Service over self-managed inference
🏠 Self-hosted AI · news.ycombinator.com · 1d · Hacker News
LLM inference engine from scratch in C++
💻 Local LLMs · anirudhsathiya.com · 3d · Hacker News
Compare TEE-Based AI Providers
🏗️ AI Infrastructure · confidentialinference.net · 12h · Hacker News
Comparative Characterization of KV Cache Management Strategies for LLM Inference
🔁 Cache Coherence · arxiv.org · 1d
Google gives enterprises new controls to manage AI inference costs and reliability
🏗️ AI Infrastructure · infoworld.com · 5d
GPU Memory for LLM Inference: Why Llama-70B Doesn't Fit
🏗️ AI Infrastructure · darshanfofadiya.com · 2d · Hacker News
onnx/onnx: Open standard for machine learning interoperability
🚀 MLOps · github.com · 2d
Cryptographic Provenance for LLM Inference
∀ Lean4 · commitllm.com · 5d · Hacker News
UCCL-EP: Portable Expert-Parallel Communication
🔍 eBPF · uccl-project.github.io · 1d · Hacker News
Lightweight LLM aggregator (vLLM, Llama-server)
☁️ Serverless Rust · go-llm-proxy.com · 6d · Hacker News
Karpathy's knowledge base matches our Grep-is-All-You-Need paper
🔍 RAG · localkin.dev · 4d · Hacker News
BEKO2210/cricket-brain: A biomorphic AI inference engine based on cricket auditory neuroscience — delay-line coincidence detection without matrix multiplication
🧠 Neuromorphic Chips · github.com · 1d · Hacker News
Speeding Up IAMTrail: One Boto3 Process Instead of 1,500 CLI Invocations
☁️ Serverless Rust · zoph.me · 4d
vLLM introduces memory optimizations for long-context inference
⚙️ LLVM · github.com · 4d · Hacker News
Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers
🏗️ AI Infrastructure · arxiv.org · 3d · Hacker News
janit/viiwork: LLM inference load balancer optimized for AMD Radeon VII GPUs
🏗️ AI Infrastructure · github.com · 3d · Hacker News
haschka/ocr_tool: A tool to perform OCR with an AI OCR model and an inference engine like llama.cpp
🤖 AI Coding Tools · github.com · 3d · Hacker News
hamtun24/openuma: Unified Memory Abstraction Layer for AI Inference on AMD APUs and Intel iGPUs
⚙️ Performance Profiling · github.com · 5d · Hacker News
rpgeeganage/lazy-tool: local-first MCP discovery runtime for agents — search before invoke, reduce prompt bloat, and route to local MCP tools
☁️ Serverless Rust · github.com · 6d · Hacker News
tinelabs/tine: A branching notebook runtime for AI and humans. Written in Rust 🦀
☁️ Serverless Rust · github.com · 3d · Hacker News