Scour
🤖 AI Inference: Model Serving, Inference Optimization, ONNX, Model Deployment
Scoured 171006 posts in 25.8 ms
Technology solutions targeting the performance of gen-AI inference in resource-constrained platforms
🏗️ AI Infrastructure · arxiv.org · 1d

Introducing dotLLM - Building an LLM Inference Engine in C#
🏗️ AI Infrastructure · kokosa.dev · 14h · Hacker News

Quantization, LoRA, and the 8% Problem: Benchmarking Local LLMs for Production AI
📱 Edge AI · walsenburgtech.com · 3d · Hacker News

Turning idle household RTX 3090s into a batch AI inference network: looking for testers
🏗️ AI Infrastructure · solvyr.com · 2d · r/selfhosted

Redefining AI Inference With New Silicon Architecture
⚡ Hardware Acceleration · semiengineering.com · 5d

LLM inference engine written ground-up natively in C#/.NET
🏗️ AI Infrastructure · dotllm.dev · 13h · Hacker News

Google Enhances AI Inference Control for Enterprises
🏗️ AI Infrastructure · pub.towardsai.net · 6d

Compare TEE-Based AI Providers
🏗️ AI Infrastructure · confidentialinference.net · 6d · Hacker News

The Engine Behind Modern LLM Inference, Part 1: Continuous Batching, PagedAttention, and the End of…
🏗️ AI Infrastructure · medium.com · 5d

Reasoning as Data: Representation-Computation Unity and Its Implementation in a Domain-Algebraic Inference Engine
⚙️ Alloy · arxiv.org · 1d

Flow-Controlled Scheduling for LLM Inference with Provable Stability Guarantees
💻 Local LLMs · arxiv.org · 1d

Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows
🏗️ AI Infrastructure · arxiv.org · 1d

Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference
🏗️ AI Infrastructure · arxiv.org · 1d

Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC
🏗️ AI Infrastructure · arxiv.org · 5d

Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale
🏗️ AI Infrastructure · arxiv.org · 6d

QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training-Inference Mismatch
📱 Edge AI · arxiv.org · 5d

MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization
📱 Edge AI · arxiv.org · 6d

Joint Task Offloading, Inference Optimization and UAV Trajectory Planning for Generative AI Empowered Intelligent Transportation Digital Twin
👥 Digital Twins · arxiv.org · 5d

From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference
⚡ Hardware Acceleration · arxiv.org · 5d

Neural Computers
🧠 Neuromorphic Hardware · arxiv.org · 6d · Hacker News