Scour
⚡ ML Inference
inference engine, model serving, inference optimization, runtime
Scoured 148 posts in 19.8 ms
I Processed 2.4 Billion Tokens Across 52 AI Models for $0.52. Here's the Full Breakdown.
🔄 MLOps · saintlex.sbs · 3 hours ago · DEV
How we fight GPU scarcity without compromise
🧠 Deep Learning · Content type: Blog · equixly.com · 5 days ago · Hacker News
Real-Time Industrial Defect Detection on Edge Hardware Using Fine-Tuned YOLOv8: A Systematic Benchmark on the NEU Surface Defect Database and MVTec AD with Automotive & Battery Manufacturing Extensions
🖥️ Systems ML · Content type: Academic · arxiv.org · 2 days ago
KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
🎮 GPU Programming · Content type: Code · github.com · 4 days ago · Hacker News
OpenCV 5 release - New DNN engine with enhanced ONNX and LLM/VLM support, Intel, Arm, and RISC-V hardware optimizations - CNX Software
🧠 Deep Learning · Content type: News · cnx-software.com · 1 day ago
Why agentic AI needs an open inference stack
🔄 MLOps · redhat.com · 3 days ago
CoreML vs TFLite: iPhone 15 Pro GPU 2.3x Faster
🤖 Machine Learning · Content type: Blog, Discussion · tildalice.io · 4 days ago
Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good
🗜️ Quantization · Content type: Blog · towardsai.net · 2 days ago
How Will the Chiplet IC Market Transform Semiconductor Design Through 2034?
🎮 GPU Programming · Content type: Blog · semiconinsights.blogspot.com · 23 hours ago
End-to-End Context Compression at Scale
📐 Model Architecture · Content type: Academic · arxiv.org · 2 days ago
How frontier teams are reinventing AI-native development
🔄 MLOps · Content type: Blog · aws.amazon.com · 5 hours ago
[AINews] FrontierCode: Benchmarking for Code Quality over Slop
🗜️ Quantization · Content type: News · latent.space · 2 days ago
Model Evaluations: Prove Your Routing Policy Actually Works
⚙️ Model Training · Content type: Blog · digitalocean.com · 6 days ago
Learning Fuzzy Logic: Automatic Rule Discovery Through Differentiable Circuits
🤖 Machine Learning · metafunctor.com · 4 days ago · DEV
A 185 TOPS/W/mm2 Bayesian Inference Engine with 640 aJ Write-Free FeFET GRNG for Uncertainty-Aware Aerial Search and Rescue
🧠 Deep Learning · Content type: Academic · arxiv.org · 1 day ago
huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.
🎮 GPU Programming · Content type: Code · github.com · 6 days ago · Hacker News
ASUS ExpertBook Ultra Flagship Business Laptop Debuts In SEA Markets, Featuring Sub-1kg Chassis & Intel Core Ultra X7 Processor
🦀 WGPU · pokde.net · 16 hours ago
How Small Can You Go? LoRA Fine-Tuning 270M-8B Models for Merchant Information Extraction in Financial Transactions
🖥️ Systems ML · Content type: Academic · arxiv.org · 2 days ago
Understanding Agentic AI Infrastructure
🔄 MLOps · Content type: Blog · mirantis.com · 1 day ago
How to Run Gemma 4 12B Locally - The Best AI For Consumer Laptops
🔄 MLOps · Content type: Video · youtube.com · 6 days ago