Scour
⚙ MLSys
Build a local voice agent with Red Hat OpenShift AI
🎮 GPU Architecture · developers.redhat.com · 3 days ago
Beyond Per-Token Pricing: A Concurrency-Aware Methodology for LLM Infrastructure Cost Estimation
🎮 GPU Architecture · Content type: Academic · arxiv.org · 13 hours ago
Does anyone know what PCIe mode was used for these benchmarks?
💡 FlashAttention · Content type: Code · github.com · 5 days ago · r/LocalLLaMA
[AINews] Open Models, Model Labs vs Agent Labs, and What's Untrainable — Sarah Guo
🐧 Kernel Dev · Content type: News · latent.space · 14 hours ago
AI Serving Platform That Adapts to Your Model
🎮 GPU Architecture · Content type: Blog · databricks.com · 1 day ago
Best Python AI Frameworks in 2026 | The PyCharm Blog
📦 TVM · Content type: Blog · blog.jetbrains.com · 17 hours ago
Thoughts on Claude Fable's silent safeguards
📦 TVM · lesswrong.com · 18 hours ago
mirkolenz/llmhop: Tiny, stateless Go router that dispatches OpenAI-compatible requests to single-model vLLM and sglang backends with zero external dependencies
🐧 Kernel Dev · Content type: Code · github.com · 6 days ago · Hacker News
Toward Compiler World Models: Learning Latent Dynamics for Efficient Tensor Program Search
🟩 CUDA · Content type: Academic · arxiv.org · 2 days ago
Latest technical articles & videos.
🐧 Kernel Dev · certdepot.net · 5 days ago
Nvidia's RTX Spark is a developer's dream, but AMD's Ryzen AI Max+ is what most people actually need for local AI
🎮 GPU Architecture · xda-developers.com · 4 days ago
I stopped using most of Rust’s advanced features for my ML library
💻 OS · Content type: Code · github.com · 2 days ago · r/rust
Resource-aware Computation-Communication Overlap for multi-GPU ML Workloads
💻 OS · Content type: Academic · arxiv.org · 2 days ago
Five labs, five minds: building a multi-model finance drama on small models
💻 OS · Content type: Blog · huggingface.co · 4 days ago
Alleged Fable sabotage of an ML project
📦 TVM · xcancel.com · 14 hours ago · Hacker News
MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better
💡 FlashAttention · Content type: News, Blog · kaitchup.substack.com · 5 days ago · r/LocalLLaMA
RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference
📦 TVM · Content type: Academic · arxiv.org · 1 day ago
Youssof Altoukhi (@Youssofal_)
💡 FlashAttention · xcancel.com · 3 days ago · r/LocalLLaMA
heterodoxin/graphkv: Graph-guided KV cache compression for memory-efficient LLM inference.
💻 OS · Content type: Code · github.com · 4 days ago · r/LocalLLaMA
Claude Fable 5 silently degrades its own performance on frontier AI work
💡 FlashAttention · Content type: News, Blog · mkotlikov.substack.com · 1 day ago · Substack