Scour
🖥️ ML Systems
distributed training, CUDA, GPU cluster, training infrastructure
Scoured 31 posts in 16.3 ms
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
💬 LLMs · Content type: Academic · arxiv.org · 2 days ago
If Claude Fable stops helping you, you’ll never know
💬 LLMs · simonwillison.net · 1 day ago · Hacker News
Gerrymandering the Warp: Non-Control-Data Attacks on CUDA Collective Decision
📐 Scaling Laws · Content type: Academic · arxiv.org · 6 hours ago
LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization
📐 Scaling Laws · Content type: Academic · arxiv.org · 6 days ago
WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries
🤖 AI Agents · Content type: Academic · arxiv.org · 6 hours ago
SET: Stream-Event-Triggered Scheduling for Efficient CUDA Graph Pipelines
🔥 PyTorch · Content type: Academic · arxiv.org · 6 days ago
TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs
📉 Deep Learning · Content type: Academic · arxiv.org · 6 hours ago
AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis
💬 LLMs · Content type: Academic · arxiv.org · 2 days ago · Hacker News
Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression
💬 LLMs · Content type: Academic · arxiv.org · 2 days ago
Learned Subspace Compression for Communication-Efficient Pipeline Parallelism
🧠 AI Research · Content type: Academic · arxiv.org · 6 days ago
Characterizing Software Aging in GPU-Based LLM Serving Systems
💬 LLMs · Content type: Academic · arxiv.org · 6 hours ago
Enhancing AI Interpretability and Safety through Localised Architectures
🔍 Interpretability · Content type: Academic · arxiv.org · 2 days ago
INFRAMIND: Infrastructure-Aware Multi-Agent Orchestration
🎮 Reinforcement Learning · Content type: Academic · arxiv.org · 6 hours ago
Resource-aware Computation-Communication Overlap for multi-GPU ML Workloads
📉 Deep Learning · Content type: Academic · arxiv.org · 2 days ago
Demystifying NVSHMEM: A System-Level Analysis on Symmetric Memory and Device-Initiated Operations in GPU Communication
📐 Scaling Laws · Content type: Academic · arxiv.org · 6 days ago
Bergson: An Open Source Library for Data Attribution
⚙️ Model Training · Content type: Academic · arxiv.org · 6 hours ago
Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency
⚙️ Model Training · Content type: Academic · arxiv.org · 2 days ago
A Scalable PyTorch Abstraction for Multi-GPU Gaussian Splatting
🔥 PyTorch · Content type: Academic · arxiv.org · 6 hours ago
On GPU Implementation for Multi-Precision Integer Division
📐 Scaling Laws · Content type: Academic · arxiv.org · 6 days ago
PALUTE: Processing-In-Memory Acceleration via Lookup Table for Edge LLM Inference
💬 LLMs · Content type: Academic · arxiv.org · 2 days ago