🖥️ ML Systems
distributed training, CUDA, GPU cluster, training infrastructure
Scoured 23 posts in 5.4 ms
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
💬 LLMs · Academic · arxiv.org · 2 days ago
If Claude Fable stops helping you, you’ll never know
💬 LLMs · simonwillison.net · 1 day ago · Hacker News
LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization
📐 Scaling Laws · Academic · arxiv.org · 6 days ago
SET: Stream-Event-Triggered Scheduling for Efficient CUDA Graph Pipelines
🔥 PyTorch · Academic · arxiv.org · 6 days ago
AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis
💬 LLMs · Academic · arxiv.org · 2 days ago · Hacker News
Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression
💬 LLMs · Academic · arxiv.org · 2 days ago
Learned Subspace Compression for Communication-Efficient Pipeline Parallelism
🧠 AI Research · Academic · arxiv.org · 6 days ago
Enhancing AI Interpretability and Safety through Localised Architectures
🔍 Interpretability · Academic · arxiv.org · 2 days ago
Resource-aware Computation-Communication Overlap for multi-GPU ML Workloads
📉 Deep Learning · Academic · arxiv.org · 2 days ago
Demystifying NVSHMEM: A System-Level Analysis on Symmetric Memory and Device-Initiated Operations in GPU Communication
📐 Scaling Laws · Academic · arxiv.org · 6 days ago
Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency
⚙️ Model Training · Academic · arxiv.org · 2 days ago
On GPU Implementation for Multi-Precision Integer Division
📐 Scaling Laws · Academic · arxiv.org · 6 days ago
PALUTE: Processing-In-Memory Acceleration via Lookup Table for Edge LLM Inference
💬 LLMs · Academic · arxiv.org · 2 days ago
Floating-point autotuning with customized precisions
📐 Scaling Laws · Academic · arxiv.org · 2 days ago
BIDENT: Heterogeneous Operator-level Mapping for Efficient Edge Inference
📐 Scaling Laws · Academic · arxiv.org · 6 days ago
Communication Strategy Selection for Multi-GPU 3D FDTD with Convolutional Perfectly Matched Boundary Layers
📐 Scaling Laws · Academic · arxiv.org · 3 days ago
SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving
💬 LLMs · Academic · arxiv.org · 2 days ago
StageFrontier: Synchronization-Aware Stage Accounting for Distributed ML Training
📉 Deep Learning · Academic · arxiv.org · 3 days ago
STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control
📐 Scaling Laws · Academic · arxiv.org · 2 days ago
SABLE: GPU-Based Power Flow Accelerator for Sparsity-Aware Batched Learning
📉 Deep Learning · Academic · arxiv.org · 3 days ago