🖥️ ML Systems - Bingran · Scour

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

💬LLMs Academic

If Claude Fable stops helping you, you’ll never know

simonwillison.net · Hacker News

Gerrymandering the Warp: Non-Control-Data Attacks on CUDA Collective Decision

📐Scaling Laws Academic

LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization

📐Scaling Laws Academic

WarpGuard: Protected-Site Control-Flow Integrity for CUDA SASS Binaries

🤖AI Agents Academic

SET: Stream-Event-Triggered Scheduling for Efficient CUDA Graph Pipelines

🔥PyTorch Academic

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

📉Deep Learning Academic

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

💬LLMs Academic

arxiv.org · Hacker News

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression

💬LLMs Academic

Learned Subspace Compression for Communication-Efficient Pipeline Parallelism

🧠AI Research Academic

Characterizing Software Aging in GPU-Based LLM Serving Systems

💬LLMs Academic

Enhancing AI Interpretability and Safety through Localised Architectures

🔍Interpretability Academic

INFRAMIND: Infrastructure-Aware Multi-Agent Orchestration

🎮Reinforcement Learning Academic

Resource-aware Computation-Communication Overlap for multi-GPU ML Workloads

📉Deep Learning Academic

Demystifying NVSHMEM: A System-Level Analysis on Symmetric Memory and Device-Initiated Operations in GPU Communication

📐Scaling Laws Academic

Bergson: An Open Source Library for Data Attribution

⚙️Model Training Academic

Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency

⚙️Model Training Academic

A Scalable PyTorch Abstraction for Multi-GPU Gaussian Splatting

🔥PyTorch Academic

On GPU Implementation for Multi-Precision Integer Division

📐Scaling Laws Academic

PALUTE: Processing-In-Memory Acceleration via Lookup Table for Edge LLM Inference

💬LLMs Academic
