ML Inference
Less-relevant results
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
🖥️GPU Computing Content type: Academic
GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving
🖥️GPU Computing Content type: Academic
SNN-MLIR: An MLIR Dialect for Compiling Neuromorphic SNNs from NIR to Bare-Metal C
🛠️Compilers Content type: Academic
P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8
⚙️ML Systems Content type: Academic