Hardware Acceleration
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
🖥️ GPU Computing (Content type: Academic)
SET: Stream-Event-Triggered Scheduling for Efficient CUDA Graph Pipelines
🖥️ GPU Computing (Content type: Academic)
Graph Traversal on Tensor Cores: A BFS Framework for Modern GPUs
🔢 Tensor Cores (Content type: Academic)