🖥️ GPU Computing - rishabh · Scour

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

🚀ML Inference Academic

nomp: A Framework for Building Domain Specific Compilers

🛠️Compilers Academic

AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference

🧠Deep Learning Academic

A Scalable PyTorch Abstraction for Multi-GPU Gaussian Splatting

🧠Deep Learning Academic

AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis

🚀ML Inference Academic

arxiv.org · Hacker News

Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite

🚀ML Inference Academic

Beyond Per-Token Pricing: A Concurrency-Aware Methodology for LLM Infrastructure Cost Estimation

🚀ML Inference Academic

Communication Strategy Selection for Multi-GPU 3D FDTD with Convolutional Perfectly Matched Boundary Layers

🌐Distributed Systems Academic
