rishabh's Feed · Scour

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

🚀ML Inference Code

github.com · Hacker News

Near-Optimal Distributed 2-Ruling Sets on Graphs with Low Arboricity

🌐Distributed Systems Academic

Achieving Cloud-Grade SLOs for Local Mixture-of-Experts Inference through CPU-GPU Hybrid Design

📄Systems Papers Academic

nomp: A Framework for Building Domain Specific Compilers

🖥️GPU Computing Academic

Multiversion Concurrency Control for Multiversion B-Trees

🗄️Databases Academic

Real-Time Language Model Jamming: A Case Study for Live Music Accompaniment Generation

🚀ML Inference Academic

From Fork-Join to Asynchronous Tasks: Parallelizing Tiled Cholesky Decomposition with OpenMP and HPX

🛠️Compilers Academic

AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference

🧠Deep Learning Academic

M*: A Modular, Extensible, Serving System for Multimodal Models

⚙️ML Systems Academic

FlashCP: Load-Balanced Communication-Efficient Context Parallelism for LLM Training

🗄️Databases Academic

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

🖥️GPU Computing Academic

Beyond Per-Token Pricing: A Concurrency-Aware Methodology for LLM Infrastructure Cost Estimation

🚀ML Inference Academic

Attention at the Theoretical Minimum: A Mathematics of Arrays Framework for Memory-Optimal Transformer Kernels

🧠Deep Learning Academic

SNN-MLIR: An MLIR Dialect for Compiling Neuromorphic SNNs from NIR to Bare-Metal C

🛠️Compilers Academic

Defeat the Heap: Zero-Copy Data Movement in AXI4MLIR

🛠️Compilers Academic

arxiv.org · Hacker News

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

🚀ML Inference Academic

Dynamic Software Updates using CRDTs

📄Systems Papers Academic

Toward Compiler World Models: Learning Latent Dynamics for Efficient Tensor Program Search

🧠Deep Learning Academic

Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite

🖥️GPU Computing Academic

ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

🚀ML Inference Academic
