🚀 ML Inference - rishabh · Scour

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

📄Systems Papers Academic

Less-relevant results

Create Your Own Programming Language with Rust

🛠️Compilers

createlang.rs · Hacker News

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

⚙️ML Systems Code

github.com · Hacker News

Real-Time Language Model Jamming: A Case Study for Live Music Accompaniment Generation

⚙️ML Systems Academic

Beyond Per-Token Pricing: A Concurrency-Aware Methodology for LLM Infrastructure Cost Estimation

🖥️GPU Computing Academic

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

🖥️GPU Computing Academic

M*: A Modular, Extensible, Serving System for Multimodal Models

⚙️ML Systems Academic

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

⚙️ML Systems Academic

ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

🖥️GPU Computing Academic

GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving

🖥️GPU Computing Academic

Toward Compiler World Models: Learning Latent Dynamics for Efficient Tensor Program Search

🧠Deep Learning Academic

TinyContainer: Container Runtime Middleware Enabling Multi-tenant Microcontrollers with Built-in Security

💾Storage Systems Academic

SNN-MLIR: An MLIR Dialect for Compiling Neuromorphic SNNs from NIR to Bare-Metal C

🛠️Compilers Academic

SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving

🛠️Compilers Academic

P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8

⚙️ML Systems Academic

