ML Inference

TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs

📄 Systems Papers · Content type: Academic
arxiv.org
Less-relevant results

Create Your Own Programming Language with Rust

🛠️ Compilers
createlang.rs · Hacker News

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

⚙️ ML Systems · Content type: Code
github.com · Hacker News

Real-Time Language Model Jamming: A Case Study for Live Music Accompaniment Generation

⚙️ ML Systems · Content type: Academic
arxiv.org

Beyond Per-Token Pricing: A Concurrency-Aware Methodology for LLM Infrastructure Cost Estimation

🖥️ GPU Computing · Content type: Academic
arxiv.org

APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing

🖥️ GPU Computing · Content type: Academic
arxiv.org

M*: A Modular, Extensible, Serving System for Multimodal Models

⚙️ ML Systems · Content type: Academic
arxiv.org

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

⚙️ ML Systems · Content type: Academic
arxiv.org

ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

🖥️ GPU Computing · Content type: Academic
arxiv.org

GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving

🖥️ GPU Computing · Content type: Academic
arxiv.org

Toward Compiler World Models: Learning Latent Dynamics for Efficient Tensor Program Search

🧠 Deep Learning · Content type: Academic
arxiv.org

TinyContainer: Container Runtime Middleware Enabling Multi-tenant Microcontrollers with Built-in Security

💾 Storage Systems · Content type: Academic
arxiv.org

SNN-MLIR: An MLIR Dialect for Compiling Neuromorphic SNNs from NIR to Bare-Metal C

🛠️ Compilers · Content type: Academic
arxiv.org

SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving

🛠️ Compilers · Content type: Academic
arxiv.org

P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8

⚙️ ML Systems · Content type: Academic
arxiv.org
