Systems-level optimizations for LLM serving


HACK++: Towards More Effective Head-Aware Key-Value Compression for Efficient Visual Autoregressive Modeling

Real-time AI Systems · Content type: Academic
arxiv.org

AVIS: Adaptive Test-Time Scaling for Vision-Language Models

🧠 Large Language Models (LLMs) · Content type: Academic
arxiv.org

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

🧠 Large Language Models (LLMs) · Content type: Academic
arxiv.org

BUDDY: BUdget-Driven DYnamic Depth Routing for Adaptive Large Language Model Inference

🧠 Large Language Models (LLMs) · Content type: Academic
arxiv.org

TLDR: Compressing Audio Tokens for Efficient Autoregressive Text-to-Speech

🧠 Large Language Models (LLMs) · Content type: Academic
arxiv.org

TRADE: Transducer-Augmented Decoder for Speech LLM

Real-time AI Systems · Content type: Academic
arxiv.org

Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systems

🤖 Agents using LLMs · Content type: Academic
arxiv.org

QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving

🔍 Retrieval-augmented generation · Content type: Academic
arxiv.org

EinSort: Sorting is All We Need for Tensorizing LLM

Model optimizations in LLMs · Content type: Academic
arxiv.org

K-Forcing: Joint Next-K-Token Decoding via Push-Forward Language Modeling

🧠 Large Language Models (LLMs) · Content type: Academic
arxiv.org

Towards Tight Bounds for Streaming Attention

🧠 Large Language Models (LLMs) · Content type: Academic
arxiv.org

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

🔍 Retrieval-augmented generation · Content type: Academic
arxiv.org

Latent Reasoning with Normalizing Flows

🧠 Large Language Models (LLMs) · Content type: Academic
arxiv.org

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

Real-time AI Systems · Content type: Academic
arxiv.org

Visual Para-Thinker++: A Single-Policy Multi-Agent Framework for Visual Reasoning

🤖 Agents using LLMs · Content type: Academic
arxiv.org
