AI Performance Profiling

GenAutoML: An Agentic Framework for Dynamic Architecture Generation and Optimization in Time-Series Analysis

 🧠Large Language Models (LLMs)  Content type: Academic
arxiv.org

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression

 Model optimizations in LLMs  Content type: Academic
arxiv.org

Beyond Per-Token Pricing: A Concurrency-Aware Methodology for LLM Infrastructure Cost Estimation

 Model optimizations in LLMs  Content type: Academic
arxiv.org

Fast Speech Foundation Model Distillation Using Interleaved Stacking

 ⚙️AI Infrastructure Automation  Content type: Academic
arxiv.org

vla.cpp: A Unified Inference Runtime for Vision-Language-Action Models

 🚀LLM serving frameworks  Content type: Academic
arxiv.org

Tangram: Unlocking Non-Uniform KV Cache for Efficient Multi-turn LLM Serving

 🔧Systems-level optimizations for LLM serving  Content type: Academic
arxiv.org · Hacker News

Efficient Traffic Prediction at Scale: A Systematic Study of STGCN Architectural Depth

 🔧Systems-level optimizations for LLM serving  Content type: Academic
arxiv.org

Context-Driven Incremental Compression for Multi-Turn Dialogue Generation

 🧠Large Language Models (LLMs)  Content type: Academic
arxiv.org

Light-WAM: Efficient World Action Models with State-Fusion Action Decoding

 Real-time AI Systems  Content type: Academic
arxiv.org

RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling

 Real-time AI Systems  Content type: Academic
arxiv.org

Efficient-WAM: A 1B-Parameter World-Action Model with Low-Cost Future Imagination

 🔍Retrieval-augmented generation  Content type: Academic
arxiv.org

TBD-VLA: Temporal Block Diffusion Vision Language Action Model

 🧠Large Language Models (LLMs)  Content type: Academic
arxiv.org

NTILC: Neural Tool Invocation via Learned Compression

 🧠Large Language Models (LLMs)  Content type: Academic
arxiv.org

A Low-Latency Semantic State Estimator using Latent Predictive Learning for Dynamic Network Monitoring and Orchestration

 Real-time AI Systems  Content type: Academic
arxiv.org

LUNA-AD: Lightweight Uncertainty-Aware Language Model with Lifelong Learning for Autonomous Driving

 🧠Large Language Models (LLMs)  Content type: Academic
arxiv.org

Quantum-Inspired Reinforcement Learning for Low-Latency Intrusion Detection in V2X and Internet-of-Vehicles Networks

 Real-time AI Systems  Content type: Academic
arxiv.org

CT-VAM: A Cerebello-Thalamic-Inspired Vision-Action Model for Efficient Visuomotor Control

 Real-time AI Systems  Content type: Academic
arxiv.org

Less Is More: Training-Free Acceleration Framework of 3D Diffusion Models for Low-Count PET Denoising via Global-Local Trajectory Reduction

 🔢Quantization of LLMs  Content type: Academic
arxiv.org
