Tape Programming Models, Sequential Computation, Linear Processing, Storage Abstractions
Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction
arxiv.org·21h
Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation
arxiv.org·21h
Turbocharge Your Diffusion LLMs: Adaptive Block Decoding for Peak Performance by Arvind Sundararajan
Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents
arxiv.org·21h