Attention Mechanisms

Attention at the Theoretical Minimum: A Mathematics of Arrays Framework for Memory-Optimal Transformer Kernels

 🧠Neural Network Architectures  Content type: Academic
arxiv.org·
Less-relevant results

The Memory Problem is Solved: How Google’s Memory Caching Makes RNNs Smart Again

 🤖Transformer Architecture  Content type: Blog
medium.com·

GRAMformer: Any-Order Modality Interactions via Volumetric Multimodal Cross-Attention

 🧠Neural Network Architectures  Content type: Academic
arxiv.org·

Claude Mythos Glasswing: Why AI Vuln Discovery Terrifies Me

 🔮ML  Content type: Blog, Discussion
tildalice.io·

Inside the Latent Flow: Causal Deciphering of Attention Dynamics in Audio Separation Foundation Models

 🤖Transformer Architecture  Content type: Academic
arxiv.org·

How to Fine-Tune Nemotron 3.5 ASR for Your Language, Domain, or Accent

 🤖AI  Content type: Blog

SheafStain: Sheaf-Theoretic Schrödinger Bridge for Spatially and Biologically Coherent Virtual Staining

 🤖Transformer Architecture  Content type: Academic
arxiv.org·

Chiaroscuro Attention: Spending Compute in the Dark

 🤖Transformer Architecture  Content type: Academic
arxiv.org·

Guardian Angels: LLM Personalization for Productivity and Security

 🤖Transformer Architecture
gwern.net··Hacker News

NVIDIA/cosmos: NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.

 🧠Neural Network Architectures  Content type: Code
github.com·

One Step Closer to Ground Truth: A Multi-Scale Residual-Aware Representation Learning Pipeline for Predicting Time Series Data

 📈Time Series Forecasting  Content type: Academic
arxiv.org·

Overcoming Decoder Inconsistencies in Whisper for Dravidian and Low-Resource Languages

 🤖Transformer Architecture  Content type: Academic
arxiv.org·

RAPID: Layer-Wise Redundancy-Aware Pruning and Importance-Driven Token Merging for Efficient ViT

 🤖Transformer Architecture  Content type: Academic
arxiv.org·

SCALE: Scalable Cross-Attention Learning with Extrapolation for Agentic Workflow Scheduling

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·

Beyond Item IDs: Scaling Short-Form-Video Recommendation via Semantic-Native Long Sequence Modeling

 🤖Transformer Architecture  Content type: Academic
arxiv.org·

Gated Bidirectional Linear Attention for Generative Retrieval

 🤖Transformer Architecture  Content type: Academic
arxiv.org·

I stopped using most of Rust’s advanced features for my ML library

 🧠Neural Network Architectures  Content type: Code
github.com··r/rust

ATT-CR: Adaptive Triangular Transformer for Cloud Removal

 🤖Transformer Architecture  Content type: Academic
arxiv.org·

MOSS-Video-Preview: Toward Real-Time Video Understanding via Cross-Attention

 📈Time Series Forecasting  Content type: Academic
arxiv.org·

Learning Instance-Adaptive Low-Rank Orthogonal Subspaces for Clothes-Changing Person Re-Identification

 🗄️Vector Databases  Content type: Academic
arxiv.org·
