👁️ Attention Mechanisms - hussoster · Scour

Attention at the Theoretical Minimum: A Mathematics of Arrays Framework for Memory-Optimal Transformer Kernels

🧠Neural Network Architectures Academic

Less-relevant results

The Memory Problem is Solved: How Google’s Memory Caching Makes RNNs Smart Again

🤖Transformer Architecture Blog

GRAMformer: Any-Order Modality Interactions via Volumetric Multimodal Cross-Attention

🧠Neural Network Architectures Academic

Claude Mythos Glasswing: Why AI Vuln Discovery Terrifies Me

🔮ML Blog Discussion

Inside the Latent Flow: Causal Deciphering of Attention Dynamics in Audio Separation Foundation Models

🤖Transformer Architecture Academic

How to Fine-Tune Nemotron 3.5 ASR for Your Language, Domain, or Accent

🤖AI Blog

huggingface.co · Hacker News

SheafStain: Sheaf-Theoretic Schrödinger Bridge for Spatially and Biologically Coherent Virtual Staining

🤖Transformer Architecture Academic

Chiaroscuro Attention: Spending Compute in the Dark

🤖Transformer Architecture Academic

Guardian Angels: LLM Personalization for Productivity and Security

🤖Transformer Architecture

gwern.net · Hacker News

NVIDIA/cosmos: NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.

🧠Neural Network Architectures Code

One Step Closer to Ground Truth: A Multi-Scale Residual-Aware Representation Learning Pipeline for Predicting Time Series Data

📈Time Series Forecasting Academic

Overcoming Decoder Inconsistencies in Whisper for Dravidian and Low-Resource Languages

🤖Transformer Architecture Academic

RAPID: Layer-Wise Redundancy-Aware Pruning and Importance-Driven Token Merging for Efficient ViT

🤖Transformer Architecture Academic

SCALE: Scalable Cross-Attention Learning with Extrapolation for Agentic Workflow Scheduling

🎮Reinforcement Learning Academic

Beyond Item IDs: Scaling Short-Form-Video Recommendation via Semantic-Native Long Sequence Modeling

🤖Transformer Architecture Academic

Gated Bidirectional Linear Attention for Generative Retrieval

🤖Transformer Architecture Academic

I stopped using most of Rust’s advanced features for my ML library

🧠Neural Network Architectures Code

github.com · r/rust

ATT-CR: Adaptive Triangular Transformer for Cloud Removal

🤖Transformer Architecture Academic

MOSS-Video-Preview: Toward Real-Time Video Understanding via Cross-Attention

📈Time Series Forecasting Academic

Learning Instance-Adaptive Low-Rank Orthogonal Subspaces for Clothes-Changing Person Re-Identification

🗄️Vector Databases Academic
