Beyond Standard LLMs
magazine.sebastianraschka.com·17h·
Discuss: Hacker News, r/LLM
🧩Attention Kernels
Eyes on Target: Gaze-Aware Object Detection in Egocentric Video
arxiv.org·1d
Flash Attention
Topographical sparse mapping: A training framework for deep learning models
sciencedirect.com·8h·
Discuss: Hacker News
📊Gradient Accumulation
Attention ISN'T all you need?! New Qwen3 variant Brumby-14B-Base leverages Power Retention technique
venturebeat.com·10h
Flash Attention
Transformers Architecture: How Google’s ‘Attention Is All You Need’ Changed Deep Learning Forever
pub.towardsai.net·52m
🧩Attention Kernels
Hybrid channel attention network for auditory attention detection
nature.com·2d
🧩Attention Kernels
Redundancy Maximization as a Principle of Associative Memory Learning
arxiv.org·1h
🔗Kernel Fusion
No Free Lunch: Deconstruct Efficient Attention with MiniMax M2
lmsys.org·1d
Flash Attention
Continuous Autoregressive Language Models
shaochenze.github.io·3h·
Discuss: Hacker News
🏎️TensorRT
Generalizing Test-Time Compute-Optimal Scaling as an Optimizable Graph
huggingface.co·1h·
Discuss: Hacker News
🏎️TensorRT
Crushing ML Latency: The (Un)Official Best Practices for Systems Optimisation
pub.towardsai.net·52m
📊Profiling Tools
Bio-Inspired Neuron Synapse Optimization for Adaptive Learning and Smart Decision-Making
arxiv.org·1d
📊Gradient Accumulation
Unlock the Power of GANs: Train with Tiny Datasets!
dev.to·11h·
Discuss: DEV
📊Gradient Accumulation
Optimizing Native Sparse Attention with Latent Attention and Local Global Alternating Strategies
arxiv.org·1d
🧩Attention Kernels
Gated DeltaNet (Linear Attention variant in Qwen3-Next and Kimi Linear)
sebastianraschka.com·2d·
Discuss: r/LLM
🧩Attention Kernels
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
arxiv.org·1d
🏎️TensorRT
H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
arxiv.org·1d
Flash Attention
AI study gives insights into why super-recognisers excel at identifying faces
theguardian.com·6h
🧩Attention Kernels
Kimi Linear: An Expressive, Efficient Attention Architecture
arxiviq.substack.com·3d·
Discuss: Substack
🧩Attention Kernels