Beyond Standard LLMs
magazine.sebastianraschka.com·17h·
Discuss: Hacker News, r/LLM
🧩Attention Kernels
Eyes on Target: Gaze-Aware Object Detection in Egocentric Video
arxiv.org·1d
Flash Attention
Topographical sparse mapping: A training framework for deep learning models
sciencedirect.com·8h·
Discuss: Hacker News
📊Gradient Accumulation
Attention ISN'T all you need?! New Qwen3 variant Brumby-14B-Base leverages Power Retention technique
venturebeat.com·10h
Flash Attention
Transformers Architecture: How Google’s ‘Attention Is All You Need’ Changed Deep Learning Forever
pub.towardsai.net·52m
🧩Attention Kernels
Hybrid channel attention network for auditory attention detection
nature.com·2d
🧩Attention Kernels
Redundancy Maximization as a Principle of Associative Memory Learning
arxiv.org·1h
🔗Kernel Fusion
No Free Lunch: Deconstruct Efficient Attention with MiniMax M2
lmsys.org·1d
Flash Attention
Continuous Autoregressive Language Models
shaochenze.github.io·3h·
Discuss: Hacker News
🏎️TensorRT
Generalizing Test-Time Compute-Optimal Scaling as an Optimizable Graph
huggingface.co·1h·
Discuss: Hacker News
🏎️TensorRT
Crushing ML Latency: The (Un)Official Best Practices for Systems Optimisation
pub.towardsai.net·52m
📊Profiling Tools
Bio-Inspired Neuron Synapse Optimization for Adaptive Learning and Smart Decision-Making
arxiv.org·1d
📊Gradient Accumulation
Unlock the Power of GANs: Train with Tiny Datasets!
dev.to·11h·
Discuss: DEV
📊Gradient Accumulation
Optimizing Native Sparse Attention with Latent Attention and Local Global Alternating Strategies
arxiv.org·1d
🧩Attention Kernels
Gated DeltaNet (Linear Attention variant in Qwen3-Next and Kimi Linear)
sebastianraschka.com·2d·
Discuss: r/LLM
🧩Attention Kernels
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
arxiv.org·1d
🏎️TensorRT
H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention
arxiv.org·1d
Flash Attention
AI study gives insights into why super-recognisers excel at identifying faces
theguardian.com·6h
🧩Attention Kernels
Kimi Linear: An Expressive, Efficient Attention Architecture
arxiviq.substack.com·3d·
Discuss: Substack
🧩Attention Kernels