Attention Mechanisms

markusheimerl/gpt: A generative pretrained transformer implementation

 🤖Transformers  Content type: Code
github.com··Hacker News

ELI5 is a terrible learning prompt, here's the structural reason it fails and a 4-level replacement that actually sticks

 🤖Transformers  Content type: Blog  Content type: Tutorial

Kuramoto Attention: Synchronizing Self-Attention on the Torus

 🤖Transformers  Content type: Academic
arxiv.org·

Big Blue’s Redbook on Storage Scale KV Cache management

 🔍RAG  Content type: News
blocksandfiles.com·

Your LLM Isn’t Reading Your Manners — It’s Counting Your Tokens

 🤖Transformers  Content type: Blog
medium.com·

How we fight GPU scarcity without compromise

 🤖Transformers  Content type: Blog
equixly.com··Hacker News

The Sequence Knowledge #874: Transformers or Not?

 🤖Machine Learning
substackcdn.com··Substack

Less-relevant results

A deep learning framework for emotion recognition in music using multimodal data fusion

 🤖Machine Learning  Content type: Academic
nature.com·

Machine learning from scratch, what to build before using scikit-learn

 🤖Machine Learning  Content type: Tutorial
iwtlp.com··DEV

How LLMs Actually Work: A Friendly Map for Humans • oreoro

 🤖Transformers

Making FlashAttention-4 faster for inference

 🌟Ray Tracing  Content type: Blog
modal.com·

Context compression finally works in production: new research cuts LLM input 16x without the accuracy hit

 🔍RAG
venturebeat.com·

The Memory Problem is Solved: How Google’s Memory Caching Makes RNNs Smart Again

 🤖Machine Learning  Content type: Blog
medium.com·

Wall Attention: Length Generalization With Diagonal Gates | Tilde

 🤖Transformers  Content type: Blog

Apple WWDC On-Device AI Deep Dive - Google Docs

 🤖AI
gist.is··Hacker News

The Inference Alpha: Maximizing Frontier Models on AMD

 🤖Transformers  Content type: Blog
digitalocean.com·

What an LLM Actually Does With Your Prompt First

 🤖AI
siliconopera.com·

SPADE: Split-and-Delay Embeddings for Autoregressive High-Granularity Calorimeter Simulation

 🤖Transformers  Content type: Academic
arxiv.org·

DiffusionGemma: The Developer Guide

 🤖Machine Learning  Content type: Blog

VelocityFM: Short-Horizon Protein Trajectory Prediction via Flow Matching in Velocity Space

 🤖Transformers  Content type: Academic
biorxiv.org·
