Motivation

For a while my knowledge of ML was limited to what I'd learned in school: perceptrons, gradient descent, perhaps multiple perceptrons grouped into layers. Looking at the ML landscape from afar, I couldn't follow the many fundamentally new ideas being developed. Conference papers are often written in a way that presents the idea, but not the intuition or the impetus for exploring that particular direction. Looking at the attention paper I was quite lost: why do we need all of Q, K, and V? What is their intuitive explanation? Why is this direction being explored at all?

Reading further did not make things simpler, with many new concepts introduced at once. FlashAttention seemed like an indecipherable rewrite. Mamba was voodoo magic.

For a long while, I wanted a blogpost to explai…
