DeepViT: Towards Deeper Vision Transformer

DeepViT: Letting Vision Transformers Go Deeper

Some image models, called vision transformers, stop getting better when you add more layers. The reason? Their internal focus maps (the so-called attention) start to look the same, layer after layer, so deeper parts just repeat what earlier parts already know. More layers won't help, and training becomes wasteful. A simple idea fixes it: by gently re-making the focus maps at each stage, a trick named Re-attention keeps them fresh and distinct at almost no extra cost. It lets the model learn new things in higher layers instead of repeating old ones. Results show that deeper models then give a deeper understanding of images and better scores on big benchmarks. This means that, with a small change, image models can actually…
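To make the idea concrete, here is a minimal sketch of how the re-mixing step could look in PyTorch. It is an illustration under stated assumptions, not the paper's released implementation: the class name ReAttention, the parameter theta (a learnable head-mixing matrix), and the use of BatchNorm over heads are my own choices for the example.

```python
import torch
import torch.nn as nn

class ReAttention(nn.Module):
    """Sketch of Re-attention: mix per-head attention maps with a learnable matrix."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # Learnable head-mixing matrix (illustrative): re-combines the per-head
        # attention maps so deeper layers do not collapse to near-identical maps.
        self.theta = nn.Parameter(
            torch.eye(num_heads) + 0.01 * torch.randn(num_heads, num_heads)
        )
        self.norm = nn.BatchNorm2d(num_heads)  # normalize the mixed maps (assumption)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, heads, N, N)
        attn = attn.softmax(dim=-1)
        # Re-attention step: each output head is a learned combination of all heads.
        attn = torch.einsum('hg,bgij->bhij', self.theta, attn)
        attn = self.norm(attn)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Quick shape check on dummy tokens: (batch, patches + CLS token, embedding dim)
tokens = torch.randn(2, 197, 384)
print(ReAttention(384, num_heads=8)(tokens).shape)  # torch.Size([2, 197, 384])
```

Compared with standard multi-head self-attention, the only extra learnable state is the small heads-by-heads matrix, which is why the cost of keeping the maps distinct stays negligible.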
