DEV Community

Why Your PyTorch Training Crawls on a Beefy GPU (And How to Fix It)

Covers Dao-AILab/flash-attention. Discussed on DEV.

Last month I was helping a friend debug a training loop that was running at maybe 15% GPU utilization on an A100. Fifteen percent. On a card that costs more than my first car. He'd already tried bumping the batch size, swapping the optimizer, and rewriting the data loader — nothing moved the needle. This is one of those frustrating problems where the obvious knobs do nothing, because the obvious knobs aren't where the bottleneck lives. So let's actually walk through how to figure out why your...

