Writing an LLM from scratch, part 29 -- using DistributedDataParallel to train a base model from scratch in the cloud
gilesthomas.com

I’m carrying on with my “extra credit” projects after finishing the main body of Sebastian Raschka’s book “Build a Large Language Model (from Scratch)”. Having proven that I could train a GPT-2 small-scale base model from scratch on my RTX 3090 in 48 hours, I wanted to try training it on a multi-GPU machine on Lambda Labs. There are two benefits I see in doing that:

  1. I can learn what needs to change in a simple single-GPU training loop to make it multi-GPU (a sketch of the typical changes follows after this list).
  2. I…
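
For reference, here is a minimal sketch of the kind of changes DistributedDataParallel typically requires, assuming a plain PyTorch loop launched with `torchrun`. This is not the post's actual training code: the tiny linear model, random data, and hyperparameters are placeholders standing in for the GPT-2-style model and dataset.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process;
    # init_process_group reads them to join the process group.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and data -- in the real project this would be the
    # GPT-2-style model and the tokenized training corpus.
    model = torch.nn.Linear(128, 128).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 128), torch.randn(1024, 128))
    # DistributedSampler gives each rank a disjoint shard of the dataset,
    # so the effective batch size scales with the number of GPUs.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle differently each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # DDP all-reduces gradients across ranks here
            optimizer.step()
        if dist.get_rank() == 0:  # log from one process only
            print(f"epoch {epoch} done, last loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=8 train.py`, `torchrun` starts one process per GPU, and each process runs this same script against its own device; apart from the process-group setup, the sampler, and the DDP wrapper, the training loop itself is essentially unchanged from the single-GPU version.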
