Writing an LLM from scratch, part 29 -- using DistributedDataParallel to train a base model from scratch in the cloud
gilesthomas.com·3d·

I’m carrying on with my "extra credit" projects after finishing the main body of Sebastian Raschka’s book "Build a Large Language Model (from Scratch)". Having proven that I could train a GPT-2-small-scale base model from scratch on my RTX 3090 in 48 hours, I wanted to try training it on a multi-GPU machine on Lambda Labs. I see two benefits in doing that:

  1. I can learn what you need to change in a simple single-GPU training loop to make it multi-GPU.
  2. I…
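
To make the single-GPU versus multi-GPU distinction concrete, here is a minimal conceptual sketch of the idea that data-parallel training relies on: each worker computes gradients on its own shard of the batch, the gradients are averaged across workers, and every replica applies the same update. It is plain Python with a one-parameter toy model, not PyTorch and not the code from this post; the names `grad_mse`, `shard`, `all_reduce_mean` and `data_parallel_step` are hypothetical helpers invented for illustration, and the sketch assumes the batch splits into equal-sized shards.

```python
# Conceptual sketch only: plain Python, no PyTorch, no real GPUs.
# Toy model: y = w * x, loss = mean squared error over a batch.

def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def shard(batch, world_size):
    """Split a batch into equal, non-overlapping per-worker shards."""
    per_worker = len(batch) // world_size
    return [batch[i * per_worker:(i + 1) * per_worker] for i in range(world_size)]

def all_reduce_mean(values):
    """Stand-in for averaging gradients across workers."""
    return sum(values) / len(values)

def data_parallel_step(w, batch, world_size, lr):
    """One optimizer step where each simulated worker sees only its shard."""
    local_grads = [grad_mse(w, s) for s in shard(batch, world_size)]
    averaged = all_reduce_mean(local_grads)
    # Every replica applies the same averaged gradient, so weights stay in sync.
    return w - lr * averaged

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# Single-worker step over the whole batch.
single = 0.0 - 0.01 * grad_mse(0.0, batch)
# Two simulated workers, each with half the batch.
multi = data_parallel_step(0.0, batch, world_size=2, lr=0.01)

print(single, multi)
```

With equal-sized shards, the averaged per-shard gradient matches the full-batch gradient, so the two updates agree. In a real multi-GPU setup the workers are separate processes and the averaging happens over the interconnect rather than in a Python list.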
