Training a 67M-parameter transformer on an M4 Mac Mini
geddydukes.com · 18h
Discuss: Hacker News

I trained a 67-million-parameter transformer end to end on an M4 Mac Mini using Apple Silicon MPS and achieved 93.94 percent exact-match accuracy on CLI command generation.

No discrete GPU. Twenty-four gigabytes of unified memory. A task where a single missing character counts as complete failure.
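Exact match is an unforgiving metric: the generated command must equal the reference string character for character. The post doesn't show its scoring code, but a minimal sketch of the metric looks like this:

```python
def exact_match(predictions, references):
    """Fraction of predictions that equal their reference character-for-character.

    Any deviation at all (a missing flag, a swapped space) scores zero
    for that example; there is no partial credit.
    """
    assert len(predictions) == len(references)
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(predictions)

# The second prediction differs only by a trailing " ." but still fails.
score = exact_match(["ls -la", "grep -r foo ."], ["ls -la", "grep -r foo"])
print(score)  # 0.5
```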

This project started as a constraint experiment. How far could a carefully built small model go if every part of the pipeline was designed around consumer hardware limits? That meant training from scratch, streaming data instead of downloading it, and being honest about what worked and what broke.
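The "streaming instead of downloading" idea boils down to lazy iteration: pull one shard at a time and yield examples as they arrive, so memory usage stays flat regardless of corpus size. A generic sketch (the shard contents here are placeholders, not the post's actual dataset):

```python
from itertools import islice

def stream_examples(shards):
    """Yield training examples one at a time.

    Each shard is any line-iterable (an open file, an HTTP response body,
    a decompressed archive member); only the current shard's buffer is
    ever resident in memory, never the full corpus.
    """
    for shard in shards:
        for line in shard:
            line = line.strip()
            if line:  # skip blank lines rather than training on them
                yield line

# Usage: take a fixed example budget without materializing everything.
shards = [["ls -la\n", "\n"], ["pwd\n", "cat file.txt\n"]]
batch = list(islice(stream_examples(shards), 3))
print(batch)  # ['ls -la', 'pwd', 'cat file.txt']
```

With a real corpus the shards would be network streams, and `islice` (or an epoch counter) caps how much is consumed per training run.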

The answer surprised me. With modern architectural components like RoPE, RMSNorm, and SwiGLU, aggressive data efficiency, and roughly 13 hours of pretraining plus about four minutes of superv…
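Of the components named above, RMSNorm is the simplest to show concretely: unlike LayerNorm it skips mean subtraction and the bias term, scaling each vector only by the reciprocal of its root-mean-square. A pure-Python sketch of the standard formulation (the post's own implementation isn't shown in this excerpt):

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: y_i = g_i * x_i / RMS(x), with RMS(x) = sqrt(mean(x^2) + eps).

    No mean subtraction and no additive bias, which is what makes it
    cheaper than LayerNorm while normalizing activation scale similarly.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

# RMS of [3, 4] is sqrt((9 + 16) / 2) ≈ 3.5355
y = rms_norm([3.0, 4.0], [1.0, 1.0])
print(y)  # ≈ [0.8485, 1.1314]
```

In a real transformer block `x` would be a tensor and `gain` a learned per-dimension parameter, but the arithmetic is exactly this.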
