Claude Plays Pokemon: Opus 4.5 Follow-up
lesswrong.com

Published on January 29, 2026 4:14 PM GMT

ClaudePlaysPokemon is a simple test of the question “Can the LLM Claude beat Pokemon Red?” As new Claude models have been released, we have gotten closer to answering that question with “yes”. Similar projects with other models are also common, but they use harnesses that give the models significantly more help with the task. For that reason I think, and many others agree, that ClaudePlaysPokemon is the best test of underlying LLM progress.

I’m not the only LessWronger to want to write about it either. Insights into Claude Opus 4.5 from Pokémon was written two mont…
