Scalable Power Sampling: Training-Free Reasoning for LLMs via Distribution Sharpening

The uncomfortable question: What if RL isn’t teaching LLMs how to reason?

Reinforcement learning has become the default explanation for why large language models suddenly feel better at reasoning.

We fine-tune them with verifiers. We optimise against rewards. We celebrate new post-training pipelines like GRPO that push performance higher on math, code, and QA benchmarks.

And it works.

But there is an uncomfortable question lurking underneath all of this progress:

What if reinforcement learning isn’t actually teaching models how to reason?

What if it’s mostly doing something simpler, something we’ve been overlooking?

In our recent work, we took this possibility seriously. We asked whether the reasoning capabilities we attribute to RL might already exist inside base models.
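
The title hints at the mechanism: rather than training, you sample from a sharpened version of the base model's own distribution, roughly p(x)^α for some α > 1, so probability mass concentrates on completions the model already ranks highly. As a rough illustration of what "sharpening" means, here is a minimal token-level sketch in Python. Everything in it is mine for illustration (the function name, the α values, the toy logits), and it only shows the per-token case, where raising a distribution to the power α reduces to sampling at temperature 1/α; doing this at the whole-sequence level without retraining is presumably the hard, "scalable" part the work itself addresses.

```python
import numpy as np

def power_sharpen(logits: np.ndarray, alpha: float = 2.0) -> np.ndarray:
    """Raise a next-token distribution p to the power alpha and renormalize.

    alpha > 1 sharpens the distribution (equivalent to temperature 1/alpha);
    alpha = 1 recovers the base model's distribution unchanged.
    """
    # Work in log space for numerical stability: log p^alpha = alpha * log p.
    log_p = logits - np.logaddexp.reduce(logits)        # log-softmax
    sharpened = alpha * log_p
    return np.exp(sharpened - np.logaddexp.reduce(sharpened))  # renormalize

rng = np.random.default_rng(0)
toy_logits = np.array([2.0, 1.0, 0.5, -1.0])  # hypothetical next-token scores

p_base = power_sharpen(toy_logits, alpha=1.0)   # the base distribution
p_sharp = power_sharpen(toy_logits, alpha=4.0)  # mass concentrates on the mode
print("base:     ", p_base)
print("sharpened:", p_sharp)

next_token = rng.choice(len(toy_logits), p=p_sharp)  # sample from p^alpha
```

The point of the sketch is only the intuition: as α grows, sampling increasingly favors what the base model already believes, with no gradient updates involved.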
