LLMs and reinforcement learning

What struck me about Dwarkesh Patel's interview with Richard Sutton was how much the two participants talk past each other and fail to find common ground. In particular, they couldn't agree on the power of reinforcement learning, even though it's such a large part of the LLM workflow.

To be clear, it isn't the large language model that engages in reinforcement learning; it's the person who is applying the LLM to their task. That's all prompt engineering is. Here's the workflow:

1. Identify a goal.
2. Hypothesize a prompt that leads the LLM to satisfy the goal.
3. Try out the prompt and generate an outcome.
4. Observe the gap between the outcome and the intended goal.
5. Repeat steps 1-4 (yes, include step…
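As a rough illustration, here is a toy Python sketch of that loop with the person's trial and error made explicit. Everything in it is hypothetical: `fake_llm` stands in for a real model (it simply echoes the words listed after `include:`), the "gap" is assumed to be the set of goal words missing from the outcome, and `revise` stands in for the person's next hypothesis by asking for one missing word per round. It is plain iterative refinement, not an implementation of any specific reinforcement learning algorithm.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call: it returns only the
    # comma-separated words listed after "include:", so the loop runs
    # offline and deterministically.
    _, _, requested = prompt.partition("include:")
    return " ".join(w.strip() for w in requested.split(",") if w.strip())


def missing_words(outcome: str, goal: set[str]) -> set[str]:
    # Step 4 (assumed metric): the gap is the set of goal words absent
    # from the outcome.
    return goal - set(outcome.split())


def revise(prompt: str, missing: set[str]) -> str:
    # Step 2 on the next pass: hypothesize a better prompt by asking for
    # one missing word (the alphabetically first, to stay deterministic).
    word = min(missing)
    separator = " " if prompt.endswith(":") else ", "
    return prompt + separator + word


def refine_prompt(
    goal: set[str],
    prompt: str,
    llm: Callable[[str], str] = fake_llm,
    max_rounds: int = 10,
) -> list[tuple[str, int]]:
    # Returns (prompt tried, size of remaining gap) for each round.
    history: list[tuple[str, int]] = []
    for _ in range(max_rounds):
        outcome = llm(prompt)               # step 3: try the prompt
        gap = missing_words(outcome, goal)  # step 4: observe the gap
        history.append((prompt, len(gap)))
        if not gap:                         # goal met: stop repeating
            break
        prompt = revise(prompt, gap)        # back to step 2 with feedback
    return history


if __name__ == "__main__":
    goal = {"agent", "policy", "reward"}    # step 1: identify a goal
    for tried, gap_size in refine_prompt(goal, "Write a sentence. include:"):
        print(gap_size, tried)
```

Run as a script, it tries four prompts and the gap shrinks from 3 to 2 to 1 to 0 before the loop stops. The model itself never changes between rounds; only the prompt does, because the `revise` step (in practice, the person's judgment) is where the adjustment happens.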

About Graham
