The Ilya Sutskever interview – my key takeaways
quickchat.ai · 18h

I finally got round to listening to the Ilya Sutskever interview. A few points caught my attention. I’m listing them below starting from the least controversial.

#1 RL value functions

In today’s Reinforcement Learning, agents typically get a single scalar reward only at the very end of the task they were given. In RLHF, which we use for fine-tuning LLMs, it’s arguably even worse than that.
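To make the sparsity concrete, here’s a toy sketch (the episode length, discount factor, and reward numbers are all hypothetical, not from the interview): with a terminal-only reward, the learning target at every intermediate step is just a discounted echo of one final scalar.

```python
# Toy sketch of sparse terminal reward (hypothetical numbers):
# a 5-step episode where the agent is rewarded only at the end.

GAMMA = 0.9  # assumed discount factor

# Zero reward everywhere except the final step.
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]

def returns(rs, gamma=GAMMA):
    """Monte-Carlo return G_t = sum_k gamma^k * r_{t+k}, computed backwards."""
    g, out = 0.0, []
    for r in reversed(rs):
        g = r + gamma * g
        out.append(g)
    return list(reversed(out))

# Every step's target is derived from the single terminal scalar:
print(returns(rewards))  # → [0.6561, 0.729, 0.81, 0.9, 1.0] (up to float rounding)
```

Notice there is no per-step feedback here: if the episode were a thousand steps long, step 3 would still only ever hear a heavily discounted whisper of the final reward.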

Ilya, like many others, sees that as a clear sign that RL could work much better if we figured out how to make good use of value functions.

What is a value function? It’s an estimate of how well I’m doing on a task while I’m still doing it.
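The standard way to get such mid-task estimates is temporal-difference learning. A minimal sketch, assuming a toy 5-state chain with reward only on the last transition (none of these numbers come from the interview): TD(0) learns a value V(s) for every state, so the agent can score its progress at each step instead of waiting for the end.

```python
# Toy TD(0) sketch (all states, rates, and rewards hypothetical):
# learn V(s) on a 5-state chain where only the final transition pays 1.0.

N_STATES, GAMMA, ALPHA = 5, 0.9, 0.1
V = [0.0] * (N_STATES + 1)  # V[N_STATES] is the terminal state, held at 0

for _ in range(2000):  # repeated sweeps over the deterministic chain
    for s in range(N_STATES):
        r = 1.0 if s == N_STATES - 1 else 0.0
        # TD(0) update: nudge V(s) toward the bootstrapped target r + gamma * V(s')
        V[s] += ALPHA * (r + GAMMA * V[s + 1] - V[s])

# V converges to gamma^(steps remaining - 1): a per-step progress signal.
print([round(v, 4) for v in V[:N_STATES]])
```

The point of the exercise: once V is learned, the agent gets a dense signal, every action can be judged by whether it moved V up or down, rather than by a single scalar delivered at the finish line.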

Imagine an LLM being fine-tuned for a long-ranging agenti…
