I finally got round to listening to the Ilya Sutskever interview. A few points caught my attention. I’m listing them below starting from the least controversial.

#1 RL value functions

In today’s Reinforcement Learning, agents get a single scalar reward at the very end of the task they were given. In RLHF, which we use for fine-tuning LLMs, it’s arguably even worse than that.

Ilya, like many others, sees that as a clear sign that RL could work so much better if we figured out how to make good use of value functions.

What is a value function? It’s being able to tell how well I’m doing on a task while I’m still doing it.
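
To make that concrete, here is a minimal sketch of the idea (not anything from the interview): a value function V(s) learned with TD(0) on a toy task where the environment, like today’s RL setups, only pays a single scalar reward at the very end. The task, state space, and hyperparameters are all illustrative assumptions on my part.

```python
# Toy illustration (my own, hypothetical example): learn a value function V(s)
# with TD(0) even though the reward only arrives at the end of the episode.
import random

STATES = list(range(10))        # states 0..9; episode ends on reaching 9
GAMMA = 0.95                    # discount factor
ALPHA = 0.1                     # learning rate

V = {s: 0.0 for s in STATES}    # value estimates, all zero to start


def step(state: int) -> tuple[int, float, bool]:
    """Move right or stay put; reward is 1.0 only at the terminal state."""
    next_state = min(state + random.choice([0, 1]), 9)
    done = next_state == 9
    reward = 1.0 if done else 0.0   # sparse, end-of-task reward
    return next_state, reward, done


for episode in range(2000):
    s = 0
    done = False
    while not done:
        s_next, r, done = step(s)
        # TD(0): nudge V(s) toward the bootstrapped target r + gamma * V(s')
        target = r + (0.0 if done else GAMMA * V[s_next])
        V[s] += ALPHA * (target - V[s])
        s = s_next

# V(s) now answers "how well am I doing?" at every intermediate state,
# even though the raw reward signal only shows up at the very end.
print({s: round(v, 2) for s, v in V.items()})
```

The same intuition carries over to LLM fine-tuning: instead of waiting for a single end-of-trajectory score, a learned value function could grade partial progress and feed that signal back mid-task.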

Imagine an LLM being fine-tuned for a long-ranging agenti…
