Improving Cursor Tab with RL

At Cursor, our goal is to make developers an order of magnitude more productive. An important part of that goal is Cursor Tab, our system that predicts your next action across your codebase. Whenever you type a character or move your cursor within the editor, our Tab model tries to predict what you’ll do next, and if it has sufficient confidence, we’ll display its prediction as a suggestion that you can accept by pressing Tab.

Our Tab model runs on every user action, handling over 400 million requests per day. As a result, we have a lot of data about which suggestions users accept and reject. This post describes how we use this data to improve Tab using online reinforcement learning.

Our approach is unusual because it involves rolling out new models to users frequently throughout the day and training on the data those rollouts generate. Most other LLM providers train on static datasets or use paid labelers, and only roll out a new model to users as part of a named model release every few months.

The problem of noisy suggestions

We try to keep the accept rate of Tab suggestions high. If the accept rate is low, it means we’re showing too many incorrect suggestions, which is distracting and disrupts the flow of coding.

Achieving a high accept rate isn’t just about making the model smarter; it’s also about knowing when to suggest and when not to. Sometimes there simply isn’t enough information to know what action the user is going to take: even if the model had perfect knowledge and reasoning ability, it wouldn’t know what the user will do. In these situations, we shouldn’t suggest anything.

To increase the accept rate of the model’s suggestions, one simple approach is to train a separate model to predict whether the suggestion will be accepted. In 2022, Parth Thakkar found that GitHub Copilot used this approach, deriving a “contextual filter score” from a logistic regression model that takes 11 features as input, including the programming language, whether the previous suggestion was accepted or rejected, and the trailing characters before the user’s cursor. It’s unknown what signal this model was trained to predict, but our best guess is that it’s predicting the likelihood that the user will accept a suggestion if one is shown. When the score is lower than 15%, the suggestion is skipped and nothing is shown.
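
As a rough illustration of this kind of filter, the sketch below scores a suggestion with a logistic regression over a few features and skips it when the score falls under the 15% threshold. The feature names, weights, and bias are invented for illustration and are not Copilot's actual model, which reportedly uses 11 features; only the threshold comes from the description above.

```python
import math
from typing import Mapping

# Hypothetical weights and bias, chosen only to make the example concrete.
WEIGHTS = {
    "prev_suggestion_accepted": 1.2,
    "language_is_python": 0.3,
    "cursor_after_open_paren": 0.8,
}
BIAS = -2.5

# Suggestions scoring below this value are skipped (the 15% from the text).
SHOW_THRESHOLD = 0.15


def filter_score(features: Mapping[str, float]) -> float:
    """Estimate the probability that a suggestion would be accepted."""
    # Linear combination of features; missing features count as 0.
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    # Logistic (sigmoid) function maps the linear score into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def should_show(features: Mapping[str, float]) -> bool:
    """Show the suggestion only if the score reaches the threshold."""
    return filter_score(features) >= SHOW_THRESHOLD


if __name__ == "__main__":
    print(should_show({}))                                 # no supporting signal
    print(should_show({"prev_suggestion_accepted": 1.0}))  # previous one accepted
```

Note that the filter sits outside the suggestion model: the suggestion is still generated, and the filter only decides whether to display it.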

This solution is viable, but we wanted a more general mechanism that reused the powerful representation of the code learned by the Tab model. Instead of filtering out bad suggestions, we wanted to alter the Tab model to avoid producing bad suggestions in the first place. Therefore, we opted to use policy gradient methods instead.
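
For readers unfamiliar with the term, policy gradient methods treat the model as a policy and adjust its parameters to increase expected reward. The standard textbook form of the gradient is shown below. The notation is generic and is an assumption here, not taken from this post: $\pi_\theta$ is the policy with parameters $\theta$, $s$ is the state (for example, the editor context), $a$ is the action sampled from the policy, and $R(s, a)$ is a scalar reward.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s,\; a \sim \pi_\theta(\cdot \mid s)}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s) \, R(s, a) \right]
```

Intuitively, actions that earn high reward have their log-probability pushed up and actions that earn low reward have it pushed down, so the policy itself learns which actions to avoid.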

The policy gradient
