The assistant axis: situating and stabilizing the character of large language models
anthropic.com·2d·
🦙Ollama
Preview
Report Post

*Left: *Character archetypes form a "persona space," with the Assistant at one extreme of the "Assistant Axis." Right: Capping drift along this axis prevents models (here, Llama 3.3 70B) from drifting into alternative personas and behaving in harmful ways.

When you talk to a large language model, you can think of yourself as talking to a character. In the first stage of model training, pre-training, LLMs are asked to read vast amounts of text. Through this, they learn to simulate heroes, villains, philosophers, programmers, and just about every other character archetype under the sun. In the nex…

Similar Posts

Loading similar posts...

Keyboard Shortcuts

Navigation
Next / previous item
j/k
Open post
oorEnter
Preview post
v
Post Actions
Love post
a
Like post
l
Dislike post
d
Undo reaction
u
Recommendations
Add interest / feed
Enter
Not interested
x
Go to
Home
gh
Interests
gi
Feeds
gf
Likes
gl
History
gy
Changelog
gc
Settings
gs
Browse
gb
Search
/
General
Show this help
?
Submit feedback
!
Close modal / unfocus
Esc

Press ? anytime to show this help