How might continual learning affect safety and alignment?

This is the third post in our sequence Implications of Continual Learning for LLM Agents.

Summary

We argue that continual learning (CL) has two major potential safety implications: it may enable changes to LLM goals and values after deployment, and it eliminates the last-mover advantage held by current safety interventions.

We identify three pathways for goal and value change during deployment. First, loss of developer-side control over generalization. Second, value systematization, induced when...
