Stickiness in AI Behavioral Design
Current model specs aim to shape the behavior of near-present models, not of models arbitrarily far into the future. OpenAI writes that its model spec aims to apply “0-3 months ahead of the present.” Anthropic’s Constitution for Claude notes that the document “is likely to change in important ways in the future.” These documents are thus presented as provisional guidelines, not as attempts to set behavioral standards for the far future.

But what if current model behaviors ...