Evaluating chain-of-thought monitorability
openai.com·22h·
Discuss: Hacker News
💡Explainable AI
Preview
Report Post

When AI systems make decisions that are difficult to supervise directly, it becomes important to understand how those decisions are made. One promising approach is to monitor a model’s internal reasoning, rather than only its actions or final outputs.

Modern reasoning models, such as GPT‑5 Thinking, generate an explicit chain-of-thought before producing an answer. Monitoring these chains-of-thought for misbehavior can be far more effective⁠ than monitoring a model’s actions and outputs alone. However, researchers at OpenAI and across the broader industry worry⁠(opens in a new window) that this chain-of-thought “monitorability” may be fragile to changes in training procedure, data sources, …

Similar Posts

Loading similar posts...