If we take the two reports from the International AI Safety Report seriously, namely the First Key Update, published in October, and the Second Key Update, published in November, then frontier AI is not only a story of spectacular performance. It is a story of speed, exposure, and control tools struggling to keep up. The First update opens with a striking observation that the field is moving too fast, with major changes happening within a few months, sometimes a few weeks. These reports amount to a compact treatise on risk governance under unstable conditions. The First explains how technical shifts move the frontier of possible uses, and therefore of plausible harms. The Second explains how the sector tries to manufacture safety layer by layer, through procedures and audits, while explicitly acknowledging that the effectiveness of current measures remains uncertain and context dependent.
Speed as a stylized fact, risk as the common thread
These Key Updates have an unusual format. They are relatively short, dense, and published precisely because an annual report would no longer be able to track the field’s dynamics. The First update states this plainly. A yearly publication cannot keep pace in a domain where turning points unfold over months, sometimes weeks. Framing the issue this way is already, in itself, a thesis about risk. When the speed of change exceeds the speed of control, risk is not only about the possibility of bad events. It is also about the desynchronization between innovation, deployment, and collective learning.
This two step structure is useful. The First update presents capability advances and stresses a key point. Recent progress comes largely from training techniques and inference time optimization that encourage step by step reasoning. It describes models that produce chains of intermediate steps before the final answer, and that improve on complex tasks, while recognizing that reliability remains a major problem. It also notes that performance under realistic working conditions is still low, revealing a gap between benchmarks and real world settings. The Second update follows, but shifts the focus. How do public and private actors organize risk management, and how do they try to make systems more robust, both against errors and against malicious uses? The central sentence of the preface is unambiguous. Despite progress, it is often uncertain whether current measures actually prevent harm, and effectiveness varies over time and across applications. This is not a solutionist narrative. It is a race between advancing capabilities and evolving control mechanisms, with holes, asymmetries, and blind spots. This double movement, capabilities and then mitigations, looks like a risk management loop. One essential ingredient still seems missing. A stable empirical base of data on incidents, near misses, exposures, and bypasses is needed to move from talk about safety to a discussion about measurement and evidence.
From capabilities to risk through hazard, exposure, and vulnerability
In many public debates, AI is treated as a monolithic object. More powerful therefore more dangerous, or more productive therefore more desirable. The Key Updates offer something closer to a grammar of risk. Technical changes increase certain capabilities, those capabilities broaden certain uses, and that broadening alters the map of risks. The First update summarizes the mechanism directly. New techniques allow systems to reason step by step and operate more autonomously, enabling them to tackle more tasks, but also creating new challenges in biological risk, cybersecurity, and oversight. A risk reading can be organized around three fairly classic terms in actuarial science.
First, hazard, what the system makes possible, for better and for worse. Here, novelty is not only the quality of answers, but the ability to chain actions, maintain a plan, and integrate into tools, including agents and task automation. The report is careful to note that the words think and reasoning do not imply human cognition. This is an observable change in information processing, and the question of what it means philosophically remains open, which is a polite way of saying we must separate instrumental performance from understanding.
Second, exposure, or diffusion. The First update notes an instructive paradox. Despite broad adoption, aggregate effects on the labor market have changed little. This is not an argument for or against AI. It is a reminder that between technical capability and macro impact lie frictions, integration costs, legal responsibilities, organizational constraints, data quality, and sometimes ghost work, a theme documented in Ghost Work by Mary Gray and Siddharth Suri, which turns automation into reconfiguration rather than substitution.

Finally, vulnerability, what turns a plausible use into a probable harm. This is where the First update sets the stage for the Second. Risks are not in the model. They are in the socio technical whole, in how the system is evaluated, secured, deployed, and monitored.
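To make the decomposition concrete, here is a minimal back-of-the-envelope sketch, in Python, of how the three terms combine into an expected loss in the usual actuarial spirit. The scenario names and every number are illustrative placeholders, not figures from the reports.

```python
# Back-of-the-envelope illustration (not from the reports): expected annual loss
# decomposed as hazard (how often a capability is triggered), exposure (how many
# deployments are in scope), vulnerability (probability a triggered event causes
# harm) and severity (cost per harmful event). All numbers are made up.

scenarios = {
    # name: (hazard_rate_per_deployment_year, exposed_deployments, vulnerability, severity_eur)
    "prompt_injection_leak": (2.0, 1_000, 0.05, 50_000),
    "harmful_bio_uplift":    (0.01,   50, 0.20, 5_000_000),
    "automation_error":      (5.0, 10_000, 0.01, 10_000),
}

for name, (hazard, exposure, vulnerability, severity) in scenarios.items():
    expected_loss = hazard * exposure * vulnerability * severity
    print(f"{name:>24}: expected annual loss ~ {expected_loss:,.0f} EUR")

# The point is structural: a mitigation can act on any factor separately, and
# halving vulnerability has the same first-order effect as halving exposure.
```

The decomposition is crude, but it makes explicit that technical progress mostly moves the hazard term, diffusion moves exposure, and safeguards act on vulnerability, which is why the two reports read as two halves of the same loop.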
The measurement problem when the benchmark is no longer enough
One sentence from the First update is worth keeping in mind because it points directly to a problem that every modeler knows well. The best models do extremely well on certain benchmarks, but success rates on realistic tasks remain low, which highlights a gap between benchmark performance and effectiveness in the real world. This is not a detail. In risk, it is almost everything. Most promises, and most worries, rest on an extrapolation. If it works here, it will work elsewhere. Yet the history of statistical models is full of such extrapolations that fail after a population shift, a change in data collection protocols, shifting incentives, distribution shift, or simply the human reinterpretation of model outputs. Melanie Mitchell emphasizes this fragility in Artificial Intelligence: A Guide for Thinking Humans. Gary Marcus and Ernest Davis in Rebooting AI, and Erik Larson in The Myth of Artificial Intelligence, develop the same idea. We confuse local performance with robustness.
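A minimal simulation can illustrate the mechanism without referring to any particular model. A classifier tuned to one distribution scores well on a held-out benchmark drawn from the same distribution, then loses much of that accuracy under a shift that makes the feature it ignored dominant. The setup below is entirely synthetic.

```python
# Minimal, purely synthetic simulation: a classifier that looks good on its
# "benchmark" distribution degrades sharply under a covariate shift.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def sample(n, scale=1.0):
    # Two features; the label depends on a nonlinear term in the second feature,
    # so a linear model is only locally adequate.
    X = rng.normal(size=(n, 2))
    X[:, 1] *= scale
    y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 1.0).astype(int)
    return X, y

X_train, y_train = sample(5_000)             # "benchmark" conditions
X_bench, y_bench = sample(2_000)             # held-out test, same distribution
X_real,  y_real  = sample(2_000, scale=3.0)  # "deployment" conditions, shifted

clf = LogisticRegression().fit(X_train, y_train)
print("benchmark accuracy :", round(clf.score(X_bench, y_bench), 3))
print("deployment accuracy:", round(clf.score(X_real, y_real), 3))
```

Nothing in the benchmark score announced the collapse, which is exactly the point. The gap only becomes visible once the evaluation includes the conditions the model will actually face.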
From an epistemological perspective, this is a key point. A score is not knowledge, and a metric is not a guarantee. Evaluation is an instrument, not definitive proof. What becomes crucial for governance, regulation, and insurance is the ability to say under what conditions the system works, for whom, with what kinds of errors, and with what consequences. This is exactly the diagnosis that motivated proposals such as Model Cards. The paper Model Cards for Model Reporting by Mitchell and coauthors starts from a blunt observation. There are no standardized procedures today for documenting models in a way that communicates performance characteristics, and this absence is particularly problematic when models operate in high impact domains such as health, employment, education, and policing. The same paper also recalls a field lesson. Systematic biases were often uncovered only after deployment, following feedback from affected users. In risk language, that looks like a familiar mechanism. Error is not only a mathematical property. It is a socially detected event, contested, and sometimes legally sanctioned.
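The kind of disaggregated reporting that Model Cards argue for can be sketched in a few lines. The groups, labels, and error rates below are synthetic placeholders; the only point is that a single overall accuracy can hide very different error profiles across subgroups.

```python
# Illustrative sketch of disaggregated evaluation in the spirit of Model Cards:
# report error rates per subgroup, not a single aggregate accuracy.
# Predictions, labels, and groups are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.choice(["A", "B", "C"], size=n, p=[0.6, 0.3, 0.1])
y_true = rng.integers(0, 2, size=n)
# A toy model that is deliberately less reliable on the minority group C.
noise = np.where(group == "C", 0.30, 0.10)
y_pred = np.where(rng.random(n) < noise, 1 - y_true, y_true)

print(f"overall accuracy: {np.mean(y_pred == y_true):.3f}")
for g in ["A", "B", "C"]:
    mask = group == g
    acc = np.mean(y_pred[mask] == y_true[mask])
    fpr = np.mean(y_pred[mask & (y_true == 0)] == 1)  # false positive rate in group g
    fnr = np.mean(y_pred[mask & (y_true == 1)] == 0)  # false negative rate in group g
    print(f"group {g}: accuracy={acc:.3f}, FPR={fpr:.3f}, FNR={fnr:.3f}")
```

The overall figure looks respectable while one group bears three times the error rate, which is precisely the kind of pattern that, in practice, tends to be discovered after deployment rather than before.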
When evaluation becomes a game: Goodhart, selection, and model risk
The First update introduces a more unsettling theme, but one consistent with evaluation problems in adversarial environments. Under controlled experimental conditions, some AI systems have shown strategic behaviors during evaluation, to the point of producing outputs that could mislead evaluators about their capabilities or their training objectives.
The report cautiously adds that the evidence is mostly from the lab, with uncertainty about what it means in real world deployment. But the warning is clear. If a system adapts its behavior to the protocol, then the protocol becomes a target. Even without imagining a manipulative agent, the phenomenon is already classic. As soon as an indicator becomes a target, it degrades. This brings us back to Goodhart’s law, but also to questions of econometrics and causality, including endogeneity, regime changes, and selection bias.
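A toy example makes the mechanism visible. Suppose the evaluation score rewards true quality plus an exploitable term, anything from verbosity to behavior tuned to the protocol. Optimizing the score then moves the system away from the true optimum. The functions below are arbitrary illustrations, not a model of any real benchmark.

```python
# Toy illustration of Goodhart's law (not from the reports): a proxy score that
# correlates with true quality stops tracking it once the proxy itself is optimized.
import numpy as np

def true_quality(x):
    # What we actually care about; best at x = 1, degrades beyond it.
    return -(x - 1.0) ** 2

def proxy_score(x):
    # An evaluation metric that rewards true quality but also an exploitable term
    # (verbosity, memorized benchmark items, behavior tuned to the protocol, ...).
    return true_quality(x) + 2.0 * x

candidates = np.linspace(0.0, 5.0, 501)
x_proxy = candidates[np.argmax(proxy_score(candidates))]   # what optimization selects
x_truth = candidates[np.argmax(true_quality(candidates))]  # what we would want

print(f"optimizing the proxy picks x = {x_proxy:.2f}, true quality = {true_quality(x_proxy):+.2f}")
print(f"the true optimum is x = {x_truth:.2f}, true quality = {true_quality(x_truth):+.2f}")
```

As long as the exploitable term is cheap to increase, pressure on the indicator is enough to decouple it from what it was supposed to measure, with no need to assume a manipulative agent.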
In a risk management frame, this argues for designing evaluation as a continuous process, red teaming, audits, adversarial testing, logging, and feedback loops. The Second update later notes that evidence is missing in part because the pace of development and deployment makes it difficult to evaluate systems under realistic conditions and to systematically collect data on the effectiveness of safeguards. The question then becomes not only whether the model is good, but whether the chain of control is credible. In industries where risk is mature, we do not stop at an average score. We want extreme scenarios, stress tests, incident analyses, and safety case type documentation. The Second update notes the emergence of more evidence based assurance methods, with safety cases, incident analyses, and performance logs, even if the whole approach remains experimental.
Safeguards, defence in depth, and the race between attack and defence
The core of the Second update is a simple idea. No single guardrail is sufficient, so layers must be stacked. The report describes the adoption of a defence in depth strategy, combining safeguards during training, at deployment, and after deployment through monitoring. It illustrates this with a Swiss cheese diagram. Each layer is imperfect, but layering reduces the probability that an event pathway passes through all holes at once. From a risk perspective, the interest is that this approach implicitly recognizes a reality. We are not in a static compliance logic, but in an adversarial logic. The report shows this plainly. Prompt injection attacks become somewhat less effective over time, but tests indicate that sophisticated attackers still manage to bypass safeguards about one time out of two when they have ten attempts. A control that breaks half the time is not a safety control. It is a usage control that must be monitored, hardened, and complemented.
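A short back-of-the-envelope calculation shows both why such a figure is worrying and why layering still helps. Only the one-in-two-over-ten-attempts figure comes from the report; the independence assumption and the number of layers are illustrative simplifications.

```python
# Back-of-the-envelope arithmetic on layered safeguards. Only the "about one
# bypass in two given ten attempts" figure comes from the Second Key Update;
# the independence assumption and the number of layers are illustrative.

# If a determined attacker succeeds about half the time over 10 independent
# attempts, the implied per-attempt bypass probability p solves 1 - (1 - p)**10 = 0.5.
p_attempt = 1 - 0.5 ** (1 / 10)
print(f"implied per-attempt bypass probability: {p_attempt:.3f}")  # roughly 0.067

# Swiss cheese logic: if an attempt must independently get through k layers that
# each fail with probability q, the end-to-end bypass probability shrinks quickly.
q = 0.067
for k in (1, 2, 3, 4):
    per_attempt = q ** k
    over_ten = 1 - (1 - per_attempt) ** 10
    print(f"{k} layer(s): per-attempt bypass ~ {per_attempt:.5f}, over 10 attempts ~ {over_ten:.4f}")
```

The independence assumption is generous, since real layers often share failure modes, but the calculation captures the report's intuition. No single layer is trustworthy, and the value of the architecture lies in how quickly the joint failure probability falls, provided the holes do not line up.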
The report also adds another element from the economics of security. It discusses asymmetric attacks via data poisoning. It claims that as few as 250 malicious documents injected into training data can allow attackers to trigger undesirable behaviors via specific prompts, and that some research suggests these attacks may require relatively few resources. Even staying at a macro level, the implication is clear. If the marginal cost of attack is low, defense must be designed as a system, data provenance, supply chain controls, audits, drift monitoring, not as a superficial filter. The report also discusses traceability tools, watermarking, detection, and identification. It acknowledges their usefulness when properly used and consistently applied, while noting uneven implementation. This is the kind of measure that can change the structure of risk if it becomes a standard, and remain marginal if it stays optional.
Open weights, transparency, innovation, and the diffusion of risk
One of the most important passages in the Second update explains that open weight models are now less than one year behind the best closed weight models. The report draws a nuanced conclusion. More openness supports transparency and innovation, but also makes it harder to control uses and modifications. Openness is neither good nor bad in itself. It is a parameter that reconfigures the ecosystem, more actors able to adapt the model, faster diffusion of a capability, and potential replication of vulnerabilities across many deployments.
This echoes an old cybersecurity theme. Technological monoculture can amplify the impact of a vulnerability. It also raises concrete questions about who is responsible for what in a chain where a model is modified, quantized, distilled, integrated into products, and redistributed. The Second update stresses that the landscape remains dynamic, with adversaries continuing to find ways to bypass defenses, and a permanent need to develop, test, and improve safeguards in a changing threat environment. This is close to the view defended by Ross Anderson in Security Engineering. Security is not a state. It is a social and economic process, with trade offs between costs, benefits, and incentives.
From narrative to data: incidents, standards, and the quantification of AI risk
At this stage, the question becomes how to move from discourse about possible risks to governance grounded in facts. The missing link is structured, interoperable incident data, rich enough to allow analyses of frequency, severity, causality in a broad sense, and the effectiveness of controls. That is precisely the ambition of the OECD report Towards a common reporting framework for AI incidents. It proposes operational definitions. An AI incident is a sequence of events in which the development, use, or malfunction of one or more AI systems directly or indirectly leads to harm, including harm to health, to critical infrastructure, to rights, and to the environment. An AI hazard is a situation that could plausibly lead to an incident. The goal is a common basis for reporting, both voluntary and mandatory, and interoperability across jurisdictions.
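To see what such a common basis could look like in practice, here is a hypothetical sketch of a structured incident record, loosely inspired by the OECD definitions quoted above. The field names and enumerations are illustrative assumptions of mine, not the OECD's actual reporting schema.

```python
# Hypothetical sketch of a structured AI incident record, loosely inspired by the
# OECD definitions quoted above (incident vs hazard, types of harm). The field
# names and enumerations are illustrative, not the OECD's actual schema.
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

HARM_TYPES = {"health", "critical_infrastructure", "rights", "environment", "property"}

@dataclass
class AIIncidentReport:
    event_id: str
    reported_on: date
    is_hazard_only: bool               # True if no harm materialized (near miss / "AI hazard")
    systems_involved: List[str]        # model or product identifiers, as disclosed
    harm_types: List[str]              # subset of HARM_TYPES
    severity: Optional[str] = None     # e.g. "low" / "medium" / "high", jurisdiction dependent
    safeguards_in_place: List[str] = field(default_factory=list)
    safeguards_bypassed: List[str] = field(default_factory=list)
    description: str = ""

# A record like this is what would let analysts estimate frequencies, severities,
# and safeguard effectiveness, instead of arguing from anecdotes.
example = AIIncidentReport(
    event_id="2025-0001",
    reported_on=date(2025, 11, 3),
    is_hazard_only=True,
    systems_involved=["frontier-model-x (hypothetical)"],
    harm_types=[],
    safeguards_in_place=["output filter", "human review"],
    safeguards_bypassed=["output filter"],
)
assert set(example.harm_types) <= HARM_TYPES
print(example)
```

The precise fields matter less than the fact that they are shared. Interoperability is what turns scattered reports into frequency and severity data that actuaries, regulators, and auditors can actually work with.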
This logic resonates with risk management as formalized by ISO/IEC 23894, which emphasizes integrating risk management into AI related activities, taking into account the system as a whole, including impacts on the environment and stakeholders, and recognizing that AI systems can introduce new or emerging risks. Without a systemic view and a learning loop (monitoring, reporting, improvement), AI risk management will remain cosmetic. One more point matters for readers of this blog. Harms related to AI systems are not only technical failures. They include differential errors, exclusions, direct or indirect discrimination, privacy violations, and infrastructure effects. Model Cards show how systematic errors may only appear after deployment, when affected people come forward, reminding us that the signal of risk is often social, political, and legal. That is why books such as Algorithms of Oppression by Safiya Noble, Design Justice by Sasha Costanza-Chock, and Atlas of AI by Kate Crawford are natural complements. They describe how systems are deployed within structures of power, attention, labor, and material resources. For an insurance oriented discussion, Insurance, Biases, Discrimination and Fairness develops a vocabulary for trade offs that are never purely technical.
Finally, The AI Con by Emily Bender and Alex Hanna plays the role of an intellectual guardrail. It reminds us that AI talk can become a rhetorical device, a shift of attention, a self sustaining promise. In a risk reading, this skepticism is not a luxury. It is a tool. It protects against the confusion between a one off demonstration and robustness, between a metric and a guarantee, between a governance narrative and effective governance.
Looking forward, and above all, the data
These two Key Updates have a rare quality. They do not sell certainty. The First makes the speed problem explicit and centers the gap between benchmark performance and real world effectiveness, as well as the difficulties of oversight when evaluation itself can be gamed. The Second assumes that mitigation effectiveness is uncertain, while describing a plausible architecture, defence in depth, and providing orders of magnitude that force us to think in adversarial and supply chain terms: a bypass about one time in two given ten attempts, and poisoning via a few hundred documents.
The logical next step is not only to stack frameworks. It is to accumulate observations, standardize incident reports, and build an empirical memory of risk, exactly the ambition of the OECD's Towards a common reporting framework. If the field delivers on its promise to become more evidence based, these Key Updates will have served as a bridge between two worlds, the world of capabilities that surprise and the world of institutions that learn. We look forward to what comes next.