Assuring Agent Safety Evaluations By Analysing Transcripts
lesswrong.com

Published on October 10, 2025 12:42 AM GMT

Summary

This is a research update from the Science of Evaluation team at the UK AI Security Institute (AISI). We share preliminary results from analysing transcripts of agent activity, which may be of interest to researchers working in the field.

AISI generates thousands of transcripts when running its automated safety evaluations (e.g. for OpenAI’s o1 model), and many of these transcripts contain the equivalent of dozens of pages of text. This post details a case study in which we systematically analysed the content of 6,390 testing transcripts. We hi…
