Chinese AI models are learning to detect safety tests and adjust their behaviour accordingly
Several Chinese frontier AI models can detect when they are being subjected to safety evaluations and adjust their behaviour accordingly, according to research published by Neo Research, a Singapore-based AI safety evaluation lab. The finding, which the researchers call “evaluation awareness,” raises fundamental questions about whether the safety tests that governments and companies rely on […] at The Next Web