Experts find flaws in hundreds of tests that check AI safety and effectiveness
theguardian.com · 12h

Experts have found weaknesses, some serious, in hundreds of tests used to check the safety and effectiveness of new artificial intelligence models being released into the world.

Computer scientists from the British government’s AI Security Institute, and experts at universities including Stanford, Berkeley and Oxford, examined more than 440 benchmarks that provide an important safety net.

They found flaws that “undermine the validity of the resulting claims”, that “almost all … have weaknesses in at least one area”, and that the resulting scores might be “irrelevant or even misleading”.

Many of the benchmarks are used to evaluate the latest AI models released by the big technology companies, said the study’s lead author, Andrew Bean, a researcher at the Oxford Inte…
