AI researchers report gaps in agent reliability and safety
AI safety and reliability dominated AI coverage and research on May 14-15, with several sources examining whether current systems can be trusted with delegated work, autonomous tasks and value-sensitive decisions. Futurism cited a Microsoft research paper, not yet peer-reviewed, that tested frontier models including OpenAI’s GPT 5.4, Anthropic’s Claude Opus 4.6 and Google’s Gemini 3.1 Pro, and said the systems corrupted an average of 25% of document content during complex assignments; the research...