Brain the Size of a Planet: Are LLMs Thonking too Hard? (30 minute read) (opens in new tab)

Covers Defense at AI speed: Microsoft’s new multi-model agentic security system tops leading industry benchmarkCovered by tldr.techDiscussed on Hacker News

Earlier studies showed that Opus 4.6, GPT 5.4, Gemini 3.1-pro-preview, Deepseek R1-0528, and Qwen 3.6-plus were unable to find two of the vulnerabilities discussed in the Mythos blog post without extremely revealing hints. This study continues the previous experiments with 26 distinct Claude-4.6/4.7 and GPT-5.4/5.5 combinations and different context window sizes and reasoning efforts. It found that higher reasoning effort, and even later models, are not always better for triaging security res...

Read the original article

Sign in to keep reading the full article.

Sign Up Log In

Covered in 2 articles

tldr.tech·

Agentic coding for all 🏟, converse more 🗣, building AI-native startups 🛠

tldr.tech·

Covered in 2 articles

Agentic coding for all 🏟, converse more 🗣, building AI-native startups 🛠

ChatGPT marketshare drops 📉, Vercel eve 🤖, Replit links Claude 🔌