GPT-5.5 dominates $1,500 LLM hacking test while Gemini refuses to even try

Covers 2 stories, including "DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence". Discussed on r/ChatGPT.

A security researcher spent $1,500 running 13+ AI models against a deliberately vulnerable app. GPT-5.5 led with a 70% solve rate, DeepSeek V4 Pro solved it for $0.62 per attempt, and Gemini refused to engage almost entirely.
