The morning of December 11, 2025, I moderated a cybersecurity panel at The AI Summit here in NYC. The day before, OpenAI published a bombshell report: their GPT-5 model scored 27% on Capture the Flag hacking challenges in August. By October, GPT-5.1-Codex-Max scored 76%. OpenAI classified upcoming models as posing “high” cybersecurity risk, noting their ability to work autonomously for extended periods without human intervention.
That data should have driven every conversation at the summit. Instead, when I cited documented breaches, one panelist called me a doomer and then told the audience “we’re winning.”
Here’s what happened between mid-September and late November 2025:
A Chinese state-sponsored group, designated GTG-1002, weaponized Anthropic’s Claude Code to autonomously execute cyber espionage against 30 global targets, including major tech companies, financial institutions, and government agencies. The AI performed 80-90% of the attack operations with minimal human supervision. Anthropic confirmed this as “the first documented case of a cyberattack largely executed without human intervention at scale.”
On November 9, hackers breached Mixpanel’s systems through an SMS phishing campaign, exposing OpenAI API user data including names, emails, and locations. OpenAI disclosed this November 27.
On November 21, Google confirmed over 200 companies had data stolen via a Gainsight supply chain attack affecting Salesforce instances.
On November 18, Google patched CVE-2025-13223, a Chrome zero-day actively exploited in the wild.
Deloitte submitted two government reports with AI-hallucinated citations. The July 2025 Australian report contained fake academic papers and fabricated court quotes. The May 2025 Canadian healthcare report cited non-existent research and invented co-author relationships. Combined cost to taxpayers: approximately $1.4 million.
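Hallucinated citations are cheap to catch before a report ships. Here’s a minimal sketch of the kind of pre-submission check that would have flagged them, assuming the citations carry DOIs and using the public Crossref API (the endpoint is real; the DOI list below is hypothetical):

```python
import requests

# Hypothetical DOIs extracted from a draft report's bibliography.
dois = [
    "10.1038/s41586-020-2649-2",       # real paper: should resolve
    "10.9999/fabricated.citation.42",  # made up: should 404
]

def doi_exists(doi: str) -> bool:
    """Return True if Crossref can resolve this DOI, False otherwise."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}",
        headers={"User-Agent": "citation-check/0.1 (mailto:ops@example.com)"},
        timeout=10,
    )
    return resp.status_code == 200

for doi in dois:
    status = "OK" if doi_exists(doi) else "NOT FOUND -- verify by hand"
    print(f"{doi}: {status}")
```

This only catches citations that don’t resolve at all. Fabricated quotes attributed to real sources, like the invented court quotes, still need a human reading the primary document.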
I asked the panel, “After everything we’ve discussed, the malware that rewrites itself, the trusted AI tools turned into exfiltration vectors, the 4TB databases exposed by human error, I’ll ask the only question that matters: Is anyone actually in command, or are we all just trying to survive systems we’ve already lost control of?”
Panelist 1 straight up called me a doomer. Promoted his podcast. Told everyone “we’re winning against adversaries.”
Panelist 2 listed Ivy League credentials. Pitched their startup services.
The two remaining panelists gave genuinely useful answers. They ship AI products daily. They shared real governance challenges and honest constraints. No sales pitch. No credential theater. Just lessons from the trenches. Substance.
The GTG-1002 campaign proves we’re not winning. Attackers convinced Claude it was conducting legitimate security testing. The AI autonomously discovered vulnerabilities, wrote exploit code, harvested credentials, moved laterally across networks, and exfiltrated data. Thousands of requests per second. Physically impossible for human operators to match.
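You can’t out-type an agent, but you can measure one. Here’s a toy sketch of the defensive counterpart, assuming you can parse per-identity request timestamps out of your auth or gateway logs (the log format, identities, and threshold are all hypothetical): flag any identity sustaining a request rate no human operator could produce.

```python
from collections import defaultdict

# Hypothetical parsed log: (unix_timestamp, source_identity) pairs.
# A synthetic 500-req/s burst stands in for an agent; "alice" is a human.
events = [(1700000000 + i * 0.002, "svc-token-17") for i in range(500)]
events += [(1700000003.0, "alice"), (1700000010.0, "alice")]

HUMAN_CEILING = 5.0  # requests/sec; a generous bound for hands on a keyboard

def flag_superhuman(events, window=1.0):
    """Return {identity: peak requests per window} for anyone above the ceiling."""
    by_source = defaultdict(list)
    for ts, src in events:
        by_source[src].append(ts)
    flagged = {}
    for src, times in by_source.items():
        times.sort()
        left, peak = 0, 0
        # Sliding window: count the max number of requests in any 1s span.
        for right, t in enumerate(times):
            while t - times[left] > window:
                left += 1
            peak = max(peak, right - left + 1)
        if peak / window > HUMAN_CEILING:
            flagged[src] = peak
    return flagged

for src, peak in flag_superhuman(events).items():
    print(f"{src}: {peak} requests in a 1s window -- not a human")
```

Crude, but the point stands: machine-speed offense leaves a machine-speed signature, and nothing stops defenders from alerting on it.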
Meanwhile, organizations still struggle with the basics: patch management, email phishing, fundamental cloud configuration, leaving S3 buckets open (you have to click through nine warnings asking if you really want to do this), and exposed API endpoints. Deloitte didn’t verify AI-generated citations before submitting million-dollar government reports as paid deliverables. Third-party vendors like Mixpanel and Gainsight expand attack surfaces faster than anyone can audit them.
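The S3 one, at least, is scriptable. A minimal sketch, assuming boto3 and credentials with permission to list buckets and read their public access block configuration, that flags buckets with the block missing or partially disabled:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def buckets_missing_public_access_block():
    """Yield bucket names with no, or an incomplete, public access block."""
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            cfg = s3.get_public_access_block(Bucket=name)[
                "PublicAccessBlockConfiguration"
            ]
            if not all(cfg.values()):  # any of the four flags disabled
                yield name
        except ClientError as err:
            if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
                yield name  # nothing configured at the bucket level at all
            else:
                raise

for name in buckets_missing_public_access_block():
    print(f"{name}: public access block missing or incomplete -- review")
```

A bucket flagged here isn’t necessarily public (an account-level block may still cover it), but it’s exactly the kind of five-minute audit that keeps showing up undone in breach postmortems.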
OpenAI’s December 10, 2025, report quantified what I said on stage: AI cybersecurity capabilities are accelerating rapidly. The 49-point improvement in eight weeks came from models working autonomously for extended periods; OpenAI’s Fouad Matin explicitly noted this persistence as the driving force behind capability growth. The panelist who responded to the question reverted to calling me names while promoting his podcast. That’s not expertise. That’s performance.
Over 80 vendor booths. I counted. More than 25% used “agentic AI” in their taglines. Add “enterprise scale,” “accelerate,” “trust,” and “intelligent agents,” and you’ve got buzzword bingo, not solutions.
Nobody explained how to audit AI vendors when Claude gets weaponized autonomously.
Nobody addressed the speed gap between AI offense improving 49 points in 8 weeks and AI defense decisions stuck in approval chains.
Demand vendors demonstrate how they would handle the GTG-1002 scenario. At a minimum, ask questions like:
1. How do they protect against AI systems probing autonomously for extended periods?
2. How do they verify AI outputs before submission?
3. How do they secure fundamentals while deploying advanced capabilities?
If they redirect to buzzwords, walk away.
If they cite credentials instead of answering (which Ivy League school they teach at, which group they founded, what their startup sells), walk away.
If they tell you “we’re winning” while citing no data, definitely walk away. No, run!
December 10, 2025: OpenAI confirms AI hacking capabilities pose “high” risk after a 49-point capability jump in eight weeks.
September 2025: Chinese hackers weaponize Claude Code for autonomous cyber espionage.
November 2025: OpenAI suffers vendor breach. Google confirms 200+ companies compromised. Chrome zero-day exploited in the wild.
2025: Deloitte submits approximately $1.4 million in government reports containing AI hallucinations.
That’s not doom. That’s documentation. Those are facts and nothing but the facts.
One panelist’s only response to the question was to revert to name-calling, labeling the moderator (me) a “doomer” for citing documented security reports. In doing so, he told a standing-room-only crowd that he prioritizes narrative over facts.
Stop buying theater. Demand substance.