An artificial intelligence agent has shown it can spot what even seasoned cybersecurity professionals sometimes miss. In a controlled experiment at Stanford University, an AI system called ARTEMIS uncovered security flaws across the university’s computer science networks and, in the process, outperformed most of the human hackers involved in the same test.
According to a new study reported by Business Insider, ARTEMIS was allowed to operate for 16 hours across Stanford’s public and private computer science networks, which include around 8,000 devices such as servers, computers, and smart systems. The researchers then compared its performance with that of 10 experienced human penetration testers, each asked to work for at least 10 hours on the same environment.
Within the first 10 hours alone, ARTEMIS identified nine valid security vulnerabilities, achieving an 82 percent valid submission rate. That result placed the AI ahead of nine out of the ten human participants, with only one expert performing better overall.
The project was led by researchers Justin Lin, Eliot Jones, and Donovan Jasper, who developed ARTEMIS after noticing that many existing AI tools struggled with long, complex security tasks. Unlike conventional systems, ARTEMIS was designed to work autonomously for extended periods, continuously scanning networks and adapting its focus as new leads emerged.
One reason for its strong performance is how it handles scale. When ARTEMIS detects something unusual during a scan, it can instantly launch multiple smaller “sub-agents” to investigate different vulnerabilities at the same time. Human testers, by contrast, have to examine potential flaws one by one. In one case, the AI found a weakness in an older server that human testers could not access because their web browsers refused to load it. ARTEMIS bypassed the issue using a command-line request and successfully broke in.
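ARTEMIS’s internals have not been published, but the pattern the researchers describe maps onto a familiar concurrency idiom. The sketch below is a minimal, hypothetical illustration of that pattern in Python, not ARTEMIS itself: it fans out one worker per target, the way the system spawns sub-agents, and probes each host with a curl command-line request of the kind that can reach an old server a strict browser refuses to load. The hostnames, the probe logic, and the assumption that curl is installed are all illustrative.

```python
# Illustrative sketch only; ARTEMIS's real sub-agent logic is not public.
import concurrent.futures
import subprocess

# Hypothetical targets standing in for hosts flagged during a network scan.
TARGETS = ["legacy-server.example.edu", "printer-01.example.edu"]

def probe(host: str) -> str:
    """Fetch a host's landing page via curl, tolerating legacy TLS and certs."""
    result = subprocess.run(
        # -s silences progress output, -k skips certificate validation, and
        # --max-time bounds the request: a command-line client like this can
        # succeed where a modern browser refuses to load an outdated server.
        ["curl", "-sk", "--max-time", "10", f"https://{host}/"],
        capture_output=True,
        text=True,
    )
    status = "reachable" if result.returncode == 0 else "unreachable"
    return f"{host}: {status}"

# Fan out one worker per target, analogous to launching sub-agents in
# parallel rather than examining potential flaws one by one.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    for line in pool.map(probe, TARGETS):
        print(line)
```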
Cost was another striking difference. Running ARTEMIS costs about $18 an hour, while even a more advanced version comes in at roughly $59 an hour. That is still far below what a professional penetration tester costs in the US: the study cites an average annual salary of around $125,000, which works out to roughly $60 an hour over a standard 2,080-hour work year, before benefits and overhead.
The researchers also tested other existing AI agents, but those systems lagged behind most human participants. ARTEMIS, however, performed at a level the team described as comparable to the strongest human testers.
The AI is not without limits. ARTEMIS struggled with tasks that required navigating graphical user interfaces, causing it to miss at least one critical vulnerability. It also produced more false positives than humans, sometimes mistaking harmless network signals for signs of a successful breach. The researchers noted that the system performs best in environments dominated by text-based inputs and outputs.
The findings come amid growing concern that AI is making hacking and cybercrime easier. As Business Insider previously reported, North Korean hacking groups have used generative AI tools to create fake military IDs for phishing campaigns, while other state-linked actors have relied on AI to gain insider access to corporate systems or launch cyberattacks abroad.
Reporting by Times Now News highlights that ARTEMIS’s success could push organisations to rethink how they approach cybersecurity testing, especially given the system’s ability to work continuously and at a much lower cost than human experts.
For now, the Stanford researchers stress that ARTEMIS is a testing tool, not a replacement for human judgment. Even so, the experiment makes one thing clear: in the race between attackers and defenders, AI is no longer just an assistant. In some cases, it is already competing at the top level.
Sources: Business Insider, Times Now News