Claude finds 353 zero-days on Packagist

A critical file upload vulnerability allowing remote code execution, complete with reproduction steps.

Open source ecosystems have a long-tail security problem. Python, Ruby, JavaScript, PHP: these ecosystems have millions of packages. The top 100 packages get scrutinized. The next 5,000, not so much.

We built an AI-powered pipeline to change that. As our first target, we chose popular ecommerce extensions on Packagist. The results: 353 confirmed vulnerabilities across the top 5,000 extensions, affecting packages with 5.9 million total downloads.

Terminal showing 10 parallel security auditor agents analyzing ecommerce extensions

Ten security auditor agents running in parallel, each analyzing a different ecommerce extension.

Vulnerability hunting at scale

We assembled a four-stage pipeline using Claude Opus 4.5 for the core analysis. We focused on Magento, the most popular commerce ecosystem with over 10,500 published extensions.

1. Aggregator: Queries Packagist for the top 5,000 magento2-module packages (based on downloads) and retrieves the latest version.

2. Security Auditor: A custom Claude agent that runs static analysis on each extension. The scope is narrow: only issues exploitable without admin access. This means RCE, SQL injection, PHP object injection, authentication bypass, arbitrary file operations, and XXE. We excluded XSS, CSRF, open redirects, and admin-only issues to focus on the most dangerous attack vectors.

Security Auditor prompt


Audit Magento 2 extensions for CRITICAL vulnerabilities exploitable without admin access.

## Scope

**Report only if:**

- Code path is reachable without authentication (or customer-level only)
- Works on default configurations
- Direct impact: RCE, auth bypass, mass data breach, payment manipulation
- Directly exploitable without preconditions

**Exclude:** Admin-only issues, XSS, CSRF, open redirects, info disclosure,
theoretical attacks, chained attacks, non-default configs, cart takeover.

## Vulnerability Types

- **RCE**: unserialize() with direct user input, command injection, eval, arbitrary file write
- **SQLi**: Raw queries with unsanitized input, filter/search/sort injection
- **Auth Bypass**: Broken authentication, session flaws, API auth issues
- **File Operations**: Path traversal, unrestricted upload, LFI/RFI
- **XXE**: XML external entities

## Output

Write a security-audit.json file to the extension root directory with this structure:

```json
[
  {
    "id": "CRITICAL-1",
    "title": "Description of vulnerability",
    "type": "RCE|SQLi|AuthBypass|FileOps|XXE",
    "cvss": 9.8,
    "file": "path/to/file.php",
    "line": 123,
    "description": "2-3 sentence explanation of the vulnerability...",
    "code": "minimal vulnerable code snippet"
  }
]
```

If no vulnerabilities found, write an empty array: `[]`.

3. Vulnerability Reproducer: A second Claude agent that validates each finding to filter out hallucinations. It spins up a Docker container with a fresh Magento install, adds the extension via Composer, traces the vulnerable code to an HTTP endpoint, crafts a minimal curl proof-of-concept, and attempts exploitation. Each issue gets marked as reproduced, false_positive, or inconclusive.

Vulnerability Reproducer prompt


Reproduce vulnerabilities from a `security-audit.json` report on a live Magento 2 instance.

## Required Inputs

The orchestrator must provide:
1. **PORT** - Unique port number for this instance (e.g., 8080, 8081, 8082...)
2. **Composer package name** - The package to install

## Setup

CONTAINER_NAME="magento-vuln-$PORT"
docker run -d --name "$CONTAINER_NAME" -p "$PORT:8080" -e MAGENTO_HOST=localhost -e MAGENTO_PORT="$PORT" magento-aio
until [ "$(docker inspect -f '{{.State.Health.Status}}' "$CONTAINER_NAME" 2>/dev/null)" = "healthy" ]; do sleep 2; done

## Install Extension

docker exec "$CONTAINER_NAME" composer -d /var/www/magento require --ignore-platform-reqs <vendor>/<module>
docker exec "$CONTAINER_NAME" php /var/www/magento/bin/magento setup:upgrade

Note:
- It is not necessary to run `setup:di:compile` or `setup:static-content:deploy` - the instance runs in developer mode.
- It is not necessary to flush the cache after installation, this is done by the `setup:upgrade` command.

## Reproduce Vulnerabilities

For each vulnerability in `security-audit.json`:

1. Read the vulnerable file, trace to HTTP endpoint (controller/webapi route)
2. Craft minimal curl PoC with benign payload
3. Execute and verify exploitation

Use `form_key` bypass for CSRF-protected endpoints: `curl -X POST -H "Cookie: form_key=test" -d "form_key=test&..." "http://localhost:$PORT/..."`

Database access: `docker exec "$CONTAINER_NAME" mysql -u magento -pmagento magento -e "..."`

## Update Report

After testing each vulnerability, update `security-audit.json` with a `reproduction` field:

- `"reproduced"` - Successfully exploited (include curl PoC)
- `"false_positive"` - Not exploitable (explain why not)
- `"inconclusive"` - Could not determine (explain blockers)

**Before marking reproduced**, check `etc/config.xml` for default values. Mark as `false_positive` if exploitation requires auth/security feature to be disabled or a secret/token/key to be empty.

Include `reproduction_notes` with the curl command used or explanation.

## Teardown

Remove the container: `docker rm -f "$CONTAINER_NAME"`

4. WAF Suggestor: For each confirmed vulnerability, an agent proposes active filtering rules for our eCommerce WAF. This allows us to protect customers immediately, even before vendors release patches.
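
The first stage of the list above, the aggregator, can be sketched roughly as follows. This is an illustration and not the pipeline's actual code: the helper names are hypothetical, and the two Packagist URLs and the `package.downloads.total` response layout are assumptions based on Packagist's public JSON API that should be checked against its documentation. Only the ranking function runs without network access.

```python
import json
import urllib.parse
import urllib.request

PACKAGIST = "https://packagist.org"


def list_url(package_type: str) -> str:
    """Build the URL that lists all packages of one Composer type (assumed endpoint)."""
    query = urllib.parse.urlencode({"type": package_type})
    return f"{PACKAGIST}/packages/list.json?{query}"


def fetch_json(url: str) -> dict:
    """Fetch and decode a JSON document. Requires network access."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def total_downloads(package_name: str) -> int:
    """Look up a package's lifetime download count (assumed response layout)."""
    data = fetch_json(f"{PACKAGIST}/packages/{package_name}.json")
    return int(data["package"]["downloads"]["total"])


def top_packages(downloads: dict[str, int], limit: int) -> list[str]:
    """Return the `limit` most-downloaded package names, ties broken by name."""
    ranked = sorted(downloads.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


if __name__ == "__main__":
    # Made-up download counts, used only to show the ranking step.
    sample = {"acme/checkout": 1200, "acme/search": 90, "beta/shipping": 1200}
    print(top_packages(sample, 2))
```

In a real run, the download table would be filled by calling `total_downloads` for every name returned by the list endpoint, then passing `limit=5000`.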

Results

The Security Auditor flagged 447 potential issues across the 5,000 extensions. After running each through the Vulnerability Reproducer:

Status	Count	Percentage
Reproduced	353	79%
False positive	65	15%
Inconclusive	27	6%
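
A breakdown like the one above can be computed from the `reproduction` field that the reproducer writes into each `security-audit.json`. The sketch below is a minimal illustration with made-up findings; the function name is hypothetical, and counting a finding without a `reproduction` field as inconclusive is our assumption, not something the prompts specify.

```python
from collections import Counter

STATUSES = ("reproduced", "false_positive", "inconclusive")


def tally(findings: list[dict]) -> dict[str, tuple[int, float]]:
    """Map each status to (count, percentage of all findings)."""
    # Assumption: a finding the reproducer never updated counts as inconclusive.
    counts = Counter(f.get("reproduction", "inconclusive") for f in findings)
    total = len(findings)
    result = {}
    for status in STATUSES:
        share = round(100 * counts[status] / total, 1) if total else 0.0
        result[status] = (counts[status], share)
    return result


if __name__ == "__main__":
    # Made-up findings, not data from the audit.
    sample = [
        {"id": "CRITICAL-1", "reproduction": "reproduced"},
        {"id": "CRITICAL-2", "reproduction": "reproduced"},
        {"id": "CRITICAL-3", "reproduction": "false_positive"},
        {"id": "CRITICAL-4"},
    ]
    for status, (count, share) in tally(sample).items():
        print(f"{status}\t{count}\t{share}%")
```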

The vulnerability types:

- IDOR / Authentication Bypass (265): Order (payment) manipulation, payment data exposure, PII leaks, account takeover
- SQL Injection (50): Direct database access, data theft, privilege escalation
- Arbitrary File Read/Write (23): Encryption key theft, webshell deployment
- Remote Code Execution (15): Attackers can execute arbitrary code on the server

Terminal output showing successfully reproduced vulnerabilities with curl PoC commands for auth bypass and SQL injection

Reproduction results showing successful exploitation with curl commands.

Popular packages tend to have fewer issues, but the correlation is weaker than we expected (p=0.08):

Category	Mean downloads
Extensions WITH vulnerabilities	25,904
Extensions WITHOUT vulnerabilities	37,426
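
The post does not say which statistical test produced p=0.08. One simple way to compare mean downloads between two groups is a permutation test, sketched below. This is an assumed technique with made-up sample numbers, not the analysis that was actually run.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, rounds=2000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    split = len(group_a)
    extreme = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:split]) - mean(pooled[split:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (rounds + 1)


if __name__ == "__main__":
    # Made-up download counts for two small groups of extensions.
    with_vulns = [1200, 8000, 15000, 30000, 52000]
    without_vulns = [2500, 12000, 28000, 41000, 90000]
    print(permutation_p_value(with_vulns, without_vulns))
```

Download counts are typically heavily skewed, so a test that makes no normality assumption is a reasonable default here.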

Limitations

We limited the scope of our audit to keep costs down. For example, our pipeline did not consider multi-pass or chained attack methods. A larger budget would likely find more issues.

Our aggregator only considered the latest version for each package. Since auto-update isn’t very common on ecommerce platforms, many stores will run older versions that may contain different vulnerabilities.

So far we have manually verified 30% of the results and found no false positives among them. There were, however, some classification errors (e.g., RCE vs. arbitrary file write).

Vulnerability reproduction summary showing 10 modules tested with reproduced and false positive results

The reproducer agent validates findings by attempting actual exploitation, filtering out false positives.

The economics of AI security research

Running this pipeline cost about $10,000 in API calls. That’s $2 per extension for a security audit with proof-of-concept validation.
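
The same figures also give a cost per confirmed finding. The short calculation below uses only numbers stated in this post. Since the total is "about $10,000", the results are approximate.

```python
TOTAL_COST_USD = 10_000          # approximate API spend, as stated above
EXTENSIONS_AUDITED = 5_000
CONFIRMED_VULNERABILITIES = 353

cost_per_extension = TOTAL_COST_USD / EXTENSIONS_AUDITED
cost_per_confirmed = TOTAL_COST_USD / CONFIRMED_VULNERABILITIES

print(f"per extension: ${cost_per_extension:.2f}")                 # $2.00
print(f"per confirmed vulnerability: ${cost_per_confirmed:.2f}")   # $28.33
```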

Clearly, security research is becoming a function of compute budget, not headcount. As Heelan argues, the limiting factor on exploit development is shifting from skilled researchers to token throughput. $10k bought us 353 confirmed vulnerabilities that would have taken a human team months.

Vendors can use these techniques to secure their software. But attackers can run the same pipeline. Heelan’s work shows LLMs can generate working exploits for complex vulnerabilities at roughly $30 each. The full pipeline from Packagist to working exploit is now economically viable for attackers.

Responsible disclosure

We don’t want to flood vendors with AI-generated bug reports, so we’re manually verifying results and reaching out.

The response has been mixed: some vendors patch within days, others don’t respond at all. We will issue public alerts when vendors release fixed versions.

Meanwhile, we’re adding attack filters to our ecommerce WAF to protect customers from vendors who don’t provide timely fixes.

Beyond Packagist: every ecosystem up for grabs

Our pipeline was optimized for Magento based on our years of experience in this field. But the same architecture works for any package ecosystem: PyPI, npm, RubyGems, Go modules, Cargo crates and so on.

The tooling is generic. Swap out the aggregator and apply a basic amount of domain knowledge to the verification pass.
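
As a rough illustration of what "swap out the aggregator" could look like, the sketch below puts the ecosystem-specific part behind a small interface. All names here (`Aggregator`, `InMemoryAggregator`, `run_pipeline`) are hypothetical and not taken from our codebase; the audit step is a stand-in callable.

```python
from typing import Callable, Protocol


class Aggregator(Protocol):
    """Anything that can name the most-downloaded packages of an ecosystem."""

    def top_packages(self, limit: int) -> list[str]: ...


class InMemoryAggregator:
    """Stand-in aggregator backed by a fixed download table."""

    def __init__(self, downloads: dict[str, int]) -> None:
        self._downloads = downloads

    def top_packages(self, limit: int) -> list[str]:
        ranked = sorted(self._downloads, key=self._downloads.get, reverse=True)
        return ranked[:limit]


def run_pipeline(
    aggregator: Aggregator,
    audit: Callable[[str], list[dict]],
    limit: int,
) -> dict[str, list[dict]]:
    """Audit the top `limit` packages and collect findings per package."""
    return {name: audit(name) for name in aggregator.top_packages(limit)}


if __name__ == "__main__":
    # Made-up npm-style names and counts; the audit callable finds nothing.
    registry = InMemoryAggregator({"left-pad": 10, "requests": 99, "tiny": 1})
    print(run_pipeline(registry, lambda name: [], limit=2))
```

A PyPI or npm aggregator would implement the same `top_packages` method against that registry's own download statistics, leaving the rest of the pipeline unchanged.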

What this means for merchants

Many of the vulnerabilities found in these extensions would allow an attacker to:

- Order without paying
- Steal your customer and payment data
- Install ransomware on your store

If you run ecommerce on an open source platform:

1. Audit your extensions: Review what’s installed and whether you actually need each one. The fewer extensions, the smaller your attack surface.

2. Block attacks: Use an eCommerce WAF as a first line of defense.

3. Stay up to date: Verify your vendor’s policy on disclosing security issues and subscribe to their changelog.