
A critical file upload vulnerability allowing remote code execution, complete with reproduction steps.
Open source ecosystems have a long-tail security problem. Python, Ruby, JavaScript, PHP: these ecosystems have millions of packages. The top 100 packages get scrutinized. The next 5,000, not so much.
We built an AI-powered pipeline to change that. As our first target, we chose popular ecommerce extensions on Packagist. The results: 353 confirmed vulnerabilities across the top 5,000 extensions, affecting packages with 5.9 million total downloads.


Ten security auditor agents running in parallel, each analyzing a different ecommerce extension.
Vulnerability hunting at scale
We assembled a four-stage pipeline using Claude Opus 4.5 for the core analysis. We focused on Magento, the most popular commerce ecosystem with over 10,500 published extensions.
1. Aggregator: Queries Packagist for the top 5,000 magento2-module packages (based on downloads) and retrieves the latest version.
2. Security Auditor: A custom Claude agent that runs static analysis on each extension. The scope is narrow: only issues exploitable without admin access. This means RCE, SQL injection, PHP object injection, authentication bypass, arbitrary file operations, and XXE. We excluded XSS, CSRF, open redirects, and admin-only issues to focus on the most dangerous attack vectors.
Security Auditor prompt
Audit Magento 2 extensions for CRITICAL vulnerabilities exploitable without admin access.
## Scope
**Report only if:**
- Code path is reachable without authentication (or customer-level only)
- Works on default configurations
- Direct impact: RCE, auth bypass, mass data breach, payment manipulation
- Directly exploitable without preconditions
**Exclude:** Admin-only issues, XSS, CSRF, open redirects, info disclosure,
theoretical attacks, chained attacks, non-default configs, cart takeover.
## Vulnerability Types
- **RCE**: unserialize() with direct user input, command injection, eval, arbitrary file write
- **SQLi**: Raw queries with unsanitized input, filter/search/sort injection
- **Auth Bypass**: Broken authentication, session flaws, API auth issues
- **File Operations**: Path traversal, unrestricted upload, LFI/RFI
- **XXE**: XML external entities
## Output
Write a security-audit.json file to the extension root directory with this structure:
```json
[
  {
    "id": "CRITICAL-1",
    "title": "Description of vulnerability",
    "type": "RCE|SQLi|AuthBypass|FileOps|XXE",
    "cvss": 9.8,
    "file": "path/to/file.php",
    "line": 123,
    "description": "2-3 sentence explanation of the vulnerability...",
    "code": "minimal vulnerable code snippet"
  }
]
```
If no vulnerabilities found, write an empty array: `[]`.
3. Vulnerability Reproducer: A second Claude agent that validates each finding to filter out hallucinations. It spins up a Docker container with a fresh Magento install, adds the extension via Composer, traces the vulnerable code to an HTTP endpoint, crafts a minimal curl proof-of-concept, and attempts exploitation. Each issue gets marked as reproduced, false_positive, or inconclusive.
Vulnerability Reproducer prompt
Reproduce vulnerabilities from a `security-audit.json` report on a live Magento 2 instance.
## Required Inputs
The orchestrator must provide:
1. **PORT** - Unique port number for this instance (e.g., 8080, 8081, 8082...)
2. **Composer package name** - The package to install
## Setup
CONTAINER_NAME="magento-vuln-$PORT"
docker run -d --name "$CONTAINER_NAME" -p "$PORT:8080" -e MAGENTO_HOST=localhost -e MAGENTO_PORT="$PORT" magento-aio
until [ "$(docker inspect -f '{{.State.Health.Status}}' "$CONTAINER_NAME" 2>/dev/null)" = "healthy" ]; do sleep 2; done
## Install Extension
docker exec "$CONTAINER_NAME" composer -d /var/www/magento require --ignore-platform-reqs <vendor>/<module>
docker exec "$CONTAINER_NAME" php /var/www/magento/bin/magento setup:upgrade
Note:
- It is not necessary to run `setup:di:compile` or `setup:static-content:deploy` - the instance runs in developer mode.
- It is not necessary to flush the cache after installation, this is done by the `setup:upgrade` command.
## Reproduce Vulnerabilities
For each vulnerability in `security-audit.json`:
1. Read the vulnerable file, trace to HTTP endpoint (controller/webapi route)
2. Craft minimal curl PoC with benign payload
3. Execute and verify exploitation
Use `form_key` bypass for CSRF-protected endpoints: `curl -X POST -H "Cookie: form_key=test" -d "form_key=test&..." "http://localhost:$PORT/..."`
Database access: `docker exec "$CONTAINER_NAME" mysql -u magento -pmagento magento -e "..."`
## Update Report
After testing each vulnerability, update `security-audit.json` with a `reproduction` field:
- `"reproduced"` - Successfully exploited (include curl PoC)
- `"false_positive"` - Not exploitable (explain why not)
- `"inconclusive"` - Could not determine (explain blockers)
**Before marking reproduced**, check `etc/config.xml` for default values. Mark as `false_positive` if exploitation requires auth/security feature to be disabled or a secret/token/key to be empty.
Include `reproduction_notes` with the curl command used or explanation.
## Teardown
Remove the container: `docker rm -f "$CONTAINER_NAME"`
4. WAF Suggestor: For each confirmed vulnerability, an agent proposes active filtering rules for our ecommerce WAF. This allows us to protect customers immediately, even before vendors release patches.
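The stages hand findings to each other through `security-audit.json`, so a malformed report from one agent can silently derail the next. The sketch below shows one way such a file could be checked between stages. It is not part of the pipeline described above: the function names are hypothetical, the `type` field is assumed to hold exactly one of the five values listed in the auditor prompt, and the 0 to 10 range for `cvss` comes from the CVSS scale rather than from the prompts.

```python
import json

# Values taken from the two prompts; treating "type" as exactly one of
# these is an assumption about how the "RCE|SQLi|..." hint is meant.
ALLOWED_TYPES = ("RCE", "SQLi", "AuthBypass", "FileOps", "XXE")
ALLOWED_STATUSES = ("reproduced", "false_positive", "inconclusive")
REQUIRED_FIELDS = {
    "id": str, "title": str, "type": str, "cvss": (int, float),
    "file": str, "line": int, "description": str, "code": str,
}


def validate_finding(finding):
    """Return a list of problems for one finding (empty list = well-formed)."""
    if not isinstance(finding, dict):
        return ["finding is not a JSON object"]
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        value = finding.get(field)
        if field not in finding:
            problems.append(f"missing field: {field}")
        elif isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for field: {field}")
    if isinstance(finding.get("type"), str) and finding["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {finding['type']}")
    cvss = finding.get("cvss")
    if isinstance(cvss, (int, float)) and not isinstance(cvss, bool):
        if not 0.0 <= cvss <= 10.0:
            problems.append(f"cvss out of range: {cvss}")
    # The reproducer adds these two fields; they are optional before stage 3.
    if "reproduction" in finding:
        if finding["reproduction"] not in ALLOWED_STATUSES:
            problems.append(f"unknown reproduction status: {finding['reproduction']}")
        if not finding.get("reproduction_notes"):
            problems.append("reproduction set without reproduction_notes")
    return problems


def validate_report(text):
    """Parse a report and map each finding (by id or position) to its problems."""
    findings = json.loads(text)
    if not isinstance(findings, list):
        raise ValueError("report must be a JSON array")
    report = {}
    for index, finding in enumerate(findings):
        key = finding.get("id", f"#{index}") if isinstance(finding, dict) else f"#{index}"
        report[key] = validate_finding(finding)
    return report
```

An empty report (`[]`) validates to an empty mapping, which matches the "no vulnerabilities found" case in the auditor prompt.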
Results
The Security Auditor flagged 447 potential issues across the 5,000 extensions. After running each through the Vulnerability Reproducer:
| Status | Count | Percentage |
|---|---|---|
| Reproduced | 353 | 79% |
| False positive | 65 | 15% |
| Inconclusive | 27 | 6% |
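As a rough illustration of how a table like this could be tallied from the `reproduction` fields, here is a minimal sketch. The `summarize` helper is hypothetical, and the percentages are computed against the sum of the three table rows.

```python
from collections import Counter


def summarize(statuses):
    """Map each status to (count, rounded percentage of all statuses)."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return {
        status: (count, round(100 * count / total))
        for status, count in counts.most_common()
    }


# Rebuild the status list from the counts reported in the table.
statuses = (
    ["reproduced"] * 353
    + ["false_positive"] * 65
    + ["inconclusive"] * 27
)
summary = summarize(statuses)
print(summary)
# {'reproduced': (353, 79), 'false_positive': (65, 15), 'inconclusive': (27, 6)}
```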
The vulnerability types:
- IDOR / Authentication Bypass (265): Order (payment) manipulation, payment data exposure, PII leaks, account takeover
- SQL Injection (50): Direct database access, data theft, privilege escalation
- Arbitrary File Read/Write (23): Encryption key theft, webshell deployment
- Remote Code Execution (15): Attackers can execute arbitrary code on the server

Reproduction results showing successful exploitation with curl commands.
Popular packages tend to have fewer issues, but the correlation is weaker than we expected (p = 0.08):
| Category | Mean downloads |
|---|---|
| Extensions WITH vulnerabilities | 25,904 |
| Extensions WITHOUT vulnerabilities | 37,426 |
Limitations
We limited the scope of our audit to save on costs. For example, our pipeline did not consider multi-pass or chained attack methods. A larger budget would surely find more issues.
Our aggregator only considered the latest version for each package. Since auto-update isn’t very common on ecommerce platforms, many stores will run older versions that may contain different vulnerabilities.
So far we have manually verified 30% of the results and found no false positives among them. There were, however, some classification errors (e.g., RCE vs. arbitrary file write).

The reproducer agent validates findings by attempting actual exploitation, filtering out false positives.
The economics of AI security research
Running this pipeline cost about $10,000 in API calls. That’s $2 per extension for a security audit with proof-of-concept validation.
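The unit costs follow from simple division. The short sketch below reproduces the per-extension figure and also derives a per-confirmed-vulnerability figure, which is our arithmetic on the numbers above and inherits the "about $10,000" approximation.

```python
API_COST_USD = 10_000   # approximate total, as stated above
EXTENSIONS = 5_000      # extensions audited
CONFIRMED = 353         # vulnerabilities marked as reproduced


def unit_cost(total_usd, units):
    """Cost per unit in USD, rounded to cents."""
    return round(total_usd / units, 2)


cost_per_extension = unit_cost(API_COST_USD, EXTENSIONS)
cost_per_confirmed = unit_cost(API_COST_USD, CONFIRMED)
print(cost_per_extension)  # 2.0
print(cost_per_confirmed)  # 28.33
```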
Clearly, security research is becoming a function of compute budget, not headcount. As Heelan argues, the limiting factor on exploit development is shifting from skilled researchers to token throughput. $10k bought us 353 confirmed vulnerabilities that would have taken a human team months.
Vendors can use these techniques to secure their software. But attackers can run the same pipeline. Heelan’s work shows LLMs can generate working exploits for complex vulnerabilities at roughly $30 each. The full pipeline from Packagist to working exploit is now economically viable for attackers.
Responsible disclosure
We don’t want to flood vendors with AI-generated bug reports, so we’re manually verifying results and reaching out.
The response has been mixed: some vendors patch within days, others don’t respond at all. We will issue public alerts when vendors release fixed versions.
Meanwhile, we’re adding attack filters to our ecommerce WAF to protect customers from vendors who don’t provide timely fixes.
Beyond Packagist: every ecosystem up for grabs
Our pipeline was optimized for Magento based on our years of experience in this field. But the same architecture works for any package ecosystem: PyPI, npm, RubyGems, Go modules, Cargo crates and so on.
The tooling is generic. Swap out the aggregator and apply a basic amount of domain knowledge to the verification pass.
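One way to picture "swap out the aggregator" is a pipeline that takes each stage as a plain function. This is a sketch of the idea only, not our implementation: `run_pipeline` and the `fake_*` stand-ins are hypothetical names, and a real aggregator would query a package registry while the audit and reproduce stages would call an LLM agent.

```python
from typing import Callable, Dict, Iterable, List


def run_pipeline(
    aggregate: Callable[[int], Iterable[str]],
    audit: Callable[[str], List[dict]],
    reproduce: Callable[[str, dict], str],
    limit: int,
) -> Dict[str, List[str]]:
    """Audit each package the aggregator yields and try to reproduce every finding."""
    results: Dict[str, List[str]] = {}
    for package in aggregate(limit):
        results[package] = [reproduce(package, finding) for finding in audit(package)]
    return results


# Hypothetical stand-ins so the sketch runs without network access or an LLM.
def fake_packagist(limit: int) -> List[str]:
    return ["vendor-a/module-one", "vendor-b/module-two"][:limit]


def fake_audit(package: str) -> List[dict]:
    return [{"id": "CRITICAL-1"}] if package.endswith("one") else []


def fake_reproduce(package: str, finding: dict) -> str:
    return "inconclusive"


demo = run_pipeline(fake_packagist, fake_audit, fake_reproduce, limit=2)
print(demo)
# {'vendor-a/module-one': ['inconclusive'], 'vendor-b/module-two': []}
```

Targeting another ecosystem would then mean passing a different `aggregate` function and adjusting the domain knowledge inside `reproduce`, while the loop itself stays the same.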
What this means for merchants
Many of the vulnerabilities found in these extensions would allow an attacker to:
- Order without paying
- Steal your customer and payment data
- Install ransomware on your store
If you run ecommerce on an open source platform:
1. Audit your extensions: Review what’s installed and whether you actually need each one. The fewer extensions, the smaller your attack surface.
2. Block attacks: Use an ecommerce WAF as a first line of defense.
3. Stay up to date: Verify your vendor’s policy on disclosing security issues and subscribe to their changelog.