13 min read · 5 hours ago
Pt 2: AI for Threat Modeling and Remediation
Ion Cannon Concept Art by Ralph McQuarrie — Courtesy Lucasfilm LLC
Introduction
It should surprise no one that Google says 2026 will be the year AI supercharges cybercrime. As the new arms race between cybercriminals and defenders goes into overdrive, understanding basic security posture is more vital than ever, and doing it all by hand won’t cut it any more.
In our last segment, Security By Design Pt 1: Introduction to Threat Modeling, we introduced the crucial concepts of analyzing a system by looking at Trust Boundaries and Attack Surfaces. We also covered the STRIDE model (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service [DoS], and Elevation of Privilege) for analyzing threats based on these concepts.
On this channel, we like to automate away toil as much as possible, because manual work is both time-consuming and error-prone. While automated Threat Modeling tools have existed for a long time, they mostly rely on brute force, resulting in a massive slog of warnings and false positives, and before AI they were incapable of handling remediation.
As you’ll see, modern AI tools are far from perfect, but they significantly outperform manual methods for generating threat models and suggesting fixes.
The AI landscape evolves daily, and while many powerful tools exist, for this article we’ll focus on integrating security directly into your existing Claude Code workflow (or how to get one if you don’t have one!).
In Action: Building Our Target System
Let’s look at a concrete example: a simple ordering system where a React frontend places orders into an SQS queue, which in turn triggers a microservice that processes the order and writes it to the database. Of course, a real system is significantly more complex, including payment, fulfillment, transactions, retries, and notifications. But for the purpose of this exercise, we’ll keep it simple.
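To make the moving parts concrete, here's a minimal sketch of that flow with an in-memory stand-in for SQS and the database, so the shape of the pipeline is clear without spinning anything up. All names here (`makeOrder`, `enqueueOrder`, `processNext`) are illustrative, not taken from the actual repo.

```javascript
// Illustrative sketch of the order pipeline: frontend/gateway builds
// and enqueues an order; a background processor dequeues and persists
// it. Queue and DB are in-memory stand-ins so this runs without AWS.

// The frontend/API gateway side: build and enqueue an order.
function makeOrder(customerId, items) {
  return { customerId, items, createdAt: new Date().toISOString() };
}

async function enqueueOrder(queue, order) {
  await queue.push(JSON.stringify(order)); // roughly: SQS SendMessage
}

// The background processor side: pull one message and persist it.
async function processNext(queue, db) {
  const body = await queue.pop(); // roughly: SQS ReceiveMessage
  if (body === undefined) return false; // queue empty
  db.orders.push(JSON.parse(body)); // stands in for an SQL INSERT
  return true;
}
```

The real system decouples these halves with SQS so the frontend stays responsive even if the processor falls behind.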
Traditionally, spinning up and managing separate components — a web server for the frontend, an API gateway, an AWS account for the SQS queue, an order processing service, and an RDBMS for the database — creates a lot of overhead and results in long develop/deploy/test/iterate cycles.
Fortunately, on a reasonably powerful machine, we can spin up everything locally by leveraging:
- Terraform to manage the Infrastructure as Code (IaC)
- LocalStack to run (most of) AWS locally
- Docker Compose to run it all, including the database (we pick MariaDB)
This is a great, fast, and easy way to develop: you can even use Docker to check the database without having to log in separately! (Normally we’d want to make our local setup as close to Production as possible, and the paid version of LocalStack easily pays for itself in increased productivity. To keep this article more accessible, we’ll stick to the free tier, even if it means accepting a few compromises.)
Even more fortunately, we can use Claude to write the entire stack. We’ll cover the detailed process and some interesting surprises in a separate article, but here’s the basic summary.
My favorite way to work with Claude or any AI Assistant is by writing single-line Jira tickets as TODOs in a CLAUDE.md file:
- Build a multi-tier app allowing users to place orders
- This will be run on LocalStack locally to simulate AWS.
- Build the appropriate scaffolding in Terraform to setup the LocalStack version
- Components:
  - Frontend/Gateway: React, Places orders into Processing queue
  - Processing Queue: SQS
  - Backend Service: Background microservice that reads the queue and inserts records into the data store
  - Data Store: Persistent MariaDB Database.
- Generate mermaid markup for the component diagram
- Create a jpeg of the component diagram from the mermaid source
- Create Dockerfiles as appropriate
In real life, you want these tickets to be much more detailed, including the order fields and all other necessary specifications.
For now, let’s fire up Claude and go. My preferred workflow involves using the assistant inside IntelliJ or another integrated IDE, so that it shows each step as a PR which you can accept, reject, or suggest modifications to. We create a blank project with only the above file, and let Claude do the rest.
> Implement ToDos
What we end up with is simple and runnable. Claude even provides sensible defaults for the ordering fields. What it doesn’t include is any kind of user management, meaning anyone who finds the site can create orders. Just like real-world Jira tickets, you get what you ask for and you don’t get what you don’t ask for, though you’d hope human developers would raise their hand during planning to say, “Um, don’t we need that?”
Of course, not everything works perfectly out of the box, but after a few hours of back-and-forth, we end up with a functional system running locally, and it even writes an extensive README.md with detailed instructions on running and troubleshooting the system.
If you want to play along at home, you can check out the repo at https://gitlab.com/tkersh/echobase.git .
You’ll want to start with v0.0, the iterated Claude output, including the Threat Model (don’t look at it yet!) without any security fixes. To actually run the system, you need to be on Mac, Linux or Windows+WSL, and have Terraform, LocalStack, and Docker/Docker Compose properly installed. You don’t need to run it unless you’re curious.
Take a moment to study the component diagram and note which trust boundaries and attack surfaces you can spot:
Threat Model
Let’s have Claude generate our Threat Model and see what we get:
> document the trust boundaries and attack surfaces of this system into TrustBoundaries.md
The resulting document is vastly more comprehensive than I expected, which is a great thing and makes me wish that Anthropic included this as an automatic step (or at least suggested it). Later, we’ll want to add a skill to do so, as well as skills for addressing some of the more common vulnerabilities out of the box.
Here’s the output:
Key Findings:

5 Trust Boundaries Identified:
1. External Network ↔ Docker Network (🔴 CRITICAL) - No auth, permissive CORS, no HTTPS
2. Frontend ↔ API Gateway (🟡 MEDIUM) - No encryption, minimal input validation
3. API Gateway ↔ SQS Queue (🟠 HIGH) - Hardcoded credentials, no message encryption
4. Order Processor ↔ SQS Queue (🟠 HIGH) - Same credential issues, no DLQ monitoring
5. Order Processor ↔ MariaDB (🔴 CRITICAL) - Hardcoded DB credentials, no encryption at rest

Attack Surfaces Documented:
- Frontend: XSS, clickjacking, DoS vulnerabilities
- API Gateway: Unauthenticated access, CORS misconfiguration, no rate limiting
- SQS Queue: Credential theft, message tampering, queue flooding
- Order Processor: Poison message attacks, resource exhaustion
- MariaDB: Credential-based attacks, data exfiltration, no audit trail

Security Gaps (15 identified):
- Critical: No authentication, no HTTPS, hardcoded credentials, permissive CORS, no rate limiting
- High Priority: No input sanitization, no request limits, no encryption
- Medium Priority: No DLQ monitoring, weak credentials, no API versioning

The document includes:
- Architecture diagrams showing trust boundaries
- Specific attack vectors and risk levels
- Detailed remediation code examples
- Threat scenarios (DoS, credential theft, CSRF, MITM)
- Compliance considerations (PCI DSS, GDPR, SOC 2, HIPAA)

Security Status: ⚠️ NOT PRODUCTION READY - Multiple critical vulnerabilities require remediation before deployment.
The resulting document is quite thorough, and I recommend giving it a read to familiarize yourself: https://gitlab.com/tkersh/echobase/-/blob/6800142a6537227b439cc9fe0424d2c2d65856dd/TrustBoundaries.md
But… ouch!
It’s nice to see that Claude has such a robust list of suggestions, but it underscores how easy it is to naively create what looks like a fully functioning application but leaks like a sieve. Every week we get news of another serious breach of a vibe-coded (or naively coded) system, and it would be nice if Claude started with more sensible security defaults. The worst offender is the hardcoded credentials, which is a great candidate for a skill so that Claude knows not to do it again.
First Stop: Credentials
Credentials are always tricky to set up in dev, but the time to handle them is early, before we get used to doing them the wrong way and forget to fix them later. This is exactly how security incidents happen. Fortunately, we have tools to take away the drudge work.
Let’s see what Claude suggests:
> Fix credential issues
Claude Code now:
- creates a script that uses openssl to generate strong random passwords
- puts them into the .env file
- makes changes to docker-compose and the env.example files for api-gateway and order-processor to eliminate the hard-coded values
- adds appropriate comments to those files and the README for running locally and in Prod
Developing with Claude is just like reviewing any other PR: double-check and don’t assume it’s correct. Yes, it makes a few mistakes, but it is still orders of magnitude faster than developing it all manually.
Did the fixes actually work?
Let’s give it a shot:
./generate-credentials.sh
Credentials generated successfully!
✓ Created .env file with secure credentials
✓ Set file permissions to 600 (owner read/write only)

./setup.sh
Error connecting to database: Error: Access denied for user 'orderuser'@'192.168.65.1' (using password: YES)
It doesn’t work because we still have a persistent Docker volume containing the old, hardcoded credentials the database was initialized with.
Claude wants to fix this incrementally, but developing with patch after patch (especially with code we didn’t write ourselves) is what leads to environmental drift.
Why Environmental Drift is Dangerous
Environmental drift means your development setup slowly diverges from the production environment due to manual fixes and undocumented state changes (like the old credentials stuck in our volume). This means the deployment scripts and environment variables you rely on stop reflecting reality, and you only discover a fatal configuration error when you finally deploy to production.
Much better to tear down everything and rebuild using the same scripts we’ll use to install to other environments, so we discover any issues with the scripts right away. Using LocalStack, the process only takes a few minutes:
terraform destroy -auto-approve
docker-compose down -v
./setup.sh
./start.sh
And it works!
Normally, we’d want to add IAM (Identity and Access Management) policies next, but unfortunately, robust IAM enforcement is one of the features missing from the free Community version of LocalStack. Since I’m not going to make you buy a license just for this article, let’s move on to other security gaps.
We’ll tag this as version v0.1.
Yes, and Don’t Ask Again?
Claude keeps prompting us with variations of “Yes and don’t ask again”.
How we handle individual file changes depends on whether we want to look at each one individually or as a batch in a PR. I often like to look at a few at a time until I’m satisfied that Claude is going in the right direction, and then I turn it loose on larger groups.
But what about commands that execute locally, like this prompt we see in the IDE:
cd /Users/tadk/work/echobase && ./test-security.sh 2>&1 | head -100
Run security test script to verify it works
Do you want to proceed?
❯ 1. Yes
  2. Yes, and don't ask again for ./test-security.sh commands in /Users/tadk/work/echobase
  3. No, and tell Claude what to do differently (esc)
Option #2 looks tempting, doesn’t it? Letting Claude run all of its automated tests itself each time seems like peak automation.
AIs like Claude and Cursor are so widely used now that there have already been several security incidents, including an npm supply-chain attack. The target on their backs is only growing larger. Not only are they an inviting vector for bad actors, but their own hallucinations can lead to painful mistakes (git rebase, anyone?).
Our policy here is simple:
- Running arbitrary scripts? **No.**
- Git access? **Not a fan.**
Consider the generalized permission:
2. Yes, and don't ask again for similar commands in…
Do we trust its definition of “similar command”? That’s a massive risk.
We walk the line between being too open and too cautious, but when the security of our machine (or our organization!) is at stake, we have to choose caution, restricting the AI’s ability to run code or scripts automatically. AI isn’t a self-driving car yet, and we can’t sleep at the wheel, much as we’d like to.
Fortunately, Claude’s sandbox mode helps you set and configure guardrails so you don’t have to do it all manually.
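Those guardrails can be written down rather than re-decided at every prompt. A deny-first sketch along these lines, assuming Claude Code's .claude/settings.json permissions format (the specific rule patterns here are illustrative — check the current documentation for exact syntax), keeps risky commands behind an explicit prompt:

```json
{
  "permissions": {
    "allow": [
      "Read(./**)",
      "Edit(./src/**)"
    ],
    "deny": [
      "Bash(git push:*)",
      "Bash(rm:*)",
      "Bash(curl:*)"
    ]
  }
}
```

Anything not matched by an allow rule still triggers the usual confirmation, so scripts like ./test-security.sh stay human-in-the-loop.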
Hardening the Edges: Implementing Core Protections (v0.2)
What else should we fix? We could go down the list ourselves one by one, but since we’re looking at Claude’s capabilities let’s see what it suggests:
> Let's tackle the other security improvements
Claude generates this plan:
✅ What We Can Add Now (Development Ready)
1. CORS Restrictions - Limit origins to frontend only
2. Rate Limiting - Prevent DoS attacks
3. Request Size Limits - Already supported, just configure
4. Input Validation & Sanitization - Validate all inputs
5. Business Logic Validation - Check ranges, types
6. Security Headers - Helmet middleware
These seem like reasonable suggestions so we’ll make it so.
> Follow this plan
Claude provides us with sensible defaults for rate limits, input sanitization, and the others. Let’s take a look at its PR and make any changes we think are important:
RATE_LIMIT_MAX_REQUESTS=100
Wait, what? Maybe Claude’s just checking to see if we’re paying attention!
If this were a real system, we’d still need to spend some time thinking about our expected production traffic patterns, scaling vs. cost, etc. (although we can ask it for help in figuring all those out).
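In an Express app this is usually a drop-in middleware such as express-rate-limit, but the underlying mechanism is simple enough to sketch. This is an in-memory fixed-window counter for illustration only (real deployments want a shared store like Redis, and the function names here are hypothetical):

```javascript
// Fixed-window rate limiter sketch: allow at most `max` requests per
// client per `windowMs`. In-memory, so it only protects one process.
function createRateLimiter({ windowMs = 60_000, max = 100 } = {}) {
  const windows = new Map(); // clientId -> { start, count }
  return function allow(clientId, now = Date.now()) {
    const w = windows.get(clientId);
    if (!w || now - w.start >= windowMs) {
      // First request in a fresh window for this client.
      windows.set(clientId, { start: now, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= max; // false -> caller should respond 429
  };
}
```

A middleware would call `allow(req.ip)` and return HTTP 429 on false; the 100-per-window default mirrors the RATE_LIMIT_MAX_REQUESTS value above.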
With a little help, Claude does a pretty good job of fixing these issues, even updating the security test script with sample malicious payloads (scripting, DROP table, display passwd, etc.).
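Those payloads map onto a simple validation check. Here's a hedged sketch of the kind of input validation involved (field names and limits are illustrative, not taken from the repo); the allow-list shape check does the real work, with a crude block-list on top matching the sort of strings the test script probes for:

```javascript
// Allow-list validation sketch for an incoming order. A strict shape
// check is more robust than block-listing attack strings, but we also
// flag obviously suspicious payloads like the ones the security tests
// send (script tags, SQL DROP, path traversal to /etc/passwd).
const SUSPICIOUS = /<script|drop\s+table|\/etc\/passwd/i;

function validateOrder(order) {
  const errors = [];
  if (
    typeof order.customerId !== "string" ||
    !/^[A-Za-z0-9_-]{1,64}$/.test(order.customerId)
  ) {
    errors.push("customerId must be 1-64 chars of [A-Za-z0-9_-]");
  }
  if (!Number.isInteger(order.quantity) || order.quantity < 1 || order.quantity > 1000) {
    errors.push("quantity must be an integer in [1, 1000]");
  }
  if (typeof order.notes === "string" && SUSPICIOUS.test(order.notes)) {
    errors.push("notes contains a suspicious pattern");
  }
  return errors; // empty array -> valid
}
```

Parameterized SQL queries in the processor remain the primary injection defense; validation like this just rejects garbage early.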
Check out v0.2 if you’re playing along.
Prioritizing Risk: Triaging Security Gaps (v0.3.x)
Now we can triage Claude’s remaining issues based on severity and ease of remediation.
Let’s focus on:
- No API Keys/Tokens: No client identity verification (high severity, medium remediation)
- Database encryption at rest disabled (medium severity, easy remediation)
- Man-in-the-middle attacks/no https (high severity, medium remediation)
User Auth and Restricted API Access (v0.3.1)
Next up is a major security oversight that Claude immediately flagged: the frontend was placing orders directly into the SQS queue without authorization. (Don’t look so shocked, this is more common in the wild than you’d want to think!)
> Implement user auth and restricted API access
> Write automated tests to verify no unauthorized access to the backend or the SQS queue
The results aren’t quite ready out of the box, but after some back-and-forth iteration, Claude produces a reasonable testing suite and execution script.
A quick security reminder: During this process, Claude copied the sensitive .env file to a non-ignored .env.bak, almost checking a secret into the repository. Human-in-the-loop is critical for catching these accidental slips!
While batching the rest of the improvements would be faster, tackling them one at a time makes for clean Pull Requests and lets us clear the context window each time.
Database Encryption, KMS, and Secrets (v0.3.2 + v0.3.3)
For this stage, we want to enhance security by implementing Database Encryption at Rest and integrating with **KMS (Key Management Service) and Secrets Manager**. Ideally, we’d move our database into AWS RDS too, but while KMS is a welcome free inclusion in LocalStack, RDS is not.
So we’ll proceed with the available encryption steps, keeping the tutorial focused and accessible, and allowing you to play along at home without racking up AWS charges. We’ll put the full AWS migration out of scope to keep this article digestible (and your wallet happy).
This turns out to be a significant amount of work, as anyone who’s done it manually knows. This is what we get:
What Was Implemented:
1. KMS Key for Encryption (terraform/kms.tf)
   - Created KMS key with automatic rotation enabled
   - KMS Key ID: e075f0e4-8ab1-42aa-9f99-aff90539236c
   - Alias: alias/echobase-database
2. Secrets Manager Integration (terraform/secrets.tf)
   - Created secret echobase/database/credentials encrypted with KMS
   - Stores all database connection information
   - IAM policy for secret access with KMS decrypt permissions
3. Backend Services Updated
   - Both api-gateway and order-processor now retrieve credentials from Secrets Manager on startup
   - Installed @aws-sdk/client-secrets-manager package
   - Services successfully connect to MariaDB using retrieved credentials
4. Docker Compose Configuration
   - Enabled KMS and Secrets Manager services in localstack
   - Replaced DB credential environment variables with DB_SECRET_NAME
   - Kept MariaDB container (RDS not available in localstack community edition)
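On the service side, the startup flow boils down to fetching the secret and mapping it onto a database connection config. The SDK fetch is shown only as a comment so this sketch runs without AWS; the secret field names below are illustrative, not necessarily those used in the repo:

```javascript
// Sketch of consuming a Secrets Manager secret at startup. The actual
// fetch (assumed, not shown live) would look roughly like:
//   const { SecretsManagerClient, GetSecretValueCommand } =
//     require("@aws-sdk/client-secrets-manager");
//   const out = await new SecretsManagerClient({}).send(
//     new GetSecretValueCommand({ SecretId: process.env.DB_SECRET_NAME }));
// Below, we only show turning out.SecretString into a DB config.
function dbConfigFromSecret(secretString) {
  const s = JSON.parse(secretString);
  // Fail fast at startup if the secret is malformed or incomplete.
  for (const k of ["host", "port", "username", "password", "dbname"]) {
    if (s[k] === undefined) throw new Error(`secret missing field: ${k}`);
  }
  return {
    host: s.host,
    port: Number(s.port),
    user: s.username,
    password: s.password,
    database: s.dbname,
  };
}
```

Failing fast on a malformed secret beats a cryptic connection error three layers down in the database driver.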
TLS/HTTPS (v0.3.4)
Many developers defer implementing TLS and HTTPS because setting them up locally is notoriously time-consuming. Fortunately, we have our handy assistant to handle the heavy lifting:
> Implement TLS/HTTPS
Claude quickly generates the necessary self-signed certificates for us and manages all the wiring. Everything seems functional, so we jump straight to testing, only to get this:
Tests: 21 failed, 16 passed, 37 total
It would certainly be nice if the assistant automatically updated the security tests every time it changed the security code, but alas! That’s another skill candidate.
> Now fix security tests
Now that we have a running test suite and the application is decently solid, it’s a good time to pause. There’s still plenty to do, but considering the distance we’ve covered, this marks a good spot to wrap up this installment.
Conclusion: AI is Both Mentor and Junior
We’ve successfully dipped our toes into the water, using Claude Code to build a multi-part system, perform basic threat modeling, and swiftly remediate the top security threats. We’ve demonstrated the iterative process of developing and hardening along the way, progressing from a naive setup to something far more robust.
The resulting code is still a long way from perfect and even further from being truly Production-ready. But we did manage to find and fix a number of issues that would have taken days to fix manually. Crucially, these issues might have slipped through the cracks entirely under an aggressive deadline or if we had been sidetracked by other priorities.
We’ve seen that Claude Code is an odd mixture of very smart mentor and very naive junior. Its code is far from perfect and still requires significant manual intervention and troubleshooting.
The issues that Claude flagged weren’t rocket science: they were mostly standard, basic-level security audit stuff that most developers should know. But having this process automated is incredibly valuable. Think of it as autocomplete on steroids: it saves time building boilerplate, looking up syntax, and dealing with all the fiddly bits. Few of us can remember all the security nuances all the time. When you’re fixing that critical flaw at 2 AM, Claude can help ensure you don’t accidentally introduce more vulnerabilities in the process.
Many teams defer the “nasty stuff” of security hardening, only to be confronted with several sprints’ worth of tedious slogging later in the dev cycle when deadlines loom. It’s far better to get this type of detail work out of the way upfront.
The Future of AI and Security
On the horizon are more thorough tools, such as OpenAI’s Aardvark (currently in private beta), which promises to continuously analyze source code to discover, validate, and propose fixes for security vulnerabilities at scale.
In our next installment, we’ll move beyond the basics and look at more advanced security concepts. We’ll explore using the Common Vulnerability Scoring System (CVSS) to triage more complex exploits based on the probability and impact of the attack, and assess how (or if) AI can truly help with these more complicated tasks.