Most agent demos look great… until the agent touches something irreversible.
A DSAR agent (GDPR/CCPA: export my data / delete my data) is the perfect stress test because it forces a hard truth:
In high-stakes agents, the instruction is the product. Not your tool calls. Not your planner. Your instruction contract.
Where good prompt design still breaks
Most AI engineers I know already do the “obvious” things:
- they separate system vs user instructions
- they constrain output (JSON / schema)
- they add validators
- they use tools instead of free-text guessing
- they include safety language
And yet… the agent still fails in ways that feel unfair:
- scope creeps (“everything you have on me” expands over time)
- tool outputs leak fields you didn’t intend to expose
- “helpful” behavior overrides policy on edge cases
- deletion gets planned too early or too confidently
- auditability is missing when things go wrong
It’s not because the prompt is bad. It’s because the instructions aren’t written like an executable contract.
DSAR is where that gap shows up immediately.
The design move that changes everything
Write an Instruction Contract: think of it as an API spec for behavior. Not a long essay, but a structured, testable contract the agent must follow every time.
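To make "contract, not essay" concrete, here's a minimal sketch of what that can look like when the contract is expressed as data the harness can check. The field names (hard_constraints, scope, tool_boundaries, stop_rules) are my own labels, not a standard:

```python
# Illustrative only: a DSAR instruction contract as data instead of prose,
# so each rule can be versioned, diffed, and enforced mechanically.
DSAR_CONTRACT = {
    "hard_constraints": [
        "refuse requests about anyone other than the verified requester",
        "never export raw logs; redacted summaries only",
        "never delete without verified identity AND an explicit confirmation token",
    ],
    "scope": {
        "include": ["profile fields", "orders & invoices", "support tickets"],
        "exclude": ["internal employee notes", "other users' identifiers"],
        "on_ambiguity": "stop_and_escalate",
    },
    "tool_boundaries": {"crm": {"allowed_fields": ["name", "email", "phone"]}},
    "stop_rules": {"min_identity_confidence": 0.9},
}
```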
Here’s the practical design process I use, with examples.
1) “Non-negotiables” (hard constraints, not vibes)
This isn’t “be safe.” This is “if X, do Y, always.”
Example constraints:
- If the user requests data about someone else → refuse.
- Never export raw logs; only return redacted summaries.
- Never execute deletion unless: identity verified, AND explicit confirmation token provided.
This removes the interpretation layer where drift happens.
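A hedged sketch of how one of those constraints can be enforced in the harness rather than left to the model's judgment; field names like identity_verified and confirmation_token are assumptions:

```python
# Illustrative sketch: the deletion non-negotiable as a code-level gate,
# checked on every request before the delete tool is ever reachable.
def may_execute_deletion(request: dict) -> bool:
    # "Never execute deletion unless: identity verified, AND explicit confirmation token provided."
    return bool(request.get("identity_verified")) and bool(request.get("confirmation_token"))

def handle_deletion(request: dict) -> dict:
    if not may_execute_deletion(request):
        # The agent may plan or ask for confirmation, but never reaches the delete tool.
        return {"status": "blocked", "reason": "identity unverified or confirmation token missing"}
    return {"status": "ready_to_delete", "records": request.get("records", [])}
```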
2) Scope definition (prevent scope creep, not just scope clarity)
Most teams write scope once, but scope still creeps as the agent becomes “more capable.”
So scope needs to be defined as:
- what counts
- what doesn’t
- how to behave when ambiguous
Example scope:
Include:
- profile fields (name, email, phone)
- orders & invoices
- support tickets / chat transcripts
- marketing preferences
- device identifiers if collected
Exclude:
- internal employee notes
- aggregated analytics dashboards
- records containing other users’ identifiers
- internal security logs that expose system internals
Ambiguity rule example: If the system returns mixed-user records → stop and escalate.
This is how you keep the agent from “expanding the mission.”
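One way to make the include/exclude/ambiguity split mechanical is a small classifier in the harness. This is a sketch under assumed field names (category, user_ids, requester_id), not a reference implementation:

```python
# Sketch only: scope as an explicit allow/deny structure plus an ambiguity rule.
SCOPE = {
    "include": {"profile", "orders", "invoices", "support_tickets", "marketing_prefs", "device_ids"},
    "exclude": {"employee_notes", "analytics_dashboards", "security_logs"},
}

def classify_record(record: dict) -> str:
    requester_id = record.get("requester_id")
    # Ambiguity rule: mixed-user records are never silently included or dropped.
    if record.get("user_ids") and set(record["user_ids"]) != {requester_id}:
        return "escalate"
    category = record.get("category")
    if category in SCOPE["exclude"]:
        return "exclude"
    if category in SCOPE["include"]:
        return "include"
    return "escalate"  # unknown category: treat as ambiguous, not as fair game
```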
3) Tool boundaries (assume tools will return more than you asked for)
This is the part that separates “it works in demo” from “it’s safe in prod.”
Even if you ask for allow-listed fields, tools often return extra fields.
So your contract must explicitly state:
- allowed fields
- disallowed fields
- what to do if disallowed fields show up
Example: CRM policy
- Allowed: name, email, phone, created_at, last_login
- Disallowed: notes, internal_tags, fraud_flags
Required behavior: If disallowed fields appear → discard + log the violation.
Example: Logs policy
- Only return categories + date ranges + redacted snippets
- Never return raw logs
Tool boundaries prevent accidental leakage and “quiet policy breakage.”
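Here's a minimal sketch of the CRM boundary as a post-processing filter, assuming the tool returns plain dicts. The point is that anything not explicitly allowed gets dropped, and the violation is recorded:

```python
# Sketch of the CRM boundary: drop disallowed fields and record the violation,
# rather than trusting the tool to return only what was asked for.
import logging

ALLOWED_CRM_FIELDS = {"name", "email", "phone", "created_at", "last_login"}
DISALLOWED_CRM_FIELDS = {"notes", "internal_tags", "fraud_flags"}

def filter_crm_record(record: dict) -> dict:
    leaked = set(record) & DISALLOWED_CRM_FIELDS
    if leaked:
        # "If disallowed fields appear → discard + log the violation."
        logging.warning("CRM returned disallowed fields, discarding: %s", sorted(leaked))
    # Anything not explicitly allowed is dropped, not just the known-bad fields.
    return {key: value for key, value in record.items() if key in ALLOWED_CRM_FIELDS}
```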
4) Output shape (make auditability default, not optional)
Most engineers constrain output, but the real win is making it audit-friendly.
Example output skeleton:
- identity_verification: method + confidence + what matched
- data_found[]: system, records found, date range
- redactions_applied[]: what removed, why
- deletion_plan[]: what will be deleted + dependencies
- user_summary: plain-language summary
Now your agent produces an artifact that can be reviewed and diffed.
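As a sketch, the skeleton above can be pinned down as typed structures so every run emits the same reviewable shape. The names mirror the list; the exact types are illustrative:

```python
# Illustrative output skeleton as typed structures: every run produces a
# reviewable, diffable artifact with the same fields.
from typing import TypedDict

class IdentityVerification(TypedDict):
    method: str            # e.g. "email_challenge"
    confidence: float
    matched_on: list[str]  # which attributes matched

class DSARReport(TypedDict):
    identity_verification: IdentityVerification
    data_found: list[dict]          # system, records found, date range
    redactions_applied: list[dict]  # what was removed, and why
    deletion_plan: list[dict]       # what will be deleted + dependencies
    user_summary: str               # plain-language summary for the requester
```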
5) Stop rules (where reliability is earned)
This is where senior-ish prompt design shows up: knowing when to halt.
Examples:
- Identity confidence < 0.9 → stop; request one more proof.
- Multiple matching user accounts → stop; escalate.
- Deletion requested but no explicit confirm token → stop; ask user to confirm.
- Any request expanding scope beyond requester identity → refuse.
Stop rules are what keep your system from “getting creative.”
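A sketch of those stop rules as checks that run before any action is taken, using the thresholds and examples above; the field names are assumptions:

```python
# Sketch: stop rules as explicit checks evaluated before every step.
def check_stop_rules(state: dict) -> str | None:
    if state.get("identity_confidence", 0.0) < 0.9:
        return "stop: request one more proof of identity"
    if len(state.get("matching_accounts", [])) > 1:
        return "stop: escalate (multiple matching user accounts)"
    if state.get("action") == "delete" and not state.get("confirmation_token"):
        return "stop: ask the user for an explicit confirmation"
    if state.get("scope_target") and state.get("scope_target") != state.get("requester_id"):
        return "refuse: request expands beyond the requester's own data"
    return None  # no stop rule triggered; the step may proceed
```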
Why this is hard (and why it’s worth showing)
This isn’t prompt-writing as copywriting. It’s prompt-writing as system design:
- policies → constraints
- constraints → predictable behavior
- predictable behavior → auditable output
That’s the difference between “agent that looks smart” and “agent you can trust.”
The part that should be automated (without removing judgment)
The thinking stays human. But the repetitive scaffolding shouldn’t be.
What’s worth automating:
- generating instruction contract templates per workflow
- generating step-specific prompts + output schemas
- enforcing consistent boundary rules across tools
- versioning + diffs (“what changed?”)
- running “golden request” regression checks
You still decide the rules. Automation just keeps the system consistent as you iterate.
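For example, a "golden request" regression check can be as small as replaying a fixed set of requests against the current contract and failing loudly on drift. This is a sketch; run_agent stands in for however you invoke your agent, and the cases are made up:

```python
# Sketch of a golden-request regression check: fixed DSAR requests with the
# decisions they must always produce, rerun whenever the contract changes.
GOLDEN_REQUESTS = [
    {
        "name": "deletion_without_token",
        "request": {"action": "delete", "identity_verified": True, "confirmation_token": None},
        "expected_decision": "blocked",
    },
    {
        "name": "export_for_someone_else",
        "request": {"action": "export", "requester_id": "u1", "scope_target": "u2"},
        "expected_decision": "refuse",
    },
]

def run_golden_checks(run_agent) -> list[str]:
    failures = []
    for case in GOLDEN_REQUESTS:
        decision = run_agent(case["request"]).get("decision")
        if decision != case["expected_decision"]:
            failures.append(f'{case["name"]}: expected {case["expected_decision"]}, got {decision}')
    return failures  # empty list means no drift on the golden set
```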
If you want to see how we structure prompt scaffolding, here's the mockup: HuTouch Prompt Design Workflow
Live teardown (Dec 30)
We’re doing a live session on Dec 30th, 8:30am EST / 7:00pm IST. We’ll break down instruction contracts + stop rules live. Sign up to get the invite to the event.