If you’re shipping with AI (which can mean generating everything from mockups to working code), you know it works. When something breaks, the problem usually isn’t the AI; it’s the decisions that were made for it. For dev tools and professional apps, the most expensive mistake is building something that doesn’t fit how people actually work.
Decision debt is the cost of shipping without knowing whether you’re solving the right problem. Because building with AI feels “free”, this debt accumulates faster. By the time you notice, it’s already in the code. Plowing through development without validating a few key decisions is how things head down the wrong path.
Working with startups, I’ve come to recognize the key stages where human decisions matter most: understanding the problem, ideating solutions, scoping what to build, and validating with users.
There are established thinking methods that help us move forward at these points. Inspired by design thinking and product frameworks, I’ve collected some actionable advice. Most of these methods have a low barrier to learn and use, which makes them good for non-designers, and they map onto familiar technical mental models, so people in a wide range of roles can put them to work.
Stage 1: Understand the problem
AI surfaces findings easily: categorizing, summarizing, and connecting the dots. It certainly feels like research!
So, say you pipe what you have into AI (support tickets, interview notes, Reddit threads, Discord messages) and generate patterns in minutes. But without the right prompts and context, AI defaults to generic patterns. Depending on what you prompt, you could get 10 or 100 “insights” back with no signal about which ones matter for your users. The human part is thinking through the insights yourself; that’s what connects them to your vision, context, and priorities, and what makes them precious compared to synthetic findings.
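If you do run raw notes through a model, most of the leverage is in how much of your own context you put in front of it. Here’s a minimal sketch; the product, notes, and wording are all hypothetical, and the model call is left to whatever client you already use:

```python
# Minimal sketch: ground an AI synthesis pass in your own context so it
# doesn't fall back to generic patterns. Everything here is invented.

raw_notes = """
Ticket 412: deploy fails silently when the YAML config contains a tab character
Interview, platform lead: "we wrote a wrapper script so nobody touches the CLI directly"
Reddit thread: setup docs assume a fresh project, but everyone already has CI
"""

context = """
Product: CLI-first deploy tool for small platform teams.
Users: senior backend engineers; everything is configured via YAML, never a GUI.
Decision at stake: what to fix first in onboarding for teams with existing CI pipelines.
"""

prompt = f"""{context}
Below are raw support tickets and interview notes.
Group them into at most five themes, keep one verbatim quote per theme,
and flag anything that contradicts the context above.
{raw_notes}"""

print(prompt)  # send to whichever model you use; reviewing the themes is still on you
```

Even with good context, the themes that come back are a starting point: deciding which one matters most is still your call.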
For dev tools in particular, invisible friction is everywhere: edge cases users have learned to live with, constant switching between tools, integration problems with things you’ve never heard of.
So, where to dig deeper? What’s the number one problem? Where’s the quick win? AI can surface candidates, but can’t tell you which ones matter most for your users. You need structure to see what matters.
Start with problem interviews. Before you generate anything, talk to people who have the problem. Not users of your product—people who experience the pain you’re trying to solve.
Five questions, fifteen minutes:
- “What were you trying to do when this became a problem?”
- “What have you already tried?”
- “What’s the worst part?”
- “How are you solving it now?”
- “What would ‘fixed’ look like?”
Don’t pitch or explain your idea. Just listen. The goal is to hear the problem in their words, not to validate your assumptions.
Pro tip: No users yet? Find five people complaining about the problem on Reddit, Discord, or GitHub Issues and DM them. Most will talk for 20 minutes if you’re genuinely curious—they want someone to understand.
This takes 15–20 minutes per conversation. Five conversations often surface context AI summaries miss—tone, hesitation, the workarounds people forget to mention.
Also useful at this stage:
Journey mapping — map the full user workflow across stages (discover, setup, daily use, troubleshooting). Start with assumptions and mark uncertainty with “?” — the gaps become your research agenda (see the sketch after this list). Reveals where friction lives outside your product.
Ishikawa diagram — visualize root causes by drawing the problem as a “fish head” with branches for cause categories: people, process, tools, environment. Walk through each branch asking “what contributes to this?” Turns abstract problems into a visible cause-effect map you can act on.
Competitive analysis — go beyond feature comparison tables. Ask users “What do you use TODAY to solve this?” The answer is rarely a direct competitor—it’s spreadsheets, scripts, Slack workarounds, or doing nothing. Dig into app store reviews and Reddit threads for competitor products: what do users complain about? What’s missing? Your real benchmark isn’t the product you imagined competing with—it’s understanding the gaps others haven’t filled.
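One toy way to make the “?” marks actionable is to keep the journey map as plain data and pull the unknowns out as a research agenda. The stages and notes below are invented for the example:

```python
# Toy journey map: "?" marks what you only assume; the unknowns become
# the research agenda. Stages and notes are made up.

journey = {
    "discover":        "finds us via a GitHub issue comment",
    "setup":           "? how long does first config really take",
    "daily use":       "runs deploys from CI, rarely opens the dashboard",
    "troubleshooting": "? where do they look first when a deploy fails",
}

research_agenda = [
    f"{stage}: {note.lstrip('? ')}"
    for stage, note in journey.items()
    if note.startswith("?")
]
print("\n".join(research_agenda))
# setup: how long does first config really take
# troubleshooting: where do they look first when a deploy fails
```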
Stage 2: Ideate solutions
AI generates options instantly. Need three onboarding flows? Done. Five error-handling approaches? Here they are. The output looks considered.
But there’s a problem: the quality of AI output depends on what you feed it. Without your Stage 1 research, AI only has generic patterns to draw from—and for dev tools, generic means consumer-app defaults that miss how your users actually work.
The gap isn’t in generating options—it’s in generating the right options. The ones specific to what you learned in Stage 1. This is where human creativity matters: reframing the problem into opportunities worth exploring.
Start with “How Might We” questions. Take your biggest insights from Stage 1 and reframe them as opportunity questions.
Format: “How might we [verb] for [user] so that [outcome]?”
“Users abandon setup” becomes “How might we make setup feel like progress for teams with existing CI pipelines?”
“Config errors break production” becomes “How might we catch config mistakes before they ship?”
Write 5–10 HMW questions. Each one opens a different solution space without committing to a direction. The human judgment: which questions are worth exploring? Which reframe the problem in a way that matches what you learned about users?
Pro tip: You don’t want to be too broad (“How might we improve onboarding?”) or too narrow (“How might we add a tooltip?”). The sweet spot is specific enough to be actionable, open enough to allow multiple solutions.
This takes 20–30 minutes and you’ll generate more useful directions than an hour of AI prompting.
Also useful at this stage:
Working Backwards — write the press release or announcement before building. Start from the customer outcome: what will they be able to do? Why will they care? Forces you to articulate value before you get lost in implementation.
Reverse Brainstorming — ask “How could we make this problem worse?” instead of solving it directly. List everything that would break the experience: slow load times, confusing error messages, impossible config. Then flip each into a solution. Think of it as threat modeling for product ideas—devs already think this way about security and reliability.
Abstraction Laddering — ask “why?” to go broader (what’s the bigger problem?) or “how?” to go narrower (what’s the specific solution?). Helps you find the right level of the problem to solve. Too abstract and you’re boiling the ocean. Too specific and you’re building a feature, not solving a problem.
Stage 3: Scope what to build
Scope is key because AI makes generating options effortless. (Need three approaches to error handling? Done. Onboarding variants? Here’s a wizard, a checklist, a tutorial.)
AI operates strictly within the context you provide. If you don’t explicitly constrain it with your Stage 1 findings, it defaults to standard patterns—solutions that might work for a general audience but fail for expert developers. Properly scoping what you’re building makes the output considered, gives the team something to react to, and keeps you moving.
For dev tools, there are specific gaps to note: AI doesn’t know your users configure everything via YAML and will never open a setup wizard. It doesn’t know your onboarding needs to work when users already have CI pipelines and team permissions in place.
The result: options that look right but aren’t specific. Average solutions to average problems. In the world of dev tools, average doesn’t fit real-life workflows, and thus, gets ignored.
Which direction fits how your users actually work? AI can’t tell you! That’s the human call. This is where you apply what you learned in Stage 1: your user insights become the filter. Focus comes into play here, since scoping narrows options and requires prioritizing, cleaning up, and making precise decisions.
Start with Now / Next / Later. Three buckets. Force every option into one.
- Now: What must we validate or build in the next 2 weeks?
- Next: What comes after, assuming Now works?
- Later: What’s interesting but not urgent?
No “maybe” pile. No “high/medium/low” that means nothing. Just three buckets with clear time horizons.
The human judgment: what goes in “Now”? Use what you learned in Stage 1. The option that addresses your biggest user pain, tests your riskiest assumption, or unblocks everything else—that’s Now.
Pro tip: If everything feels like “Now,” you haven’t made a decision. Force-rank until only 1–2 items remain. The pain of choosing is the point.
This takes 30 minutes with your team, or 15 minutes solo.
Also useful at this stage:
Assumption Mapping — for each option, write down what has to be true for it to work. “Users will complete a setup wizard.” “Teams will invite colleagues through the dashboard.” Now stress-test: which assumptions are riskiest? Plot importance vs. uncertainty. High-high = test before building.
100 Dollar Test — everyone gets $100 to distribute across options. Forces real tradeoffs—you can’t put $25 on everything. Shows where the team actually believes value is, not what they say in meetings. Takes 10 minutes and cuts through analysis paralysis.
Impact vs. Effort matrix — the one 2×2 you need. Plot options by impact (from your Stage 1 insights, not generic “value”) vs. effort (honest estimate). Top-left quadrant = do first. Bottom-right = kill. Be honest about effort—developers consistently underestimate.
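The quadrant logic itself is trivial; the judgment lives in the scores. A throwaway sketch, with option names and numbers made up, just to make the buckets concrete:

```python
# Impact vs. Effort, as code: the scores are judgment calls grounded in
# Stage 1 insights; the bucketing is the easy part. All values invented.

options = [
    # (name, impact 1-5, effort 1-5)
    ("YAML config validation before deploy", 5, 2),
    ("Setup wizard",                          2, 4),
    ("Slack notifications",                   3, 3),
    ("Full web dashboard rewrite",            2, 5),
]

def quadrant(impact: int, effort: int) -> str:
    if impact >= 4 and effort <= 3:
        return "do first"     # high impact, low effort
    if impact >= 4:
        return "plan for it"  # high impact, high effort
    if effort <= 3:
        return "maybe later"  # low impact, low effort
    return "kill"             # low impact, high effort

for name, impact, effort in options:
    print(f"{quadrant(impact, effort):>12}  {name}")
```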
Stage 4: Validate and learn
Speed without validation is just faster failure (and not the “good” kind of failure). AI doesn’t know if users will understand your flow, find the button, or trust the output. It doesn’t know if your error messages help or confuse, if your defaults make sense, or if the thing you built solves the problem you identified.
For dev tools, the risks compound: a confusing CLI flag ships to thousands of developers, a bad default breaks production configs, a workflow that made sense in your head fails in real environments.
Start with think-aloud testing. Watch real users use your prototype while they verbalize their thoughts. “I’m looking for the config option… I expected it to be here… I’m confused why this is asking for my API key…”
Don’t explain. Don’t help. Just watch and listen. Their hesitation, confusion, and wrong turns are your data.
Testing with five users typically surfaces the most common friction. Each session takes 20–30 minutes. You’ll learn more in an afternoon of testing than in a week of internal debate.
Pro tip: Silence is hard. You’ll want to help when they struggle. Don’t. Their confusion is exactly what you need to see. If you explain, you’ve lost the signal.
For dev tools: test the CLI help text, the error messages, the config file format. The parts that feel “obvious” to you are where users struggle most.
Also useful at this stage:
Waitlist validation — before building, test demand with a landing page. One page: problem statement, proposed solution, signup form. Track your conversion rate—it’s your first real signal of interest. Low conversion? Either the problem isn’t painful enough or your pitch isn’t landing. Real interest beats hypothetical “yeah, I’d use that.”
Sean Ellis Test (PMF Survey) — after users have tried your product, ask one question: “How would you feel if you could no longer use this product?” Options: Very disappointed / Somewhat disappointed / Not disappointed. If 40%+ say “Very disappointed,” you have a product-market fit signal (see the sketch after this list). Below 40%? Keep iterating.
Dogfooding — use your own product daily, in real workflows. Not a demo run—actual work. The bugs you find at 11pm when you need the tool to work are different from the bugs QA finds. GitLab, Slack, and Atlassian all ship features internally first. For dev tools especially: if you won’t use your own CLI, why would anyone else?
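To make the Sean Ellis threshold concrete, here’s a tiny scoring sketch; the responses are invented, and the 40% cutoff is the only real rule:

```python
# Score a Sean Ellis survey: share of "Very disappointed" vs. the 40% bar.
# Response counts below are made up for the example.

responses = (
    ["Very disappointed"] * 18
    + ["Somewhat disappointed"] * 13
    + ["Not disappointed"] * 9
)

very = responses.count("Very disappointed")
share = very / len(responses)

print(f"{very}/{len(responses)} very disappointed = {share:.0%}")
print("PMF signal" if share >= 0.40 else "keep iterating")
# 18/40 very disappointed = 45%
# PMF signal
```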
The 10% rule
Finally, here’s a practical heuristic: if AI saves you 90% of development time, spend those savings on decision validation and customer research. Built something in 2 days instead of 20? Dedicate some time to assumption mapping and a think-aloud session.
A few hours of validation beats weeks of building the wrong thing.
At the end of the day, it’s still us humans who are responsible for the most important decisions in our products. Make sure you’re validating them.
Let’s recap with a quick checklist: is your team covering these in your current project?
#1: Understand the problem:
- Have you talked to people who experience this problem (not just users of your product)?
- Do you know how they solve it today—spreadsheets, scripts, workarounds, or nothing?
- Can you explain the root cause, not just the symptom?
#2: Ideate solutions:
- Did you generate multiple directions before committing to one?
- Are your options specific to what you learned about users, or generic patterns?
- Have you reframed the problem as an opportunity worth solving?
#3: Scope what to build:
- Can you name what’s in “Now” and defend why it’s not in “Later”?
- Have you identified the riskiest assumptions that need testing?
- Did the team make real tradeoffs, or is everything still “high priority”?
#4: Validate and learn:
- Have you watched real users try to use it (not just asked for opinions)?
- Do you have signal on whether people want this—behavior, not just words?
- When something’s off, do you know where to look?