I built AI DevKit because I wanted a workflow that makes AI coding feel less random and more efficient. But I also know from experience that AI looks great when you demo it. The problem only surfaces when you rely on it to ship something non-trivial.
I made a rule for myself. I would use AI DevKit to build the features inside AI DevKit. If the workflow breaks, the product is not ready.
About AI DevKit:
If you have not seen AI DevKit before, think of it as a way to work with AI without giving up control. Instead of treating AI as a black box that generates code, AI DevKit forces clear intent, explicit constraints, and tight feedback loops, so AI can execute well-defined work while humans stay responsible for direction and decisions. It works well with any AI coding agent you like, such as Cursor, Claude Code, Codex, Antigravity, OpenCode, etc.
This post is a build log of that experience, and what it taught me about where AI helps and where humans still need to steer.
My goal is to turn AI DevKit into a tool that makes building with AI more effective through clearer intent and sharper inputs. By reducing the amount of steering engineers need to do, we can progressively hand off execution to AI, while humans stay focused on deciding what to build and defining the right constraints.
The problem I was trying to solve
Every time I asked an agent to implement something, I retyped the same rules:
- Always return Response DTOs for APIs.
- Validate input at the boundary.
- Follow our folder structure.
- Avoid introducing new libraries without a reason.
I could paste these into prompts, or even add them as rules, but neither felt right.
Rules tend to trigger only in very specific contexts, often tied to certain files or patterns. You cannot realistically cover every case that way. Adding more rules also does not scale well. It increases complexity and still leaves gaps.
Some of these rules are also personal preferences. They make sense for how I work, but I would not want to enforce them at a project level. Others are not really rules at all. They are knowledge about the product, its constraints, or the tradeoffs we have already made.
Prompts and rules both fall short here. Prompts are ephemeral and disappear after each task. Rules are rigid and incomplete. Neither is a good place to store evolving engineering knowledge that needs judgment and context.
What I wanted was much more mechanical:
- store small, precise rules once
- retrieve them automatically when relevant
- apply them consistently across tasks
Not memory as chat history. Memory as engineering guidelines that actually get used.
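To make the mechanical version concrete, here is a minimal sketch of that store/retrieve/apply loop. This is illustrative only, not AI DevKit's actual implementation; the names `Guideline` and `GuidelineStore`, and the tag-overlap retrieval, are assumptions I am making for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Guideline:
    text: str                                     # the rule itself, kept small and precise
    tags: set[str] = field(default_factory=set)   # contexts where it applies

class GuidelineStore:
    """Hypothetical store: write a rule once, retrieve it whenever relevant."""

    def __init__(self) -> None:
        self.guidelines: list[Guideline] = []

    def add(self, text: str, *tags: str) -> None:
        self.guidelines.append(Guideline(text, set(tags)))

    def relevant(self, task_tags: set[str]) -> list[str]:
        # Return every stored rule whose tags overlap the current task's context,
        # so it can be injected into the agent's prompt automatically.
        return [g.text for g in self.guidelines if g.tags & task_tags]

store = GuidelineStore()
store.add("Always return Response DTOs for APIs.", "api")
store.add("Validate input at the boundary.", "api", "validation")
store.add("Avoid introducing new libraries without a reason.", "dependencies")

# An API task automatically picks up the two API-related rules and nothing else.
rules = store.relevant({"api"})
```

The point of the sketch is the shape, not the retrieval mechanism: tag overlap could just as well be embedding similarity, but the rules live in one place and are applied consistently instead of being retyped per prompt.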
That framing did not come from AI. It came from me being annoyed at repeating myself.
First assumption: memory is just storage and retrieval
At the beginning, I thought the feature would be straightforward.