Introduction
I’ve had LLMs write me entire features in minutes and I’ve had them generate thousands of lines of unusable garbage. The difference is usually not the model, but rather whether I treated it like a slot machine or like a fellow engineer.
For a bit of context, a few days ago, I used OpenCode to add a feature to Puppypal, a puppy training app I am building to better raise our puppy Bubbles. I described what I wanted in a few sentences, it made about a dozen changes in the code, and announced that it was done with the tasks.
I happily started testing it, and after clicking around some, I realized it had not only not finished implementing what I had described, it also broke a few other features because it made changes to data structures that other parts of the code were using. After struggling a bit to try and figure out what went wrong and fix it, I decided to just `git reset` and apply proper engineering.
That experience was quite frustrating, so over time I have moved, like many others, towards a more robust system for using LLMs.
I've realised that if I treat the agent as a collaborator, give it a system, a procedure to follow, and all the information it needs, it actually delivers very high-quality software.
So this article is my way of moving away from vibe coding and towards vibe engineering — a simple flow I use for AI-first development and a series of techniques that enable it. I hope that by the end of reading this, you will want to try it too.
If you want a concrete story of this in action, you can also read my earlier article about turning our puppy Bubbles’ training guide into a mobile app with vibe engineering.
Vibe Engineering
So how do we go from vibe coding to something that looks like proper engineering?
When I say “vibe coding” here, I mean throwing loose, underspecified requests at your agent and hoping it guesses what you meant. Sometimes that works, but it often leaves you with brittle code and half‑finished features.
By “vibe engineering”, I mean working with your agent the way you would with another engineer: agree on the goal, share the context, write down a plan, and put guardrails in place before you let it touch the codebase.
Most people will agree that describing what you want as an end result, without thinking through how it will be implemented or what it does to code quality, architecture, or maintainability, is a bad idea.
That’s the software engineering equivalent of jumping straight into a foreign codebase with no plan and saying “no problem, I’ll just wing it”. That approach fails without AI, and with it, it fails faster and harder.
To curb that risk, you must treat your LLM agent like a junior developer who’s pretty good at following instructions, but who also has root access to your codebase.
You would not give a junior developer a vague one-line ticket, no docs, and no tests, then act surprised when they ship something brittle. You give them context about the product and architecture, a clear description of the change you want, maybe a plan you agree on together, and establish guardrails for things they really shouldn’t be able to do, like accessing your production database.
The same applies to agents. If you want them to work on non-trivial tasks across your codebase, you need to raise the standard of how you work with them.
Levels of AI Development
Before diving into the flow, it helps to set a baseline: what level of AI assistance are we even talking about?
Over the last few years I have developed at all of these levels, and I have landed on using agents (like OpenCode or Claude Code) almost all the time. For me, they strike a good balance between autonomy and control and cover all my needs.
Level 1: Basic
This is the traditional way of coding. You write, debug, and manage everything using your usual tools. I group all “non-LLM” and basic AI workflows here, including tab autocompletion.
In this tier, the tool looks only at the immediate context, like the current line or function, and suggests what you were probably about to type. Tools like Tabnine, standard autocomplete, and smart refactoring features live here.
Level 2: Copy-pasting in the chat
Screenshot of using ChatGPT for coding | Kamen Zhekov
This is the first real “interaction” we got with an LLM in a chat window, our first taste of what they were capable of. I remember getting goosebumps the first time I used ChatGPT, but that hype died pretty fast.
You copy in files or snippets and ask for help in natural language, but the model does not have access to your system, so it can only tell you what to do (edit this function, run this command) and can only read whatever you curate for it beforehand. This works for small problems, but the loop gets tedious once your project grows.
Level 3: Guiding an agent
Screenshot of using Claude Code as an agent from your CLI | Kamen Zhekov
Now things get interesting. Agents can integrate with your IDE, run standalone, or live in your terminal, but most importantly they can read whatever they want in the folder they're running in and make changes across your whole codebase.
If you give the agent a task, it will write down its to-do list, read and try to understand your code, implement new business logic, review its own code, optimize existing functions, and summarize for you what it did. Marvelous.
This is the level I use most of the time. Tools like Claude Code, Codex CLI, OpenCode, and others live in the terminal and let you keep control while still offloading the heavy lifting.
Level 4: Hands-off engineering
Screenshot of codex agent doing a PR review in a GitHub repository | Kamen Zhekov
The idea here is that you hand an agent a high-level goal and it handles everything: breaking down the plan, writing code, testing, iterating, and opening a PR. It could also have the role of an autonomous PR reviewer. Maybe there are multiple agents at work for this goal, maybe it’s only one, but you’re not doing any orchestrating, only minimal interactions for high-level decisions. This is where tools like Devin or GitHub’s autonomous agents live. I have not adopted this level for most of my work yet, as projects where I care about quality and maintainability still need me in the loop.
The Engineering Flow
At its core, AI-first development is about giving an agent the right information and a series of instructions to act on. The agent follows these instructions, uses tools as needed, and returns a response when it considers a task done.
Understanding this is the first step to using agents correctly, but we also have to think about what sort of information the agent needs to work.
Here’s the system at a glance:
Diagram of AI-First Engineering Flow | Kamen Zhekov
In your codebase, you have three key files: one that explains what your application does and who it is built for (your product), one that explains how it does it on a technical level (your architecture), and one that explains how your agent should work in your codebase.
With that information, you give your agent goals that it breaks down into a series of tasks that it then executes. When it’s finished with a task or a goal, you ask it to review its own work, identify issues and correct them.
Define Your Product
First, you’d write down what problem you are tackling, who the users are, and what your goal is. It doesn’t need to be extensive, but it should give the agent the business context of your software, so that when it works on it, its reasoning will be grounded within that context.
For Puppypal, the `product.md` document looks like this:
```markdown
## Pitch
Puppypal is a local-first mobile companion app that helps puppy owners raise a confident, well-adjusted dog by providing age-appropriate daily routines, milestone-based adventures with encouraging gamification, and an in-app training guide.

## Users
- First-time puppy owners: Want clear, confidence-building guidance and an easy way to track progress.
- Experienced owners with a new puppy: Want structure, flexibility, and better tracking than ad-hoc notes.

## Problem
Puppy owners often don't know what to prioritize at each age, how to socialize systematically, or how to balance exercise limits with developmental needs.
Missing early socialization opportunities (especially in the first months) and inconsistent routines can lead to avoidable behavior challenges later.

Our Solution: Provide a gentle daily plan that adapts to the puppy's exact age, makes progress visible and motivating (puppy points, levels, mastery), and includes a searchable training guide with contextual tips.

### Core Features
- **Multi-puppy profiles:** Track multiple puppies on one device with clean separation of their data.
- **Today view + selected date:** Today defaults to the current day; Journey can set a selected date so users can review/log past days with the same UI.
- **Daily routines (habits):** Session-based logging (multiple entries per day) for walk/play/training/rest and count-based logging for potty.
- **Adventures (milestones):** Reflection-based logging (confident/wary/scared), retry gently flow, mastery progression, and mastery badges.
- **Puppy points + leveling:** Global progression system that celebrates growth and reinforces consistency.
- **Auto-fill planning:** Automatically populate the selected day with age-appropriate required habits and a configurable number of suggested adventures.
- **Milestones dashboard:** Visual progress across milestone categories, with drill-down into activities.
- **Journey history:** Calendar-based history and progress over time.
- **Training guide:** Searchable guide content, including a "Current Age Guidelines" view.
- **Customization:** Create custom activities and enable optional milestone modules.
- **Backup & restore:** Export/import device data as an unencrypted file via the platform-native share/export flow.

### Advanced Features
- **Full localization:** Everything user-visible is localized (EN/FR/NL), including guide content and built-in activity names/categories.
- **Notifications:** Local reminders (potty, daily check-in, retry) with per-puppy settings.
```
Simply writing that down and passing it to the agent when it’s working on your project adds a lot of value because the changes the agent makes will take the business context into account.
Define Your Architecture
Define the technical requirements, decisions, and constraints. It can be anything that’s related to how your application is built — what platforms is your app built for, what programming language is used, what database and which frameworks are used, where is it hosted, etc.
Similar to the product context, you want your agent to always be grounded in the technical context of your application. If it’s missing that context, it may very well write changes that are incompatible or deviate from your code simply because it doesn’t have the information on how it’s built.
For Puppypal, the `architecture.md` file looks like this:
```markdown
# Architecture
This document captures the architecture and technical stack for Puppypal.

## Platforms
- Mobile: iOS + Android
- Runtime: React Native via Expo
- Product stance: local-first, offline-capable (no accounts/sync in v1)

## Language
- TypeScript

## UI & Navigation
- UI framework: React Native
- Navigation: React Navigation (bottom tabs + stack/modals)
- Styling: NativeWind
- Icons: `lucide-react-native` to match the mockup style
- SVG: `react-native-svg`

## State Management
- Global store: Zustand
- Store responsibilities: orchestration only (UI calls store actions; UI does not talk to SQLite directly)

## Persistence
- Database: SQLite via `expo-sqlite`
- ORM/query layer: Drizzle ORM (SQLite + Expo integration)
- Full-text search: SQLite FTS5 (via raw SQL where needed)
- Identifiers: use string IDs everywhere (UUID for custom records)
- Derived data policy: do not persist per-day or total points; compute from session logs
- Migrations: migration files checked into the repo; small, focused, reversible changes
```
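One line in that file carries real weight for an agent: the derived-data policy says never to persist per-day or total points. In TypeScript, that policy amounts to computing both views from session logs on demand. Here is a minimal sketch with a hypothetical log shape (not the app's actual Drizzle schema):

```typescript
// Hypothetical session-log shape, for illustration only; Puppypal's
// real schema lives in its Drizzle models.
interface SessionLog {
  puppyId: string;
  dateKey: string; // "yyyy-MM-dd"
  points: number;  // points awarded for this single session
}

// Per-day points are derived on demand from the logs...
function pointsForDay(logs: SessionLog[], puppyId: string, dateKey: string): number {
  return logs
    .filter((l) => l.puppyId === puppyId && l.dateKey === dateKey)
    .reduce((sum, l) => sum + l.points, 0);
}

// ...and the lifetime total is the same fold without the date filter,
// so neither number can ever drift out of sync with the source logs.
function totalPoints(logs: SessionLog[], puppyId: string): number {
  return logs
    .filter((l) => l.puppyId === puppyId)
    .reduce((sum, l) => sum + l.points, 0);
}

const logs: SessionLog[] = [
  { puppyId: "bubbles", dateKey: "2024-03-01", points: 5 },
  { puppyId: "bubbles", dateKey: "2024-03-02", points: 3 },
];
console.log(pointsForDay(logs, "bubbles", "2024-03-01")); // → 5
console.log(totalPoints(logs, "bubbles")); // → 8
```

Without this context in front of it, an agent will happily add a cached `totalPoints` column "for performance" and quietly break the invariant.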
Define the AGENTS.md
A well-written AGENTS.md file at the root of your project goes a long way in guiding your agent on how to work inside your codebase.
This is a dedicated file for agent-specific guidance, restrictions, and instructions. It is by design information you want your agent to know for every single request that is made.
It typically contains a short description of the project, an overview of the software architecture, any coding conventions you’d like it to adhere to, and some agent-specific guardrails. You can also use it to write down whatever else your agent always has to know, apply and take into account. In that regard, it might overlap a bit with your architecture or product documentation, but that’s also fine.
For Puppypal, the `AGENTS.md` file looks like this:
```markdown
# Puppypal
This repo contains **Puppypal**: a local-first **puppy growth companion** mobile app.

## Prime Directives
1. Always write unit tests for any code you modify or introduce
2. Always use proper typing
3. Always run make check after code changes with escalated permissions that you request from the user
4. Create self-documenting code with proper docstrings

## What the app does
- Guides puppy owners through age-appropriate daily routines (habits) and milestone-based adventures.
- Tracks progress with puppy-points, mastery, and a lightweight leveling system.
- Provides a searchable in-app training guide (offline, bundled content).
- Works entirely offline (no accounts, no cloud sync).

More extensive product documentation at: `agents/product.md`

### Mobile app location
- Expo app lives in `mobile/`
- Run locally: `cd mobile && npm start`

### Architecture (layered)
The production app follows the layered structure from `agents/documentation/architecture.md`:
- `domain/`: pure business logic only (no React, no SQLite)
- `data/`: repositories + SQLite/Drizzle integration
- `store/`: global orchestration state (Zustand)
- `ui/`: screens/components, no direct DB access
- `navigation/`: React Navigation setup (tabs + stack/modals)
- `app/`: app bootstrap (providers, splash/fonts, global config)
- `tests/`: app tests (unit, integration)

More extensive architecture documentation at: `agents/architecture.md`
...
```
I update the AGENTS.md quite regularly, and I rely on it for information the agent needs to know before any task it needs to tackle. This also keeps my own mental model of the project up to date.
Setting a Goal
You have now set the context for your product, its architecture and the guidelines for working in your codebase. It’s time to start implementing.
Diagram of flow for agentic task execution | Kamen Zhekov
In order for your agent to implement what you want, you need to specify its goal. There are various ways of doing that, but the better you structure it and the more relevant information you give your agent, the better the results.
There are different approaches to implementing the process of gathering that information and specifying that goal, the most famous and recent one being spec-driven development (SDD). I actually use it quite often through Agent OS, although I’ve adapted it to fit my own needs. If you’re curious about it, you can check out their GitHub repository below, it’s free and open-source.
In my experience, using specs as a way to describe the goal of your agent fits most complex tasks I want tackled. For example, in Puppypal I wanted to migrate some React web views to the Expo mobile application.
I kicked it off with
Replace the placeholder Today screen with the real layout from the mockup
and iterated with the agent until we arrived at a `spec.md` for that task:
```markdown
# Specification: Today View — React Native UI Parity from Mockup

## Goal
Implement the Today view in React Native (Expo) so it matches the mockup preview's layout and interactions, backed by a mocked/in-memory store so the screen is non-throwaway and ready to swap onto real domain + persistence later.

## User Stories
- As a puppy owner, I want Today to show the selected day's routine (habits, adventures, and summary) so I can follow and log my puppy's daily activities.
- As a developer, I want the Today UI built from reusable components and a stable UI-facing store contract so we can replace mocked data with real repositories without rewriting the screen.
- As a multilingual user, I want all UI chrome text in Today localized (en/fr/nl) so the app never shows untranslated interface strings.

## Specific Requirements
**Mockup-to-RN screen map**
- Treat `agents/documentation/mockups/mockup-preview/src/` as the source of truth for Today layout + interactions.
- Implement a RN Today screen to replace the existing placeholder under `mobile/src/ui/screens/` (e.g., `mobile/src/ui/screens/TodayScreen.tsx` or similar), matching the mockup structure: selected-date header, guideline chips, daily summary, habits strip, and adventures list.
- Colocate Today-specific components under a feature folder, for example: `mobile/src/ui/features/today/*` (e.g., `TodayHeader.tsx`, `GuidelineChips.tsx`, `HabitStrip.tsx`, `AdventureList.tsx`).

**Reusable UI component architecture**
- Compose Today from small, reusable components: `Card`, `IconButton`, `Chip`, `SectionHeader`, `ListRow`, `EmptyState`, and a modal sheet/page layout.
- Prefer feature-local components for view-specific pieces and keep truly shared primitives in `mobile/src/ui/components/`.
- Use theme tokens and typography from `mobile/src/ui/theme/*` for colors/spacing/fonts; avoid hardcoding values in Today components.
- Ensure touch targets are finger-friendly (~44px) and add `accessibilityLabel` to icon-only controls in Today header and habit controls.

**In-memory UI store contract**
- Expose Today-facing state/actions via the global in-memory store (Zustand) used across the app; keep mock data import behind a seed/adapter layer.
- Required UI state for Today:
  - `activePuppyId` (string)
  - `selectedDate` (ISO date string)
  - `todayPlanItems` (list of habit/adventure descriptors for the selected date)
  - `habitSessions` (records of logged habit sessions for the selected date)
  - `adventureAttempts` (attempts/reflections for adventures listed in Today)
- Required UI actions for Today:
  - `selectPuppy(id)`
  - `setSelectedDate(date)`
  - `toggleHabitSession(habitId, options?)` (start/stop or mark complete; opens duration/potty flows when needed)
  - `logPottyEvent(sessionId, details)`
  - `addAdventureToDate(activityId, date)` / `removeAdventureFromDate(activityId, date)`
  - `updateAdventureReflection(adventureAttemptId, reflectionText)`
  - `openModal(modalName, payload)` / `closeModal()`
- Keep computed/derived values (e.g., daily summary totals, mastery markers, in-today flags) in selectors to avoid duplication across Today components.
...
```
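To make the "stable UI-facing store contract" from that spec concrete, here is a sketch of it as plain TypeScript interfaces. The names come from the spec itself; the concrete field and payload types are illustrative assumptions, not Puppypal's real definitions:

```typescript
// A guess at the Today-facing store contract, written as plain
// TypeScript interfaces. Field and payload types are hypothetical.
interface TodayState {
  activePuppyId: string;
  selectedDate: string; // ISO date string, e.g. "2024-03-01"
  todayPlanItems: Array<{ id: string; kind: "habit" | "adventure" }>;
  habitSessions: Array<{ habitId: string; durationMin?: number }>;
  adventureAttempts: Array<{ activityId: string; reflection?: string }>;
}

interface TodayActions {
  selectPuppy(id: string): void;
  setSelectedDate(date: string): void;
  toggleHabitSession(habitId: string, options?: { durationMin?: number }): void;
  logPottyEvent(sessionId: string, details: { note?: string }): void;
  addAdventureToDate(activityId: string, date: string): void;
  removeAdventureFromDate(activityId: string, date: string): void;
  updateAdventureReflection(adventureAttemptId: string, reflectionText: string): void;
  openModal(modalName: string, payload?: unknown): void;
  closeModal(): void;
}
```

Pinning the contract down as types first means the UI can be built against mocks and later rewired to real repositories without touching the components, which is exactly what the spec's second user story asks for.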
This is pretty extensive and information-heavy, so while SDD is an amazing tool, it is not necessarily the best approach for every task you throw at your agent. Perhaps you want to simply change the color of a button, and then you might just default to vibe-coding it. But when you’re tackling complex changes, SDD is a very reliable way of doing that.
As a rule of thumb, I formulate goals as specs when it generally requires a bit of in-depth analysis or understanding, maybe it affects multiple layers or code files like business logic, or touches data models and involves writing migrations.
Splitting it in Tasks
The final step before your agent starts working towards its goal is to break the spec into a list of concrete tasks. A good task is specific enough that the agent can complete it in a focused way but still high-level enough to have meaning on its own.
For example, writing a function that implements some business logic is a task; writing one line of that function is not.
For the `spec.md` above, this is what the `task.md` breakdown looks like:
```markdown
# Task List: Today View

## Overview
Implement the Today screen and its directly related store and modal flows so the Today tab matches the mockup preview in layout and core interactions. This file lists all tasks in sequence.

- [ ] Verify minimal dependency surface and ensure no new unapproved native deps are required; if a native dep is needed, install via `npx expo install`.
- [ ] Ensure `import 'react-native-gesture-handler';` is present in `mobile/index.ts` only if gesture-driven interactions are necessary.
- [ ] Add or extend `mobile/src/store/useAppStore.ts` to expose Today-facing state/actions: `activePuppyId`, `selectedDate`/`dateKey`, `todayPlanItems`, `habitSessions`, `adventureAttempts`, and actions `setSelectedDate`, `selectPuppy`, `toggleHabitSession`/`startHabitSession`/`stopHabitSession`, `logPottyEvent`, `addActivityToDate`, `removeActivityFromDate`, `updateAdventureReflection`, `openModal`, `closeModal`.
- [ ] Add `mobile/src/store/types.ts` entries if needed for Today view-model stability (puppy, habit, activity, session, attempt types).
...
- [ ] Implement `mobile/src/ui/screens/HabitDurationScreen.tsx` modal: capture duration, notes, potty details, and update `habitSessions` in store; ensure keyboard-safe layout.
- [ ] Implement `mobile/src/ui/screens/AddActivityScreen.tsx` modal: minimal search/add flow (use `fuse.js` if available) and add-to-date action.
- [ ] Implement `mobile/src/ui/screens/CustomActivityScreen.tsx` modal: simple form to create and add a custom activity to selected date (use `react-hook-form` if available, otherwise a controlled form).
- [ ] Implement `mobile/src/ui/screens/ActivityDetailScreen.tsx` modal: show activity details and allow adding/editing reflection; updates should be visible in Today after close.
- [ ] Register modal routes in `mobile/src/navigation/RootNavigator.tsx` and types in `mobile/src/navigation/types.ts` only if the existing navigation does not already expose these modals; prefer reusing current modal registration.
- [ ] Wire all Today UI interactions to the store actions/selectors (date navigation, puppy switch, habit session updates, add/remove activity, reflection updates) and ensure per-puppy isolation.

## Verification Commands
- `cd mobile && npm run typecheck`
- Manual: start app (`cd mobile && npm start`) and run the manual smoke checks listed above.

## Notes
- Default interaction rule: duration/potty habits open the Habit Duration modal; simple check-off habits toggle completion inline.
- Use `date-fns` `format(date, 'yyyy-MM-dd')` for `dateKey` to avoid timezone off-by-one issues.
- Do not import mockup preview assets directly; place any required fixture images under `mobile/assets/mockup/` and resolve with a static `getImageSource` helper.
```
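The timezone note in that task list is worth unpacking: `format(date, 'yyyy-MM-dd')` from date-fns reads the local calendar parts of the date, while `Date.prototype.toISOString()` converts to UTC first. A minimal sketch of the difference (the `dateKey` helper below mirrors what the date-fns call produces; it is not date-fns itself):

```typescript
// A hand-rolled equivalent of date-fns format(date, 'yyyy-MM-dd'):
// it reads the *local* calendar parts, so a session logged at 23:30
// stays on the day the user actually saw.
function dateKey(d: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
}

// The tempting shortcut is UTC-based and shifts the date for part of
// every day in any non-UTC timezone (e.g. 23:30 in UTC-5 is 04:30 the
// *next* day in UTC):
const naiveKey = (d: Date) => d.toISOString().slice(0, 10);

const lateEvening = new Date(2024, 2, 1, 23, 30); // March 1st, 23:30 local
console.log(dateKey(lateEvening)); // → "2024-03-01" in every timezone
// naiveKey(lateEvening) yields "2024-03-02" anywhere west of UTC.
```

This is exactly the kind of subtle, hard-to-test-manually bug that is worth spelling out in the task list rather than hoping the agent avoids it on its own.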
When asking your agent to generate this plan, there are a lot of important things for it to take into account — the code it expects to edit for each step, what existing code can it reuse, how will it verify that its changes work and haven’t broken things, and so on. Those are the same questions you would ask yourself as a developer, and forcing the agent to think about them usually makes what it delivers way higher quality.
Verify and Correct
For me, this is arguably the most important part of this flow. Agents, like humans, rarely come up with a perfect solution off the bat. Defining your goal, breaking it down into tasks, and most importantly implementing the changes is error-prone, and pushing your agent to check what it has done and correct it if it identifies issues is very important.
It’s a feedback loop where:
- An agent starts implementing a task
- It runs tests & linters during edits and fixes failures
- It finishes the task and requests a review
- Another agent reads what was implemented and identifies issues
- The main agent reads the identified issues and tackles them
- Repeat until no critical issues are found
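The loop above can be sketched as plain control flow. The callbacks here are stand-ins for whatever your agent CLI, test runner, and reviewer agent actually are; this is a hypothetical harness, not OpenCode's or Claude Code's API:

```typescript
// Hypothetical verify-and-correct harness. In real use, `agent` would
// drive your coding agent, `checksPass` would run tests + linters, and
// `review` would ask a second agent for important findings on the diff.
type Agent = (prompt: string) => void;

function implementWithReview(
  agent: Agent,
  checksPass: () => boolean,
  review: () => string[],
  maxRounds = 3,
): string[] {
  agent("implement the task");
  for (let round = 0; round < maxRounds; round++) {
    if (!checksPass()) {
      agent("fix the failing tests and lint errors"); // step 2 of the loop
      continue;
    }
    const findings = review(); // steps 3-4: a second agent reviews the work
    if (findings.length === 0) return []; // no critical issues left: done
    agent(`address these review findings:\n${findings.join("\n")}`); // step 5
  }
  return review(); // findings still open after maxRounds, left for the human
}
```

The round cap matters in practice: without something like `maxRounds`, two agents can ping-pong indefinitely over style nits, so the loop should only repeat on findings the reviewer marks as important.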
If you weren’t already convinced that automated tests and other automated checks are important, I hope this gives you a push to set them up properly for your AI agent.
Review
We’re not yet at the point where you can simply leave everything up to the agent, so once the tasks are done, you do a final review. Because the agent has been verifying and correcting its work, this review is usually much less painful than if you’re reviewing its “first draft”, but you still have to validate the changes. It might have implemented some business logic wrong, maybe something was not clear in the spec, maybe it missed a bug, you never know.
Trying It
If you are already playing with agents or you’re interested in trying what you’ve read about, I would encourage you to pick one small project you care about and try using the flow from this article for a few specs end to end:
- Install and set up OpenCode and Agent OS, and read a bit about how Agent OS works and is used here
- Set up a git repository for your project
- Ask your agent to run `plan-product.md` to create your product and tech stack documentation
- When you have a concrete change in mind, ask your agent to run `shape-spec` to shape your first spec, then `write-spec` once you are happy with the goal you have described
- Create the task list with `create-tasks` and kick off the implementation with `implement-tasks`
- In a new OpenCode window (or by tagging the `@general` agent from the main chat), ask it to review the current git changes and split its findings into important and minor findings
- Pass the review to your main agent, and repeat until there are no important issues found or you are satisfied with the result
I use this flow for my work every day, but the project that made me dive into it a bit deeper was Puppypal, and if you want to read a bit more about it or just look at some cute puppies, head over to the article below.
The Path Forward
I do not think there is a single right way to do AI-assisted development. Different teams and codebases will need different approaches, and there is a real learning curve to developing with AI.
For me, applying the workflow described in this article has changed what I expect from coding agents. It delivers solid results, works across many kinds of problems, and fits really well into both my day job and my side projects.
None of these methods replaces the basics; they depend on them. Domain knowledge, product sense, and engineering craft still carry most of the weight. Agents help you move faster and explore more options, but they lean on that foundation to do anything useful. If you have no idea what you are doing, how can you automate your craft?
Thanks for reading and have a great day!