I built a multi-agent loop where an adversarial Claude reviewer reads your actual codebase before approving plans
Large language models are surprisingly optimistic reviewers. Ask an LLM to review an implementation plan and it will often approve things that are objectively wrong: non-existent file paths, incorrect function signatures, missing edge cases, broken assumptions about the codebase, and incomplete testing strategies.

The problem is simple: the model is reasoning from its training data and the conversation context, not from your actual repository.

I wanted something different. I wanted a reviewer whose de...