When pytest Said "Passed," It Was Lying (opens in new tab)
How a polluted virtual environment made my green tests meaningless Part of the ForgeFlow series — building a coding agent that runs its execution loop locally on an M5 Max, and writing down what actually breaks. Planning runs on Claude; code generation runs on a local model via Ollama, test-driven inside a Docker sandbox. For a few days, I made decisions on top of a number that wasn't true. The number was 186 passed. It came out of pytest, green, at the bottom of the terminal, the way it had ...
Read the original article