Audit AI-Generated Tests: Half of Green CI Proves Nothing (opens in new tab)
To audit AI-generated tests, score how many mirror the code instead of checking it. Green CI proves your tests agree with the code, not that it is correct — and when one agent writes both, they often just mirror it. mirror_audit.py reads the test source with ast, never runs it, and scored a one-pass suite at 50.0%, exit 1. AI disclosure: I drafted this with an AI writing assistant. The tool, the three fixtures, and every number below come from a real local run on Python 3.13.5, stdlib only. I...
Read the original article