What 1,000+ Harness Experiments Taught Me About Self-Improving Agents
Project Repository: So I recently wanted to see whether an AI agent could self-improve a harness to solve Terminal-Bench tasks. To align on definitions, “harness” means the system (e.g. Claude Code, Codex, the ChatGPT web interface) that wraps around the model (e.g. GPT 5.5, Claude Opus 4.7)...