DeepSWE v1.1
DeepSWE measures frontier coding agents on original, long-horizon software engineering tasks.