Show HN: New Benchmark from SWE-bench team is 0% solved
🤖 Developer productivity, AI-assisted coding, workflow automation
ProgramBench evaluates whether language models can rebuild programs from scratch.