Show HN: New Benchmark from SWE-bench team is 0% solved
ProgramBench evaluates whether language models can rebuild programs from scratch.