Show HN: New Benchmark from SWE-bench team is 0% solved
ProgramBench evaluates whether language models can rebuild programs from scratch.