Show HN: New Benchmark from SWE-bench team is 0% solved
ProgramBench evaluates whether language models can rebuild programs from scratch.