
Show HN: New Benchmark from SWE-bench team is 0% solved

programbench.com · Lobsters, Hacker News, r/singularity

ProgramBench evaluates whether language models can rebuild programs from scratch.


Covered in 3 articles

Kimi K2.7-Code: open-source coding model with better token efficiency

huggingface.co · Hacker News, r/LocalLLaMA

Latest open artifacts (#21): Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others. On CAISI's V4 assessment.

interconnects.ai


In other languages

New coding benchmark for LLMs, ProgramBench: 9 top models, 200 tasks, 248 thousand tests. Fully solved …