Show HN: New Benchmark from SWE-bench team is 0% solved

programbench.com · Covered in 3 articles from 3 sources

Kimi K2.7-Code: open-source coding model with better token efficiency

huggingface.co · Hacker News, r/LocalLLaMA

Latest open artifacts (#21): Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others. On CAISI's V4 assessment.

interconnects.ai

In other languages

New LLM coding benchmark ProgramBench: 9 top models, 200 tasks, 248,000 tests. Fully solved