Latest open models (#15): It’s Qwen's world and we get to live in it, on CAISI's report, & GPT-OSS update
interconnects.ai·9h

Before getting into the latest artifacts, there are a couple of crucial pieces of open ecosystem news we have to cover.

First, the Center for AI Standards and Innovation (CAISI) released a report that surveyed the ecosystem and evaluated DeepSeek V3.1 against leading closed models. Some of the evaluation scores they highlighted diverge from results accepted in the community. While MMLU-Pro, GPQA and HLE are close to DeepSeek's self-reported scores and within usual error bars[1], the SWE-bench Verified scores are off by a wide margin due to a wea…
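To make "within usual error bars" concrete, here is a minimal sketch. It assumes each benchmark is scored as plain accuracy over independent questions (a simple binomial model). The function names `accuracy_stderr` and `within_error_bars` are hypothetical, and the scores in the usage note are made-up values, not CAISI's or DeepSeek's numbers. The report may define error bars differently, for example as variance across repeated runs.

```python
import math


def accuracy_stderr(accuracy: float, n_questions: int) -> float:
    """Standard error of a benchmark accuracy, assuming a binomial model
    (each question is an independent pass/fail trial)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if n_questions <= 0:
        raise ValueError("n_questions must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / n_questions)


def within_error_bars(score_a: float, score_b: float,
                      n_questions: int, z: float = 1.96) -> bool:
    """Hypothetical helper: True if two accuracies on the same benchmark
    differ by less than z standard errors of their difference.
    Treats the two evaluations as independent, which is a simplification."""
    se_a = accuracy_stderr(score_a, n_questions)
    se_b = accuracy_stderr(score_b, n_questions)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # SE of (score_a - score_b)
    return abs(score_a - score_b) < z * se_diff
```

For example, `within_error_bars(0.80, 0.78, 200)` returns `True`: a 2-point gap is small next to the roughly ±8-point 95% margin on the difference for a 200-question benchmark. Comparing both models question by question (a paired test) would give tighter bounds than this independent-runs approximation.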
