Key takeaways:
➤ Impressive performance on agentic tasks: @Kimi_Moonshot’s Kimi K2.5 achieves an Elo of 1309 on our GDPval-AA evaluation, behind only OpenAI and Anthropic models. Kimi K2.5 leaps ahead of GLM-4.7, DeepSeek V3.2 and Gemini 3 Pro. GDPval-AA is our leading metric for general agentic performance, measuring how models perform on realistic knowledge work tasks such as preparing presentations and analysis. Models are given shell access and web browsing capabilities in an agentic loop via our reference agentic harness, Stirrup.
➤ Native multimodality for the first time: Kimi K2.5 is the first flagship model from Moonshot to support multimodal (image and video) inputs. This is the first time that the leading open weights model has supported image input, removing a critical barrier to the adoption of open weights models compared to proprietary models from the frontier labs. It represents significant differentiation for Kimi K2.5 compared to other open weights leaders including DeepSeek V3.2, GLM-4.7, MiniMax M2.1 and MiMo-V2-Flash. Kimi K2.5 scores 75% on the MMMU Pro visual reasoning benchmark, slightly behind Gemini 3 Pro but in line with GPT-5.2 and Claude Opus 4.5.
➤ Moderate cost to run Artificial Analysis Intelligence Index: Kimi K2.5 costs $371 to run the Artificial Analysis Intelligence Index, less than a quarter of the cost of Claude Opus 4.5 and GPT-5.2, but more than 5x the cost of DeepSeek V3.2 and gpt-oss-120b.
➤ Moderate token usage: Kimi K2.5 demonstrates token usage comparable to other models in the same intelligence tier, …
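To make the Elo and cost figures in these takeaways easier to interpret, here is a small illustrative sketch. It assumes GDPval-AA uses the conventional logistic Elo model with a 400-point scale (the post does not say which variant is used), and the 100-point rating gap in the example is hypothetical, not a reported result. The cost helper simply turns the "less than a quarter of the cost" and "more than 5x the cost" wording into explicit dollar bounds from the $371 figure. The function names are illustrative and not part of any Artificial Analysis tooling.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B (win probability, ties counted as half)
    under the standard logistic Elo model. The 400-point scale is an assumption."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def implied_cost_bounds(cost_usd: float, cheaper_factor: float,
                        pricier_factor: float) -> tuple[float, float]:
    """Turn relative cost claims into absolute bounds.

    'less than 1/cheaper_factor of X's cost'   -> X costs more than cost_usd * cheaper_factor
    'more than pricier_factor x the cost of Y' -> Y costs less than cost_usd / pricier_factor
    Returns (lower bound on X's cost, upper bound on Y's cost).
    """
    return cost_usd * cheaper_factor, cost_usd / pricier_factor


if __name__ == "__main__":
    # Equal ratings imply a 50% expected score.
    print(elo_expected_score(1309, 1309))  # 0.5
    # Hypothetical: a 100-point edge implies roughly a 64% expected score.
    print(round(elo_expected_score(1309, 1209), 3))  # 0.64
    # $371 for Kimi K2.5 -> Opus 4.5 / GPT-5.2 above $1484, DeepSeek V3.2 / gpt-oss-120b below $74.20.
    lower, upper = implied_cost_bounds(371.0, 4.0, 5.0)
    print(lower, upper)  # 1484.0 74.2
```

Under this model only rating differences matter, so an Elo of 1309 is meaningful relative to the other models on the same leaderboard rather than as an absolute score.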