Title: Reasoning Models Ace the CFA Exams
Abstract: Previous research has reported that large language models (LLMs) perform poorly on the Chartered Financial Analyst (CFA) exams. However, recent reasoning models have achieved strong results on graduate-level academic and professional examinations across various disciplines. In this paper, we evaluate state-of-the-art reasoning models on a set of mock CFA exams consisting of 980 questions across three Level I exams, two Level II exams, and three Level III exams. Using the same pass/fail criteria as prior studies, we find that most models clear all three levels. The models that pass, ordered by overall performance, are Gemini 3.0 Pro, Gemini 2.5 Pro, GPT-5, Grok 4, Claude Opus 4.1, and DeepSeek-V3.1. Specifically, Gemini 3.0 Pro achieves a record score of 97.6% on Level I. Performance is also strong on Level II, led by GPT-5 at 94.3%. On Level III, Gemini 2.5 Pro attains the highest score on multiple-choice questions with 86.4%, while Gemini 3.0 Pro achieves 92.0% on constructed-response questions.
| Subjects: | Artificial Intelligence (cs.AI); Computation and Language (cs.CL); General Finance (q-fin.GN) |
| Cite as: | arXiv:2512.08270 [cs.AI] (or arXiv:2512.08270v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2512.08270 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Yunzhe Chen
[v1] Tue, 9 Dec 2025 05:57:19 UTC (2,651 KB)