Updated LLM Benchmark (Gemini 3 Flash)
I evaluate LLMs by how well they play text adventures. My last update came when Haiku 4.5 was released. Now that Google has released a preview of Gemini 3 Flash, I had to run the benchmark again. Having seen how well Gemini 2.5 Flash performed in earlier benchmarks, I expected Gemini 3 Flash to blow the other models out of the water, which it did … almost.