MiniMax M3 Decodes 1M Tokens 15x Faster — and It Shouldn’t Be This Cheap

Last Updated on June 3, 2026 by Editorial Team. Author(s): Chew Loong Nian, AI Engineer. Originally published on Towards AI.

On June 1, a Shanghai lab quietly shipped a model that decodes a 1-million-token context 15.6x faster than its own previous generation, and charges you roughly 8% of what Claude Opus costs to do it. I spent two days poking at MiniMax M3 through the API, and the part that actually rewired how I thin...
