MiniMax M3's New Attention: MiniMax Sparse Attention
Plus more about FlashMemory-DeepSeek-V4, Trajectory-Refined Distillation, Test-Time Gradient Guidance, and End-to-End Context Compression at Scale