not much happened today | AINews
**NVIDIA** released **Nemotron 3 Ultra**, a fully open **550B MoE** model with **55B active parameters** and a **1M context** window, optimized for long-running agent tasks with up to **5x speedup** and **30% cost reduction**. It features hybrid Mamba/attention, LatentMoE, and native MTP, and was pretrained on **20T tokens** using the NVFP4 low-precision format. Benchmarks show a **47.7 Intelligence Index** score and **400+ output tokens/sec**. The model is supported across major serving platf...
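To put the headline figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted in the summary (550B total parameters, 55B active, 400+ output tokens/sec). The function names are hypothetical, and the throughput is treated as a constant lower bound, which real serving setups will not match exactly.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a mixture-of-experts model's parameters used per token."""
    if total_params <= 0 or not 0 <= active_params <= total_params:
        raise ValueError("expected 0 <= active_params <= total_params, total_params > 0")
    return active_params / total_params


def seconds_to_generate(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to emit num_tokens at a constant output rate (an idealized assumption)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    # Figures quoted in the summary: 550B total, 55B active parameters.
    frac = active_fraction(550e9, 55e9)
    print(f"active share per token: {frac:.0%}")

    # Illustrative workload (assumed): a 10,000-token agent response at 400 tokens/sec.
    print(f"time for 10,000 tokens: {seconds_to_generate(10_000, 400):.1f} s")
```

Under these assumptions, about one tenth of the weights participate in each token, which is where sparse MoE models get much of their speed and cost advantage over dense models of the same total size.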