🎭 Mixture of Experts - moyutianzun · Scour

On-device AI agents hit a hard memory limit. Apple's new architecture routes around it.

🤖agentic system

venturebeat.com·

Google's latest DiffusionGemma open AI model comes with a 4x speed boost

🔄Transformers News

arstechnica.com·

NVIDIA Nemotron 3 Ultra now available on Amazon SageMaker JumpStart

🤖agentic system Blog

aws.amazon.com·

SHAPE: Coalition-Aware Expert Pruning for Sparse Mixture-of-Experts LLMs

📊LLM Evaluation Academic

Nvidia’s best model is now live

🔄Transformers

thenewstack.io·

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

⚡CUDA Blog

blogs.nvidia.com·

Re-quantizing a local LLM 14x faster by skipping the tensors that didn't change

⚡Inference Optimization News Blog

andreaborio.substack.com··Substack

Apple rebuilt its on-device AI stack at WWDC 2026

⚡Inference Optimization Blog

ziraph.com··Hacker News

Google open-sources speedy DiffusionGemma text diffusion model

🔄Transformers

siliconangle.com·

LLM Research Papers: The 2026 List (January to May)

⚡Inference Optimization News

magazine.sebastianraschka.com··Hacker News

Routing-Aware Expert Calibration for Machine Unlearning in Mixture-of-Experts Language Models

🎛️Fine-Tuning Academic

Microsoft faces scrutiny over clean data claims for MAI-Thinking-1

📊LLM Evaluation

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

⚡Inference Optimization Blog

mimo.xiaomi.com··Hacker News, r/LocalLLaMA

DiffusionGemma: The Developer Guide

💾KV Cache Blog

developers.googleblog.com·

Microsoft Reduces OpenAI Dependency With In-House Frontier Models

📊LLM Evaluation News

hothardware.com·

STAR: Rethinking MoE Routing as Structure-Aware Subspace Learning

🔧MLIR Academic

A system programmer’s guide to LLM inference

⚡Inference Optimization Blog

blog.xiangpeng.systems··Hacker News

DiffusionGemma is Google’s fastest AI yet, but it comes with a big trade-off

androidauthority.com·

Deep X XM2 NPU: 80 TOPS Generative AI Accelerator at 5W

🔲TPU Architecture

armdevices.net·

Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training

🤖agentic system Academic
