🎭 Mixture of Experts - moyutianzun · Scour

DiffusionGemma is Google’s fastest AI yet, but it comes with a big trade-off

androidauthority.com·

Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training

🤖agentic system Academic

Google’s Sergey Brin Sees A Path To AGI But Not Beyond It via @sejournal, @martinibuster

🔄Transformers

searchenginejournal.com·

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

💾KV Cache News

newsletter.semianalysis.com··Hacker News

AI Week in Review 26.06.06

🤖agentic system News Blog

patmcguinness.substack.com··Substack

Google's new open model DiffusionGemma generates text from noise instead of word by word

🔄Transformers

the-decoder.com·

TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts

↩️Backpropagation Academic

Introducing the Third Generation of Apple’s Foundation Models

🔄Transformers

machinelearning.apple.com··Hacker News, r/apple

Qualcomm Announces On-Device AI Claw Ecosystem Plan

🤖agentic system

autonews.gasgoo.com·

From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models

📊LLM Evaluation Academic

Dnotitia Releases DNA 3.0, an Enterprise-Ready AI Language Model Family - HPCwire

🔄Transformers

PADD: Path-Aligned Decompression Distillation for Non-Router Teacher to Guide MoE Student Learning

⚡Inference Optimization Academic

Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good

⚡Inference Optimization Blog

towardsai.net·

Startup Ricursive to Create an End-to-End AI Model for Chip Design

🔲TPU Architecture News

Enhancing Multilingual LLM-based ASR with Mixture of Experts and Dynamic Downsampling

🎛️Fine-Tuning Academic

MosaicIMU: Composing Carrier Experts for Generalizable Neural Inertial Odometry

⚡FlashAttention Academic

Sakana AI's Recursive Self-Improvement (RSI) Lab

🤖agentic system

sakana.ai··Hacker News

FAME: Forecastability-Aware Mixture of Experts for Heterogeneous Time Series Forecasting

⚡Inference Optimization Academic

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

⚡Inference Optimization News Blog

kaitchup.substack.com··r/LocalLLaMA

NGram-MoSE: Efficient Remote Sensing Super-Resolution via N-Gram Context and Mixture-of-Experts

🔄Transformers Academic
