🎭 Mixture of Experts
Specific
MoE Architecture, Sparse Models, Gating Networks, Model Scaling
Scoured 94 posts in 6.7 ms

On-device AI agents hit a hard memory limit. Apple's new architecture routes around it.
🤖 agentic system
venturebeat.com · 1 day ago

Google's latest DiffusionGemma open AI model comes with a 4x speed boost
🔄 Transformers · Content type: News
arstechnica.com · 14 hours ago

NVIDIA Nemotron 3 Ultra now available on Amazon SageMaker JumpStart
🤖 agentic system · Content type: Blog
aws.amazon.com · 6 days ago

SHAPE: Coalition-Aware Expert Pruning for Sparse Mixture-of-Experts LLMs
📊 LLM Evaluation · Content type: Academic
arxiv.org · 1 day ago

Nvidia’s best model is now live
🔄 Transformers
thenewstack.io · 6 days ago

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI
⚡ CUDA · Content type: Blog
blogs.nvidia.com · 18 hours ago

Re-quantizing a local LLM 14x faster by skipping the tensors that didn't change
⚡ Inference Optimization · Content type: News, Blog
andreaborio.substack.com · 21 hours ago · Substack

Apple rebuilt its on-device AI stack at WWDC 2026
⚡ Inference Optimization · Content type: Blog
ziraph.com · 2 days ago · Hacker News

Google open-sources speedy DiffusionGemma text diffusion model
🔄 Transformers
siliconangle.com · 9 hours ago

LLM Research Papers: The 2026 List (January to May)
⚡ Inference Optimization · Content type: News
magazine.sebastianraschka.com · 4 days ago · Hacker News

Routing-Aware Expert Calibration for Machine Unlearning in Mixture-of-Experts Language Models
🎛️ Fine-Tuning · Content type: Academic
arxiv.org · 1 day ago

Microsoft faces scrutiny over clean data claims for MAI-Thinking-1
📊 LLM Evaluation
4sysops.com · 5 days ago

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS
⚡ Inference Optimization · Content type: Blog
mimo.xiaomi.com · 3 days ago · Hacker News, r/LocalLLaMA

DiffusionGemma: The Developer Guide
💾 KV Cache · Content type: Blog
developers.googleblog.com · 1 day ago

Microsoft Reduces OpenAI Dependency With In-House Frontier Models
📊 LLM Evaluation · Content type: News
hothardware.com · 4 days ago

STAR: Rethinking MoE Routing as Structure-Aware Subspace Learning
🔧 MLIR · Content type: Academic
arxiv.org · 2 days ago

A system programmer’s guide to LLM inference
⚡ Inference Optimization · Content type: Blog
blog.xiangpeng.systems · 3 days ago · Hacker News

DiffusionGemma is Google’s fastest AI yet, but it comes with a big trade-off
💾 KV Cache
androidauthority.com · 4 hours ago

Deep X XM2 NPU: 80 TOPS Generative AI Accelerator at 5W
🔲 TPU Architecture
armdevices.net · 6 days ago

Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training
🤖 agentic system · Content type: Academic
arxiv.org · 6 hours ago