AI search provider Perplexity’s research wing has developed a new set of software optimizations that allows trillion-parameter and larger models to run efficiently across older, cheaper hardware using a variety of existing network technologies, including Amazon’s proprietary Elastic Fabric Adapter.

These innovations, detailed in a paper published this week and released on GitHub for further scrutiny, present a novel approach to one of the biggest challenges in serving large mixture-of-experts (MoE) models at scale: memory and network latency.

Mo parameters, mo problems

MoE models, like DeepSeek V3 and R1 or Moonshot AI’s Kimi K2, are big, ranging from 671 billion to 1 trillion parameters.
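
To see why models of that size run into memory and network limits, a rough back-of-the-envelope sketch helps. The figures below (80 GB of HBM per accelerator, a flat 1-trillion-parameter count, and the listed precisions) are illustrative assumptions, not numbers from Perplexity’s paper:

```python
# Illustrative sketch: weight memory for a ~1-trillion-parameter MoE model
# at different precisions, versus a single accelerator's HBM capacity.
# All values are assumptions for illustration, not figures from the paper.

PARAMS = 1_000_000_000_000            # ~1 trillion parameters (Kimi K2 scale)
BYTES_PER_PARAM = {"fp16/bf16": 2, "fp8": 1, "int4": 0.5}
GPU_HBM_GB = 80                       # assumed per-GPU memory (e.g., an 80 GB card)

for precision, nbytes in BYTES_PER_PARAM.items():
    total_gb = PARAMS * nbytes / 1e9
    gpus_needed = -(-total_gb // GPU_HBM_GB)   # ceiling division
    print(f"{precision}: ~{total_gb:,.0f} GB of weights -> "
          f"at least {gpus_needed:.0f} GPUs just to hold the parameters")
```

Even at aggressive quantization, the weights alone span many GPUs, so every request crosses the interconnect, which is why the network fabric (NVLink, InfiniBand, or AWS's Elastic Fabric Adapter) becomes the bottleneck the new optimizations target.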
