Perplexity shows how to run monster AI models more efficiently on aging GPUs, AWS networks
theregister.com·15h

AI search provider Perplexity’s research wing has developed a set of software optimizations that allows trillion-parameter-class models to run efficiently across older, cheaper hardware using a variety of existing network technologies, including Amazon’s proprietary Elastic Fabric Adapter.

These innovations, detailed in a paper published this week and released on GitHub for further scrutiny, present a novel approach to one of the biggest challenges in serving large-scale mixture-of-experts (MoE) models: memory and network latency.

Mo parameters, mo problems

MoE models, like DeepSeek V3 and R1 or Moonshot AI’s Kimi K2, are big, ranging from 671 billion to 1 trillion p…
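To make the scale problem concrete, here is a minimal sketch of the top-k routing that defines an MoE layer. This is illustrative only, not Perplexity's implementation: the gate, the toy experts, and the function names are all assumptions. It shows why MoE compute is cheap per token while the full parameter set still has to live somewhere, which is where memory and network latency bite.

```python
# Minimal top-k MoE routing sketch (hypothetical; not from the paper).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k experts by gate score and mix their outputs.

    Only top_k experts run per token, so compute stays modest even when the
    total parameter count (all experts combined) is in the hundreds of
    billions -- but every expert's weights must still be resident somewhere,
    hence the memory and network pressure the article describes.
    """
    # Gate: a linear score per expert, normalized with softmax.
    scores = softmax([sum(w * xi for w, xi in zip(ws, x)) for ws in gate_weights])
    # Keep only the top_k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Renormalize the surviving gate scores and mix the expert outputs.
    norm = sum(scores[i] for i in chosen)
    out = 0.0
    for i in chosen:
        out += (scores[i] / norm) * experts[i](x)
    return out, chosen

# Toy example: 4 "experts", each a trivial scalar function of the input.
experts = [
    lambda x: sum(x),           # expert 0
    lambda x: max(x),           # expert 1
    lambda x: min(x),           # expert 2
    lambda x: sum(x) / len(x),  # expert 3
]
gate_weights = [[0.1, 0.2], [0.9, 0.1], [0.0, 0.0], [0.3, 0.3]]
out, chosen = moe_forward([1.0, 2.0], experts, gate_weights, top_k=2)
```

In a real deployment the experts are large feed-forward blocks sharded across many GPUs, so the gate's choice of experts also determines which machines a token's activations must travel to.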
