While Big Tech burns billions on bigger models, DeepSeek solved Transformer instability with elegant architecture, and the results are stunning
Training frontier AI models now costs $100M+ per run. Performance gains are plateauing. The industry’s solution? Throw more money at the problem.
DeepSeek has a better idea.
Recently I was listening to Dwarkesh Patel’s podcast with Ilya Sutskever, co-founder of OpenAI and co-author of AlexNet. In the conversation, Ilya called out the industry practice of spending trillions on compute and scaling to build the next state-of-the-art model, arguing that this needs to stop. Instead, he advocated returning to fundamental research on neural network architecture itself. He also discussed how human generalization emerges from evolutionary advantages embedded in our DNA.
Rather than burning cash on brute-force scaling, we need to improve our systems’ architecture. This is exactly where DeepSeek’s mHC breakthrough enters the picture.
What caught my attention was not just the critique but the direction of the solution.