How we run Gemini at scale across billions of posts

Covers Gemini thinking | Gemini API | Google AI for Developers. Discussed on Hacker News.

Hear from Iván of the Modash Data Engineering team about how they run Gemini across billions of posts in a multi-cloud setup. Learn the cost and throughput optimizations that let them scale LLM inference to millions of new inputs daily without the bill spiraling out of control.

Read the original article