Google has revealed it’s ported around 30,000 of its production packages to the Arm architecture and plans to convert them all so it can run workloads on both its own Axion silicon and x86 processors.
The search and ads giant documented its move in a preprint paper published last week, titled “Instruction Set Migration at Warehouse Scale”, and in a Wednesday post that reveals YouTube, Gmail, and BigQuery already run on both x86 and its Axion Arm CPUs – as do around 30,000 more applications.
Both documents explain Google’s migration process, which engineering fellow Parthasarathy Ranganathan and developer relations engineer Wolff Dobson said started with an assumption “that we would be spending time on architectural differences such as floating point drift, concurrency, intrinsics such as platform-specific operators, and performance.”
“At first, we migrated some of our top jobs like F1, Spanner, and Bigtable using typical software practices, complete with weekly meetings and dedicated engineers,” the pair wrote. “In this early period, we found evidence of the above issues, but not nearly as many as we expected. It turns out modern compilers and tools like sanitizers have shaken out most of the surprises.”
Google’s devs instead spent most of their time on chores the pair listed as follows:
- Fixing tests that broke because they overfit to our existing x86 servers;
- Updating intricate build and release systems, usually for our oldest and highest-traffic services;
- Resolving rollout issues in production configurations;
- Taking care to avoid destabilizing critical systems.
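As a hedged illustration of the first chore, here is a minimal Python sketch of what a test that overfits to x86 might look like, next to an architecture-neutral version. None of this comes from Google's post or paper: the helper names and the set of supported architectures are hypothetical.

```python
import platform

# Assumption: illustrative mapping only, not taken from Google's post or paper.
KNOWN_ARCHES = {
    "x86_64": "x86",
    "amd64": "x86",
    "aarch64": "arm",
    "arm64": "arm",
}


def arch_family(machine: str) -> str:
    """Map a platform.machine() style string to a coarse architecture family."""
    return KNOWN_ARCHES.get(machine.lower(), "unknown")


def overfit_check(machine: str) -> bool:
    # Brittle: passes only for the exact machine string of an x86-64 Linux host.
    return machine == "x86_64"


def portable_check(machine: str) -> bool:
    # Portable: accepts any architecture family the (hypothetical) service targets.
    return arch_family(machine) in {"x86", "arm"}


# Report what the current host looks like to both checks.
host = platform.machine()
print(host, overfit_check(host), portable_check(host))
```

On an Arm host the first check fails even though nothing is wrong with the software under test, which is the kind of breakage the first bullet describes.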
The post and paper detail work on 30,000 applications, a collection of code sufficiently large that Google pressed its existing automation tools into service – and then built a new AI tool called “CogniPort” to do things its other tools could not.
“CogniPort operates on build and test errors,” Ranganathan and Dobson wrote. “If at any point in the process, an Arm library, binary, or test does not build or a test fails with an error, the agent steps in and aims to fix the problem automatically. As a first step, we have already used CogniPort’s Blueprint editing mode to generate migration commits that do not lend themselves to simple changes.”
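The loop the pair describes, in which an agent reacts to build and test failures, can be sketched roughly as follows. This is not CogniPort's code or API, which the post does not show: `StepResult`, `migrate_target`, `propose_fix`, and `max_fixes` are hypothetical names, and the retry limit is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    ok: bool       # did the build or test step succeed?
    log: str = ""  # error output handed to the fixer on failure


def migrate_target(
    steps: list[Callable[[], StepResult]],
    propose_fix: Callable[[str], bool],
    max_fixes: int = 3,
) -> bool:
    """Run each step in order; on failure, ask the fixer for a change and retry.

    Hypothetical sketch: propose_fix stands in for an agent that edits the
    code and returns True if it applied a change worth retrying.
    """
    for step in steps:
        fixes = 0
        result = step()
        while not result.ok:
            # Give up when the fixer has no idea or the retry budget is spent.
            if fixes >= max_fixes or not propose_fix(result.log):
                return False
            fixes += 1
            result = step()
    return True
```

A caller would pass one callable per stage, for example a build step followed by a test step, and treat a `False` result as a package that still needs a human.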
Google found the agent succeeded about 30 percent of the time under certain conditions, and did best on test fixes, platform-specific conditionals, and data representation fixes.
That’s not an enormous success rate, but Google has at least another 70,000 packages to port.
The company’s aim is to finish the job so its famed Borg cluster manager – the inspiration for Kubernetes – can allocate internal workloads in ways that efficiently utilize Arm servers.
Doing so will likely save money, because Google claims its Axion-powered machines deliver up to 65 percent better price-performance than x86 instances, and can be 60 percent more energy-efficient.
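To see what such percentages could mean for a bill, here is a small back-of-the-envelope sketch. It assumes "price-performance" means work done per dollar and "energy-efficient" means work done per unit of energy, which Google's claim as quoted here does not spell out, and it takes the "up to" figures at face value. The function name is hypothetical.

```python
def relative_cost(improvement: float) -> float:
    """Cost of the same work on the better platform, as a fraction of the baseline.

    Assumption: 'X percent better' means (1 + X) times as much work per unit
    of cost, so the same work needs 1 / (1 + X) of the baseline cost.
    """
    if improvement <= -1.0:
        raise ValueError("improvement must be greater than -100 percent")
    return 1.0 / (1.0 + improvement)


# Best-case figures quoted in the article.
price_fraction = relative_cost(0.65)   # 65 percent better price-performance
energy_fraction = relative_cost(0.60)  # 60 percent more energy-efficient

print(f"cost for the same work: {price_fraction:.1%} of baseline")
print(f"energy for the same work: {energy_fraction:.1%} of baseline")
```

Under those assumptions, a 65 percent improvement in price-performance translates to a saving of roughly 39 percent for the same work, not 65 percent.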
Those numbers, and the scale of Google’s code migration project, suggest the web giant will need fewer x86 processors in years to come. ®