We all know the mantra: “Write Once, Run Anywhere.” But for most of Java’s history, that “Anywhere” really meant “anywhere, as long as there’s Intel or AMD underneath.” When most of us were starting out with Java, ARM was something you associated with phones, a Raspberry Pi, or some mysterious “embedded” device – not with a serious backend carrying production traffic in a major cloud.
That’s why Java on ARM is still pretty niche today: hardly any backend developer seriously considered it, because for years… there just wasn’t much to talk about. The journey both ARM processors and the broader JVM ecosystem had to go through to catch up with the needs of the server world has been long and bumpy, full of ugly bugs only discovered in production, hurriedly rolled-back patches, and tons of “invisible” work happening inside virtual machines, JIT compilers, and garbage collectors under the hood.
However, to understand why Java on ARM is now finally starting to make sense – and why it’s worth paying attention to – we first need to look at the evolution that the ARM architecture itself has undergone in recent years.
Although ARM has been with us for a long time and has powered phones, tablets, routers and millions of “smart” gadgets for years, it took two key events to really bring it into the halls of “serious” IT. First – Apple’s move to its own ARM chips in Macs. Overnight, a huge number of developers suddenly had ARM-based primary work machines on their desks, not just toy dev boards.
Second – the arrival of the Neoverse architecture, a family of ARM cores designed from scratch with data centers in mind, not smartphones.
To understand Neoverse, it’s worth starting with Cortex. It’s Cortex cores – in their various A, R and M variants – that sit inside most consumer electronics: from phones and tablets to Raspberry Pi boards. They’re designed under very strict constraints on power, cost and die area, so that SoC vendors can build cheap, energy-efficient chips for battery-powered devices. They’re perfect for the “a few strong cores + GPU + modem on one chip” scenario, but much less suited to servers with hundreds of watts of socket power, massive amounts of memory and full-blown enterprise requirements.
Neoverse is ARM’s answer to the question: “what should a server ARM look like if we stop thinking like phone designers?”. It’s a separate IP line, built from the ground up for infrastructure: high core counts (N1/N2/V2 scaling to hundreds of cores per board), large caches, mesh interconnects, a strong focus on memory bandwidth, virtualization, and RAS (Reliability, Availability, Serviceability) features. In other words – Neoverse is to data centers what Cortex is to smartphones: the basic building block partners like AWS, Ampere and others can use to build their own server CPUs without the compromises typical of mobile cores.
It’s Neoverse that really brought ARM into the cloud. The first generation of AWS Graviton was still built on the well-known Cortex-A72 cores and targeted rather “lighter” workloads. The later generations – Graviton2 on Neoverse N1, Graviton3 on Neoverse V1 and Graviton4 on Neoverse V2 – are fully-fledged, high-performance server processors that in many scenarios beat x86 in terms of price/performance and energy efficiency. In practice, this means that for a large chunk of use cases, “EC2 on ARM” is no longer a curiosity but starts becoming the default option.
The second pillar of this revolution is the independent vendors, such as Ampere with its Altra family of processors, also built on Neoverse N1. These are the chips that power, among others, ARM instances in Oracle Cloud and many other data centers. SoftBank’s acquisition of Ampere Computing for 6.5 billion dollars is a clear signal that ARM CPUs are no longer a niche experiment but a strategic piece of infrastructure for AI and cloud – especially given that SoftBank also controls ARM itself.
The end result is that a huge portion of new cloud-native workloads now land on ARM by default: microservices in Kubernetes, backend services, event-driven systems, applications built on frameworks like Spring Boot or Quarkus. Hyperscalers are aggressively promoting ARM instances with attractive pricing, while managing to deliver 20–40% better price/performance and significant energy savings compared to classic x86.
For companies counting every watt and every dollar on their cloud bill, and at the same time building new services on top of containers and JVMs, the natural question is increasingly not “will ARM work?”, but “why are we still not using ARM instances?”.
And now it’s time to ask… why, exactly, aren’t we? To answer that, we need to look back at the history of Java on ARM.
It’s 2011, Cambridge, UK. Andrew Haley (Red Hat) and Jon Masters (Chief ARM Architect at Red Hat) are sitting in a Thai pub called “The Wrestlers.” At some point Masters drops a bomb: 64-bit ARM (AArch64) is coming, and Red Hat wants to bring Red Hat Enterprise Linux to it. There’s just one tiny problem – there’s no Java on this platform. And without Java there is no enterprise. Haley hears an initial estimate: porting OpenJDK will take two experts about a year of work. The catch? Those experts… don’t exist yet. The team has to learn the ARM architecture on the fly, writing code against simulators before any real silicon ever shows up in a data center.
Long before we could talk about performance, we lived in the age of OpenJDK Zero. The idea was beautiful: write a JVM interpreter in pure C++, without a single line of assembly. That way Java would “run” on anything that had a GCC toolchain – from routers to exotic experimental chips. Reality was brutal: performance was awful. Zero had no JIT (Just-In-Time) compiler, so bytecode was interpreted instruction by instruction. It was like pushing a sports car up a hill – technically it moved, but no one wanted to run that in production.
The real breakthrough came with JEP 237 (Linux/AArch64 Port), which landed in Java 9. That’s when engineers from Red Hat and Linaro rolled up their sleeves and aimed for a full-blown port with C1 and C2 compilers that truly “understand” ARM. The biggest challenge was changing the mental model. x86 (CISC) is brute force: complex instructions that do many things at once. ARM (RISC) is precision and simplicity, with its power distributed differently. C2 had to learn how to use 31 general-purpose registers (x86-64 has only 16). That’s a massive difference – the JVM can keep most “hot” variables directly in CPU registers instead of constantly spilling them out to memory and loading them back, which dramatically changes the performance profile of the whole application.
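If you want to see that register usage for yourself, HotSpot can print the assembly C2 emits for a given method. A minimal sketch (the class and method names are mine; printing the disassembly assumes the hsdis plugin is installed alongside your JDK):

```java
public class HotLoop {
    // Run with:
    //   java -XX:CompileCommand=print,HotLoop::sum HotLoop
    // (the disassembly needs the hsdis plugin; without it HotSpot
    // just warns and skips the dump)
    static long sum(long[] a) {
        long s = 0;
        for (long v : a) {
            s += v;
        }
        return s;
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // Call sum() often enough for C2 to compile it
        long total = 0;
        for (int i = 0; i < 10_000; i++) {
            total += sum(data);
        }
        System.out.println(total);
    }
}
```

On AArch64 the dump shows C2 juggling the x0–x30 general-purpose registers; run the same thing on an x86-64 box and you can compare how it copes with only 16.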
The real performance gains, however, came from intrinsics – the focus of JEP 315 (Improve AArch64 Intrinsics) – where the JVM replaces selected Java methods with hand-written assembly. That’s where we saw both the biggest wins and the most painful failures. On the success side, you have cryptographic instructions (AES) from ARMv8 – suddenly GCM encryption in TLS sped up by 3.5–5x on Graviton2. Similarly, operations on strings (like String.indexOf) started using NEON vector instructions and stopped being such an obvious hotspot in many services. But there were dead ends, too: the attempt to speed up String.equals using NEON turned out to be worth it only for long strings; for the short ones that dominate typical business systems, the overhead of preparing vector registers simply killed the gain. Result: the code ended up in the “rolled back / Won’t Fix” bucket. Even more spectacular was the bug in the Math.log intrinsic (JDK-8210858): for extreme values, logarithms on ARM stopped being monotonic and produced different results than on Intel. In finance or scientific computing, that’s not a “minor discrepancy” – that’s a red alert. In the end, the intrinsic was removed – correctness beat performance.
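You don’t have to take the AES numbers on faith – the intrinsics kick in through the completely standard JCE API, so they’re easy to poke at. A minimal sketch (the class name is mine; the flags are real HotSpot flags):

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class GcmDemo {
    public static void main(String[] args) throws Exception {
        // 256-bit AES key and a random 12-byte GCM nonce
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();
        byte[] nonce = new byte[12];
        new SecureRandom().nextBytes(nonce);

        // On ARMv8 cores with the crypto extensions, HotSpot swaps the
        // hot AES/GHASH loops behind this call for hand-written assembly.
        // Compare timings with -XX:+UseAESIntrinsics (the default) and
        // -XX:-UseAESIntrinsics to see what the intrinsic buys you.
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, nonce));

        byte[] plaintext = new byte[1 << 20]; // 1 MiB of zeros, just for timing
        long start = System.nanoTime();
        byte[] ciphertext = cipher.doFinal(plaintext);
        long elapsed = System.nanoTime() - start;
        System.out.printf("%d bytes in %.2f ms%n", ciphertext.length, elapsed / 1e6);
    }
}
```

(For anything beyond a smoke test, measure with JMH rather than a single doFinal call – one-shot timings like this mostly measure warm-up.)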
Today, when you run Java 21 on AWS Graviton4 or Google Axion, you’re benefiting from more than a decade of those experiments and missteps. Latency-sensitive workloads on ARM64 have improved by several hundred percent compared to baseline Java 8. A modern GC helps a lot (the generational ZGC loves ARM tricks like TBI – Top Byte Ignore), as do well-tuned intrinsics and native support for SVE2 vectors. Then there’s the character of the CPUs themselves: Ampere Altra, Graviton and friends don’t have SMT (what Intel brands Hyper-Threading), so one Java thread is one physical core. The garbage collector doesn’t have to fight your business code for ALUs, which translates into much calmer latency tails – on ARM, tail latency is often noticeably flatter and more predictable than on x86.
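A quick sanity check before you trust any of this on your own fleet – confirm what architecture and collector your JVM is actually using. A minimal sketch (the class name is mine; note that on Java 21 generational ZGC is still opt-in):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ArmJvmCheck {
    public static void main(String[] args) {
        // On Java 21, generational ZGC has to be enabled explicitly:
        //   java -XX:+UseZGC -XX:+ZGenerational ArmJvmCheck
        System.out.println("os.arch      = " + System.getProperty("os.arch")); // "aarch64" on ARM64
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("jvm          = " + System.getProperty("java.vm.name"));

        // Lists whichever collectors the JVM actually registered
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("GC bean: " + gc.getName());
        }
    }
}
```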
The history of Java on ARM is, at its core, a story about how hardware is useless without brutally hard work on the software side. We’ve gone from a slow interpreter in pure C++, through logarithms that calculated “a bit differently,” all the way to virtual machines that, on chips like Google Axion, can squeeze out up to +150% performance in AI workloads compared to the latest Intels.
And while I know there are lies, damned lies and benchmarks…
…if you’re sticking to x86 purely out of habit, you’re probably burning money. But before you rush to move production to ARM, do one thing: update your JDK. Java 8 on ARM works, but it’s like driving a Ferrari with the handbrake on. The real fun starts with Java 17, and ideally 21 – that’s where you finally see why someone spent all those years grinding away at the AArch64 port.
So if you’re looking for a good business reason to migrate to a newer Java – I’ve just found you one 😊. That is, of course, if you’re craving the change – not everybody does.
Author: Artur Skowronski
Head of Java & Kotlin Engineering @ VirtusLab • Editor of JVM Weekly Newsletter