
The number of AI inference chip startups in the world is gross – literally gross, as in a dozen dozens. But there is only one that is funded by two of the three biggest makers of HBM stacked memory and backed by the two biggest telecom companies in its home country. And given that those who can get HBM allocations can make datacenter AI accelerators, even though South Korean startup Rebellions AI is arguably coming late to the game, perhaps its timing will turn out to be perfect.
And given that not only are Samsung and SK Hynix supplying Rebellions with HBM memory but Samsung is also the company’s foundry, these are advantages that Rebellions should be able to leverage as it wants to sell its AI accelerators not only in South Korea, but to a world at large that has gone absolutely mad for matrix math.
It doesn’t hurt to have watched as the first wave of AI startups – Groq, Cerebras Systems, SambaNova Systems, Graphcore, Nervana Systems, and Habana Labs – have all run up against the limits of their own architectures and their funding, or in the case of Nervana and Habana, disappeared into the gaping maw of a once giant Intel pretty much never to be seen again.
“I say this all the time – the first mouse ends up in the trap, the second mouse gets the cheese,” Marshall Choy, chief business officer of Rebellions, tells The Next Platform.
Choy spent a dozen years at Sun Microsystems in charge of technical products and solutions engineering, and did more than eight years in that role for the engineered systems at Oracle in the wake of the Sun acquisition in 2010, so he knows about the Dot Com Boom and the transition to normalcy for Internet technologies. Choy was also on the founding team at SambaNova, first as vice president of products and then as chief customer officer before leaving for Rebellions a month ago.
“Let’s be honest, that first wave of AI accelerators, which lacked flexibility and adaptability, just never got wild success in the market,” Choy continues. “The gen two folks, we are the second mouse, and we have been patient. The ecosystem has developed, and we are strategically timing our entrance into various markets, which reduces the overall risk of the effort.”
Trying to take on Nvidia, AMD, and a rising number of homegrown AI accelerators coming out of the hyperscalers, cloud builders, and model builders was not the plan when Rebellions was founded back in September 2020 to create AI inference acceleration chips for high frequency trading companies. But then again, Nvidia was founded to make 3D graphics chips and then very precise kinds of acceleration for HPC simulation and modeling before it pivoted to a broader AI market that it has been turbocharging for more than a decade now. Things don’t always go as planned – sometimes they go far better than planned.
The Land Of The Morning Calm
Rebellions is headquartered in Seoul, the capital of and the largest city in the industrial and financial powerhouse that is South Korea, the 14th largest economy in the world with an expected gross domestic product of $1.86 trillion for 2025. (The United States is 1st with a projected $30.6 trillion, followed by China at $19.4 trillion; the European Union will be around $21.1 trillion, but is obviously many nation-states.)
The company has four co-founders, with Sung-hyun Park being its chief executive officer. Park got his bachelor’s degree at the Korea Advanced Institute of Science and Technology, and then got graduate degrees in electrical engineering and computer science (with a minor in finance) from the Massachusetts Institute of Technology. After that, Park worked at Intel for two years as a senior research scientist, and did short stints as a staff engineer at Samsung Mobile, as an ASIC designer at the Starlink division of SpaceX, and as an ultra-low latency equity trading system designer at Morgan Stanley before starting Rebellions.
Jinwook Oh is the company’s co-founder and chief technology officer. Oh got his bachelor’s degree in electrical engineering at Seoul National University and his advanced degrees at KAIST, which has tight relationships with the Korea Institute of Science and Technology Information (KISTI) for HPC and now AI research. Oh was a researcher at KAIST working with Microsoft and Texas Instruments for a number of years before becoming a staff member at IBM Research, where, significantly, he worked on approximate computing, coarse-grained reconfigurable arrays, and neural network accelerators. Rebellions co-founder and chief products officer Hyoeun Kim got his electrical engineering degrees at KAIST and worked as an engineer at Maxwave and Samsung Electronics, then as chief product officer at medical equipment maker Lunit, before joining Rebellions at its founding. And finally, there is co-founder Sungho Shin, a researcher in AI and algorithms from Seoul National University.
Rebellions had Series A rounds in 2020 and 2022 for $61 million, and in 2024 it had a Series B round led by KT Corp (formerly Korea Telecom) with participation by the venture arm of oil giant Saudi Aramco. The Series C funding round was led by Arm Holdings (oddly enough) with participation from Samsung Ventures, Pegatron VC, Korea Development Bank, Korelya Capital, Kindred Ventures, and Top Tier Capital. SK Telecom became an investor in Rebellions in December 2024 when Sapeon Korea, an AI startup that was part of the Korean telecom giant and that also had investment from DRAM and HBM memory maker SK Hynix, was merged with Rebellions. With the merger, Rebellions became the first AI chip unicorn – meaning with a valuation of over $1 billion – based in South Korea. Its valuation is likely $1.5 billion or more.
Here’s the thing: SK Telecom and SK Hynix are both part of the SK Group chaebol, which is the second largest conglomerate in the country. Samsung Group is the largest chaebol. Both groups have invested in Rebellions and both supply it with HBM memory; Samsung is also the company’s foundry partner.
While Rebellions etched its Ion chips for high frequency trading acceleration on 7 nanometer processes at Taiwan Semiconductor Manufacturing Co, the company moved to 5 nanometer processes with the follow-on Atom AI inference accelerators. The current Rebel line of chips, which are the ones we care about at this point because they compete against datacenter-class GPU accelerators from Nvidia and AMD, are made using Samsung’s 4 nanometer processes – and in fact, Rebellions is driving that process ramp for Samsung because IBM did not opt for it for its Power11 processors and stuck to a tweaked 7 nanometer process from Samsung.
In recent months, Rebellions has partnered with Arm to be part of its Arm Total Design ecosystem, which will allow companies making Arm CPUs based on Neoverse designs to integrate them with Rebellions’ Atom or Rebel AI accelerators to create hybrid platforms using Samsung’s impending 2 nanometer processes. Moreover, Rebellions has also partnered with Marvell to exploit its signaling SerDes, chip-to-chip interconnects, and advanced packaging to create custom AI accelerators for customers – particularly sovereign AI centers and regional neoclouds in, for instance, Asia, Africa, or the Middle East that might want to buy AI accelerators that are not constrained by US export controls.
Which brings us all the way to the third generation Rebel AI inference chips.
With A Rebel Yell, They Cried Coarse Grained Cores. . .
With Nvidia GPUs, Google TPUs, and AWS Trainiums pretty much having a lock on AI training during the early years of the Mixture of Experts era, and with inference being where people are trying to make money, Rebellions is understandably focusing its Rebel and future chips on inference.
The Rebel chip borrows its architecture from the Atom chip that preceded it, and specifically it draws on the coarse-grained reconfigurable array (CGRA) approach in processing element design that Oh worked on at Big Blue and marries it to a software-defined network-on-chip. Like this:
With this approach, the routing between any two processing elements – which Rebellions calls neural cores – on a Rebel chip is programmable, and this mesh interconnect can be scaled across chiplets to make ever-larger compute and memory complexes. The routing and scheduling within a chip and across collections of chips can adjust itself based on traffic patterns as inference jobs are running.
But perhaps the most useful part of the CGRA architecture is the fact that the cache memories, load store units, tensor units, and vector units on the neural cores have input buffers (IBUFs) with a custom instruction set that makes them programmable. What this means is that arrays of neural cores can be programmed to look like a big systolic array for compute intensive operations during the prefill stage of an LLM doing inference, where the prompt tokens are crunched in parallel to fill the key-value cache, and then they can be reprogrammed to be more of a memory bandwidth machine to generate the token response to a query during the decode phase. There are intermediate phases as well, as the chart above shows.
To put it succinctly, the CGRA approach takes some of the elements of the programmability of the FPGA without paying the efficiency penalty that the FPGA’s full flexibility in programming exacts.
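To make that prefill versus decode distinction concrete, here is a back-of-the-envelope sketch – our own illustration in Python, not Rebellions’ code, with an assumed model size and prompt length – of why prefill wants a big systolic array while decode wants a bandwidth machine:

```python
# Rough arithmetic intensity of the two inference phases, using an assumed
# 70 billion parameter model stored in FP8 (one byte per weight). These are
# illustrative numbers, not Rebel specifications.

params = 70e9          # assumed model size
prompt_tokens = 4096   # assumed prompt length for the prefill phase

# Prefill: all prompt tokens are pushed through the weights in one batched
# pass, so every byte of weight fetched from memory feeds thousands of FLOPs.
prefill_flops = 2 * params * prompt_tokens
prefill_bytes = params                       # one byte per FP8 weight
print(f"prefill: {prefill_flops / prefill_bytes:,.0f} FLOPs per byte read")

# Decode: each new output token re-reads all of the weights (plus the growing
# key-value cache) to do comparatively little math, so this phase is limited
# by memory bandwidth rather than by matrix math throughput.
decode_flops = 2 * params
decode_bytes = params
print(f"decode: {decode_flops / decode_bytes:,.0f} FLOPs per byte read")
```

The roughly 8,000 floating point operations per byte in prefill versus two in decode is why being able to reshape the same cores between a math-heavy and a bandwidth-heavy configuration matters.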
Like other AI compute engines, the Rebel chip has a mix of compute elements within its neural cores:
The details of each compute element on the Rebel neural cores are being kept under wraps for now, but we know that each core has 4 MB of L1 SRAM memory that feeds into a load/store unit that in turn hooks into a tensor unit and a vector unit. These math units support FP16, FP8, FP4, NF4, and MXFP4 precision, which is all you need to do inference these days. The neural core on the Rebel Single has 16 teraflops at FP16 precision and 32 teraflops at FP8 precision; we do not know how many ops per clock the neural core can do, so we cannot ascertain the clock speed. But we expect it to be somewhere around 2 GHz.
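As a hypothetical worked example – the MAC counts here are our assumptions, not disclosed Rebel specifications – this is how ops per clock and clock speed would have to trade off to hit the quoted 16 teraflops of FP16 per neural core:

```python
# Peak FLOPS = (MACs per clock) x 2 x (clock speed). Solving for the clock at
# the quoted 16 teraflops FP16 per neural core, with assumed MAC counts.

peak_fp16_flops = 16e12

for macs_per_clock in (2_048, 4_096, 8_192):
    flops_per_clock = 2 * macs_per_clock    # a multiply-accumulate counts as two FLOPs
    clock_ghz = peak_fp16_flops / flops_per_clock / 1e9
    print(f"{macs_per_clock:>5} FP16 MACs per clock -> {clock_ghz:.2f} GHz")

# 4,096 MACs per clock would land the clock right around 1.95 GHz, which is
# consistent with the roughly 2 GHz we expect.
```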
To scale up the Rebel chip, a block of eight neural cores is linked together with a mesh interconnect through their SRAM blocks, a common feature in CPU, GPU, and XPU architectures. A pair of these is put on a single chiplet called the Rebel Single:
The Rebel Single has a PCI-Express 5.0 x16 port plus three UCI-Express-A chip interconnect ports and a single HBM3E memory controller. For now, the Rebel Single is using Samsung’s HBM3E stacked memory, but there is no reason it cannot support SK Hynix HBM memory and there is every reason to believe that it will eventually.
The HBM3E port runs at 1.2 TB/sec, the PCI-Express port runs at 128 GB/sec, and each of the three UCI-Express ports runs at 1 TB/sec. The Rebel Single has 64 neural cores and a total of 64 MB of L1 cache that is shared across them, and the mesh interconnect has 16 TB/sec of bandwidth allocated to the caches and another 16 TB/sec allocated to the neural cores.
On the upper left hand side of the Rebel Single chip, you will note a separate circuit block that has TDMA, CP, and Sync Man in it. These are important elements of the design that accelerate parts of the AI inference workflow:
We look forward to drilling down into these more, but for now this is all Rebellions is saying about these special logic blocks.
The command processor, or CP, has a pair of four-core Arm Neoverse CPU blocks with 4 MB of L2 cache, and its job is to assist the synchronization manager and the Task DMA controller below and above it to orchestrate and synchronize data movement across collections of Rebel chips so the compute elements have the data they need when they need it. Conceptually, we think, it is a bit like a NUMA controller for the HBM memories within a socket.
These neural core clusters are linked together to make a compute engine in a single socket. We presume that, in the long run, multiple sockets will be interlinked with a scale up network based on UALink or ESUN and maybe even licensed NVLink Fusion interconnects as customers desire. (Rebellions is keeping mum about this for now.)
To make a bigger compute complex, four Rebel Singles can be linked together like this:
This schematic shows a quad of Rebel Singles, and is obviously called a Rebel Quad, but as you can see, you could just keep stacking pairs of Rebel Singles on the top and bottom to extend to a very large flat space of interconnected compute and memory. You could, if you wanted, make a very long sled that is the logical equivalent of a wafer-scale design with a huge amount of HBM memory hanging off of it, like one of those giant Snickers bars they sell at Christmas.
Rebellions is not going to go crazy like that unless someone asks for it, but there are obviously many ways that CPU and XPU complexes can be linked together, and Oh and Choy teased some of the possibilities in front of us:
For now, the focus is on the Rebel Quad, which is a socket that we have actually held in our hands and that they would not let us have as a paperweight to add to our collection:
The chip complex uses Samsung’s ICube-S interposer and package technology, which is roughly analogous to the CoWoS-S interposer and packaging from TSMC. The package has four 12-high stacks of HBM3E memory with a total of 4.8 TB/sec of bandwidth, and the two PCI-Express 5.0 x16 lanes have a total of 256 GB/sec of bandwidth into and out of the chip. (It is a pity that two of those PCI-Express controllers get stranded in the middle of the quad complex.)
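Those package totals square with the per-chiplet figures given earlier; the quick arithmetic, using only the bandwidth numbers quoted above, looks like this:

```python
# Tying the Rebel Quad package bandwidth back to the Rebel Single figures.
# All numbers come from the specifications quoted in this article.

hbm_stacks = 4                 # one HBM3E stack per Rebel Single chiplet
hbm_bw_per_stack_tbs = 1.2     # TB/sec per HBM3E stack
pcie_ports_escaping = 2        # two of the four x16 ports get out of the package
pcie_bw_per_port_gbs = 128     # GB/sec per PCI-Express 5.0 x16 port

print(f"HBM3E bandwidth: {hbm_stacks * hbm_bw_per_stack_tbs:.1f} TB/sec")             # 4.8
print(f"PCI-Express bandwidth: {pcie_ports_escaping * pcie_bw_per_port_gbs} GB/sec")  # 256
```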
Here are the details on that UCI-Express-A chip-to-chip interconnect:
Rebellions has licensed its UCI-Express-A controller from Alphawave Semi, the chip startup that Qualcomm just ate for $2.4 billion.
The upshot of all this is that the Rebel Quad delivers 1 petaflops at FP16 precision and 2 petaflops at FP8 precision. It is unclear if the throughput doubles at the various FP4 precisions or if there are just a lot of zeroes hanging out in the unused half of the math units.
The Rebel Quad socket burns 600 watts of juice, which is pretty low by comparison to Nvidia and AMD GPUs and Intel’s ill-fated Gaudi 3 AI accelerator of roughly similar performance:
It is interesting to us that an OAM socket is not available for the Rebel Quad, only a PCI-Express card form factor, but presumably this can be done should customers require it. (This is particularly important for liquid-cooled server setups, where you want to lay the chips down on a system board and run copper pipes over the top across multiple compute engines for the sake of density.)
The Rebel Quad is absolutely competitive with the H200 from Nvidia in terms of raw performance – with 3.4 percent more FP16 and FP8 performance oomph – but delivers 20.7 percent more teraflops per watt. The B200 GPU from Nvidia has 2.2X the performance of the Rebel Quad, and it takes 1.7X more memory bandwidth and 1.7X more watts to deliver that, which is a fair bargain. The AMD MI325X has about the same teraflops per watt as the Rebel Quad, and delivers 28 percent more floating point throughput but needs 25 percent more memory bandwidth and 25 percent more watts to accomplish this.
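For those who want to check the efficiency math, here is the arithmetic sketched out; the Rebel Quad numbers come from above, while the H200 figures (roughly 967 teraflops of FP16 at a 700 watt rating) are the values implied by the percentages we quote and should be treated as working assumptions:

```python
# Performance per watt comparison. Rebel Quad figures are from the article;
# the H200 figures are backed out of the percentages quoted above and are
# assumptions for the purpose of this sketch.

rebel_tflops, rebel_watts = 1_000.0, 600.0
h200_tflops, h200_watts = 967.0, 700.0      # assumed, implied by the article's ratios

rebel_eff = rebel_tflops / rebel_watts      # about 1.67 teraflops per watt
h200_eff = h200_tflops / h200_watts         # about 1.38 teraflops per watt

print(f"Rebel Quad: {rebel_eff:.2f} TF/W, H200: {h200_eff:.2f} TF/W")
print(f"Rebel Quad advantage: {100 * (rebel_eff / h200_eff - 1):.1f} percent")  # ~20.7
```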
Real world performance, thanks to architecture differentiation, could be a lot different and we look forward to seeing benchmarks running real inference across these GPUs and the Rebel chips.
We have no idea about price, but it is reasonable to assume that Rebellions has some wiggle room here and that the company will price to value as well and not just race to the bottom. With more tensor math and HBM demand than there is supply, only a fool would start a price war.
The Rebel Single taped out in November 2024 and the Rebel Quad is now sampling to selected customers for proof of concept designs.
After plowing through all of that hardware from the core up, now Rebellions has to put software on top of it. And of course it will be using an open source stack based on a native implementation of PyTorch using the Triton inference engine and the vLLM open source library to manage the key-value cache for inference. Rebellions has also cooked up its own collective communications library, called RBLN CCL, which is akin to the Nvidia NCCL library; both take their cues from the collective operations of the open source Message Passing Interface (MPI) standard that transformed HPC all those decades ago and that is still foundational to AI.
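As a rough illustration of what that stack looks like from the developer’s seat, here is a minimal sketch using the stock vLLM API; the model name is an arbitrary example, and whatever plumbing is needed to target Rebellions silicon would come from the company’s own PyTorch and vLLM backend, which we have not seen:

```python
# Minimal, generic vLLM serving sketch. This is standard vLLM usage; any
# device plugin or environment wiring needed to run on Rebellions accelerators
# is assumed to be supplied by the company's software stack and is not shown.

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")       # assumed example model
sampling = SamplingParams(temperature=0.7, max_tokens=128)

# Prefill chews through the prompt and fills the paged key-value cache, which
# the decode phase then streams from to generate the response tokens.
outputs = llm.generate(["Explain what a CGRA is in one paragraph."], sampling)
for request_output in outputs:
    print(request_output.outputs[0].text)
```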
Rebellions also has an inference serving layer called Raise that is analogous to Nvidia’s Dynamo inference stack, and has been hooked into the Ray distributed inference framework running atop Red Hat’s OpenShift Kubernetes container platform and its container variant of Red Hat Enterprise Linux, which came from its CoreOS acquisition all those years ago when The Next Platform was young.