Nvidia Does $20 Billion Deal With Groq
The NVIDIA-Groq $20 billion deal announced on December 24, 2025, is a major strategic move in the AI hardware space. NVIDIA and Groq clarified that it is not a full company acquisition. The deal is structured as a non-exclusive licensing agreement for Groq’s inference technology, combined with NVIDIA hiring key Groq personnel. Groq’s founder and CEO Jonathan Ross (a former lead designer of Google’s Tensor Processing Unit, or TPU), President Sunny Madra, and other senior team members will join NVIDIA to help integrate and scale the licensed technology. Groq itself remains an independent company, now led by CEO Simon Edwards, and its GroqCloud inference platform will continue operating without interruption. In effect, this is an acqui-hire plus licensing structure.
Technical Capabilities of Groq’s LPU and Why It Justifies the Deal
Groq’s core innovation is the Language Processing Unit (LPU) — a custom ASIC (originally called Tensor Streaming Processor/TSP) purpose-built from the ground up for AI inference, especially sequential workloads like large language models (LLMs). Unlike general-purpose GPUs (originally designed for graphics and parallel compute), the LPU optimizes for the unique demands of inference: deterministic low latency, high token throughput, energy efficiency, and handling sequential dependencies in transformer-based models.
Several key technical differentiators have made Groq a leader in inference and explain NVIDIA’s interest, starting with its SRAM-centric architecture.
SRAM-centric memory: The LPU integrates hundreds of megabytes of on-chip SRAM as primary weight storage, not just as cache. This eliminates the memory bandwidth bottleneck common in GPUs, where weights must shuttle between comparatively slow HBM and the compute units. Keeping weights on-chip gives near-instant access, feeding the compute units at full speed for dramatically lower latency and higher efficiency.
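Why on-chip weights matter can be shown with a back-of-envelope roofline: at batch size 1, each generated token requires streaming every weight through the compute units once, so memory bandwidth caps tokens per second. The sketch below uses illustrative numbers, not vendor specifications.

```python
# Back-of-envelope roofline for single-stream autoregressive decode:
# each new token reads every weight once, so memory bandwidth bounds
# tokens/second. All numbers are illustrative assumptions, not specs.

def decode_ceiling_tokens_per_s(mem_bandwidth_bytes_per_s: float,
                                weight_bytes: float) -> float:
    """Upper bound on single-user decode throughput."""
    return mem_bandwidth_bytes_per_s / weight_bytes

WEIGHT_BYTES_70B = 70e9      # 70B params at 1 byte/param (assumed precision)
HBM_BW = 3.35e12             # ~3.35 TB/s, HBM-class GPU (assumed)
SRAM_AGG_BW = 80e12          # aggregate on-chip SRAM bandwidth across a
                             # multi-chip deployment (assumed)

gpu_bound = decode_ceiling_tokens_per_s(HBM_BW, WEIGHT_BYTES_70B)
lpu_bound = decode_ceiling_tokens_per_s(SRAM_AGG_BW, WEIGHT_BYTES_70B)
print(f"HBM-bound ceiling:  {gpu_bound:.0f} tok/s")   # ~48 tok/s
print(f"SRAM-bound ceiling: {lpu_bound:.0f} tok/s")   # ~1143 tok/s
```

The exact figures depend on precision, chip count, and interconnect, but the ratio is the point: moving weight storage from HBM to aggregate on-chip SRAM raises the bandwidth ceiling by more than an order of magnitude.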
Deterministic, statically scheduled dataflow: Groq uses a producer-consumer model with “conveyor belt”-style data movement between SIMD function units. Everything is statically scheduled by the compiler ahead of time, with no dynamic branching and no cache misses. This provides perfectly predictable performance, zero jitter, and high utilization, which is ideal for real-time applications where variable latency is unacceptable.
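The idea of compile-time static scheduling can be sketched in a toy form: the “compiler” fixes which functional unit runs which operation on which cycle, so every run produces the identical execution trace. This is a simplified illustration, not Groq’s actual ISA or toolchain.

```python
# Toy sketch of static scheduling (hypothetical, not Groq's compiler):
# the schedule -- (cycle, unit, op) triples -- is fixed ahead of time,
# so execution is fully deterministic: no dynamic dispatch, no arbitration.

from typing import Callable

Schedule = list[tuple[int, str, Callable[[dict], None]]]

def run(schedule: Schedule, state: dict) -> list[tuple[int, str]]:
    """Execute ops in cycle order; the trace is identical on every run."""
    trace = []
    for cycle, unit, op in sorted(schedule, key=lambda s: s[0]):
        op(state)
        trace.append((cycle, unit))
    return trace

state = {"x": 3.0}
schedule: Schedule = [
    (0, "MatMul",    lambda s: s.update(x=s["x"] * 2)),  # producer
    (1, "VectorAdd", lambda s: s.update(x=s["x"] + 1)),  # consumer
    (2, "Output",    lambda s: s.update(y=s["x"])),
]
trace = run(schedule, state)
print(state["y"], trace)  # 7.0 [(0, 'MatMul'), (1, 'VectorAdd'), (2, 'Output')]
```

Because nothing is decided at runtime, latency is a property of the compiled schedule itself, which is what makes the zero-jitter claim possible.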
Tensor parallelism focus: Unlike typical data parallelism (processing many requests at once), Groq emphasizes tensor parallelism, splitting individual layers and operations across multiple chips to reduce single-user latency. This is critical for interactive chat, agents, and voice applications, where time-to-first-token and time-to-last-token matter most.
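The distinction can be made concrete with a minimal sketch: one layer’s weight matrix is split column-wise across simulated “chips”, each of which computes its slice of the same request. This is an illustration of the general technique, not Groq’s implementation.

```python
# Minimal tensor-parallelism sketch (illustrative, not Groq's design):
# split one layer's weight matrix column-wise across "chips"; each chip
# works on the SAME request, shrinking single-user latency -- unlike
# data parallelism, which only raises aggregate throughput.

def matvec(w_cols: list[list[float]], x: list[float]) -> list[float]:
    """y_j = sum_i x_i * W[i][j] for this chip's slice of columns."""
    return [sum(x[i] * col[i] for i in range(len(x))) for col in w_cols]

# Weight matrix W (2 inputs -> 4 outputs), stored column-major.
W_cols = [[1, 0], [0, 1], [1, 1], [2, -1]]
x = [3.0, 4.0]

# Split the columns across two simulated chips, then concatenate outputs.
chip0, chip1 = W_cols[:2], W_cols[2:]
y = matvec(chip0, x) + matvec(chip1, x)
print(y)  # [3.0, 4.0, 7.0, 2.0]
```

Each chip does half the work for the single request, so (interconnect permitting) the per-token latency roughly halves, which is exactly the metric interactive workloads care about.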
TruePoint numerics and lossless accuracy: Custom low-precision formats maintain full model accuracy while maximizing speed and efficiency, avoiding the degradation typical of naive quantization.
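TruePoint’s internals are not public, but the failure mode it targets is easy to demonstrate: naive symmetric int8 quantization introduces a bounded but nonzero round-trip error on every weight. The sketch below shows generic quantization, not Groq’s format.

```python
# Naive symmetric int8 round-trip quantization, to illustrate the accuracy
# loss that "lossless" low-precision schemes aim to avoid. (This is generic
# per-tensor quantization, NOT Groq's TruePoint format, which is not public.)

def quantize_dequantize(xs: list[float], bits: int = 8) -> list[float]:
    qmax = 2 ** (bits - 1) - 1                 # 127 for int8
    scale = max(abs(v) for v in xs) / qmax     # per-tensor scale factor
    return [round(v / scale) * scale for v in xs]

weights = [0.0013, -0.9, 0.42, 0.0507, -0.003]
approx = quantize_dequantize(weights)
max_err = max(abs(a - b) for a, b in zip(weights, approx))
print(max_err)  # bounded by half a quantization step (~0.0035 here)
```

Small per-weight errors like this accumulate across billions of parameters, which is why formats that preserve full accuracy at low precision are a meaningful differentiator.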
Overall performance claims: Groq routinely delivers hundreds to thousands of tokens per second on large models (it broke 100+ tokens/s on Llama 70B early on), often 5–10× faster and 5–10× more cost- and energy-efficient than GPU equivalents in real-world benchmarks. Customers have reported 7–8× faster chat speeds with roughly 90% cost reductions.
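The reported speed and cost figures are roughly self-consistent, as a quick calculation shows: at an equal hourly rate (an assumption made purely for illustration), a 7× throughput gain alone implies about an 86% drop in cost per token.

```python
# Sanity check on the article's claims: if a system is 7x faster at the
# same assumed $/hour (illustrative rate, not a real price), cost per
# token falls by ~86%, close to the "~90% cost reduction" reported.

def cost_per_million_tokens(dollars_per_hour: float,
                            tokens_per_s: float) -> float:
    return dollars_per_hour / (tokens_per_s * 3600) * 1e6

baseline = cost_per_million_tokens(10.0, 50)      # assumed $10/h, 50 tok/s
faster   = cost_per_million_tokens(10.0, 50 * 7)  # 7x throughput, same rate
print(f"{1 - faster / baseline:.0%}")  # 86%
```

Any additional energy-efficiency advantage would push the saving past the pure-throughput figure, which is consistent with the ~90% customer reports.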

Brian Wang is a Futurist Thought Leader and a popular science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked the #1 Science News Blog. It covers many disruptive technologies and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Known for identifying cutting edge technologies, he is currently a Co-Founder of a startup and fundraiser for high potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and guest at numerous interviews for radio and podcasts. He is open to public speaking and advising engagements.