All those AI agents that will soon be swarming about will need fresh data, which is pushing the data platform community to think urgently about better ways to inject analytics directly into decision-making processes.
In October, Databricks quietly acquired a technology that will provide a crucial piece to its emerging Lakebase platform for AI agents: Mooncake, a single package that supports both rich transactional processing and fast columnar analysis.
Selling point? No ETL pipelines to manage. From within PostgreSQL itself, analytical data can be tapped to make routing decisions during transaction processing.
Lakebase is a serverless Postgres service integrated into the company’s Lakehouse managed data platform. It is optimized for AI agents (especially the company’s own Agent Bricks).
Databricks purchased serverless PostgreSQL provider Neon in May for $1 billion. This gave the company a PostgreSQL-based transactional platform, one that, according to Databricks, decoupled compute from storage.
The next piece of the puzzle: Mooncake.
OLTP and OLAP: Torn Asunder
Mooncake was developed by Mooncake Labs, a startup founded by three ex-SingleStore engineers to rethink how a combined transactional and analytics database system might operate.
Traditionally, transactional database systems (OLTP) and analytics database systems (OLAP) have been run separately from one another (and often by separate departments) within the enterprise.
The commonly held fear has been that the latency of transactional processing, which needs to be fast, would be compromised by long or computationally heavy analytics jobs running on large data sets.
So put OLTP, with its microsecond insert times needed for speedy transactions, over here; and the OLAP system, with its ability to scan massive tables for large-scale analysis, over yonder.
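To see why the two workloads pull in different directions, here is a minimal sketch of the two storage layouts. It is purely illustrative: `RowStore`, `ColumnStore` and the sample orders are hypothetical names and data, not taken from any of the products discussed here.

```python
from dataclasses import dataclass, field


@dataclass
class RowStore:
    """Row-oriented layout: each record is kept together (suits OLTP)."""
    rows: list = field(default_factory=list)

    def insert(self, order_id: int, customer: str, amount: float) -> None:
        # A single append writes the whole record in one place.
        self.rows.append((order_id, customer, amount))

    def total_amount(self) -> float:
        # An aggregate must walk every full record to read one field.
        return sum(row[2] for row in self.rows)


@dataclass
class ColumnStore:
    """Column-oriented layout: each field is kept together (suits OLAP)."""
    order_ids: list = field(default_factory=list)
    customers: list = field(default_factory=list)
    amounts: list = field(default_factory=list)

    def insert(self, order_id: int, customer: str, amount: float) -> None:
        # One logical insert has to touch every column.
        self.order_ids.append(order_id)
        self.customers.append(customer)
        self.amounts.append(amount)

    def total_amount(self) -> float:
        # An aggregate reads just the one column it needs.
        return sum(self.amounts)


row_store, column_store = RowStore(), ColumnStore()
for order in [(1, "ada", 20.0), (2, "bob", 35.5), (3, "ada", 4.5)]:
    row_store.insert(*order)
    column_store.insert(*order)

print(row_store.total_amount(), column_store.total_amount())
```

Both layouts return the same answer; the difference is in how much data each operation has to touch, which is what tends to matter once tables grow large.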
This separation has since become burdensome, because the two need to exchange data.
“The users are forced to manually duct tape them together with complex and fragile data pipelines that takes hours to sync and sometimes transform data into something that’s hard to read,” explained Mooncake Labs co-founder Cheng Chen, in a lecture at Carnegie Mellon University’s Database Group’s Future Data Systems Seminar Series.
Network speeds and computational heft have advanced to the point where combining OLTP and OLAP could be a good idea, in that it opens a whole new vista for how transactions can be handled.
OLTP and OLAP: Together Forever
Chen was one of three co-founders who came from SingleStore, which offers a Hybrid Transactional/Analytical Processing (HTAP) database system of the same name (formerly MemSQL).
A distributed database system, SingleStore unifies transactional processing and columnar analytics in a single engine, using working memory for transactional rows and disk for column storage. It scales well and can support multiple data types, such as JSON, full-text and vector.
But SingleStore’s design is monolithic, Chen lamented. Because it is run as a single stand-alone query engine, it must compete with the best of both OLTP and OLAP engines already in use. And those willing to adopt an entirely new database system simply to get the benefits of fast analytics on fresh data (for actions such as fraud detection) are relatively few in number.
Mooncake Bridges PostgreSQL and Iceberg Engines
Instead of trying to build “a magical engine” (Chen’s words) that does both kinds of processing, why not just recreate the functionality as a feature for existing systems?
Mooncake set out to build a “composable” hybrid database system, Chen said.
It is a framework and set of new features built on top of existing OLTP systems and OLAP formats.
The engineering team chose to support PostgreSQL for transactions, for its runaway popularity as an open source database system.
On the analytics side, they went with the open lakehouse formats of Apache Iceberg and (Databricks’ own) Delta Lake, so that data in either of these formats can be accessed by any conversant engine (DuckDB, StarRocks, Trino, Apache Spark).
Mooncake: Not an Engine, Just a Feature
Mooncake has two main components. One (“moonlink”) is a real-time layer on top of Iceberg that allows for “sub-second ingestion” of data.
The second component (“pg_mooncake”) provides HTAP capability for PostgreSQL, allowing users to add analytical functions to determine transactional routing decisions.
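The idea of letting fresh analytics inform a transaction can be sketched in a few lines. The toy below is not pg_mooncake's API and says nothing about how Mooncake works internally: `HybridLedger`, `record`, `average` and `route` are hypothetical names, and the "five times the average" rule is an assumed stand-in for a check such as fraud detection.

```python
class HybridLedger:
    """Toy store: transactional rows plus an always-fresh columnar mirror."""

    def __init__(self) -> None:
        self.rows = []      # row-oriented transaction log
        self.amounts = {}   # columnar mirror: account -> list of amounts

    def record(self, account: str, amount: float) -> None:
        # The write lands in the row log and is mirrored immediately,
        # so analytics can see it without a separate sync job.
        self.rows.append((account, amount))
        self.amounts.setdefault(account, []).append(amount)

    def average(self, account: str) -> float:
        # Analytical read over a single column of values.
        history = self.amounts.get(account, [])
        return sum(history) / len(history) if history else 0.0

    def route(self, account: str, amount: float, factor: float = 5.0) -> str:
        # Assumed rule: unusually large payments go to manual review.
        baseline = self.average(account)
        if baseline and amount > factor * baseline:
            return "review"
        self.record(account, amount)
        return "approve"


ledger = HybridLedger()
for past_amount in (20.0, 30.0, 40.0):
    ledger.record("ada", past_amount)

decisions = [
    ledger.route("ada", 45.0),    # close to this account's usual spend
    ledger.route("ada", 500.0),   # far above this account's average
    ledger.route("new", 500.0),   # no history yet, so nothing to compare
]
print(decisions)
```

The point of the sketch is only that the routing decision reads an aggregate that already includes the most recent writes, which is the property the composable approach is after.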
Together, they help close the long-standing divide between transactional and analytics systems, building a bridge to a world of new possibilities from fast analytics. The agents will be pleased.
Check out Chen’s entire talk for a technical deep dive into the challenges of getting Mooncake to play nicely with both Iceberg and PostgreSQL.