About the Author
Hi, I’m Pratik.
I’m a software engineer at Yellowbrick, where I’ve spent the last few years working on different parts of the database engine and platform. I tend to gravitate toward problems that sit between systems design and real-world usability.
Recently, I’ve been working on Yellowbrick’s Community Edition—adapting an enterprise system to run as a single-node Docker environment. This post is a reflection on that work: what had to change, what surprised me, and what I found most interesting along the way.
TL;DR
We took Yellowbrick’s Enterprise Edition—a PostgreSQL-compatible analytics engine—and built a single-node, Docker-based Community Edition (CE) designed to run on a developer laptop.
To make that work, we rethought several enterprise assumptions:
- Added SQL-based local file loading by treating local paths as a first-class external storage backend, with sandboxing and directory whitelisting.
- Replaced the enterprise BBFS storage layer with a lightweight Local FS backend that behaves predictably in containers.
- Reworked pod-based loaders into short-lived OS processes with explicit CPU and memory limits.
- Removed memory pinning and adopted madvise(MADV_DONTNEED) so memory is returned to the OS when it is no longer needed.
- Decoupled Yellowbrick Manager (YM) from Kubernetes using a local connector and simplified configuration.
The result is the same core engine and SQL behavior, packaged in a way that works naturally in a single-container, developer-focused environment.
You can try it here:
https://hub.docker.com/r/yellowbrickdata/yb-community-edition
What Is Yellowbrick?
Yellowbrick is a SQL data platform designed for analytical workloads.
It can be deployed on-premises, in public cloud environments, or in hybrid setups, allowing organizations to choose where and how their data is managed.
The platform is PostgreSQL-compatible, enabling existing tools and workflows to connect without SQL rewrites. Internally, it uses a distributed architecture intended to support analytical queries across large datasets while maintaining isolation between different workload types.
What “Enterprise Edition” Means
The Enterprise Edition represents the fully featured, production-grade version of Yellowbrick.
It is designed for distributed deployments and includes capabilities such as multi-node operation, workload isolation, availability features, and enterprise security controls. It supports a wide range of deployment models and operational requirements typical of large organizations running critical analytical systems.
Why We Built the Community Edition
We wanted Yellowbrick to be approachable without requiring a distributed cluster, a lengthy setup process, or dedicated infrastructure.
That motivation led to Yellowbrick Community Edition (CE): a free, single-node version of the platform that runs entirely in Docker. It is built on the same engine and SQL interface as the Enterprise Edition, but scoped for development, evaluation, and smaller-scale use cases.
Some enterprise features—such as multi-node scaling, advanced workload management, and certain security capabilities—are intentionally excluded. CE focuses on making the core system accessible in a local environment.
Running an enterprise-oriented system inside a single container required revisiting several architectural assumptions. The sections below describe the main changes we made to support that goal.
Engineering Yellowbrick for a Single-Node, Docker Environment
CE was designed to run in environments that are fundamentally different from enterprise clusters. Storage layers, loaders, memory behavior, and management tooling all had assumptions that no longer held in a single-container setup.
The following sections outline the most significant areas we addressed and how those changes were implemented.
Challenge 1: Loading Local Files via SQL
In Enterprise Edition, data is commonly loaded from external object stores using SQL statements such as:
LOAD TABLE my_table FROM 's3://bucket/path/data.csv';
These locations are treated as external storage backends with built-in handling for access and streaming.
Local filesystem paths, however, were not supported directly through SQL. Loading local data required a separate CLI tool, which was not ideal for a Docker-based, self-contained environment.
The Approach
Rather than creating a special case, we extended the existing external storage abstraction to include local paths.
Local files are now treated as another external storage type, implemented as a plugin within the existing YbFile framework. The overall flow—PostgreSQL frontend, middleware, and worker execution—remains unchanged. Only the storage resolution logic differs.
This approach keeps the design modular and makes it possible to add additional storage backends in the future using the same mechanism.
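With local paths plugged into the same abstraction, a load can be expressed directly in SQL, mirroring the object-store form above. The path and exact syntax here are illustrative, assuming a directory mounted into the container:

LOAD TABLE my_table FROM '/mounted/data/data.csv';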
Security Considerations
Because local files are user-controlled, CE restricts access to explicitly whitelisted directories inside the container. Path traversal is blocked, and the model assumes a single-user environment where the user controls what is mounted into the container.
Challenge 2: Replacing BBFS with Local FS
Enterprise Edition uses BBFS, a block-based filesystem designed for long-lived, distributed environments with dedicated storage.
That design was not well suited for a single-node Docker deployment.
The Issues
BBFS introduced several constraints in a local environment:
- Disk space was pre-allocated in ways that were not appropriate for developer machines.
- Space was not easily released once claimed.
- The implementation assumed 32-bit inodes, which conflicted with modern host filesystems exposing 64-bit inodes.
- The design assumed persistent infrastructure rather than ephemeral containers.
The Solution
For CE, we replaced BBFS with a Local FS backend that interacts directly with the operating system’s filesystem APIs.
The rest of the system continues to use the same internal interfaces, keeping the change localized. To address inode compatibility, CE generates and manages its own stable 32-bit identifiers rather than relying on OS-provided inodes.
This approach made the storage layer compatible with containerized, single-node environments without introducing broader architectural changes.
Challenge 3: Moving from Pods to Processes for Load Jobs
In Enterprise Edition, load jobs are executed inside Kubernetes pods, which provide isolation and lifecycle management.
CE does not rely on Kubernetes, so this model had to change.
The Change
In CE, each load job runs as a short-lived operating system process spawned for the duration of the job. Resource limits are applied at the process level, and the process exits cleanly when the load completes.
This preserves isolation between load jobs while fitting naturally within a single-container deployment model.
Challenge 4: Memory Management in a Local Environment
Enterprise deployments often rely on memory pinning to support predictable behavior on dedicated hardware. In a local Docker environment, that approach can be problematic.
The Adjustment
CE avoids memory pinning and instead releases memory back to the operating system as soon as it is no longer needed. This is done using the madvise() system call with the MADV_DONTNEED flag.
This allows the operating system to reclaim memory immediately, which is more appropriate for constrained, shared environments such as developer machines.
Challenge 5: Decoupling Yellowbrick Manager from Kubernetes
Yellowbrick Manager (YM) was originally built around Kubernetes concepts such as pod discovery, ConfigMaps, and Secrets.
CE runs without Kubernetes, so YM needed a different operating mode.
The Redesign
We introduced a local mode for YM:
- Kubernetes-specific dependencies were removed.
- Configuration is provided through local environment variables.
- A lightweight authentication mechanism replaces Kubernetes-managed secrets.
- Features that only apply to distributed clusters are disabled.
This allows YM to manage and observe a single-node instance without relying on external orchestration systems.
Same Engine, Different Context
Community Edition shows that Yellowbrick’s engine can operate outside of a distributed cluster when its assumptions are adjusted to fit a local environment.
By extending existing abstractions for local storage, simplifying the filesystem layer, adapting execution and memory management, and decoupling management tooling from Kubernetes, we preserved the core system while making it usable in a single-node Docker setup.
The goal of CE remains straightforward: make the Yellowbrick engine accessible for local use, experimentation, and development.