Most small teams and solo analysts want the convenience of a warehouse without the cost or operational burden that comes with running one. In many real-world cases, the workload is not large enough to justify Snowflake, BigQuery, Redshift, or even a managed Postgres instance. What people sometimes need is much simpler: a place to store structured data, a way to run analytical queries, and a way to expose those queries behind a clean interface to many users on the same network.
DuckDB happens to be perfect for this kind of problem.
DuckDB is designed for local analytical workloads. It reads Parquet files directly, operates on a single-file database, uses vectorized execution, and can run extremely complex SQL without needing a server. It behaves like a miniature analytical warehouse that lives inside your application. That alone makes it a powerful tool for a solo analyst, researcher, data engineer, or developer.
This blog post introduces DuckPond, a proof of concept that shows how far you can push this idea. DuckPond turns a local DuckDB file into a simple multi-tenant query service that can be accessed from a browser. This is achieved with a small FastAPI backend that exposes a read-only SQL endpoint, and a lightweight React frontend that sends SQL over HTTP and displays results. DuckLake is used to attach Parquet-backed datasets, which gives the system a warehouse-like feel without actually running a warehouse.
None of this is meant to replace a real analytical platform. The point is to show that DuckDB can act as a free, self-hosted query layer for smaller workloads, and that you can extend its capabilities with a few hundred lines of code.
DuckDB is the analytical engine under the hood. It runs inside your process, uses a columnar execution model, supports complex SQL, and handles Parquet extremely well. It does not require a server, a background process, or a cluster scheduler. You point it at a file and run queries.
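To make that concrete, here is a minimal sketch of DuckDB querying a Parquet file directly from Python. The file name events.parquet is illustrative, not part of DuckPond:

```python
import duckdb

# An in-memory DuckDB database: no server, no background process.
con = duckdb.connect()

# DuckDB treats a Parquet file as a table you can query directly.
# "events.parquet" is a hypothetical file used for illustration.
rows = con.execute(
    """
    SELECT user_id, COUNT(*) AS events
    FROM 'events.parquet'
    GROUP BY user_id
    ORDER BY events DESC
    LIMIT 10
    """
).fetchall()

print(rows)
con.close()
```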
DuckLake provides a way to treat folders of Parquet files as if they were tables. It keeps metadata in a small DuckDB file and maps the storage layout in a predictable way. This gives you the feel of an external table system, which is useful when building a query layer of your own.
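As a rough sketch, and with illustrative file names rather than DuckPond's actual layout, attaching a DuckLake catalog from Python looks something like this:

```python
import duckdb

con = duckdb.connect("warehouse.duckdb")  # hypothetical database file

# The DuckLake extension keeps table metadata in a small catalog file
# and stores the table data itself as Parquet files under DATA_PATH.
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")
con.execute("ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'data/')")

# Tables created in the attached catalog are materialized as Parquet.
con.execute("CREATE TABLE IF NOT EXISTS lake.trips AS SELECT * FROM 'trips.parquet'")
print(con.execute("SELECT COUNT(*) FROM lake.trips").fetchone())
```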
FastAPI supplies the HTTP layer. It provides a clean interface for a request such as:
POST /api/query
The backend opens a fresh DuckDB connection in read-only mode, runs the SQL, fetches the results, logs the query, and returns everything as JSON. Each request is handled independently, which makes it safe to support multiple users in a simple proof-of-concept environment.
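A minimal sketch of such an endpoint, assuming hypothetical file names (warehouse.duckdb, history.sqlite) and a request body shaped as {"sql": "..."}, might look like:

```python
import sqlite3

import duckdb
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    sql: str

@app.post("/api/query")
def run_query(req: QueryRequest):
    # A fresh read-only connection per request: no write locks,
    # and concurrent readers stay isolated from each other.
    try:
        con = duckdb.connect("warehouse.duckdb", read_only=True)
        try:
            cur = con.execute(req.sql)
            columns = [col[0] for col in cur.description]
            rows = cur.fetchall()
        finally:
            con.close()
    except duckdb.Error as exc:
        raise HTTPException(status_code=400, detail=str(exc))

    # Append the query text to a small SQLite history file.
    log = sqlite3.connect("history.sqlite")
    try:
        log.execute("CREATE TABLE IF NOT EXISTS history (sql TEXT)")
        log.execute("INSERT INTO history VALUES (?)", (req.sql,))
        log.commit()
    finally:
        log.close()

    return {"columns": columns, "rows": rows}
```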
The frontend is intentionally simple. It contains a text area for SQL, a button to execute it, and a table that renders the results. Vite provides very fast local development and lets the UI be reached from other devices on the same network.
DuckPond brings these components together in one small project directory. The backend folder contains the FastAPI application, the DuckDB file, the DuckLake metadata file, a directory of Parquet data, and a SQLite file that stores a history of executed queries. The frontend folder contains the SQL editor UI and Vite configuration.
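In rough terms, with illustrative file names rather than the repo's exact ones, the layout looks like:

```
duckpond/
  backend/
    main.py             # FastAPI application
    warehouse.duckdb    # DuckDB database file
    metadata.ducklake   # DuckLake metadata
    data/               # Parquet files
    history.sqlite      # query history log
  frontend/
    src/                # SQL editor UI
    vite.config.ts      # Vite configuration
```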
This is not a production system. The goal is to demonstrate that DuckDB can power a small multi-tenant query service that you can run on your laptop or a single inexpensive machine. It shows that a database file, a thin API layer, and a simple UI are enough to create something useful for internal analytics or local experimentation.
Imagine it like this: React SQL editor → FastAPI server → DuckDB + DuckLake storage
Users send SQL. The API validates the request. DuckDB executes the query in a new read-only connection. The backend returns a JSON result set. Everything is file-backed and easy to reason about.
DuckDB uses a single-file model, which makes people wonder about concurrency. In a proof of concept like DuckPond, concurrency is handled by opening a new read-only connection per request. This approach avoids write locks and keeps the system safe under light parallel workloads.
If multiple users run queries at the same time, each request uses its own isolated connection. DuckDB is very efficient when you treat it this way. It loads data lazily, executes queries quickly, and closes connections cleanly.
There are no background processes, no queue managers, and no cluster resources. You get simple, predictable behavior.
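To illustrate the claim, here is a small sketch in which several threads each open their own read-only connection to the same hypothetical database file and query it in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

import duckdb

def run(sql: str):
    # Each caller gets its own isolated read-only connection.
    con = duckdb.connect("warehouse.duckdb", read_only=True)
    try:
        return con.execute(sql).fetchall()
    finally:
        con.close()

queries = [f"SELECT {n} AS n" for n in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    for rows in pool.map(run, queries):
        print(rows)
```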
DuckDB already gives an analyst an incredible amount of power with no infrastructure. DuckPond extends this idea by proving that you can turn DuckDB into a small, self-hosted query layer with almost no cost. You can:
load Parquet data with ease
attach external datasets with DuckLake
run analytical SQL locally
expose a small query endpoint to others
log all queries for observability
build lightweight tools on top of the API (see the sketch below)
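For example, a lightweight client that sends SQL to the endpoint and prints the result. The host, port, and response shape follow the endpoint sketch above and are assumptions, not DuckPond's documented API:

```python
import requests

resp = requests.post(
    "http://192.168.1.50:8000/api/query",  # illustrative LAN address
    json={"sql": "SELECT 42 AS answer"},
    timeout=30,
)
resp.raise_for_status()
payload = resp.json()

print(payload["columns"])
for row in payload["rows"]:
    print(row)
```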
This proof of concept shows that you do not need a full warehouse to experiment with ideas, explore datasets, or give someone a way to run queries privately.
You might want to try out DuckPond if you:
want a local analytics sandbox
want to expose DuckDB to another machine in your network
want to test ideas without committing to a cloud warehouse
want to build a small internal query tool for a small team
want a backend that costs nothing to run
It will not replace Snowflake or BigQuery. That is not the point. The point is to realize how far you can push simple tools. With a few supporting libraries, you can create a useful multi-tenant environment entirely on your own hardware.
GitHub repo: github.com/jordansgoodman/duckpond
Screenshot of me connecting, from my Mac, to the local SQL editor site hosted on my Linux machine on the same network:
Thanks for reading!