Storing Apache Iceberg Metadata in PostgreSQL: A Custom FileIO Implementation


TL;DR: I built apache-iceberg-fileio, a custom FileIO implementation that stores Iceberg metadata in a database instead of object storage. The current implementation uses PostgreSQL and provides consistent low-latency metadata access, especially useful for streaming ingestion workloads.

Apache Iceberg has become the de facto standard for large-scale analytics tables, offering features like schema evolution, time travel, and partition evolution. But one aspect that often gets overlooked is where Iceberg stores its metadata files.

By default, most Iceberg deployments store metadata files (metadata.json, manifest lists, and manifest files) in object storage like S3, GCS, or Azure Blob Storage. While this works, it comes with a hidden cost that many teams discover only after running Iceberg in production.

The Problem with Object Storage for Metadata

Iceberg generates numerous small metadata files. Every commit creates new metadata files, and query planning requires reading multiple manifest files. When these files live in S3 or similar object storage, you’re at the mercy of highly variable latencies.

I’ve seen S3 GET requests take anywhere from 20ms to 500ms+ for objects of the same size. When query planning reads dozens of small metadata files sequentially, those latencies add up, and a handful of slow requests turns into a noticeable query delay.
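To make the compounding concrete, here is a back-of-the-envelope sketch. The file count (30) and the share of slow requests (3 of 30) are assumptions chosen for illustration, not measurements; only the 20ms and 500ms endpoints come from the range above.

```python
def sequential_read_time_ms(latencies_ms):
    """Total wall-clock time when files are read one after another."""
    return sum(latencies_ms)


NUM_FILES = 30   # assumed number of metadata files read during planning
FAST_MS = 20     # low end of the observed GET latency range
SLOW_MS = 500    # high end of the observed GET latency range

# Every request hits the fast path.
best_case_ms = sequential_read_time_ms([FAST_MS] * NUM_FILES)

# Every request hits the slow path.
worst_case_ms = sequential_read_time_ms([SLOW_MS] * NUM_FILES)

# Assumed mix: 27 fast requests and 3 slow ones.
mixed_case_ms = sequential_read_time_ms([FAST_MS] * 27 + [SLOW_MS] * 3)

print(best_case_ms, worst_case_ms, mixed_case_ms)
```

Under these assumptions, three slow requests out of thirty more than triple the planning time compared with the all-fast case (2040ms versus 600ms).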

This variability isn’t a bug — it’s the nature of distributed object storage systems. But for metadata access patterns (frequent reads of small files), it’s far from ideal.

What About Iceberg’s Built-in Caching?

You might wonder: doesn’t Iceberg have caching to mitigate this? Yes, but with important limitations.

CachingCatalog caches Table objects in memory — not the underlying metadata files. When you call catalog.loadTable(), it can return a cached Table reference. But here’s the catch: the cached Table still needs to refresh its metadata. Each refresh() call re-reads the metadata.json file from storage via FileIO. The Table object is cached; the metadata files are not.
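The distinction is easier to see in a toy model. The sketch below does not use Iceberg's classes: CountingFileIO, ToyTable, ToyCachingCatalog, and commit are hypothetical stand-ins I made up for illustration. It assumes a streaming-style workload where every commit writes a new metadata.json, so each refresh has a new file to read.

```python
class CountingFileIO:
    """Hypothetical stand-in for a FileIO that counts metadata reads."""

    def __init__(self):
        self.files = {}
        self.reads = 0

    def write(self, path, content):
        self.files[path] = content

    def read(self, path):
        self.reads += 1
        return self.files[path]


class ToyTable:
    """Toy table: holds parsed metadata and re-reads it on refresh()."""

    def __init__(self, io, pointer):
        self.io = io
        self.pointer = pointer  # shared dict with the current metadata location
        self.metadata = None
        self.refresh()

    def refresh(self):
        # Goes back to storage through the FileIO; nothing is cached here.
        self.metadata = self.io.read(self.pointer["location"])


class ToyCachingCatalog:
    """Toy catalog: caches table objects by name, not metadata files."""

    def __init__(self, io, pointer):
        self.io = io
        self.pointer = pointer
        self.cache = {}

    def load_table(self, name):
        if name not in self.cache:
            self.cache[name] = ToyTable(self.io, self.pointer)
        return self.cache[name]


def commit(io, pointer, version):
    """Simulate a commit: write a new metadata file and move the pointer."""
    path = f"v{version}.metadata.json"
    io.write(path, {"version": version})
    pointer["location"] = path


io = CountingFileIO()
pointer = {}
commit(io, pointer, 1)

catalog = ToyCachingCatalog(io, pointer)
first = catalog.load_table("events")
second = catalog.load_table("events")   # served from the catalog cache
same_object = first is second
reads_after_loads = io.reads            # only the initial load hit storage

for version in range(2, 5):             # three more commits arrive
    commit(io, pointer, version)
    second.refresh()                    # each refresh reads the new file

reads_after_refreshes = io.reads
```

In this model the second load_table() call costs no storage read, but each of the three refreshes does, so the cache saves object construction and not the metadata round trips.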

The most resource-intensive part of loading an Iceberg table is fetching metadata from object storage and parsing it. The caching layer tries to reduce this overhead, but it can’t help when the metadata files themselves keep changing.
