Apache 2.0 licensed. No vendor lock-in. Self-hosted.
Stateless Kafka on S3. Scale brokers, not partitions.
Stateless brokers backed by S3. No rebalancing, no disk alerts, no partition shuffles. Processors read directly from storage — streaming and analytics never compete.
What teams are saying
"After WarpStream got acquired, KafScale became our go-to. Better S3 integration, lower latency than we expected, fully scalable, and minimal ops burden."
— Platform team, Series B fintech
"We moved 50 topics off Kafka in a weekend. No more disk alerts, no more partition rebalancing. Our on-call rotation got a lot quieter."
— SRE lead, e-commerce platform
"The Apache 2.0 license was the deciding factor. We can’t build on BSL projects, and we won’t depend on a vendor’s control plane."
— CTO, healthcare data startup
Why teams adopt KafScale
Stateless brokers
Spin brokers up and down without disk shuffles. S3 is the source of truth. No partition rebalancing, ever.
S3-native durability
11 nines of durability. Immutable segments, lifecycle-based retention, predictable costs.
Storage-native processing
Processors read segments directly from S3, bypassing brokers entirely. Streaming and analytics never compete.
Kubernetes operator
CRDs for clusters, topics, and snapshots. HPA-ready scaling. GitOps-friendly.
Open segment format
The .kfs format is documented. Build custom processors without waiting for vendors to ship features.
Apache 2.0 license
No BSL restrictions. No usage fees. No control plane dependency. Fork it, sell it, run it however you want.
The Rationale: Kafka brokers are a legacy artifact
Kafka brokers were designed for a disk-centric world where durability lived on local machines. Replication and rebalancing were necessary because broker state was the source of truth.
Object storage changes this model. Once log segments are durable, immutable, and external, long-lived broker state stops adding resilience and starts adding operational cost.
Stateless brokers backed by object storage simplify failure, scaling, and recovery. Brokers become ephemeral compute. Data remains durable.
KafScale is built on this assumption. The Kafka protocol still matters. Broker-centric storage does not.
What You Should Consider
KafScale is not a drop-in replacement for every Kafka workload. Here’s when it fits and when it doesn’t.
KafScale is for you if
- Latency of 200-500ms is acceptable
- You run ETL, logs, or async events
- You want processors that bypass brokers (Iceberg, analytics, AI agents)
- You want minimal ops and no disk management
- Apache 2.0 licensing matters to you
- You prefer self-hosted over managed services
KafScale is not for you if
- You need sub-10ms latency
- You require Kafka transactions (exactly-once across topics)
- You rely on compacted topics
- You want a fully managed service
How KafScale works
Clients speak the Kafka protocol to stateless brokers. Brokers flush segments to S3 and serve reads with caching. Processors read completed segments directly from S3 without adding load to brokers.
S3 is the source of truth. Brokers are ephemeral. Processors read directly from S3.
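Because the protocol surface is standard Kafka, existing clients work unchanged. A minimal sketch using kafka-python; the bootstrap address, topic, and group names below are placeholders, not KafScale defaults:

```python
# Minimal produce/consume against a KafScale cluster.
# "kafscale-broker:9092" is a hypothetical bootstrap address.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="kafscale-broker:9092")
producer.send("orders", key=b"order-42", value=b'{"status": "created"}')
producer.flush()  # brokers persist segments to S3, not local disk

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="kafscale-broker:9092",
    group_id="billing",
    auto_offset_reset="earliest",
    consumer_timeout_ms=10_000,  # stop iterating if the topic goes quiet
)
for record in consumer:
    print(record.offset, record.value)
```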
Bypass the broker: storage-native processing
Traditional Kafka forces all reads through brokers. Streaming consumers and batch analytics compete for the same resources. Backfills spike broker CPU. AI training jobs block production consumers.
KafScale stores data in S3 using a documented segment format. Processors read directly from S3 without touching brokers. The streaming path and the analytical path share data but never interfere.
Two read paths, one data source. Streaming and analytics scale independently.
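The streaming path uses an ordinary Kafka consumer, as shown above. The analytical path needs only S3 access. A sketch with boto3; the bucket name and key layout are assumptions for illustration, and the real layout is defined in the storage format spec:

```python
# Read completed segments straight from S3, with zero broker involvement.
# Bucket and prefix are illustrative, not a documented layout.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="my-kafscale-bucket", Prefix="topics/orders/")
for obj in resp.get("Contents", []):
    if obj["Key"].endswith(".kfs"):
        # Completed segments are immutable, so reading them is always safe.
        body = s3.get_object(Bucket="my-kafscale-bucket", Key=obj["Key"])["Body"]
        print(obj["Key"], len(body.read()), "bytes")
```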
Processors and addons
KafScale keeps processing separate from the broker layer. Processors read completed segments directly from S3, enabling independent scaling and custom implementations. See why data processing does not belong in the message broker.
Iceberg Processor
Reads .kfs segments from S3. Writes Parquet to Iceberg tables. Works with Unity Catalog, Polaris, AWS Glue. Zero broker load.
Build your own
The .kfs segment format is documented and open. Build processors for your use case without waiting for vendors to ship features or negotiating enterprise contracts.
Storage format spec · Developer guide
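To make "build your own" concrete, a custom processor can be little more than a polling loop over the segment keys. A skeleton under stated assumptions: the bucket, prefix, and parse_kfs decoder are placeholders, and a real processor would implement the decoder against the published .kfs spec and checkpoint progress durably:

```python
# Skeleton for a custom storage-native processor (illustrative only).
import time
import boto3

BUCKET = "my-kafscale-bucket"   # placeholder
PREFIX = "topics/orders/"       # placeholder

def parse_kfs(raw: bytes):
    """Placeholder decoder: implement against the published .kfs spec."""
    return []  # yield decoded records here

def run(poll_seconds: int = 30):
    s3 = boto3.client("s3")
    seen = set()  # replace with durable checkpointing in production
    while True:
        page = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".kfs") and key not in seen:
                raw = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
                for record in parse_kfs(raw):
                    ...  # your use case: index, aggregate, train
                seen.add(key)
        time.sleep(poll_seconds)
```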
Why AI agents need this architecture
AI agents making decisions need context. That context comes from historical events: what happened, in what order, and why the current state exists. Traditional stream processing optimizes for milliseconds. Agents need something different: completeness, replay capability, and the ability to reconcile current state with historical actions.
Storage-native streaming makes this practical. The immutable log in S3 becomes the source of truth that agents query, replay, and reason over. The Iceberg Processor converts that log to tables that analytical tools understand. Agents get complete historical context without competing with streaming workloads for broker resources.
Two-second latency for analytical access is acceptable when the alternative is incomplete context or degraded streaming performance. AI agents do not need sub-millisecond reads. They need the full picture.
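A sketch of what that access pattern can look like, using DuckDB's Iceberg extension to query a table the Iceberg Processor might have written. The table location, column names, and credential setup are illustrative assumptions:

```python
# An agent pulling complete, ordered history for one entity from the lake.
# Assumes AWS credentials via the usual environment/config chain.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs;")
con.execute("LOAD httpfs;")
con.execute("INSTALL iceberg;")
con.execute("LOAD iceberg;")

history = con.execute(
    """
    SELECT order_id, status, event_time
    FROM iceberg_scan('s3://my-lake/orders')  -- hypothetical table path
    WHERE order_id = ?
    ORDER BY event_time
    """,
    ["order-42"],
).fetchall()

# The agent reasons over the full event history without touching brokers.
for order_id, status, event_time in history:
    print(order_id, status, event_time)
```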
Production-grade operations
Prometheus metrics
S3 health state, produce/fetch throughput, consumer lag, etcd snapshot age. Grafana dashboards included. A lag-check sketch follows this section.
Horizontal scaling
Add brokers instantly. No partition rebalancing. HPA scales on CPU or custom metrics.
Automated backups
Operator snapshots etcd to S3 on a schedule. One-command restore.
Health gating
Brokers track S3 availability. Degraded and unavailable states prevent data loss.
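As one example of wiring these metrics into a simple check, the snippet below assumes a Prometheus endpoint on port 9090 and a gauge named kafscale_consumer_lag; both names are assumptions and should be verified against the documentation:

```python
# Poll the brokers' Prometheus endpoint and flag high consumer lag.
# Endpoint and metric name are assumptions, not documented values.
import urllib.request

def max_consumer_lag(host: str = "kafscale-broker") -> float:
    body = urllib.request.urlopen(f"http://{host}:9090/metrics").read().decode()
    lags = [
        float(line.rsplit(" ", 1)[1])      # assumes no trailing timestamps
        for line in body.splitlines()
        if line.startswith("kafscale_consumer_lag")  # skips "# HELP/TYPE" lines
    ]
    return max(lags, default=0.0)

if max_consumer_lag() > 10_000:
    print("consumer lag is high; consider scaling consumers")
```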
Documentation
Protocol compatibility
21 Kafka APIs supported. Produce, Fetch, Metadata, consumer groups, and more.
Storage format
Segment layout, index structure, S3 key paths, and cache architecture.
Security
TLS configuration, S3 IAM policies, and the roadmap for SASL and ACLs.
Get started
KafScale is designed to be operationally simple from day one. If you already run Kubernetes and Kafka clients, you can deploy a cluster and start producing data in minutes.
Install the operator, define a topic, produce with existing Kafka tools.
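For illustration, defining a topic through the operator could look like the following with the Kubernetes Python client. The CRD group, version, kind, and spec fields here are hypothetical stand-ins for whatever the operator actually installs:

```python
# Create a topic via a custom resource (all CRD names are hypothetical).
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

topic = {
    "apiVersion": "kafscale.io/v1alpha1",   # assumption
    "kind": "KafScaleTopic",                # assumption
    "metadata": {"name": "orders"},
    "spec": {"partitions": 12, "retentionDays": 7},
}

api.create_namespaced_custom_object(
    group="kafscale.io",
    version="v1alpha1",
    namespace="default",
    plural="kafscaletopics",
    body=topic,
)
```

From there, any stock Kafka producer can write to the topic, as in the client sketch earlier on this page.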
Backed by
KafScale is developed and maintained with support from Scalytics, Inc. and NovaTechFlow.
Apache 2.0 licensed. No CLA required. Contributions welcome.