Apache 2.0 licensed. No vendor lock-in. Self-hosted.
Stateless Kafka on S3. Scale brokers, not partitions.
Stateless brokers backed by S3. No rebalancing, no disk alerts, no partition shuffles. Processors read directly from storage — streaming and analytics never compete.
What teams are saying
"After WarpStream got acquired, KafScale became our go-to. Better S3 integration, lower latency than we expected, fully scalable, and minimal ops burden."
— Platform team, Series B fintech
"We moved 50 topics off Kafka in a weekend. No more disk alerts, no more partition rebalancing. Our on-call rotation got a lot quieter."
— SRE lead, e-commerce platform
"The Apache 2.0 license was the deciding factor. We can’t build on BSL projects, and we won’t depend on a vendor’s control plane."
— CTO, healthcare data startup
Why teams adopt KafScale
Stateless brokers
Spin brokers up and down without disk shuffles. S3 is the source of truth. No partition rebalancing, ever.
S3-native durability
11 nines of durability. Immutable segments, lifecycle-based retention, predictable costs.
Storage-native processing
Processors read segments directly from S3, bypassing brokers entirely. Streaming and analytics never compete.
Kubernetes operator
CRDs for clusters, topics, and snapshots. HPA-ready scaling. GitOps-friendly.
Open segment format
The .kfs format is documented. Build custom processors without waiting for vendors to ship features.
Apache 2.0 license
No BSL restrictions. No usage fees. No control plane dependency. Fork it, sell it, run it however you want.
The Rationale: Kafka brokers are a legacy artifact
Kafka brokers were designed for a disk-centric world where durability lived on local machines. Replication and rebalancing were necessary because broker state was the source of truth.
Object storage changes this model. Once log segments are durable, immutable, and external, long-lived broker state stops adding resilience and starts adding operational cost.
Stateless brokers backed by object storage simplify failure, scaling, and recovery. Brokers become ephemeral compute. Data remains durable.
KafScale is built on this assumption. The Kafka protocol still matters. Broker-centric storage does not.
What You Should Consider
KafScale is not a drop-in replacement for every Kafka workload. Here’s when it fits and when it doesn’t.
KafScale is for you if
- Latency of 200-500ms is acceptable
- You run ETL, logs, or async events
- You want processors that bypass brokers (Iceberg, analytics, AI agents)
- You want minimal ops and no disk management
- Apache 2.0 licensing matters to you
- You prefer self-hosted over managed services
KafScale is not for you if
- You need sub-10ms latency
- You require Kafka transactions (exactly-once across topics)
- You rely on compacted topics
- You want a fully managed service
How KafScale works
Clients speak the Kafka protocol to stateless brokers. Brokers flush segments to S3 and serve reads with caching. Processors read completed segments directly from S3 without adding load to brokers.
S3 is the source of truth. Brokers are ephemeral. Processors read directly from S3.
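Because the protocol surface is standard Kafka, existing clients work unchanged. A minimal sketch using kafka-python; the bootstrap address, topic, and group names below are placeholders, not KafScale defaults:

```python
# Minimal produce/consume against a KafScale cluster.
# "kafscale-broker:9092" is a hypothetical bootstrap address.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="kafscale-broker:9092")
producer.send("orders", key=b"order-42", value=b'{"status": "created"}')
producer.flush()  # brokers persist segments to S3, not local disk

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="kafscale-broker:9092",
    group_id="billing",
    auto_offset_reset="earliest",
    consumer_timeout_ms=10_000,  # stop iterating if the topic goes quiet
)
for record in consumer:
    print(record.offset, record.value)
```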
Bypass the broker: storage-native processing
Traditional Kafka forces all reads through brokers. Streaming consumers and batch analytics compete for the same resources. Backfills spike broker CPU. AI training jobs block production consumers.
KafScale stores data in S3 using a documented segment format. Processors read directly from S3 without touching brokers. The streaming path and the analytical path share data but never interfere.
Two read paths, one data source. Streaming and analytics scale independently.
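The streaming path uses an ordinary Kafka consumer, as shown above. The analytical path needs only S3 access. A sketch with boto3; the bucket name and key layout are assumptions for illustration, and the real layout is defined in the storage format spec:

```python
# Read completed segments straight from S3, with zero broker involvement.
# Bucket and prefix are illustrative, not a documented layout.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="my-kafscale-bucket", Prefix="topics/orders/")
for obj in resp.get("Contents", []):
    if obj["Key"].endswith(".kfs"):
        # Completed segments are immutable, so reading them is always safe.
        body = s3.get_object(Bucket="my-kafscale-bucket", Key=obj["Key"])["Body"]
        print(obj["Key"], len(body.read()), "bytes")
```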
Processors and addons
KafScale keeps processing separate from the broker layer. Processors read completed segments directly from S3, enabling independent scaling and custom implementations. See why data processing does not belong in the message broker.
Iceberg Processor
Reads .kfs segments from S3. Writes Parquet to Iceberg tables. Works with Unity Catalog, Polaris, AWS Glue. Zero broker load.
Build your own
The .kfs segment format is documented and open. Build processors for your use case without waiting for vendors to ship features or negotiating enterprise contracts.
Storage format spec · Developer guide
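To make "build your own" concrete, a custom processor can be little more than a polling loop over the segment keys. A skeleton under stated assumptions: the bucket, prefix, and parse_kfs decoder are placeholders, and a real processor would implement the decoder against the published .kfs spec and checkpoint progress durably:

```python
# Skeleton for a custom storage-native processor (illustrative only).
import time
import boto3

BUCKET = "my-kafscale-bucket"   # placeholder
PREFIX = "topics/orders/"       # placeholder

def parse_kfs(raw: bytes):
    """Placeholder decoder: implement against the published .kfs spec."""
    return []  # yield decoded records here

def run(poll_seconds: int = 30):
    s3 = boto3.client("s3")
    seen = set()  # replace with durable checkpointing in production
    while True:
        page = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".kfs") and key not in seen:
                raw = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
                for record in parse_kfs(raw):
                    ...  # your use case: index, aggregate, train
                seen.add(key)
        time.sleep(poll_seconds)
```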
Why AI agents need this architecture
AI agents making decisions need context. That context comes from historical events: what happened, in what order, and why the current state exists. Traditional stream processing optimizes for milliseconds. Agents need something different: completeness, replay capability, and the ability to reconcile current state with historical actions.
Storage-native streaming makes this practical. The immutable log in S3 becomes the source of truth that agents query, replay, and reason over. The Iceberg Processor converts that log to tables that analytical tools understand. Agents get complete historical context without competing with streaming workloads for broker resources.
Two-second latency for analytical access is acceptable when the alternative is incomplete context or degraded streaming performance. AI agents do not need sub-millisecond reads. They need the full picture.
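A sketch of what that access pattern can look like, using DuckDB's Iceberg extension to query a table the Iceberg Processor might have written. The table location, column names, and credential setup are illustrative assumptions:

```python
# An agent pulling complete, ordered history for one entity from the lake.
# Assumes AWS credentials via the usual environment/config chain.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs;")
con.execute("LOAD httpfs;")
con.execute("INSTALL iceberg;")
con.execute("LOAD iceberg;")

history = con.execute(
    """
    SELECT order_id, status, event_time
    FROM iceberg_scan('s3://my-lake/orders')  -- hypothetical table path
    WHERE order_id = ?
    ORDER BY event_time
    """,
    ["order-42"],
).fetchall()

# The agent reasons over the full event history without touching brokers.
for order_id, status, event_time in history:
    print(order_id, status, event_time)
```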
Production-grade operations
Prometheus metrics
S3 health state, produce/fetch throughput, consumer lag, etcd snapshot age. Grafana dashboards included. A lag-check sketch follows this section.
Horizontal scaling
Add brokers instantly. No partition rebalancing. HPA scales on CPU or custom metrics.
Automated backups
Operator snapshots etcd to S3 on a schedule. One-command restore.
Health gating
Brokers track S3 availability. Degraded and unavailable states prevent data loss.
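As one example of wiring these metrics into a simple check, the snippet below assumes a Prometheus endpoint on port 9090 and a gauge named kafscale_consumer_lag; both names are assumptions and should be verified against the documentation:

```python
# Poll the brokers' Prometheus endpoint and flag high consumer lag.
# Endpoint and metric name are assumptions, not documented values.
import urllib.request

def max_consumer_lag(host: str = "kafscale-broker") -> float:
    body = urllib.request.urlopen(f"http://{host}:9090/metrics").read().decode()
    lags = [
        float(line.rsplit(" ", 1)[1])      # assumes no trailing timestamps
        for line in body.splitlines()
        if line.startswith("kafscale_consumer_lag")  # skips "# HELP/TYPE" lines
    ]
    return max(lags, default=0.0)

if max_consumer_lag() > 10_000:
    print("consumer lag is high; consider scaling consumers")
```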
Documentation
Protocol compatibility
21 Kafka APIs supported. Produce, Fetch, Metadata, consumer groups, and more.
Storage format
Segment layout, index structure, S3 key paths, and cache architecture.
Security
TLS configuration, S3 IAM policies, and the roadmap for SASL and ACLs.
Get started
KafScale is designed to be operationally simple from day one. If you already run Kubernetes and Kafka clients, you can deploy a cluster and start producing data in minutes.
Install the operator, define a topic, produce with existing Kafka tools.
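For illustration, defining a topic through the operator could look like the following with the Kubernetes Python client. The CRD group, version, kind, and spec fields here are hypothetical stand-ins for whatever the operator actually installs:

```python
# Create a topic via a custom resource (all CRD names are hypothetical).
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

topic = {
    "apiVersion": "kafscale.io/v1alpha1",   # assumption
    "kind": "KafScaleTopic",                # assumption
    "metadata": {"name": "orders"},
    "spec": {"partitions": 12, "retentionDays": 7},
}

api.create_namespaced_custom_object(
    group="kafscale.io",
    version="v1alpha1",
    namespace="default",
    plural="kafscaletopics",
    body=topic,
)
```

From there, any stock Kafka producer can write to the topic, as in the client sketch earlier on this page.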
Backed by
KafScale is developed and maintained with support from Scalytics, Inc. and NovaTechFlow.
Apache 2.0 licensed. No CLA required. Contributions welcome.