As a technologist and part-time couch potato, I appreciate Netflix on many levels. During the evening hours, it’s the creativity and breathtaking beauty of its productions. During working hours, it’s their technical contributions, like per-title encoding and VMAF, and their development and promotion of AV1.
Unlike other publishers (cough, cough, Amazon), Netflix has always been amazingly gracious in sharing the technologies and stories behind their advancements. Their latest contributions are nothing short of an advanced class in live streaming and monetization at scale.
Specifically, over the last nine months, Netflix has published five articles detailing its transition from VOD to live, comprising a masterclass in resilient systems design that will benefit all readers. While few companies match Netflix’s scale, the lessons in request handling, telemetry, and bottleneck mitigation are universal.
In Netflix’s own technical posts, the challenge is described formally, but I think of it as a synchronized tsunami. While VOD traffic is a gentle rain spread over time, live events create a thundering herd, with millions of viewers hitting play within a very narrow window. This forces the infrastructure to move from pre-positioned content to real-time orchestration. Here is your syllabus for the most important live-engineering series of the last year or so.
Lots of technical details to absorb and convey; I’m sure there are many errors below. If you spot any, please LMK at jan.ozer@streaminglearningcenter.com and I’ll correct the article that’s on the site.
Contents
- 1. The Brain: Orchestrating the Surge
- 2. The Pipes: High-Reliability Cloud Ingest
- 3. The Context: Real-Time Recommendations
- 4. The Secret Sauce: The Live Origin
- 5. The Payoff: Ad Event Processing at Scale
1. The Brain: Orchestrating the Surge
Before the video starts, the control plane must survive the initial login surge. Netflix modified its playback services to handle millions of simultaneous requests for stream metadata.
Article: Behind the Streams: Live at Netflix (Part 1)
Netflix needed to deliver its first live-streamed event despite lacking live-streaming infrastructure after 15 years focused exclusively on on-demand content. The team faced a hard nine-month deadline to launch “Chris Rock: Selective Outrage” as a fully live broadcast, while also planning for future scalability. A critical operational constraint was ensuring compatibility with the wide range of consumer devices already in use, many of which could not support traditional UDP-based protocols. Global signal acquisition also proved difficult, as live ingest facilities were not universally available.
To meet these challenges, Netflix built a new live streaming architecture. Engineering teams established Dedicated Broadcast Facilities to ingest live content from around the world. They used AWS Elemental MediaConnect for cloud-based signal contribution, while developing a custom live transcoder and packager for core processing within the Netflix delivery architecture. Live content was distributed globally via Netflix’s Open Connect CDN, consisting of 18,000+ servers across 6,000+ locations.
Streams were encoded in AVC and HEVC and segmented into 2-second chunks for HTTP-based delivery. Real-time metrics were collected using internal tools such as Atlas, Mantis, and Lumen, and open-source technologies including Kafka and Druid. For control and reliability, Netflix centralized monitoring in dedicated control centers, ran synthetic load tests simulating up to 100,000 starts-per-second, and applied failure injection and contingency planning during scheduled operational exercises.
The result was a repeatable, automated workflow that allowed Netflix to handle traffic spikes, like a measured 10x increase in load due to user retries, and reduce stream latency by approximately 10 seconds through targeted A/B tests. By investing in low-latency, device-compatible, cloud-scaled architecture, Netflix shifted live event delivery from a one-off engineering challenge to an operationally feasible, globally distributed service.
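That 10x retry amplification is why client retry behavior matters so much at this scale. A standard defense, and a minimal sketch of the general technique rather than Netflix’s actual client code, is exponential backoff with full jitter, which keeps millions of failed requests from re-synchronizing into a second surge:

```python
import random

def retry_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: each retry waits a random
    amount up to an exponentially growing ceiling, so millions of
    clients retrying after the same failure spread out instead of
    hitting the control plane in lockstep."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)
```

Without the jitter, every client that failed at the same moment retries at the same moment, turning one spike into a repeating one.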
2. The Pipes: High-Reliability Cloud Ingest
Figure 2. Netflix’s live pipeline
This post covers the ingestion and transcoding “pipes” that bridge the gap between a live source and the viewer.
Article: Building a Reliable Cloud Live Streaming Pipeline (Part 2)
Without existing broadcast infrastructure, Netflix had to acquire and process live content entirely in the cloud. For their first live comedy special, the team had to establish on-site signal transport using two dedicated internet access (DIA) circuits from the venue, while ensuring the pipeline could eventually scale for high-frame-rate sports events like the NFL Christmas Gameday. The architecture also needed to support multiple languages, caption formats, and legacy playback devices without adding redundant field validation steps.
To address these constraints, Netflix engineered a hub-and-spoke contribution model to aggregate live feeds at a central broadcast facility before cloud ingestion. Within the cloud, Netflix configured its live ingest infrastructure in two AWS regions with dual managed network paths and applied SMPTE 2022-7 seamless protection switching to reduce packet loss risk. AWS Elemental MediaConnect provided managed video transport and redundancy.
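The core idea behind SMPTE 2022-7 is simple even though the standard’s timing rules are not: send two copies of the same packet stream over independent paths, and have the receiver keep the first copy of each sequence number. A toy model (ignoring real-world buffering and reordering windows) looks like this:

```python
def merge_redundant_paths(path_a, path_b):
    """Toy model of SMPTE 2022-7 style hitless merge: two copies of
    the same packet stream arrive over independent network paths, and
    the receiver keeps the first copy of each sequence number. A loss
    on one path is invisible as long as the other path delivered that
    packet. Packets are (sequence_number, payload) tuples."""
    seen = {}
    for seq, payload in list(path_a) + list(path_b):
        seen.setdefault(seq, payload)  # first copy wins; duplicates dropped
    return [seen[seq] for seq in sorted(seen)]
```

This is why dual managed network paths matter: redundancy only helps if the two paths fail independently.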
The encoding pipeline ran in the cloud, supporting AVC and HEVC formats, as well as HE-AAC and Dolby Digital Plus 5.1 audio with a 2-second segment duration. The Netflix Video Algorithms team optimized bitrate ladders based on content requirements, while captions were produced in both WebVTT and TTML/IMSC formats. Packaging and DRM functionality reused Netflix SVOD components to ensure device compatibility.
Initially, segments were published to AWS static buckets configured across two regions, but high request rates, exceeding 100 RPS per event, led to origin throttling and playback latency. In response, Netflix developed an internal, media-aware live origin system built on its key-value datastore platform, allowing better segment candidate selection under live traffic conditions.
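“Media-aware” is the key distinction from a generic object store: the origin understands segment state, not just keys and blobs. A hypothetical sketch of what better segment candidate selection could mean in practice (the state labels here are invented for illustration):

```python
def latest_playable_segment(segments):
    """Toy model of media-aware segment selection: given a mapping of
    segment_index -> state ("complete", "partial", or "defective")
    read from a key-value store, return the newest index that is fully
    written, rather than blindly serving the highest index the way a
    generic static bucket would."""
    playable = [i for i, state in segments.items() if state == "complete"]
    return max(playable) if playable else None
```

A generic bucket would happily serve a half-written segment 11; a media-aware origin serves segment 10 and lets 11 finish.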
To manage the entire live event lifecycle, Netflix first built an orchestration system and later added a Control Room UI. These tools enabled operators to configure pipelines, monitor encoding state, execute manual failovers within seconds, and deprovision resources post-event.
By moving pixel acquisition to the cloud and automating pipeline orchestration, Netflix made it operationally feasible to produce and scale live events without a traditional broadcast backbone.
3. The Context: Real-Time Recommendations
Figure 3. Visualizing constraints for real-time updates.
A live event isn’t just a video; it’s a dynamic platform change. The entire homepage must adapt instantly to what is happening on screen.
Article: Behind the Streams: Real-Time Recommendations for Live Events (Part 3)
When streaming live events to a global audience, Netflix needed to deliver personalized recommendations to over 100 million devices with minimal latency and consistent reliability, despite unpredictable spikes in viewer activity. Existing systems were vulnerable to strain under live traffic surges, particularly due to rigid cache expiration schemes and centralized update triggers.
Live event schedules added further complexity, often shifting in real time and requiring recommendations to update accordingly. The team aimed to coordinate high-cardinality updates quickly, without triggering a thundering herd effect or overloading infrastructure during peak demand.
To manage these challenges, Netflix split recommendation delivery into two coordinated phases: prefetching and real-time broadcasting. Devices preloaded recommendation data in advance, such as materialized recommendations, displayed title metadata, and artwork, as members naturally browsed, reducing compute workload during high-traffic moments.
The system then broadcast low-cardinality triggers at key live event moments to prompt devices to use the preloaded content. These triggers were delivered using a two-tier publish/subscribe messaging architecture comprising Netflix’s Pushy WebSocket proxy, Apache Kafka, Netflix’s key-value store, and intermediary components such as the Message Producer and Message Router. Devices call GraphQL APIs (implemented with Netflix’s DGS framework) to retrieve data and trigger cached UI updates based on broadcasts.
The Message Producer microservice monitored live event states and scheduled these broadcasts, while the Message Router used an “at least once” strategy to maintain reliability on unstable networks. Jitter was added to both client and server cache expiration to stagger traffic and reduce refresh spikes. Netflix also conducted synthetic load simulations to test system behavior and later used event signals to shard traffic and prioritize resources for live scenarios. Dynamic rulesets deprioritized non-critical traffic during peak moments, but excessive non-essential logging emerged as an unexpected problem, degrading throughput and visibility into other requests.
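The cache-expiration jitter deserves a closer look, because it is cheap and broadly applicable. A minimal sketch of the idea (the parameter names are mine, not Netflix’s):

```python
import random

def jittered_ttl(base_ttl: float, jitter_fraction: float = 0.2) -> float:
    """Spread cache expirations so 100M+ devices don't all refresh at
    once: each client's TTL is randomized within +/- jitter_fraction
    of the base value, smearing one refresh spike into a gradual
    ramp the backend can absorb."""
    low = base_ttl * (1 - jitter_fraction)
    high = base_ttl * (1 + jitter_fraction)
    return random.uniform(low, high)
```

With a 300-second base TTL and 20% jitter, refreshes arrive over a two-minute window instead of a single instant.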
During peak live event load, this approach enabled Netflix to successfully deliver broadcast-triggered recommendation updates to over 100 million devices in under a minute. By coordinating preloaded data with real-time event triggers, Netflix met stringent latency and scale demands while mitigating—but not eliminating—sudden traffic surges that previously strained the system.
4. The Secret Sauce: The Live Origin
Figure 4. Live streaming and distribution architecture.
The “Live Origin” is a custom-built service sitting between the transcoder and the internet. It acts as the final buffer for the live stream.
Article: Netflix Live Origin
During a globally streamed live event, like the 2024 Tyson vs. Paul “fight,” Netflix needed to serve tens of millions of households at peak while meeting strict SLAs on latency and reliability. The new live origin stack had to sustain very high write and read throughput, keep origin responses fast and predictable under load, and maintain near-real-time consistency between publishers in AWS and the Open Connect CDN. Traditional storage patterns made it difficult to isolate publishing from read surges and to scale reliably to the required performance envelope shared between AWS and Open Connect.
To address these bottlenecks, Netflix built Live Origin as a multi-tenant microservice running on EC2 in AWS, fronting a write-optimized KeyValue abstraction. The underlying architecture uses Apache Cassandra as the durable backing store and EVCache (Memcached-based) for high-throughput, low-latency reads, so that read-heavy live spikes do not penalize the write path. By pushing most read traffic into EVCache and carefully constraining the responsibilities of origin, the system can sustain very high aggregate bandwidth from origin while keeping write and control operations within tight time bounds.
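The EVCache-in-front-of-Cassandra arrangement is a cache-aside read path. A toy model of the pattern, not Netflix’s implementation, shows why the write path stays protected: once one reader has pulled a hot segment through, every subsequent reader is served from memory.

```python
class ReadThroughStore:
    """Toy cache-aside read path: check the fast cache first and only
    fall through to the durable store on a miss, populating the cache
    so subsequent readers of a hot live segment never touch the
    backing database."""

    def __init__(self, cache: dict, backing: dict):
        self.cache = cache        # stand-in for EVCache
        self.backing = backing    # stand-in for Cassandra
        self.backing_reads = 0    # instrumented for illustration

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.backing_reads += 1
        value = self.backing.get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

For a live event, where millions of viewers request the same handful of segments, the hit ratio after the first read approaches 100%, which is exactly what keeps read spikes off the write path.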
Live Origin separates media objects from control and metadata, keeping the latter small and highly cacheable so it can be served quickly even during storm conditions. Metadata encodes control-plane details such as segment availability and defect state, allowing the origin to suppress known-bad segments and coordinate behavior with the CDN without impacting the bulk media flow. This separation lets the system respond with low-latency control data while preserving stability for media delivery, even when parts of the system are degraded.
The origin layer also relies heavily on HTTP caching semantics and in-memory caching for its control plane to ride out “404 storms” and other pathological request patterns. By using cache-control headers, short but effective TTLs, and aggressive caching of templated control responses, Live Origin can keep control-plane cache hit ratios very high when clients or CDNs request non-existent segments. In parallel, physical and logical isolation between publishing and CDN-facing stacks, along with priority-aware throttling, prevents overload in one area from cascading into global outages.
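The “404 storm” defense amounts to negative caching: remembering, briefly, that a segment does not exist yet. A minimal sketch under assumed semantics (the class and TTL value are illustrative, not from the article):

```python
import time

class NegativeCache:
    """Toy model of riding out a '404 storm': remember for a short TTL
    that a segment does not exist (yet), so millions of requests for a
    not-yet-published segment are answered from memory instead of
    hammering the origin's data path."""

    def __init__(self, ttl: float = 1.0):
        self.ttl = ttl
        self.misses = {}  # key -> expiry timestamp

    def record_missing(self, key, now=None):
        now = time.monotonic() if now is None else now
        self.misses[key] = now + self.ttl

    def known_missing(self, key, now=None) -> bool:
        now = time.monotonic() if now is None else now
        expiry = self.misses.get(key)
        return expiry is not None and now < expiry
```

The TTL has to be short: in a live stream the “missing” segment will exist a second or two later, so the negative entry must expire quickly or the origin would delay real content.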
By re-architecting origin services specifically for live conditions, Netflix maintained operational control under extreme concurrency and high bandwidth demand, even when viewer interest spiked suddenly. This design improved the resilience of live publishing, enabling Netflix to uphold stringent latency and availability expectations during large global events.
5. The Payoff: Ad Event Processing at Scale
Figure 5. Basic ad event handling system.
Live events drive massive ad-tier engagement. Tracking those interactions for millions of viewers simultaneously is a massive telemetry challenge.
Article: Building a Robust Ads Event Processing Pipeline
When Netflix introduced a basic ads plan in partnership with Microsoft in late 2022, the company needed to adapt its playback systems to support ad insertion and ensure reliable ad tracking. The immediate challenge was operational: enabling ad playback at scale while relaying accurate telemetry to multiple third-party vendors, all within device memory and latency constraints. As ad inventory and tracking requirements grew, so did the volume and complexity of associated metadata. The expansion of event types and provider integrations led to concerns about token size and device memory usage.
During the pilot phase in November 2022, Netflix implemented a three-part system comprising the Microsoft Ad Server, a Netflix Ads Manager, and an Ads Event Handler. The Ads Manager forwarded requests to Microsoft, parsed VAST-formatted responses, and constructed simplified structures embedded in encrypted opaque tokens. Client devices sent ad playback telemetry, including these tokens, to the Netflix telemetry system, which enqueued the events in Kafka. The Ads Event Handler consumed and decrypted these payloads, routing tracking data to vendors such as DV, IAS, and Nielsen.
As token sizes grew with additional vendor URLs and new ad formats like Display and Pause Ads, Netflix implemented an Ads Metadata Registry to offload metadata and reduce client-side token sizes. Later, to support billing, frequency capping, sessionization, and dynamic formats, the company transitioned toward a centralized telemetry pipeline. This included the Ads Event Publisher, real-time processing via Apache Flink, and downstream systems for metrics, billing, and broader support for compliance signal sharing.
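The registry move is a classic indirection trade: replace a fat payload on every device with a short reference that the server resolves. A hypothetical sketch of the shape of that design (the ID scheme and method names are mine):

```python
class AdsMetadataRegistry:
    """Toy model of offloading per-ad metadata: instead of embedding
    every vendor tracking URL in the client-side token, the server
    stores the metadata once and the token carries only a short
    registry ID, keeping device memory use flat as vendor lists grow."""

    def __init__(self):
        self._store = {}
        self._next_id = 0

    def register(self, metadata: dict) -> str:
        ref = f"ad-{self._next_id}"   # hypothetical ID scheme
        self._next_id += 1
        self._store[ref] = metadata
        return ref

    def resolve(self, ref: str) -> dict:
        return self._store[ref]
```

Devices never see the vendor URL list; the Ads Event Handler resolves the reference server-side when routing telemetry, so adding a vendor costs nothing on the client.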
By centralizing common operations like enrichment, hashing, and data handling within a central event ingestion pipeline, Netflix reduced complexity for telemetry consumers, enabling further expansion into programmatic buying. This architectural evolution enabled scalable, format-agnostic processing without increasing device burden, addressing the original challenge of delivering ads efficiently across a growing set of use cases.