Bonus Problem 6: Netflix Streaming
The World's Most Sophisticated Video Delivery Platform
Delivering 94 Billion Hours of Entertainment, Without a Single Buffer
Imagine this challenge: It's 8 PM on a Friday night. Across the globe, 300 million people are about to press "play" at roughly the same time.
Each viewer expects instant playback: no buffering wheel, no quality drops, no frozen frames. They're watching on everything from 4K smart TVs in Tokyo to mobile phones in rural Brazil. Their internet connections range from gigabit fiber to spotty mobile data.
Now do that for 94 billion hours of viewing per year. At peak hours, your traffic accounts for 15% of all downstream internet bandwidth worldwide.
This is Netflix: the platform that reinvented how we consume entertainment and built the most sophisticated video delivery system ever created.
┌──────────────────────────────────────────────────────────────────────────────
│  THE NETFLIX SCALE (2025)
│
│  SUBSCRIBERS
│  ───────────
│  Global paid subscribers:        301.6 million
│  Countries served:               190+
│  Ad-supported tier users:        94 million monthly actives
│  Viewing hours (H2 2024):        94 billion
│
│  CONTENT
│  ───────
│  Total titles available:         ~18,000+
│  Content spend (2024):           $17 billion
│  Original productions (2023):    891 titles
│  Languages supported:            60+
│
│  INFRASTRUCTURE
│  ──────────────
│  Open Connect servers:           17,000+ worldwide
│  ISP partnerships:               1,000+ locations in 158 countries
│  % of global internet traffic:   ~15% at peak
│  Backend services (AWS):         1,000+ microservices
│
│  BUSINESS
│  ────────
│  Annual revenue (2024):          $39 billion
│  Market cap (May 2025):          $491.7 billion
│  Employees:                      ~14,000
└──────────────────────────────────────────────────────────────────────────────
This is the system we'll design today, and along the way we'll see how Netflix achieves buffer-free streaming for hundreds of millions of concurrent viewers worldwide.
The Interview Begins
You walk into the interview room. The interviewer smiles and gestures to the whiteboard.
Interviewer: "Thanks for coming in. Today we're going to design a video streaming platform similar to Netflix. I want to see how you think about large-scale content delivery, client-server architecture, and handling global distribution. Feel free to ask questions; this should be collaborative."
They write on the whiteboard:
┌──────────────────────────────────────────────────────────────────────────────
│  Design a Global Video Streaming Platform
│
│  Build a streaming service that can:
│  - Stream video to 300M+ subscribers worldwide
│  - Support devices from phones to 4K TVs
│  - Deliver content with <2 second start time
│  - Handle peak traffic of millions of concurrent streams
│  - Provide personalized recommendations
│
│  Consider:
│  - How do you encode and store video efficiently?
│  - How do you deliver content globally with low latency?
│  - How do you adapt quality to network conditions?
│  - How do you handle millions of concurrent users?
└──────────────────────────────────────────────────────────────────────────────
Interviewer: "Take a few minutes to think about this, then walk me through your approach. We have about 45 minutes."
Phase 1: Requirements Clarification (5 minutes)
Before diving in, you take a breath and start asking questions.
Your Questions
You: "Before I start designing, I'd like to clarify some requirements. First, what's our target scale: how many concurrent viewers should we support at peak?"
Interviewer: "At peak, we need to support around 10 million concurrent streams. That's a typical Friday evening."
You: "What quality levels do we need to support? Just HD or up to 4K/HDR?"
Interviewer: "We need the full range β from 240p for poor mobile connections up to 4K HDR for premium users on home theater systems."
You: "How important is start time versus sustained quality?"
Interviewer: "Both are critical. Users expect playback to start within 2 seconds, but we also can't have constant rebuffering. A smooth experience is paramount."
You: "Should we design for VOD only, or also live streaming like sports events?"
Interviewer: "Focus on VOD for now, but keep in mind we're expanding into live events."
You: "What about content protection? Do we need DRM?"
Interviewer: "Yes, premium content requires robust DRM across all devices."
You: "Perfect. Let me summarize the requirements."
Functional Requirements
1. VIDEO PLAYBACK
- Stream on-demand video content
- Support resolutions from 240p to 4K HDR
- Adaptive bitrate based on network conditions
- Resume playback from where user left off
- Support subtitles and multiple audio tracks
2. CONTENT CATALOG
- Browse and search content library
- Organize content by genre, trending, personalized rows
- Display metadata, trailers, ratings
- Support multiple profiles per account
3. RECOMMENDATIONS
- Personalized content suggestions per user
- "Continue Watching" functionality
- "Because You Watched X" recommendations
- Trending and top 10 lists
4. USER MANAGEMENT
- Account creation and authentication
- Multiple profiles per household
- Parental controls and viewing restrictions
- Download for offline viewing
Non-Functional Requirements
1. SCALE
- 300M+ subscribers globally
- 10M+ concurrent streams at peak
- 18,000+ titles in catalog
- 190+ countries
2. LATENCY
- Playback start: <2 seconds
- Seek response: <500ms
- API response: <100ms p99
3. AVAILABILITY
- 99.99% uptime (52 minutes downtime/year)
- Graceful degradation under failures
- Multi-region redundancy
4. QUALITY
- Zero buffering on stable connections
- Seamless quality adaptation
- Consistent experience across devices
Phase 2: Back of the Envelope Estimation (5 minutes)
You: "Let me work through the numbers to understand the infrastructure we need."
Traffic Estimation
CONCURRENT STREAMS AT PEAK
Total subscribers: 300 million
Peak concurrent rate: ~3% (Friday 8 PM)
Peak concurrent streams: ~10 million
Average stream bitrate: 5 Mbps (mix of qualities)
Peak bandwidth: 10M × 5 Mbps = 50 Tbps
Daily viewing hours:
Average per subscriber: 2 hours/day
Total daily: 600 million hours
Stream starts (avg, assuming ~1-hour sessions): ~7,000/second
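These figures can be sanity-checked with a few lines of arithmetic. A rough sketch; the constants simply mirror the assumptions above:

```python
# Back-of-envelope traffic check; constants mirror the assumptions above.
SUBSCRIBERS = 300_000_000
PEAK_CONCURRENT_RATE = 0.03      # ~3% streaming at the Friday-evening peak
AVG_BITRATE_MBPS = 5             # blended across quality levels
AVG_HOURS_PER_SUB_PER_DAY = 2

peak_streams = SUBSCRIBERS * PEAK_CONCURRENT_RATE          # 9M, call it ~10M
peak_tbps = peak_streams * AVG_BITRATE_MBPS / 1_000_000    # Mbps -> Tbps
daily_hours = SUBSCRIBERS * AVG_HOURS_PER_SUB_PER_DAY      # 600M hours/day
starts_per_sec = daily_hours / 86_400                      # ~1-hour sessions

print(f"Peak streams:     {peak_streams:,.0f}")
print(f"Peak bandwidth:   {peak_tbps:.0f} Tbps")
print(f"Starts/sec (avg): {starts_per_sec:,.0f}")
```

Note the 45 Tbps result rounds up to the ~50 Tbps figure once you add protocol overhead and headroom.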
Storage Estimation
VIDEO STORAGE
Titles in catalog: 18,000
Average title length: 90 minutes (mix of movies/series)
Per-title encoding:
Resolutions: 8 (240p to 4K)
Bitrates per resolution: 3-4 variants
Total streams per title: ~30 encoded versions
Average encoded title size:
Low quality (240p-480p): ~500 MB
HD (720p-1080p): ~4 GB
4K HDR: ~15 GB
Total per title: ~20 GB average
Total storage:
18,000 titles × 20 GB = 360 TB (all encoded versions, central)
With CDN replication (~100×): ~36 PB distributed globally
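The same arithmetic for storage, as a quick sketch:

```python
# Storage estimate: central encoded library and its CDN footprint.
TITLES = 18_000
AVG_ENCODED_GB_PER_TITLE = 20    # all ~30 encoded variants combined
CDN_REPLICATION_FACTOR = 100     # copies spread across the OCA fleet

central_tb = TITLES * AVG_ENCODED_GB_PER_TITLE / 1_000   # GB -> TB
cdn_pb = central_tb * CDN_REPLICATION_FACTOR / 1_000     # TB -> PB

print(f"Central encoded library: {central_tb:.0f} TB")   # 360 TB
print(f"CDN-distributed copies:  {cdn_pb:.0f} PB")       # 36 PB
```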
Infrastructure Summary
┌──────────────────────────────────────────────────────────────────────────────
│  INFRASTRUCTURE SUMMARY
│
│  BANDWIDTH
│  ├── Peak egress: 50+ Tbps
│  ├── Daily data transfer: ~1 EB (exabyte)
│  └── % of internet traffic: ~15% at peak
│
│  STORAGE
│  ├── Source content: ~500 TB
│  ├── Encoded library (central): ~360 TB
│  └── CDN distributed copies: ~36 PB globally
│
│  COMPUTE (Control Plane)
│  ├── Microservices: 1,000+
│  ├── API requests/sec: ~1 million
│  └── Recommendation calls: ~100K/sec
│
│  CDN (Data Plane)
│  ├── Edge servers: 17,000+
│  ├── ISP locations: 1,000+
│  └── Countries: 158
└──────────────────────────────────────────────────────────────────────────────
Phase 3: High-Level Design (10 minutes)
You: "Netflix has a unique architecture with two distinct planes; let me explain."
System Architecture
┌──────────────────────────────────────────────────────────────────────────────
│                       NETFLIX ARCHITECTURE OVERVIEW
│
│                            CLIENT DEVICES
│      [Smart TV]   [Phone]   [Tablet]   [Web]   [Console]
│           │          │          │        │         │
│           └──────────┴──────────┼────────┴─────────┘
│                                 ▼
│  ┌─────────────────────────────────────────────────────────────────────
│  │ CONTROL PLANE (AWS)
│  │
│  │  [Zuul Gateway] ──────► MICROSERVICES
│  │  [Eureka Discovery]       Auth · Catalog · Recommendations
│  │                           Playback · Billing · Search
│  │
│  │  DATA STORES: Cassandra (users) · EVCache (cache)
│  │               MySQL (billing)   · S3 (media)
│  └─────────────────────────────────────────────────────────────────────
│                                 │ playback URLs
│                                 ▼
│  ┌─────────────────────────────────────────────────────────────────────
│  │ DATA PLANE (Open Connect CDN)
│  │
│  │   ISP level            IXP level            Origin
│  │  [OCA (edge)] ──miss──► [OCA (metro)] ──miss──► [AWS S3]
│  │        │
│  │        │ video streams
│  │        ▼
│  │    END USERS
│  └─────────────────────────────────────────────────────────────────────
└──────────────────────────────────────────────────────────────────────────────
The Two-Plane Architecture
You: "Netflix separates its architecture into two distinct systems optimized for different purposes."
Control Plane (AWS)
Purpose: Handle all user interactions before playback: browsing, authentication, recommendations, playback authorization.
Key characteristics:
- Runs on AWS across multiple regions
- 1,000+ microservices
- Handles metadata, not video data
- Optimized for consistency and availability
- ~1M API requests/second
Data Plane (Open Connect)
Purpose: Deliver actual video content to users with minimum latency.
Key characteristics:
- Netflix's proprietary CDN
- 17,000+ servers in 1,000+ ISP locations
- Servers placed inside ISP networks
- Handles 100% of video streaming
- Optimized for throughput and proximity
You: "This separation is crucial. The control plane needs strong consistency for things like authentication and billing. The data plane needs raw throughput for video delivery. Different problems, different solutions."
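The handoff between the two planes happens at "play" time and can be sketched in a few lines. This is a hypothetical illustration; the function name, response fields, and OCA URLs are invented for the sketch, not Netflix's real API:

```python
# Hypothetical sketch of the control-plane -> data-plane handoff at "play" time.
# Function, field, and URL names are illustrative, not Netflix's real API.
def authorize_playback(user_id: str, title_id: str) -> dict:
    """Control plane: authenticate the user, issue a DRM license, and
    return a ranked list of nearby OCAs to stream from."""
    return {
        "drm_license": f"license-for-{user_id}",
        "manifest_urls": [  # data-plane endpoints, tried in order
            f"https://oca-isp-1.example/{title_id}/manifest",
            f"https://oca-ixp-1.example/{title_id}/manifest",
        ],
    }

resp = authorize_playback("user-42", "title-99")
# The client never streams video through the control plane; it fetches
# chunks directly from the first reachable OCA in the list.
print(resp["manifest_urls"][0])
```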
Phase 4: Deep Dives (20 minutes)
Interviewer: "Great overview. Let's dive deeper. How does adaptive bitrate streaming work?"
Deep Dive 1: Adaptive Bitrate Streaming (Week 2 - Timeouts & Latency)
You: "Adaptive bitrate streaming is the core technology that enables buffer-free playback across varying network conditions."
The Problem
NETWORK VARIABILITY CHALLENGE
User watching on mobile:
- Start: 20 Mbps (WiFi at home)
- Minute 5: 5 Mbps (left home, on LTE)
- Minute 15: 500 Kbps (entered subway)
- Minute 20: 0 Mbps (tunnel)
- Minute 25: 10 Mbps (emerged from tunnel)
Without ABR:
- Fixed 8 Mbps stream -> constant buffering on LTE
- Fixed 500 Kbps stream -> terrible quality on WiFi
With ABR:
- Dynamically adjust quality based on conditions
- Seamless transitions between quality levels
- Buffer management to avoid interruptions
How ABR Works
ADAPTIVE BITRATE STREAMING ARCHITECTURE

ENCODING PIPELINE
─────────────────
Source Video -> Encoder -> Multiple Quality Levels -> Chunked Storage

Quality Ladder (per-title optimized):
┌────────────┬───────────┬───────────────────────────────┐
│ Resolution │ Bitrate   │ Use Case                      │
├────────────┼───────────┼───────────────────────────────┤
│ 240p       │ 235 Kbps  │ Extremely poor connections    │
│ 360p       │ 560 Kbps  │ Mobile data saving            │
│ 480p       │ 1.0 Mbps  │ Standard mobile               │
│ 720p       │ 3.0 Mbps  │ Tablet/laptop                 │
│ 1080p      │ 5.0 Mbps  │ HD TV                         │
│ 1080p HDR  │ 7.0 Mbps  │ Premium HD                    │
│ 4K         │ 15 Mbps   │ 4K TV                         │
│ 4K HDR     │ 25 Mbps   │ Premium 4K experience         │
└────────────┴───────────┴───────────────────────────────┘

CHUNKING
────────
Each quality level is split into 2-4 second chunks.
The client can switch quality at any chunk boundary.

Video Timeline:
[Chunk 1][Chunk 2][Chunk 3][Chunk 4][Chunk 5][Chunk 6]...
   4K       4K     1080p     720p     720p    1080p     <- quality varies per chunk
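To make the chunking concrete, here is the size of one 4-second chunk at each rung of the ladder above (pure arithmetic from the bitrates):

```python
# Bytes in a single 4-second chunk at each rung of the quality ladder.
LADDER_BPS = {
    "240p": 235_000, "360p": 560_000, "480p": 1_000_000,
    "720p": 3_000_000, "1080p": 5_000_000,
    "1080p HDR": 7_000_000, "4K": 15_000_000, "4K HDR": 25_000_000,
}
CHUNK_SECONDS = 4

for name, bps in LADDER_BPS.items():
    chunk_mb = bps * CHUNK_SECONDS / 8 / 1_000_000   # bits -> bytes -> MB
    print(f"{name:>9}: {chunk_mb:5.2f} MB")
```

Switching down from 4K HDR to 720p at a chunk boundary cuts the next download from 12.5 MB to 1.5 MB, which is why ABR can react within seconds.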
Per-Title Encoding Optimization
You: "Netflix doesn't use a fixed bitrate ladder. They analyze each title and create custom encoding settings."
PER-TITLE ENCODING OPTIMIZATION
Problem: Not all content is equal
- Animation: Compresses well, needs less bitrate
- Action movies: Complex scenes need more bitrate
- Dark scenes: Compression artifacts more visible
Traditional approach:
One bitrate ladder for all content
Result: Wasted bandwidth on simple content, poor quality on complex content
Netflix's approach:
1. Analyze content complexity using VMAF (Video Multimethod Assessment Fusion)
2. Generate custom bitrate ladder per title
3. Some titles get good 1080p at 3 Mbps, others need 6 Mbps
Result: 20-30% bandwidth savings with better quality
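The ladder-selection step reduces to: for each title, pick the cheapest bitrate whose measured quality clears a target. A minimal sketch; the scores below are made-up stand-ins for real per-title VMAF measurements:

```python
# Pick the cheapest bitrate whose measured quality clears a target score.
# The scores are invented stand-ins for real per-title VMAF runs.
def pick_bitrate(scores: dict, target: float):
    """Return the lowest bitrate (bps) whose quality score meets target."""
    for bitrate in sorted(scores):
        if scores[bitrate] >= target:
            return bitrate
    return None  # no rung is good enough; the ladder needs a higher rung

animation = {3_000_000: 95.0, 4_500_000: 97.0, 6_000_000: 98.0}  # compresses well
action    = {3_000_000: 80.0, 4_500_000: 89.0, 6_000_000: 94.0}  # complex scenes

print(pick_bitrate(animation, 93.0))  # 3000000 - good 1080p at 3 Mbps
print(pick_bitrate(action, 93.0))     # 6000000 - needs 6 Mbps
```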
ABR Client Implementation
# streaming/abr_client.py
"""
Adaptive Bitrate Streaming Client

Demonstrates how Netflix clients select quality levels dynamically.
"""
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import List


class QualityLevel(Enum):
    """Available quality levels."""
    Q_240P = (240, 235_000)  # (height, bitrate_bps)
    Q_360P = (360, 560_000)
    Q_480P = (480, 1_000_000)
    Q_720P = (720, 3_000_000)
    Q_1080P = (1080, 5_000_000)
    Q_4K = (2160, 15_000_000)


@dataclass
class ChunkInfo:
    """Information about a video chunk."""
    index: int
    duration_seconds: float
    quality: QualityLevel
    size_bytes: int
    url: str


@dataclass
class BufferState:
    """Current state of playback buffer."""
    buffered_seconds: float
    current_position: float
    is_playing: bool


class ABRController:
    """
    Adaptive Bitrate Controller.

    Implements a buffer-based ABR algorithm similar to Netflix's approach.
    Balances between:
    - Maximizing video quality
    - Avoiding rebuffering
    - Minimizing quality switches

    Applies concepts from:
    - Week 2: Timeout management and graceful degradation
    - Week 4: Caching and buffer management
    """

    # Buffer thresholds (seconds)
    BUFFER_MIN = 5.0      # Minimum buffer before panic
    BUFFER_LOW = 15.0     # Start being conservative
    BUFFER_TARGET = 30.0  # Ideal buffer level
    BUFFER_MAX = 60.0     # Maximum buffer

    # Quality switch dampening
    SWITCH_COOLDOWN = 10.0  # Seconds between quality changes

    def __init__(self, quality_levels: List[QualityLevel]):
        self.quality_levels = sorted(
            quality_levels,
            key=lambda q: q.value[1]  # Sort by bitrate
        )
        self.current_quality_idx = 0
        self.last_switch_time = 0.0
        self.bandwidth_estimates: List[float] = []

    def estimate_bandwidth(self, chunk_bytes: int, download_time: float) -> float:
        """
        Estimate available bandwidth from a completed chunk download.

        Keeps the last 5 samples and returns their harmonic mean, a
        conservative estimate that is dominated by the slower samples.
        """
        if download_time <= 0:
            return 0.0
        measured_bps = (chunk_bytes * 8) / download_time
        # Keep last 5 measurements
        self.bandwidth_estimates.append(measured_bps)
        if len(self.bandwidth_estimates) > 5:
            self.bandwidth_estimates.pop(0)
        harmonic_mean = len(self.bandwidth_estimates) / sum(
            1 / bw for bw in self.bandwidth_estimates
        )
        return harmonic_mean

    def select_quality(
        self,
        buffer_state: BufferState,
        estimated_bandwidth: float,
        current_time: float
    ) -> QualityLevel:
        """
        Select optimal quality level based on buffer and bandwidth.

        Algorithm:
        1. If buffer critically low -> drop to lowest quality
        2. If buffer low -> be conservative, don't increase
        3. If buffer healthy -> select highest sustainable quality
        4. Apply switch dampening to avoid oscillation
        """
        buffer = buffer_state.buffered_seconds

        # PANIC MODE: Buffer critically low
        if buffer < self.BUFFER_MIN:
            self.current_quality_idx = 0
            self.last_switch_time = current_time
            return self.quality_levels[0]

        # Find highest sustainable quality
        target_idx = 0
        for i, quality in enumerate(self.quality_levels):
            bitrate = quality.value[1]
            # Need 20% headroom for safety
            if bitrate * 1.2 <= estimated_bandwidth:
                target_idx = i

        # Buffer-based adjustment
        if buffer < self.BUFFER_LOW:
            # Don't increase quality when buffer is low
            target_idx = min(target_idx, self.current_quality_idx)
        # (Above BUFFER_TARGET, increases are already allowed by default.)

        # Switch dampening: avoid rapid oscillation
        time_since_switch = current_time - self.last_switch_time
        if time_since_switch < self.SWITCH_COOLDOWN:
            # Only allow quality decrease, not increase
            target_idx = min(target_idx, self.current_quality_idx)

        # Apply change
        if target_idx != self.current_quality_idx:
            self.current_quality_idx = target_idx
            self.last_switch_time = current_time

        return self.quality_levels[self.current_quality_idx]

    def get_initial_quality(self) -> QualityLevel:
        """
        Select initial quality for playback start.

        Strategy: start at medium quality for fast startup,
        then adapt based on actual bandwidth.
        """
        # Start at 480p - reasonable quality, fast start
        for i, q in enumerate(self.quality_levels):
            if q.value[0] >= 480:
                self.current_quality_idx = i
                return q
        return self.quality_levels[0]


class StreamingSession:
    """
    Manages a complete streaming session.

    Coordinates:
    - Chunk downloading
    - Buffer management
    - Quality selection
    - Playback control
    """

    def __init__(self, manifest_url: str):
        self.manifest_url = manifest_url
        self.abr = ABRController(list(QualityLevel))
        self.buffer = BufferState(
            buffered_seconds=0.0,
            current_position=0.0,
            is_playing=False
        )
        self.chunks_downloaded = 0
        self.total_rebuffers = 0    # incremented by the playback loop (omitted)
        self.quality_switches = 0   # incremented by the playback loop (omitted)

    async def start_playback(self):
        """
        Initialize playback session.

        1. Fetch manifest
        2. Select initial quality
        3. Pre-buffer minimum amount
        4. Start playback
        """
        # Start with medium quality for fast startup
        initial_quality = self.abr.get_initial_quality()
        # Pre-buffer 5 seconds before starting
        while self.buffer.buffered_seconds < 5.0:
            await self._download_next_chunk(initial_quality)
        self.buffer.is_playing = True

    async def _download_next_chunk(self, quality: QualityLevel):
        """Download next chunk at specified quality."""
        # Simulated download - in reality this would fetch from the CDN
        chunk_bytes = quality.value[1] * 4 // 8        # 4 seconds of video, in bytes
        download_time = (chunk_bytes * 8) / 5_000_000  # Simulate a 5 Mbps link
        await self._simulate_download(download_time)
        self.buffer.buffered_seconds += 4.0
        self.chunks_downloaded += 1
        # Update bandwidth estimate
        self.abr.estimate_bandwidth(chunk_bytes, download_time)

    async def _simulate_download(self, seconds: float):
        """Simulate network download time."""
        await asyncio.sleep(seconds)

    def get_session_stats(self) -> dict:
        """Return session quality metrics."""
        return {
            "chunks_downloaded": self.chunks_downloaded,
            "rebuffer_events": self.total_rebuffers,
            "quality_switches": self.quality_switches,
            "current_quality": self.abr.quality_levels[
                self.abr.current_quality_idx
            ].name,
        }
Deep Dive 2: Open Connect CDN (Week 1 - Partitioning & Replication)
Interviewer: "Tell me about Netflix's CDN strategy. Why did they build their own?"
You: "Open Connect is Netflix's secret weapon. It's why they can deliver 15% of internet traffic without breaking the bank."
Why Build Your Own CDN?
THE CDN ECONOMICS PROBLEM
Third-party CDN costs (2010s):
- $0.02 - $0.05 per GB delivered
- Netflix daily traffic: ~1 exabyte
- Daily CDN cost: $20M - $50M (!)
- Annual: $7B - $18B just for delivery
Netflix's solution:
- Build own CDN infrastructure
- One-time investment: ~$1B over decade
- Ongoing costs: Much lower than third-party
- Bonus: Better control over quality
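The quoted cost range checks out with a one-liner per price point (1 EB/day is roughly a billion gigabytes):

```python
# Check the quoted third-party CDN bill: ~1 EB/day at $0.02 - $0.05 per GB.
DAILY_TRAFFIC_GB = 1_000_000_000   # 1 exabyte = 1 billion GB

for price in (0.02, 0.05):
    daily = DAILY_TRAFFIC_GB * price
    print(f"${price:.2f}/GB -> ${daily / 1e6:.0f}M/day, "
          f"${daily * 365 / 1e9:.1f}B/year")
```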
Open Connect Architecture
OPEN CONNECT CDN ARCHITECTURE
┌──────────────────────────────────────────────────────────────────────────────
│  CONTENT DISTRIBUTION TIERS
│
│  TIER 1: ORIGIN (AWS S3)
│  • Complete content library
│  • All encoded versions
│  • Source of truth
│        │
│        │ Nightly fill (popular content)
│        ▼
│  TIER 2: INTERNET EXCHANGE POINTS (IXP)
│  • Large OCAs at peering points
│  • Store ~95% of catalog
│  • Serve multiple ISPs
│        │
│        │ Cache fill on demand
│        ▼
│  TIER 3: ISP-EMBEDDED OCAs
│  • Servers inside ISP networks
│  • Store regionally popular content
│  • Shortest path to users
│  • 17,000+ servers worldwide
│        │
│        │ Video streams
│        ▼
│  END USERS
│  • Typically served from ISP OCA
│  • 90%+ cache hit rate
│  • Minimal latency
└──────────────────────────────────────────────────────────────────────────────
Open Connect Appliance (OCA) Specs
OCA HARDWARE SPECIFICATIONS
Standard OCA Server:
┌──────────────────────────────────────────────────────────────────────────────
│  STORAGE
│  • 36 × 8TB HDDs = 288 TB raw
│  • Or: 18 × 16TB SSDs = 288 TB (flash variant)
│  • Typical usable: ~240 TB
│
│  NETWORK
│  • 4 × 25 Gbps NICs = 100 Gbps total
│  • Can serve ~20,000 concurrent streams
│
│  COMPUTE
│  • Minimal CPU (video serving is I/O bound)
│  • Custom FreeBSD-based OS
│  • Netflix-optimized nginx
│
│  EFFICIENCY
│  • Power: ~500W under load
│  • Cost per GB delivered: <$0.001
└──────────────────────────────────────────────────────────────────────────────
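The ~20,000-stream figure falls straight out of the NIC bandwidth, since video serving is I/O bound rather than CPU bound:

```python
# Why one OCA tops out around ~20,000 streams: it's a bandwidth limit.
NIC_GBPS = 4 * 25                 # four 25 Gbps ports
AVG_STREAM_MBPS = 5               # blended average bitrate

concurrent_streams = NIC_GBPS * 1_000 // AVG_STREAM_MBPS   # Gbps -> Mbps
print(concurrent_streams)         # 20000
```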
Content Placement Algorithm
# cdn/content_placement.py
"""
Content Placement Algorithm for Open Connect

Determines which content to cache on which OCA servers.
"""
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Set


@dataclass
class ContentItem:
    """A piece of content in the catalog."""
    content_id: str
    size_bytes: int
    popularity_score: float
    release_date: datetime
    regions: List[str]  # Regions where content is available


@dataclass
class OCAServer:
    """An Open Connect Appliance server."""
    server_id: str
    location: str
    region: str
    capacity_bytes: int
    used_bytes: int
    cached_content: Set[str]

    @property
    def available_bytes(self) -> int:
        return self.capacity_bytes - self.used_bytes


class ContentPlacementEngine:
    """
    Determines optimal content placement across the OCA fleet.

    Goals:
    1. Maximize cache hit rate
    2. Minimize origin fetches
    3. Balance load across servers
    4. Account for regional popularity differences

    Applies concepts from:
    - Week 1: Partitioning and replication strategies
    - Week 4: Cache placement and invalidation
    """

    def __init__(self):
        self.servers: Dict[str, OCAServer] = {}
        self.content: Dict[str, ContentItem] = {}
        self.regional_popularity: Dict[str, Dict[str, float]] = {}

    def add_server(self, server: OCAServer):
        """Register an OCA server."""
        self.servers[server.server_id] = server

    def update_popularity(self, region: str, content_id: str, score: float):
        """Update regional popularity score for content."""
        if region not in self.regional_popularity:
            self.regional_popularity[region] = {}
        self.regional_popularity[region][content_id] = score

    def compute_placement(self, region: str) -> Dict[str, List[str]]:
        """
        Compute optimal content placement for a region.

        Algorithm:
        1. Get all servers in region
        2. Rank content by regional popularity
        3. Place highest-popularity content on all servers
        4. Fill remaining space with long-tail content

        Returns: Dict mapping server_id to list of content_ids
        """
        regional_servers = [
            s for s in self.servers.values()
            if s.region == region
        ]
        if not regional_servers:
            return {}

        # Get regional popularity scores
        popularity = self.regional_popularity.get(region, {})

        # Sort content by regional popularity
        sorted_content = sorted(
            self.content.values(),
            key=lambda c: popularity.get(c.content_id, 0),
            reverse=True
        )

        placement: Dict[str, List[str]] = {
            s.server_id: [] for s in regional_servers
        }

        # Tier 1: the top 20% most popular content goes on ALL servers
        cutoff = int(len(sorted_content) * 0.2)
        popular_content = sorted_content[:cutoff]
        for content in popular_content:
            for server in regional_servers:
                if server.available_bytes >= content.size_bytes:
                    placement[server.server_id].append(content.content_id)
                    server.used_bytes += content.size_bytes
                    server.cached_content.add(content.content_id)

        # Tier 2: long-tail content spread across servers by hash.
        # (A production system would use consistent hashing so that
        # adding or removing a server reshuffles far less content.)
        remaining_content = sorted_content[cutoff:]
        for content in remaining_content:
            server_idx = hash(content.content_id) % len(regional_servers)
            server = regional_servers[server_idx]
            if server.available_bytes >= content.size_bytes:
                placement[server.server_id].append(content.content_id)
                server.used_bytes += content.size_bytes
                server.cached_content.add(content.content_id)

        return placement

    def select_server_for_request(
        self,
        content_id: str,
        user_region: str,
        user_isp: str
    ) -> Optional[OCAServer]:
        """
        Select the best OCA server to serve a content request.

        Priority:
        1. ISP-embedded OCA with content cached
        2. Regional IXP OCA with content cached
        3. Any OCA with content cached
        4. Fall back to origin (cache miss)
        """
        candidates = []
        for server in self.servers.values():
            if content_id not in server.cached_content:
                continue
            # Score servers by proximity and load
            score = 0.0
            if server.location == user_isp:
                # Prefer ISP-embedded servers
                score += 100
            elif server.region == user_region:
                # Then same-region servers
                score += 50
            # Penalize highly loaded servers
            load_pct = server.used_bytes / server.capacity_bytes
            score -= load_pct * 20
            candidates.append((score, server))

        if not candidates:
            return None  # Cache miss - fetch from origin

        # Return highest-scoring server
        candidates.sort(key=lambda x: x[0], reverse=True)
        return candidates[0][1]
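The long-tail placement above uses simple hash-mod assignment, which reshuffles most assignments whenever a server joins or leaves the region. A consistent-hash ring avoids that: only the keys between the new server and its predecessor move. A minimal sketch (the `vnodes` count and server names are illustrative, not Netflix's actual scheme):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: when a server is added, only the keys
    between it and its predecessor on the ring are reassigned."""

    def __init__(self, servers, vnodes: int = 100):
        # Each server owns `vnodes` points on the ring for smoother balance.
        self._ring = sorted(
            (self._hash(f"{s}#{v}"), s) for s in servers for v in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        # md5 keeps assignment stable across processes (unlike builtin hash()).
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, content_id: str) -> str:
        """Walk clockwise from the key's position to the next server point."""
        idx = bisect.bisect(self._keys, self._hash(content_id)) % len(self._keys)
        return self._ring[idx][1]

ring = HashRing(["oca-1", "oca-2", "oca-3"])
print(ring.server_for("stranger-things-s4e1"))  # stable, deterministic pick
```

Note the use of md5 rather than Python's builtin `hash()`: placement must be stable across processes and restarts, and builtin `hash()` is salted per process.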
Deep Dive 3: Microservices Architecture (Week 2 - Circuit Breakers)
Interviewer: "Netflix is famous for their microservices. How do they handle failures at scale?"
You: "Netflix pioneered many of the patterns we now consider standard: circuit breakers, bulkheads, and chaos engineering."
The Microservices Landscape
NETFLIX MICROSERVICES ECOSYSTEM
┌──────────────────────────────────────────────────────────────────────────────
│  REQUEST FLOW THROUGH NETFLIX BACKEND
│
│  Client Request
│        │
│        ▼
│  ┌───────────┐
│  │   Zuul    │  API Gateway
│  │  Gateway  │  - Authentication
│  └─────┬─────┘  - Rate limiting
│        │        - Request routing
│        ▼
│  ┌───────────┐
│  │  Eureka   │  Service Discovery
│  │ Registry  │  - Dynamic service registration
│  └─────┬─────┘  - Health monitoring
│        │        - Load balancing info
│        ▼
│  SERVICE MESH
│  [User Service] ──► [Playback Service] ──► [Reco Service] ──► [Search Service]
│        │                   │                     │                   │
│        ▼                   ▼                     ▼                   ▼
│  [Cassandra]          [EVCache]            [Cassandra]       [Elasticsearch]
│
│  Each service-to-service call is wrapped in a Hystrix circuit breaker.
└──────────────────────────────────────────────────────────────────────────────
Circuit Breaker Pattern
# resilience/circuit_breaker.py
"""
Circuit Breaker Implementation

Based on Netflix's Hystrix pattern for fault tolerance.
"""
import threading
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Any, Callable, Optional


class CircuitState(Enum):
    """Circuit breaker states."""
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing, reject requests
    HALF_OPEN = "half_open"  # Testing if service recovered


@dataclass
class CircuitBreakerConfig:
    """Configuration for circuit breaker behavior."""
    failure_threshold: int = 5     # Failures before opening
    success_threshold: int = 3     # Successes to close from half-open
    timeout_seconds: float = 30.0  # Time before trying half-open
    half_open_max_calls: int = 3   # Max calls in half-open state


@dataclass
class CircuitBreakerStats:
    """Runtime statistics for monitoring."""
    total_calls: int = 0
    successful_calls: int = 0
    failed_calls: int = 0
    rejected_calls: int = 0
    state_changes: int = 0


class CircuitBreaker:
    """
    Circuit Breaker for protecting service calls.

    States:
    - CLOSED: Normal operation, requests pass through
    - OPEN: Service failing, reject requests immediately
    - HALF_OPEN: Testing if service recovered

    Transitions:
    - CLOSED -> OPEN: When failure_threshold exceeded
    - OPEN -> HALF_OPEN: After timeout_seconds
    - HALF_OPEN -> CLOSED: When success_threshold reached
    - HALF_OPEN -> OPEN: On any failure

    Applies concepts from:
    - Week 2: Failure handling, timeouts, graceful degradation
    """

    def __init__(
        self,
        name: str,
        config: Optional[CircuitBreakerConfig] = None,
        fallback: Optional[Callable] = None
    ):
        self.name = name
        self.config = config or CircuitBreakerConfig()
        self.fallback = fallback
        self._state = CircuitState.CLOSED
        self._failure_count = 0
        self._success_count = 0
        self._last_failure_time: Optional[datetime] = None
        self._half_open_calls = 0
        self.stats = CircuitBreakerStats()
        self._lock = threading.Lock()

    @property
    def state(self) -> CircuitState:
        """Get current state, checking for timeout transitions."""
        with self._lock:
            if self._state == CircuitState.OPEN:
                if self._should_attempt_reset():
                    self._transition_to(CircuitState.HALF_OPEN)
            return self._state

    def _should_attempt_reset(self) -> bool:
        """Check if enough time passed to try half-open."""
        if self._last_failure_time is None:
            return False
        elapsed = (datetime.utcnow() - self._last_failure_time).total_seconds()
        return elapsed >= self.config.timeout_seconds

    def _transition_to(self, new_state: CircuitState):
        """Transition to new state with logging."""
        old_state = self._state
        self._state = new_state
        self.stats.state_changes += 1
        if new_state == CircuitState.HALF_OPEN:
            self._half_open_calls = 0
            self._success_count = 0
        elif new_state == CircuitState.CLOSED:
            self._failure_count = 0
        print(f"Circuit {self.name}: {old_state.value} -> {new_state.value}")

    def call(self, func: Callable, *args, **kwargs) -> Any:
        """
        Execute function through circuit breaker.

        Returns result or fallback value.
        Raises exception if no fallback and circuit open.
        """
        self.stats.total_calls += 1
        current_state = self.state

        # OPEN: Reject immediately
        if current_state == CircuitState.OPEN:
            self.stats.rejected_calls += 1
            return self._handle_rejection()

        # HALF_OPEN: Limit concurrent calls
        if current_state == CircuitState.HALF_OPEN:
            with self._lock:
                if self._half_open_calls >= self.config.half_open_max_calls:
                    self.stats.rejected_calls += 1
                    return self._handle_rejection()
                self._half_open_calls += 1

        # Execute the call
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception as e:
            self._on_failure()
            return self._handle_failure(e)

    def _on_success(self):
        """Handle successful call."""
        with self._lock:
            self.stats.successful_calls += 1
            if self._state == CircuitState.HALF_OPEN:
                self._success_count += 1
                if self._success_count >= self.config.success_threshold:
                    self._transition_to(CircuitState.CLOSED)
            elif self._state == CircuitState.CLOSED:
                # Reset failure count on success
                self._failure_count = 0

    def _on_failure(self):
        """Handle failed call."""
        with self._lock:
            self.stats.failed_calls += 1
            self._last_failure_time = datetime.utcnow()
            if self._state == CircuitState.HALF_OPEN:
                # Any failure in half-open goes back to open
                self._transition_to(CircuitState.OPEN)
            elif self._state == CircuitState.CLOSED:
                self._failure_count += 1
                if self._failure_count >= self.config.failure_threshold:
                    self._transition_to(CircuitState.OPEN)

    def _handle_rejection(self) -> Any:
        """Handle rejected call (circuit open)."""
        if self.fallback:
            return self.fallback()
        raise CircuitOpenError(f"Circuit {self.name} is OPEN")

    def _handle_failure(self, exception: Exception) -> Any:
        """Handle failed call."""
        if self.fallback:
            return self.fallback()
        raise exception


class CircuitOpenError(Exception):
    """Raised when circuit breaker rejects a call."""
    pass


# Example usage with service calls
class RecommendationService:
    """
    Service that uses circuit breakers for dependency calls.
    """

    def __init__(self):
        self.user_service_breaker = CircuitBreaker(
            name="user-service",
            config=CircuitBreakerConfig(
                failure_threshold=5,
                timeout_seconds=30,
                success_threshold=3
            ),
            fallback=self._get_default_user
        )
        self.ml_service_breaker = CircuitBreaker(
            name="ml-recommendations",
            config=CircuitBreakerConfig(
                failure_threshold=3,
                timeout_seconds=60,
                success_threshold=2
            ),
            fallback=self._get_popular_content
        )

    def get_recommendations(self, user_id: str) -> list:
        """Get personalized recommendations for user."""
        # Get user profile (with circuit breaker)
        user = self.user_service_breaker.call(
self._fetch_user_profile, user_id
)
# Get ML recommendations (with circuit breaker)
recommendations = self.ml_service_breaker.call(
self._fetch_ml_recommendations, user
)
return recommendations
def _fetch_user_profile(self, user_id: str) -> dict:
"""Fetch user profile from user service."""
# Actual HTTP call would go here
pass
def _fetch_ml_recommendations(self, user: dict) -> list:
"""Fetch recommendations from ML service."""
# Actual HTTP call would go here
pass
def _get_default_user(self) -> dict:
"""Fallback: return anonymous user profile."""
return {"id": "anonymous", "preferences": []}
def _get_popular_content(self) -> list:
"""Fallback: return globally popular content."""
return [
{"id": "trending_1", "title": "Popular Movie 1"},
{"id": "trending_2", "title": "Popular Series 1"},
]
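The breaker above reduces to a small state machine: CLOSED → OPEN after repeated failures, OPEN → HALF_OPEN after a timeout, HALF_OPEN → CLOSED on a successful probe. A minimal, self-contained sketch of just that machine (the `TinyBreaker` class and the flaky `dependency` are invented for this example, and it closes after a single half-open success rather than honoring a `success_threshold`):

```python
import time

class TinyBreaker:
    def __init__(self, failure_threshold=3, timeout_seconds=0.1):
        self.failure_threshold = failure_threshold
        self.timeout_seconds = timeout_seconds
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, fallback):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.timeout_seconds:
                self.state = "HALF_OPEN"      # timeout elapsed: probe again
            else:
                return fallback()             # fail fast, dependency not called
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        self.state = "CLOSED"                 # one success closes this toy breaker
        return result

# A flaky dependency: fails until 'healthy' is flipped on.
healthy = False
def dependency():
    if not healthy:
        raise RuntimeError("service down")
    return "real result"

breaker = TinyBreaker()
for _ in range(3):
    breaker.call(dependency, lambda: "fallback")
assert breaker.state == "OPEN"                # failure threshold reached

assert breaker.call(dependency, lambda: "fallback") == "fallback"  # rejected fast

time.sleep(0.15)                              # wait out the open timeout
healthy = True
assert breaker.call(dependency, lambda: "fallback") == "real result"
assert breaker.state == "CLOSED"              # probe succeeded, circuit closes
```

The key property to notice: while OPEN, the fallback returns immediately without touching the dependency at all, which is what protects a struggling service from a retry storm.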
Deep Dive 4: Recommendation System (Week 5 - Distributed Computing)
Interviewer: "Tell me about Netflix's recommendation system. How does it power 80% of viewing?"
You: "The recommendation system is Netflix's competitive moat. It processes billions of events to personalize every user's experience."
Recommendation Architecture
NETFLIX RECOMMENDATION SYSTEM
┌──────────────────────────────────────────────────────────────────────────────┐
│                                                                              │
│  DATA COLLECTION                                                             │
│  ───────────────                                                             │
│                                                                              │
│  User Events:                                                                │
│  • What you watched (and for how long)                                       │
│  • What you searched for                                                     │
│  • What you scrolled past                                                    │
│  • Time of day you watch                                                     │
│  • Device you watch on                                                       │
│  • What you added to "My List"                                               │
│                                                                              │
│  Content Metadata:                                                           │
│  • Genre, cast, director                                                     │
│  • Runtime, release year                                                     │
│  • Visual features (extracted by ML)                                         │
│  • Audio features                                                            │
│  • Maturity rating                                                           │
│                                                                              │
│                                      │                                       │
│                                      ▼                                       │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                               ML MODELS                                │  │
│  │                                                                        │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌───────────────────────────┐       │  │
│  │  │Collaborative│  │Content-Based│  │   Deep Learning Models    │       │  │
│  │  │  Filtering  │  │  Filtering  │  │     (Neural Networks)     │       │  │
│  │  └──────┬──────┘  └──────┬──────┘  └────────────┬──────────────┘       │  │
│  │         │                │                      │                      │  │
│  │         └────────────────┴──────────────────────┘                      │  │
│  │                          │                                             │  │
│  │                          ▼                                             │  │
│  │                 ┌─────────────────┐                                    │  │
│  │                 │  Ensemble Model │                                    │  │
│  │                 │  (Combines all) │                                    │  │
│  │                 └────────┬────────┘                                    │  │
│  └──────────────────────────┼─────────────────────────────────────────────┘  │
│                             │                                                │
│                             ▼                                                │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                            PERSONALIZATION                             │  │
│  │                                                                        │  │
│  │  Homepage Rows:                                                        │  │
│  │  ┌──────────────────────────────────────────────────────────────┐      │  │
│  │  │ "Because You Watched Stranger Things"                        │      │  │
│  │  │ [Show1] [Show2] [Show3] [Show4] [Show5] →                    │      │  │
│  │  └──────────────────────────────────────────────────────────────┘      │  │
│  │  ┌──────────────────────────────────────────────────────────────┐      │  │
│  │  │ "Trending Now"                                               │      │  │
│  │  │ [Show1] [Show2] [Show3] [Show4] [Show5] →                    │      │  │
│  │  └──────────────────────────────────────────────────────────────┘      │  │
│  │  ┌──────────────────────────────────────────────────────────────┐      │  │
│  │  │ "Top Picks for You"                                          │      │  │
│  │  │ [Show1] [Show2] [Show3] [Show4] [Show5] →                    │      │  │
│  │  └──────────────────────────────────────────────────────────────┘      │  │
│  │                                                                        │  │
│  │  Even the artwork shown varies per user!                               │  │
│  │                                                                        │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
Recommendation Algorithm Implementation
# recommendations/recommender.py
"""
Netflix-style Recommendation System
Demonstrates collaborative filtering and personalization.
"""
from dataclasses import dataclass
from typing import Dict, List, Tuple, Optional
import numpy as np
from collections import defaultdict
@dataclass
class UserProfile:
"""User viewing profile."""
user_id: str
viewing_history: List[str] # Content IDs watched
    ratings: Dict[str, float]      # Content ID → rating
    preferences: Dict[str, float]  # Genre → preference score
@dataclass
class ContentItem:
"""Content metadata."""
content_id: str
title: str
genres: List[str]
tags: List[str]
avg_rating: float
popularity_score: float
class CollaborativeFilter:
"""
Collaborative Filtering using Matrix Factorization.
Learns latent factors for users and items that explain
viewing patterns. Similar users have similar factor vectors.
Applies concepts from:
- Week 5: Distributed computation (training at scale)
"""
def __init__(self, num_factors: int = 50, learning_rate: float = 0.01):
self.num_factors = num_factors
self.learning_rate = learning_rate
self.user_factors: Dict[str, np.ndarray] = {}
self.item_factors: Dict[str, np.ndarray] = {}
def fit(self, interactions: List[Tuple[str, str, float]]):
"""
Train model on user-item interactions.
Args:
interactions: List of (user_id, item_id, rating) tuples
"""
# Initialize random factors
users = set(u for u, _, _ in interactions)
items = set(i for _, i, _ in interactions)
for user in users:
self.user_factors[user] = np.random.randn(self.num_factors) * 0.1
for item in items:
self.item_factors[item] = np.random.randn(self.num_factors) * 0.1
# Stochastic Gradient Descent
for epoch in range(20):
np.random.shuffle(interactions)
total_error = 0
for user_id, item_id, rating in interactions:
# Predict rating
pred = np.dot(
self.user_factors[user_id],
self.item_factors[item_id]
)
error = rating - pred
total_error += error ** 2
# Update factors
user_grad = error * self.item_factors[item_id]
item_grad = error * self.user_factors[user_id]
self.user_factors[user_id] += self.learning_rate * user_grad
self.item_factors[item_id] += self.learning_rate * item_grad
rmse = np.sqrt(total_error / len(interactions))
if epoch % 5 == 0:
print(f"Epoch {epoch}, RMSE: {rmse:.4f}")
def predict(self, user_id: str, item_id: str) -> float:
"""Predict rating for user-item pair."""
if user_id not in self.user_factors:
return 3.0 # Default rating for unknown users
if item_id not in self.item_factors:
return 3.0 # Default rating for unknown items
return np.dot(
self.user_factors[user_id],
self.item_factors[item_id]
)
def recommend(self, user_id: str, n: int = 10) -> List[Tuple[str, float]]:
"""Get top N recommendations for user."""
if user_id not in self.user_factors:
return []
scores = []
for item_id, item_factors in self.item_factors.items():
score = np.dot(self.user_factors[user_id], item_factors)
scores.append((item_id, score))
scores.sort(key=lambda x: x[1], reverse=True)
return scores[:n]
class ContentBasedFilter:
"""
Content-Based Filtering using item features.
Recommends items similar to what user has liked,
based on content features (genre, tags, etc.)
"""
def __init__(self):
self.item_features: Dict[str, np.ndarray] = {}
self.feature_index: Dict[str, int] = {}
def index_content(self, items: List[ContentItem]):
"""Build feature vectors for all content."""
# Build feature vocabulary
all_features = set()
for item in items:
all_features.update(item.genres)
all_features.update(item.tags)
self.feature_index = {f: i for i, f in enumerate(all_features)}
num_features = len(self.feature_index)
# Build feature vectors
for item in items:
vector = np.zeros(num_features)
for genre in item.genres:
vector[self.feature_index[genre]] = 1.0
for tag in item.tags:
vector[self.feature_index[tag]] = 0.5
# Normalize
norm = np.linalg.norm(vector)
if norm > 0:
vector /= norm
self.item_features[item.content_id] = vector
def get_similar(self, item_id: str, n: int = 10) -> List[Tuple[str, float]]:
"""Find N most similar items."""
if item_id not in self.item_features:
return []
target = self.item_features[item_id]
similarities = []
for other_id, features in self.item_features.items():
if other_id == item_id:
continue
similarity = np.dot(target, features)
similarities.append((other_id, similarity))
similarities.sort(key=lambda x: x[1], reverse=True)
return similarities[:n]
def recommend_for_user(
self,
user: UserProfile,
n: int = 10
) -> List[Tuple[str, float]]:
"""Recommend based on user's viewing history."""
# Aggregate scores from similar items to watched content
scores = defaultdict(float)
for watched_id in user.viewing_history[-20:]: # Last 20 watched
similar = self.get_similar(watched_id, n=50)
for item_id, similarity in similar:
if item_id not in user.viewing_history:
scores[item_id] += similarity
# Sort by aggregated score
recommendations = sorted(
scores.items(),
key=lambda x: x[1],
reverse=True
)
return recommendations[:n]
class HybridRecommender:
"""
Hybrid Recommender combining multiple signals.
Netflix uses ensemble of multiple models:
- Collaborative filtering
- Content-based filtering
- Popularity-based
- Context-aware (time, device)
"""
def __init__(self):
self.collaborative = CollaborativeFilter()
self.content_based = ContentBasedFilter()
        # Weights for combining models. Popularity and recency are listed
        # for completeness; only the first two signals are wired up below.
        self.weights = {
            "collaborative": 0.4,
            "content_based": 0.3,
            "popularity": 0.2,
            "recency": 0.1
        }
    def recommend(
        self,
        user: UserProfile,
        context: dict = None,
        n: int = 20
    ) -> List[Tuple[str, float]]:
"""
Generate personalized recommendations.
Combines multiple signals with learned weights.
"""
scores = defaultdict(float)
# Collaborative filtering scores
cf_recs = self.collaborative.recommend(user.user_id, n=100)
for item_id, score in cf_recs:
scores[item_id] += score * self.weights["collaborative"]
# Content-based scores
cb_recs = self.content_based.recommend_for_user(user, n=100)
for item_id, score in cb_recs:
scores[item_id] += score * self.weights["content_based"]
# Filter out already watched
for watched_id in user.viewing_history:
scores.pop(watched_id, None)
# Sort and return top N
sorted_items = sorted(
scores.items(),
key=lambda x: x[1],
reverse=True
)
return sorted_items[:n]
def get_row_recommendations(
self,
user: UserProfile,
row_type: str
) -> List[str]:
"""
Get recommendations for a specific homepage row.
Row types:
- "because_you_watched": Similar to recently watched
- "trending": Popular in user's region
- "top_picks": Personalized top recommendations
- "continue_watching": Incomplete titles
"""
if row_type == "because_you_watched":
if not user.viewing_history:
return []
last_watched = user.viewing_history[-1]
similar = self.content_based.get_similar(last_watched, n=10)
return [item_id for item_id, _ in similar]
elif row_type == "top_picks":
recs = self.recommend(user, n=10)
return [item_id for item_id, _ in recs]
# ... other row types
return []
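The "because you watched" row in `ContentBasedFilter` can be seen end-to-end in a few lines. A self-contained toy version (the catalog titles and genres here are made up, and cosine similarity over one-hot genre vectors stands in for the real feature pipeline):

```python
import numpy as np

GENRES = ["thriller", "sci-fi", "drama", "comedy"]
CATALOG = {
    "stranger_things": ["thriller", "sci-fi", "drama"],
    "dark":            ["thriller", "sci-fi"],
    "the_crown":       ["drama"],
    "space_force":     ["sci-fi", "comedy"],
}

def vectorize(genres):
    """One-hot genre vector, unit-normalized so a dot product is cosine similarity."""
    v = np.array([1.0 if g in genres else 0.0 for g in GENRES])
    return v / np.linalg.norm(v)

vectors = {title: vectorize(g) for title, g in CATALOG.items()}

def because_you_watched(title, n=2):
    """Rank all other titles by similarity to the one just watched."""
    target = vectors[title]
    scores = [(other, float(np.dot(target, v)))
              for other, v in vectors.items() if other != title]
    scores.sort(key=lambda x: x[1], reverse=True)
    return [t for t, _ in scores[:n]]

print(because_you_watched("stranger_things"))   # → ['dark', 'the_crown']
```

Dark shares two of Stranger Things' three genres, so it ranks first; the real system does the same thing with thousands of learned features instead of four hand-picked genres.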
Phase 5: Scaling and Edge Cases (5 minutes)
Interviewer: "How does Netflix handle extreme scale events?"
Scaling for Major Releases
SQUID GAME SEASON 2 LAUNCH (Example)
Peak load characteristics:
- 87 million views in first week
- Global simultaneous interest
- Massive traffic spike at midnight releases
Preparation:
1. Pre-position content on ALL OCAs globally
2. Scale up control plane capacity
3. Implement launch-specific caching
4. Prepare fallback recommendations if ML overloaded
Traffic shaping:
- Stagger release times by region (reduce peak)
- Pre-generate recommendations for likely viewers
- Cache homepage variations
Edge Cases
Edge Case 1: ISP OCA Failure
Scenario: ISP-embedded OCA goes offline
Impact: Users in that ISP see degraded performance
Solution:
- Automatic failover to IXP-level OCA
- BGP routing updates within seconds
- User sees brief quality dip, then recovery
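The client-side half of this failover is simple to sketch: the control plane hands the player a ranked list of OCA endpoints, and the player walks down the list when the preferred server stops responding. A toy version (server names and the health check are invented for illustration; real steering also uses BGP and control-plane signals):

```python
def pick_server(ranked_ocas, is_healthy):
    """Return the first healthy OCA from the ranked list, else None."""
    for oca in ranked_ocas:
        if is_healthy(oca):
            return oca
    return None

# Ranked by proximity: ISP-embedded first, then IXP, then regional.
ranked = ["isp-oca-42.example", "ixp-oca-7.example", "regional-oca-1.example"]
down = {"isp-oca-42.example"}   # the ISP-embedded OCA just went offline

chosen = pick_server(ranked, lambda oca: oca not in down)
print(chosen)   # → "ixp-oca-7.example"
```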
Edge Case 2: Regional Control Plane Failure
Scenario: AWS region experiences outage
Impact: Users can't browse or start new streams
Solution:
- Multi-region deployment with automatic failover
- Continue Watching cached locally
- Graceful degradation: show cached recommendations
Edge Case 3: Thundering Herd at Midnight Release
Scenario: Millions request same new content simultaneously
Impact: Potential OCA overload
Solution:
- Pre-warm all caches with new content
- Distribute load across replica OCAs
- Queue and rate-limit API requests
- Start users at slightly different positions
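Two of these mitigations can be sketched concretely: deterministic per-user jitter to spread the midnight spike over a short window, and a token bucket to shed whatever load still arrives at once. The window size, rates, and the `TokenBucket` shape are illustrative, not Netflix's actual admission control:

```python
import hashlib

def release_jitter_seconds(user_id: str, window_seconds: int = 120) -> int:
    """Spread 'available at midnight' over a window, deterministically per
    user so retries from the same client land in the same slot."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % window_seconds

class TokenBucket:
    """Admit at most `rate` requests per tick, with bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity

    def tick(self):                      # called once per time unit
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def allow(self) -> bool:
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Jitter spreads users across a 2-minute window:
slots = {release_jitter_seconds(f"user-{i}") for i in range(1000)}
print(f"1000 users spread over {len(slots)} distinct seconds")

# The bucket sheds everything beyond the burst capacity in a single tick:
bucket = TokenBucket(rate=100, capacity=200)
admitted = sum(bucket.allow() for _ in range(500))
print(f"admitted {admitted} of 500 simultaneous requests")
```

Hashing the user ID (rather than random jitter) matters: a client that retries gets the same slot every time instead of re-rolling into the peak.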
Phase 6: Monitoring and Operations
Monitoring Dashboard
┌──────────────────────────────────────────────────────────────────────────────┐
│                         NETFLIX STREAMING MONITORING                         │
│                                                                              │
│  PLAYBACK HEALTH                                                             │
│  ───────────────                                                             │
│  Active streams:           8.2M concurrent           82%                     │
│  Play start success:       99.7%                     OK                      │
│  Rebuffer rate:            0.02%                     OK                      │
│  Avg video quality:        4.2 Mbps                  Good                    │
│                                                                              │
│  CDN HEALTH                                                                  │
│  ──────────                                                                  │
│  OCA servers online:       16,892 / 17,000           99.4%                   │
│  Cache hit rate:           94.2%                     Good                    │
│  Origin bandwidth:         2.1 Tbps                  Normal                  │
│  P99 latency to OCA:       12ms                      Excellent               │
│                                                                              │
│  CONTROL PLANE                                                               │
│  ─────────────                                                               │
│  API requests/sec:         892,341                   89%                     │
│  API latency p99:          45ms                      Good                    │
│  Service health:           1,247 / 1,250             99.8%                   │
│  Circuit breakers open:    3                         OK                      │
│                                                                              │
│  QUALITY OF EXPERIENCE                                                       │
│  ─────────────────────                                                       │
│  Play start time p50:      1.2s                      Good                    │
│  Play start time p99:      3.1s                      OK                      │
│  Quality switches/hour:    2.1                       Normal                  │
│  Session abandonment:      0.8%                      Low                     │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
Chaos Engineering
SIMIAN ARMY - NETFLIX'S CHAOS ENGINEERING TOOLS
┌──────────────────────────────────────────────────────────────────────────────┐
│                                                                              │
│  CHAOS MONKEY                                                                │
│  • Randomly terminates instances during business hours                       │
│  • Ensures services handle instance failures gracefully                      │
│  • "If you can't handle a single instance dying, fix it first"               │
│                                                                              │
│  LATENCY MONKEY                                                              │
│  • Injects artificial delays into service calls                              │
│  • Tests timeout and fallback behavior                                       │
│  • Ensures graceful degradation under slow dependencies                      │
│                                                                              │
│  CHAOS GORILLA                                                               │
│  • Simulates entire AWS Availability Zone failure                            │
│  • Tests regional failover                                                   │
│  • Run during low-traffic periods                                            │
│                                                                              │
│  CHAOS KONG                                                                  │
│  • Simulates entire AWS Region failure                                       │
│  • Ultimate test of multi-region resilience                                  │
│  • Run very carefully with full preparation                                  │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
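Latency Monkey's core trick, injecting delay in front of a dependency and checking that the caller degrades rather than hangs, can be sketched in a few lines. Everything here (the wrapper, delays, probabilities, the fake profile service) is invented for illustration, and the timeout check is deliberately crude: it still blocks for the full delay and merely distrusts slow results:

```python
import random
import time

def with_injected_latency(func, delay_seconds: float, probability: float):
    """Return a wrapped version of `func` that sometimes sleeps before answering."""
    def wrapped(*args, **kwargs):
        if random.random() < probability:
            time.sleep(delay_seconds)          # injected fault
        return func(*args, **kwargs)
    return wrapped

def call_with_timeout(func, timeout_seconds: float, fallback):
    """Crude check: time the call and fall back if it ran past budget."""
    start = time.monotonic()
    result = func()
    if time.monotonic() - start > timeout_seconds:
        return fallback                        # too slow: degrade gracefully
    return result

random.seed(7)
slow_profile_service = with_injected_latency(
    lambda: {"id": "u1"}, delay_seconds=0.05, probability=0.5)

results = [call_with_timeout(slow_profile_service, 0.02, {"id": "anonymous"})
           for _ in range(20)]
fallbacks = sum(r["id"] == "anonymous" for r in results)
print(f"{fallbacks}/20 calls degraded to the fallback profile")
```

A chaos test then asserts exactly this: under injected latency, every call still returns something usable within budget, never an exception or a hang.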
Interview Conclusion
Interviewer: "Excellent work. You've covered the key aspects of Netflix's architecture: the two-plane separation, adaptive streaming, their CDN strategy, and resilience patterns. Any questions for me?"
You: "Thank you! I'm curious how Netflix handles the transition to live streaming for events like sports. The infrastructure seems optimized for VOD; what changes for live?"
Interviewer: "Great question. Live requires entirely different latency characteristics; you can't pre-cache content. Netflix is building new infrastructure specifically for live, with different CDN strategies focused on minimal latency rather than maximum caching. It's a fascinating new challenge for their team."
Summary: Concepts Applied
┌──────────────────────────────────────────────────────────────────────────────┐
│                CONCEPTS FROM 10-WEEK COURSE IN NETFLIX DESIGN                │
│                                                                              │
│  WEEK 1: DATA AT SCALE                                                       │
│  ├── Partitioning: Content distributed across OCA tiers                      │
│  ├── Replication: Popular content on all regional servers                    │
│  └── Hot Keys: Trending content pre-positioned everywhere                    │
│                                                                              │
│  WEEK 2: FAILURE-FIRST DESIGN                                                │
│  ├── Circuit breakers: Hystrix pattern for service calls                     │
│  ├── Timeouts: Strict budgets for playback start                             │
│  ├── Graceful degradation: Cached recommendations as fallback                │
│  └── Chaos engineering: Simian Army for resilience testing                   │
│                                                                              │
│  WEEK 3: MESSAGING & ASYNC                                                   │
│  ├── Event streaming: User events for recommendations                        │
│  └── Async processing: Background content encoding                           │
│                                                                              │
│  WEEK 4: CACHING                                                             │
│  ├── Multi-tier CDN: ISP → IXP → Origin                                      │
│  ├── Cache warming: Pre-position content before releases                     │
│  └── EVCache: In-memory caching for API responses                            │
│                                                                              │
│  WEEK 5: CONSISTENCY & COORDINATION                                          │
│  └── Eventual consistency: Recommendations update async                      │
│                                                                              │
│  WEEK 6: NOTIFICATION SYSTEMS                                                │
│  └── Push notifications: New content alerts                                  │
│                                                                              │
│  WEEK 10: PRODUCTION READINESS                                               │
│  ├── SLOs: Play start time, rebuffer rate                                    │
│  ├── Observability: Atlas metrics, distributed tracing                       │
│  └── Canary deployments: Spinnaker for safe rollouts                         │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────┐
│                                                                              │
│                     WHY NETFLIX IS AN ENGINEERING MARVEL                     │
│                                                                              │
│  SCALE                                                                       │
│  ─────                                                                       │
│  • 300+ million subscribers watching content globally                        │
│  • 94 billion hours viewed in just 6 months                                  │
│  • 15% of global internet traffic at peak hours                              │
│  • 17,000+ CDN servers in 158 countries                                      │
│                                                                              │
│  INNOVATION                                                                  │
│  ──────────                                                                  │
│  • Pioneered adaptive bitrate streaming at scale                             │
│  • Invented per-title encoding optimization (20-30% bandwidth savings)       │
│  • Created chaos engineering discipline                                      │
│  • Open-sourced foundational microservices tools                             │
│                                                                              │
│  EFFICIENCY                                                                  │
│  ──────────                                                                  │
│  • Built own CDN saving billions vs third-party                              │
│  • <$0.001 per GB delivered (vs $0.02+ industry)                             │
│  • 80% of viewing driven by recommendations                                  │
│  • Zero buffering goal achieved for stable connections                       │
│                                                                              │
│  RESILIENCE                                                                  │
│  ──────────                                                                  │
│  • 99.99% availability target achieved                                       │
│  • Survives AWS region failures                                              │
│  • Graceful degradation at every layer                                       │
│  • "Chaos Monkey" became industry standard                                   │
│                                                                              │
│  "Netflix didn't just build a streaming service; they reinvented             │
│   how the internet delivers video and how companies build                    │
│   resilient distributed systems."                                            │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
Sources
Official Documentation:
- Netflix Open Connect Overview: https://openconnect.netflix.com/
- Netflix Tech Blog: https://netflixtechblog.com/
- Netflix Research: https://research.netflix.com/
Statistics and Data:
- Netflix Q4 2024 Earnings Report
- Netflix Engagement Report H2 2024: https://about.netflix.com/en/news/what-we-watched-the-second-half-of-2024
- Statista Netflix Subscriber Statistics
Architecture and Technical:
- "Mastering Chaos - A Netflix Guide to Microservices" - Josh Evans (QCon)
- Netflix OSS: https://netflix.github.io/
- Per-Title Encoding Optimization: Netflix Tech Blog
Open Source Tools:
- Eureka: Service discovery
- Zuul: API gateway
- Hystrix: Circuit breaker (archived but influential)
- Spinnaker: Continuous delivery
- Chaos Monkey: Chaos engineering
Further Reading
Official Netflix Engineering:
- Netflix Tech Blog: https://netflixtechblog.com/ (Extensive technical articles)
- Netflix Research: https://research.netflix.com/ (ML, video encoding papers)
Engineering Talks (Highly Recommended):
- Josh Evans - "Mastering Chaos: A Netflix Guide to Microservices" (QCon 2016)
- Nora Jones - "Chaos Engineering at Netflix" (Strange Loop)
- Anne Lovelace - "Open Connect: Netflix's CDN" (NANOG)
Books:
- "Chaos Engineering" by Casey Rosenthal & Nora Jones (written by Netflix engineers)
- "Building Microservices" by Sam Newman (references Netflix patterns)
- "Designing Data-Intensive Applications" by Martin Kleppmann (CDN and caching principles)
Related Systems to Study:
- YouTube: Different approach to video delivery
- Twitch: Live streaming challenges
- Disney+: Competing architecture
Self-Assessment Checklist
After studying this case study, you should be able to:
- Explain the two-plane architecture (control plane vs data plane)
- Describe how adaptive bitrate streaming works
- Explain Netflix's per-title encoding optimization
- Design a content placement strategy for a global CDN
- Implement a circuit breaker pattern for microservices
- Explain the Open Connect CDN architecture
- Describe how recommendation systems use collaborative filtering
- Design monitoring and alerting for a streaming platform
- Explain chaos engineering principles and the Simian Army
- Calculate infrastructure requirements for a streaming service
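For the last checklist item, a back-of-envelope in code, using round numbers from this case study (roughly 8M concurrent streams at peak and a 4.2 Mbps average bitrate, with a ~94% CDN cache hit rate). The per-server egress figure is an assumption for illustration, and the result deliberately ignores redundancy, content placement, and geography:

```python
concurrent_streams = 8_000_000
avg_bitrate_mbps = 4.2
cache_hit_rate = 0.94
oca_capacity_gbps = 100          # assumed egress per CDN server (illustrative)

# Total edge egress at peak, in Tbps (Mbps → Tbps is a factor of 1e6).
peak_egress_tbps = concurrent_streams * avg_bitrate_mbps / 1e6

# Cache misses must be filled from origin.
origin_tbps = peak_egress_tbps * (1 - cache_hit_rate)

# Lower bound on servers if raw egress were the only constraint.
servers_needed = peak_egress_tbps * 1000 / oca_capacity_gbps

print(f"Peak edge egress: {peak_egress_tbps:.1f} Tbps")
print(f"Origin fill:      {origin_tbps:.1f} Tbps")
print(f"Min CDN servers:  {servers_needed:,.0f} (at 100% utilization)")
```

Raw egress says a few hundred servers could carry peak traffic, yet the fleet is 17,000+: placement (every ISP pocket needs its own local copy of the catalog), redundancy, and headroom dominate the count, not bandwidth. Note the origin fill of ~2 Tbps lines up with the monitoring dashboard figure earlier.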
This case study demonstrates how Netflix combined innovations in video encoding, content delivery, microservices architecture, and machine learning to create the world's dominant streaming platform. The same principles apply to any large-scale media delivery system.
End of Bonus Problem 6: Netflix Streaming
Document Statistics:
- Core concepts covered: 20+
- Code implementations: 4 (ABR client, content placement, circuit breaker, recommender)
- Architecture diagrams: 10+
- Real-world scale numbers: 40+