Himanshu Kukreja

Bonus Problem 7: Spotify

The World's Most Personalized Music Experience at Scale


🎵 How Do You Make 713 Million People Feel Like You Built an App Just for Them?

Imagine this challenge: You need to serve 100+ million songs to 713 million users across 184 countries. Every user expects instant playback: less than 200ms from tap to music. Every user expects recommendations that feel eerily personal. And you need to do this with an engineering team that pioneered the "squad" model of autonomous teams.

This is Spotify, and it's not just a music app. It's a masterclass in personalization at scale, microservices architecture, and building developer platforms that 7,300 engineers can use without stepping on each other's toes.

THE SPOTIFY SCALE (2025)

USERS
─────
Monthly Active Users:          713 Million
Premium Subscribers:           281 Million
Conversion Rate:               ~40% (industry-leading)

CONTENT
───────
Songs:                         100+ Million
Podcasts:                      ~7 Million titles
Audiobooks:                    350,000+
Playlists:                     4+ Billion (user-created)

ENGAGEMENT
──────────
Average listening time:        114 minutes/day
Discover Weekly users:         40+ Million weekly
Streams from recommendations:  ~33% of all plays

INFRASTRUCTURE
──────────────
Markets:                       184 countries
Employees:                     ~7,300
Microservices:                 2,000+ backend services
Daily events processed:        500+ Billion

BUSINESS
────────
2024 Revenue:                  €15.67 Billion
Market Share:                  31.7% (global music streaming)
Royalties paid (2024):         $10 Billion to artists

This is the system we'll design today, and along the way we'll see how Spotify built the world's most personalized audio experience.


The Interview Begins

You walk into the interview room. The interviewer smiles and gestures to the whiteboard.

Interviewer: "Thanks for coming in. Today we're going to design a music streaming service like Spotify. I'm interested in how you think about scale, personalization, and delivering a seamless listening experience. Please think out loudβ€”this is collaborative."

They write on the whiteboard:

Design a Music Streaming Platform

Requirements:
- Stream audio to hundreds of millions of users globally
- Near-instant playback (< 200ms to first audio)
- Highly personalized recommendations
- Search across 100M+ tracks
- Support both free (ad-supported) and premium tiers
- Handle massive catalog: songs, podcasts, audiobooks

Interviewer: "Take a few minutes to think about this, then walk me through your approach. We have about 45 minutes."


Phase 1: Requirements Clarification (5 minutes)

Before diving in, you take a breath and start asking questions.

Your Questions

You: "Before I start designing, I'd like to clarify a few requirements. First, what's our target scaleβ€”how many concurrent users should we support?"

Interviewer: "At peak, we might have 50 million concurrent listeners. Average is around 20-30 million."

You: "For playback latency, what's acceptable? I want to understand the user experience bar."

Interviewer: "Users tap a song and expect music within 200 milliseconds. No buffering during playback except on very poor connections."

You: "How personalized are the recommendations? Are we talking basic 'similar artists' or deeply personal like Discover Weekly?"

Interviewer: "Deeply personal. We want users to feel the app knows their taste better than they do. Recommendations should work even for new users with minimal listening history."

You: "What about offline playback? Do premium users expect to download music?"

Interviewer: "Yes, premium users can download. That's a key differentiator from the free tier."

You: "Perfect. Let me summarize the requirements as I understand them."

Functional Requirements

1. AUDIO PLAYBACK
   - Stream songs, podcasts, and audiobooks
   - Support multiple quality levels (96kbps to 320kbps)
   - Adaptive bitrate based on network conditions
   - Offline download for premium users
   - Gapless playback between tracks

2. DISCOVERY & SEARCH
   - Full-text search across songs, artists, albums, podcasts
   - Typo tolerance and autocomplete
   - Browse by genre, mood, activity
   - Personalized recommendations

3. USER FEATURES
   - Create and manage playlists
   - Follow artists and friends
   - Like/save songs to library
   - View listening history

4. MONETIZATION
   - Free tier with ads
   - Premium tier (ad-free, offline, higher quality)
   - Family and student plans

Non-Functional Requirements

1. SCALE
   - 50 million concurrent users (peak)
   - 100+ million songs in catalog
   - 500+ billion events/day for analytics
   - 1+ billion streams/day

2. LATENCY
   - Playback start: < 200ms
   - Search results: < 100ms
   - API responses: < 50ms (p99)

3. AVAILABILITY
   - 99.99% uptime for playback
   - Graceful degradation (recommendations can fail, playback must not)

4. DATA
   - Strong consistency for user data (playlists, library)
   - Eventual consistency acceptable for recommendations

Phase 2: Back of the Envelope Estimation (5 minutes)

You: "Let me work through the numbers to understand the scale."

Traffic Estimation

STREAMING TRAFFIC

Base numbers:
  Daily Active Users:                250 million
  Songs per user per day:            20 songs (average)
  Average song length:               3.5 minutes
  Average file size (160 kbps):      ~4 MB per song

Daily calculations:
  Total streams/day:                 250M × 20 = 5 billion streams
  Streams per second:                5B ÷ 86,400 = ~58,000 streams/sec
  Peak (3x average):                 ~175,000 streams/sec

Bandwidth:
  Data per stream:                   ~4 MB
  Daily data transfer:               5B × 4 MB = 20 PB/day
  Peak bandwidth:                    ~3.5 Tbps

Storage Estimation

AUDIO STORAGE

Song catalog:
  Total songs:                       100 million
  Average song (all quality levels): ~25 MB (multiple encodings)
  Total audio storage:               100M × 25 MB = 2.5 PB

User data:
  Users:                             713 million
  Per user (playlists, library):     ~50 KB average
  Total user data:                   ~35 TB

Event data (analytics):
  Events per day:                    500 billion
  Daily event storage:               ~250 TB/day
  Yearly (compressed):               ~10 PB
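
These numbers are easy to sanity-check in a few lines of Python. A minimal sketch using the same assumed inputs as above (nothing here is measured data):

# back_of_envelope.py - rough sanity check of the estimates above (assumed inputs)

DAU = 250e6                 # daily active users
SONGS_PER_USER = 20         # average plays per user per day
SONG_SIZE_MB = 4            # ~160 kbps, 3.5 minute song
CATALOG = 100e6             # tracks
ENCODED_SIZE_MB = 25        # all quality tiers per track

streams_per_day = DAU * SONGS_PER_USER                    # 5e9
streams_per_sec = streams_per_day / 86_400                # ~58K
peak_streams_per_sec = 3 * streams_per_sec                # ~175K

daily_transfer_pb = streams_per_day * SONG_SIZE_MB / 1e9  # ~20 PB
catalog_storage_pb = CATALOG * ENCODED_SIZE_MB / 1e9      # ~2.5 PB

print(f"streams/sec: {streams_per_sec:,.0f} (peak ~{peak_streams_per_sec:,.0f})")
print(f"daily transfer: {daily_transfer_pb:.1f} PB, catalog: {catalog_storage_pb:.1f} PB")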

Key Metrics Summary

TRAFFIC
├── Peak streams:              175,000 /second
├── Daily streams:             5 billion
└── API requests:              ~500,000 /second (peak)

STORAGE
├── Audio catalog:             2.5 PB
├── User data:                 35 TB
└── Event data:                ~10 PB/year

BANDWIDTH
├── Peak egress:               3.5 Tbps
└── Daily transfer:            20 PB

Phase 3: High-Level Design (10 minutes)

You: "Now let me sketch out the high-level architecture."

System Architecture

SPOTIFY HIGH-LEVEL ARCHITECTURE

CLIENTS
└── iOS, Android, Desktop, Web
        │
        ▼   (everything below runs on Google Cloud Platform)
CDN (Fastly/Akamai)
└── Audio delivery, static assets, caching

API Gateway
└── Rate limiting, auth, routing, load balancing

MICROSERVICES LAYER
├── User Service, Catalog Service, Search Service, Stream Service
└── Playlist Service, Reco Service, Social Service, Ad Service

DATA LAYER
├── Postgres (Users), Cassandra (Events), Redis (Cache), BigQuery (Analytics)
└── GCS (Audio), Elastic (Search), Bigtable (Reco)

EVENT STREAMING
└── Pub/Sub (Events), Dataflow (Processing), ML Pipeline

Data Flow: Playing a Song

PLAYBACK FLOW

1. TAP TO PLAY
   Client sends request to API Gateway

2. AUTHORIZATION
   Stream Service checks:
   - User subscription tier
   - Regional availability
   - Content licensing

3. GENERATE SIGNED URL
   URL with token, expires in 1 hour
   https://cdn.spotify.com/audio/abc123?token=xyz&expires=...

4. STREAM FROM CDN
   - CDN edge serves if cached (99%+ hit rate for popular)
   - Falls back to GCS origin if cache miss
   - Adaptive bitrate based on network

5. TRACK EVENTS
   Client sends play/skip events to Pub/Sub
   → Analytics pipeline → Recommendations

TOTAL LATENCY: < 200ms to first audio

Phase 4: Deep Dives (20 minutes)

Interviewer: "Let's dive deeper. Tell me about the recommendation system."


Deep Dive 1: Recommendation System

The Problem

PERSONALIZATION CHALLENGE

Scale:
  - 713 million users with unique tastes
  - 100+ million songs to choose from
  - Generate personalized playlists for each user

Cold start problem:
  - New users: No listening history
  - New songs: No user interactions

Exploration vs exploitation:
  - Recommend what users will like (exploitation)
  - Introduce new music (exploration)

The Solution: Hybrid Recommendation Architecture

THREE RECOMMENDATION MODELS

1. COLLABORATIVE FILTERING: "Users like you also liked..."
   Matrix factorization over user-song interactions

2. CONTENT-BASED: "Songs with similar tags"
   NLP on blogs, reviews, titles

3. AUDIO ANALYSIS: "Songs that sound similar"
   CNN on audio spectrograms

All three feed an ENSEMBLE MODEL that combines the signals with
weights learned from A/B testing.

Collaborative Filtering

MATRIX FACTORIZATION

User-Song Matrix (sparse):
         Song1  Song2  Song3  ...  Song100M
User1  [  5      0      3     ...     2    ]
User2  [  0      4      0     ...     0    ]
User3  [  3      0      5     ...     4    ]
  ...

Decompose into:
  - User vectors: 713M × 128 dimensions
  - Song vectors: 100M × 128 dimensions

Similarity = dot_product(user_vector, song_vector)

For Discover Weekly:
  1. Find users with similar taste profiles
  2. Find songs those users love that you haven't heard
  3. Rank by predicted engagement
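
To make the scoring step concrete, here is a toy sketch with NumPy: random stand-ins for the learned user and song embeddings and a dot-product ranking. The shapes and the `recommend` helper are illustrative only, far smaller than the 713M × 128 and 100M × 128 matrices described above:

# toy matrix-factorization scoring (illustrative shapes, not production code)
from typing import List, Set

import numpy as np

rng = np.random.default_rng(42)
n_users, n_songs, dim = 1_000, 5_000, 16

user_vecs = rng.normal(size=(n_users, dim))   # learned user taste vectors
song_vecs = rng.normal(size=(n_songs, dim))   # learned song vectors

def recommend(user_id: int, already_heard: Set[int], k: int = 30) -> List[int]:
    """Rank songs by dot-product affinity, skipping already-heard tracks."""
    scores = song_vecs @ user_vecs[user_id]    # one score per song
    ranked = np.argsort(-scores)               # best first
    return [int(s) for s in ranked if int(s) not in already_heard][:k]

print(recommend(user_id=7, already_heard={1, 2, 3})[:5])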

Audio Analysis (CNN)

CONVOLUTIONAL NEURAL NETWORKS

Input: Audio spectrogram

Process:
  Raw audio → Mel spectrogram → CNN → 128-dim feature vector

Output features:
  - Tempo, key, mode
  - Danceability, energy, valence
  - Acousticness, instrumentalness

Use case: Cold start for new songs
  - Works even with zero user interaction data
  - Powers "song radio" feature
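
A heavily simplified sketch of such a model, assuming PyTorch; the layer sizes and the 128-dimensional output are placeholders rather than Spotify's actual architecture:

# toy audio-embedding CNN over a mel spectrogram (PyTorch, illustrative only)
import torch
import torch.nn as nn

class AudioEmbedder(nn.Module):
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),             # pool over time and frequency
        )
        self.fc = nn.Linear(32, embed_dim)       # 128-dim feature vector

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, 1, n_mels, n_frames), e.g. a log-mel spectrogram
        x = self.conv(mel).flatten(1)
        return self.fc(x)

model = AudioEmbedder()
fake_spectrogram = torch.randn(2, 1, 128, 512)   # 2 clips, 128 mel bins, 512 frames
print(model(fake_spectrogram).shape)             # torch.Size([2, 128])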

Implementation

# recommendations/discover_weekly.py

"""
Discover Weekly Generation

Hybrid recommendation combining collaborative filtering,
content-based filtering, and audio analysis.
"""

from dataclasses import dataclass
from typing import List, Dict, Set
import numpy as np


@dataclass
class UserProfile:
    user_id: str
    embedding: np.ndarray      # 128-dim taste vector
    top_artists: List[str]
    top_genres: List[str]
    listening_history: Set[str]


class DiscoverWeeklyGenerator:
    """
    Generates personalized weekly playlists.
    
    Applies concepts:
    - Week 5: Eventual consistency for recommendations
    - Week 8: Batch processing pipeline
    """
    
    def __init__(self, user_store, song_store, similar_users_index):
        self.user_store = user_store
        self.song_store = song_store
        self.similar_users_index = similar_users_index
        
        # Weights from A/B testing
        self.weights = {
            'collaborative': 0.4,
            'content': 0.3,
            'audio': 0.2,
            'popularity': 0.1,
        }
        
        self.playlist_size = 30
        self.exploration_ratio = 0.2
    
    async def generate_playlist(self, user_id: str) -> List[str]:
        """Generate Discover Weekly for a user."""
        user = await self.user_store.get_profile(user_id)
        
        # Get candidates from collaborative filtering
        collab_candidates = await self._get_collaborative_candidates(user)
        
        # Get candidates from content similarity
        content_candidates = await self._get_content_candidates(user)
        
        # Merge and score
        all_candidates = self._merge_candidates(
            collab_candidates, content_candidates
        )
        scored = await self._score_candidates(user, all_candidates)
        
        # Filter already-heard songs
        filtered = [
            (song_id, score) for song_id, score in scored
            if song_id not in user.listening_history
        ]
        
        # Exploitation: top songs
        exploit_count = int(self.playlist_size * (1 - self.exploration_ratio))
        exploit_songs = [s for s, _ in filtered[:exploit_count]]
        
        # Exploration: outside comfort zone
        explore_songs = await self._get_exploration_tracks(
            user, 
            count=self.playlist_size - exploit_count,
            exclude=set(exploit_songs)
        )
        
        return self._shuffle_with_variety(exploit_songs + explore_songs)
    
    async def _get_collaborative_candidates(
        self, user: UserProfile
    ) -> Dict[str, float]:
        """Find songs from similar users."""
        similar_users = await self.similar_users_index.find_similar(
            user.embedding, k=1000
        )
        
        candidates = {}
        for similar_user_id, similarity in similar_users:
            loved_songs = await self.user_store.get_loved_songs(
                similar_user_id, limit=100
            )
            for song_id, engagement in loved_songs:
                score = similarity * engagement
                candidates[song_id] = candidates.get(song_id, 0) + score
        
        return candidates
    
    async def _score_candidates(
        self, user: UserProfile, candidates: Dict[str, float]
    ) -> List[tuple]:
        """Score using ensemble of all signals."""
        scored = []
        
        for song_id, base_score in candidates.items():
            song = await self.song_store.get_song(song_id)
            
            # Combine signals
            final_score = (
                self.weights['collaborative'] * base_score +
                self.weights['content'] * self._content_similarity(user, song) +
                self.weights['audio'] * self._audio_similarity(user, song) +
                self.weights['popularity'] * song.popularity_score
            )
            scored.append((song_id, final_score))
        
        return sorted(scored, key=lambda x: x[1], reverse=True)

Deep Dive 2: Audio Streaming

Interviewer: "How do you achieve < 200ms playback latency?"

Multi-Layer Caching

AUDIO DELIVERY ARCHITECTURE

LAYER 1: CLIENT CACHE
─────────────────────
- Recently played songs
- Prefetched next songs
- Offline downloads (Premium)
- 1-10 GB local storage

LAYER 2: CDN EDGE (Fastly/Akamai)
─────────────────────────────────
- 200+ global locations
- Top 20% of songs always cached (80% of plays)
- 99%+ cache hit rate for popular content
- Token validation at edge

LAYER 3: ORIGIN (GCS)
─────────────────────
- All 100M+ songs
- Multiple quality encodings per song
- Multi-region replication
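
Layer 1 amounts to a bounded local cache plus prefetching of the likely next tracks. A minimal sketch of that idea; the 1 GB budget and LRU eviction are assumptions, not Spotify's actual client policy:

# toy client-side audio cache with LRU eviction and prefetch (assumed policy)
from collections import OrderedDict
from typing import List, Optional, Tuple

class ClientAudioCache:
    def __init__(self, max_bytes: int = 1 * 1024**3):    # ~1 GB local budget
        self.max_bytes = max_bytes
        self.used = 0
        self._entries = OrderedDict()                     # track_id -> audio bytes

    def get(self, track_id: str) -> Optional[bytes]:
        if track_id in self._entries:
            self._entries.move_to_end(track_id)           # mark as recently used
            return self._entries[track_id]
        return None

    def put(self, track_id: str, audio: bytes) -> None:
        if track_id in self._entries:
            self.used -= len(self._entries.pop(track_id))
        self._entries[track_id] = audio
        self.used += len(audio)
        while self.used > self.max_bytes:                 # evict least-recently used
            _, evicted = self._entries.popitem(last=False)
            self.used -= len(evicted)

    def prefetch(self, upcoming: List[Tuple[str, bytes]]) -> None:
        """Warm the cache with the next queue items so playback starts instantly."""
        for track_id, audio in upcoming:
            if self.get(track_id) is None:
                self.put(track_id, audio)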

Audio Quality Tiers

SPOTIFY AUDIO QUALITY

┌────────────┬─────────────┬─────────────┬─────────────────────────────────┐
│ Tier       │ Bitrate     │ Codec       │ Use Case                        │
├────────────┼─────────────┼─────────────┼─────────────────────────────────┤
│ Low        │ 24 kbps     │ AAC         │ Extreme data saving             │
│ Normal     │ 96 kbps     │ Ogg Vorbis  │ Default for free tier           │
│ High       │ 160 kbps    │ Ogg Vorbis  │ Free tier max / Premium default │
│ Very High  │ 320 kbps    │ Ogg Vorbis  │ Premium audiophile              │
│ Lossless   │ ~1411 kbps  │ FLAC        │ Premium (2025 rollout)          │
└────────────┴─────────────┴─────────────┴─────────────────────────────────┘

Why Ogg Vorbis?
  - Open source (no licensing fees)
  - Better quality than MP3 at same bitrate
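
Adaptive bitrate on the client can be as simple as picking the highest tier that the recently measured throughput can sustain with some headroom. A minimal sketch using the tiers from the table above; the 1.5x safety margin and the fallback behavior are assumptions:

# toy bitrate selection from measured throughput (assumed heuristic)
QUALITY_TIERS_KBPS = {"low": 24, "normal": 96, "high": 160, "very_high": 320}

def pick_quality(measured_kbps: float, premium: bool, headroom: float = 1.5) -> str:
    """Highest tier whose bitrate * headroom fits the measured throughput."""
    allowed = QUALITY_TIERS_KBPS if premium else {
        k: v for k, v in QUALITY_TIERS_KBPS.items() if v <= 160   # free tier caps at 160 kbps
    }
    viable = [t for t, kbps in allowed.items() if kbps * headroom <= measured_kbps]
    # Fall back to the lowest tier rather than stalling playback
    return max(viable, key=allowed.get) if viable else "low"

print(pick_quality(measured_kbps=500, premium=True))    # very_high
print(pick_quality(measured_kbps=200, premium=False))   # normal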

Implementation

# streaming/playback_service.py

"""
Playback Authorization and URL Signing

Optimized for < 30ms authorization latency.
"""

from dataclasses import dataclass
from typing import List
from datetime import datetime, timedelta
import asyncio
import hmac
import hashlib
import base64


class PlaybackNotAllowedError(Exception):
    """Raised when the user's tier or region does not allow this stream."""


@dataclass
class PlaybackResponse:
    stream_url: str
    expires_at: datetime
    quality: str
    prefetch_urls: List[str]


class PlaybackService:
    """
    Handles playback authorization.
    
    Applies concepts:
    - Week 4: Multi-tier caching
    - Week 2: Timeout management
    """
    
    def __init__(self, rights_service, catalog_service, cdn_config):
        self.rights = rights_service
        self.catalog = catalog_service
        self.cdn_base = cdn_config['base_url']
        self.signing_key = cdn_config['signing_key']
    
    async def get_playback_url(
        self, user_id: str, song_id: str, quality: str
    ) -> PlaybackResponse:
        """Generate signed streaming URL."""
        # Parallel checks for speed
        rights_ok, song = await asyncio.gather(
            self.rights.check(user_id, song_id),
            self.catalog.get_song(song_id)
        )
        
        if not rights_ok:
            raise PlaybackNotAllowedError()
        
        # Generate signed URL
        file_id = f"{song_id}_{quality}.ogg"
        expires = datetime.utcnow() + timedelta(hours=1)
        url = self._sign_url(file_id, expires)
        
        # Prefetch hints for next songs
        prefetch = await self._get_prefetch_urls(user_id, song_id, quality)
        
        return PlaybackResponse(
            stream_url=url,
            expires_at=expires,
            quality=quality,
            prefetch_urls=prefetch
        )
    
    def _sign_url(self, file_id: str, expires: datetime) -> str:
        """HMAC-signed CDN URL."""
        expires_ts = int(expires.timestamp())
        message = f"/audio/{file_id}:{expires_ts}"
        
        sig = hmac.new(
            self.signing_key.encode(),
            message.encode(),
            hashlib.sha256
        ).digest()
        sig_b64 = base64.urlsafe_b64encode(sig).decode()
        
        return f"{self.cdn_base}/audio/{file_id}?expires={expires_ts}&sig={sig_b64}"
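
On the other side, the CDN edge validates the token before serving any bytes. A sketch of that verification, mirroring `_sign_url` above; the exact edge behavior is an assumption about how such checks typically work, not Spotify's configuration:

# edge-side verification mirroring _sign_url above (illustrative)
import base64
import hashlib
import hmac
import time

def verify_signed_url(path: str, expires_ts: int, sig_b64: str, signing_key: str) -> bool:
    """Recompute the HMAC for path:expiry and compare in constant time."""
    if time.time() > expires_ts:
        return False                                      # link has expired
    message = f"{path}:{expires_ts}"
    expected = hmac.new(signing_key.encode(), message.encode(), hashlib.sha256).digest()
    expected_b64 = base64.urlsafe_b64encode(expected).decode()
    return hmac.compare_digest(expected_b64, sig_b64)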

Deep Dive 3: Microservices & Backstage

Interviewer: "How do you manage 2,000+ microservices?"

Backstage Developer Portal

BACKSTAGE PLATFORM

SOFTWARE CATALOG
────────────────
Every service registered with:
- Owner (which squad)
- Description and documentation
- API specifications
- Dependencies
- Health metrics

CORE FEATURES
├── Software Templates: new-service wizard
├── TechDocs: docs as code
├── Scaffolder: create new services from templates
└── Search: find any service

100+ plugins: Kubernetes, GitHub, PagerDuty, Datadog...

Result: Engineer onboarding time cut in half

Squad Model

SPOTIFY ORGANIZATIONAL MODEL

SQUAD (8-12 people)
─────────────────────
Cross-functional: Engineers, Designer, Product Owner
Owns: A feature or set of services end-to-end
Autonomy: Decides how to work (Scrum, Kanban, etc.)
Example: "Search Squad" owns search experience

TRIBE (40-100 people)
────────────────────
Collection of squads in related area
Example: "Music Discovery Tribe" includes Search, Browse, Radio

CHAPTER
───────
Specialists across squads within a tribe
Example: All backend engineers in Music Discovery
Led by Chapter Lead (career growth, standards)

GUILD
─────
Community of interest across company
Example: "Web Guild" - all web developers
Voluntary, knowledge sharing

Service Communication

# infrastructure/circuit_breaker.py

"""
Circuit Breaker for Service Calls

Protects against cascading failures.
"""

from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional, Any


@dataclass
class CircuitBreakerConfig:
    failure_threshold: int = 5
    success_threshold: int = 3
    timeout_seconds: float = 30.0


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """
    States: CLOSED → OPEN → HALF_OPEN → CLOSED
    
    Applies concepts:
    - Week 2: Circuit breaker pattern
    - Week 2: Graceful degradation
    """
    
    def __init__(self, name: str, config: CircuitBreakerConfig):
        self.name = name
        self.config = config
        self.state = "CLOSED"
        self.failure_count = 0
        self.success_count = 0
        self.last_failure: Optional[datetime] = None
    
    async def call(
        self, func: Callable, *args,
        fallback: Optional[Callable] = None, **kwargs
    ) -> Any:
        """Execute with circuit breaker protection."""
        if self.state == "OPEN":
            if self._should_try_reset():
                self.state = "HALF_OPEN"
            elif fallback:
                return await fallback(*args, **kwargs)
            else:
                raise CircuitOpenError(self.name)
        
        try:
            result = await func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            if fallback:
                return await fallback(*args, **kwargs)
            raise
    
    def _should_try_reset(self) -> bool:
        """After the timeout window, allow a trial request (HALF_OPEN)."""
        if self.last_failure is None:
            return True
        elapsed = (datetime.utcnow() - self.last_failure).total_seconds()
        return elapsed >= self.config.timeout_seconds
    
    def _on_success(self):
        if self.state == "HALF_OPEN":
            self.success_count += 1
            if self.success_count >= self.config.success_threshold:
                self.state = "CLOSED"
                self.success_count = 0
        self.failure_count = 0
    
    def _on_failure(self):
        self.failure_count += 1
        self.success_count = 0
        self.last_failure = datetime.utcnow()
        if self.failure_count >= self.config.failure_threshold:
            self.state = "OPEN"
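
As a usage example, the recommendation call can be wrapped so a failure degrades to popular content instead of breaking the home screen. This sketch reuses the `CircuitBreaker` above; `fetch_recommendations` and `fetch_popular_tracks` are hypothetical stand-ins for real service calls:

# usage sketch: protect the recommendation call, degrade to popular content
import asyncio
from typing import List

reco_breaker = CircuitBreaker("reco-service", CircuitBreakerConfig(timeout_seconds=30.0))

async def fetch_recommendations(user_id: str) -> List[str]:
    raise TimeoutError("reco service is overloaded")       # stand-in for a real RPC

async def fetch_popular_tracks(user_id: str) -> List[str]:
    return ["global-top-50", "viral-hits"]                 # cached, always available

async def home_feed(user_id: str) -> List[str]:
    return await reco_breaker.call(
        fetch_recommendations, user_id, fallback=fetch_popular_tracks
    )

print(asyncio.run(home_feed("user-123")))                  # ['global-top-50', 'viral-hits']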

Deep Dive 4: Event Processing

Interviewer: "How do you process 500 billion events per day?"

Event Pipeline

SPOTIFY EVENT PIPELINE

SOURCES
├── Mobile/Desktop apps (plays, skips, searches)
├── Backend services (errors, latency)
└── External (ad impressions)
        │
        ▼
INGESTION: Google Cloud Pub/Sub
├── 1 trillion messages/day capacity
├── Partitioned by event type
└── At-least-once delivery
        │
        ▼
PROCESSING (two paths)
├── STREAMING PATH (Dataflow), real-time: trending now, live stats
└── BATCH PATH (Dataproc), daily/weekly: ML training, reports
        │
        ▼
STORAGE
├── Bigtable (features)
├── BigQuery (analytics)
└── GCS (archive)

Scale: 500B events/day, 70 TB/day, 10M+ BigQuery queries/month

Implementation

# data/event_processor.py

"""
Event Processing Pipeline

Processes user events for recommendations and analytics.
"""

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional
import json


@dataclass
class StreamEndEvent:
    user_id: str
    content_id: str
    track_duration_ms: int
    total_played_ms: int
    end_reason: str  # "completed", "skipped", "error"
    timestamp: datetime


class EventProcessor:
    """
    Processes events from Pub/Sub.
    
    Applies concepts:
    - Week 3: Stream processing
    - Week 8: Late-arriving data
    """
    
    def __init__(self, pubsub, bigtable, bigquery):
        self.pubsub = pubsub
        self.bigtable = bigtable
        self.bigquery = bigquery
        self.allowed_lateness = timedelta(hours=24)
    
    async def process(self, raw: bytes) -> None:
        event = self._parse(raw)
        if not event:
            return
        
        # Drop very late events
        if datetime.utcnow() - event.timestamp > self.allowed_lateness:
            return
        
        # Always write to BigQuery
        await self.bigquery.insert('events.streams', event)
        
        # Calculate engagement for recommendations
        if isinstance(event, StreamEndEvent):
            engagement = self._calculate_engagement(event)
            await self.pubsub.publish('taste-updates', {
                'user_id': event.user_id,
                'content_id': event.content_id,
                'engagement': engagement
            })
    
    def _calculate_engagement(self, event: StreamEndEvent) -> float:
        """
        Engagement scoring:
        - Completed: 1.0
        - Skipped early: -0.5
        - Partial: proportional to how much of the track was played
        """
        if event.end_reason == 'error':
            return 0.0
        if event.end_reason == 'skipped' and event.total_played_ms < 30000:
            return -0.5
        
        if event.track_duration_ms == 0:
            return 0.0
        
        completion = event.total_played_ms / event.track_duration_ms
        return min(1.0, completion)
    
    def _parse(self, raw: bytes) -> Optional[StreamEndEvent]:
        """Decode a Pub/Sub payload; malformed events are dropped
        (this sketch handles only stream-end events)."""
        try:
            data = json.loads(raw)
            return StreamEndEvent(
                user_id=data['user_id'],
                content_id=data['content_id'],
                track_duration_ms=data['track_duration_ms'],
                total_played_ms=data['total_played_ms'],
                end_reason=data['end_reason'],
                timestamp=datetime.fromisoformat(data['timestamp']),
            )
        except (KeyError, ValueError):
            return None

Phase 5: Scaling and Edge Cases

Scaling for Major Releases

HANDLING TAYLOR SWIFT ALBUM DROP

PRE-EVENT (1 week before):
├── Pre-cache album on ALL CDN nodes
├── Scale API servers 2-3x
├── Pre-compute recommendation updates
└── War room ready

RELEASE MOMENT:
├── Feature flags for instant rollout
├── CDN serves from cache
├── Rate limiting protects backends
└── Async processing for non-critical

POST-RELEASE (first hour):
├── Monitor error rates
├── Auto-scale on demand
└── Graceful degradation if needed

Edge Cases

1. COLD START (NEW USERS)
   Solution: Onboarding questions, demographic signals,
   editorial playlists, rapid learning from first 10 songs

2. COLD START (NEW SONGS)
   Solution: Content-based + audio analysis,
   artist fan base, editorial placement

3. REGIONAL LICENSING
   Solution: Rights check at authorization,
   search excludes unavailable, VPN detection

4. NETWORK TRANSITIONS
   Solution: 20+ second buffer, prefetch next songs,
   offline cache, graceful quality degradation

Phase 6: Monitoring

SPOTIFY MONITORING DASHBOARD

PLAYBACK HEALTH
├── Time to first audio (p50):    target < 150ms     actual 142ms
├── Time to first audio (p99):    target < 500ms     actual 380ms
├── Playback success rate:        target > 99.9%     actual 99.94%
└── Rebuffer rate:                target < 0.1%      actual 0.03%

CDN HEALTH
├── Cache hit rate:               target > 95%       actual 96.2%
└── Edge latency (p99):           target < 50ms      actual 28ms

API HEALTH
├── Request rate:                 ~500K/sec          actual 487K
├── Error rate:                   target < 0.1%      actual 0.04%
└── Latency (p99):                target < 100ms     actual 67ms

DATA PIPELINE
├── Events ingested/sec:          ~6M                actual 5.8M
└── Processing lag:               target < 5 min     actual 2.1 min

ALERTS:
  CRITICAL: Playback < 99%, API errors > 1%
  WARNING: Cache hit < 90%, Pub/Sub backlog > 10M
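
These alert rules reduce to simple threshold checks over the metrics above. A minimal sketch of such an evaluation; the metric names and the evaluation code are illustrative, only the thresholds come from the alert list:

# toy alert evaluation against the thresholds listed above
from typing import Dict, List

CRITICAL_RULES = {
    "playback_success_rate": lambda v: v < 0.99,      # playback below 99%
    "api_error_rate": lambda v: v > 0.01,             # API errors above 1%
}
WARNING_RULES = {
    "cdn_cache_hit_rate": lambda v: v < 0.90,         # cache hit below 90%
    "pubsub_backlog": lambda v: v > 10_000_000,       # backlog above 10M messages
}

def evaluate(metrics: Dict[str, float]) -> List[str]:
    alerts = []
    for severity, rules in (("CRITICAL", CRITICAL_RULES), ("WARNING", WARNING_RULES)):
        for name, breached in rules.items():
            if name in metrics and breached(metrics[name]):
                alerts.append(f"{severity}: {name}={metrics[name]}")
    return alerts

print(evaluate({"playback_success_rate": 0.9994, "cdn_cache_hit_rate": 0.88}))
# -> ['WARNING: cdn_cache_hit_rate=0.88']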

Interview Conclusion

Interviewer: "Excellent work. You've demonstrated strong understanding of streaming systems, personalization, and microservices architecture. Any questions?"

You: "Thank you! How does Spotify prevent filter bubbles in recommendations?"

Interviewer: "Great question. The 20% exploration ratio you mentioned is key, plus features like 'Enhance' that add variety."


Summary: Concepts Applied

CONCEPTS FROM 10-WEEK COURSE IN SPOTIFY DESIGN

WEEK 1: DATA AT SCALE
├── Partitioning: User data by user_id, events by timestamp
├── Replication: Multi-region audio and user data
└── Hot keys: Popular songs cached at all CDN edges

WEEK 2: FAILURE-FIRST DESIGN
├── Circuit breakers: Protect recommendation service
├── Timeouts: 30ms budget for authorization
└── Graceful degradation: Popular content fallback

WEEK 3: MESSAGING & ASYNC
├── Pub/Sub: 500B+ events/day
└── Async processing: Recommendation updates

WEEK 4: CACHING
├── Multi-tier: Client → CDN → Origin
└── Hot content: Always cached, never evicted

WEEK 5: CONSISTENCY
├── Eventual: Recommendations, play counts
└── Strong: Playlists, user library

WEEK 8: ANALYTICS PIPELINE
├── Stream: Dataflow for real-time
└── Batch: Dataproc for ML training

WEEK 10: PRODUCTION READINESS
├── SLOs: < 200ms playback, 99.99% availability
└── Observability: Comprehensive metrics

WHY SPOTIFY IS AN ENGINEERING MARVEL

SCALE
• 713 million users across 184 countries
• 500 billion events processed daily
• < 200ms time-to-first-audio globally

PERSONALIZATION
• 33% of streams from recommendations
• Discover Weekly: 40M+ weekly users
• Three-model hybrid approach

ENGINEERING CULTURE
• Squad model influenced entire industry
• Backstage: Open-sourced developer portal
• 2,000+ microservices, autonomous teams


Self-Assessment Checklist

After studying this case study, you should be able to:

  • Design multi-tier caching for media delivery
  • Explain collaborative filtering and matrix factorization
  • Design hybrid recommendation systems
  • Handle cold start for users and content
  • Architect event pipelines at 500B+ events/day
  • Implement circuit breakers and graceful degradation
  • Design URL signing for CDN authentication
  • Explain streaming vs batch processing trade-offs
  • Design for predictable and unpredictable traffic spikes
  • Articulate the squad model for team organization
  • Design developer portals for microservices

This case study covers Spotify's architecture for delivering personalized audio experiences at massive scale.