Bonus Problem 7: Spotify
The World's Most Personalized Music Experience at Scale
🎵 How Do You Make 713 Million People Feel Like You Built an App Just for Them?
Imagine this challenge: You need to serve 100+ million songs to 713 million users across 184 countries. Every user expects instant playback: less than 200ms from tap to music. Every user expects recommendations that feel eerily personal. And you need to do this with an engineering organization that pioneered the "squad" model of autonomous teams.
This is Spotify, and it's not just a music app. It's a masterclass in personalization at scale, microservices architecture, and building developer platforms that thousands of engineers (in a company of roughly 7,300 employees) can use without stepping on each other's toes.
THE SPOTIFY SCALE (2025)
  USERS
  -----
  Monthly Active Users:          713 Million
  Premium Subscribers:           281 Million
  Conversion Rate:               ~40% (industry-leading)

  CONTENT
  -------
  Songs:                         100+ Million
  Podcasts:                      ~7 Million titles
  Audiobooks:                    350,000+
  Playlists:                     4+ Billion (user-created)

  ENGAGEMENT
  ----------
  Average listening time:        114 minutes/day
  Discover Weekly users:         40+ Million weekly
  Streams from recommendations:  ~33% of all plays

  INFRASTRUCTURE
  --------------
  Markets:                       184 countries
  Employees:                     ~7,300
  Microservices:                 2,000+ backend services
  Daily events processed:        500+ Billion

  BUSINESS
  --------
  2024 Revenue:                  €15.67 Billion
  Market Share:                  31.7% (global music streaming)
  Royalties paid (2024):         $10 Billion to rights holders (labels, publishers, artists)
This is the system we'll design today, and along the way we'll see how Spotify built the world's most personalized audio experience.
The Interview Begins
You walk into the interview room. The interviewer smiles and gestures to the whiteboard.
Interviewer: "Thanks for coming in. Today we're going to design a music streaming service like Spotify. I'm interested in how you think about scale, personalization, and delivering a seamless listening experience. Please think out loud; this is collaborative."
They write on the whiteboard:
  Design a Music Streaming Platform

  Requirements:
  - Stream audio to hundreds of millions of users globally
  - Near-instant playback (< 200ms to first audio)
  - Highly personalized recommendations
  - Search across 100M+ tracks
  - Support both free (ad-supported) and premium tiers
  - Handle massive catalog: songs, podcasts, audiobooks
Interviewer: "Take a few minutes to think about this, then walk me through your approach. We have about 45 minutes."
Phase 1: Requirements Clarification (5 minutes)
Before diving in, you take a breath and start asking questions.
Your Questions
You: "Before I start designing, I'd like to clarify a few requirements. First, what's our target scale: how many concurrent users should we support?"
Interviewer: "At peak, we might have 50 million concurrent listeners. Average is around 20-30 million."
You: "For playback latency, what's acceptable? I want to understand the user experience bar."
Interviewer: "Users tap a song and expect music within 200 milliseconds. No buffering during playback except on very poor connections."
You: "How personalized are the recommendations? Are we talking basic 'similar artists' or deeply personal like Discover Weekly?"
Interviewer: "Deeply personal. We want users to feel the app knows their taste better than they do. Recommendations should work even for new users with minimal listening history."
You: "What about offline playback? Do premium users expect to download music?"
Interviewer: "Yes, premium users can download. That's a key differentiator from the free tier."
You: "Perfect. Let me summarize the requirements as I understand them."
Functional Requirements
1. AUDIO PLAYBACK
- Stream songs, podcasts, and audiobooks
- Support multiple quality levels (96kbps to 320kbps)
- Adaptive bitrate based on network conditions
- Offline download for premium users
- Gapless playback between tracks
2. DISCOVERY & SEARCH
- Full-text search across songs, artists, albums, podcasts
- Typo tolerance and autocomplete
- Browse by genre, mood, activity
- Personalized recommendations
3. USER FEATURES
- Create and manage playlists
- Follow artists and friends
- Like/save songs to library
- View listening history
4. MONETIZATION
- Free tier with ads
- Premium tier (ad-free, offline, higher quality)
- Family and student plans
Non-Functional Requirements
1. SCALE
- 50 million concurrent users (peak)
- 100+ million songs in catalog
- 500+ billion events/day for analytics
- 1+ billion streams/day
2. LATENCY
- Playback start: < 200ms
- Search results: < 100ms
- API responses: < 50ms (p99)
3. AVAILABILITY
- 99.99% uptime for playback
- Graceful degradation (recommendations can fail, playback must not)
4. DATA
- Strong consistency for user data (playlists, library)
- Eventual consistency acceptable for recommendations
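The graceful-degradation requirement ("recommendations can fail, playback must not") can be sketched as a timeout plus a fallback around the recommendation call. This is a minimal illustration, not Spotify's code: `recommendations_with_fallback`, `POPULAR_FALLBACK`, and the 100 ms timeout are hypothetical choices, and a production client would typically also put a circuit breaker in front of the service.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical precomputed fallback: popular/editorial track IDs
POPULAR_FALLBACK = ["track_pop_1", "track_pop_2", "track_pop_3"]


async def recommendations_with_fallback(
    fetch: Callable[[], Awaitable[List[str]]],
    timeout_s: float = 0.1,
) -> List[str]:
    """Return personalized recommendations, or popular content if the
    recommendation service is slow or failing. Playback never waits on this."""
    try:
        return await asyncio.wait_for(fetch(), timeout=timeout_s)
    except Exception:
        # Covers timeouts as well as any error raised by fetch()
        return list(POPULAR_FALLBACK)


async def fast_reco() -> List[str]:
    return ["personal_1", "personal_2"]


async def slow_reco() -> List[str]:
    await asyncio.sleep(1.0)  # simulates an overloaded service
    return ["personal_1"]


if __name__ == "__main__":
    print(asyncio.run(recommendations_with_fallback(fast_reco)))
    print(asyncio.run(recommendations_with_fallback(slow_reco)))
```

The caller always gets a non-empty list, so the home screen can render even when personalization is down.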
Phase 2: Back of the Envelope Estimation (5 minutes)
You: "Let me work through the numbers to understand the scale."
Traffic Estimation
STREAMING TRAFFIC
Base numbers:
Daily Active Users: 250 million
Songs per user per day: 20 songs (average)
Average song length: 3.5 minutes
Average file size (160 kbps): ~4 MB per song
Daily calculations:
Total streams/day: 250M × 20 = 5 billion streams
Streams per second: 5B ÷ 86,400 = ~58,000 streams/sec
Peak (3x average): ~175,000 streams/sec
Bandwidth:
Data per stream: ~4 MB
Daily data transfer: 5B × 4 MB = 20 PB/day
Average bandwidth: 20 PB × 8 bits ÷ 86,400 s ≈ 1.85 Tbps
Peak bandwidth (3x average): ~5.6 Tbps
Storage Estimation
AUDIO STORAGE
Song catalog:
Total songs: 100 million
Average song (all quality levels): ~25 MB (multiple encodings)
Total audio storage: 100M × 25 MB = 2.5 PB
User data:
Users: 713 million
Per user (playlists, library): ~50 KB average
Total user data: ~35 TB
Event data (analytics):
Events per day: 500 billion
Daily event storage: ~250 TB/day
Yearly (compressed): ~10 PB
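The traffic and storage arithmetic above can be checked with a few lines of Python. The inputs are the same assumptions (250M DAU, 20 songs per user per day, 3.5-minute songs at 160 kbps). The script uses the unrounded 4.2 MB per stream, so its daily transfer (~21 PB) and peak egress (~5.8 Tbps) land slightly above the rounded figures. It also cross-checks peak egress with Little's law (concurrent streams = stream start rate × song duration), which gives roughly 36 million concurrent streams at peak, the same order as the interviewer's 50 million peak listeners.

```python
SECONDS_PER_DAY = 86_400

# Assumptions from the estimates above
dau = 250e6                  # daily active users
songs_per_user = 20          # streams per user per day
song_seconds = 3.5 * 60      # 210 s average song
bitrate_bps = 160e3          # "High" quality, 160 kbps
peak_factor = 3              # peak = 3x average

streams_per_day = dau * songs_per_user                   # 5 billion
avg_streams_per_s = streams_per_day / SECONDS_PER_DAY    # ~57,870
peak_streams_per_s = avg_streams_per_s * peak_factor     # ~173,611

bytes_per_stream = bitrate_bps / 8 * song_seconds        # 4.2 MB (the "~4 MB")
daily_bytes = streams_per_day * bytes_per_stream         # ~21 PB
avg_egress_bps = daily_bytes * 8 / SECONDS_PER_DAY       # ~1.9 Tbps
peak_egress_bps = avg_egress_bps * peak_factor           # ~5.8 Tbps

# Cross-check with Little's law: concurrent streams = start rate x duration
peak_concurrent = peak_streams_per_s * song_seconds      # ~36.5 million
peak_egress_littles_bps = peak_concurrent * bitrate_bps

catalog_bytes = 100e6 * 25e6                             # 2.5 PB
user_data_bytes = 713e6 * 50e3                           # ~35.7 TB

print(f"peak streams/s:  {peak_streams_per_s:,.0f}")
print(f"daily transfer:  {daily_bytes / 1e15:.1f} PB")
print(f"peak egress:     {peak_egress_bps / 1e12:.2f} Tbps")
print(f"peak concurrent: {peak_concurrent / 1e6:.1f} M streams")
print(f"catalog / users: {catalog_bytes / 1e15:.1f} PB / {user_data_bytes / 1e12:.1f} TB")
```

Both routes to peak egress agree because they are the same quantity computed two ways, which makes the Little's law form a handy sanity check in an interview.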
Key Metrics Summary
ESTIMATION SUMMARY

TRAFFIC
├── Peak streams: 175,000 /second
├── Daily streams: 5 billion
└── API requests: ~500,000 /second (peak)

STORAGE
├── Audio catalog: 2.5 PB
├── User data: 35 TB
└── Event data: ~10 PB/year

BANDWIDTH
├── Peak egress: ~5.6 Tbps
└── Daily transfer: 20 PB
Phase 3: High-Level Design (10 minutes)
You: "Now let me sketch out the high-level architecture."
System Architecture
SPOTIFY HIGH-LEVEL ARCHITECTURE

CLIENTS
  iOS | Android | Desktop | Web
     │
     ├──► CDN (Fastly/Akamai)
     │      Audio delivery, static assets, caching
     │
     ▼
GOOGLE CLOUD PLATFORM
  API Gateway
    Rate limiting, auth, routing, load balancing
     │
     ▼
  MICROSERVICES LAYER
    User Service       Catalog Service    Search Service     Stream Service
    Playlist Service   Reco Service       Social Service     Ad Service

  DATA LAYER
    Postgres (Users)   Cassandra (Events)   Redis (Cache)     BigQuery (Analytics)
    GCS (Audio)        Elastic (Search)     Bigtable (Reco)

  EVENT STREAMING
    Pub/Sub (Events) → Dataflow (Process) → ML Pipeline
Data Flow: Playing a Song
PLAYBACK FLOW
1. TAP TO PLAY
Client sends request to API Gateway
2. AUTHORIZATION
Stream Service checks:
- User subscription tier
- Regional availability
- Content licensing
3. GENERATE SIGNED URL
URL with HMAC signature, expires in 1 hour
https://cdn.spotify.com/audio/abc123?expires=...&sig=...
4. STREAM FROM CDN
- CDN edge serves if cached (99%+ hit rate for popular)
- Falls back to GCS origin if cache miss
- Adaptive bitrate based on network
5. TRACK EVENTS
Client sends play/skip events to Pub/Sub
→ Analytics pipeline → Recommendations
TARGET LATENCY: < 200ms to first audio
Phase 4: Deep Dives (20 minutes)
Interviewer: "Let's dive deeper. Tell me about the recommendation system."
Deep Dive 1: Recommendation System
The Problem
PERSONALIZATION CHALLENGE
Scale:
- 713 million users with unique tastes
- 100+ million songs to choose from
- Generate personalized playlists for each user
Cold start problem:
- New users: No listening history
- New songs: No user interactions
Exploration vs exploitation:
- Recommend what users will like (exploitation)
- Introduce new music (exploration)
The Solution: Hybrid Recommendation Architecture
THREE RECOMMENDATION MODELS

COLLABORATIVE FILTERING
  "Users like you also liked..."
  Matrix factorization

CONTENT-BASED
  "Songs with similar tags"
  NLP on blogs, reviews, titles

AUDIO ANALYSIS
  "Songs that sound similar"
  CNN on audio spectrograms

        │ all three signals feed
        ▼
ENSEMBLE MODEL
  Combines all signals with learned weights from A/B testing
Collaborative Filtering
MATRIX FACTORIZATION
User-Song Matrix (sparse; implicit feedback such as play counts, 0 = no interaction):
          Song1   Song2   Song3   ...   Song100M
User1   [   5       0       3     ...      2     ]
User2   [   0       4       0     ...      0     ]
User3   [   3       0       5     ...      4     ]
...
Decompose into:
- User vectors: 713M × 128 dimensions
- Song vectors: 100M × 128 dimensions
Predicted affinity = dot_product(user_vector, song_vector)
For Discover Weekly:
1. Find users with similar taste profiles
2. Find songs those users love that you haven't heard
3. Rank by predicted engagement
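A toy version of the decomposition can be written with alternating least squares (ALS). This is a sketch under simplifying assumptions: a dense 4×4 matrix, k = 2 dimensions instead of 128, and zeros treated as observed values. Systems at Spotify's scale typically use implicit-feedback ALS on sparse matrices (which gives unplayed songs a low confidence weight rather than a hard zero) with distributed solvers. `als` and `recommend` are illustrative names, not Spotify's API.

```python
import numpy as np


def als(R: np.ndarray, k: int = 2, reg: float = 0.1, iters: int = 30, seed: int = 0):
    """Factorize R (users x songs) as U @ V.T by alternating ridge solves.

    Zeros are treated as observed values: a simplification of the
    implicit-feedback ALS used for real play-count data."""
    rng = np.random.default_rng(seed)
    n_users, n_songs = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_songs, k))
    eye = np.eye(k)
    for _ in range(iters):
        # Fix V and solve for every user vector at once
        U = np.linalg.solve(V.T @ V + reg * eye, V.T @ R.T).T
        # Fix U and solve for every song vector at once
        V = np.linalg.solve(U.T @ U + reg * eye, U.T @ R).T
    return U, V


def recommend(U: np.ndarray, V: np.ndarray, R: np.ndarray, user: int) -> int:
    """Unheard song with the highest predicted score (dot product)."""
    scores = U[user] @ V.T
    scores[R[user] > 0] = -np.inf  # skip songs the user already played
    return int(np.argmax(scores))


# Rows: users, columns: songs, values: play counts (toy data)
R = np.array([
    [5, 0, 3, 0],
    [4, 0, 4, 1],
    [0, 5, 0, 4],
    [1, 4, 0, 5],
], dtype=float)

U, V = als(R)
print(np.round(U @ V.T, 1))  # reconstructed matrix
print("next song for user 0:", recommend(U, V, R, 0))
```

Each half-step is an ordinary ridge regression, which is why ALS parallelizes well: every user row (and then every song row) can be solved independently.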
Audio Analysis (CNN)
CONVOLUTIONAL NEURAL NETWORKS
Input: Audio spectrogram
Process:
Raw audio → Mel spectrogram → CNN → 128-dim feature vector
Output features:
- Tempo, key, mode
- Danceability, energy, valence
- Acousticness, instrumentalness
Use case: Cold start for new songs
- Works even with zero user interaction data
- Powers "song radio" feature
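For cold start, the useful property of the audio embedding is that a brand-new track can be compared with the catalog immediately. The sketch below assumes the CNN has already produced 128-dimensional embeddings (random vectors stand in for them) and runs an exact cosine-similarity search. At 100M tracks a real system would use an approximate nearest-neighbor index, such as Spotify's open-source Annoy library, rather than a full scan. `nearest_by_audio` is a hypothetical helper.

```python
import numpy as np


def nearest_by_audio(query: np.ndarray, catalog: np.ndarray, k: int = 3) -> list:
    """Indices of the k catalog tracks whose audio embeddings have the
    highest cosine similarity to `query` (e.g. a newly uploaded song)."""
    q = query / np.linalg.norm(query)
    c = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    sims = c @ q  # cosine similarity per catalog track
    return np.argsort(-sims)[:k].tolist()


rng = np.random.default_rng(7)
catalog = rng.normal(size=(1_000, 128))                     # stand-in CNN embeddings
new_song = catalog[42] + rng.normal(scale=0.01, size=128)   # "sounds like" track 42

print(nearest_by_audio(new_song, catalog))
```

Because this needs no play data, it is what lets a song released this morning show up in "song radio" before anyone has streamed it.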
Implementation
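# recommendations/discover_weekly.py
"""
Discover Weekly Generation

Hybrid recommendation combining collaborative filtering,
content-based filtering, and audio analysis.
"""
import asyncio
import random
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

import numpy as np


@dataclass
class UserProfile:
    user_id: str
    embedding: np.ndarray  # 128-dim taste vector
    top_artists: List[str]
    top_genres: List[str]
    listening_history: Set[str]


class DiscoverWeeklyGenerator:
    """
    Generates personalized weekly playlists.

    Applies concepts:
    - Week 5: Eventual consistency for recommendations
    - Week 8: Batch processing pipeline

    Song objects are assumed to expose `genres`, `audio_embedding`, and
    `popularity_score`; store methods other than get_profile,
    get_loved_songs, get_song, and find_similar are hypothetical.
    """

    def __init__(self, user_store, song_store, similar_users_index):
        self.user_store = user_store
        self.song_store = song_store
        self.similar_users_index = similar_users_index

        # Weights from A/B testing
        self.weights = {
            'collaborative': 0.4,
            'content': 0.3,
            'audio': 0.2,
            'popularity': 0.1,
        }
        self.playlist_size = 30
        self.exploration_ratio = 0.2

    async def generate_playlist(self, user_id: str) -> List[str]:
        """Generate Discover Weekly for a user."""
        user = await self.user_store.get_profile(user_id)

        # Get candidates from collaborative filtering and content
        # similarity concurrently
        collab_candidates, content_candidates = await asyncio.gather(
            self._get_collaborative_candidates(user),
            self._get_content_candidates(user),
        )

        # Merge and score
        all_candidates = self._merge_candidates(
            collab_candidates, content_candidates
        )
        scored = await self._score_candidates(user, all_candidates)

        # Filter already-heard songs
        filtered = [
            (song_id, score) for song_id, score in scored
            if song_id not in user.listening_history
        ]

        # Exploitation: top songs
        exploit_count = int(self.playlist_size * (1 - self.exploration_ratio))
        exploit_songs = [s for s, _ in filtered[:exploit_count]]

        # Exploration: outside comfort zone (also fills any unused slots)
        explore_songs = await self._get_exploration_tracks(
            user,
            count=self.playlist_size - len(exploit_songs),
            exclude=set(exploit_songs) | user.listening_history,
        )

        return self._shuffle_with_variety(exploit_songs + explore_songs)

    async def _get_collaborative_candidates(
        self, user: UserProfile
    ) -> Dict[str, float]:
        """Find songs from similar users."""
        similar_users = await self.similar_users_index.find_similar(
            user.embedding, k=1000
        )

        # Fetch all neighbors' loved songs concurrently, not one by one
        loved_lists = await asyncio.gather(*(
            self.user_store.get_loved_songs(similar_user_id, limit=100)
            for similar_user_id, _ in similar_users
        ))

        candidates: Dict[str, float] = {}
        for (_, similarity), loved_songs in zip(similar_users, loved_lists):
            for song_id, engagement in loved_songs:
                score = similarity * engagement
                candidates[song_id] = candidates.get(song_id, 0.0) + score
        return candidates

    async def _get_content_candidates(
        self, user: UserProfile
    ) -> Dict[str, float]:
        """Songs in the user's top genres (hypothetical store method)."""
        song_ids = await self.song_store.find_by_genres(user.top_genres, limit=500)
        return {song_id: 0.0 for song_id in song_ids}

    @staticmethod
    def _merge_candidates(
        collab: Dict[str, float], content: Dict[str, float]
    ) -> Dict[str, float]:
        """Union of both sources. Collaborative sums are rescaled to [0, 1]
        so they are comparable with the other signals in the ensemble."""
        top = max(collab.values(), default=0.0)
        merged = {s: (v / top if top > 0 else 0.0) for s, v in collab.items()}
        for song_id in content:
            merged.setdefault(song_id, 0.0)
        return merged

    async def _score_candidates(
        self, user: UserProfile, candidates: Dict[str, float]
    ) -> List[Tuple[str, float]]:
        """Score using ensemble of all signals."""
        song_ids = list(candidates)
        songs = await asyncio.gather(
            *(self.song_store.get_song(song_id) for song_id in song_ids)
        )

        scored = []
        for song_id, song in zip(song_ids, songs):
            # Combine signals (each roughly in [0, 1])
            final_score = (
                self.weights['collaborative'] * candidates[song_id] +
                self.weights['content'] * self._content_similarity(user, song) +
                self.weights['audio'] * self._audio_similarity(user, song) +
                self.weights['popularity'] * song.popularity_score
            )
            scored.append((song_id, final_score))

        return sorted(scored, key=lambda x: x[1], reverse=True)

    @staticmethod
    def _content_similarity(user: UserProfile, song) -> float:
        """Jaccard overlap between the song's genres and the user's top genres."""
        song_genres, user_genres = set(song.genres), set(user.top_genres)
        union = song_genres | user_genres
        return len(song_genres & user_genres) / len(union) if union else 0.0

    @staticmethod
    def _audio_similarity(user: UserProfile, song) -> float:
        """Cosine similarity; assumes audio embeddings share the taste space."""
        u, s = user.embedding, song.audio_embedding
        denom = np.linalg.norm(u) * np.linalg.norm(s)
        return float(u @ s / denom) if denom else 0.0

    async def _get_exploration_tracks(
        self, user: UserProfile, count: int, exclude: Set[str]
    ) -> List[str]:
        """Tracks from genres the user rarely plays (hypothetical store method)."""
        return await self.song_store.sample_outside_genres(
            user.top_genres, count=count, exclude=exclude
        )

    @staticmethod
    def _shuffle_with_variety(song_ids: List[str]) -> List[str]:
        """Simple shuffle so exploration tracks are not clustered at the end."""
        shuffled = list(song_ids)
        random.shuffle(shuffled)
        return shuffled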
Deep Dive 2: Audio Streaming
Interviewer: "How do you achieve < 200ms playback latency?"
Multi-Layer Caching
AUDIO DELIVERY ARCHITECTURE

LAYER 1: CLIENT CACHE
---------------------
- Recently played songs
- Prefetched next songs
- Offline downloads (Premium)
- 1-10 GB local storage

LAYER 2: CDN EDGE (Fastly/Akamai)
---------------------------------
- 200+ global locations
- Top 20% of songs always cached (80% of plays)
- 99%+ cache hit rate for popular content
- Token validation at edge

LAYER 3: ORIGIN (GCS)
---------------------
- All 100M+ songs
- Multiple quality encodings per song
- Multi-region replication
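"Token validation at edge" means the CDN can reject a request without calling back to the API. A minimal sketch, assuming the edge shares the HMAC key and uses the same `path:expires` message format as `_sign_url` in the playback service; this sketch strips the `=` padding from the signature. `sign_path`, `edge_validate`, the key, and the host name are hypothetical. Real CDNs expose this through their own token-authentication features, whose formats differ.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def sign_path(path: str, expires_ts: int, key: bytes) -> str:
    """HMAC-SHA256 over 'path:expires', URL-safe base64 without padding."""
    msg = f"{path}:{expires_ts}".encode()
    sig = hmac.new(key, msg, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(sig).decode().rstrip("=")


def edge_validate(url: str, key: bytes, now: Optional[float] = None) -> bool:
    """Serve only if the signature matches the path and has not expired."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires_ts = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False  # missing or malformed token
    if (time.time() if now is None else now) > expires_ts:
        return False  # link has expired
    expected = sign_path(parts.path, expires_ts, key)
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected.encode(), sig.encode())


KEY = b"demo-signing-key"  # hypothetical secret shared by API and CDN
exp = int(time.time()) + 3600
path = "/audio/abc123_160.ogg"
url = f"https://cdn.example.com{path}?expires={exp}&sig={sign_path(path, exp, KEY)}"

print(edge_validate(url, KEY))                              # valid link
print(edge_validate(url.replace("abc123", "xyz789"), KEY))  # tampered path
```

Signing the path together with the expiry means a leaked URL only works for one file and only until it expires.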
Audio Quality Tiers
SPOTIFY AUDIO QUALITY
+-----------+------------+------------+----------------------------------+
| Tier      | Bitrate    | Codec      | Use Case                         |
+-----------+------------+------------+----------------------------------+
| Low       | 24 kbps    | AAC        | Extreme data saving              |
| Normal    | 96 kbps    | Ogg Vorbis | Default for free tier            |
| High      | 160 kbps   | Ogg Vorbis | Free tier max / Premium default  |
| Very High | 320 kbps   | Ogg Vorbis | Premium audiophile               |
| Lossless  | Variable*  | FLAC       | Premium (2025 rollout)           |
+-----------+------------+------------+----------------------------------+
* FLAC is variable-bitrate: uncompressed 16-bit/44.1 kHz audio is 1,411 kbps, and FLAC compresses it losslessly below that.
Why Ogg Vorbis?
- Open, royalty-free format (no licensing fees)
- Better quality than MP3 at the same bitrate
Implementation
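# streaming/playback_service.py
"""
Playback Authorization and URL Signing

Optimized for < 30ms authorization latency.
"""
import asyncio
import base64
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


class PlaybackNotAllowedError(Exception):
    """Raised when the user lacks rights to play this content."""


@dataclass
class PlaybackResponse:
    stream_url: str
    expires_at: datetime
    quality: str
    prefetch_urls: List[str]


class PlaybackService:
    """
    Handles playback authorization.

    Applies concepts:
    - Week 4: Multi-tier caching
    - Week 2: Timeout management
    """

    def __init__(self, rights_service, catalog_service, cdn_config):
        self.rights = rights_service
        self.catalog = catalog_service
        self.cdn_base = cdn_config['base_url']
        self.signing_key = cdn_config['signing_key']
        # 30 ms budget for the authorization lookups
        self.auth_timeout_s = cdn_config.get('auth_timeout_s', 0.030)

    async def get_playback_url(
        self, user_id: str, song_id: str, quality: str
    ) -> PlaybackResponse:
        """Generate signed streaming URL."""
        # Parallel checks for speed, bounded by the timeout budget
        # (raises asyncio.TimeoutError if exceeded)
        rights_ok, song = await asyncio.wait_for(
            asyncio.gather(
                self.rights.check(user_id, song_id),
                self.catalog.get_song(song_id),
            ),
            timeout=self.auth_timeout_s,
        )

        if not rights_ok or song is None:
            raise PlaybackNotAllowedError(song_id)

        # Generate signed URL. Use a timezone-aware time: calling
        # .timestamp() on a naive utcnow() value is interpreted as local time.
        file_id = f"{song_id}_{quality}.ogg"
        expires = datetime.now(timezone.utc) + timedelta(hours=1)
        url = self._sign_url(file_id, expires)

        # Prefetch hints for next songs
        prefetch = await self._get_prefetch_urls(user_id, song_id, quality)

        return PlaybackResponse(
            stream_url=url,
            expires_at=expires,
            quality=quality,
            prefetch_urls=prefetch,
        )

    def _sign_url(self, file_id: str, expires: datetime) -> str:
        """HMAC-signed CDN URL."""
        expires_ts = int(expires.timestamp())
        message = f"/audio/{file_id}:{expires_ts}"
        sig = hmac.new(
            self.signing_key.encode(),
            message.encode(),
            hashlib.sha256,
        ).digest()
        # Strip '=' padding so the signature is safe in a query string
        sig_b64 = base64.urlsafe_b64encode(sig).decode().rstrip("=")
        return f"{self.cdn_base}/audio/{file_id}?expires={expires_ts}&sig={sig_b64}"

    async def _get_prefetch_urls(
        self, user_id: str, song_id: str, quality: str
    ) -> List[str]:
        """Signed URLs for the next tracks in the user's queue.
        get_next_in_queue is a hypothetical catalog method."""
        next_ids = await self.catalog.get_next_in_queue(user_id, song_id, limit=2)
        expires = datetime.now(timezone.utc) + timedelta(hours=1)
        return [self._sign_url(f"{nid}_{quality}.ogg", expires) for nid in next_ids]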
Deep Dive 3: Microservices & Backstage
Interviewer: "How do you manage 2,000+ microservices?"
Backstage Developer Portal
BACKSTAGE PLATFORM

SOFTWARE CATALOG
----------------
Every service registered with:
- Owner (which squad)
- Description and documentation
- API specifications
- Dependencies
- Health metrics

- Software Templates (Scaffolder): new-service wizard that creates services from templates
- TechDocs: docs as code
- Search: find any service

100+ plugins: Kubernetes, GitHub, PagerDuty, Datadog...

Result: Engineer onboarding time cut in half
Squad Model
SPOTIFY ORGANIZATIONAL MODEL
SQUAD (8-12 people)
-------------------
Cross-functional: Engineers, Designer, Product Owner
Owns: A feature or set of services end-to-end
Autonomy: Decides how to work (Scrum, Kanban, etc.)
Example: "Search Squad" owns search experience
TRIBE (40-100 people)
---------------------
Collection of squads in related area
Example: "Music Discovery Tribe" includes Search, Browse, Radio
CHAPTER
-------
Specialists across squads within a tribe
Example: All backend engineers in Music Discovery
Led by Chapter Lead (career growth, standards)
GUILD
-----
Community of interest across company
Example: "Web Guild" - all web developers
Voluntary, knowledge sharing
Service Communication
# infrastructure/circuit_breaker.py
"""
Circuit Breaker for Service Calls

Protects against cascading failures.
"""
import time
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was given."""


@dataclass
class CircuitBreakerConfig:
    failure_threshold: int = 5     # consecutive failures before opening
    success_threshold: int = 3     # successes in HALF_OPEN before closing
    timeout_seconds: float = 30.0  # time to stay OPEN before a trial call


class CircuitBreaker:
    """
    States: CLOSED → OPEN → HALF_OPEN → CLOSED

    Applies concepts:
    - Week 2: Circuit breaker pattern
    - Week 2: Graceful degradation
    """

    def __init__(self, name: str, config: CircuitBreakerConfig):
        self.name = name
        self.config = config
        self.state = "CLOSED"
        self.failure_count = 0
        self.success_count = 0
        self.opened_at: Optional[float] = None  # monotonic clock

    async def call(
        self, func: Callable[..., Awaitable[Any]], *args,
        fallback: Optional[Callable[..., Awaitable[Any]]] = None, **kwargs
    ) -> Any:
        """Execute with circuit breaker protection."""
        if self.state == "OPEN":
            if self._should_try_reset():
                self.state = "HALF_OPEN"
                self.success_count = 0
            elif fallback:
                return await fallback(*args, **kwargs)
            else:
                raise CircuitOpenError(self.name)

        try:
            result = await func(*args, **kwargs)
        except Exception:
            self._on_failure()
            if fallback:
                return await fallback(*args, **kwargs)
            raise
        self._on_success()
        return result

    def _should_try_reset(self) -> bool:
        """True once the OPEN timeout has elapsed."""
        return (
            self.opened_at is not None
            and time.monotonic() - self.opened_at >= self.config.timeout_seconds
        )

    def _on_success(self):
        if self.state == "HALF_OPEN":
            self.success_count += 1
            if self.success_count >= self.config.success_threshold:
                self.state = "CLOSED"
                self.failure_count = 0
        else:
            self.failure_count = 0

    def _on_failure(self):
        self.failure_count += 1
        # A failed trial call, or too many consecutive failures, (re)opens it
        if (self.state == "HALF_OPEN"
                or self.failure_count >= self.config.failure_threshold):
            self.state = "OPEN"
            self.opened_at = time.monotonic()
Deep Dive 4: Event Processing
Interviewer: "How do you process 500 billion events per day?"
Event Pipeline
SPOTIFY EVENT PIPELINE

SOURCES
├── Mobile/Desktop apps (plays, skips, searches)
├── Backend services (errors, latency)
└── External (ad impressions)
        │
        ▼
GOOGLE CLOUD PUB/SUB
- 1 trillion messages/day capacity
- One topic per event type
- At-least-once delivery
        │
        ├──► STREAMING PATH (Dataflow)
        │      Real-time:
        │      - Trending now
        │      - Live stats
        │
        └──► BATCH PATH (Dataproc)
               Daily/weekly:
               - ML training
               - Reports

Both paths write to:

STORAGE
Bigtable (features) | BigQuery (analytics) | GCS (archive)

Scale: 500B events/day, 70 TB/day, 10M+ BigQuery queries/month
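The streaming path's "Trending now" counter is a windowed aggregation over event time. Dataflow (Apache Beam) provides windows, watermarks, and allowed lateness natively; the sketch below hand-rolls the idea in plain Python to show what those settings do. It is not Beam's API. The watermark is simplified to the maximum event time seen so far, and the 60-second windows and 300-second allowed lateness are hypothetical values.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_S = 60             # tumbling window size (hypothetical)
ALLOWED_LATENESS_S = 300  # how long a window accepts late events (hypothetical)


def trending(
    events: Iterable[Tuple[int, str]], top_n: int = 3
) -> Tuple[Dict[int, List[Tuple[str, int]]], int]:
    """Count plays per track in event-time windows.

    `events` are (event_time_s, track_id) pairs in arrival order, which may
    be out of event-time order. Returns the top-N tracks per window start
    and the number of events dropped for arriving too late."""
    windows: Dict[int, Counter] = defaultdict(Counter)
    watermark = float("-inf")
    dropped = 0
    for event_time, track_id in events:
        watermark = max(watermark, event_time)
        window_start = event_time - event_time % WINDOW_S
        if window_start + WINDOW_S + ALLOWED_LATENESS_S < watermark:
            dropped += 1  # window already finalized; discard
            continue
        windows[window_start][track_id] += 1
    top = {w: counts.most_common(top_n) for w, counts in sorted(windows.items())}
    return top, dropped


# (event_time_s, track_id); (30, "a") is mildly late, the last event very late
EVENTS = [(0, "a"), (10, "b"), (20, "a"), (65, "b"), (30, "a"),
          (70, "b"), (500, "c"), (5, "a")]
result, dropped = trending(EVENTS)
print(result)
print("dropped:", dropped)
```

The mildly late event still lands in its window, while the one arriving eight minutes late is dropped, which is the same trade-off the `allowed_lateness` setting in the event processor below makes at a 24-hour scale.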
Implementation
# data/event_processor.py
"""
Event Processing Pipeline

Processes user events for recommendations and analytics.
"""
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class StreamEndEvent:
    user_id: str
    content_id: str
    total_played_ms: int
    duration_ms: int     # full track length, needed for the completion ratio
    end_reason: str      # "completed", "skipped", "error"
    timestamp: datetime  # timezone-aware (UTC)


class EventProcessor:
    """
    Processes events from Pub/Sub.

    Applies concepts:
    - Week 3: Stream processing
    - Week 8: Late-arriving data
    """

    def __init__(self, pubsub, bigtable, bigquery):
        self.pubsub = pubsub
        self.bigtable = bigtable
        self.bigquery = bigquery
        self.allowed_lateness = timedelta(hours=24)

    async def process(self, raw: bytes) -> None:
        event = self._parse(raw)
        if not event:
            return

        # Drop very late events
        if datetime.now(timezone.utc) - event.timestamp > self.allowed_lateness:
            return

        # Always write to BigQuery
        await self.bigquery.insert('events.streams', event)

        # Calculate engagement for recommendations
        if isinstance(event, StreamEndEvent):
            engagement = self._calculate_engagement(event)
            await self.pubsub.publish('taste-updates', {
                'user_id': event.user_id,
                'content_id': event.content_id,
                'engagement': engagement,
            })

    def _parse(self, raw: bytes) -> Optional[StreamEndEvent]:
        """Decode a JSON payload (field names are illustrative);
        return None for malformed or non-stream-end events."""
        try:
            data = json.loads(raw)
            if data.get('type') != 'stream_end':
                return None
            ts = datetime.fromisoformat(data['timestamp'])
            if ts.tzinfo is None:
                ts = ts.replace(tzinfo=timezone.utc)
            return StreamEndEvent(
                user_id=data['user_id'],
                content_id=data['content_id'],
                total_played_ms=int(data['total_played_ms']),
                duration_ms=int(data['duration_ms']),
                end_reason=data['end_reason'],
                timestamp=ts,
            )
        except (ValueError, KeyError, TypeError, AttributeError):
            return None

    def _calculate_engagement(self, event: StreamEndEvent) -> float:
        """
        Engagement scoring:
        - Error: 0.0 (not the user's choice)
        - Skipped within 30 s: -0.5
        - Otherwise: fraction of the track played, capped at 1.0
          (a completed play scores 1.0)
        """
        if event.end_reason == 'error':
            return 0.0
        if event.end_reason == 'skipped' and event.total_played_ms < 30_000:
            return -0.5
        if event.duration_ms <= 0:
            return 0.0
        return min(1.0, event.total_played_ms / event.duration_ms)
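Downstream, something has to consume the `taste-updates` topic. One simple option, sketched here as an assumption rather than Spotify's documented method, is an online update that nudges the user's taste vector toward songs they finished and away from early skips, while the periodic matrix factorization recomputes vectors from scratch. The same kind of update is one way to get the "rapid learning from first 10 songs" listed under cold start. `update_taste`, `cosine`, and the learning rate of 0.1 are hypothetical.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def update_taste(
    user_vec: np.ndarray, song_vec: np.ndarray, engagement: float, lr: float = 0.1
) -> np.ndarray:
    """Move the taste vector toward (engagement > 0) or away from
    (engagement < 0) the song's vector, then re-normalize to unit length."""
    updated = user_vec + lr * engagement * (song_vec - user_vec)
    norm = np.linalg.norm(updated)
    return updated / norm if norm > 0 else user_vec


user = np.array([1.0, 0.0, 0.0])
song = np.array([0.0, 1.0, 0.0])

liked = update_taste(user, song, engagement=1.0)     # completed play
skipped = update_taste(user, song, engagement=-0.5)  # skipped within 30 s
print(cosine(user, song), cosine(liked, song), cosine(skipped, song))
```

Scaling the step by engagement reuses the scores computed above directly: a completed play pulls the vector twice as hard as an early skip pushes it away.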
Phase 5: Scaling and Edge Cases
Scaling for Major Releases
HANDLING TAYLOR SWIFT ALBUM DROP
PRE-EVENT (1 week before):
├── Pre-cache album on ALL CDN nodes
├── Scale API servers 2-3x
├── Pre-compute recommendation updates
└── War room ready
RELEASE MOMENT:
├── Feature flags for instant rollout
├── CDN serves from cache
├── Rate limiting protects backends
└── Async processing for non-critical
POST-RELEASE (first hour):
├── Monitor error rates
├── Auto-scale on demand
└── Graceful degradation if needed
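"Rate limiting protects backends" is commonly implemented with a token bucket, which admits short bursts up to a fixed capacity while enforcing a steady average rate. The class below is a single-process sketch with hypothetical parameters; a gateway handling ~500K requests/sec would keep these counters per client or per route, usually in a shared store or at the edge.

```python
import time
from typing import Optional


class TokenBucket:
    """Admit up to `capacity` requests in a burst, refilled at `rate`/second."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should return HTTP 429 or shed the request


# 10 requests/sec steady state, bursts of up to 5 (explicit clock for clarity)
bucket = TokenBucket(rate=10, capacity=5, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(6)]
after_100ms = bucket.allow(now=0.1)
print(burst, after_100ms)
```

During a release spike, the burst capacity absorbs the initial wave of taps while the refill rate caps what reaches the services behind the gateway.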
Edge Cases
1. COLD START (NEW USERS)
Solution: Onboarding questions, demographic signals,
editorial playlists, rapid learning from first 10 songs
2. COLD START (NEW SONGS)
Solution: Content-based + audio analysis,
artist fan base, editorial placement
3. REGIONAL LICENSING
Solution: Rights check at authorization,
search excludes unavailable, VPN detection
4. NETWORK TRANSITIONS
Solution: 20+ second buffer, prefetch next songs,
offline cache, graceful quality degradation
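The "graceful quality degradation" part of the network-transition answer can be sketched as a throughput- and buffer-aware bitrate picker over the quality tiers from the audio quality table. The safety factor (0.7), the 10-second low-buffer threshold, and the halving rule are hypothetical tuning choices; production players use more elaborate adaptive-bitrate algorithms.

```python
from typing import List, Tuple

# (tier, kbps) from the audio quality table, lowest first
TIERS: List[Tuple[str, int]] = [
    ("low", 24), ("normal", 96), ("high", 160), ("very_high", 320),
]


def pick_quality(
    throughput_kbps: float,
    buffer_s: float,
    max_kbps: int = 320,        # e.g. 160 for the free tier
    safety: float = 0.7,        # plan to use only 70% of measured throughput
    low_buffer_s: float = 10.0,
) -> str:
    """Highest tier that fits the bandwidth budget and the plan's cap."""
    budget = throughput_kbps * safety
    if buffer_s < low_buffer_s:
        budget *= 0.5  # buffer is draining: be conservative to avoid a stall
    choice = TIERS[0][0]  # always fall back to the lowest tier
    for name, kbps in TIERS:
        if kbps <= min(budget, max_kbps):
            choice = name
    return choice


print(pick_quality(throughput_kbps=1_000, buffer_s=30))                # good Wi-Fi
print(pick_quality(throughput_kbps=1_000, buffer_s=30, max_kbps=160))  # free tier
print(pick_quality(throughput_kbps=200, buffer_s=5))                   # weak cell
```

Combining a throughput estimate with buffer level lets the player keep high quality on a brief dip while stepping down quickly when the 20-second buffer actually starts to drain.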
Phase 6: Monitoring
SPOTIFY MONITORING DASHBOARD                 Target       Current

PLAYBACK HEALTH
├── Time to first audio (p50)                < 150ms      142ms
├── Time to first audio (p99)                < 500ms      380ms
├── Playback success rate                    > 99.9%      99.94%
└── Rebuffer rate                            < 0.1%       0.03%

CDN HEALTH
├── Cache hit rate                           > 95%        96.2%
└── Edge latency (p99)                       < 50ms       28ms

API HEALTH
├── Request rate                             ~500K/sec    487K
├── Error rate                               < 0.1%       0.04%
└── Latency (p99)                            < 100ms      67ms

DATA PIPELINE
├── Events ingested/sec                      ~6M          5.8M
└── Processing lag                           < 5 min      2.1 min
ALERTS:
CRITICAL: Playback < 99%, API errors > 1%
WARNING: Cache hit < 90%, Pub/Sub backlog > 10M
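The alert rules above, and the percentile rows on the dashboard, reduce to a few comparisons. A minimal sketch, assuming hypothetical metric names and the nearest-rank definition of a percentile. The "healthy" values come from the dashboard except the Pub/Sub backlog, which is made up for illustration. Real deployments express these as rules in the monitoring system rather than in application code.

```python
import math
from typing import Dict, List, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, e.g. p=99 for the p99 latency rows."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def evaluate_alerts(m: Dict[str, float]) -> List[str]:
    """Apply the CRITICAL/WARNING thresholds listed above."""
    alerts = []
    if m["playback_success_rate"] < 0.99:
        alerts.append("CRITICAL: playback success below 99%")
    if m["api_error_rate"] > 0.01:
        alerts.append("CRITICAL: API errors above 1%")
    if m["cdn_cache_hit_rate"] < 0.90:
        alerts.append("WARNING: cache hit rate below 90%")
    if m["pubsub_backlog"] > 10_000_000:
        alerts.append("WARNING: Pub/Sub backlog above 10M")
    return alerts


healthy = {"playback_success_rate": 0.9994, "api_error_rate": 0.0004,
           "cdn_cache_hit_rate": 0.962, "pubsub_backlog": 2_000_000}
degraded = dict(healthy, cdn_cache_hit_rate=0.85)

print(percentile(range(1, 101), 99))
print(evaluate_alerts(healthy), evaluate_alerts(degraded))
```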
Interview Conclusion
Interviewer: "Excellent work. You've demonstrated strong understanding of streaming systems, personalization, and microservices architecture. Any questions?"
You: "Thank you! How does Spotify prevent filter bubbles in recommendations?"
Interviewer: "Great question. The 20% exploration ratio you mentioned is key, plus features like 'Enhance' that add variety."
Summary: Concepts Applied
CONCEPTS FROM 10-WEEK COURSE IN SPOTIFY DESIGN

WEEK 1: DATA AT SCALE
├── Partitioning: User data by user_id, events by timestamp
├── Replication: Multi-region audio and user data
└── Hot keys: Popular songs cached at all CDN edges

WEEK 2: FAILURE-FIRST DESIGN
├── Circuit breakers: Protect recommendation service
├── Timeouts: 30ms budget for authorization
└── Graceful degradation: Popular content fallback

WEEK 3: MESSAGING & ASYNC
├── Pub/Sub: 500B+ events/day
└── Async processing: Recommendation updates

WEEK 4: CACHING
├── Multi-tier: Client → CDN → Origin
└── Hot content: Always cached, never evicted

WEEK 5: CONSISTENCY
├── Eventual: Recommendations, play counts
└── Strong: Playlists, user library

WEEK 8: ANALYTICS PIPELINE
├── Stream: Dataflow for real-time
└── Batch: Dataproc for ML training

WEEK 10: PRODUCTION READINESS
├── SLOs: < 200ms playback, 99.99% availability
└── Observability: Comprehensive metrics
WHY SPOTIFY IS AN ENGINEERING MARVEL

SCALE
• 713 million users across 184 countries
• 500 billion events processed daily
• < 200ms time-to-first-audio globally

PERSONALIZATION
• 33% of streams from recommendations
• Discover Weekly: 40M+ weekly users
• Three-model hybrid approach

ENGINEERING CULTURE
• Squad model influenced entire industry
• Backstage: Open-sourced developer portal
• 2,000+ microservices, autonomous teams
Sources
Official:
- Spotify Engineering Blog: https://engineering.atspotify.com/
- Backstage: https://backstage.io/
- Spotify for Developers: https://developer.spotify.com/
Statistics:
- Backlinko - Spotify Statistics 2025: https://backlinko.com/spotify-users
- DemandSage - Spotify Stats: https://www.demandsage.com/spotify-stats/
Architecture:
- Google Cloud - Spotify Case Study: https://cloud.google.com/customers/spotify
- Spotify CDN: https://engineering.atspotify.com/2020/02/how-spotify-aligned-cdn-services-for-a-lightning-fast-streaming-experience
Organization:
- Spotify Squad Model: https://blog.crisp.se/wp-content/uploads/2012/11/SpotifyScaling.pdf
Self-Assessment Checklist
After studying this case study, you should be able to:
- Design multi-tier caching for media delivery
- Explain collaborative filtering and matrix factorization
- Design hybrid recommendation systems
- Handle cold start for users and content
- Architect event pipelines at 500B+ events/day
- Implement circuit breakers and graceful degradation
- Design URL signing for CDN authentication
- Explain streaming vs batch processing trade-offs
- Design for predictable and unpredictable traffic spikes
- Articulate the squad model for team organization
- Design developer portals for microservices
This case study covers Spotify's architecture for delivering personalized audio experiences at massive scale.