Himanshu Kukreja

Bonus Problem 10: Twitter/X

The Real-Time Information Network That Delivers 500 Million Tweets Per Day


🐦 How Do You Deliver a Celebrity's Tweet to 100 Million Followers in Under 5 Seconds?

Imagine this challenge: A celebrity with 100 million followers posts a tweet. Within 5 seconds, that tweet must appear in the home timeline of every single follower. Meanwhile, 500 million other tweets are being posted that same day, each needing to reach their own audiences.

This is Twitter/X: a real-time information network where a single tweet can trigger billions of write operations. It's not just a social network. It's a masterclass in fan-out architecture, timeline construction, and solving the "celebrity problem" at scale.

THE TWITTER/X SCALE (2025)

┌──────────────────────────────────────────────────────────────────────────┐
│                                                                          │
│   USERS                                                                  │
│   ─────                                                                  │
│   Monthly Active Users:          ~600 Million                            │
│   Daily Active Users:            ~250 Million                            │
│   Tweets per day:                500+ Million                            │
│   Tweets per second (avg):       ~6,000                                  │
│   Tweets per second (peak):      ~150,000+                               │
│                                                                          │
│   TIMELINES                                                              │
│   ─────────                                                              │
│   Timeline reads (QPS):          300,000+                                │
│   Timeline writes (QPS):         ~6,000                                  │
│   Read/Write ratio:              50:1                                    │
│   Target delivery time:          < 5 seconds                             │
│                                                                          │
│   SOCIAL GRAPH                                                           │
│   ────────────                                                           │
│   Average followers:             ~700                                    │
│   Celebrity followers:           100M+ (top accounts)                    │
│   Follow relationships:          Hundreds of billions                    │
│                                                                          │
│   INFRASTRUCTURE                                                         │
│   ──────────────                                                         │
│   Timeline cache size:           800 tweets per user                     │
│   Redis clusters:                Thousands of nodes                      │
│   Cache replication:             3x per timeline                         │
│   Data centers:                  Multiple (hybrid cloud)                 │
│                                                                          │
│   TOP ACCOUNTS (Feb 2025)                                                │
│   ───────────────────────                                                │
│   @elonmusk:                     215M followers                          │
│   @barackobama:                  131M followers                          │
│   @cristiano:                    112M followers                          │
│   @justinbieber:                 111M followers                          │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

This is the system we'll design today, and along the way we'll discover how Twitter solved one of the hardest problems in distributed systems: the fan-out problem.


The Interview Begins

You walk into the interview room. The interviewer smiles and gestures to the whiteboard.

Interviewer: "Thanks for coming in. Today we're going to design a social network timeline system like Twitter. I'm particularly interested in how you'd handle the fan-out problemβ€”delivering tweets to millions of followers in real-time. Please think out loud."

They write on the whiteboard:

╔══════════════════════════════════════════════════════════════════════════╗
β•‘                                                                          β•‘
β•‘                    Design a Twitter-like Timeline                        β•‘
β•‘                                                                          β•‘
β•‘   Requirements:                                                          β•‘
β•‘   - Users can post tweets (short messages)                               β•‘
β•‘   - Users can follow other users                                         β•‘
β•‘   - Users see a home timeline with tweets from people they follow        β•‘
β•‘   - Tweets should appear in followers' timelines within 5 seconds        β•‘
β•‘   - Handle celebrities with 100M+ followers                              β•‘
β•‘   - Support 500M+ tweets per day                                         β•‘
β•‘                                                                          β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

Interviewer: "Take a few minutes to think about this, then walk me through your approach. We have about 45 minutes."


Phase 1: Requirements Clarification (5 minutes)

Before diving in, you take a breath and start asking questions.

Your Questions

You: "Before I start designing, let me clarify some requirements. What's the expected scale in terms of users and tweets?"

Interviewer: "Think 500-600 million monthly active users, about 250 million daily active users, and 500 million tweets per day."

You: "For the timeline, how many tweets should we show and how fresh should they be?"

Interviewer: "The home timeline should show the most recent tweets from people you follow. Target delivery under 5 seconds for most users."

You: "What about the follower distribution? How do we handle users with millions of followers?"

Interviewer: "Great question. The top accounts have 100+ million followers. A tweet from them needs special handlingβ€”you can't fan out to 100 million timelines synchronously."

You: "Should we support real-time updates or is pull-based refresh acceptable?"

Interviewer: "Pull-based is fine for the home timeline, but we should discuss the trade-offs."

You: "Perfect. Let me summarize the requirements."

Functional Requirements

1. TWEET OPERATIONS
   - Create tweet (280 characters, media optional)
   - Delete tweet
   - Retweet / Quote tweet
   - Like tweet
   - Reply to tweet

2. FOLLOW OPERATIONS
   - Follow a user
   - Unfollow a user
   - View followers list
   - View following list

3. TIMELINE OPERATIONS
   - View home timeline (tweets from followed users)
   - View user timeline (tweets by specific user)
   - Refresh timeline to see new tweets

4. SEARCH & DISCOVERY
   - Search tweets by keyword/hashtag
   - Trending topics
   - User search

Non-Functional Requirements

1. LATENCY
   - Tweet delivery to timelines: < 5 seconds (p99)
   - Timeline read: < 200ms (p99)
   - Tweet post confirmation: < 500ms

2. AVAILABILITY
   - 99.99% for reads (timeline viewing)
   - 99.9% for writes (posting tweets)
   - Graceful degradation during peaks

3. SCALE
   - 500M tweets/day = ~6,000 tweets/second
   - 300K timeline reads/second
   - Peak: 10-20x during major events

4. CONSISTENCY
   - Eventual consistency acceptable for timelines
   - Strong consistency for tweet storage
   - Read-your-own-writes for tweet authors

Phase 2: Back of the Envelope Estimation (5 minutes)

You: "Let me work through the numbers to understand the scale."

Traffic Estimation

TWEET WRITE TRAFFIC

Tweets per day:                   500 million
Tweets per second (avg):          500M / 86,400 ≈ 6,000 TPS
Peak multiplier:                  10x (major events)
Peak tweets per second:           60,000 TPS

TIMELINE READ TRAFFIC

Daily active users:               250 million
Timeline refreshes per user:      ~20/day
Timeline reads per day:           5 billion
Timeline reads per second:        5B / 86,400 ≈ 58,000 QPS
Peak (with buffer):               300,000 QPS

Fan-Out Estimation

FAN-OUT CALCULATIONS

Average followers per user:       700
Tweets per second:                6,000
Fan-out writes per second:        6,000 × 700 = 4.2 million

Celebrity problem:
  - Celebrity with 100M followers tweets
  - Fan-out to 100M timelines
  - At 100K writes/second = 1,000 seconds (16 minutes!)
  - THIS IS TOO SLOW → Need hybrid approach

Storage Estimation

TWEET STORAGE

Tweet size (avg):                 ~1 KB (text, metadata, indices)
Tweets per day:                   500 million
Daily storage:                    500 GB/day
Yearly storage:                   ~180 TB/year

TIMELINE CACHE

Active users:                     250 million
Timeline entries per user:        800 (tweet IDs only)
Entry size:                       ~20 bytes (tweet_id + metadata)
Per-user timeline:                16 KB
Total timeline cache:             250M × 16 KB = 4 TB
With 3x replication:              12 TB (fits in Redis cluster)

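You can sanity-check these figures with a few lines of Python. The snippet below simply replays this section's arithmetic (the 4.2M fan-out figure uses the rounded 6,000 TPS):

# estimation.py - back-of-the-envelope sanity check

SECONDS_PER_DAY = 86_400

tweets_per_day = 500_000_000
avg_tps = tweets_per_day / SECONDS_PER_DAY            # ~5,787 -> round to ~6,000

dau = 250_000_000
read_qps = dau * 20 / SECONDS_PER_DAY                 # 20 refreshes/user -> ~58K

fanout_per_sec = 6_000 * 700                          # 4.2M timeline writes/sec

celebrity_fanout_s = 100_000_000 / 100_000            # 1,000 s ≈ 16.7 minutes

cache_bytes = dau * 800 * 20                          # 800 entries × 20 B each

print(f"avg tweet TPS:      {avg_tps:,.0f}")
print(f"timeline read QPS:  {read_qps:,.0f}")
print(f"fan-out writes/sec: {fanout_per_sec:,}")
print(f"celebrity fan-out:  {celebrity_fanout_s:,.0f} s")
print(f"timeline cache:     {cache_bytes / 1e12:.0f} TB (before 3x replication)")
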
Key Metrics Summary

┌──────────────────────────────────────────────────────────────────────────┐
│                         ESTIMATION SUMMARY                               │
│                                                                          │
│   TRAFFIC                                                                │
│   ├── Tweet writes:              6,000/second (60K peak)                 │
│   ├── Timeline reads:            300,000/second                          │
│   └── Fan-out writes:            4.2 million/second                      │
│                                                                          │
│   STORAGE                                                                │
│   ├── Tweet storage:             500 GB/day                              │
│   ├── Timeline cache:            12 TB (Redis)                           │
│   └── Social graph:              Hundreds of GB                          │
│                                                                          │
│   KEY INSIGHT                                                            │
│   ───────────                                                            │
│   Read/write ratio is 50:1                                               │
│   → Optimize for reads (pre-compute timelines)                           │
│   → Accept higher write latency                                          │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Phase 3: High-Level Design (10 minutes)

You: "Now let me sketch out the high-level architecture."

System Architecture

┌──────────────────────────────────────────────────────────────────────────┐
│                     TWITTER HIGH-LEVEL ARCHITECTURE                      │
│                                                                          │
│   ┌──────────────────────────────────────────────────────────────────┐   │
│   │                         CLIENTS                                  │   │
│   │    ┌──────────┐    ┌──────────┐    ┌──────────┐                  │   │
│   │    │   Web    │    │   iOS    │    │ Android  │                  │   │
│   │    └────┬─────┘    └────┬─────┘    └────┬─────┘                  │   │
│   └─────────┼───────────────┼───────────────┼───────────────────────┘   │
│             │               │               │                            │
│             └───────────────┼───────────────┘                            │
│                             ▼                                            │
│   ┌──────────────────────────────────────────────────────────────────┐   │
│   │                      API GATEWAY / LB                            │   │
│   │              Rate Limiting │ Auth │ Routing                      │   │
│   └────────────────────────────┬─────────────────────────────────────┘   │
│                                │                                         │
│   ┌────────────────────────────┼─────────────────────────────────────┐   │
│   │                            ▼                                     │   │
│   │   ┌────────────────────────────────────────────────────────────┐ │   │
│   │   │                    WRITE PATH                              │ │   │
│   │   │  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐     │ │   │
│   │   │  │   Tweet     │    │   Fan-Out   │    │   Social    │     │ │   │
│   │   │  │   Service   │───▶│   Service   │◀───│   Graph     │     │ │   │
│   │   │  └──────┬──────┘    └──────┬──────┘    └─────────────┘     │ │   │
│   │   │         │                  │                               │ │   │
│   │   │         ▼                  ▼                               │ │   │
│   │   │  ┌─────────────┐    ┌─────────────┐                        │ │   │
│   │   │  │Tweet Storage│    │  Timeline   │                        │ │   │
│   │   │  │ (Manhattan) │    │   Cache     │                        │ │   │
│   │   │  └─────────────┘    │  (Redis)    │                        │ │   │
│   │   │                     └─────────────┘                        │ │   │
│   │   └────────────────────────────────────────────────────────────┘ │   │
│   │                                                                  │   │
│   │   ┌────────────────────────────────────────────────────────────┐ │   │
│   │   │                    READ PATH                               │ │   │
│   │   │  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐     │ │   │
│   │   │  │  Timeline   │◀───│  Timeline   │◀───│   Tweet     │     │ │   │
│   │   │  │   Service   │    │    Cache    │    │   Service   │     │ │   │
│   │   │  └──────┬──────┘    │   (Redis)   │    └─────────────┘     │ │   │
│   │   │         │           └─────────────┘                        │ │   │
│   │   │         ▼                                                  │ │   │
│   │   │  ┌─────────────────────────────────────────────────────┐   │ │   │
│   │   │  │              CELEBRITY MERGE                        │   │ │   │
│   │   │  │    (Pull celebrity tweets at read time)             │   │ │   │
│   │   │  └─────────────────────────────────────────────────────┘   │ │   │
│   │   └────────────────────────────────────────────────────────────┘ │   │
│   │                                                                  │   │
│   │   ┌────────────────────────────────────────────────────────────┐ │   │
│   │   │                    SEARCH & TRENDS                         │ │   │
│   │   │  ┌─────────────┐    ┌─────────────┐    ┌──────────────┐    │ │   │
│   │   │  │   Search    │    │  Trending   │    │Elasticsearch │    │ │   │
│   │   │  │   Service   │    │   Service   │    │   Cluster    │    │ │   │
│   │   │  └─────────────┘    └─────────────┘    └──────────────┘    │ │   │
│   │   └────────────────────────────────────────────────────────────┘ │   │
│   │                                                                  │   │
│   │                       TWITTER SERVICES                           │   │
│   └──────────────────────────────────────────────────────────────────┘   │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

The Two Timeline Approaches

┌─────────────────────────────────────────────────────────────────────────┐
│                 FAN-OUT STRATEGIES COMPARED                             │
│                                                                         │
│   APPROACH 1: FAN-OUT ON WRITE (Push)                                   │
│   ───────────────────────────────────                                   │
│                                                                         │
│   User A posts tweet                                                    │
│        │                                                                │
│        ▼                                                                │
│   ┌─────────┐     ┌─────────────────────────────────────┐               │
│   │ Fan-Out │────▶│ Write to all followers' timelines   │               │
│   │ Service │     │                                     │               │
│   └─────────┘     │  Follower 1 timeline ← tweet_id     │               │
│                   │  Follower 2 timeline ← tweet_id     │               │
│                   │  Follower 3 timeline ← tweet_id     │               │
│                   │  ...                                │               │
│                   │  Follower N timeline ← tweet_id     │               │
│                   └─────────────────────────────────────┘               │
│                                                                         │
│   ✓ Fast reads (timeline already assembled)                             │
│   ✗ Slow writes for users with many followers                           │
│   ✗ Celebrity problem: 100M writes per tweet!                           │
│                                                                         │
│   ──────────────────────────────────────────────────────────────────   │
│                                                                         │
│   APPROACH 2: FAN-OUT ON READ (Pull)                                    │
│   ──────────────────────────────────                                    │
│                                                                         │
│   User B requests timeline                                              │
│        │                                                                │
│        ▼                                                                │
│   ┌──────────┐     ┌─────────────────────────────────────┐              │
│   │ Timeline │────▶│ Look up all users B follows         │              │
│   │ Service  │     │ Fetch recent tweets from each       │              │
│   └──────────┘     │ Merge and sort by timestamp         │              │
│                    │ Return top N tweets                 │              │
│                    └─────────────────────────────────────┘              │
│                                                                         │
│   ✓ Fast writes (just store the tweet)                                  │
│   ✗ Slow reads (must query many users)                                  │
│   ✗ Doesn't scale for users following many accounts                     │
│                                                                         │
│   ──────────────────────────────────────────────────────────────────   │
│                                                                         │
│   APPROACH 3: HYBRID (Twitter's actual approach)                        │
│   ──────────────────────────────────────────────                        │
│                                                                         │
│   Regular users (< 10K followers): Fan-out on write                     │
│   Celebrities (> 10K followers):   Fan-out on read                      │
│                                                                         │
│   At read time:                                                         │
│   1. Fetch pre-computed timeline from cache                             │
│   2. Fetch celebrity tweets separately                                  │
│   3. Merge and sort                                                     │
│   4. Return to user                                                     │
│                                                                         │
│   ✓ Best of both worlds                                                 │
│   ✓ Fast reads for most users                                           │
│   ✓ Handles celebrity problem                                           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
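
The push/pull trade-off becomes concrete with a tiny cost model. The sketch below assumes a single pipeline sustaining 100K timeline writes per second (an illustrative figure, not a measured one); in practice the threshold sits far below what the 5-second target alone implies, because thousands of tweets fan out concurrently.

# fanout_cost_model.py - illustrative, with an assumed write throughput

TIMELINE_WRITES_PER_SEC = 100_000   # assumption for illustration
DELIVERY_TARGET_SECONDS = 5

def push_fanout_seconds(follower_count: int) -> float:
    """Seconds to push one tweet into every follower's timeline."""
    return follower_count / TIMELINE_WRITES_PER_SEC

for followers in (700, 10_000, 1_000_000, 100_000_000):
    t = push_fanout_seconds(followers)
    verdict = "push" if t <= DELIVERY_TARGET_SECONDS else "pull (celebrity)"
    print(f"{followers:>11,} followers -> {t:>8.1f}s -> {verdict}")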

Phase 4: Deep Dives (20 minutes)

Interviewer: "Let's dive deeper. Tell me about the fan-out service."


Deep Dive 1: Fan-Out Service (Week 3 Concepts)

The Problem

FAN-OUT CHALLENGE

When a user with 1,000 followers tweets:
  - Write tweet to storage
  - Insert tweet_id into 1,000 follower timelines
  - Must complete within 5 seconds

When a celebrity with 100M followers tweets:
  - Same tweet storage
  - Insert into 100M timelines = IMPOSSIBLE in 5 seconds
  - At 100K inserts/second = 16 minutes!

Solution: Don't fan out to celebrity followers at all.
Instead, merge celebrity tweets at read time.

Implementation

# timeline/fanout_service.py

"""
Fan-Out Service for Twitter Timeline

Distributes tweets to follower timelines using a hybrid
push/pull approach based on follower count.
"""

from dataclasses import dataclass
from typing import List, Set
from enum import Enum
import asyncio


class FanoutStrategy(Enum):
    PUSH = "push"      # Fan-out on write
    PULL = "pull"      # Fan-out on read (celebrities)


@dataclass
class FanoutConfig:
    celebrity_threshold: int = 10_000  # Followers above this = celebrity
    batch_size: int = 1000             # Followers per batch
    max_concurrent_batches: int = 100  # Parallel batch processing
    timeline_max_size: int = 800       # Max tweets in timeline cache


@dataclass
class Tweet:
    tweet_id: str
    author_id: str
    content: str
    timestamp: int
    is_celebrity: bool = False


class FanoutService:
    """
    Hybrid fan-out service using push for regular users,
    pull for celebrities.
    
    Applies concepts:
    - Week 3: Async message processing
    - Week 4: Cache-aside pattern for timelines
    - Week 1: Partitioning by user_id
    """
    
    def __init__(
        self,
        config: FanoutConfig,
        social_graph,
        timeline_cache,
        tweet_storage,
        message_queue
    ):
        self.config = config
        self.social_graph = social_graph
        self.timeline_cache = timeline_cache
        self.tweet_storage = tweet_storage
        self.queue = message_queue
    
    async def process_tweet(self, tweet: Tweet) -> None:
        """
        Process a new tweet for fan-out.
        
        1. Store the tweet
        2. Determine fan-out strategy
        3. If push: queue fan-out jobs
        4. If pull: add the tweet to the author's celebrity index (merged at read time)
        """
        # Store tweet first (source of truth)
        await self.tweet_storage.store(tweet)
        
        # Get follower count to determine strategy
        follower_count = await self.social_graph.get_follower_count(
            tweet.author_id
        )
        
        strategy = self._determine_strategy(follower_count)
        
        if strategy == FanoutStrategy.PUSH:
            # Regular user: fan out to all followers
            await self._push_fanout(tweet)
        else:
            # Celebrity: don't fan out, mark for pull
            await self._mark_celebrity_tweet(tweet)
    
    def _determine_strategy(self, follower_count: int) -> FanoutStrategy:
        """Determine push vs pull based on follower count."""
        if follower_count > self.config.celebrity_threshold:
            return FanoutStrategy.PULL
        return FanoutStrategy.PUSH
    
    async def _push_fanout(self, tweet: Tweet) -> None:
        """
        Fan out tweet to all follower timelines.
        
        Uses batching and parallelism for efficiency.
        """
        # Get all followers
        followers = await self.social_graph.get_followers(
            tweet.author_id
        )
        
        # Process in batches for efficiency
        batches = self._create_batches(
            followers, 
            self.config.batch_size
        )
        
        # Queue batch jobs (processed by workers)
        for batch in batches:
            await self.queue.enqueue({
                "type": "fanout_batch",
                "tweet_id": tweet.tweet_id,
                "author_id": tweet.author_id,
                "timestamp": tweet.timestamp,
                "follower_ids": batch
            })
    
    async def process_fanout_batch(
        self, 
        tweet_id: str,
        author_id: str,
        timestamp: int,
        follower_ids: List[str]
    ) -> None:
        """
        Process a batch of follower timeline updates.
        
        Called by queue workers.
        """
        # Create timeline entry (minimal data)
        entry = {
            "tweet_id": tweet_id,
            "author_id": author_id,
            "timestamp": timestamp
        }
        
        # Update each follower's timeline cache
        tasks = [
            self._update_timeline(follower_id, entry)
            for follower_id in follower_ids
        ]
        
        await asyncio.gather(*tasks, return_exceptions=True)
    
    async def _update_timeline(
        self, 
        user_id: str, 
        entry: dict
    ) -> None:
        """
        Add tweet to user's cached timeline.
        
        Uses Redis list with LPUSH + LTRIM pattern.
        """
        timeline_key = f"timeline:{user_id}"
        
        # Push to front of list
        await self.timeline_cache.lpush(timeline_key, entry)
        
        # Trim to max size (keep only recent 800)
        await self.timeline_cache.ltrim(
            timeline_key, 
            0, 
            self.config.timeline_max_size - 1
        )
    
    async def _mark_celebrity_tweet(self, tweet: Tweet) -> None:
        """
        Mark tweet for celebrity pull-based delivery.
        
        Store in celebrity tweets index for merge at read time.
        """
        await self.timeline_cache.zadd(
            f"celebrity_tweets:{tweet.author_id}",
            {tweet.tweet_id: tweet.timestamp}
        )
        
        # Trim old celebrity tweets (keep last 100)
        await self.timeline_cache.zremrangebyrank(
            f"celebrity_tweets:{tweet.author_id}",
            0,
            -101
        )
    
    def _create_batches(
        self, 
        items: List[str], 
        batch_size: int
    ) -> List[List[str]]:
        """Split list into batches."""
        return [
            items[i:i + batch_size]
            for i in range(0, len(items), batch_size)
        ]

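The batches queued by _push_fanout are drained by a pool of stateless workers. A minimal worker loop might look like the sketch below; the queue's dequeue/enqueue interface is assumed (mirroring the message_queue injected above), and retries lean on the read path deduplicating by tweet_id.

# timeline/fanout_worker.py (a sketch, not Twitter's actual worker code)

async def run_fanout_worker(queue, fanout_service) -> None:
    """Drain fan-out batch jobs and apply them to follower timelines."""
    while True:
        job = await queue.dequeue()  # assumed to block until a job arrives

        if job.get("type") != "fanout_batch":
            continue

        try:
            await fanout_service.process_fanout_batch(
                tweet_id=job["tweet_id"],
                author_id=job["author_id"],
                timestamp=job["timestamp"],
                follower_ids=job["follower_ids"],
            )
        except Exception:
            # Re-queue on failure. At-least-once delivery is safe here:
            # a duplicate LPUSH is filtered out by the read path's
            # deduplication on tweet_id.
            await queue.enqueue(job)
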
Deep Dive 2: Timeline Service with Celebrity Merge (Week 4 Concepts)

Interviewer: "How does the timeline read work with the celebrity merge?"

The Solution

┌─────────────────────────────────────────────────────────────────────────┐
│                    TIMELINE READ WITH CELEBRITY MERGE                   │
│                                                                         │
│   User requests home timeline                                           │
│        │                                                                │
│        ▼                                                                │
│   ┌──────────────────────────────────────────────────────────────────┐  │
│   │ STEP 1: Fetch pre-computed timeline from cache                   │  │
│   │         (Contains tweets from regular users they follow)         │  │
│   │                                                                  │  │
│   │   Redis: LRANGE timeline:user123 0 799                           │  │
│   │   Result: [tweet_501, tweet_499, tweet_495, ...]                 │  │
│   └──────────────────────────────────────────────────────────────────┘  │
│        │                                                                │
│        ▼                                                                │
│   ┌──────────────────────────────────────────────────────────────────┐  │
│   │ STEP 2: Get list of celebrities user follows                     │  │
│   │                                                                  │  │
│   │   Graph query: Get followed users where follower_count > 10K     │  │
│   │   Result: [elonmusk, taylorswift, cristiano]                     │  │
│   └──────────────────────────────────────────────────────────────────┘  │
│        │                                                                │
│        ▼                                                                │
│   ┌──────────────────────────────────────────────────────────────────┐  │
│   │ STEP 3: Fetch recent tweets from each celebrity                  │  │
│   │                                                                  │  │
│   │   Redis: ZREVRANGE celebrity_tweets:elonmusk 0 19                │  │
│   │   Redis: ZREVRANGE celebrity_tweets:taylorswift 0 19             │  │
│   │   Result: [tweet_503, tweet_500, tweet_498, ...]                 │  │
│   └──────────────────────────────────────────────────────────────────┘  │
│        │                                                                │
│        ▼                                                                │
│   ┌──────────────────────────────────────────────────────────────────┐  │
│   │ STEP 4: Merge and sort by timestamp                              │  │
│   │                                                                  │  │
│   │   Merge: [cached_timeline] + [celebrity_tweets]                  │  │
│   │   Sort: By timestamp descending                                  │  │
│   │   Result: [tweet_503, tweet_501, tweet_500, tweet_499, ...]      │  │
│   └──────────────────────────────────────────────────────────────────┘  │
│        │                                                                │
│        ▼                                                                │
│   ┌──────────────────────────────────────────────────────────────────┐  │
│   │ STEP 5: Hydrate tweets (fetch full content)                      │  │
│   │                                                                  │  │
│   │   MGET tweet:503 tweet:501 tweet:500 tweet:499 ...               │  │
│   │   Return full tweet objects to client                            │  │
│   └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Implementation

# timeline/timeline_service.py

"""
Timeline Service with Celebrity Merge

Constructs user's home timeline by merging pre-computed
timeline with celebrity tweets fetched at read time.
"""

from dataclasses import dataclass
from typing import List, Optional
import asyncio
import heapq


@dataclass
class TimelineEntry:
    tweet_id: str
    author_id: str
    timestamp: int


@dataclass
class FullTweet:
    tweet_id: str
    author_id: str
    author_name: str
    content: str
    timestamp: int
    likes: int
    retweets: int
    replies: int


class TimelineService:
    """
    Timeline service with hybrid push/pull architecture.
    
    Applies concepts:
    - Week 4: Cache-aside for timeline reads
    - Week 1: Read optimization through pre-computation
    - Week 3: Async parallel fetching
    """
    
    CELEBRITY_THRESHOLD = 10_000
    DEFAULT_TIMELINE_SIZE = 50
    MAX_CELEBRITIES_TO_FETCH = 100
    
    def __init__(
        self,
        timeline_cache,
        tweet_storage,
        social_graph,
        user_service
    ):
        self.timeline_cache = timeline_cache
        self.tweet_storage = tweet_storage
        self.social_graph = social_graph
        self.user_service = user_service
    
    async def get_home_timeline(
        self,
        user_id: str,
        count: int = 50,
        max_id: Optional[str] = None
    ) -> List[FullTweet]:
        """
        Get user's home timeline with celebrity merge.
        
        Merges pre-computed timeline (push) with
        celebrity tweets (pull) at read time.
        """
        # Step 1: Get pre-computed timeline from cache
        cached_timeline = await self._get_cached_timeline(
            user_id, count * 2  # Fetch extra for merge buffer
        )
        
        # Step 2: Get celebrities this user follows
        celebrities = await self._get_followed_celebrities(user_id)
        
        # Step 3: Fetch recent tweets from celebrities
        celebrity_tweets = await self._fetch_celebrity_tweets(
            celebrities
        )
        
        # Step 4: Merge and sort
        merged = self._merge_timelines(
            cached_timeline, 
            celebrity_tweets
        )
        
        # Step 5: Apply pagination. This assumes time-sortable IDs
        # (e.g. Snowflake), so ID comparison matches chronological order.
        if max_id:
            merged = [t for t in merged if t.tweet_id < max_id]
        
        # Step 6: Hydrate tweets (get full content)
        tweet_ids = [t.tweet_id for t in merged[:count]]
        full_tweets = await self._hydrate_tweets(tweet_ids)
        
        return full_tweets
    
    async def _get_cached_timeline(
        self,
        user_id: str,
        count: int
    ) -> List[TimelineEntry]:
        """Fetch pre-computed timeline from Redis."""
        timeline_key = f"timeline:{user_id}"
        
        # Check if timeline exists
        exists = await self.timeline_cache.exists(timeline_key)
        
        if not exists:
            # Rebuild timeline for inactive user
            await self._rebuild_timeline(user_id)
        
        # Fetch timeline entries
        entries = await self.timeline_cache.lrange(
            timeline_key, 0, count - 1
        )
        
        return [
            TimelineEntry(**entry) 
            for entry in entries
        ]
    
    async def _get_followed_celebrities(
        self,
        user_id: str
    ) -> List[str]:
        """Get list of celebrities the user follows."""
        # Get all users this person follows
        following = await self.social_graph.get_following(user_id)
        
        # Filter to celebrities (high follower count)
        celebrities = []
        for followed_id in following:
            follower_count = await self.social_graph.get_follower_count(
                followed_id
            )
            if follower_count > self.CELEBRITY_THRESHOLD:
                celebrities.append(followed_id)
                if len(celebrities) >= self.MAX_CELEBRITIES_TO_FETCH:
                    break
        
        return celebrities
    
    async def _fetch_celebrity_tweets(
        self,
        celebrity_ids: List[str]
    ) -> List[TimelineEntry]:
        """Fetch recent tweets from celebrities in parallel."""
        if not celebrity_ids:
            return []
        
        # Parallel fetch from all celebrities
        tasks = [
            self._get_celebrity_recent_tweets(celeb_id)
            for celeb_id in celebrity_ids
        ]
        
        results = await asyncio.gather(*tasks)
        
        # Flatten results
        all_tweets = []
        for tweets in results:
            all_tweets.extend(tweets)
        
        return all_tweets
    
    async def _get_celebrity_recent_tweets(
        self,
        celebrity_id: str,
        count: int = 20
    ) -> List[TimelineEntry]:
        """Get recent tweets from a celebrity."""
        key = f"celebrity_tweets:{celebrity_id}"
        
        # Sorted set: score = timestamp
        tweet_data = await self.timeline_cache.zrevrange(
            key, 0, count - 1, withscores=True
        )
        
        return [
            TimelineEntry(
                tweet_id=tweet_id,
                author_id=celebrity_id,
                timestamp=int(score)
            )
            for tweet_id, score in tweet_data
        ]
    
    def _merge_timelines(
        self,
        cached: List[TimelineEntry],
        celebrity: List[TimelineEntry]
    ) -> List[TimelineEntry]:
        """
        Merge two sorted timelines by timestamp.
        
        Uses heap merge for O(n log k) efficiency.
        """
        # Convert to tuples for heap: (-timestamp, entry)
        # Negative because heapq is min-heap
        all_entries = [
            (-e.timestamp, i, e) 
            for i, e in enumerate(cached + celebrity)
        ]
        
        heapq.heapify(all_entries)
        
        merged = []
        seen_ids = set()
        
        while all_entries and len(merged) < 200:
            _, _, entry = heapq.heappop(all_entries)
            
            # Deduplicate (same tweet might appear twice)
            if entry.tweet_id not in seen_ids:
                seen_ids.add(entry.tweet_id)
                merged.append(entry)
        
        return merged
    
    async def _hydrate_tweets(
        self,
        tweet_ids: List[str]
    ) -> List[FullTweet]:
        """Fetch full tweet content from storage."""
        if not tweet_ids:
            return []
        
        # Batch fetch from tweet storage
        tweets = await self.tweet_storage.mget(tweet_ids)
        
        # Get author info
        author_ids = list(set(t.author_id for t in tweets if t))
        authors = await self.user_service.mget(author_ids)
        author_map = {a.user_id: a for a in authors}
        
        result = []
        for t in tweets:
            if not t:
                continue
            author = author_map.get(t.author_id)
            result.append(FullTweet(
                tweet_id=t.tweet_id,
                author_id=t.author_id,
                author_name=author.name if author else "Unknown",
                content=t.content,
                timestamp=t.timestamp,
                likes=t.likes,
                retweets=t.retweets,
                replies=t.replies
            ))
        return result
    
    async def _rebuild_timeline(self, user_id: str) -> None:
        """
        Rebuild timeline for inactive user.
        
        Called when user returns after 30+ days of inactivity
        and their timeline has been evicted from cache.
        """
        # Get users they follow
        following = await self.social_graph.get_following(user_id)
        
        # Fetch recent tweets from each (excluding celebrities)
        all_tweets = []
        for followed_id in following:
            follower_count = await self.social_graph.get_follower_count(
                followed_id
            )
            if follower_count <= self.CELEBRITY_THRESHOLD:
                tweets = await self.tweet_storage.get_user_tweets(
                    followed_id, limit=50
                )
                all_tweets.extend(tweets)
        
        # Sort by timestamp
        all_tweets.sort(key=lambda t: t.timestamp, reverse=True)
        
        # Store in cache
        timeline_key = f"timeline:{user_id}"
        for tweet in all_tweets[:800]:
            await self.timeline_cache.rpush(timeline_key, {
                "tweet_id": tweet.tweet_id,
                "author_id": tweet.author_id,
                "timestamp": tweet.timestamp
            })
        
        # Set TTL (30 days)
        await self.timeline_cache.expire(timeline_key, 30 * 86400)

Deep Dive 3: Social Graph Service (Week 1 Concepts)

Interviewer: "How do you store and query the social graph efficiently?"

The Problem

SOCIAL GRAPH CHALLENGES

Scale:
  - 600M users
  - Average 700 followers per user
  - Total edges: 420 billion relationships

Queries needed:
  - Get all followers of user X (for fan-out)
  - Get all users that X follows (for timeline)
  - Check if A follows B
  - Get follower count

Naive approach (SQL):
  - followers table: (follower_id, followee_id)
  - "Get followers of @elonmusk": Returns 200M rows!
  - Not practical

Implementation

# graph/social_graph_service.py

"""
Social Graph Service

Stores and queries follow relationships using a
hybrid approach: Redis for hot data, distributed
storage for full graph.
"""

from dataclasses import dataclass
from typing import List, Optional
import time


@dataclass
class FollowRelation:
    follower_id: str
    followee_id: str
    timestamp: int


class SocialGraphService:
    """
    Social graph with optimized follower/following queries.
    
    Uses Redis sets for fast membership checks and
    counts, with Manhattan/Cassandra for full graph storage.
    
    Applies concepts:
    - Week 1: Partitioning by user_id
    - Week 4: Caching hot data (follower counts)
    - Week 5: Eventual consistency for graph updates
    """
    
    def __init__(self, redis_client, graph_storage):
        self.redis = redis_client
        self.storage = graph_storage
    
    async def follow(
        self, 
        follower_id: str, 
        followee_id: str
    ) -> bool:
        """
        Create a follow relationship.
        
        Updates both directions for bidirectional queries.
        """
        # Check if already following
        if await self.is_following(follower_id, followee_id):
            return False
        
        timestamp = int(time.time())
        
        # Store in persistent storage
        await self.storage.put(
            FollowRelation(follower_id, followee_id, timestamp)
        )
        
        # Update Redis caches
        # 1. Add to follower's "following" set
        await self.redis.sadd(
            f"following:{follower_id}", 
            followee_id
        )
        
        # 2. Add to followee's "followers" set
        await self.redis.sadd(
            f"followers:{followee_id}", 
            follower_id
        )
        
        # 3. Increment follower count
        await self.redis.incr(f"follower_count:{followee_id}")
        await self.redis.incr(f"following_count:{follower_id}")
        
        return True
    
    async def unfollow(
        self, 
        follower_id: str, 
        followee_id: str
    ) -> bool:
        """Remove a follow relationship."""
        if not await self.is_following(follower_id, followee_id):
            return False
        
        # Remove from persistent storage
        await self.storage.delete(follower_id, followee_id)
        
        # Update Redis caches
        await self.redis.srem(f"following:{follower_id}", followee_id)
        await self.redis.srem(f"followers:{followee_id}", follower_id)
        await self.redis.decr(f"follower_count:{followee_id}")
        await self.redis.decr(f"following_count:{follower_id}")
        
        return True
    
    async def is_following(
        self, 
        follower_id: str, 
        followee_id: str
    ) -> bool:
        """Check if follower follows followee. O(1) operation."""
        return await self.redis.sismember(
            f"following:{follower_id}", 
            followee_id
        )
    
    async def get_followers(
        self, 
        user_id: str,
        cursor: Optional[str] = None,
        count: int = 1000
    ) -> List[str]:
        """
        Get followers of a user.
        
        For users with many followers, uses cursor-based
        pagination to avoid loading millions of IDs.
        """
        # For small follower sets, get all from Redis
        follower_count = await self.get_follower_count(user_id)
        
        if follower_count < 10_000:
            return list(await self.redis.smembers(
                f"followers:{user_id}"
            ))
        
        # For large sets, use cursor-based scan
        return await self._scan_followers(
            user_id, cursor, count
        )
    
    async def get_following(
        self, 
        user_id: str
    ) -> List[str]:
        """Get all users that user_id follows."""
        return list(await self.redis.smembers(
            f"following:{user_id}"
        ))
    
    async def get_follower_count(self, user_id: str) -> int:
        """Get follower count. O(1) operation."""
        count = await self.redis.get(f"follower_count:{user_id}")
        return int(count) if count else 0
    
    async def get_following_count(self, user_id: str) -> int:
        """Get following count. O(1) operation."""
        count = await self.redis.get(f"following_count:{user_id}")
        return int(count) if count else 0
    
    async def _scan_followers(
        self,
        user_id: str,
        cursor: Optional[str],
        count: int
    ) -> List[str]:
        """
        Cursor-based scan for large follower sets.
        
        Returns one page of follower IDs; callers resume from the
        cursor of their previous SSCAN call to fetch the next page.
        """
        start_cursor = int(cursor) if cursor else 0
        
        _next_cursor, followers = await self.redis.sscan(
            f"followers:{user_id}",
            cursor=start_cursor,
            count=count
        )
        
        return list(followers)
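
A quick usage sketch of the graph API (redis_client and graph_storage are assumed in-memory stubs):

# Usage sketch with assumed stub clients

async def demo(graph: SocialGraphService) -> None:
    await graph.follow("alice", "bob")

    assert await graph.is_following("alice", "bob")
    assert await graph.get_follower_count("bob") == 1

    # The fan-out service calls this: small accounts return the whole
    # set from Redis; accounts over 10K followers page through SSCAN.
    followers = await graph.get_followers("bob")
    assert followers == ["alice"]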

Interviewer: "How do you compute trending topics in real-time?"

Implementation

# trending/trending_service.py

"""
Trending Topics Service

Computes trending hashtags and topics using
sliding window counts with decay.
"""

from dataclasses import dataclass
from typing import Dict, List, Optional
from collections import defaultdict
import time
import math


@dataclass
class TrendingTopic:
    topic: str
    tweet_count: int
    velocity: float  # Rate of growth
    score: float


class TrendingService:
    """
    Real-time trending topics computation.
    
    Uses time-bucketed counts with exponential decay
    to identify rapidly growing topics.
    
    Applies concepts:
    - Week 8: Stream processing
    - Week 4: Time-windowed aggregation
    - Week 3: Async event processing
    """
    
    BUCKET_SIZE_SECONDS = 60      # 1-minute buckets
    WINDOW_SIZE_BUCKETS = 60      # Look at last 60 minutes
    DECAY_FACTOR = 0.95           # Recent tweets count more
    MIN_TWEETS_THRESHOLD = 100    # Minimum to be trending
    
    def __init__(self, redis_client, location_service):
        self.redis = redis_client
        self.location_service = location_service
    
    async def record_hashtag(
        self, 
        hashtag: str, 
        tweet_id: str,
        location: Optional[str] = None
    ) -> None:
        """
        Record a hashtag occurrence from a tweet.
        
        Updates both global and location-specific counts.
        """
        current_bucket = self._get_current_bucket()
        
        # Increment global count for this bucket
        await self.redis.hincrby(
            f"trending:global:{current_bucket}",
            hashtag.lower(),
            1
        )
        
        # Increment location-specific count (with the same cleanup TTL)
        if location:
            location_key = f"trending:{location}:{current_bucket}"
            await self.redis.hincrby(location_key, hashtag.lower(), 1)
            await self.redis.expire(location_key, 3600 * 2)
        
        # Set expiry on the global bucket (auto-cleanup)
        await self.redis.expire(
            f"trending:global:{current_bucket}",
            3600 * 2  # 2 hours
        )
    
    async def get_trending(
        self, 
        location: Optional[str] = None,
        count: int = 10
    ) -> List[TrendingTopic]:
        """
        Get current trending topics.
        
        Combines raw counts with velocity (growth rate)
        to surface rapidly emerging topics.
        """
        # Get counts from recent buckets
        topic_counts = await self._aggregate_buckets(location)
        
        # Calculate velocity and score
        scored_topics = []
        for topic, counts in topic_counts.items():
            if sum(counts) < self.MIN_TWEETS_THRESHOLD:
                continue
            
            total_count = self._weighted_count(counts)
            velocity = self._calculate_velocity(counts)
            score = self._calculate_score(total_count, velocity)
            
            scored_topics.append(TrendingTopic(
                topic=topic,
                tweet_count=sum(counts),
                velocity=velocity,
                score=score
            ))
        
        # Sort by score and return top N
        scored_topics.sort(key=lambda t: t.score, reverse=True)
        return scored_topics[:count]
    
    async def _aggregate_buckets(
        self, 
        location: Optional[str]
    ) -> Dict[str, List[int]]:
        """
        Aggregate counts from recent time buckets.
        
        Returns, per topic, a list of counts where index i holds the
        count from i buckets (minutes) ago; index 0 is most recent.
        """
        current_bucket = self._get_current_bucket()
        prefix = f"trending:{location or 'global'}"
        
        topic_counts = defaultdict(list)
        
        for i in range(self.WINDOW_SIZE_BUCKETS):
            bucket = current_bucket - i
            bucket_data = await self.redis.hgetall(f"{prefix}:{bucket}")
            
            for topic, count in bucket_data.items():
                counts = topic_counts[topic]
                # Zero-fill any earlier buckets this topic missed so
                # the current count lands at index i
                while len(counts) < i:
                    counts.append(0)
                counts.append(int(count))
        
        # Zero-fill trailing buckets so every topic spans the window
        for counts in topic_counts.values():
            while len(counts) < self.WINDOW_SIZE_BUCKETS:
                counts.append(0)
        
        return dict(topic_counts)
    
    def _weighted_count(self, counts: List[int]) -> float:
        """
        Calculate weighted count with exponential decay.
        
        More recent buckets count more heavily.
        """
        total = 0.0
        for i, count in enumerate(counts):
            weight = self.DECAY_FACTOR ** i
            total += count * weight
        return total
    
    def _calculate_velocity(self, counts: List[int]) -> float:
        """
        Calculate velocity (rate of change).
        
        Compare recent period to earlier period.
        """
        if len(counts) < 10:
            return 0.0
        
        recent = sum(counts[:5])          # last 5 buckets (most recent ~5 minutes)
        earlier = sum(counts[5:10]) or 1  # prior 5 buckets; `or 1` avoids division by zero
        
        return (recent - earlier) / earlier
    
    def _calculate_score(
        self, 
        weighted_count: float, 
        velocity: float
    ) -> float:
        """
        Calculate trending score.
        
        Combines volume with velocity to surface
        both popular and rapidly growing topics.
        """
        # Log scale for volume (diminishing returns)
        volume_score = math.log10(weighted_count + 1)
        
        # Velocity boost for growing topics
        velocity_boost = max(0, velocity) * 2
        
        return volume_score * (1 + velocity_boost)
    
    def _get_current_bucket(self) -> int:
        """Get current time bucket."""
        return int(time.time()) // self.BUCKET_SIZE_SECONDS
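
To see the pieces together, here is a minimal usage sketch. The class name (TrendingService), its constructor signature, and the redis-py asyncio wiring are assumptions for illustration; the class header isn't shown in the excerpt above.

import asyncio
import redis.asyncio as aioredis  # redis-py's asyncio client

async def main():
    # decode_responses=True so hgetall returns str keys/values
    redis_client = aioredis.Redis(decode_responses=True)
    trending = TrendingService(redis_client)  # hypothetical constructor

    # Record hashtag occurrences as tweets stream in
    await trending.record_hashtag("#SuperBowl", tweet_id="1", location="US")
    await trending.record_hashtag("#SuperBowl", tweet_id="2")

    # Query the current top topics
    for topic in await trending.get_trending(location="US", count=5):
        print(topic.topic, topic.tweet_count, round(topic.score, 2))

asyncio.run(main())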

Phase 5: Scaling and Edge Cases

Scaling Strategies

┌─────────────────────────────────────────────────────────────────────────┐
│                       SCALING STRATEGIES                                │
│                                                                         │
│   TIMELINE CACHE SHARDING                                               │
│   ───────────────────────                                               │
│   • Partition by user_id hash                                           │
│   • Each shard handles ~10M users                                       │
│   • 3x replication for fault tolerance                                  │
│   • Total: ~60 Redis shards                                             │
│                                                                         │
│   FAN-OUT WORKERS                                                       │
│   ───────────────                                                       │
│   • Queue-based processing (Kafka)                                      │
│   • Horizontal scaling based on queue depth                             │
│   • Priority lanes for time-sensitive tweets                            │
│   • Backpressure handling during peaks                                  │
│                                                                         │
│   TWEET STORAGE                                                         │
│   ─────────────                                                         │
│   • Manhattan (Twitter's distributed KV store)                          │
│   • Partitioned by tweet_id                                             │
│   • Separate hot (recent) and cold (archive) tiers                      │
│                                                                         │
│   MULTI-DATACENTER                                                      │
│   ────────────────                                                      │
│   • Active-active in multiple regions                                   │
│   • Route users to nearest datacenter                                   │
│   • Async replication of tweets and timelines                           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
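
The user-based sharding above reduces to simple hash routing. A minimal sketch; the shard count, replication scheme, and function names are illustrative assumptions, not Twitter's actual configuration:

import hashlib

NUM_SHARDS = 60          # assumed: ~10M users per shard
REPLICATION_FACTOR = 3   # assumed: 3x replication per timeline

def timeline_shard(user_id: int) -> int:
    """Stable hash of user_id -> timeline cache shard index."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def replica_shards(user_id: int) -> list[int]:
    """Primary shard plus the next (REPLICATION_FACTOR - 1) shards."""
    primary = timeline_shard(user_id)
    return [(primary + i) % NUM_SHARDS for i in range(REPLICATION_FACTOR)]

Note that plain modulo hashing reshuffles most keys when NUM_SHARDS changes; a production system would more likely use consistent hashing so resharding moves only a fraction of users.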

Edge Cases

EDGE CASE 1: Celebrity Mentions Another Celebrity

Scenario: @elonmusk replies to @taylorswift
Problem: Both have 100M+ followers
Solution:
  - Store tweet in both celebrity tweet indexes
  - Users following both see it once (deduplication at read)
  - Reply appears in both users' mention timelines
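
The read-time deduplication can be as simple as tracking seen tweet IDs while merging. A sketch; the tweet dict shape (tweet_id, created_at) is an assumption:

def merge_deduplicated(timeline_tweets: list[dict],
                       celebrity_tweets: list[dict]) -> list[dict]:
    """Merge the pre-computed timeline with pulled celebrity tweets,
    keeping only the first occurrence of each tweet_id."""
    seen = set()
    merged = []
    combined = sorted(timeline_tweets + celebrity_tweets,
                      key=lambda t: t["created_at"], reverse=True)
    for tweet in combined:
        if tweet["tweet_id"] not in seen:
            seen.add(tweet["tweet_id"])
            merged.append(tweet)
    return merged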


EDGE CASE 2: Reply Arrives Before Original Tweet

Scenario: Fan-out is slow, reply completes first
Problem: User sees reply to tweet they haven't received yet
Solution:
  - Store parent_tweet_id with replies
  - At read time, fetch parent if not in timeline
  - Display as thread with "Show original tweet"
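
One way to implement the read-time parent fetch; tweet_store.get and the field names are assumed for illustration:

async def hydrate_reply_parents(timeline: list[dict], tweet_store) -> list[dict]:
    """Attach missing parent tweets so the client can render a
    'Show original tweet' thread for orphaned replies."""
    ids_in_timeline = {t["tweet_id"] for t in timeline}
    for tweet in timeline:
        parent_id = tweet.get("parent_tweet_id")
        if parent_id and parent_id not in ids_in_timeline:
            parent = await tweet_store.get(parent_id)  # assumed API
            if parent:
                tweet["parent"] = parent  # rendered as the thread root
    return timeline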


EDGE CASE 3: Inactive User Returns After 30 Days

Scenario: User's timeline evicted from cache
Problem: No pre-computed timeline exists
Solution:
  - Detect cache miss on timeline request
  - Rebuild timeline from followed users' tweets
  - Cache rebuilt timeline
  - Slightly slower first load (acceptable)
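
A cache-aside sketch of the rebuild path; graph.get_following, tweet_store.recent_for, and the key naming are assumptions:

async def get_home_timeline(user_id, cache, graph, tweet_store, limit=800):
    """Serve from the timeline cache; on a miss, rebuild by pulling
    recent tweets from everyone the user follows."""
    key = f"timeline:{user_id}"
    cached = await cache.lrange(key, 0, limit - 1)
    if cached:
        return cached  # fast path: pre-computed timeline

    # Cache miss (e.g. inactive user evicted after 30 days)
    followees = await graph.get_following(user_id)    # assumed API
    tweets = await tweet_store.recent_for(followees)  # assumed API
    tweets.sort(key=lambda t: t["created_at"], reverse=True)
    tweet_ids = [t["tweet_id"] for t in tweets[:limit]]

    if tweet_ids:
        await cache.rpush(key, *tweet_ids)       # repopulate, newest first
        await cache.expire(key, 30 * 24 * 3600)  # evict again if inactive
    return tweet_ids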


EDGE CASE 4: Viral Tweet (Massive Engagement)

Scenario: Tweet gets millions of likes/retweets rapidly
Problem: Hot key for like/retweet counters
Solution:
  - Counter sharding (like_count_1, like_count_2, ...)
  - Aggregate on read or periodically
  - Eventually consistent counts (acceptable for social)
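
A sharded-counter sketch for the hot-key problem; the shard count and key naming are assumptions:

import random

N_COUNTER_SHARDS = 16  # assumed; size to the expected write rate

async def increment_like_count(redis, tweet_id: str) -> None:
    """Spread increments across N keys so no single key is hot."""
    shard = random.randrange(N_COUNTER_SHARDS)
    await redis.incr(f"likes:{tweet_id}:{shard}")

async def get_like_count(redis, tweet_id: str) -> int:
    """Aggregate the shards on read (or periodically, then cache)."""
    keys = [f"likes:{tweet_id}:{i}" for i in range(N_COUNTER_SHARDS)]
    values = await redis.mget(keys)
    return sum(int(v) for v in values if v is not None)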


EDGE CASE 5: Major Event (Super Bowl, Elections)

Scenario: 10x normal tweet volume
Problem: Fan-out queues back up
Solution:
  - Pre-scale based on event calendar
  - Prioritize timeline reads over real-time delivery
  - Acceptable to have 10-30 second delays
  - Graceful degradation: show slightly stale timeline
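
A sketch of the load-shedding decision during spikes; the queue API, topic names, and watermark value are assumptions:

QUEUE_HIGH_WATERMARK = 1_000_000  # assumed threshold (messages)

async def enqueue_fanout(queue, job: dict, queue_depth: int) -> None:
    """Under normal load, everything goes to the real-time lane.
    During a spike, low-priority fan-outs are deferred rather than
    dropped: those timelines simply catch up on the next read."""
    if queue_depth > QUEUE_HIGH_WATERMARK and job.get("priority") != "high":
        await queue.send("fanout-deferred", job)  # assumed API
    else:
        await queue.send("fanout-realtime", job)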

Phase 6: Monitoring and Operations

┌─────────────────────────────────────────────────────────────────────────┐
│                      TWITTER MONITORING DASHBOARD                       │
│                                                                         │
│   TIMELINE HEALTH                                                       │
│   ├── Timeline read latency (p50):    45ms        [██████████] healthy  │
│   ├── Timeline read latency (p99):    180ms       [████████░░] good     │
│   ├── Cache hit rate:                 99.2%       [██████████] healthy  │
│   └── Celebrity merge time:           25ms        [██████████] healthy  │
│                                                                         │
│   FAN-OUT PIPELINE                                                      │
│   ├── Fan-out queue depth:            45K         [███░░░░░░░] normal   │
│   ├── Fan-out latency (p50):          1.2s        [████████░░] good     │
│   ├── Fan-out latency (p99):          4.8s        [██████░░░░] watch    │
│   └── Failed fan-outs:                0.01%       [██████████] healthy  │
│                                                                         │
│   TWEET INGESTION                                                       │
│   ├── Tweets/second:                  6,200       [██████░░░░] normal   │
│   ├── Tweet storage latency:          15ms        [██████████] healthy  │
│   └── Search indexing lag:            2.1s        [████████░░] good     │
│                                                                         │
│   INFRASTRUCTURE                                                        │
│   ├── Redis cluster CPU:              45%         [████░░░░░░] healthy  │
│   ├── Redis memory usage:             72%         [███████░░░] normal   │
│   ├── Kafka consumer lag:             12K msgs    [██░░░░░░░░] healthy  │
│   └── API error rate:                 0.02%       [░░░░░░░░░░] healthy  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Alerting Strategy

CRITICAL (PagerDuty):
  - Timeline read latency p99 > 500ms
  - Fan-out queue depth > 1M messages
  - Cache hit rate < 95%
  - Tweet storage failures > 0.1%

WARNING (Slack):
  - Fan-out latency p99 > 10 seconds
  - Redis memory > 85%
  - Celebrity merge taking > 100ms
  - Search indexing lag > 30 seconds

INFO (Dashboard):
  - Tweets per second trends
  - Top trending topics
  - Celebrity tweet frequency
  - User engagement metrics
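
The thresholds above translate directly into evaluation rules. A minimal sketch; the metric names and comparison directions are assumptions:

# (metric, threshold, severity, alert_when)
ALERT_RULES = [
    ("timeline_read_p99_ms", 500,       "CRITICAL", "above"),
    ("fanout_queue_depth",   1_000_000, "CRITICAL", "above"),
    ("cache_hit_rate",       0.95,      "CRITICAL", "below"),
    ("fanout_p99_seconds",   10,        "WARNING",  "above"),
    ("redis_memory_pct",     85,        "WARNING",  "above"),
]

def evaluate_alerts(metrics: dict) -> list[str]:
    """Return an alert string for every metric past its threshold."""
    alerts = []
    for name, threshold, severity, direction in ALERT_RULES:
        value = metrics.get(name)
        if value is None:
            continue  # metric not reported this cycle
        breached = value > threshold if direction == "above" else value < threshold
        if breached:
            alerts.append(f"{severity}: {name}={value} (threshold: {threshold})")
    return alerts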

Interview Conclusion

Interviewer: "Excellent work. You've covered the fan-out problem really well. Any questions?"

You: "Thank you! I'm curious: how does Twitter handle the case where a user unfollows someone? Do you remove old tweets from their timeline cache?"

Interviewer: "Great question. We don't actively remove tweets on unfollow; that would be expensive. Instead, when serving the timeline, we filter out tweets from users who are no longer followed. The old tweets naturally age out as new ones push them past the 800-tweet limit."
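
That filtering step might look like the following; the cache entry shape (entries carrying author_id) and the graph API are assumptions:

async def serve_timeline(user_id, cache, graph) -> list[dict]:
    """Filter unfollowed authors at read time instead of rewriting
    the cache on every unfollow; stale entries age out naturally."""
    entries = await cache.lrange(f"timeline:{user_id}", 0, 799)
    following = set(await graph.get_following(user_id))  # assumed API
    return [e for e in entries if e["author_id"] in following]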


Summary: Concepts Applied

┌─────────────────────────────────────────────────────────────────────────┐
│             CONCEPTS FROM 10-WEEK COURSE IN TWITTER DESIGN              │
│                                                                         │
│   WEEK 1: DATA AT SCALE                                                 │
│   ├── Partitioning: User-based sharding for timelines                   │
│   ├── Read optimization: Pre-computed timelines                         │
│   └── Hot keys: Celebrity handling with pull-based reads                │
│                                                                         │
│   WEEK 2: FAILURE-FIRST DESIGN                                          │
│   ├── Graceful degradation: Stale timelines during peaks                │
│   ├── Retries: Fan-out job retry with idempotency                       │
│   └── Timeouts: Fast-fail on slow fan-out batches                       │
│                                                                         │
│   WEEK 3: MESSAGING & ASYNC                                             │
│   ├── Kafka: Fan-out job queue                                          │
│   ├── Backpressure: Queue depth monitoring and scaling                  │
│   └── Event-driven: Tweet → Fan-out → Timeline update                   │
│                                                                         │
│   WEEK 4: CACHING                                                       │
│   ├── Redis lists: Timeline storage with LPUSH/LTRIM                    │
│   ├── Cache-aside: Rebuild timeline on miss                             │
│   └── Multi-tier: Hot timelines in memory, cold on disk                 │
│                                                                         │
│   WEEK 5: CONSISTENCY                                                   │
│   ├── Eventual consistency: Timelines (acceptable lag)                  │
│   ├── Strong consistency: Tweet storage (source of truth)               │
│   └── Read-your-own-writes: Author sees their tweet immediately         │
│                                                                         │
│   WEEK 8: ANALYTICS PIPELINE                                            │
│   ├── Trending topics: Real-time stream processing                      │
│   ├── Time-bucketed counts: Sliding window aggregation                  │
│   └── Velocity scoring: Detect rapidly growing topics                   │
│                                                                         │
│   WEEK 10: PRODUCTION READINESS                                         │
│   ├── SLOs: < 5s fan-out, < 200ms reads                                 │
│   ├── Observability: Queue depth, cache hit rate, latencies             │
│   └── Graceful degradation: Prioritize reads during overload            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│                  WHY TWITTER IS AN ENGINEERING MARVEL                   │
│                                                                         │
│   THE FAN-OUT PROBLEM                                                   │
│   ───────────────────                                                   │
│   • Single tweet can trigger 100M+ write operations                     │
│   • Hybrid push/pull elegantly solves celebrity problem                 │
│   • Pre-computed timelines enable 300K QPS reads                        │
│                                                                         │
│   REAL-TIME AT SCALE                                                    │
│   ──────────────────                                                    │
│   • 500M tweets/day delivered in seconds                                │
│   • Breaking news spreads globally in real-time                         │
│   • Trending topics computed on live stream                             │
│                                                                         │
│   CACHE AS SOURCE OF TRUTH                                              │
│   ────────────────────────                                              │
│   • Timeline cache IS the primary data store                            │
│   • Database is for persistence, not serving                            │
│   • Inverts traditional cache/DB relationship                           │
│                                                                         │
│   ASYMMETRIC SOCIAL GRAPH                                               │
│   ───────────────────────                                               │
│   • Followers ≠ Following (unlike Facebook)                             │
│   • Creates unique scaling challenges                                   │
│   • Enables powerful interest-based discovery                           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Self-Assessment Checklist

After studying this case study, you should be able to:

  • Explain fan-out on write vs fan-out on read trade-offs
  • Design a hybrid push/pull timeline system
  • Solve the celebrity problem (users with 100M+ followers)
  • Implement timeline cache using Redis lists
  • Design a social graph service with efficient queries
  • Calculate real-time trending topics with velocity scoring
  • Handle the reply-before-original edge case
  • Scale timeline delivery to 300K+ QPS
  • Implement graceful degradation during traffic spikes
  • Build observability for fan-out pipeline health
  • Articulate when eventual consistency is acceptable

This case study covers Twitter's architecture for delivering 500 million tweets per day to 600 million users worldwide, with a focus on the fan-out problem and timeline construction.