Himanshu Kukreja

Week 0 — Part 3: Back-of-the-Envelope Estimation

The Art of System Sizing


Introduction: Why Estimation Matters

In system design interviews, you'll be asked to design systems that handle real-world scale. Interviewers want to see that you can:

  1. Think about scale — Not just "it works" but "it works at 1M users"
  2. Make informed decisions — Choose technologies based on numbers
  3. Identify bottlenecks — Know where the system will break
  4. Size infrastructure — How many servers, how much storage
  5. Communicate clearly — Show your reasoning

This document gives you the formulas, numbers, and techniques to estimate quickly and accurately.


Chapter 1: Numbers Every Engineer Should Know

1.1 Powers of Two

These are essential for memory, storage, and data size calculations.

┌─────────────────────────────────────────────────────────────────────────┐
│                      POWERS OF TWO                                      │
│                                                                         │
│  Power    Exact Value          Approximate     Name                     │
│  ────────────────────────────────────────────────────────────────────── │
│  2^10     1,024                ~1 Thousand     1 KB (Kilobyte)          │
│  2^20     1,048,576            ~1 Million      1 MB (Megabyte)          │
│  2^30     1,073,741,824        ~1 Billion      1 GB (Gigabyte)          │
│  2^40     ~1 Trillion          ~1 Trillion     1 TB (Terabyte)          │
│  2^50     ~1 Quadrillion       ~1 Quadrillion  1 PB (Petabyte)          │
│                                                                         │
│  MEMORY AIDS:                                                           │
│  • Every 10 powers of 2 ≈ 3 powers of 10                                │
│  • 2^10 ≈ 10^3 (thousand)                                               │
│  • 2^20 ≈ 10^6 (million)                                                │
│  • 2^30 ≈ 10^9 (billion)                                                │
│                                                                         │
│  PRACTICAL CONVERSIONS:                                                 │
│  • 1 GB = 1,024 MB ≈ 1,000 MB (use 1,000 for estimation)                │
│  • 1 TB = 1,024 GB ≈ 1,000 GB                                           │
│  • 1 PB = 1,024 TB ≈ 1,000 TB                                           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
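
The memory aid above can be checked numerically. The following is a minimal Python sketch (the helper name `approx_error` is illustrative, not from the source) showing how far each power of two drifts from its power-of-ten approximation:

```python
def approx_error(power_of_two: int) -> float:
    """Relative error of approximating 2^n by 10^(3n/10), for n a multiple of 10."""
    exact = 2 ** power_of_two
    approx = 10 ** (3 * power_of_two // 10)
    return (exact - approx) / approx


for n in (10, 20, 30, 40, 50):
    print(f"2^{n} = {2 ** n:,}  ({approx_error(n):.1%} above 10^{3 * n // 10})")
```

The gap grows from about 2.4% at 2^10 to about 12.6% at 2^50, which is usually acceptable for back-of-the-envelope work.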

1.2 Time Conversions

┌─────────────────────────────────────────────────────────────────────────┐
│                       TIME CONVERSIONS                                  │
│                                                                         │
│  Unit              Seconds              Round To                        │
│  ────────────────────────────────────────────────────────────────────── │
│  1 minute          60                   60                              │
│  1 hour            3,600                ~4,000                          │
│  1 day             86,400               ~100,000 (10^5)                 │
│  1 week            604,800              ~600,000                        │
│  1 month           2,592,000            ~2.5 million                    │
│  1 year            31,536,000           ~30 million (3 × 10^7)          │
│                                                                         │
│  KEY SHORTCUTS:                                                         │
│  ────────────────────────────────────────────────────────────────────── │
│  • 1 day ≈ 10^5 seconds (use 100,000)                                   │
│  • 1 year ≈ 3 × 10^7 seconds                                            │
│  • 1 month ≈ 2.5 × 10^6 seconds                                         │
│                                                                         │
│  DAILY TO PER-SECOND CONVERSION:                                        │
│  ────────────────────────────────────────────────────────────────────── │
│  Volume per day ÷ 100,000 ≈ volume per second                           │
│                                                                         │
│  Examples:                                                              │
│  • 100 million requests/day ≈ 1,000 requests/second                     │
│  • 1 billion requests/day ≈ 10,000 requests/second                      │
│  • 10 million requests/day ≈ 100 requests/second                        │
│                                                                         │
│  FORMULA:                                                               │
│                 Daily Volume                                            │
│  RPS = ────────────────────────                                         │
│              86,400 (or 100,000)                                        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
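
To see what the 100,000 shortcut costs in accuracy, here is a small Python sketch comparing it with the exact 86,400 divisor. The function names `rps_exact` and `rps_shortcut` are assumptions made for illustration:

```python
SECONDS_PER_DAY = 86_400


def rps_exact(daily_volume: float) -> float:
    """Average requests per second using the exact number of seconds in a day."""
    return daily_volume / SECONDS_PER_DAY


def rps_shortcut(daily_volume: float) -> float:
    """Average requests per second using the rounded 100,000 divisor."""
    return daily_volume / 100_000


for daily in (10_000_000, 100_000_000, 1_000_000_000):
    exact, quick = rps_exact(daily), rps_shortcut(daily)
    print(f"{daily:>13,}/day  exact={exact:>9,.0f}  shortcut={quick:>9,.0f}")
```

The shortcut always lands about 14% below the exact figure (86,400 / 100,000 = 0.864), so it is safe for sizing only if you add headroom afterwards.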

1.3 Latency Numbers Every Programmer Should Know

These numbers are from Jeff Dean's famous talk and are essential for understanding system performance. Treat them as order-of-magnitude figures; exact values vary with hardware.

┌────────────────────────────────────────────────────────────────────────┐
│                   LATENCY COMPARISON NUMBERS                           │
│                                                                        │
│  Operation                                    Time           Scale     │
│  ────────────────────────────────────────────────────────────────────  │
│  L1 cache reference                          0.5 ns         |          │
│  Branch mispredict                           5 ns           |          │
│  L2 cache reference                          7 ns           |          │
│  Mutex lock/unlock                           25 ns          |          │
│  Main memory reference                       100 ns         |=         │
│  Compress 1KB with Snappy                    3,000 ns       |===       │
│  Send 1 KB over 1 Gbps network               10,000 ns      |====      │
│  Read 4 KB randomly from SSD                 150,000 ns     |======    │
│  Read 1 MB sequentially from memory          250,000 ns     |=======   │
│  Round trip within same datacenter           500,000 ns     |========  │
│  Read 1 MB sequentially from SSD             1,000,000 ns   |========= │
│  HDD seek                                    10,000,000 ns  |long bar  │
│  Read 1 MB sequentially from HDD             20,000,000 ns             │
│  Send packet CA → Netherlands → CA           150,000,000 ns            │
│                                                                        │
│  HUMAN-READABLE SCALE:                                                 │
│  ────────────────────────────────────────────────────────────────────  │
│  • L1 cache: 0.5 ns                                                    │
│  • Main memory: 100 ns = 200x slower than L1                           │
│  • SSD random read: 150 μs = 150,000 ns = 1,500x slower than memory    │
│  • HDD seek: 10 ms = 10,000,000 ns = 67x slower than SSD               │
│  • Network round trip (datacenter): 0.5 ms                             │
│  • Network round trip (cross-continent): 150 ms                        │
│                                                                        │
│  KEY INSIGHTS:                                                         │
│  ───────────────────────────────────────────────────────────────────── │
│  1. Memory is ~1000x faster than SSD for random reads                  │
│  2. SSD is ~20-70x faster than HDD (sequential vs. random)             │
│  3. Network within datacenter is fast (~0.5ms)                         │
│  4. Cross-continent network is slow (~150ms)                           │
│  5. Sequential reads are MUCH faster than random reads                 │
│                                                                        │
│  DESIGN IMPLICATIONS:                                                  │
│  ───────────────────────────────────────────────────────────────────── │
│  • Cache aggressively (memory >> disk)                                 │
│  • Minimize network round trips                                        │
│  • Batch operations when possible                                      │
│  • Use CDN for global users                                            │
│  • Prefer SSDs over HDDs for performance                               │
│  • Use sequential access patterns when possible                        │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

📚 Source: Jeff Dean's "Numbers Everyone Should Know" - https://brenocon.com/dean_perf.html
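
The ratios quoted in the table can be reproduced directly from its values. The sketch below is illustrative only: the dictionary keys and the `slowdown` helper are names chosen here, and the figures are copied from the table above.

```python
# Latencies in nanoseconds, copied from the table above.
LATENCY_NS = {
    "l1_cache": 0.5,
    "main_memory": 100,
    "ssd_random_4kb": 150_000,
    "datacenter_round_trip": 500_000,
    "ssd_sequential_1mb": 1_000_000,
    "hdd_seek": 10_000_000,
    "hdd_sequential_1mb": 20_000_000,
    "cross_continent_round_trip": 150_000_000,
}


def slowdown(slow: str, fast: str) -> float:
    """How many times slower one operation is than another."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


print(slowdown("main_memory", "l1_cache"))                            # 200.0
print(slowdown("ssd_random_4kb", "main_memory"))                      # 1500.0
print(slowdown("hdd_sequential_1mb", "ssd_sequential_1mb"))           # 20.0
print(slowdown("cross_continent_round_trip", "datacenter_round_trip"))  # 300.0
```

One cross-continent round trip costs as much as roughly 300 round trips inside a datacenter, which is the arithmetic behind "minimize network round trips" and "use CDN for global users".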

1.4 Typical Data Sizes

┌─────────────────────────────────────────────────────────────────────────┐
│                     COMMON DATA SIZES                                   │
│                                                                         │
│  TEXT DATA                                                              │
│  ─────────────────────────────────────────────────────────────────────  │
│  Item                                 Size                              │
│  ──────────────────────────────────────────────                         │
│  Character (ASCII)                    1 byte                            │
│  Character (UTF-8, average)           1-4 bytes                         │
│  UUID                                 36 bytes (string) / 16 bytes (bin)│
│  Email address                        ~30 bytes                         │
│  URL                                  ~100 bytes                        │
│  Tweet (280 chars + metadata)         ~500 bytes                        │
│  Typical JSON API response            2-10 KB                           │
│  Average email                        50-100 KB                         │
│  Average web page (HTML + assets)     2-3 MB                            │
│                                                                         │
│  MEDIA                                                                  │
│  ─────────────────────────────────────────────────────────────────────  │
│  Item                                 Size                              │
│  ──────────────────────────────────────────────                         │
│  Favicon                              ~1 KB                             │
│  Thumbnail image                      5-10 KB                           │
│  Profile picture (small)              50-100 KB                         │
│  Regular photo (compressed)           200 KB - 2 MB                     │
│  High-res photo                       5-10 MB                           │
│  1 minute of video (720p)             ~50 MB                            │
│  1 minute of video (1080p)            ~150 MB                           │
│  1 minute of video (4K)               ~350 MB                           │
│  1 minute of audio (MP3)              ~1 MB                             │
│                                                                         │
│  DATABASE RECORDS (typical)                                             │
│  ─────────────────────────────────────────────────────────────────────  │
│  Record Type                          Size                              │
│  ──────────────────────────────────────────────                         │
│  User profile (basic)                 500 bytes - 1 KB                  │
│  Product listing                      1-5 KB                            │
│  Order record                         500 bytes - 1 KB                  │
│  Log entry                            200-500 bytes                     │
│  Search index entry                   200 bytes                         │
│  Session data                         1-2 KB                            │
│                                                                         │
│  RULE OF THUMB: When in doubt, estimate 1 KB per record                 │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
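
As a quick illustration of the 1 KB rule of thumb, the following Python sketch turns a record count into a human-readable size. The helpers `human_size` and `table_size` are hypothetical names, and decimal units (1 KB = 1,000 bytes) are assumed for estimation:

```python
DEFAULT_RECORD_BYTES = 1_000  # rule of thumb: ~1 KB per record


def human_size(num_bytes: float) -> str:
    """Format a byte count using decimal units (1 KB = 1,000 bytes)."""
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if num_bytes < 1000:
            return f"{num_bytes:.0f} {unit}"
        num_bytes /= 1000
    return f"{num_bytes:.0f} PB"


def table_size(record_count: int, record_bytes: int = DEFAULT_RECORD_BYTES) -> str:
    """Estimated raw size of a table, before indexes and replication."""
    return human_size(record_count * record_bytes)


print(table_size(100_000_000))         # 100M user profiles at ~1 KB -> 100 GB
print(table_size(1_000_000_000, 200))  # 1B search index entries at 200 bytes -> 200 GB
```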

1.5 Throughput Reference Numbers

┌────────────────────────────────────────────────────────────────────────┐
│                    THROUGHPUT REFERENCE                                │
│                                                                        │
│  NETWORK BANDWIDTH                                                     │
│  ────────────────────────────────────────────────────────────────────  │
│  Connection Type                      Speed         Data/second        │
│  ──────────────────────────────────────────────────────────────────    │
│  Home broadband                       100 Mbps      12.5 MB/s          │
│  Home fiber                           1 Gbps        125 MB/s           │
│  AWS within Availability Zone         10-25 Gbps    1-3 GB/s           │
│  Datacenter backbone                  100+ Gbps     12+ GB/s           │
│                                                                        │
│  Formula: Gbps ÷ 8 = GB/s (bits to bytes)                              │
│                                                                        │
│  DATABASE THROUGHPUT                                                   │
│  ────────────────────────────────────────────────────────────────────  │
│  Database                             Operations/second (approximate)  │
│  ──────────────────────────────────────────────────────────────────    │
│  PostgreSQL (single)                  10,000-50,000 queries/sec        │
│  MySQL (single)                       10,000-50,000 queries/sec        │
│  Redis (single)                       100,000+ ops/sec                 │
│  Cassandra (per node)                 10,000-50,000 writes/sec         │
│  MongoDB (single)                     10,000-30,000 ops/sec            │
│  DynamoDB (on-demand)                 Virtually unlimited              │
│                                                                        │
│  Note: Actual throughput depends heavily on:                           │
│  • Query complexity                                                    │
│  • Data size                                                           │
│  • Hardware specs                                                      │
│  • Network latency                                                     │
│                                                                        │
│  WEB SERVER THROUGHPUT                                                 │
│  ────────────────────────────────────────────────────────────────────  │
│  Server Type                          Requests/second (approximate)    │
│  ──────────────────────────────────────────────────────────────────    │
│  Node.js (simple API)                 5,000-10,000 req/sec             │
│  Go (simple API)                      30,000-50,000 req/sec            │
│  Nginx (static files)                 50,000-100,000 req/sec           │
│  Node.js (complex logic)              1,000-3,000 req/sec              │
│  Python/Django                        500-2,000 req/sec                │
│                                                                        │
│  Conservative estimate: 5,000 req/sec per server for simple APIs       │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
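
The bits-to-bytes formula and the conservative per-server figure combine into a quick sizing helper. This is a rough sketch under the table's assumptions (5,000 req/sec per server); `servers_needed` and `gbps_to_gb_per_s` are illustrative names:

```python
import math


def gbps_to_gb_per_s(gbps: float) -> float:
    """Convert link speed in gigabits/s to gigabytes/s (8 bits per byte)."""
    return gbps / 8


def servers_needed(peak_rps: float, rps_per_server: float = 5_000) -> int:
    """Round up: a fractional server still means one more machine."""
    return math.ceil(peak_rps / rps_per_server)


print(gbps_to_gb_per_s(10))    # 1.25
print(servers_needed(11_000))  # 3
```

In practice you would add spare capacity for failures and deployments on top of this count; the sketch gives only the bare minimum.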

Chapter 2: The Estimation Framework

2.1 The Four Types of Estimates

Every system design requires estimating these four things:

┌────────────────────────────────────────────────────────────────────────┐
│                    THE FOUR ESTIMATES                                  │
│                                                                        │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  1. TRAFFIC ESTIMATES                                           │   │
│  │                                                                 │   │
│  │  • Daily/monthly active users (DAU/MAU)                         │   │
│  │  • Requests per second (RPS)                                    │   │
│  │  • Read vs write ratio                                          │   │
│  │  • Peak vs average traffic                                      │   │
│  │                                                                 │   │
│  │  Key question: How many requests will we handle?                │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                              │                                         │
│                              ▼                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  2. STORAGE ESTIMATES                                           │   │
│  │                                                                 │   │
│  │  • Data per user/object                                         │   │
│  │  • Total storage needed (now and future)                        │   │
│  │  • Growth rate                                                  │   │
│  │  • Retention period                                             │   │
│  │                                                                 │   │
│  │  Key question: How much data will we store?                     │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                              │                                         │
│                              ▼                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  3. BANDWIDTH ESTIMATES                                         │   │
│  │                                                                 │   │
│  │  • Ingress (data coming in)                                     │   │
│  │  • Egress (data going out)                                      │   │
│  │  • Peak bandwidth requirements                                  │   │
│  │                                                                 │   │
│  │  Key question: How much data will flow through the system?      │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                              │                                         │
│                              ▼                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  4. INFRASTRUCTURE ESTIMATES                                    │   │
│  │                                                                 │   │
│  │  • Number of servers                                            │   │
│  │  • Database sizing and replication                              │   │
│  │  • Cache memory requirements                                    │   │
│  │                                                                 │   │
│  │  Key question: What infrastructure do we need?                  │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

2.2 Traffic Estimation

The Formula

┌─────────────────────────────────────────────────────────────────────────┐
│                   TRAFFIC ESTIMATION FORMULAS                           │
│                                                                         │
│  STEP 1: CALCULATE DAILY VOLUME                                         │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  Daily Actions = DAU × Actions per User                                 │
│                                                                         │
│  Example:                                                               │
│  • DAU: 10 million                                                      │
│  • Actions per user: 5 posts + 50 reads = 55 actions                    │
│  • Daily actions: 10M × 55 = 550 million                                │
│                                                                         │
│                                                                         │
│  STEP 2: CONVERT TO REQUESTS PER SECOND (RPS)                           │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│              Daily Actions                                              │
│  RPS = ────────────────────────                                         │
│         86,400 (seconds per day)                                        │
│                                                                         │
│  Shortcut: Daily / 100,000 ≈ RPS                                        │
│                                                                         │
│  Example:                                                               │
│  • 550 million / 100,000 = 5,500 RPS                                    │
│                                                                         │
│                                                                         │
│  STEP 3: CALCULATE PEAK RPS                                             │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  Peak RPS = Average RPS × Peak Factor                                   │
│                                                                         │
│  Peak factors:                                                          │
│  • Normal apps: 2x average                                              │
│  • Social/viral: 3x average                                             │
│  • Flash sales/events: 5-10x average                                    │
│                                                                         │
│  Example:                                                               │
│  • Average: 5,500 RPS                                                   │
│  • Peak (2x): 11,000 RPS                                                │
│                                                                         │
│                                                                         │
│  STEP 4: SPLIT BY READ/WRITE                                            │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  If ratio is 10:1 (reads:writes):                                       │
│  • Total: 5,500 RPS                                                     │
│  • Reads: 5,500 × 10/11 = 5,000 RPS                                     │
│  • Writes: 5,500 × 1/11 = 500 RPS                                       │
│                                                                         │
│  Common ratios:                                                         │
│  • Read-heavy (social media): 100:1 to 1000:1                           │
│  • Mixed (e-commerce): 10:1 to 100:1                                    │
│  • Write-heavy (logging): 1:1 to 1:10                                   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
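
The four steps above can be sketched as a single function. This is one possible arrangement, not a canonical one: `TrafficEstimate` and `estimate_traffic` are names invented here, and the defaults (2x peak factor, 100,000 seconds per day) follow the shortcuts in the box.

```python
from dataclasses import dataclass


@dataclass
class TrafficEstimate:
    daily_actions: float
    avg_rps: float
    peak_rps: float
    read_rps: float
    write_rps: float


def estimate_traffic(dau, reads_per_user, writes_per_user,
                     peak_factor=2.0, seconds_per_day=100_000):
    actions_per_user = reads_per_user + writes_per_user
    daily_actions = dau * actions_per_user                    # Step 1
    avg_rps = daily_actions / seconds_per_day                 # Step 2
    peak_rps = avg_rps * peak_factor                          # Step 3
    read_rps = avg_rps * reads_per_user / actions_per_user    # Step 4
    write_rps = avg_rps * writes_per_user / actions_per_user
    return TrafficEstimate(daily_actions, avg_rps, peak_rps, read_rps, write_rps)


# Same inputs as the example in the box: 10M DAU, 50 reads + 5 posts per user.
est = estimate_traffic(dau=10_000_000, reads_per_user=50, writes_per_user=5)
print(est)
```

With these inputs the function reproduces the figures in the box: 550 million daily actions, 5,500 average RPS, 11,000 peak RPS, split into 5,000 reads and 500 writes per second.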

Worked Example: Twitter-like Service

┌─────────────────────────────────────────────────────────────────────────┐
│            EXAMPLE: TWITTER TRAFFIC ESTIMATION                          │
│                                                                         │
│  GIVEN:                                                                 │
│  • 500 million monthly active users (MAU)                               │
│  • 200 million daily active users (DAU)                                 │
│  • Each user views 20 tweets per day                                    │
│  • Each user posts 0.5 tweets per day                                   │
│                                                                         │
│  CALCULATE:                                                             │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  Reads per day:                                                         │
│  200M users × 20 views = 4,000,000,000 reads/day                        │
│                        = 4 billion reads/day                            │
│                                                                         │
│  Writes per day:                                                        │
│  200M users × 0.5 posts = 100,000,000 writes/day                        │
│                         = 100 million writes/day                        │
│                                                                         │
│  Read/Write ratio:                                                      │
│  4B / 100M = 40:1                                                       │
│                                                                         │
│  Reads per second:                                                      │
│  4B / 100,000 = 40,000 read RPS                                         │
│  (More precise: 4B / 86,400 = 46,296 RPS)                               │
│                                                                         │
│  Writes per second:                                                     │
│  100M / 100,000 = 1,000 write RPS                                       │
│  (More precise: 100M / 86,400 = 1,157 RPS)                              │
│                                                                         │
│  Total: ~47,000 RPS average                                             │
│  Peak (2x): ~94,000 RPS                                                 │
│                                                                         │
│  INTERVIEW PHRASE:                                                      │
│  "With 200M DAU viewing 20 tweets each, that's 4 billion reads          │
│   per day, or about 46,000 reads per second. Writes are much            │
│   lower at about 1,000 per second. Peak would be roughly double,        │
│   so we need to design for ~100,000 RPS."                               │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
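
The example uses a 2x peak factor, while the formula box lists 3x for social/viral workloads. A short Python sketch (the function name `twitter_like_rps` is hypothetical) recomputes the precise figures and shows how sensitive the peak is to that choice:

```python
SECONDS_PER_DAY = 86_400


def twitter_like_rps(dau=200_000_000, views_per_user=20, posts_per_user=0.5):
    """Return (read_rps, write_rps) using the exact seconds-per-day divisor."""
    read_rps = dau * views_per_user / SECONDS_PER_DAY
    write_rps = dau * posts_per_user / SECONDS_PER_DAY
    return read_rps, write_rps


read_rps, write_rps = twitter_like_rps()
avg_rps = read_rps + write_rps
print(f"reads:  {read_rps:,.0f} RPS")
print(f"writes: {write_rps:,.0f} RPS")
for label, factor in (("normal", 2), ("social/viral", 3)):
    print(f"peak ({label}, {factor}x): {avg_rps * factor:,.0f} RPS")
```

At 3x the peak comes out near 142,000 RPS instead of roughly 95,000 RPS, so the choice of peak factor matters more than the 86,400 vs. 100,000 rounding.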

2.3 Storage Estimation

The Formula

┌─────────────────────────────────────────────────────────────────────────┐
│                   STORAGE ESTIMATION FORMULAS                           │
│                                                                         │
│  BASIC FORMULA:                                                         │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  Storage = New Objects per Day × Size per Object × Retention (days)     │
│                                                                         │
│                                                                         │
│  DAILY GROWTH:                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  Daily Growth = New Objects per Day × Size per Object                   │
│                                                                         │
│                                                                         │
│  YEARLY STORAGE:                                                        │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  Yearly Storage = Daily Growth × 365                                    │
│                                                                         │
│                                                                         │
│  WITH REPLICATION:                                                      │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  Total Storage = Raw Storage × Replication Factor                       │
│                                                                         │
│  Common replication factors:                                            │
│  • Databases: 3x (1 primary + 2 replicas)                               │
│  • Object storage (S3): Already replicated (built-in)                   │
│  • Kafka: 3x (3 replicas per partition)                                 │
│                                                                         │
│                                                                         │
│  DON'T FORGET:                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│  • Indexes: Add 20-30% for database indexes                             │
│  • Logs: Often as large as main data                                    │
│  • Backups: Often 2-3x for backup retention                             │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
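
The formulas above compose naturally into one calculation. The sketch below is an illustration under stated assumptions: `storage_estimate` is a made-up name, the 25% index overhead is simply the midpoint of the 20-30% range, and logs and backups are left out.

```python
def storage_estimate(new_objects_per_day, bytes_per_object, retention_days,
                     replication_factor=3, index_overhead=0.25):
    """Return daily growth, raw storage, and replicated total, all in bytes."""
    daily_growth = new_objects_per_day * bytes_per_object
    raw = daily_growth * retention_days
    with_indexes = raw * (1 + index_overhead)   # indexes live on every replica
    total = with_indexes * replication_factor
    return {"daily_growth": daily_growth, "raw": raw, "total": total}


# Assumed example: 100M records/day at 300 bytes each, kept for 5 years.
est = storage_estimate(100_000_000, 300, retention_days=5 * 365)
for name, value in est.items():
    print(f"{name:>12}: {value / 1e12:,.2f} TB")
```

Applying the index overhead before replication reflects that each replica carries its own copy of the indexes.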

Worked Example: Twitter-like Service

┌─────────────────────────────────────────────────────────────────────────┐
│            EXAMPLE: TWITTER STORAGE ESTIMATION                          │
│                                                                         │
│  GIVEN:                                                                 │
│  • 100 million new tweets per day                                       │
│  • Average tweet: 300 bytes (text + metadata)                           │
│  • 10% of tweets have images (average 200 KB)                           │
│  • 1% of tweets have videos (average 5 MB)                              │
│  • Retention: Keep everything (5 years for estimation)                  │
│                                                                         │
│  CALCULATE:                                                             │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  TEXT STORAGE (per day):                                                │
│  100M tweets × 300 bytes = 30 GB/day                                    │
│                                                                         │
│  IMAGE STORAGE (per day):                                               │
│  100M × 10% = 10M images                                                │
│  10M × 200 KB = 2 TB/day                                                │
│                                                                         │
│  VIDEO STORAGE (per day):                                               │
│  100M × 1% = 1M videos                                                  │
│  1M × 5 MB = 5 TB/day                                                   │
│                                                                         │
│  TOTAL DAILY:                                                           │
│  30 GB + 2 TB + 5 TB ≈ 7 TB/day                                         │
│  (Media dominates storage!)                                             │
│                                                                         │
│  YEARLY:                                                                │
│  7 TB × 365 = 2.5 PB/year                                               │
│                                                                         │
│  5-YEAR STORAGE:                                                        │
│  2.5 PB × 5 = 12.5 PB                                                   │
│                                                                         │
│  WITH REPLICATION (3x for database, S3 handles media):                  │
│  Text DB: 30 GB/day × 365 × 5 × 3 = 165 TB                              │
│  Media (S3): 7 TB × 365 × 5 = 12.8 PB (S3 replicates internally)        │
│                                                                         │
│  SUMMARY:                                                               │
│  "We'll need about 7 TB of new storage per day, or 2.5 PB per year.     │
│   Over 5 years, that's roughly 13 PB. Media is 99%+ of storage."        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
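
The storage arithmetic above can be re-checked with a few lines of Python. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and the same inputs as the example; the function and variable names are illustrative only.

```python
# Decimal units, matching the 1000-based conversions used for estimation.
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15

TWEETS_PER_DAY = 100_000_000


def daily_storage_bytes(tweets_per_day):
    """Return bytes of new data per day, split by content type."""
    text = tweets_per_day * 300                 # 300 bytes per tweet
    images = tweets_per_day // 10 * 200 * KB    # 10% carry a 200 KB image
    videos = tweets_per_day // 100 * 5 * MB     # 1% carry a 5 MB video
    return {"text": text, "images": images, "videos": videos}


daily = daily_storage_bytes(TWEETS_PER_DAY)
total_daily = sum(daily.values())
five_year = total_daily * 365 * 5
media_share = (daily["images"] + daily["videos"]) / total_daily

print(f"daily total:  {total_daily / TB:.2f} TB")
print(f"5-year total: {five_year / PB:.1f} PB")
print(f"media share:  {media_share:.1%}")
```

Carrying unrounded values lands slightly above the rounded 12.5 PB figure, which is why the summary says "roughly 13 PB".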

2.4 Bandwidth Estimation

The Formula

┌─────────────────────────────────────────────────────────────────────────┐
│                   BANDWIDTH ESTIMATION FORMULAS                         │
│                                                                         │
│  INGRESS (data coming IN):                                              │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  Ingress = Write RPS × Average Request Size                             │
│                                                                         │
│  Typically smaller (just the data being uploaded)                       │
│                                                                         │
│                                                                         │
│  EGRESS (data going OUT):                                               │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  Egress = Read RPS × Average Response Size                              │
│                                                                         │
│  Usually larger (responses include more data)                           │
│                                                                         │
│                                                                         │
│  CONVERTING UNITS:                                                      │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  MB/s × 8 = Mbps (megabits per second)                                  │
│  GB/s × 8 = Gbps (gigabits per second)                                  │
│                                                                         │
│  Example: 100 MB/s = 800 Mbps = 0.8 Gbps                                │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
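
The bytes-versus-bits conversion is an easy place to slip, so it can help to keep it in one small helper. The sketch below is illustrative; the function names are assumptions, not part of any library.

```python
def bandwidth_bytes_per_sec(rps, payload_bytes):
    """Ingress or egress in bytes/sec: request rate times average payload."""
    return rps * payload_bytes


def to_mbps(bytes_per_sec):
    """Convert bytes/sec to megabits/sec (1 byte = 8 bits, 1 Mb = 10^6 bits)."""
    return bytes_per_sec * 8 / 1_000_000


# The example from the box: 100 MB/s expressed in Mbps.
print(to_mbps(100 * 1_000_000))
```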

Worked Example: Twitter-like Service

┌─────────────────────────────────────────────────────────────────────────┐
│            EXAMPLE: TWITTER BANDWIDTH ESTIMATION                        │
│                                                                         │
│  GIVEN (from traffic estimation):                                       │
│  • Read RPS: 46,000                                                     │
│  • Write RPS: 1,150                                                     │
│  • Tweet read response: 2 KB (tweet + user info)                        │
│  • Tweet write request: 300 bytes                                       │
│  • 20% of read responses include image (200 KB)                         │
│                                                                         │
│  CALCULATE:                                                             │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  INGRESS (writes coming in):                                            │
│  1,150 writes/sec × 300 bytes = 345 KB/sec                              │
│  With 10% images: 1,150 × 0.1 × 200 KB = 23 MB/sec                      │
│  Total ingress: ~24 MB/sec = ~200 Mbps                                  │
│                                                                         │
│  EGRESS (reads going out):                                              │
│  Text only: 46,000 × 2 KB = 92 MB/sec                                   │
│  With images: 46,000 × 0.2 × 200 KB = 1,840 MB/sec                      │
│  Total egress: ~1.9 GB/sec = ~15 Gbps                                   │
│                                                                         │
│  PEAK (2x):                                                             │
│  Ingress peak: ~400 Mbps                                                │
│  Egress peak: ~30 Gbps                                                  │
│                                                                         │
│  NOTE:                                                                  │
│  This is why CDN is essential! CDN handles most of the 30 Gbps          │
│  egress for images. Origin servers see much less.                       │
│                                                                         │
│  With CDN (90% cache hit rate for images):                              │
│  Origin egress: 92 MB/sec + (1,840 MB/sec × 0.1) = ~280 MB/sec          │
│                = ~2.2 Gbps (much more manageable)                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
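
To see how much the CDN changes the picture, the egress numbers can be recomputed with the hit rate as a parameter. This is a rough sketch under the example's assumptions (only image traffic is cached, text is always served from origin); the helper name is hypothetical.

```python
KB, MB = 10**3, 10**6

READ_RPS = 46_000
text_egress = READ_RPS * 2 * KB                  # every read returns ~2 KB
image_egress = READ_RPS * 20 // 100 * 200 * KB   # 20% of reads add a 200 KB image


def origin_egress(text_bps, image_bps, hit_pct):
    """Bytes/sec the origin serves when the CDN absorbs hit_pct% of image bytes."""
    return text_bps + image_bps * (100 - hit_pct) // 100


for hit_pct in (0, 90):
    bps = origin_egress(text_egress, image_egress, hit_pct)
    print(f"{hit_pct:>2}% hit rate: {bps / MB:,.0f} MB/s = {bps * 8 / 10**9:.1f} Gbps")
```

These are average figures; apply the 2x peak factor on top when sizing links.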

2.5 Infrastructure Estimation

Server Sizing

┌─────────────────────────────────────────────────────────────────────────┐
│                   INFRASTRUCTURE ESTIMATION                             │
│                                                                         │
│  WEB/API SERVERS                                                        │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│                     Peak RPS                                            │
│  Servers = ────────────────────────────────                             │
│            RPS per Server × Utilization Target                          │
│                                                                         │
│  Guidelines:                                                            │
│  • Simple API (CRUD): 5,000-10,000 RPS per server                       │
│  • Complex logic: 1,000-3,000 RPS per server                            │
│  • CPU-intensive: 100-500 RPS per server                                │
│  • Target utilization: 70% (leave headroom)                             │
│                                                                         │
│  Example:                                                               │
│  Peak RPS: 100,000                                                      │
│  RPS per server: 5,000                                                  │
│  Utilization: 70%                                                       │
│  Servers = 100,000 / (5,000 × 0.7) = 29 servers                         │
│  Round up to 30-35 for availability                                     │
│                                                                         │
│                                                                         │
│  DATABASE SIZING                                                        │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  For reads (read replicas):                                             │
│                        Read RPS                                         │
│  Read Replicas = ──────────────────────                                 │
│                  Queries per DB (30,000)                                │
│                                                                         │
│  For writes:                                                            │
│  Usually single primary (shard if > 10,000 writes/sec)                  │
│                                                                         │
│  Example:                                                               │
│  Read RPS: 46,000                                                       │
│  Replicas: 46,000 / 30,000 = 2 read replicas                            │
│  Total: 1 primary + 2 replicas = 3 DB servers                           │
│                                                                         │
│                                                                         │
│  CACHE SIZING                                                           │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  Memory = Hot Data Size                                                 │
│                                                                         │
│  Rules of thumb:                                                        │
│  • Cache 20% of data (80/20 rule)                                       │
│  • Or cache last N hours/days of activity                               │
│  • Aim for 90%+ cache hit rate                                          │
│                                                                         │
│  Example:                                                               │
│  Daily new data: 30 GB (text only)                                      │
│  Cache last 7 days: 210 GB                                              │
│  Use 256 GB Redis cluster                                               │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
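
The three sizing rules above translate directly into small functions. The sketch below uses the guideline values from the box as defaults (5,000 RPS per server, 70% utilization, 30,000 queries per database); treat them as placeholders to replace with measured numbers.

```python
import math


def api_servers(peak_rps, rps_per_server=5_000, utilization=0.7):
    """Servers needed so that each runs at or below the utilization target."""
    return math.ceil(peak_rps / (rps_per_server * utilization))


def read_replicas(read_rps, queries_per_db=30_000):
    """Read replicas needed to absorb the read load, rounded up."""
    return math.ceil(read_rps / queries_per_db)


def cache_size_gb(daily_new_gb, days_cached):
    """Cache memory if the hot set is the last N days of new data."""
    return daily_new_gb * days_cached


print(api_servers(100_000))    # web tier for 100K peak RPS
print(read_replicas(46_000))   # replicas for 46K read RPS
print(cache_size_gb(30, 7))    # GB of cache for 7 days of text
```

Rounding up with `math.ceil` matters: a fractional server or replica always means one more machine.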

Chapter 3: Quick Reference Formulas

3.1 The Cheat Sheet

┌────────────────────────────────────────────────────────────────────────┐
│                    ESTIMATION CHEAT SHEET                              │
│                                                                        │
│  TRAFFIC                                                               │
│  ────────────────────────────────────────────────────────────────────  │
│  RPS = Daily requests / 100,000                                        │
│  Peak RPS = Average RPS × 2 (or 3 for viral apps)                      │
│                                                                        │
│  STORAGE                                                               │
│  ────────────────────────────────────────────────────────────────────  │
│  Daily storage = New objects × Size per object                         │
│  Yearly storage = Daily × 365                                          │
│  With replication = Raw × 3                                            │
│                                                                        │
│  BANDWIDTH                                                             │
│  ────────────────────────────────────────────────────────────────────  │
│  Egress = Read RPS × Response size                                     │
│  Ingress = Write RPS × Request size                                    │
│  MB/s × 8 = Mbps                                                       │
│                                                                        │
│  SERVERS                                                               │
│  ────────────────────────────────────────────────────────────────────  │
│  API servers = Peak RPS / 5,000 / 0.7                                  │
│  DB replicas = Read RPS / 30,000                                       │
│  Cache = Hot data size (20% of total or N days)                        │
│                                                                        │
│  COMMON VALUES                                                         │
│  ────────────────────────────────────────────────────────────────────  │
│  1 day ≈ 100,000 seconds (exactly 86,400)                              │
│  1 year ≈ 30 million seconds (actual ~31.5 million)                    │
│  Typical API response: 2-5 KB                                          │
│  Typical DB record: 500 bytes - 1 KB                                   │
│  Image: 200 KB - 2 MB                                                  │
│  Server capacity: 5,000 simple requests/sec                            │
│  DB capacity: 30,000 queries/sec                                       │
│  Redis capacity: 100,000 ops/sec                                       │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
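
It is worth knowing how far off the divide-by-100,000 shortcut is. The short check below compares it with the exact 86,400 seconds per day; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def rps_exact(daily_requests):
    """Average requests per second using the true length of a day."""
    return daily_requests / SECONDS_PER_DAY


def rps_shortcut(daily_requests):
    """Average requests per second using the 100,000-second approximation."""
    return daily_requests / 100_000


daily = 1_000_000_000
exact = rps_exact(daily)
quick = rps_shortcut(daily)
underestimate = 1 - quick / exact

print(f"exact: {exact:,.0f} RPS, shortcut: {quick:,.0f} RPS")
print(f"shortcut underestimates by {underestimate:.1%}")
```

The shortcut runs about 14% low, which is small next to the 2-3x peak multiplier applied afterwards.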

3.2 Common Estimation Templates

┌─────────────────────────────────────────────────────────────────────────┐
│                    SOCIAL MEDIA TEMPLATE                                │
│                                                                         │
│  INPUTS:                                                                │
│  • MAU: _______                                                         │
│  • DAU: _______ (usually 20-40% of MAU)                                 │
│  • Posts per user per day: _______                                      │
│  • Views per user per day: _______                                      │
│  • % posts with media: _______                                          │
│                                                                         │
│  CALCULATIONS:                                                          │
│  Write RPS = DAU × Posts / 100,000                                      │
│  Read RPS = DAU × Views / 100,000                                       │
│  Storage/day = DAU × Posts × (text size + media% × media size)          │
│  API servers = (Read + Write RPS) × 2 / 5,000 / 0.7                     │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
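
The social media template can be filled in programmatically. The sketch below makes two assumptions that are not spelled out in the template: the media term is read as "fraction of posts with media × average media size", and the API server count applies the 70% utilization target from the server sizing guidelines. All input values are hypothetical.

```python
import math


def social_media_estimate(dau, posts_per_user, views_per_user,
                          text_bytes, media_pct, media_bytes):
    """Fill in the social media template; rates are averages, not peaks."""
    posts_per_day = dau * posts_per_user
    write_rps = posts_per_day // 100_000
    read_rps = dau * views_per_user // 100_000
    # Average bytes per post: text plus the media-weighted share.
    bytes_per_post = text_bytes + media_bytes * media_pct // 100
    storage_per_day = posts_per_day * bytes_per_post
    # Peak is 2x average; 5,000 RPS per server at 70% utilization.
    servers = math.ceil((read_rps + write_rps) * 2 / 5_000 / 0.7)
    return {
        "write_rps": write_rps,
        "read_rps": read_rps,
        "storage_per_day_bytes": storage_per_day,
        "api_servers": servers,
    }


# Hypothetical inputs, chosen only to exercise the template.
est = social_media_estimate(
    dau=10_000_000, posts_per_user=2, views_per_user=100,
    text_bytes=300, media_pct=10, media_bytes=200_000,
)
for name, value in est.items():
    print(f"{name}: {value:,}")
```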

┌────────────────────────────────────────────────────────────────────────┐
│                    E-COMMERCE TEMPLATE                                 │
│                                                                        │
│  INPUTS:                                                               │
│  • Daily visitors: _______                                             │
│  • Pages per visit: _______                                            │
│  • Conversion rate: _______ (1-3%)                                     │
│  • Products in catalog: _______                                        │
│                                                                        │
│  CALCULATIONS:                                                         │
│  Page views/day = Visitors × Pages                                     │
│  Orders/day = Visitors × Conversion rate                               │
│  Read RPS = Page views / 100,000                                       │
│  Product catalog size = Products × 5 KB                                │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│                    URL SHORTENER TEMPLATE                               │
│                                                                         │
│  INPUTS:                                                                │
│  • New URLs per month: _______                                          │
│  • Read:Write ratio: _______ (usually 100:1)                            │
│  • URL record size: _______ (~500 bytes)                                │
│                                                                         │
│  CALCULATIONS:                                                          │
│  Write RPS = URLs/month / 2.5M (seconds/month)                          │
│  Read RPS = Write RPS × ratio                                           │
│  Storage = URLs/month × 12 × 500 bytes × retention years                │
│  Short code length = ⌈log₆₂(total URLs needed)⌉                         │
│    6 chars = 56B combinations, 7 chars = 3.5T                           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
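
The short code length can be found without floating-point logarithms by growing the code until its capacity covers the URL count. This is a minimal sketch; the 62-character alphabet (digits plus upper- and lowercase letters) and the example volume are assumptions.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters


def min_code_length(total_urls, base=len(ALPHABET)):
    """Smallest length whose base**length capacity covers total_urls."""
    length, capacity = 1, base
    while capacity < total_urls:
        capacity *= base
        length += 1
    return length


# Hypothetical volume: 100M new URLs per month, kept for 10 years.
total = 100_000_000 * 12 * 10
print(f"{total:,} URLs need {min_code_length(total)} characters")
print(f"6 chars cover {62**6:,}; 7 chars cover {62**7:,}")
```

Integer multiplication avoids the rounding surprises that `math.log` can produce right at a power of 62.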

Chapter 4: Interview Tips

4.1 How to Present Estimates

┌─────────────────────────────────────────────────────────────────────────┐
│                    PRESENTING ESTIMATES                                 │
│                                                                         │
│  DO:                                                                    │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  ✓ State assumptions clearly FIRST                                      │
│    "Let me assume we have 100 million DAU..."                           │
│                                                                         │
│  ✓ Show your math step by step                                          │
│    "100 million users × 10 posts = 1 billion posts per day"             │
│                                                                         │
│  ✓ Round to nice numbers                                                │
│    "That's about 12,000 per second, let's call it 10K RPS"              │
│                                                                         │
│  ✓ Convert to useful units                                              │
│    "10 KB per request × 10K RPS = 100 MB/sec"                           │
│                                                                         │
│  ✓ Sanity check your results                                            │
│    "10 PB seems large but Netflix stores 100+ PB, so this is            │
│     reasonable for a video platform"                                    │
│                                                                         │
│  ✓ Explain implications                                                 │
│    "At 100K RPS, we'll need about 30 API servers and should             │
│     definitely use a caching layer"                                     │
│                                                                         │
│                                                                         │
│  DON'T:                                                                 │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  ✗ Spend more than 5 minutes on estimation                              │
│    Get to approximate numbers and move on                               │
│                                                                         │
│  ✗ Use overly precise numbers                                           │
│    "12,453 requests per second" → Just say "~12,000" or "~10K"          │
│                                                                         │
│  ✗ Forget peak traffic                                                  │
│    Always mention peak is 2-3x average                                  │
│                                                                         │
│  ✗ Skip the "so what"                                                   │
│    Numbers should lead to design decisions                              │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

4.2 Common Pitfalls

┌─────────────────────────────────────────────────────────────────────────┐
│                    ESTIMATION PITFALLS                                  │
│                                                                         │
│  PITFALL 1: FORGETTING PEAK TRAFFIC                                     │
│  ─────────────────────────────────────────────────────────────────────  │
│  ✗ "We get 10,000 RPS" → designs for 10K                                │
│  ✓ "Average is 10K RPS, peak is 20-30K" → designs for 30K               │
│                                                                         │
│  PITFALL 2: IGNORING GROWTH                                             │
│  ─────────────────────────────────────────────────────────────────────  │
│  ✗ "We need 100 GB storage"                                             │
│  ✓ "We need 100 GB now, 1 TB in a year at 10x growth"                   │
│                                                                         │
│  PITFALL 3: FORGETTING REPLICATION                                      │
│  ─────────────────────────────────────────────────────────────────────  │
│  ✗ "Database needs 500 GB"                                              │
│  ✓ "500 GB × 3 replicas = 1.5 TB total"                                 │
│                                                                         │
│  PITFALL 4: UNDERESTIMATING MEDIA                                       │
│  ─────────────────────────────────────────────────────────────────────  │
│  ✗ "Posts are 1 KB, so storage is small"                                │
│  ✓ "Text is 1 KB, but 20% have 500 KB images = 100x more storage"       │
│                                                                         │
│  PITFALL 5: OVERLOOKING BANDWIDTH                                       │
│  ─────────────────────────────────────────────────────────────────────  │
│  ✗ Calculate storage but forget egress costs                            │
│  ✓ "30 Gbps egress = need CDN to reduce costs"                          │
│                                                                         │
│  PITFALL 6: WRONG UNITS                                                 │
│  ─────────────────────────────────────────────────────────────────────  │
│  ✗ Mixing MB and Mb (megabytes vs megabits)                             │
│  ✓ Always clarify: "100 megabytes per second"                           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

4.3 Quick Mental Math Tricks

┌─────────────────────────────────────────────────────────────────────────┐
│                    MENTAL MATH TRICKS                                   │
│                                                                         │
│  MULTIPLYING LARGE NUMBERS:                                             │
│  ─────────────────────────────────────────────────────────────────────  │
│  100 million × 10 = 1 billion                                           │
│  100 million × 100 = 10 billion                                         │
│  1 billion × 1000 = 1 trillion                                          │
│                                                                         │
│  Trick: Count zeros, add exponents                                      │
│  10^8 × 10^2 = 10^10 = 10 billion                                       │
│                                                                         │
│  DAILY TO PER-SECOND:                                                   │
│  ─────────────────────────────────────────────────────────────────────  │
│  Divide by 100,000 (or 10^5)                                            │
│  1 billion/day = 10,000/second                                          │
│  100 million/day = 1,000/second                                         │
│  10 million/day = 100/second                                            │
│                                                                         │
│  PERCENTAGES:                                                           │
│  ─────────────────────────────────────────────────────────────────────  │
│  1% = ÷ 100                                                             │
│  10% = ÷ 10                                                             │
│  20% = ÷ 5                                                              │
│  25% = ÷ 4                                                              │
│  50% = ÷ 2                                                              │
│                                                                         │
│  STORAGE CONVERSIONS:                                                   │
│  ─────────────────────────────────────────────────────────────────────  │
│  1000 KB = 1 MB                                                         │
│  1000 MB = 1 GB                                                         │
│  1000 GB = 1 TB                                                         │
│  1000 TB = 1 PB                                                         │
│                                                                         │
│  1 million KB = 1 GB                                                    │
│  1 billion KB = 1 TB                                                    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
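
When checking mental math, a small formatter that walks up the 1000-based units makes large byte counts readable. The helper below is an illustrative sketch, not a library function.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def humanize(num_bytes):
    """Format a byte count using decimal (1000-based) units."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


print(humanize(1_000_000 * 1000))        # 1 million KB
print(humanize(1_000_000_000 * 1000))    # 1 billion KB
print(humanize(100_000_000 * 300))       # 100M tweets x 300 bytes
```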

Chapter 5: Practice Problems

5.1 Practice: Instagram

┌─────────────────────────────────────────────────────────────────────────┐
│                    PRACTICE: INSTAGRAM                                  │
│                                                                         │
│  PROBLEM:                                                               │
│  Estimate the infrastructure needs for an Instagram-like app            │
│                                                                         │
│  GIVEN:                                                                 │
│  • 1 billion MAU                                                        │
│  • 500 million DAU                                                      │
│  • Each user views 50 photos per day                                    │
│  • Each user posts 1 photo per week                                     │
│  • Average photo: 2 MB                                                  │
│  • Keep photos forever                                                  │
│                                                                         │
│  CALCULATE:                                                             │
│  1. Photo views per second                                              │
│  2. Photo uploads per second                                            │
│  3. Daily storage growth                                                │
│  4. 5-year storage                                                      │
│  5. Egress bandwidth                                                    │
│  6. Number of API servers needed                                        │
│                                                                         │
│  TRY IT YOURSELF FIRST!                                                 │
│                                                                         │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  SOLUTION:                                                              │
│                                                                         │
│  1. Photo views/second:                                                 │
│     500M DAU × 50 views = 25 billion views/day                          │
│     25B / 100,000 = 250,000 read RPS                                    │
│                                                                         │
│  2. Photo uploads/second:                                               │
│     500M DAU / 7 (weekly) = 71 million uploads/day                      │
│     71M / 100,000 = 710 write RPS                                       │
│                                                                         │
│  3. Daily storage:                                                      │
│     71M photos × 2 MB = 142 TB/day                                      │
│                                                                         │
│  4. 5-year storage:                                                     │
│     142 TB × 365 × 5 = 259 PB                                           │
│     (This is just raw photos - thumbnails add more)                     │
│                                                                         │
│  5. Egress bandwidth:                                                   │
│     250,000 RPS × 2 MB = 500 GB/sec = 4 Tbps                            │
│     (This is why CDN is ESSENTIAL)                                      │
│     With 95% CDN hit rate: 25 GB/sec origin traffic                     │
│                                                                         │
│  6. API servers:                                                        │
│     Peak: 250,000 × 2 = 500,000 RPS                                     │
│     Servers: 500,000 / 5,000 / 0.7 = 143 servers                        │
│     Round to: 150 API servers (more for redundancy)                     │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
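
The Instagram solution can be replayed in code as a self-check. The sketch below carries unrounded intermediate values, so storage comes out slightly higher than the hand calculation (which rounds uploads to 71M per day); the difference is immaterial at this level of precision. Variable names are illustrative.

```python
import math

MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

DAU = 500_000_000

read_rps = DAU * 50 // 100_000             # 50 photo views per user per day
uploads_per_day = DAU // 7                 # one photo per user per week
write_rps = uploads_per_day // 100_000
daily_storage = uploads_per_day * 2 * MB   # 2 MB per photo
five_year_storage = daily_storage * 365 * 5
egress = read_rps * 2 * MB                 # bytes/sec before any CDN
origin_egress = egress * 5 // 100          # 95% CDN hit rate leaves 5%
servers = math.ceil(read_rps * 2 / (5_000 * 0.7))  # 2x peak, 70% utilization

print(f"read RPS:       {read_rps:,}")
print(f"write RPS:      {write_rps:,}")
print(f"daily storage:  {daily_storage / TB:.0f} TB")
print(f"5-year storage: {five_year_storage / PB:.0f} PB")
print(f"egress:         {egress / GB:.0f} GB/s, origin {origin_egress / GB:.0f} GB/s")
print(f"API servers:    {servers}")
```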

5.2 Practice: YouTube

┌─────────────────────────────────────────────────────────────────────────┐
│                    PRACTICE: YOUTUBE                                    │
│                                                                         │
│  PROBLEM:                                                               │
│  Estimate infrastructure for a YouTube-like video platform              │
│                                                                         │
│  GIVEN:                                                                 │
│  • 2 billion MAU                                                        │
│  • 1 billion DAU                                                        │
│  • Average watch time: 1 hour/day                                       │
│  • 500 hours of video uploaded per minute                               │
│  • Average video: 5 minutes, stored at multiple resolutions             │
│  • Resolutions: 360p (200MB/hr), 720p (500MB/hr), 1080p (1.5GB/hr)      │
│                                                                         │
│  CALCULATE:                                                             │
│  1. Video streaming bandwidth                                           │
│  2. Upload volume per day                                               │
│  3. Storage per year                                                    │
│  4. CDN requirements                                                    │
│                                                                         │
│  HINTS:                                                                 │
│  • Focus on streaming bandwidth (it's massive)                          │
│  • Consider multiple resolutions                                        │
│  • CDN is absolutely essential                                          │
│                                                                         │
│  TRY IT YOURSELF!                                                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
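
Once you have tried the problem by hand, a short script is one way to check your arithmetic. The sketch below is not a reference solution. It uses only the GIVEN numbers plus three labeled assumptions that are not in the problem statement: viewers stream at 720p on average, peak traffic is 2x the average, and every uploaded hour is stored once at each of the three resolutions (no replication, no original source file). The function name estimate_youtube and all constant names are hypothetical.

```python
# Decimal units keep the arithmetic simple (1 MB = 10**6 bytes).
MB, TB, PB = 10**6, 10**12, 10**15
SECONDS_PER_DAY = 86_400

# Given in the problem statement.
DAU = 1_000_000_000
WATCH_HOURS_PER_USER_PER_DAY = 1
UPLOAD_HOURS_PER_MINUTE = 500
BYTES_PER_VIDEO_HOUR = {"360p": 200 * MB, "720p": 500 * MB, "1080p": 1500 * MB}

# Assumptions, not part of the problem statement.
AVG_STREAM_RESOLUTION = "720p"  # assumed typical playback quality
PEAK_FACTOR = 2                 # assumed peak-to-average ratio
DAYS_PER_YEAR = 365


def estimate_youtube():
    """Return rough sizing numbers for the practice problem."""
    # 1. Streaming bandwidth: bytes watched per day, spread over a day,
    #    converted from bytes/s to terabits/s.
    watched_bytes_per_day = (
        DAU * WATCH_HOURS_PER_USER_PER_DAY
        * BYTES_PER_VIDEO_HOUR[AVG_STREAM_RESOLUTION]
    )
    avg_egress_tbps = watched_bytes_per_day / SECONDS_PER_DAY * 8 / TB

    # 2. Upload volume: hours of new video per day.
    upload_hours_per_day = UPLOAD_HOURS_PER_MINUTE * 60 * 24

    # 3. Storage: each uploaded hour is kept once per resolution.
    stored_bytes_per_video_hour = sum(BYTES_PER_VIDEO_HOUR.values())
    storage_bytes_per_day = upload_hours_per_day * stored_bytes_per_video_hour
    storage_bytes_per_year = storage_bytes_per_day * DAYS_PER_YEAR

    return {
        "avg_egress_tbps": avg_egress_tbps,
        # 4. CDN requirement: the CDN has to absorb peak, not average.
        "peak_egress_tbps": avg_egress_tbps * PEAK_FACTOR,
        "upload_hours_per_day": upload_hours_per_day,
        "storage_pb_per_day": storage_bytes_per_day / PB,
        "storage_pb_per_year": storage_bytes_per_year / PB,
    }


if __name__ == "__main__":
    for name, value in estimate_youtube().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the script gives roughly 46 Tbps of average streaming egress, 720,000 hours uploaded per day, about 1.6 PB of new storage per day, and about 580 PB per year before replication. Changing AVG_STREAM_RESOLUTION or PEAK_FACTOR shows how sensitive the bandwidth figure is to those two guesses.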

Summary

┌────────────────────────────────────────────────────────────────────────┐
│                    PART 3 KEY TAKEAWAYS                                │
│                                                                        │
│  MEMORIZE THESE NUMBERS:                                               │
│  • 1 day = 86,400 seconds ≈ 100,000 for estimation                     │
│  • 1 year ≈ 30 million seconds                                         │
│  • L1 cache: 1ns, Memory: 100ns, SSD: 150μs, HDD: 10ms                 │
│  • Datacenter RTT: 0.5ms, Cross-continent: 150ms                       │
│                                                                        │
│  THE FOUR ESTIMATES:                                                   │
│  1. Traffic (RPS = daily / 100,000, peak = 2-3x)                       │
│  2. Storage (objects × size × retention × replication)                 │
│  3. Bandwidth (RPS × payload size)                                     │
│  4. Infrastructure (RPS / capacity per server)                         │
│                                                                        │
│  KEY FORMULAS:                                                         │
│  • RPS = Daily actions / 100,000                                       │
│  • Peak = Average × 2 (or 3)                                           │
│  • Servers = Peak RPS / 5,000 / 0.7                                    │
│  • DB replicas = Read RPS / 30,000                                     │
│                                                                        │
│  INTERVIEW TIPS:                                                       │
│  • State assumptions clearly                                           │
│  • Show your math                                                      │
│  • Round to nice numbers                                               │
│  • Sanity check results                                                │
│  • Connect numbers to design decisions                                 │
│  • Don't spend more than 5 minutes on estimation                       │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
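
The formulas in the box above can be written as a handful of small functions. This is a minimal sketch, and it makes a few interpretive assumptions: the 5,000 in the server formula is read as per-server RPS capacity and the 0.7 as a target utilization, the 30,000 is read as read RPS per database replica, and the storage formula is read as objects per day times retention in days. All function and parameter names are hypothetical.

```python
import math

SECONDS_PER_DAY_APPROX = 100_000  # 86,400 rounded up for easy division


def average_rps(daily_actions):
    """RPS = Daily actions / 100,000."""
    return daily_actions / SECONDS_PER_DAY_APPROX


def peak_rps(avg_rps, factor=2):
    """Peak = Average x 2 (or 3)."""
    return avg_rps * factor


def app_servers(peak, per_server_rps=5_000, target_utilization=0.7):
    """Servers = Peak RPS / 5,000 / 0.7, rounded up to whole machines."""
    return math.ceil(peak / per_server_rps / target_utilization)


def db_read_replicas(read_rps, per_replica_rps=30_000):
    """DB replicas = Read RPS / 30,000, rounded up to whole replicas."""
    return math.ceil(read_rps / per_replica_rps)


def storage_bytes(objects_per_day, object_size_bytes, retention_days, replication):
    """Storage = objects x size x retention x replication."""
    return objects_per_day * object_size_bytes * retention_days * replication


def bandwidth_bytes_per_sec(rps, payload_bytes):
    """Bandwidth = RPS x payload size."""
    return rps * payload_bytes


if __name__ == "__main__":
    # Hypothetical workload: 500 million requests per day, 2 KB responses.
    avg = average_rps(500_000_000)
    peak = peak_rps(avg)
    print("average RPS:", avg)
    print("peak RPS:", peak)
    print("app servers:", app_servers(peak))
    print("read replicas:", db_read_replicas(peak))
    print("peak bandwidth (bytes/s):", bandwidth_bytes_per_sec(peak, 2_000))
```

Rounding up with math.ceil matters at small scale: 2.86 servers means 3 machines, not 2. In an interview the same steps are done in your head, but running them once with your own numbers is a quick way to sanity check a result.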

📚 Further Reading

References

Books

  • Designing Data-Intensive Applications - Martin Kleppmann (Chapter 1)
  • System Design Interview - Alex Xu (Chapter 2)

Practice

  • Pramp: Free mock interviews
  • Interviewing.io: Practice with real interviewers
  • LeetCode Discuss: System design questions

End of Week 0 — Part 3


Week 0 Complete!

You now have the foundation for the 10-week journey:

  • Part 1: System Design Framework (HLD/LLD, interview structure)
  • Part 2: Infrastructure Building Blocks (all the components)
  • Part 3: Back-of-the-Envelope Estimation (sizing systems)

Ready for Week 1: Foundations of Scale!