Himanshu Kukreja
Capstone

Week 1 Capstone: Design a URL Shortener at Scale

🎯 A Real-World Problem Covering Week 1 Concepts


The Interview Begins

You walk into the interview room. The interviewer smiles and gestures to the whiteboard.

Interviewer: "Thanks for coming in. Today we're going to work through a system design problem together. I'm interested in your thought process, so please think out loud. Feel free to ask questions β€” this is meant to be collaborative."

They write on the whiteboard:

╔══════════════════════════════════════════════════════════════════════════╗
║                                                                          ║
║              Design a URL Shortener Service ("TinyLink")                 ║
║                                                                          ║
║   TinyLink is a URL shortening service like bit.ly or tinyurl.com.       ║
║   Users paste long URLs and get short, shareable links.                  ║
║                                                                          ║
║   - 100 million new URLs created per day                                 ║
║   - 10:1 read to write ratio (1 billion redirects/day)                   ║
║   - URLs should be as short as possible                                  ║
║   - Links expire after 5 years by default                                ║
║   - Analytics: track click counts per URL                                ║
║   - API access for enterprise customers                                  ║
║                                                                          ║
╚══════════════════════════════════════════════════════════════════════════╝

Interviewer: "Take a few minutes to think about this, then walk me through your approach. We have about 45 minutes."


Phase 1: Requirements Clarification (5 minutes)

Before diving in, you take a breath and start asking questions. This is crucial: never assume.

Your Questions

You: "Before I start designing, I'd like to clarify a few requirements. With 100 million URLs per day, are these evenly distributed, or do we have peak hours?"

Interviewer: "Good question. Traffic is global, so it's relatively even with about 20% variation. But during major events β€” like when a viral tweet includes a TinyLink β€” we can see 10x spikes for specific URLs."

You: "For the short URLs β€” do they need to be truly random, or can they be sequential? And what characters can we use?"

Interviewer: "They should appear random to users for security reasons. Use alphanumeric characters β€” uppercase, lowercase, and digits. So 62 possible characters."

You: "What happens when a user creates a short URL for the same long URL twice? Same short URL or different?"

Interviewer: "Different. Each creation is unique. Users might share the same destination but want separate tracking."

You: "For the analytics β€” do users need real-time click counts, or is eventual consistency acceptable?"

Interviewer: "Eventually consistent is fine. A few seconds of delay is acceptable. But the redirect itself must be fast β€” that's the critical path."

You: "Last question β€” you mentioned enterprise API access. Do we need to rate limit API consumers?"

Interviewer: "Absolutely. Free tier gets 100 requests/minute, paid tiers get more. We've had issues with abuse in the past."

You: "Perfect. Let me summarize the requirements as I understand them."

Functional Requirements

1. URL SHORTENING
   - Create short URL from long URL
   - Short codes should be 7 characters (alphanumeric)
   - Each creation generates unique short URL
   - Support custom aliases (user-defined short codes)

2. URL REDIRECTION
   - Redirect short URL to original long URL
   - Must be fast (<100ms p99)
   - Handle non-existent URLs gracefully

3. ANALYTICS
   - Track total clicks per URL
   - Track clicks by day (for charts)
   - Eventually consistent (seconds of delay OK)

4. URL MANAGEMENT
   - URLs expire after 5 years by default
   - Users can set custom expiration
   - Users can delete their URLs
   - Users can view their created URLs

5. API ACCESS
   - RESTful API for create/read/delete
   - Authentication via API keys
   - Rate limiting per tier (free/paid)

Non-Functional Requirements

1. SCALE
   - 100M URL creations per day (~1,200/sec average)
   - 1B redirects per day (~12,000/sec average)
   - Peak traffic: 10x for viral URLs (120,000 redirects/sec)
   - Storage: 5 years of URLs

2. RELIABILITY
   - No duplicate short codes (uniqueness guaranteed)
   - No lost URLs (durability)
   - Redirects must work even during partial failures

3. LATENCY
   - URL creation: <500ms at p99
   - Redirect: <100ms at p99
   - Analytics read: <1s at p99

4. AVAILABILITY
   - 99.99% uptime for redirects (critical path)
   - 99.9% uptime for creation (less critical)

Phase 2: Back-of-the-Envelope Estimation (5 minutes)

You: "Let me work through the numbers to understand the scale we're dealing with."

Traffic Estimation

URL CREATION TRAFFIC

Daily creates:              100,000,000
Seconds per day:            86,400
Average writes/sec:         100M / 86,400 ≈ 1,160 /sec

Peak write traffic (2x):    ~2,500 /sec

REDIRECT TRAFFIC

Daily redirects:            1,000,000,000
Average reads/sec:          1B / 86,400 ≈ 11,600 /sec

Peak read traffic (10x for viral URLs):
Normal peak (2x):           ~25,000 /sec
Viral spike (10x):          ~120,000 /sec

This is our design target:  120,000 redirects/sec

Storage Estimation

URL RECORD SIZE

Short code:                 7 bytes
Long URL:                   ~200 bytes (average)
User ID:                    8 bytes
Created timestamp:          8 bytes
Expiration timestamp:       8 bytes
Click count:                8 bytes
Metadata:                   ~50 bytes
─────────────────────────────────────
Total per URL:              ~300 bytes

With indexes and overhead:  ~500 bytes per URL

STORAGE REQUIREMENTS

URLs per day:               100M
URLs per year:              36.5B
URLs for 5 years:           182.5B

Storage needed:             182.5B × 500 bytes = 91.25 TB

With 3x replication:        ~275 TB total storage

Let's round to:             300 TB over 5 years

Key Metrics Summary

ESTIMATION SUMMARY

TRAFFIC
├── Write throughput:       1,200 /sec (peak: 2,500/sec)
├── Read throughput:        12,000 /sec (peak: 120,000/sec)
└── Read:Write ratio:       10:1

STORAGE
├── Per URL:                ~500 bytes
├── Daily growth:           50 GB
├── Yearly growth:          18 TB
└── 5-year total:           ~300 TB (with replication)

SHORT CODE SPACE
├── Characters:             62 (a-z, A-Z, 0-9)
├── Length:                 7 characters
├── Total combinations:     62^7 = 3.5 trillion
├── URLs over 5 years:      182.5 billion
└── Utilization:            ~5% (plenty of room)
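
If you want to sanity-check these figures, here is a small script that simply reproduces the arithmetic above (numbers rounded the same way as in the summary):

# Reproduce the back-of-the-envelope numbers above.

SECONDS_PER_DAY = 86_400

daily_creates = 100_000_000
daily_redirects = 1_000_000_000
bytes_per_url = 500            # includes index/overhead estimate
replication_factor = 3
retention_years = 5

writes_per_sec = daily_creates / SECONDS_PER_DAY
reads_per_sec = daily_redirects / SECONDS_PER_DAY

total_urls = daily_creates * 365 * retention_years
raw_storage_tb = total_urls * bytes_per_url / 1e12
replicated_tb = raw_storage_tb * replication_factor

code_space = 62 ** 7

print(f"Writes/sec:        {writes_per_sec:,.0f}")          # ~1,157
print(f"Reads/sec:         {reads_per_sec:,.0f}")           # ~11,574
print(f"URLs over 5 years: {total_urls:,}")                 # 182.5 billion
print(f"Raw storage:       {raw_storage_tb:.1f} TB")        # ~91 TB
print(f"With replication:  {replicated_tb:.1f} TB")         # ~274 TB
print(f"Code space used:   {total_urls / code_space:.1%}")  # ~5%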

Phase 3: High-Level Design (10 minutes)

You: "Now let me sketch out the high-level architecture."

System Architecture

                            HIGH-LEVEL ARCHITECTURE

    ┌─────────────┐
    │   Clients   │
    │ Web/Mobile  │
    │   /API      │
    └──────┬──────┘
           │
           ▼
    ┌─────────────┐         ┌──────────────────┐
    │     CDN     │────────▶│  Static Assets   │
    │ (CloudFlare)│         │  (S3/CloudFront) │
    └──────┬──────┘         └──────────────────┘
           │
           ▼
    ┌─────────────┐
    │    Load     │
    │  Balancer   │
    │   (ALB)     │
    └──────┬──────┘
           │
     ┌─────┴─────┐
     ▼           ▼
┌─────────┐ ┌─────────┐
│   API   │ │Redirect │
│ Service │ │ Service │
│(Create) │ │ (Read)  │
└────┬────┘ └────┬────┘
     │           │
     │     ┌─────┴─────┐
     │     ▼           ▼
     │  ┌───────┐  ┌───────┐
     │  │ Redis │  │ Redis │
     │  │Cache 1│  │Cache 2│
     │  └───┬───┘  └───┬───┘
     │      └─────┬────┘
     │            │
     └─────┬──────┘
           │
           ▼
    ┌─────────────┐
    │  PostgreSQL │
    │   Cluster   │
    │  (Sharded)  │
    └─────────────┘
           │
           ▼
    ┌─────────────┐
    │  Analytics  │
    │  Pipeline   │
    │   (Kafka)   │
    └─────────────┘

Component Breakdown

You: "Let me walk through each component and explain why it's there."

1. CDN (CloudFlare)

Purpose: Cache redirect responses for popular URLs at the edge

Why it matters:

  • Viral URLs can get millions of hits
  • CDN can serve 301/302 redirects directly
  • Reduces load on our servers by 80%+ for hot URLs
  • Geographic distribution reduces latency globally

Configuration:

  • Cache 301 redirects for 1 hour
  • Use short TTL to allow for URL deletion
  • Bypass cache for analytics tracking pixel
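
As a rough sketch of how this configuration could be realized, assuming the redirect service is built with FastAPI (as used later for the rate-limiting middleware) and that the CDN honors standard Cache-Control headers; resolve() here is a placeholder for the cache/DB lookup described below:

# Sketch: emit cacheable 301s so the CDN can serve hot URLs at the edge.

from fastapi import FastAPI
from fastapi.responses import RedirectResponse

app = FastAPI()


async def resolve(short_code: str) -> str:
    """Placeholder: the real lookup checks Redis, then PostgreSQL."""
    return "https://example.com/some/long/path"


@app.get("/{short_code}")
async def redirect(short_code: str) -> RedirectResponse:
    long_url = await resolve(short_code)
    response = RedirectResponse(url=long_url, status_code=301)
    # Short edge TTL so deletions and expirations propagate within an hour
    response.headers["Cache-Control"] = "public, max-age=3600"
    return response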

2. Load Balancer (ALB)

Purpose: Distribute traffic and route to appropriate service

Routing rules:

  • POST /api/urls → API Service (URL creation)
  • GET /{shortCode} → Redirect Service
  • GET /api/* → API Service (management)

3. API Service (URL Creation)

Purpose: Handle URL creation, management, and analytics reads

Key responsibilities:

  • Generate unique short codes
  • Validate and sanitize long URLs
  • Rate limiting per user/API key
  • User session management

4. Redirect Service (Read-Heavy)

Purpose: Handle URL redirects (the critical path)

Key responsibilities:

  • Look up short code → long URL
  • Increment click counter (async)
  • Return 301/302 redirect
  • Handle 404 for non-existent URLs

Why separate services?:

  • Different scaling needs (redirect traffic is ~10x creation traffic)
  • Different availability requirements
  • Can optimize each service independently

5. Redis Cache

Purpose: Cache hot URLs for fast redirects

Why Redis?:

  • Sub-millisecond lookups
  • Perfect for key-value lookups (shortCode → longURL)
  • Supports TTL for automatic expiration
  • Cluster mode for horizontal scaling

6. PostgreSQL (Sharded)

Purpose: Persistent storage for URL mappings

Why PostgreSQL?:

  • ACID guarantees for URL uniqueness
  • Mature, battle-tested
  • Good partitioning support
  • Rich querying for analytics
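
For reference, a plausible schema backing this design. It is an assumption rather than a prescribed layout, but the columns line up with the queries used in the code later in this capstone (a urls table plus a user_urls index table):

# Assumed PostgreSQL schema; column names match the queries used later.

URLS_TABLE = """
CREATE TABLE IF NOT EXISTS urls (
    short_code   VARCHAR(10) PRIMARY KEY,       -- uniqueness enforced here
    long_url     TEXT        NOT NULL,
    user_id      TEXT,                          -- NULL for anonymous links
    created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    expires_at   TIMESTAMPTZ NOT NULL,
    click_count  BIGINT      NOT NULL DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_urls_expires_at ON urls (expires_at);
"""

USER_URLS_TABLE = """
CREATE TABLE IF NOT EXISTS user_urls (
    user_id     TEXT        NOT NULL,
    short_code  VARCHAR(10) NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (user_id, short_code)
);
"""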

7. Analytics Pipeline (Kafka)

Purpose: Track clicks asynchronously without slowing redirects

Flow:

  1. Redirect happens → publish click event to Kafka
  2. Consumer aggregates clicks
  3. Batch update to analytics store
  4. Dashboard reads from analytics store
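
A minimal sketch of this pipeline, assuming the kafka-python client and a topic named url-clicks (both assumptions); the batched write to the analytics store is left as a placeholder:

# Sketch: fire-and-forget click events plus a small aggregating consumer.

import json
import time
from collections import Counter
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode(),
)


def record_click(short_code: str) -> None:
    """Step 1: publish a click event; the redirect path never waits on this."""
    producer.send("url-clicks", {"short_code": short_code, "ts": time.time()})


def run_aggregator(flush_every: int = 1000) -> None:
    """Steps 2-3: consume click events and batch-update the analytics store."""
    consumer = KafkaConsumer(
        "url-clicks",
        bootstrap_servers="kafka:9092",
        value_deserializer=lambda b: json.loads(b.decode()),
        group_id="click-aggregator",
    )
    counts: Counter = Counter()
    for message in consumer:
        counts[message.value["short_code"]] += 1
        if sum(counts.values()) >= flush_every:
            # Placeholder for a batched UPDATE / upsert into the analytics store
            print(f"Flushing counters for {len(counts)} URLs")
            counts.clear()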

Data Flow: URL Creation

URL CREATION FLOW

1. Client sends POST /api/urls with long URL
   │
2. API Service receives request
   │
3. Rate limiter checks user's quota ──────────────▶ REJECT if exceeded
   │
4. Generate unique short code (see Deep Dive 1)
   │
5. Write to PostgreSQL (with uniqueness constraint)
   │
   ├── If duplicate (race condition): retry with new code
   │
6. Write to Redis cache
   │
7. Return short URL to client
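
A sketch that ties these steps together. The limiter, generator, db, and cache parameters stand in for the components built in the deep dives below; they are injected here rather than constructed:

# Sketch of the creation flow (steps 3-7); dependencies are injected.

from asyncpg.exceptions import UniqueViolationError


class QuotaExceeded(Exception):
    """Raised when the caller is over their rate limit (step 3)."""


async def handle_create(long_url: str, api_key: str, tier,
                        limiter, generator, db, cache) -> str:
    """Create a short URL, retrying if a duplicate code ever slips through."""
    # Step 3: rate limiter checks the caller's quota
    if not limiter.check(api_key, tier).allowed:
        raise QuotaExceeded(api_key)

    # Steps 4-5: generate a code and insert; the UNIQUE constraint is the backstop
    for _ in range(3):
        short_code = generator.generate()
        try:
            await db.create_url(short_code, long_url)
            break
        except UniqueViolationError:
            continue  # extremely rare with range allocation; just retry
    else:
        raise RuntimeError("could not allocate a unique short code")

    # Step 6: warm the cache so the first redirect is fast
    cache.set(short_code, long_url)

    # Step 7: hand the short URL back to the client
    return f"https://tiny.link/{short_code}"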

Data Flow: URL Redirect

URL REDIRECT FLOW

1. Client requests GET /abc1234
   │
2. CDN checks cache ─────────────────────────────▶ HIT: Return redirect
   │
   │ MISS
   ▼
3. Redirect Service receives request
   │
4. Check Redis cache ────────────────────────────▶ HIT: Go to step 6
   │
   │ MISS
   ▼
5. Query PostgreSQL for long URL
   │
   ├── Not found: Return 404
   │
   ├── Found: Write to Redis cache (async)
   │
6. Publish click event to Kafka (async, fire-and-forget)
   │
7. Return 301 redirect to client
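
And a matching sketch for the redirect path (steps 3-7; the CDN sits in front of this). Again, cache, db, and publish_click are stand-ins for the components described in the deep dives:

# Sketch of the redirect flow; returns (status_code, location).

from typing import Optional, Tuple


async def handle_redirect(short_code: str, cache, db,
                          publish_click) -> Tuple[int, Optional[str]]:
    """Resolve a short code, record the click, and choose the response."""
    # Step 4: check Redis (via the cache layer) first
    long_url = cache.get(short_code)

    if long_url is None:
        # Step 5: cache miss, fall back to PostgreSQL
        long_url = await db.get_url(short_code)
        if long_url is None:
            return 404, None             # non-existent or expired URL
        cache.set(short_code, long_url)  # populate the cache for next time

    # Step 6: fire-and-forget click event (Kafka producer behind this callable)
    publish_click(short_code)

    # Step 7: permanent redirect to the original destination
    return 301, long_url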

Phase 4: Deep Dives (20 minutes)

Interviewer: "Great high-level design. Let's dive deeper into a few areas. How do you generate unique short codes efficiently?"


Deep Dive 1: Short Code Generation (Week 1, Day 1 - Partitioning)

You: "This is a critical part of the system. Let me explain the challenge and my solution."

The Problem

SHORT CODE UNIQUENESS CHALLENGE

We need to generate 100M unique short codes per day.
Each code must be:
  ✓ Unique across all time
  ✓ Unpredictable (can't guess next code)
  ✓ Fast to generate (no bottlenecks)
  ✓ Works across distributed servers

Naive approaches that fail:
  ✗ Auto-increment: Predictable, single point of failure
  ✗ Random + check DB: Race conditions, slow at scale
  ✗ UUID: Too long (36 chars vs our 7)

The Solution: Pre-generated Key Ranges

You: "I would use a pre-generated key range approach, combined with the partitioning concepts from our course."

KEY RANGE ALLOCATION

┌─────────────────────────────────────────────────────────────────────┐
│                      KEY GENERATION SERVICE                         │
│                                                                     │
│   Maintains ranges of pre-generated short codes                     │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  Range 1: aaa0000 - aaa9999  (10,000 codes)                 │   │
│   │  Range 2: aab0000 - aab9999  (10,000 codes)                 │   │
│   │  Range 3: aac0000 - aac9999  (10,000 codes)                 │   │
│   │  ...                                                        │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
│   API servers request a range, use locally until exhausted          │
└─────────────────────────────────────────────────────────────────────┘

    ┌──────────┐      ┌──────────┐      ┌──────────┐
    │ Server 1 │      │ Server 2 │      │ Server 3 │
    │          │      │          │      │          │
    │ Range:   │      │ Range:   │      │ Range:   │
    │ aaa****  │      │ aab****  │      │ aac****  │
    │          │      │          │      │          │
    │ Next:    │      │ Next:    │      │ Next:    │
    │ aaa0042  │      │ aab0007  │      │ aac0099  │
    └──────────┘      └──────────┘      └──────────┘

Implementation

# Short Code Generation with Pre-allocated Ranges
# Applies: Week 1, Day 1 - Partitioning concepts

import threading
from dataclasses import dataclass
from typing import Optional
import redis


@dataclass
class KeyRange:
    """A range of pre-allocated short codes."""
    prefix: str           # e.g., "aab"
    current: int          # Current position in range
    max_value: int        # Maximum value in range (e.g., 9999)
    
    def next_code(self) -> Optional[str]:
        """Get next available code from this range."""
        if self.current > self.max_value:
            return None  # Range exhausted
        
        # Generate code: prefix + zero-padded number
        code = f"{self.prefix}{self.current:04d}"
        self.current += 1
        return code
    
    def remaining(self) -> int:
        """How many codes left in this range."""
        return max(0, self.max_value - self.current + 1)


class ShortCodeGenerator:
    """
    Generates unique short codes using pre-allocated ranges.
    
    Key concepts applied:
    - Partitioning (Day 1): Each server gets its own range
    - No coordination needed for individual code generation
    - Range allocation is the only synchronized operation
    """
    
    def __init__(self, redis_client: redis.Redis, server_id: str):
        self.redis = redis_client
        self.server_id = server_id
        self.current_range: Optional[KeyRange] = None
        self.lock = threading.Lock()
        
        # Configuration
        self.range_size = 10_000  # Codes per range
        self.refill_threshold = 1_000  # Get new range when below this
    
    def generate(self) -> str:
        """
        Generate a unique short code.
        
        This is O(1) in the common case - just increment a counter.
        Only contacts central service when range is exhausted.
        """
        with self.lock:
            # Check if we need a new range
            if self._needs_new_range():
                self._allocate_new_range()
            
            # Generate code from current range
            code = self.current_range.next_code()
            
            if code is None:
                # Range exhausted mid-operation (shouldn't happen often)
                self._allocate_new_range()
                code = self.current_range.next_code()
            
            return code
    
    def _needs_new_range(self) -> bool:
        """Check if we need to allocate a new range."""
        if self.current_range is None:
            return True
        return self.current_range.remaining() < self.refill_threshold
    
    def _allocate_new_range(self) -> None:
        """
        Allocate a new range from the central service.
        
        Uses Redis INCR for atomic range allocation.
        This is the only point of coordination.
        """
        # Atomically get next range number
        range_num = self.redis.incr("shortcode:range_counter")
        
        # Convert range number to prefix (base-62 encoding)
        prefix = self._range_to_prefix(range_num)
        
        self.current_range = KeyRange(
            prefix=prefix,
            current=0,
            max_value=self.range_size - 1
        )
        
        # Log allocation for debugging
        print(f"Server {self.server_id} allocated range: {prefix}****")
    
    def _range_to_prefix(self, range_num: int) -> str:
        """
        Convert range number to 3-character prefix.
        
        Uses base-62 encoding (a-z, A-Z, 0-9), which gives
        62^3 = 238,328 possible prefixes. With 10K numeric codes per
        prefix that is ~2.4 billion codes; a production version would
        base-62 encode the 4-character suffix as well to unlock the
        full 62^7 space.
        """
        chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        base = len(chars)
        
        prefix = ""
        num = range_num
        for _ in range(3):
            prefix = chars[num % base] + prefix
            num //= base
        
        return prefix


# Usage Example
async def create_short_url(long_url: str, generator: ShortCodeGenerator) -> str:
    """Create a new short URL."""
    
    # Generate unique code (O(1), no DB lookup needed)
    short_code = generator.generate()
    
    # Store mapping in database
    await db.execute(
        """
        INSERT INTO urls (short_code, long_url, created_at, expires_at)
        VALUES ($1, $2, NOW(), NOW() + INTERVAL '5 years')
        """,
        short_code, long_url
    )
    
    # Cache for fast redirects
    await redis.setex(f"url:{short_code}", 86400, long_url)
    
    return f"https://tiny.link/{short_code}"

Why This Approach?

You: "This design applies the partitioning principles we learned:"

Principle               Application
─────────────────────────────────────────────────────────────────────
Partition by range      Each server owns a range of codes
Minimize coordination   Only range allocation needs sync
Even distribution       Ranges allocated round-robin
No hot spots            Each server generates from its own range

Interviewer: "What if a server crashes with unused codes in its range?"

You: "Great question. Those codes are simply lost β€” but that's acceptable. With 3.5 trillion possible codes and only 182 billion needed over 5 years, losing a few thousand codes per crash is negligible. The simplicity gained is worth it."


Deep Dive 2: Handling Hot URLs (Week 1, Day 4 - Hot Keys)

Interviewer: "You mentioned viral URLs getting 10x traffic. How do you handle that?"

You: "This is exactly the hot key problem we studied. Let me show you the solution."

The Problem

HOT URL SCENARIO

Normal URL:     10 requests/second
Viral URL:      100,000 requests/second

If all requests for viral URL hit the same:
  - Cache server: Server overloaded
  - Database shard: Shard becomes bottleneck
  - Single point of failure

Examples:
  - Celebrity tweets a TinyLink
  - Product launch (Apple keynote links)
  - News events (election results link)

The Solution: Multi-Layer Hot Key Mitigation

HOT KEY DEFENSE LAYERS

Layer 1: CDN Caching
─────────────────────────────────────────────────────────────────────
  CloudFlare caches redirect responses at 200+ edge locations
  Hit rate for hot URLs: >95%
  Only 5% of requests reach our servers

Layer 2: Local Server Cache  
─────────────────────────────────────────────────────────────────────
  Each API server maintains local LRU cache
  Hot URLs naturally rise to top of cache
  Eliminates Redis calls for hottest URLs

Layer 3: Redis Read Replicas
─────────────────────────────────────────────────────────────────────
  Multiple Redis replicas for read scaling
  Random replica selection distributes load
  Hot keys spread across all replicas

Layer 4: Key Replication with Suffixes
─────────────────────────────────────────────────────────────────────
  For extremely hot keys, replicate with suffixes:
  url:abc1234 → url:abc1234:0, url:abc1234:1, url:abc1234:2
  Client randomly picks suffix, spreading load

Implementation

# Hot Key Handling for URL Redirects
# Applies: Week 1, Day 4 - Hot Keys and Skew

import random
import time
from typing import Optional, Tuple
import redis


class HotKeyAwareCache:
    """
    Multi-layer cache with hot key detection and mitigation.
    
    Applies Day 4 concepts:
    - Detection: Track access frequency
    - Mitigation: Multiple layers of caching
    - Distribution: Spread hot keys across replicas
    """
    
    def __init__(self, redis_cluster: redis.RedisCluster):
        self.redis = redis_cluster
        self.hot_key_threshold = 1000  # requests/minute
        self.replica_count = 3  # Replicas for hot keys
        
        # Local tracking of access patterns
        self.access_counts: dict[str, int] = {}
        self.hot_keys: set[str] = set()
        self.last_reset = time.time()
        
        # Layer 2: local in-process cache (bounded dict used as a simple LRU)
        self.local_cache_size = 10_000
        self._local_cache: dict[str, str] = {}
    
    # Layer 2: Local server cache
    def _local_cache_get(self, key: str) -> Optional[str]:
        """Check the local in-memory cache (populated on successful lookups)."""
        return self._local_cache.get(key)
    
    def _local_cache_put(self, key: str, value: str) -> None:
        """Store in the local cache, evicting the oldest entry at capacity."""
        if len(self._local_cache) >= self.local_cache_size:
            self._local_cache.pop(next(iter(self._local_cache)))
        self._local_cache[key] = value
    
    def get(self, short_code: str) -> Optional[str]:
        """
        Get long URL for short code with hot key handling.
        """
        key = f"url:{short_code}"
        
        # Track access for hot key detection
        self._track_access(key)
        
        # Layer 2: check the local in-process cache first
        cached = self._local_cache_get(key)
        if cached is not None:
            return cached
        
        # Layer 3 & 4: Redis lookup with hot key awareness
        if key in self.hot_keys:
            value = self._get_hot_key(key)      # Hot key: use replicated keys
        else:
            value = self._get_normal_key(key)   # Normal key: direct lookup
        
        if value is not None:
            self._local_cache_put(key, value)
        return value
    
    def _get_normal_key(self, key: str) -> Optional[str]:
        """Normal Redis lookup."""
        return self.redis.get(key)
    
    def _get_hot_key(self, key: str) -> Optional[str]:
        """
        Hot key lookup with distribution.
        
        Hot keys are replicated with suffixes to spread load.
        Client randomly selects a replica.
        """
        # Randomly select replica
        replica_suffix = random.randint(0, self.replica_count - 1)
        replicated_key = f"{key}:{replica_suffix}"
        
        value = self.redis.get(replicated_key)
        
        if value is None:
            # Fallback to original key (replication might be in progress)
            value = self.redis.get(key)
        
        return value
    
    def set(self, short_code: str, long_url: str, ttl: int = 86400) -> None:
        """
        Set URL mapping with hot key awareness.
        """
        key = f"url:{short_code}"
        
        # Always set the primary key
        self.redis.setex(key, ttl, long_url)
        
        # If this is a hot key, also set replicas
        if key in self.hot_keys:
            self._replicate_hot_key(key, long_url, ttl)
    
    def _replicate_hot_key(self, key: str, value: str, ttl: int) -> None:
        """Create replicas of a hot key."""
        pipeline = self.redis.pipeline()
        for i in range(self.replica_count):
            pipeline.setex(f"{key}:{i}", ttl, value)
        pipeline.execute()
    
    def _track_access(self, key: str) -> None:
        """
        Track access patterns to detect hot keys.
        
        Simple sliding window counter.
        In production, use a frequency sketch (count-min / top-k) or a streaming algorithm.
        """
        current_time = time.time()
        
        # Reset counts every minute
        if current_time - self.last_reset > 60:
            self._analyze_and_reset()
        
        # Increment access count
        self.access_counts[key] = self.access_counts.get(key, 0) + 1
    
    def _analyze_and_reset(self) -> None:
        """Analyze access patterns and identify hot keys."""
        new_hot_keys = set()
        
        for key, count in self.access_counts.items():
            if count > self.hot_key_threshold:
                new_hot_keys.add(key)
                
                # If newly hot, create replicas
                if key not in self.hot_keys:
                    self._promote_to_hot(key)
        
        # Keys that cooled down
        for key in self.hot_keys - new_hot_keys:
            self._demote_from_hot(key)
        
        self.hot_keys = new_hot_keys
        self.access_counts = {}
        self.last_reset = time.time()
    
    def _promote_to_hot(self, key: str) -> None:
        """Promote a key to hot status by creating replicas."""
        value = self.redis.get(key)
        if value:
            ttl = self.redis.ttl(key)
            self._replicate_hot_key(key, value, max(ttl, 3600))
            print(f"Promoted hot key: {key}")
    
    def _demote_from_hot(self, key: str) -> None:
        """Remove replicas for a key that's no longer hot."""
        pipeline = self.redis.pipeline()
        for i in range(self.replica_count):
            pipeline.delete(f"{key}:{i}")
        pipeline.execute()
        print(f"Demoted key: {key}")


# Redirect service using hot key aware cache
async def redirect(short_code: str, cache: HotKeyAwareCache) -> str:
    """
    Handle URL redirect with hot key protection.
    """
    # Try cache first (handles hot keys automatically)
    long_url = cache.get(short_code)
    
    if long_url is None:
        # Cache miss - query database
        long_url = await db.fetchval(
            "SELECT long_url FROM urls WHERE short_code = $1",
            short_code
        )
        
        if long_url is None:
            raise NotFoundError(f"URL not found: {short_code}")
        
        # Populate cache
        cache.set(short_code, long_url)
    
    # Fire-and-forget analytics event
    await analytics.record_click(short_code)
    
    return long_url

Hot Key Defense Summary

Layer             Technology            Hit Rate        Latency
─────────────────────────────────────────────────────────────────────
CDN               CloudFlare            95%+ for hot    <10ms
Local Cache       LRU in-memory         80% for warm    <1ms
Redis Replicas    Redis Cluster         99%+            <5ms
Key Replication   Suffix distribution   100%            <5ms

Interviewer: "How do you handle the case where a URL goes viral suddenly?"

You: "The CDN provides immediate relief β€” it starts caching within seconds of the first request. Our hot key detection runs every minute, so there's a brief window where a new viral URL hits Redis directly. But with Redis handling 100K+ ops/sec, we can absorb the initial spike while detection kicks in."


Deep Dive 3: Database Sharding Strategy (Week 1, Day 1 - Partitioning)

Interviewer: "You mentioned PostgreSQL is sharded. How do you decide which shard holds which URL?"

You: "This is a classic partitioning decision. Let me walk through the options and my recommendation."

Sharding Strategy Analysis

OPTION 1: Hash by Short Code
─────────────────────────────────────────────────────────────────────
shard = hash(short_code) % num_shards

Pros:
  ✓ Even distribution (hash spreads uniformly)
  ✓ Simple lookup (compute hash, go to shard)
  ✓ No hot shards (assuming no hot keys)

Cons:
  ✗ Can't query by user ("show me my URLs")
  ✗ Can't query by time ("URLs created today")
  ✗ Resharding requires moving data

Best for: Simple key-value lookups (our redirect path)


OPTION 2: Hash by User ID
─────────────────────────────────────────────────────────────────────
shard = hash(user_id) % num_shards

Pros:
  ✓ All user's URLs on same shard
  ✓ Efficient "show my URLs" queries
  ✓ User operations are single-shard

Cons:
  ✗ Power users create skew (one user = millions of URLs)
  ✗ Still can't query by time
  ✗ Anonymous URLs have no user_id

Best for: User-centric applications


OPTION 3: Range by Time (Monthly)
─────────────────────────────────────────────────────────────────────
shard = year_month(created_at)
2024-01 → shard_2024_01
2024-02 → shard_2024_02

Pros:
  ✓ Old data easily archived (drop old shards)
  ✓ Time-based queries efficient
  ✓ Natural data lifecycle management

Cons:
  ✗ Current month shard is HOT (all writes)
  ✗ Need to query all shards for redirect (don't know creation time)
  ✗ Uneven shard sizes if traffic varies

Best for: Time-series data, logs, analytics

My Recommendation: Hybrid Approach

You: "I recommend a hybrid approach that optimizes for our two main access patterns:"

HYBRID SHARDING STRATEGY

For REDIRECTS (read path):
─────────────────────────────────────────────────────────────────────
  Use consistent hashing on short_code
  
  short_code "abc1234" β†’ hash β†’ shard_7
  
  Why: Redirect needs to be fast. We know the short_code,
  we can compute the shard instantly.


For USER QUERIES (management path):
─────────────────────────────────────────────────────────────────────
  Secondary index table, sharded by user_id
  
  user_urls table:
  ├── user_id (shard key)
  ├── short_code
  ├── created_at
  └── (lightweight, just for listing)
  
  Why: Users need to see their URLs. Keep user's data together.


The URLs live on short_code shards (for fast redirect)
The user_urls index lives on user_id shards (for fast listing)

Write path creates entries in both.

Implementation: Consistent Hashing

# Database Sharding with Consistent Hashing
# Applies: Week 1, Day 1 - Partitioning strategies

import hashlib
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Dict, Any, Optional
import asyncpg


@dataclass
class ShardConfig:
    """Configuration for a database shard."""
    shard_id: int
    host: str
    port: int
    database: str
    
    @property
    def dsn(self) -> str:
        return f"postgresql://{self.host}:{self.port}/{self.database}"


class ConsistentHashRing:
    """
    Consistent hash ring for shard selection.
    
    Applies Day 1 concepts:
    - Consistent hashing minimizes data movement on rebalance
    - Virtual nodes ensure even distribution
    - Deterministic: same key always maps to same shard
    """
    
    def __init__(self, shards: List[ShardConfig], virtual_nodes: int = 150):
        self.shards = {s.shard_id: s for s in shards}
        self.virtual_nodes = virtual_nodes
        self.ring: List[tuple[int, int]] = []  # (hash, shard_id)
        
        self._build_ring()
    
    def _build_ring(self) -> None:
        """Build the consistent hash ring with virtual nodes."""
        ring = []
        
        for shard in self.shards.values():
            # Create virtual nodes for each shard
            for i in range(self.virtual_nodes):
                # Hash the shard_id + virtual node index
                key = f"{shard.shard_id}:{i}"
                hash_val = self._hash(key)
                ring.append((hash_val, shard.shard_id))
        
        # Sort by hash value
        ring.sort(key=lambda x: x[0])
        self.ring = ring
        
        # Extract just the hash values for binary search
        self.ring_hashes = [h for h, _ in ring]
    
    def _hash(self, key: str) -> int:
        """Compute hash value for a key."""
        return int(hashlib.md5(key.encode()).hexdigest(), 16)
    
    def get_shard(self, key: str) -> ShardConfig:
        """
        Get the shard responsible for a key.
        
        Uses binary search for O(log n) lookup.
        """
        if not self.ring:
            raise ValueError("No shards configured")
        
        hash_val = self._hash(key)
        
        # Find the first node with hash >= key's hash
        idx = bisect_left(self.ring_hashes, hash_val)
        
        # Wrap around if we're past the last node
        if idx >= len(self.ring):
            idx = 0
        
        shard_id = self.ring[idx][1]
        return self.shards[shard_id]
    
    def add_shard(self, shard: ShardConfig) -> Dict[int, List[str]]:
        """
        Add a new shard to the ring.
        
        Returns mapping of shard_id -> keys that need to move.
        With consistent hashing, only ~1/n keys need to move.
        """
        self.shards[shard.shard_id] = shard
        self._build_ring()
        # In practice, you'd track which keys moved
        return {}
    
    def remove_shard(self, shard_id: int) -> Dict[int, List[str]]:
        """
        Remove a shard from the ring.
        
        Returns mapping of destination shard_id -> keys to receive.
        """
        if shard_id in self.shards:
            del self.shards[shard_id]
            self._build_ring()
        return {}


class ShardedURLDatabase:
    """
    Sharded database client for URL storage.
    
    Handles:
    - Shard selection via consistent hashing
    - Connection pooling per shard
    - Cross-shard queries when needed
    """
    
    def __init__(self, shard_configs: List[ShardConfig]):
        self.hash_ring = ConsistentHashRing(shard_configs)
        self.pools: Dict[int, asyncpg.Pool] = {}
    
    async def initialize(self) -> None:
        """Create connection pools for all shards."""
        for shard in self.hash_ring.shards.values():
            self.pools[shard.shard_id] = await asyncpg.create_pool(
                shard.dsn,
                min_size=5,
                max_size=20
            )
    
    async def create_url(
        self, 
        short_code: str, 
        long_url: str,
        user_id: Optional[str] = None
    ) -> None:
        """
        Create a new URL mapping.
        
        Writes to:
        1. URL shard (keyed by short_code) - for redirects
        2. User shard (keyed by user_id) - for listing
        """
        # Get shard for this short_code
        shard = self.hash_ring.get_shard(short_code)
        pool = self.pools[shard.shard_id]
        
        async with pool.acquire() as conn:
            await conn.execute(
                """
                INSERT INTO urls (short_code, long_url, user_id, created_at, expires_at)
                VALUES ($1, $2, $3, NOW(), NOW() + INTERVAL '5 years')
                """,
                short_code, long_url, user_id
            )
        
        # Also update user index if user is known
        if user_id:
            await self._update_user_index(user_id, short_code)
    
    async def get_url(self, short_code: str) -> Optional[str]:
        """
        Get long URL by short code.
        
        Single-shard operation - very fast.
        """
        shard = self.hash_ring.get_shard(short_code)
        pool = self.pools[shard.shard_id]
        
        async with pool.acquire() as conn:
            row = await conn.fetchrow(
                """
                SELECT long_url FROM urls 
                WHERE short_code = $1 
                  AND expires_at > NOW()
                """,
                short_code
            )
            return row['long_url'] if row else None
    
    async def get_user_urls(
        self, 
        user_id: str, 
        limit: int = 100
    ) -> List[Dict[str, Any]]:
        """
        Get all URLs for a user.
        
        Queries user index shard, then fetches details from URL shards.
        """
        # Get user's shard
        user_shard = self.hash_ring.get_shard(f"user:{user_id}")
        pool = self.pools[user_shard.shard_id]
        
        async with pool.acquire() as conn:
            rows = await conn.fetch(
                """
                SELECT short_code, created_at
                FROM user_urls
                WHERE user_id = $1
                ORDER BY created_at DESC
                LIMIT $2
                """,
                user_id, limit
            )
        
        # Fetch full details from URL shards (sequential here; could be parallelized with asyncio.gather)
        urls = []
        for row in rows:
            url_data = await self.get_url(row['short_code'])
            if url_data:
                urls.append({
                    'short_code': row['short_code'],
                    'long_url': url_data,
                    'created_at': row['created_at']
                })
        
        return urls
    
    async def _update_user_index(self, user_id: str, short_code: str) -> None:
        """Update the user's URL index."""
        user_shard = self.hash_ring.get_shard(f"user:{user_id}")
        pool = self.pools[user_shard.shard_id]
        
        async with pool.acquire() as conn:
            await conn.execute(
                """
                INSERT INTO user_urls (user_id, short_code, created_at)
                VALUES ($1, $2, NOW())
                """,
                user_id, short_code
            )

Shard Distribution Visualization

CONSISTENT HASH RING

                        0
                        │
            ┌───────────┼───────────┐
           ╱            │            ╲
          ╱      Shard 1│             ╲
         ╱              │              ╲
        ▼               │               ▼
   270 ─┼───────────────┼───────────────┼─ 90
        │               │               │
        │          Shard 3              │
        │               │               │
         ╲              │              ╱
          ╲     Shard 2 │             ╱
           ╲            │            ╱
            └───────────┼───────────┘
                        │
                       180

When a new shard is added:
- Only keys in one segment need to move
- ~1/n of total keys (not all keys)
- Minimal disruption
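
A quick way to see the "only ~1/n keys move" property is to exercise the ConsistentHashRing defined above (the shard hosts here are made up):

# Usage sketch: measure how many keys change shards when a shard is added.

shards = [ShardConfig(shard_id=i, host=f"db{i}", port=5432, database="urls")
          for i in range(5)]
ring = ConsistentHashRing(shards)

keys = [f"code{i}" for i in range(100_000)]
before = {k: ring.get_shard(k).shard_id for k in keys}

ring.add_shard(ShardConfig(shard_id=5, host="db5", port=5432, database="urls"))
after = {k: ring.get_shard(k).shard_id for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"Keys remapped: {moved / len(keys):.1%}")   # roughly 1/6 with 6 shards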

Deep Dive 4: Rate Limiting (Week 1, Day 3 - Rate Limiting at Scale)

Interviewer: "You mentioned rate limiting for the API. How would you implement that at this scale?"

You: "Let me show you a distributed rate limiting approach that handles our scale."

The Problem

RATE LIMITING REQUIREMENTS

API Tiers:
├── Free:       100 requests/minute per API key
├── Starter:    1,000 requests/minute
├── Business:   10,000 requests/minute
└── Enterprise: 100,000 requests/minute

Challenges:
- Distributed servers (can't use local counters)
- Must be fast (<5ms overhead)
- Handle clock skew across servers
- Graceful degradation if rate limiter fails

Solution: Sliding Window with Redis

# Distributed Rate Limiting
# Applies: Week 1, Day 3 - Rate Limiting at Scale

import time
from dataclasses import dataclass
from enum import Enum
from typing import Tuple, Optional
import redis


class RateLimitTier(Enum):
    FREE = 100
    STARTER = 1_000
    BUSINESS = 10_000
    ENTERPRISE = 100_000


@dataclass
class RateLimitResult:
    """Result of a rate limit check."""
    allowed: bool
    limit: int
    remaining: int
    reset_at: int  # Unix timestamp when limit resets
    retry_after: Optional[int] = None  # Seconds to wait if blocked


class SlidingWindowRateLimiter:
    """
    Sliding window rate limiter using Redis.
    
    Applies Day 3 concepts:
    - Sliding window for smooth limiting (no burst at window start)
    - Redis for distributed state
    - Lua script for atomic operations
    """
    
    # Lua script for atomic rate limit check and update
    RATE_LIMIT_SCRIPT = """
    local key = KEYS[1]
    local now = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])
    local limit = tonumber(ARGV[3])
    
    -- Remove old entries outside the window
    redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
    
    -- Count current requests in window
    local count = redis.call('ZCARD', key)
    
    if count < limit then
        -- Under limit: add this request
        redis.call('ZADD', key, now, now .. ':' .. math.random())
        redis.call('EXPIRE', key, window)
        return {1, limit - count - 1}  -- allowed, remaining
    else
        -- Over limit: reject
        -- Get oldest entry to calculate retry time
        local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
        local retry_after = 0
        if #oldest > 0 then
            retry_after = math.ceil(oldest[2] + window - now)
        end
        return {0, 0, retry_after}  -- blocked, remaining, retry_after
    end
    """
    
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client
        self.window_seconds = 60  # 1-minute window
        
        # Register the Lua script
        self.rate_limit_sha = self.redis.script_load(self.RATE_LIMIT_SCRIPT)
    
    def check(
        self, 
        api_key: str, 
        tier: RateLimitTier
    ) -> RateLimitResult:
        """
        Check if request is allowed under rate limit.
        
        Uses sliding window algorithm:
        - Smoother than fixed windows (no thundering herd at window start)
        - More accurate than token bucket for per-minute limits
        - Single Redis round-trip with Lua script
        """
        key = f"ratelimit:{api_key}"
        now = time.time()
        limit = tier.value
        
        try:
            result = self.redis.evalsha(
                self.rate_limit_sha,
                1,  # Number of keys
                key,
                now,
                self.window_seconds,
                limit
            )
            
            allowed = result[0] == 1
            remaining = result[1]
            
            if allowed:
                return RateLimitResult(
                    allowed=True,
                    limit=limit,
                    remaining=remaining,
                    reset_at=int(now) + self.window_seconds
                )
            else:
                retry_after = result[2] if len(result) > 2 else self.window_seconds
                return RateLimitResult(
                    allowed=False,
                    limit=limit,
                    remaining=0,
                    reset_at=int(now) + retry_after,
                    retry_after=retry_after
                )
                
        except redis.RedisError as e:
            # Fail open: if Redis is down, allow the request
            # Log and alert, but don't block users
            print(f"Rate limiter error: {e}")
            return RateLimitResult(
                allowed=True,
                limit=limit,
                remaining=limit,
                reset_at=int(now) + self.window_seconds
            )
    
    def get_headers(self, result: RateLimitResult) -> dict:
        """Generate standard rate limit response headers."""
        headers = {
            'X-RateLimit-Limit': str(result.limit),
            'X-RateLimit-Remaining': str(result.remaining),
            'X-RateLimit-Reset': str(result.reset_at)
        }
        
        if not result.allowed:
            headers['Retry-After'] = str(result.retry_after)
        
        return headers


# FastAPI middleware integration
from fastapi import Request, HTTPException
from fastapi.responses import JSONResponse

class RateLimitMiddleware:
    """
    FastAPI middleware for rate limiting.
    """
    
    def __init__(self, rate_limiter: SlidingWindowRateLimiter):
        self.rate_limiter = rate_limiter
    
    async def __call__(self, request: Request, call_next):
        # Extract API key from header
        api_key = request.headers.get('X-API-Key')
        
        if not api_key:
            # No API key = free tier with IP-based limiting
            api_key = f"ip:{request.client.host}"
            tier = RateLimitTier.FREE
        else:
            # Look up tier for this API key (cached)
            tier = await self._get_tier(api_key)
        
        # Check rate limit
        result = self.rate_limiter.check(api_key, tier)
        
        if not result.allowed:
            return JSONResponse(
                status_code=429,
                content={
                    'error': 'Rate limit exceeded',
                    'retry_after': result.retry_after
                },
                headers=self.rate_limiter.get_headers(result)
            )
        
        # Process request
        response = await call_next(request)
        
        # Add rate limit headers to response
        for key, value in self.rate_limiter.get_headers(result).items():
            response.headers[key] = value
        
        return response
    
    async def _get_tier(self, api_key: str) -> RateLimitTier:
        """Look up the rate limit tier for an API key."""
        # In production, this would query a cache/database
        # For now, return a default
        return RateLimitTier.STARTER

Rate Limiting Visualization

SLIDING WINDOW RATE LIMITING

Timeline (requests marked as |):
─────────────────────────────────────────────────────────────────────

Fixed Window Problem:
  Window 1         │  Window 2
  [    ||||||||||] │ [||||||||||    ]
                   │
  User sends 10 at end of window 1
  Then 10 at start of window 2
  = 20 requests in ~1 second! ❌


Sliding Window Solution:
  ◄─── 60 second window slides with time ───►
  
  Time: 0         30        60        90
        [─────────────────────]
              Window at t=60
        
        Counts ALL requests in last 60 seconds
        No gaming the window boundary ✓
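
A small usage sketch of the SlidingWindowRateLimiter above, assuming a Redis instance is reachable locally:

# Usage sketch: the 101st request inside one minute should be rejected.

import redis

client = redis.Redis(host="localhost", port=6379)
limiter = SlidingWindowRateLimiter(client)

result = None
for _ in range(101):
    result = limiter.check("demo-api-key", RateLimitTier.FREE)

print(result.allowed)      # False once the 100-request window is full
print(result.retry_after)  # seconds until the oldest request ages out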

Deep Dive 5: Replication and Consistency (Week 1, Day 2 - Replication)

Interviewer: "You mentioned PostgreSQL replication. What's your replication strategy?"

You: "Let me explain how we balance consistency and availability for different operations."

Replication Strategy

REPLICATION TOPOLOGY

                    ┌─────────────────┐
                    │    Primary      │
                    │   PostgreSQL    │
                    │   (Writes)      │
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
              ▼              ▼              ▼
       ┌──────────┐   ┌──────────┐   ┌──────────┐
       │ Replica  │   │ Replica  │   │ Replica  │
       │    1     │   │    2     │   │    3     │
       │ (Sync)   │   │ (Async)  │   │ (Async)  │
       └──────────┘   └──────────┘   └──────────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
                   ┌─────────┴─────────┐
                   │   Read Queries    │
                   │   (Load Balanced) │
                   └───────────────────┘

Consistency Requirements by Operation

# Replication and Consistency Configuration
# Applies: Week 1, Day 2 - Replication Trade-offs

from enum import Enum
from typing import Optional
import asyncpg


class ConsistencyLevel(Enum):
    """
    Consistency levels for different operations.
    
    Applies Day 2 concepts:
    - Strong consistency when correctness matters
    - Eventual consistency when speed matters
    """
    STRONG = "strong"       # Read from primary
    READ_YOUR_WRITES = "ryw"  # Use session affinity
    EVENTUAL = "eventual"   # Any replica is fine


class ReplicatedDatabase:
    """
    Database client with configurable consistency levels.
    """
    
    def __init__(
        self,
        primary_dsn: str,
        replica_dsns: list[str],
        sync_replica_dsn: str
    ):
        self.primary_dsn = primary_dsn
        self.replica_dsns = replica_dsns
        self.sync_replica_dsn = sync_replica_dsn
        
        self.primary_pool: Optional[asyncpg.Pool] = None
        self.replica_pools: list[asyncpg.Pool] = []
        self.sync_replica_pool: Optional[asyncpg.Pool] = None
    
    async def initialize(self) -> None:
        """Create connection pools."""
        self.primary_pool = await asyncpg.create_pool(
            self.primary_dsn,
            min_size=10,
            max_size=50
        )
        
        for dsn in self.replica_dsns:
            pool = await asyncpg.create_pool(dsn, min_size=5, max_size=20)
            self.replica_pools.append(pool)
        
        self.sync_replica_pool = await asyncpg.create_pool(
            self.sync_replica_dsn,
            min_size=5,
            max_size=20
        )
    
    async def write(self, query: str, *args) -> None:
        """
        Execute a write query on the primary.
        
        Writes always go to primary for consistency.
        """
        async with self.primary_pool.acquire() as conn:
            await conn.execute(query, *args)
    
    async def read(
        self,
        query: str,
        *args,
        consistency: ConsistencyLevel = ConsistencyLevel.EVENTUAL
    ):
        """
        Execute a read query with specified consistency level.
        
        Consistency levels:
        - STRONG: Read from primary (freshest, but adds load)
        - READ_YOUR_WRITES: Read from sync replica (good balance)
        - EVENTUAL: Read from any replica (fastest, slight delay)
        """
        pool = self._select_pool(consistency)
        
        async with pool.acquire() as conn:
            return await conn.fetch(query, *args)
    
    def _select_pool(self, consistency: ConsistencyLevel) -> asyncpg.Pool:
        """Select appropriate pool based on consistency requirement."""
        if consistency == ConsistencyLevel.STRONG:
            return self.primary_pool
        elif consistency == ConsistencyLevel.READ_YOUR_WRITES:
            return self.sync_replica_pool
        else:  # EVENTUAL
            # Round-robin across async replicas
            import random
            return random.choice(self.replica_pools)


# URL Service with appropriate consistency levels
class URLService:
    """
    URL service demonstrating consistency trade-offs.
    """
    
    def __init__(self, db: ReplicatedDatabase, cache: HotKeyAwareCache):
        self.db = db
        self.cache = cache
    
    async def create_url(self, short_code: str, long_url: str) -> dict:
        """
        Create a new URL.
        
        Consistency: Strong write (must not lose data)
        """
        await self.db.write(
            """
            INSERT INTO urls (short_code, long_url, created_at)
            VALUES ($1, $2, NOW())
            """,
            short_code, long_url
        )
        
        # Also cache it (eventual consistency OK here)
        self.cache.set(short_code, long_url)
        
        return {'short_code': short_code, 'long_url': long_url}
    
    async def get_url_for_redirect(self, short_code: str) -> Optional[str]:
        """
        Get URL for redirect (critical path).
        
        Consistency: Eventual (speed matters most)
        
        Stale read is OK: worst case, user sees old destination
        for a few hundred milliseconds after an update.
        """
        # Try cache first
        long_url = self.cache.get(short_code)
        if long_url:
            return long_url
        
        # Cache miss: query replica
        rows = await self.db.read(
            "SELECT long_url FROM urls WHERE short_code = $1",
            short_code,
            consistency=ConsistencyLevel.EVENTUAL
        )
        
        if rows:
            long_url = rows[0]['long_url']
            self.cache.set(short_code, long_url)
            return long_url
        
        return None
    
    async def get_url_after_create(self, short_code: str) -> Optional[dict]:
        """
        Get URL immediately after creation.
        
        Consistency: Read-your-writes (user expects to see their URL)
        
        After creating a URL, user immediately views it.
        Must see the URL they just created.
        """
        rows = await self.db.read(
            """
            SELECT short_code, long_url, created_at, click_count
            FROM urls WHERE short_code = $1
            """,
            short_code,
            consistency=ConsistencyLevel.READ_YOUR_WRITES
        )
        
        if rows:
            return dict(rows[0])
        return None
    
    async def get_analytics(self, short_code: str) -> Optional[dict]:
        """
        Get analytics for a URL.
        
        Consistency: Eventual (analytics can be slightly delayed)
        
        Analytics are inherently delayed anyway, so eventual
        consistency is perfect here.
        """
        rows = await self.db.read(
            """
            SELECT DATE(clicked_at) as date,
                   COUNT(*) as clicks
            FROM url_clicks
            WHERE short_code = $1
            GROUP BY DATE(clicked_at)
            ORDER BY date DESC
            LIMIT 30
            """,
            short_code,
            consistency=ConsistencyLevel.EVENTUAL
        )
        
        return {'daily_clicks': [dict(r) for r in rows]}

Replication Trade-off Summary

Operation          Consistency                    Why
─────────────────────────────────────────────────────────────────────
URL Creation       Strong (primary)               Must not lose writes
Redirect Lookup    Eventual (any replica)         Speed is critical, stale OK
User's URL List    Read-your-writes               User expects to see their URLs
Analytics          Eventual                       Already delayed, speed preferred
URL Deletion       Strong + cache invalidation    Must take effect immediately

Phase 5: Scaling and Edge Cases (5 minutes)

Interviewer: "How would this system scale to 10x the current load?"

Scaling Strategy

You: "Let me walk through the scaling vectors."

Horizontal Scaling Plan

CURRENT β†’ 10X SCALE

Component           Current              10x Scale
─────────────────────────────────────────────────────────────────────
API Servers         10 instances         100 instances (stateless)
Redirect Servers    20 instances         200 instances (stateless)
Redis Cache         3-node cluster       30-node cluster
PostgreSQL          5 shards             50 shards
Kafka               3 brokers            12 brokers

Bottleneck Analysis

BOTTLENECK ANALYSIS

Component        Limit              Scaling Solution
─────────────────────────────────────────────────────────────────────
CDN              Virtually unlimited   CloudFlare handles automatically
Load Balancer    1M conn/sec          Add more ALBs, use NLB for L4
API Servers      ~500 req/s each      Add instances (stateless)
Redis            ~100K ops/s/node     Add nodes to cluster
PostgreSQL       ~10K writes/s/shard  Add more shards
Network          10 Gbps              Multiple NICs, better instances

Edge Cases

Interviewer: "What edge cases should we handle?"

You: "Several important ones..."

Edge Case 1: Duplicate Short Code

Scenario: Two servers generate the same short code simultaneously
          (shouldn't happen with our range allocation, but...)

Handling:
  - Database has UNIQUE constraint on short_code
  - If insert fails with duplicate error, retry with new code
  - Maximum 3 retries before returning error

Code:
  for attempt in range(3):
      code = generator.generate()
      try:
          await db.create_url(code, long_url)
          return code
      except UniqueViolationError:
          continue  # Try again
  raise ServiceError("Failed to generate unique code")

Edge Case 2: Expired URL Still Cached

Scenario: URL expires in database but remains in Redis cache
          User can still access expired URL

Handling:
  - Include expiration time in cache value
  - Check expiration on cache hit, not just DB lookup
  - Background job to clean expired URLs from cache
  
Cache value format:
  {
    "long_url": "https://example.com/...",
    "expires_at": 1735689600
  }
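
A sketch of that expiration-aware cache read and write; the JSON envelope matches the format above, while the helper names and TTL handling are illustrative:

# Store the expiry alongside the URL and enforce it on every cache hit.

import json
import time
from typing import Optional

import redis

client = redis.Redis(host="localhost", port=6379)


def cache_url(short_code: str, long_url: str, expires_at: int,
              ttl: int = 86_400) -> None:
    """Store the URL together with its expiry; never cache past expiration."""
    value = json.dumps({"long_url": long_url, "expires_at": expires_at})
    remaining = max(1, expires_at - int(time.time()))
    client.setex(f"url:{short_code}", min(ttl, remaining), value)


def get_cached_url(short_code: str) -> Optional[str]:
    """Return the URL only if it has not expired; otherwise evict and miss."""
    raw = client.get(f"url:{short_code}")
    if raw is None:
        return None
    entry = json.loads(raw)
    if entry["expires_at"] <= time.time():
        client.delete(f"url:{short_code}")   # evict eagerly
        return None                          # treat as a miss; the DB is authoritative
    return entry["long_url"]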

Edge Case 3: Database Failover During Write

Scenario: Primary database fails mid-write
          URL created but not confirmed to user

Handling:
  - Use idempotency keys (user can retry safely)
  - On retry, check if URL already exists with same idempotency key
  - If exists, return existing URL instead of error
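
One possible shape for that idempotent retry; the idempotency_keys table and its columns are assumptions, and conn is an asyncpg connection:

# Reuse the result of a previous attempt when the client retries with the
# same idempotency key, instead of creating a second short URL.

async def create_with_idempotency(conn, idempotency_key: str,
                                  long_url: str, new_code: str) -> str:
    """Return the previously created short code if this key was seen before."""
    existing = await conn.fetchval(
        "SELECT short_code FROM idempotency_keys WHERE key = $1",
        idempotency_key,
    )
    if existing:
        return existing   # retry after a failover: reuse the original result

    # Create the URL and record the idempotency key in one transaction
    async with conn.transaction():
        await conn.execute(
            """
            INSERT INTO urls (short_code, long_url, created_at, expires_at)
            VALUES ($1, $2, NOW(), NOW() + INTERVAL '5 years')
            """,
            new_code, long_url,
        )
        await conn.execute(
            "INSERT INTO idempotency_keys (key, short_code) VALUES ($1, $2)",
            idempotency_key, new_code,
        )
    return new_code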

Failure Scenarios

Failure            Detection             Impact                           Recovery
──────────────────────────────────────────────────────────────────────────────────
Redis down         Health checks         Slower redirects (DB fallback)   Fail open, use DB directly
Primary DB down    Connection errors     No new URL creation              Promote sync replica to primary
CDN down           External monitoring   Higher origin load               DNS failover to backup CDN
API server crash   ALB health checks     Requests routed elsewhere        Auto-scaling replaces instance

Phase 6: Monitoring and Operations

Interviewer: "How would you monitor this system in production?"

Key Metrics

You: "I would track metrics at multiple levels."

Business Metrics

- URLs created per second: Target 1,200/sec, Alert if <800 or >3,000
- Redirects per second: Target 12,000/sec, Alert if <5,000
- 404 rate: Target <0.1%, Alert if >1%
- Unique short codes remaining: Alert if <10% capacity

System Metrics

MONITORING DASHBOARD

SERVICE HEALTH
├── Redirect latency p99: < 50ms       [████████░░] 42ms
├── Create latency p99: < 200ms        [██████░░░░] 156ms
├── Error rate: < 0.1%                 [█░░░░░░░░░] 0.02%
└── Throughput: > 10K/sec              [████████░░] 11,247/sec

CACHE
├── Redis hit rate: > 95%              [█████████░] 97%
├── Redis latency p99: < 5ms           [██░░░░░░░░] 1.2ms
└── Cache size: < 80% capacity         [██████░░░░] 62%

DATABASE
├── Query latency p99: < 50ms          [████░░░░░░] 23ms
├── Connection pool usage: < 80%       [██████░░░░] 58%
├── Replication lag: < 100ms           [██░░░░░░░░] 45ms
└── Disk usage: < 70%                  [█████░░░░░] 52%

Alerting Strategy

CRITICAL (PagerDuty - wake someone up):
  - Redirect error rate > 5%
  - Database primary unreachable
  - All Redis nodes down
  - Redirect latency p99 > 500ms

WARNING (Slack - business hours):
  - Redirect error rate > 1%
  - Cache hit rate < 90%
  - Replication lag > 1 second
  - API rate limit errors > 10%

INFO (Dashboard only):
  - New URL creation rate changes
  - Traffic pattern anomalies
  - Approaching capacity thresholds

Runbook Snippet

RUNBOOK: High Redirect Latency

SYMPTOMS:
  - Redirect p99 > 100ms
  - User complaints about slow links

DIAGNOSIS:
  1. Check Redis latency:
     redis-cli --latency -h redis-cluster
     
  2. Check cache hit rate:
     redis-cli INFO stats | grep keyspace_hits
     
  3. Check database load:
     SELECT * FROM pg_stat_activity WHERE state = 'active';
     
  4. Check for hot keys:
     redis-cli --hotkeys

RESOLUTION:
  1. If Redis slow: Check memory, restart if needed
  2. If cache miss rate high: Check TTLs, warm cache
  3. If DB overloaded: Add read replicas, check slow queries
  4. If hot key detected: Enable key replication (see Hot Key runbook)
  
ESCALATE TO: Platform team if not resolved in 15 minutes

Interview Conclusion

Interviewer: "Excellent work. You've demonstrated strong understanding of partitioning, replication, rate limiting, and hot key handling. I particularly liked how you connected each design decision to specific trade-offs. Any questions for me?"

You: "Thank you! I'd love to hear how your team currently handles URL shortening at scale, and what unexpected challenges you've faced in production."


Summary: Week 1 Concepts Applied

Week 1 Concepts (Foundations of Scale)

Day     Concept         Application in This Design
─────────────────────────────────────────────────────────────────────
Day 1   Partitioning    Short code generation with ranges, database sharding with consistent hashing
Day 2   Replication     PostgreSQL primary/replica setup, consistency levels per operation type
Day 3   Rate Limiting   Sliding window rate limiter with Redis, tiered API limits
Day 4   Hot Keys        Multi-layer caching, hot key detection and replication
Day 5   Session Store   User session for API authentication, rate limit state

Code Patterns Demonstrated

1. PRE-ALLOCATED KEY RANGES
   - Used for: Short code generation
   - Key insight: Minimize coordination by allocating ranges

2. CONSISTENT HASHING
   - Used for: Database sharding
   - Key insight: Minimize data movement when scaling

3. SLIDING WINDOW RATE LIMITING
   - Used for: API rate limiting
   - Key insight: Lua scripts for atomic Redis operations

4. HOT KEY REPLICATION
   - Used for: Handling viral URLs
   - Key insight: Detect hot keys, replicate with suffixes

5. CONSISTENCY LEVEL SELECTION
   - Used for: Database reads
   - Key insight: Match consistency to operation requirements

Self-Assessment Checklist

After studying this capstone, you should be able to:

  • Design a partitioning strategy based on access patterns
  • Implement consistent hashing for shard selection
  • Generate unique IDs without central coordination
  • Choose appropriate consistency levels for different operations
  • Implement distributed rate limiting with Redis
  • Detect and mitigate hot key problems
  • Design multi-layer caching strategies
  • Configure database replication for availability
  • Handle edge cases like duplicate keys and failovers
  • Define SLOs and monitoring strategies for a distributed system

This capstone integrates all concepts from Week 1 of the System Design Mastery Series. The URL shortener problem is a classic interview question that touches on partitioning, replication, caching, and rate limiting: all core topics for building scalable systems.