Week 1 Capstone: Design a URL Shortener at Scale
A Real-World Problem Covering Week 1 Concepts
The Interview Begins
You walk into the interview room. The interviewer smiles and gestures to the whiteboard.
Interviewer: "Thanks for coming in. Today we're going to work through a system design problem together. I'm interested in your thought process, so please think out loud. Feel free to ask questions β this is meant to be collaborative."
They write on the whiteboard:
┌─────────────────────────────────────────────────────────────────────
│  Design a URL Shortener Service ("TinyLink")
│
│  TinyLink is a URL shortening service like bit.ly or tinyurl.com.
│  Users paste long URLs and get short, shareable links.
│
│  - 100 million new URLs created per day
│  - 10:1 read to write ratio (1 billion redirects/day)
│  - URLs should be as short as possible
│  - Links expire after 5 years by default
│  - Analytics: track click counts per URL
│  - API access for enterprise customers
└─────────────────────────────────────────────────────────────────────
Interviewer: "Take a few minutes to think about this, then walk me through your approach. We have about 45 minutes."
Phase 1: Requirements Clarification (5 minutes)
Before diving in, you take a breath and start asking questions. This is crucial: never assume.
Your Questions
You: "Before I start designing, I'd like to clarify a few requirements. With 100 million URLs per day, are these evenly distributed, or do we have peak hours?"
Interviewer: "Good question. Traffic is global, so it's relatively even with about 20% variation. But during major events β like when a viral tweet includes a TinyLink β we can see 10x spikes for specific URLs."
You: "For the short URLs β do they need to be truly random, or can they be sequential? And what characters can we use?"
Interviewer: "They should appear random to users for security reasons. Use alphanumeric characters β uppercase, lowercase, and digits. So 62 possible characters."
You: "What happens when a user creates a short URL for the same long URL twice? Same short URL or different?"
Interviewer: "Different. Each creation is unique. Users might share the same destination but want separate tracking."
You: "For the analytics β do users need real-time click counts, or is eventual consistency acceptable?"
Interviewer: "Eventually consistent is fine. A few seconds of delay is acceptable. But the redirect itself must be fast β that's the critical path."
You: "Last question β you mentioned enterprise API access. Do we need to rate limit API consumers?"
Interviewer: "Absolutely. Free tier gets 100 requests/minute, paid tiers get more. We've had issues with abuse in the past."
You: "Perfect. Let me summarize the requirements as I understand them."
Functional Requirements
1. URL SHORTENING
- Create short URL from long URL
- Short codes should be 7 characters (alphanumeric)
- Each creation generates unique short URL
- Support custom aliases (user-defined short codes)
2. URL REDIRECTION
- Redirect short URL to original long URL
- Must be fast (<100ms p99)
- Handle non-existent URLs gracefully
3. ANALYTICS
- Track total clicks per URL
- Track clicks by day (for charts)
- Eventually consistent (seconds of delay OK)
4. URL MANAGEMENT
- URLs expire after 5 years by default
- Users can set custom expiration
- Users can delete their URLs
- Users can view their created URLs
5. API ACCESS
- RESTful API for create/read/delete
- Authentication via API keys
- Rate limiting per tier (free/paid)
Non-Functional Requirements
1. SCALE
- 100M URL creations per day (~1,200/sec average)
- 1B redirects per day (~12,000/sec average)
- Peak traffic: 10x for viral URLs (120,000 redirects/sec)
- Storage: 5 years of URLs
2. RELIABILITY
- No duplicate short codes (uniqueness guaranteed)
- No lost URLs (durability)
- Redirects must work even during partial failures
3. LATENCY
- URL creation: <500ms at p99
- Redirect: <100ms at p99
- Analytics read: <1s at p99
4. AVAILABILITY
- 99.99% uptime for redirects (critical path)
- 99.9% uptime for creation (less critical)
Phase 2: Back of the Envelope Estimation (5 minutes)
You: "Let me work through the numbers to understand the scale we're dealing with."
Traffic Estimation
URL CREATION TRAFFIC
Daily creates: 100,000,000
Seconds per day: 86,400
Average writes/sec: 100M / 86,400 ≈ 1,160/sec
Peak write traffic (2x): ~2,500/sec
REDIRECT TRAFFIC
Daily redirects: 1,000,000,000
Average reads/sec: 1B / 86,400 ≈ 11,600/sec
Peak read traffic (10x for viral URLs):
Normal peak (2x): ~25,000/sec
Viral spike (10x): ~120,000/sec
This is our design target: 120,000 redirects/sec
Storage Estimation
URL RECORD SIZE
Short code: 7 bytes
Long URL: ~200 bytes (average)
User ID: 8 bytes
Created timestamp: 8 bytes
Expiration timestamp: 8 bytes
Click count: 8 bytes
Metadata: ~50 bytes
─────────────────────────────────────
Total per URL: ~300 bytes
With indexes and overhead: ~500 bytes per URL
STORAGE REQUIREMENTS
URLs per day: 100M
URLs per year: 36.5B
URLs for 5 years: 182.5B
Storage needed: 182.5B × 500 bytes = 91.25 TB
With 3x replication: ~275 TB total storage
Let's round to: 300 TB over 5 years
Key Metrics Summary
ESTIMATION SUMMARY
TRAFFIC
├── Write throughput: 1,200/sec (peak: 2,500/sec)
├── Read throughput: 12,000/sec (peak: 120,000/sec)
└── Read:Write ratio: 10:1
STORAGE
├── Per URL: ~500 bytes
├── Daily growth: 50 GB
├── Yearly growth: 18 TB
└── 5-year total: ~300 TB (with replication)
SHORT CODE SPACE
├── Characters: 62 (a-z, A-Z, 0-9)
├── Length: 7 characters
├── Total combinations: 62^7 ≈ 3.5 trillion
├── URLs over 5 years: 182.5 billion
└── Utilization: ~5% (plenty of room)
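A quick sanity check of these estimates in Python (a throwaway sketch of the arithmetic above, not production code):

# Back-of-the-envelope sanity check for the numbers above
SECONDS_PER_DAY = 86_400
daily_creates = 100_000_000
daily_redirects = 1_000_000_000
bytes_per_url = 500            # including index overhead
years, replication = 5, 3

writes_per_sec = daily_creates / SECONDS_PER_DAY        # ~1,157
reads_per_sec = daily_redirects / SECONDS_PER_DAY       # ~11,574
total_urls = daily_creates * 365 * years                # 182.5 billion
raw_tb = total_urls * bytes_per_url / 1e12              # ~91 TB
keyspace = 62 ** 7                                      # ~3.5 trillion codes

print(f"writes/sec ≈ {writes_per_sec:,.0f}, reads/sec ≈ {reads_per_sec:,.0f}")
print(f"storage: {raw_tb:.0f} TB raw, {raw_tb * replication:.0f} TB with replication")
print(f"keyspace used: {total_urls / keyspace:.1%}")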
Phase 3: High-Level Design (10 minutes)
You: "Now let me sketch out the high-level architecture."
System Architecture
HIGH-LEVEL ARCHITECTURE
                    ┌──────────────┐
                    │   Clients    │
                    │ Web / Mobile │
                    │    / API     │
                    └──────┬───────┘
                           │
                           ▼
                    ┌──────────────┐      ┌──────────────────┐
                    │     CDN      │─────▶│  Static Assets   │
                    │ (CloudFlare) │      │ (S3/CloudFront)  │
                    └──────┬───────┘      └──────────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │     Load     │
                    │   Balancer   │
                    │    (ALB)     │
                    └──────┬───────┘
                           │
                 ┌─────────┴─────────┐
                 ▼                   ▼
           ┌───────────┐       ┌───────────┐
           │    API    │       │ Redirect  │
           │  Service  │       │  Service  │
           │ (Create)  │       │  (Read)   │
           └─────┬─────┘       └─────┬─────┘
                 │                   │
                 │            ┌──────┴──────┐
                 │            ▼             ▼
                 │       ┌─────────┐   ┌─────────┐
                 │       │  Redis  │   │  Redis  │
                 │       │ Cache 1 │   │ Cache 2 │
                 │       └────┬────┘   └────┬────┘
                 │            └──────┬──────┘
                 │                   │
                 └─────────┬─────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │  PostgreSQL  │
                    │   Cluster    │
                    │  (Sharded)   │
                    └──────┬───────┘
                           │
                           ▼
                    ┌──────────────┐
                    │  Analytics   │
                    │   Pipeline   │
                    │   (Kafka)    │
                    └──────────────┘
Component Breakdown
You: "Let me walk through each component and explain why it's there."
1. CDN (CloudFlare)
Purpose: Cache redirect responses for popular URLs at the edge
Why it matters:
- Viral URLs can get millions of hits
- CDN can serve 301/302 redirects directly
- Reduces load on our servers by 80%+ for hot URLs
- Geographic distribution reduces latency globally
Configuration:
- Cache 301 redirects for 1 hour
- Use short TTL to allow for URL deletion
- Bypass cache for analytics tracking pixel
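To make the CDN cache redirects, the redirect service can return an explicit Cache-Control header on the 301 response. A minimal FastAPI sketch of that idea (the in-memory URL_TABLE and handler name are placeholders for the real lookup):

# Sketch: a cacheable redirect so the CDN can serve hot URLs at the edge.
# URL_TABLE stands in for the Redis/PostgreSQL lookup described later.
from fastapi import FastAPI, HTTPException
from fastapi.responses import RedirectResponse

app = FastAPI()
URL_TABLE = {"abc1234": "https://example.com/some/very/long/path"}  # placeholder data

@app.get("/{short_code}")
async def follow_short_link(short_code: str) -> RedirectResponse:
    long_url = URL_TABLE.get(short_code)
    if long_url is None:
        raise HTTPException(status_code=404, detail="Unknown short link")
    response = RedirectResponse(url=long_url, status_code=301)
    # Short TTL so deletions and expirations propagate to the edge within an hour
    response.headers["Cache-Control"] = "public, max-age=3600"
    return response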
2. Load Balancer (ALB)
Purpose: Distribute traffic and route to appropriate service
Routing rules:
- POST /api/urls → API Service (URL creation)
- GET /{shortCode} → Redirect Service
- GET /api/* → API Service (management)
3. API Service (URL Creation)
Purpose: Handle URL creation, management, and analytics reads
Key responsibilities:
- Generate unique short codes
- Validate and sanitize long URLs
- Rate limiting per user/API key
- User session management
4. Redirect Service (Read-Heavy)
Purpose: Handle URL redirects, the critical path
Key responsibilities:
- Look up short code → long URL
- Increment click counter (async)
- Return 301/302 redirect
- Handle 404 for non-existent URLs
Why separate services?:
- Different scaling needs (redirect traffic is 10x creation traffic)
- Different availability requirements
- Can optimize each service independently
5. Redis Cache
Purpose: Cache hot URLs for fast redirects
Why Redis?:
- Sub-millisecond lookups
- Perfect for key-value (shortCode → longURL)
- Supports TTL for automatic expiration
- Cluster mode for horizontal scaling
6. PostgreSQL (Sharded)
Purpose: Persistent storage for URL mappings
Why PostgreSQL?:
- ACID guarantees for URL uniqueness
- Mature, battle-tested
- Good partitioning support
- Rich querying for analytics
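The queries in the rest of this design assume roughly the following urls table. A possible schema sketch, with column names inferred from the queries used later (types and indexes are assumptions to be tuned):

# Possible schema for the core urls table, inferred from the queries in this design.
# Applied once per shard; exact types and indexes would be tuned in practice.
import asyncpg

URLS_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS urls (
    short_code   VARCHAR(10) PRIMARY KEY,           -- 7 chars today, room to grow
    long_url     TEXT        NOT NULL,
    user_id      TEXT,                              -- NULL for anonymous creations
    created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    expires_at   TIMESTAMPTZ NOT NULL,
    click_count  BIGINT      NOT NULL DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_urls_expires_at ON urls (expires_at);  -- expiry cleanup
"""

async def create_schema(pool: asyncpg.Pool) -> None:
    """Apply the schema on one shard."""
    async with pool.acquire() as conn:
        await conn.execute(URLS_TABLE_DDL)

The PRIMARY KEY on short_code doubles as the uniqueness constraint relied on in the edge-case handling later.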
7. Analytics Pipeline (Kafka)
Purpose: Track clicks asynchronously without slowing redirects
Flow:
- Redirect happens → publish click event to Kafka
- Consumer aggregates clicks
- Batch update to analytics store
- Dashboard reads from analytics store
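A minimal sketch of the producer side of that flow, assuming kafka-python and a url_clicks topic (topic name, payload fields, and broker address are illustrative):

# Sketch: fire-and-forget click tracking via Kafka so redirects never wait on analytics.
import json
import time
from kafka import KafkaProducer  # kafka-python

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks=0,            # fire-and-forget: never block the redirect path
    linger_ms=20,      # small batching window to cut per-event overhead
)

def record_click(short_code: str) -> None:
    """Publish a click event; a consumer aggregates counts downstream."""
    producer.send("url_clicks", {"short_code": short_code, "ts": int(time.time())})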
Data Flow: URL Creation
URL CREATION FLOW
1. Client sends POST /api/urls with long URL
      ↓
2. API Service receives request
      ↓
3. Rate limiter checks user's quota ──────────────▶ REJECT if exceeded
      ↓
4. Generate unique short code (see Deep Dive 1)
      ↓
5. Write to PostgreSQL (with uniqueness constraint)
      ↓
      └── If duplicate (race condition): retry with new code
      ↓
6. Write to Redis cache
      ↓
7. Return short URL to client
Data Flow: URL Redirect
URL REDIRECT FLOW
1. Client requests GET /abc1234
      ↓
2. CDN checks cache ──────────────────────────────▶ HIT: Return redirect
      │ MISS
      ▼
3. Redirect Service receives request
      ↓
4. Check Redis cache ─────────────────────────────▶ HIT: Go to step 6
      │ MISS
      ▼
5. Query PostgreSQL for long URL
      │
      ├── Not found: Return 404
      └── Found: Write to Redis cache (async)
      ↓
6. Publish click event to Kafka (async, fire-and-forget)
      ↓
7. Return 301 redirect to client
Phase 4: Deep Dives (20 minutes)
Interviewer: "Great high-level design. Let's dive deeper into a few areas. How do you generate unique short codes efficiently?"
Deep Dive 1: Short Code Generation (Week 1, Day 1 - Partitioning)
You: "This is a critical part of the system. Let me explain the challenge and my solution."
The Problem
SHORT CODE UNIQUENESS CHALLENGE
We need to generate 100M unique short codes per day.
Each code must be:
✓ Unique across all time
✓ Unpredictable (can't guess next code)
✓ Fast to generate (no bottlenecks)
✓ Works across distributed servers
Naive approaches that fail:
✗ Auto-increment: Predictable, single point of failure
✗ Random + check DB: Race conditions, slow at scale
✗ UUID: Too long (36 chars vs our 7)
The Solution: Pre-generated Key Ranges
You: "I would use a pre-generated key range approach, combined with the partitioning concepts from our course."
KEY RANGE ALLOCATION
┌──────────────────────────────────────────────────────────────────────
│                     KEY GENERATION SERVICE
│
│  Maintains ranges of pre-generated short codes:
│    Range 1: aaa0000 - aaa9999   (10,000 codes)
│    Range 2: aab0000 - aab9999   (10,000 codes)
│    Range 3: aac0000 - aac9999   (10,000 codes)
│    ...
│
│  API servers request a range, use it locally until exhausted
└──────────────────────────────────────────────────────────────────────

   ┌──────────┐      ┌──────────┐      ┌──────────┐
   │ Server 1 │      │ Server 2 │      │ Server 3 │
   │          │      │          │      │          │
   │ Range:   │      │ Range:   │      │ Range:   │
   │ aaa****  │      │ aab****  │      │ aac****  │
   │          │      │          │      │          │
   │ Next:    │      │ Next:    │      │ Next:    │
   │ aaa0042  │      │ aab0007  │      │ aac0099  │
   └──────────┘      └──────────┘      └──────────┘
Implementation
# Short Code Generation with Pre-allocated Ranges
# Applies: Week 1, Day 1 - Partitioning concepts
import threading
from dataclasses import dataclass
from typing import Optional
import redis
@dataclass
class KeyRange:
"""A range of pre-allocated short codes."""
prefix: str # e.g., "aab"
current: int # Current position in range
max_value: int # Maximum value in range (e.g., 9999)
def next_code(self) -> Optional[str]:
"""Get next available code from this range."""
if self.current > self.max_value:
return None # Range exhausted
# Generate code: prefix + zero-padded number
code = f"{self.prefix}{self.current:04d}"
self.current += 1
return code
def remaining(self) -> int:
"""How many codes left in this range."""
return max(0, self.max_value - self.current + 1)
class ShortCodeGenerator:
"""
Generates unique short codes using pre-allocated ranges.
Key concepts applied:
- Partitioning (Day 1): Each server gets its own range
- No coordination needed for individual code generation
- Range allocation is the only synchronized operation
"""
def __init__(self, redis_client: redis.Redis, server_id: str):
self.redis = redis_client
self.server_id = server_id
self.current_range: Optional[KeyRange] = None
self.lock = threading.Lock()
# Configuration
self.range_size = 10_000 # Codes per range
self.refill_threshold = 1_000 # Get new range when below this
def generate(self) -> str:
"""
Generate a unique short code.
This is O(1) in the common case - just increment a counter.
Only contacts central service when range is exhausted.
"""
with self.lock:
# Check if we need a new range
if self._needs_new_range():
self._allocate_new_range()
# Generate code from current range
code = self.current_range.next_code()
if code is None:
# Range exhausted mid-operation (shouldn't happen often)
self._allocate_new_range()
code = self.current_range.next_code()
return code
def _needs_new_range(self) -> bool:
"""Check if we need to allocate a new range."""
if self.current_range is None:
return True
return self.current_range.remaining() < self.refill_threshold
def _allocate_new_range(self) -> None:
"""
Allocate a new range from the central service.
Uses Redis INCR for atomic range allocation.
This is the only point of coordination.
"""
# Atomically get next range number
range_num = self.redis.incr("shortcode:range_counter")
# Convert range number to prefix (base-62 encoding)
prefix = self._range_to_prefix(range_num)
self.current_range = KeyRange(
prefix=prefix,
current=0,
max_value=self.range_size - 1
)
# Log allocation for debugging
print(f"Server {self.server_id} allocated range: {prefix}****")
def _range_to_prefix(self, range_num: int) -> str:
"""
Convert range number to 3-character prefix.
Uses base-62 encoding (a-z, A-Z, 0-9).
        This gives us 62^3 = 238,328 possible prefixes, i.e. ~2.38 billion
        codes at 10K codes per prefix. A production system would widen the
        prefix (or enlarge ranges) to tap the full 62^7 keyspace.
"""
chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
base = len(chars)
prefix = ""
num = range_num
for _ in range(3):
prefix = chars[num % base] + prefix
num //= base
return prefix
# Usage example (assumes module-level db and redis async client objects)
async def create_short_url(long_url: str, generator: ShortCodeGenerator) -> str:
"""Create a new short URL."""
# Generate unique code (O(1), no DB lookup needed)
short_code = generator.generate()
# Store mapping in database
await db.execute(
"""
INSERT INTO urls (short_code, long_url, created_at, expires_at)
VALUES ($1, $2, NOW(), NOW() + INTERVAL '5 years')
""",
short_code, long_url
)
# Cache for fast redirects
await redis.setex(f"url:{short_code}", 86400, long_url)
return f"https://tiny.link/{short_code}"
Why This Approach?
You: "This design applies the partitioning principles we learned:"
| Principle | Application |
|---|---|
| Partition by range | Each server owns a range of codes |
| Minimize coordination | Only range allocation needs sync |
| Even distribution | Ranges allocated round-robin |
| No hot spots | Each server generates from its own range |
Interviewer: "What if a server crashes with unused codes in its range?"
You: "Great question. Those codes are simply lost β but that's acceptable. With 3.5 trillion possible codes and only 182 billion needed over 5 years, losing a few thousand codes per crash is negligible. The simplicity gained is worth it."
Deep Dive 2: Handling Hot URLs (Week 1, Day 4 - Hot Keys)
Interviewer: "You mentioned viral URLs getting 10x traffic. How do you handle that?"
You: "This is exactly the hot key problem we studied. Let me show you the solution."
The Problem
HOT URL SCENARIO
Normal URL: 10 requests/second
Viral URL: 100,000 requests/second
If all requests for a viral URL hit the same node:
- Cache server: that server is overloaded
- Database shard: that shard becomes the bottleneck
- Either way, a single point of failure
Examples:
- Celebrity tweets a TinyLink
- Product launch (Apple keynote links)
- News events (election results link)
The Solution: Multi-Layer Hot Key Mitigation
HOT KEY DEFENSE LAYERS
Layer 1: CDN Caching
─────────────────────────────────────────────────────────────────────
CloudFlare caches redirect responses at 200+ edge locations
Hit rate for hot URLs: >95%
Only 5% of requests reach our servers

Layer 2: Local Server Cache
─────────────────────────────────────────────────────────────────────
Each API server maintains local LRU cache
Hot URLs naturally rise to top of cache
Eliminates Redis calls for hottest URLs

Layer 3: Redis Read Replicas
─────────────────────────────────────────────────────────────────────
Multiple Redis replicas for read scaling
Random replica selection distributes load
Hot keys spread across all replicas

Layer 4: Key Replication with Suffixes
─────────────────────────────────────────────────────────────────────
For extremely hot keys, replicate with suffixes:
url:abc1234 → url:abc1234:0, url:abc1234:1, url:abc1234:2
Client randomly picks suffix, spreading load
Implementation
# Hot Key Handling for URL Redirects
# Applies: Week 1, Day 4 - Hot Keys and Skew
import random
import time
from functools import lru_cache
from typing import Optional, Tuple
import redis
class HotKeyAwareCache:
"""
Multi-layer cache with hot key detection and mitigation.
Applies Day 4 concepts:
- Detection: Track access frequency
- Mitigation: Multiple layers of caching
- Distribution: Spread hot keys across replicas
"""
def __init__(self, redis_cluster: redis.RedisCluster):
self.redis = redis_cluster
self.hot_key_threshold = 1000 # requests/minute
self.replica_count = 3 # Replicas for hot keys
# Local tracking of access patterns
self.access_counts: dict[str, int] = {}
self.hot_keys: set[str] = set()
self.last_reset = time.time()
# Layer 2: Local server cache (LRU)
@lru_cache(maxsize=10_000)
def _local_cache_get(self, key: str, version: int) -> Optional[str]:
"""
Local in-memory cache.
The 'version' parameter is a trick to invalidate cache
entries when needed (pass incremented version).
"""
return None # Actual value comes from decorator's cache
def get(self, short_code: str) -> Optional[str]:
"""
Get long URL for short code with hot key handling.
"""
key = f"url:{short_code}"
        # Layer 2: a per-process LRU cache would be consulted here first
        # (the _local_cache_get stub above sketches it; not wired in)
# Track access for hot key detection
self._track_access(key)
# Layer 3 & 4: Redis lookup with hot key awareness
if key in self.hot_keys:
# Hot key: use replicated keys
return self._get_hot_key(key)
else:
# Normal key: direct lookup
return self._get_normal_key(key)
def _get_normal_key(self, key: str) -> Optional[str]:
"""Normal Redis lookup."""
return self.redis.get(key)
def _get_hot_key(self, key: str) -> Optional[str]:
"""
Hot key lookup with distribution.
Hot keys are replicated with suffixes to spread load.
Client randomly selects a replica.
"""
# Randomly select replica
replica_suffix = random.randint(0, self.replica_count - 1)
replicated_key = f"{key}:{replica_suffix}"
value = self.redis.get(replicated_key)
if value is None:
# Fallback to original key (replication might be in progress)
value = self.redis.get(key)
return value
def set(self, short_code: str, long_url: str, ttl: int = 86400) -> None:
"""
Set URL mapping with hot key awareness.
"""
key = f"url:{short_code}"
# Always set the primary key
self.redis.setex(key, ttl, long_url)
# If this is a hot key, also set replicas
if key in self.hot_keys:
self._replicate_hot_key(key, long_url, ttl)
def _replicate_hot_key(self, key: str, value: str, ttl: int) -> None:
"""Create replicas of a hot key."""
pipeline = self.redis.pipeline()
for i in range(self.replica_count):
pipeline.setex(f"{key}:{i}", ttl, value)
pipeline.execute()
def _track_access(self, key: str) -> None:
"""
Track access patterns to detect hot keys.
Simple sliding window counter.
In production, use Redis HyperLogLog or streaming algorithms.
"""
current_time = time.time()
# Reset counts every minute
if current_time - self.last_reset > 60:
self._analyze_and_reset()
# Increment access count
self.access_counts[key] = self.access_counts.get(key, 0) + 1
def _analyze_and_reset(self) -> None:
"""Analyze access patterns and identify hot keys."""
new_hot_keys = set()
for key, count in self.access_counts.items():
if count > self.hot_key_threshold:
new_hot_keys.add(key)
# If newly hot, create replicas
if key not in self.hot_keys:
self._promote_to_hot(key)
# Keys that cooled down
for key in self.hot_keys - new_hot_keys:
self._demote_from_hot(key)
self.hot_keys = new_hot_keys
self.access_counts = {}
self.last_reset = time.time()
def _promote_to_hot(self, key: str) -> None:
"""Promote a key to hot status by creating replicas."""
value = self.redis.get(key)
if value:
ttl = self.redis.ttl(key)
self._replicate_hot_key(key, value, max(ttl, 3600))
print(f"Promoted hot key: {key}")
def _demote_from_hot(self, key: str) -> None:
"""Remove replicas for a key that's no longer hot."""
pipeline = self.redis.pipeline()
for i in range(self.replica_count):
pipeline.delete(f"{key}:{i}")
pipeline.execute()
print(f"Demoted key: {key}")
# Redirect service using hot key aware cache
async def redirect(short_code: str, cache: HotKeyAwareCache) -> str:
"""
Handle URL redirect with hot key protection.
"""
# Try cache first (handles hot keys automatically)
long_url = cache.get(short_code)
if long_url is None:
# Cache miss - query database
long_url = await db.fetchval(
"SELECT long_url FROM urls WHERE short_code = $1",
short_code
)
if long_url is None:
raise NotFoundError(f"URL not found: {short_code}")
# Populate cache
cache.set(short_code, long_url)
# Fire-and-forget analytics event
await analytics.record_click(short_code)
return long_url
Hot Key Defense Summary
| Layer | Technology | Hit Rate | Latency |
|---|---|---|---|
| CDN | CloudFlare | 95%+ for hot | <10ms |
| Local Cache | LRU in-memory | 80% for warm | <1ms |
| Redis Replicas | Redis Cluster | 99%+ | <5ms |
| Key Replication | Suffix distribution | 100% | <5ms |
Interviewer: "How do you handle the case where a URL goes viral suddenly?"
You: "The CDN provides immediate relief β it starts caching within seconds of the first request. Our hot key detection runs every minute, so there's a brief window where a new viral URL hits Redis directly. But with Redis handling 100K+ ops/sec, we can absorb the initial spike while detection kicks in."
Deep Dive 3: Database Sharding Strategy (Week 1, Day 1 - Partitioning)
Interviewer: "You mentioned PostgreSQL is sharded. How do you decide which shard holds which URL?"
You: "This is a classic partitioning decision. Let me walk through the options and my recommendation."
Sharding Strategy Analysis
OPTION 1: Hash by Short Code
─────────────────────────────────────────────────────────────────────
shard = hash(short_code) % num_shards
Pros:
✓ Even distribution (hash spreads uniformly)
✓ Simple lookup (compute hash, go to shard)
✓ No hot shards (assuming no hot keys)
Cons:
✗ Can't query by user ("show me my URLs")
✗ Can't query by time ("URLs created today")
✗ Resharding requires moving data
Best for: Simple key-value lookups (our redirect path)

OPTION 2: Hash by User ID
─────────────────────────────────────────────────────────────────────
shard = hash(user_id) % num_shards
Pros:
✓ All user's URLs on same shard
✓ Efficient "show my URLs" queries
✓ User operations are single-shard
Cons:
✗ Power users create skew (one user = millions of URLs)
✗ Still can't query by time
✗ Anonymous URLs have no user_id
Best for: User-centric applications

OPTION 3: Range by Time (Monthly)
─────────────────────────────────────────────────────────────────────
shard = year_month(created_at)
2024-01 → shard_2024_01
2024-02 → shard_2024_02
Pros:
✓ Old data easily archived (drop old shards)
✓ Time-based queries efficient
✓ Natural data lifecycle management
Cons:
✗ Current month shard is HOT (all writes)
✗ Need to query all shards for redirect (don't know creation time)
✗ Uneven shard sizes if traffic varies
Best for: Time-series data, logs, analytics
My Recommendation: Hybrid Approach
You: "I recommend a hybrid approach that optimizes for our two main access patterns:"
HYBRID SHARDING STRATEGY

For REDIRECTS (read path):
─────────────────────────────────────────────────────────────────────
Use consistent hashing on short_code
short_code "abc1234" → hash → shard_7
Why: Redirects need to be fast, and since we know the short_code we can
compute the shard instantly.

For USER QUERIES (management path):
─────────────────────────────────────────────────────────────────────
Secondary index table, sharded by user_id
user_urls table:
├── user_id (shard key)
├── short_code
├── created_at
└── (lightweight, just for listing)
Why: Users need to see their URLs, so keep each user's data together.

The URLs live on short_code shards (for fast redirects).
The user_urls index lives on user_id shards (for fast listing).
The write path creates entries in both.
Implementation: Consistent Hashing
# Database Sharding with Consistent Hashing
# Applies: Week 1, Day 1 - Partitioning strategies
import hashlib
from bisect import bisect_left
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
import asyncpg
@dataclass
class ShardConfig:
"""Configuration for a database shard."""
shard_id: int
host: str
port: int
database: str
@property
def dsn(self) -> str:
return f"postgresql://{self.host}:{self.port}/{self.database}"
class ConsistentHashRing:
"""
Consistent hash ring for shard selection.
Applies Day 1 concepts:
- Consistent hashing minimizes data movement on rebalance
- Virtual nodes ensure even distribution
- Deterministic: same key always maps to same shard
"""
def __init__(self, shards: List[ShardConfig], virtual_nodes: int = 150):
self.shards = {s.shard_id: s for s in shards}
self.virtual_nodes = virtual_nodes
self.ring: List[tuple[int, int]] = [] # (hash, shard_id)
self._build_ring()
def _build_ring(self) -> None:
"""Build the consistent hash ring with virtual nodes."""
ring = []
for shard in self.shards.values():
# Create virtual nodes for each shard
for i in range(self.virtual_nodes):
# Hash the shard_id + virtual node index
key = f"{shard.shard_id}:{i}"
hash_val = self._hash(key)
ring.append((hash_val, shard.shard_id))
# Sort by hash value
ring.sort(key=lambda x: x[0])
self.ring = ring
# Extract just the hash values for binary search
self.ring_hashes = [h for h, _ in ring]
def _hash(self, key: str) -> int:
"""Compute hash value for a key."""
return int(hashlib.md5(key.encode()).hexdigest(), 16)
def get_shard(self, key: str) -> ShardConfig:
"""
Get the shard responsible for a key.
Uses binary search for O(log n) lookup.
"""
if not self.ring:
raise ValueError("No shards configured")
hash_val = self._hash(key)
# Find the first node with hash >= key's hash
idx = bisect_left(self.ring_hashes, hash_val)
# Wrap around if we're past the last node
if idx >= len(self.ring):
idx = 0
shard_id = self.ring[idx][1]
return self.shards[shard_id]
def add_shard(self, shard: ShardConfig) -> Dict[int, List[str]]:
"""
Add a new shard to the ring.
Returns mapping of shard_id -> keys that need to move.
With consistent hashing, only ~1/n keys need to move.
"""
self.shards[shard.shard_id] = shard
self._build_ring()
# In practice, you'd track which keys moved
return {}
def remove_shard(self, shard_id: int) -> Dict[int, List[str]]:
"""
Remove a shard from the ring.
Returns mapping of destination shard_id -> keys to receive.
"""
if shard_id in self.shards:
del self.shards[shard_id]
self._build_ring()
return {}
class ShardedURLDatabase:
"""
Sharded database client for URL storage.
Handles:
- Shard selection via consistent hashing
- Connection pooling per shard
- Cross-shard queries when needed
"""
def __init__(self, shard_configs: List[ShardConfig]):
self.hash_ring = ConsistentHashRing(shard_configs)
self.pools: Dict[int, asyncpg.Pool] = {}
async def initialize(self) -> None:
"""Create connection pools for all shards."""
for shard in self.hash_ring.shards.values():
self.pools[shard.shard_id] = await asyncpg.create_pool(
shard.dsn,
min_size=5,
max_size=20
)
async def create_url(
self,
short_code: str,
long_url: str,
user_id: Optional[str] = None
) -> None:
"""
Create a new URL mapping.
Writes to:
1. URL shard (keyed by short_code) - for redirects
2. User shard (keyed by user_id) - for listing
"""
# Get shard for this short_code
shard = self.hash_ring.get_shard(short_code)
pool = self.pools[shard.shard_id]
async with pool.acquire() as conn:
await conn.execute(
"""
INSERT INTO urls (short_code, long_url, user_id, created_at, expires_at)
VALUES ($1, $2, $3, NOW(), NOW() + INTERVAL '5 years')
""",
short_code, long_url, user_id
)
# Also update user index if user is known
if user_id:
await self._update_user_index(user_id, short_code)
async def get_url(self, short_code: str) -> Optional[str]:
"""
Get long URL by short code.
Single-shard operation - very fast.
"""
shard = self.hash_ring.get_shard(short_code)
pool = self.pools[shard.shard_id]
async with pool.acquire() as conn:
row = await conn.fetchrow(
"""
SELECT long_url FROM urls
WHERE short_code = $1
AND expires_at > NOW()
""",
short_code
)
return row['long_url'] if row else None
async def get_user_urls(
self,
user_id: str,
limit: int = 100
) -> List[Dict[str, Any]]:
"""
Get all URLs for a user.
Queries user index shard, then fetches details from URL shards.
"""
# Get user's shard
user_shard = self.hash_ring.get_shard(f"user:{user_id}")
pool = self.pools[user_shard.shard_id]
async with pool.acquire() as conn:
rows = await conn.fetch(
"""
SELECT short_code, created_at
FROM user_urls
WHERE user_id = $1
ORDER BY created_at DESC
LIMIT $2
""",
user_id, limit
)
        # Fetch full details from the URL shards (sequential here; could be parallelized with asyncio.gather)
urls = []
for row in rows:
url_data = await self.get_url(row['short_code'])
if url_data:
urls.append({
'short_code': row['short_code'],
'long_url': url_data,
'created_at': row['created_at']
})
return urls
async def _update_user_index(self, user_id: str, short_code: str) -> None:
"""Update the user's URL index."""
user_shard = self.hash_ring.get_shard(f"user:{user_id}")
pool = self.pools[user_shard.shard_id]
async with pool.acquire() as conn:
await conn.execute(
"""
INSERT INTO user_urls (user_id, short_code, created_at)
VALUES ($1, $2, NOW())
""",
user_id, short_code
)
Shard Distribution Visualization
CONSISTENT HASH RING
              0
              │
      ┌───────┴───────┐
      │    Shard 1    │
 270 ─┤               ├─ 90
      │    Shard 3    │
      │    Shard 2    │
      └───────┬───────┘
              │
             180
When a new shard is added:
- Only keys in one segment need to move
- ~1/n of total keys (not all keys)
- Minimal disruption
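To back up the "~1/n of total keys" claim, here is a quick experiment using the ConsistentHashRing and ShardConfig classes above (hosts and key counts are arbitrary):

# Quick check of the "~1/n keys move" property of the ring above.
def remap_fraction(num_shards: int = 5, num_keys: int = 100_000) -> float:
    shards = [ShardConfig(i, f"db-{i}.internal", 5432, "tinylink") for i in range(num_shards)]
    ring = ConsistentHashRing(shards)
    before = {f"key{i}": ring.get_shard(f"key{i}").shard_id for i in range(num_keys)}
    # Add one more shard and count how many keys land somewhere new
    ring.add_shard(ShardConfig(num_shards, f"db-{num_shards}.internal", 5432, "tinylink"))
    moved = sum(1 for key, old in before.items() if ring.get_shard(key).shard_id != old)
    return moved / num_keys

print(f"keys moved: {remap_fraction():.1%}")   # roughly 1/6 ≈ 17% when going from 5 to 6 shards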
Deep Dive 4: Rate Limiting (Week 1, Day 3 - Rate Limiting at Scale)
Interviewer: "You mentioned rate limiting for the API. How would you implement that at this scale?"
You: "Let me show you a distributed rate limiting approach that handles our scale."
The Problem
RATE LIMITING REQUIREMENTS
API Tiers:
├── Free: 100 requests/minute per API key
├── Starter: 1,000 requests/minute
├── Business: 10,000 requests/minute
└── Enterprise: 100,000 requests/minute
Challenges:
- Distributed servers (can't use local counters)
- Must be fast (<5ms overhead)
- Handle clock skew across servers
- Graceful degradation if rate limiter fails
Solution: Sliding Window with Redis
# Distributed Rate Limiting
# Applies: Week 1, Day 3 - Rate Limiting at Scale
import time
from dataclasses import dataclass
from enum import Enum
from typing import Tuple, Optional
import redis
class RateLimitTier(Enum):
FREE = 100
STARTER = 1_000
BUSINESS = 10_000
ENTERPRISE = 100_000
@dataclass
class RateLimitResult:
"""Result of a rate limit check."""
allowed: bool
limit: int
remaining: int
reset_at: int # Unix timestamp when limit resets
retry_after: Optional[int] = None # Seconds to wait if blocked
class SlidingWindowRateLimiter:
"""
Sliding window rate limiter using Redis.
Applies Day 3 concepts:
- Sliding window for smooth limiting (no burst at window start)
- Redis for distributed state
- Lua script for atomic operations
"""
# Lua script for atomic rate limit check and update
RATE_LIMIT_SCRIPT = """
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
-- Remove old entries outside the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count current requests in window
local count = redis.call('ZCARD', key)
if count < limit then
-- Under limit: add this request
redis.call('ZADD', key, now, now .. ':' .. math.random())
redis.call('EXPIRE', key, window)
return {1, limit - count - 1} -- allowed, remaining
else
-- Over limit: reject
-- Get oldest entry to calculate retry time
local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
local retry_after = 0
if #oldest > 0 then
retry_after = math.ceil(oldest[2] + window - now)
end
return {0, 0, retry_after} -- blocked, remaining, retry_after
end
"""
def __init__(self, redis_client: redis.Redis):
self.redis = redis_client
self.window_seconds = 60 # 1-minute window
# Register the Lua script
self.rate_limit_sha = self.redis.script_load(self.RATE_LIMIT_SCRIPT)
def check(
self,
api_key: str,
tier: RateLimitTier
) -> RateLimitResult:
"""
Check if request is allowed under rate limit.
Uses sliding window algorithm:
- Smoother than fixed windows (no thundering herd at window start)
- More accurate than token bucket for per-minute limits
- Single Redis round-trip with Lua script
"""
key = f"ratelimit:{api_key}"
now = time.time()
limit = tier.value
try:
result = self.redis.evalsha(
self.rate_limit_sha,
1, # Number of keys
key,
now,
self.window_seconds,
limit
)
allowed = result[0] == 1
remaining = result[1]
if allowed:
return RateLimitResult(
allowed=True,
limit=limit,
remaining=remaining,
reset_at=int(now) + self.window_seconds
)
else:
retry_after = result[2] if len(result) > 2 else self.window_seconds
return RateLimitResult(
allowed=False,
limit=limit,
remaining=0,
reset_at=int(now) + retry_after,
retry_after=retry_after
)
except redis.RedisError as e:
# Fail open: if Redis is down, allow the request
# Log and alert, but don't block users
print(f"Rate limiter error: {e}")
return RateLimitResult(
allowed=True,
limit=limit,
remaining=limit,
reset_at=int(now) + self.window_seconds
)
def get_headers(self, result: RateLimitResult) -> dict:
"""Generate standard rate limit response headers."""
headers = {
'X-RateLimit-Limit': str(result.limit),
'X-RateLimit-Remaining': str(result.remaining),
'X-RateLimit-Reset': str(result.reset_at)
}
if not result.allowed:
headers['Retry-After'] = str(result.retry_after)
return headers
# FastAPI middleware integration
from fastapi import Request, HTTPException
from fastapi.responses import JSONResponse
class RateLimitMiddleware:
"""
FastAPI middleware for rate limiting.
"""
def __init__(self, rate_limiter: SlidingWindowRateLimiter):
self.rate_limiter = rate_limiter
async def __call__(self, request: Request, call_next):
# Extract API key from header
api_key = request.headers.get('X-API-Key')
if not api_key:
# No API key = free tier with IP-based limiting
api_key = f"ip:{request.client.host}"
tier = RateLimitTier.FREE
else:
# Look up tier for this API key (cached)
tier = await self._get_tier(api_key)
# Check rate limit
result = self.rate_limiter.check(api_key, tier)
if not result.allowed:
return JSONResponse(
status_code=429,
content={
'error': 'Rate limit exceeded',
'retry_after': result.retry_after
},
headers=self.rate_limiter.get_headers(result)
)
# Process request
response = await call_next(request)
# Add rate limit headers to response
for key, value in self.rate_limiter.get_headers(result).items():
response.headers[key] = value
return response
async def _get_tier(self, api_key: str) -> RateLimitTier:
"""Look up the rate limit tier for an API key."""
# In production, this would query a cache/database
# For now, return a default
return RateLimitTier.STARTER
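Wiring the middleware into an application might look like this; Starlette's BaseHTTPMiddleware dispatch hook matches the __call__ signature above (the Redis host is an assumption):

# Sketch: registering the rate limiter with a FastAPI app.
import redis
from fastapi import FastAPI
from starlette.middleware.base import BaseHTTPMiddleware

app = FastAPI()
limiter = SlidingWindowRateLimiter(redis.Redis(host="redis", port=6379))
app.add_middleware(BaseHTTPMiddleware, dispatch=RateLimitMiddleware(limiter))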
Rate Limiting Visualization
SLIDING WINDOW RATE LIMITING
Timeline (requests marked as |):
─────────────────────────────────────────────────────────────────────

Fixed Window Problem:
         Window 1             Window 2
  [          ||||||||||]  [||||||||||          ]
                        ▲
          User sends 10 at end of window 1,
          then 10 at start of window 2
          = 20 requests in ~1 second! ✗

Sliding Window Solution:
  ◀──── 60-second window slides with time ────▶
  Time:   0         30         60         90
                    [──────────────────────]
                         Window at t=60
  Counts ALL requests in the last 60 seconds
  No gaming the window boundary ✓
Deep Dive 5: Replication and Consistency (Week 1, Day 2 - Replication)
Interviewer: "You mentioned PostgreSQL replication. What's your replication strategy?"
You: "Let me explain how we balance consistency and availability for different operations."
Replication Strategy
REPLICATION TOPOLOGY
                  ┌──────────────────┐
                  │     Primary      │
                  │    PostgreSQL    │
                  │     (Writes)     │
                  └────────┬─────────┘
                           │
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
     ┌──────────┐    ┌──────────┐    ┌──────────┐
     │ Replica  │    │ Replica  │    │ Replica  │
     │    1     │    │    2     │    │    3     │
     │  (Sync)  │    │ (Async)  │    │ (Async)  │
     └──────────┘    └──────────┘    └──────────┘
           │               │               │
           └───────────────┼───────────────┘
                           ▼
                ┌─────────────────────┐
                │    Read Queries     │
                │   (Load Balanced)   │
                └─────────────────────┘
Consistency Requirements by Operation
# Replication and Consistency Configuration
# Applies: Week 1, Day 2 - Replication Trade-offs
from enum import Enum
from typing import Optional
import asyncpg
class ConsistencyLevel(Enum):
"""
Consistency levels for different operations.
Applies Day 2 concepts:
- Strong consistency when correctness matters
- Eventual consistency when speed matters
"""
STRONG = "strong" # Read from primary
READ_YOUR_WRITES = "ryw" # Use session affinity
EVENTUAL = "eventual" # Any replica is fine
class ReplicatedDatabase:
"""
Database client with configurable consistency levels.
"""
def __init__(
self,
primary_dsn: str,
replica_dsns: list[str],
sync_replica_dsn: str
):
self.primary_dsn = primary_dsn
self.replica_dsns = replica_dsns
self.sync_replica_dsn = sync_replica_dsn
self.primary_pool: Optional[asyncpg.Pool] = None
self.replica_pools: list[asyncpg.Pool] = []
self.sync_replica_pool: Optional[asyncpg.Pool] = None
async def initialize(self) -> None:
"""Create connection pools."""
self.primary_pool = await asyncpg.create_pool(
self.primary_dsn,
min_size=10,
max_size=50
)
for dsn in self.replica_dsns:
pool = await asyncpg.create_pool(dsn, min_size=5, max_size=20)
self.replica_pools.append(pool)
self.sync_replica_pool = await asyncpg.create_pool(
self.sync_replica_dsn,
min_size=5,
max_size=20
)
async def write(self, query: str, *args) -> None:
"""
Execute a write query on the primary.
Writes always go to primary for consistency.
"""
async with self.primary_pool.acquire() as conn:
await conn.execute(query, *args)
async def read(
self,
query: str,
*args,
consistency: ConsistencyLevel = ConsistencyLevel.EVENTUAL
):
"""
Execute a read query with specified consistency level.
Consistency levels:
- STRONG: Read from primary (freshest, but adds load)
- READ_YOUR_WRITES: Read from sync replica (good balance)
- EVENTUAL: Read from any replica (fastest, slight delay)
"""
pool = self._select_pool(consistency)
async with pool.acquire() as conn:
return await conn.fetch(query, *args)
def _select_pool(self, consistency: ConsistencyLevel) -> asyncpg.Pool:
"""Select appropriate pool based on consistency requirement."""
if consistency == ConsistencyLevel.STRONG:
return self.primary_pool
elif consistency == ConsistencyLevel.READ_YOUR_WRITES:
return self.sync_replica_pool
else: # EVENTUAL
# Round-robin across async replicas
import random
return random.choice(self.replica_pools)
# URL Service with appropriate consistency levels
class URLService:
"""
URL service demonstrating consistency trade-offs.
"""
def __init__(self, db: ReplicatedDatabase, cache: HotKeyAwareCache):
self.db = db
self.cache = cache
async def create_url(self, short_code: str, long_url: str) -> dict:
"""
Create a new URL.
Consistency: Strong write (must not lose data)
"""
await self.db.write(
"""
INSERT INTO urls (short_code, long_url, created_at)
VALUES ($1, $2, NOW())
""",
short_code, long_url
)
# Also cache it (eventual consistency OK here)
self.cache.set(short_code, long_url)
return {'short_code': short_code, 'long_url': long_url}
async def get_url_for_redirect(self, short_code: str) -> Optional[str]:
"""
Get URL for redirect (critical path).
Consistency: Eventual (speed matters most)
Stale read is OK: worst case, user sees old destination
for a few hundred milliseconds after an update.
"""
# Try cache first
long_url = self.cache.get(short_code)
if long_url:
return long_url
# Cache miss: query replica
rows = await self.db.read(
"SELECT long_url FROM urls WHERE short_code = $1",
short_code,
consistency=ConsistencyLevel.EVENTUAL
)
if rows:
long_url = rows[0]['long_url']
self.cache.set(short_code, long_url)
return long_url
return None
async def get_url_after_create(self, short_code: str) -> Optional[dict]:
"""
Get URL immediately after creation.
Consistency: Read-your-writes (user expects to see their URL)
After creating a URL, user immediately views it.
Must see the URL they just created.
"""
rows = await self.db.read(
"""
SELECT short_code, long_url, created_at, click_count
FROM urls WHERE short_code = $1
""",
short_code,
consistency=ConsistencyLevel.READ_YOUR_WRITES
)
if rows:
return dict(rows[0])
return None
async def get_analytics(self, short_code: str) -> Optional[dict]:
"""
Get analytics for a URL.
Consistency: Eventual (analytics can be slightly delayed)
Analytics are inherently delayed anyway, so eventual
consistency is perfect here.
"""
rows = await self.db.read(
"""
            SELECT DATE(clicked_at) AS date,
                   COUNT(*) AS clicks
FROM url_clicks
WHERE short_code = $1
GROUP BY DATE(clicked_at)
ORDER BY date DESC
LIMIT 30
""",
short_code,
consistency=ConsistencyLevel.EVENTUAL
)
return {'daily_clicks': [dict(r) for r in rows]}
Replication Trade-off Summary
| Operation | Consistency | Why |
|---|---|---|
| URL Creation | Strong (primary) | Must not lose writes |
| Redirect Lookup | Eventual (any replica) | Speed is critical, stale OK |
| User's URL List | Read-your-writes | User expects to see their URLs |
| Analytics | Eventual | Already delayed, speed preferred |
| URL Deletion | Strong + cache invalidation | Must take effect immediately |
Phase 5: Scaling and Edge Cases (5 minutes)
Interviewer: "How would this system scale to 10x the current load?"
Scaling Strategy
You: "Let me walk through the scaling vectors."
Horizontal Scaling Plan
CURRENT → 10X SCALE
Component Current 10x Scale
─────────────────────────────────────────────────────────────────────
API Servers 10 instances 100 instances (stateless)
Redirect Servers 20 instances 200 instances (stateless)
Redis Cache 3-node cluster 30-node cluster
PostgreSQL 5 shards 50 shards
Kafka 3 brokers 12 brokers
Bottleneck Analysis
BOTTLENECK ANALYSIS
Component Limit Scaling Solution
─────────────────────────────────────────────────────────────────────
CDN Virtually unlimited CloudFlare handles automatically
Load Balancer 1M conn/sec Add more ALBs, use NLB for L4
API Servers ~500 req/s each Add instances (stateless)
Redis ~100K ops/s/node Add nodes to cluster
PostgreSQL ~10K writes/s/shard Add more shards
Network 10 Gbps Multiple NICs, better instances
Edge Cases
Interviewer: "What edge cases should we handle?"
You: "Several important ones..."
Edge Case 1: Duplicate Short Code
Scenario: Two servers generate the same short code simultaneously
(shouldn't happen with our range allocation, but...)
Handling:
- Database has UNIQUE constraint on short_code
- If insert fails with duplicate error, retry with new code
- Maximum 3 retries before returning error
Code:
for attempt in range(3):
code = generator.generate()
try:
await db.create_url(code, long_url)
return code
except UniqueViolationError:
continue # Try again
raise ServiceError("Failed to generate unique code")
Edge Case 2: Expired URL Still Cached
Scenario: URL expires in database but remains in Redis cache
User can still access expired URL
Handling:
- Include expiration time in cache value
- Check expiration on cache hit, not just DB lookup
- Background job to clean expired URLs from cache
Cache value format:
{
"long_url": "https://example.com/...",
"expires_at": 1735689600
}
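A sketch of the expiry-aware cache entry described above (field names follow the cache value format shown; the key prefix matches the rest of the design):

# Sketch: store the expiration alongside the URL so a cache hit can still expire.
import json
import time
from typing import Optional
import redis

r = redis.Redis()

def cache_url(short_code: str, long_url: str, expires_at: int, ttl: int = 86_400) -> None:
    value = json.dumps({"long_url": long_url, "expires_at": expires_at})
    # Cap the Redis TTL at the URL's own remaining lifetime
    r.setex(f"url:{short_code}", min(ttl, max(1, expires_at - int(time.time()))), value)

def get_cached_url(short_code: str) -> Optional[str]:
    raw = r.get(f"url:{short_code}")
    if raw is None:
        return None
    entry = json.loads(raw)
    if entry["expires_at"] <= time.time():
        r.delete(f"url:{short_code}")   # lazily evict entries the background job missed
        return None
    return entry["long_url"]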
Edge Case 3: Database Failover During Write
Scenario: Primary database fails mid-write
URL created but not confirmed to user
Handling:
- Use idempotency keys (user can retry safely)
- On retry, check if URL already exists with same idempotency key
- If exists, return existing URL instead of error
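One way to implement that retry path, assuming an idempotency_key column with a UNIQUE constraint (the column and helper names are illustrative):

# Sketch: idempotent URL creation so client retries after a failover are safe.
async def create_url_idempotent(conn, idempotency_key: str, long_url: str, generator) -> str:
    # If a previous attempt already committed, return the same short code
    existing = await conn.fetchval(
        "SELECT short_code FROM urls WHERE idempotency_key = $1", idempotency_key
    )
    if existing:
        return existing
    short_code = generator.generate()
    await conn.execute(
        """
        INSERT INTO urls (short_code, long_url, idempotency_key, created_at, expires_at)
        VALUES ($1, $2, $3, NOW(), NOW() + INTERVAL '5 years')
        ON CONFLICT (idempotency_key) DO NOTHING
        """,
        short_code, long_url, idempotency_key,
    )
    # If a concurrent retry won the race, return whichever code was stored
    return await conn.fetchval(
        "SELECT short_code FROM urls WHERE idempotency_key = $1", idempotency_key
    )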
Failure Scenarios
| Failure | Detection | Impact | Recovery |
|---|---|---|---|
| Redis down | Health checks | Slower redirects (DB fallback) | Fail open, use DB directly |
| Primary DB down | Connection errors | No new URL creation | Promote sync replica to primary |
| CDN down | External monitoring | Higher origin load | DNS failover to backup CDN |
| API server crash | ALB health checks | Requests routed elsewhere | Auto-scaling replaces instance |
Phase 6: Monitoring and Operations
Interviewer: "How would you monitor this system in production?"
Key Metrics
You: "I would track metrics at multiple levels."
Business Metrics
- URLs created per minute: Target 1,200/min, Alert if <800 or >3,000
- Redirects per second: Target 12,000/sec, Alert if <5,000
- 404 rate: Target <0.1%, Alert if >1%
- Unique short codes remaining: Alert if <10% capacity
System Metrics
MONITORING DASHBOARD
SERVICE HEALTH
├── Redirect latency p99:   < 50ms         [██████████] 42ms
├── Create latency p99:     < 200ms        [██████████] 156ms
├── Error rate:             < 0.1%         [██████████] 0.02%
└── Throughput:             > 10K/sec      [██████████] 11,247/sec
CACHE
├── Redis hit rate:         > 95%          [██████████] 97%
├── Redis latency p99:      < 5ms          [██████████] 1.2ms
└── Cache size:             < 80% capacity [██████████] 62%
DATABASE
├── Query latency p99:      < 50ms         [██████████] 23ms
├── Connection pool usage:  < 80%          [██████████] 58%
├── Replication lag:        < 100ms        [██████████] 45ms
└── Disk usage:             < 70%          [██████████] 52%
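Instrumenting the redirect path for these dashboards might look like this with prometheus_client (metric names are illustrative; resolve() stands in for the real cache and database lookup):

# Sketch: exporting redirect-path metrics for the dashboard above.
import time
from prometheus_client import Counter, Histogram, start_http_server

REDIRECTS = Counter("tinylink_redirects_total", "Redirects served", ["status"])
REDIRECT_LATENCY = Histogram("tinylink_redirect_seconds", "Redirect latency in seconds")

def resolve(short_code: str):
    """Placeholder for the cache + database lookup described earlier."""
    return "https://example.com/landing" if short_code == "abc1234" else None

def serve_redirect(short_code: str):
    with REDIRECT_LATENCY.time():               # observe wall-clock latency per redirect
        long_url = resolve(short_code)
        REDIRECTS.labels(status="301" if long_url else "404").inc()
        return long_url

if __name__ == "__main__":
    start_http_server(9100)                     # exposes /metrics for Prometheus to scrape
    serve_redirect("abc1234")
    time.sleep(60)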
Alerting Strategy
CRITICAL (PagerDuty - wake someone up):
- Redirect error rate > 5%
- Database primary unreachable
- All Redis nodes down
- Redirect latency p99 > 500ms
WARNING (Slack - business hours):
- Redirect error rate > 1%
- Cache hit rate < 90%
- Replication lag > 1 second
- API rate limit errors > 10%
INFO (Dashboard only):
- New URL creation rate changes
- Traffic pattern anomalies
- Approaching capacity thresholds
Runbook Snippet
RUNBOOK: High Redirect Latency
SYMPTOMS:
- Redirect p99 > 100ms
- User complaints about slow links
DIAGNOSIS:
1. Check Redis latency:
redis-cli --latency -h redis-cluster
2. Check cache hit rate:
redis-cli INFO stats | grep keyspace_hits
3. Check database load:
SELECT * FROM pg_stat_activity WHERE state = 'active';
4. Check for hot keys:
redis-cli --hotkeys
RESOLUTION:
1. If Redis slow: Check memory, restart if needed
2. If cache miss rate high: Check TTLs, warm cache
3. If DB overloaded: Add read replicas, check slow queries
4. If hot key detected: Enable key replication (see Hot Key runbook)
ESCALATE TO: Platform team if not resolved in 15 minutes
Interview Conclusion
Interviewer: "Excellent work. You've demonstrated strong understanding of partitioning, replication, rate limiting, and hot key handling. I particularly liked how you connected each design decision to specific trade-offs. Any questions for me?"
You: "Thank you! I'd love to hear how your team currently handles URL shortening at scale, and what unexpected challenges you've faced in production."
Summary: Week 1 Concepts Applied
Week 1 Concepts (Foundations of Scale)
| Day | Concept | Application in This Design |
|---|---|---|
| Day 1 | Partitioning | Short code generation with ranges, database sharding with consistent hashing |
| Day 2 | Replication | PostgreSQL primary/replica setup, consistency levels per operation type |
| Day 3 | Rate Limiting | Sliding window rate limiter with Redis, tiered API limits |
| Day 4 | Hot Keys | Multi-layer caching, hot key detection and replication |
| Day 5 | Session Store | User session for API authentication, rate limit state |
Code Patterns Demonstrated
1. PRE-ALLOCATED KEY RANGES
- Used for: Short code generation
- Key insight: Minimize coordination by allocating ranges
2. CONSISTENT HASHING
- Used for: Database sharding
- Key insight: Minimize data movement when scaling
3. SLIDING WINDOW RATE LIMITING
- Used for: API rate limiting
- Key insight: Lua scripts for atomic Redis operations
4. HOT KEY REPLICATION
- Used for: Handling viral URLs
- Key insight: Detect hot keys, replicate with suffixes
5. CONSISTENCY LEVEL SELECTION
- Used for: Database reads
- Key insight: Match consistency to operation requirements
Self-Assessment Checklist
After studying this capstone, you should be able to:
- Design a partitioning strategy based on access patterns
- Implement consistent hashing for shard selection
- Generate unique IDs without central coordination
- Choose appropriate consistency levels for different operations
- Implement distributed rate limiting with Redis
- Detect and mitigate hot key problems
- Design multi-layer caching strategies
- Configure database replication for availability
- Handle edge cases like duplicate keys and failovers
- Define SLOs and monitoring strategies for a distributed system
This capstone integrates all concepts from Week 1 of the System Design Mastery Series. The URL shortener problem is a classic interview question that touches on partitioning, replication, caching, and rate limiting: all core topics for building scalable systems.