Himanshu Kukreja

Week 4 — Day 1: Caching Patterns

System Design Mastery Series


Preface

It's Monday morning. Your e-commerce site is crawling.

INCIDENT TIMELINE

09:00 - Marketing launches flash sale
09:01 - Traffic spikes 10x
09:02 - Database CPU hits 100%
09:03 - Response times: 200ms → 5,000ms
09:05 - First timeout errors
09:10 - Site effectively down
09:15 - Revenue loss: $50,000 and counting

Post-incident analysis:
  Every product page = 5 database queries
  10,000 requests/sec × 5 queries = 50,000 queries/sec
  Database capacity: 5,000 queries/sec
  
  Result: 10x overload, cascade failure

The fix everyone suggests: "Just add Redis!"

But here's what they don't tell you:

  • Which caching pattern should you use?
  • When do you write to cache vs read from cache?
  • What happens when cache and database disagree?
  • How do you handle cache failures?

"Just add Redis" is not a strategy. It's a prayer.

This week, we turn caching from prayer into engineering.

Today, we start with the foundations: the four caching patterns that every system uses. By the end of today, you'll know exactly which pattern to use for any scenario.


Part I: Foundations

Chapter 1: What Is Caching?

1.1 The Simple Definition

Caching is storing copies of data in a faster storage layer to reduce access time and load on the primary data store.

EVERYDAY ANALOGY: Your Kitchen

Database = Grocery store
  - Has everything
  - Takes time to get there
  - Always has fresh stock

Cache = Refrigerator
  - Limited space
  - Instant access
  - Food might be stale

You don't go to the grocery store every time you want milk.
You check the fridge first.
If it's there and not expired, you use it.
If it's not there or expired, you go to the store.

That's caching.
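In code, that fridge-first habit is only a few lines. A toy sketch of the idea (the fridge, store, and freshness window are stand-ins for the analogy, not a real API):

import time

fridge = {}           # the cache: {item: (value, expires_at)}
FRESH_FOR = 60        # seconds an item stays "fresh"

def buy_from_store(item: str) -> str:
    return f"fresh {item}"                    # stand-in for the slow trip

def get(item: str) -> str:
    entry = fridge.get(item)
    if entry and entry[1] > time.time():      # in the fridge and not expired
        return entry[0]                       # cache hit
    value = buy_from_store(item)              # cache miss: go to the store
    fridge[item] = (value, time.time() + FRESH_FOR)
    return value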

1.2 Why Cache?

THE THREE REASONS TO CACHE

1. LATENCY
   Database query:     10-100ms
   Redis lookup:       0.5-2ms
   Local memory:       0.0001ms
   
   For user experience, milliseconds matter.

2. THROUGHPUT
   Database:           5,000 queries/sec (limited by disk, connections)
   Redis:              100,000+ ops/sec (limited by network)
   Local memory:       Millions of ops/sec
   
   Cache absorbs traffic that would overwhelm the database.

3. COST
   Database queries cost:
     - CPU time
     - Disk I/O
     - Network bandwidth
     - Connection pool slots
   
   Serving from cache is dramatically cheaper.

1.3 The Cache Hierarchy

MEMORY HIERARCHY (Fastest → Slowest)

CPU Registers     │ < 1 ns      │ Bytes
       ↓          │             │
L1 Cache          │ ~1 ns       │ KB
       ↓          │             │
L2 Cache          │ ~4 ns       │ KB
       ↓          │             │
L3 Cache          │ ~12 ns      │ MB
       ↓          │             │
RAM               │ ~100 ns     │ GB
       ↓          │             │
SSD               │ ~100 μs     │ TB
       ↓          │             │
HDD               │ ~10 ms      │ TB
       ↓          │             │
Network           │ ~1-100 ms   │ Unlimited


APPLICATION CACHE HIERARCHY

In-Process Cache  │ < 0.1 ms    │ Limited by heap
(HashMap, Guava)  │             │
       ↓          │             │
Distributed Cache │ 0.5-2 ms    │ Limited by cluster
(Redis, Memcached)│             │
       ↓          │             │
Database          │ 10-100 ms   │ Limited by disk
(PostgreSQL, etc) │             │

1.4 Key Terminology

Term           │ Definition
Cache Hit      │ Requested data found in cache
Cache Miss     │ Requested data not in cache, must fetch from source
Hit Ratio      │ Hits / (Hits + Misses) — higher is better
TTL            │ Time-To-Live — how long before a cache entry expires
Eviction       │ Removing entries when the cache is full
Invalidation   │ Removing or updating stale entries
Cache Stampede │ Many requests hitting the database after a cache entry expires
Cold Cache     │ Cache with no data (after a restart or deploy)
Warm Cache     │ Cache populated with frequently accessed data
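For example, a cache that serves 9,500 hits and 500 misses over a window has a hit ratio of 9,500 / 10,000 = 0.95; the other 5% of requests still reach the database.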

Chapter 2: The Four Caching Patterns

2.1 Pattern 1: Cache-Aside (Lazy Loading)

The most common pattern. Application manages both cache and database.

CACHE-ASIDE FLOW

Read Path:
                          ┌─────────┐
     1. Check cache       │         │
  ┌────────────────────── │  Cache  │
  │                       │         │
  │   ┌───────────────────┴─────────┘
  │   │
  │   │ 2a. HIT: Return cached data
  │   │
  │   │ 2b. MISS:
  │   ▼
  │  ┌─────────────────────────────┐
  │  │                             │
  │  │  3. Fetch from database     │
  │  │                             │
  │  └─────────────┬───────────────┘
  │                │
  │                ▼
  │  ┌─────────────────────────────┐
  │  │                             │
  │  │  4. Write to cache          │
  │  │                             │
  │  └─────────────┬───────────────┘
  │                │
  └────────────────┼───────────────────▶ 5. Return to caller
                   │

Write Path:
  Application writes directly to database.
  Then either:
    a) Invalidate cache (delete key)
    b) Update cache (write new value)

How it works:

  1. Application checks cache first
  2. If hit, return cached value
  3. If miss, query database
  4. Store result in cache for future requests
  5. Return result to caller
# Cache-Aside Implementation

import json
from typing import Optional


class CacheAsideRepository:
    """
    Cache-aside pattern: Application manages cache explicitly.
    
    Also known as "lazy loading" because cache is only populated
    on read, not proactively.
    """
    
    def __init__(self, cache_client, db_client, default_ttl: int = 300):
        self.cache = cache_client
        self.db = db_client
        self.default_ttl = default_ttl
    
    async def get_product(self, product_id: str) -> Optional[dict]:
        """
        Get product with cache-aside pattern.
        
        1. Check cache
        2. If miss, load from DB
        3. Populate cache
        4. Return
        """
        cache_key = f"product:{product_id}"
        
        # Step 1: Check cache
        cached = await self.cache.get(cache_key)
        if cached:
            return json.loads(cached)
        
        # Step 2: Cache miss - load from database
        product = await self.db.fetch_one(
            "SELECT * FROM products WHERE id = $1",
            product_id
        )
        
        if not product:
            return None
        
        # Step 3: Populate cache for next time
        await self.cache.setex(
            cache_key,
            self.default_ttl,
            json.dumps(dict(product))
        )
        
        # Step 4: Return
        return dict(product)
    
    async def update_product(self, product_id: str, data: dict) -> dict:
        """
        Update product - write to DB, invalidate cache.
        
        We invalidate rather than update to avoid race conditions.
        """
        # Write to database
        product = await self.db.fetch_one(
            """
            UPDATE products 
            SET name = $2, price = $3, updated_at = NOW()
            WHERE id = $1
            RETURNING *
            """,
            product_id, data['name'], data['price']
        )
        
        # Invalidate cache (not update!)
        cache_key = f"product:{product_id}"
        await self.cache.delete(cache_key)
        
        return dict(product)

Pros:

  • Simple to understand and implement
  • Only caches data that's actually used
  • Cache failure doesn't break the application (graceful degradation)
  • Works with any database

Cons:

  • First request for any data is always slow (cache miss)
  • Potential for stale data if invalidation fails
  • Application code coupled to caching logic

Use when:

  • Read-heavy workloads
  • Can tolerate slightly stale data
  • Want resilience to cache failures

2.2 Pattern 2: Read-Through

Cache sits between application and database. Cache handles loading.

READ-THROUGH FLOW

Read Path:
                          ┌─────────────────────────┐
     1. Request data      │                         │
  ────────────────────────▶        Cache            │
                          │                         │
                          │  2a. HIT: Return        │
                          │                         │
                          │  2b. MISS:              │
                          │      ┌──────────────┐   │
                          │      │ Load from DB │   │
                          │      └──────────────┘   │
                          │      Store in cache     │
                          │      Return             │
                          │                         │
                          └─────────────────────────┘
                                      │
                                      │ (on miss)
                                      ▼
                          ┌─────────────────────────┐
                          │       Database          │
                          └─────────────────────────┘

Application doesn't know about database on reads.
Cache is responsible for loading data.

How it works:

  1. Application requests data from cache
  2. Cache checks if data exists
  3. If hit, cache returns data
  4. If miss, cache fetches from database, stores it, then returns
  5. Application only talks to cache, never directly to database for reads
# Read-Through Implementation (Cache-side)

import time
from typing import Any


class ReadThroughCache:
    """
    Read-through cache: Cache is responsible for loading data.
    
    The application only interacts with the cache, not the database.
    Cache handles misses by loading from the database.
    """
    
    def __init__(self, db_client, default_ttl: int = 300):
        self.db = db_client
        self.default_ttl = default_ttl
        self._cache = {}  # In real impl, this is Redis with loader
    
    async def get(self, key: str, loader_query: str = None) -> Any:
        """
        Get value, loading from database if not cached.
        
        In production, this is typically configured at the cache level,
        not per-request. Libraries like Caffeine (Java) or dogpile (Python)
        support this pattern natively.
        """
        # Check cache
        if key in self._cache:
            entry = self._cache[key]
            if not self._is_expired(entry):
                return entry['value']
        
        # Cache miss - load from database
        if loader_query:
            value = await self.db.fetch_one(loader_query)
        else:
            value = await self._default_loader(key)
        
        # Store in cache
        self._cache[key] = {
            'value': value,
            'expires_at': time.time() + self.default_ttl
        }
        
        return value
    
    async def _default_loader(self, key: str) -> Any:
        """Default loader parses key to determine query."""
        # key format: "entity:id" e.g., "product:123"
        entity, entity_id = key.split(':')
        query = f"SELECT * FROM {entity}s WHERE id = $1"
        return await self.db.fetch_one(query, entity_id)
    
    def _is_expired(self, entry: dict) -> bool:
        return time.time() > entry['expires_at']


# Application code is simpler
class ProductService:
    def __init__(self, cache: ReadThroughCache):
        self.cache = cache
    
    async def get_product(self, product_id: str) -> dict:
        # Application doesn't know about database!
        return await self.cache.get(f"product:{product_id}")

Pros:

  • Simpler application code (just talks to cache)
  • Consistent data loading logic
  • Cache handles all the complexity

Cons:

  • Cache becomes a critical dependency
  • Harder to debug (abstraction hides database access)
  • Need cache that supports read-through (or custom implementation)

Use when:

  • Want to simplify application code
  • Using a cache that supports read-through natively
  • Database access patterns are uniform
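As the code comments above note, read-through is usually configured once at the cache layer rather than passed per request. A minimal sketch of that style, with one loader registered per key prefix (the registry and loader names here are illustrative, not a specific library's API):

import json


class LoaderRegistry:
    """Read-through configured up front: one async loader per key prefix."""

    def __init__(self, cache_client, default_ttl: int = 300):
        self.cache = cache_client
        self.default_ttl = default_ttl
        self._loaders = {}          # prefix -> async loader(key) -> dict | None

    def register(self, prefix: str, loader):
        self._loaders[prefix] = loader

    async def get(self, key: str):
        cached = await self.cache.get(key)
        if cached is not None:
            return json.loads(cached)
        loader = self._loaders[key.split(":", 1)[0]]      # "product:123" -> product loader
        value = await loader(key)
        if value is not None:
            await self.cache.setex(key, self.default_ttl, json.dumps(value))
        return value


# registry.register("product", load_product_from_db)
# product = await registry.get("product:123")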

2.3 Pattern 3: Write-Through

Writes go to cache first, which synchronously writes to database.

WRITE-THROUGH FLOW

Write Path:
                          ┌─────────────────────────┐
     1. Write data        │                         │
  ────────────────────────▶        Cache            │
                          │                         │
                          │  2. Write to database   │
                          │         │               │
                          │         ▼               │
                          │  ┌──────────────────┐   │
                          │  │ Database Write   │   │
                          │  └──────────────────┘   │
                          │         │               │
                          │  3. Store in cache      │
                          │                         │
                          │  4. Return success      │
                          │                         │
                          └─────────────────────────┘

Both cache and database are updated synchronously.
Write only succeeds when both succeed.

How it works:

  1. Application writes to cache
  2. Cache synchronously writes to database
  3. Cache stores the value locally
  4. Returns success only after both operations complete
# Write-Through Implementation

import json
import logging
from typing import Any

logger = logging.getLogger(__name__)


class WriteThroughCache:
    """
    Write-through cache: Writes go through cache to database.
    
    Ensures cache and database are always consistent.
    Trade-off: Higher write latency.
    """
    
    def __init__(self, cache_client, db_client, default_ttl: int = 300):
        self.cache = cache_client
        self.db = db_client
        self.default_ttl = default_ttl
    
    async def set(self, key: str, value: Any, db_query: str, db_params: tuple) -> bool:
        """
        Write-through: Update database, then cache.
        
        Both must succeed for the write to be considered successful.
        """
        try:
            # Step 1: Write to database FIRST
            await self.db.execute(db_query, *db_params)
            
            # Step 2: Write to cache
            await self.cache.setex(
                key,
                self.default_ttl,
                json.dumps(value)
            )
            
            return True
            
        except Exception as e:
            # If DB write fails, don't update cache
            # If cache write fails after DB write, we have inconsistency!
            # In production, may need to invalidate cache here
            logger.error(f"Write-through failed: {e}")
            raise
    
    async def get(self, key: str) -> Any:
        """Read from cache (with read-through on miss)."""
        cached = await self.cache.get(key)
        if cached:
            return json.loads(cached)
        
        # Miss - need to load from database
        # (combine with read-through for complete solution)
        return None


# Usage
class ProductService:
    def __init__(self, write_through_cache: WriteThroughCache):
        self.cache = write_through_cache
    
    async def update_product(self, product_id: str, data: dict) -> bool:
        return await self.cache.set(
            key=f"product:{product_id}",
            value=data,
            db_query="UPDATE products SET name=$2, price=$3 WHERE id=$1",
            db_params=(product_id, data['name'], data['price'])
        )

Pros:

  • Cache is always consistent with database
  • Never serve stale data from cache
  • Simpler mental model (cache = database)

Cons:

  • Higher write latency (must write to both)
  • Write availability depends on both cache AND database
  • If cache write fails after DB write, inconsistency possible

Use when:

  • Consistency is critical (financial data, inventory counts)
  • Write frequency is low to moderate
  • Can tolerate slightly higher write latency

2.4 Pattern 4: Write-Behind (Write-Back)

Writes go to cache immediately; database is updated asynchronously.

WRITE-BEHIND FLOW

Write Path:
                          ┌─────────────────────────┐
     1. Write data        │                         │
  ────────────────────────▶        Cache            │
                          │                         │
                          │  2. Store in cache      │
                          │                         │
                          │  3. Return immediately  │
                          │                         │
                          │  4. (async) Queue DB    │
                          │      write              │
                          │         │               │
                          └─────────┼───────────────┘
                                    │
                                    │ Asynchronous
                                    ▼
                          ┌─────────────────────────┐
                          │  Background Writer      │
                          │                         │
                          │  5. Batch writes to DB  │
                          │                         │
                          └─────────────────────────┘
                                    │
                                    ▼
                          ┌─────────────────────────┐
                          │       Database          │
                          └─────────────────────────┘

Write returns immediately after cache update.
Database is updated asynchronously, often in batches.

How it works:

  1. Application writes to cache
  2. Cache stores value and returns immediately
  3. Write is queued for database
  4. Background process batches and writes to database
  5. Database eventually consistent with cache
# Write-Behind Implementation

import asyncio
import json
import logging
import time
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict

logger = logging.getLogger(__name__)


@dataclass
class PendingWrite:
    key: str
    value: Any
    db_query: str
    db_params: tuple
    queued_at: datetime


class WriteBehindCache:
    """
    Write-behind cache: Writes are cached immediately,
    database is updated asynchronously.
    
    WARNING: Risk of data loss if cache fails before DB write!
    """
    
    def __init__(
        self,
        cache_client,
        db_client,
        batch_size: int = 100,
        flush_interval: float = 1.0
    ):
        self.cache = cache_client
        self.db = db_client
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        
        # Pending writes queue
        self._pending: Dict[str, PendingWrite] = {}
        self._running = False
    
    async def start(self):
        """Start the background writer."""
        self._running = True
        asyncio.create_task(self._flush_loop())
    
    async def stop(self):
        """Stop and flush remaining writes."""
        self._running = False
        await self._flush_to_database()
    
    async def set(self, key: str, value: Any, db_query: str, db_params: tuple) -> bool:
        """
        Write-behind: Update cache immediately, queue DB write.
        
        Returns immediately after cache update - very fast!
        """
        # Step 1: Write to cache immediately
        await self.cache.set(key, json.dumps(value))
        
        # Step 2: Queue for database write
        # If same key written multiple times, only latest is kept
        self._pending[key] = PendingWrite(
            key=key,
            value=value,
            db_query=db_query,
            db_params=db_params,
            queued_at=datetime.utcnow()
        )
        
        # Step 3: Return immediately (don't wait for DB)
        return True
    
    async def _flush_loop(self):
        """Background loop that flushes writes to database."""
        while self._running:
            await asyncio.sleep(self.flush_interval)
            
            if self._pending:
                await self._flush_to_database()
    
    async def _flush_to_database(self):
        """Flush pending writes to database in batch."""
        if not self._pending:
            return
        
        # Take a snapshot of pending writes
        writes = list(self._pending.values())[:self.batch_size]
        keys_to_remove = [w.key for w in writes]
        
        try:
            # Batch write to database
            async with self.db.transaction():
                for write in writes:
                    await self.db.execute(write.db_query, *write.db_params)
            
            # Remove from pending queue
            for key in keys_to_remove:
                self._pending.pop(key, None)
            
            logger.info(f"Flushed {len(writes)} writes to database")
            
        except Exception as e:
            # DB write failed - writes stay in pending queue
            # They'll be retried on next flush
            logger.error(f"Database flush failed: {e}")
            # In production: alert, retry with backoff, eventual DLQ


# Usage - note how fast writes are
class AnalyticsService:
    """Write-behind is great for high-volume, loss-tolerant writes."""
    
    def __init__(self, write_behind_cache: WriteBehindCache):
        self.cache = write_behind_cache
    
    async def record_page_view(self, page_id: str, user_id: str):
        """
        Record a page view - happens millions of times per day.
        We can tolerate some loss, but need low latency.
        """
        await self.cache.set(
            key=f"pageview:{page_id}:{user_id}:{time.time()}",
            value={"page_id": page_id, "user_id": user_id},
            db_query="INSERT INTO page_views (page_id, user_id) VALUES ($1, $2)",
            db_params=(page_id, user_id)
        )
        # Returns immediately, DB write happens later

Pros:

  • Extremely fast writes (just cache update)
  • Can batch database writes for efficiency
  • Reduces database write load significantly

Cons:

  • Data loss risk if cache fails before database write
  • Complex failure handling
  • Database may be behind cache (eventual consistency)

Use when:

  • Write performance is critical
  • Can tolerate some data loss
  • High write volume where batching helps
  • Examples: Analytics, logging, view counts, likes

Chapter 3: Pattern Comparison and Trade-offs

3.1 Side-by-Side Comparison

Aspect         │ Cache-Aside              │ Read-Through             │ Write-Through │ Write-Behind
Read latency   │ Fast (hit) / Slow (miss) │ Fast (hit) / Slow (miss) │ Fast (always) │ Fast (always)
Write latency  │ DB latency               │ DB latency               │ Cache + DB    │ Cache only
Consistency    │ Eventual                 │ Eventual                 │ Strong        │ Eventual
Data loss risk │ Low                      │ Low                      │ Low           │ High
Complexity     │ Low                      │ Medium                   │ Medium        │ High
Best for       │ General use              │ Uniform access           │ Critical data │ High write volume

3.2 Decision Matrix

CHOOSING A CACHING PATTERN

Start here: What's your priority?

                    ┌─────────────────────────────────────────┐
                    │        What matters most?               │
                    └───────────────────┬─────────────────────┘
                                        │
            ┌───────────────────────────┼───────────────────────────┐
            │                           │                           │
            ▼                           ▼                           ▼
    ┌───────────────┐           ┌───────────────┐           ┌───────────────┐
    │  Simplicity   │           │  Consistency  │           │  Performance  │
    └───────┬───────┘           └───────┬───────┘           └───────┬───────┘
            │                           │                           │
            ▼                           ▼                           │
    ┌───────────────┐           ┌───────────────┐                   │
    │ Cache-Aside   │           │Write-Through  │                   │
    │               │           │      +        │                   │
    │ Simple, works │           │ Read-Through  │                   │
    │ for most cases│           │               │                   │
    └───────────────┘           └───────────────┘                   │
                                                                    │
                                        ┌───────────────────────────┘
                                        │
                        ┌───────────────┼───────────────┐
                        │               │               │
                        ▼               ▼               ▼
                ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
                │  Read-heavy   │ │  Write-heavy  │ │ Balanced, but │
                │               │ │               │ │ loss-tolerant │
                └───────┬───────┘ └───────┬───────┘ └───────┬───────┘
                        │               │                   │
                        ▼               │                   │
                ┌───────────────┐       │                   │
                │ Cache-Aside   │       │                   │
                │ or            │       │                   │
                │ Read-Through  │       │                   │
                └───────────────┘       │                   │
                                        ▼                   ▼
                                ┌───────────────┐   ┌───────────────┐
                                │ Write-Behind  │   │ Write-Behind  │
                                │               │   │               │
                                │ Fast writes,  │   │ With careful  │
                                │ accept risk   │   │ failure       │
                                └───────────────┘   │ handling      │
                                                    └───────────────┘

3.3 Real-World Pattern Usage

WHICH COMPANIES USE WHICH PATTERNS

CACHE-ASIDE (Most Common)
├── Amazon: Product catalog
├── Netflix: User profiles  
├── Uber: Driver locations (with short TTL)
└── Most CRUD applications

READ-THROUGH
├── Hibernate/JPA second-level cache
├── CDN edge caches
└── Custom ORM implementations

WRITE-THROUGH
├── Banking: Account balances
├── E-commerce: Inventory counts
├── Gaming: Player state in competitive games
└── Any "never serve stale data" requirement

WRITE-BEHIND
├── Facebook: Like counts
├── Twitter: Tweet view counts
├── Analytics: Page views, clicks
├── Logging: Application events
└── Any high-volume, loss-tolerant writes

3.4 Combining Patterns

Most production systems combine patterns:

HYBRID APPROACH: E-COMMERCE EXAMPLE

Product Catalog:
  Pattern: Cache-Aside
  Reason: Read-heavy, can tolerate minutes of staleness
  
Inventory:
  Pattern: Write-Through + Read-Through
  Reason: Must be accurate, overselling is expensive
  
Price:
  Pattern: Cache-Aside with event-driven invalidation
  Reason: Changes infrequently, but must update quickly when it does
  
Shopping Cart:
  Pattern: Write-Through
  Reason: User expects cart to persist across sessions
  
View Count:
  Pattern: Write-Behind
  Reason: High volume, approximate is fine
  
User Session:
  Pattern: Cache-Aside with sticky sessions
  Reason: Read-heavy, session-specific
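One way to keep these per-entity decisions explicit is a small policy table that the data layer consults before touching the cache. A sketch under the assumption that each pattern maps to its own code path (entity names mirror the example above; the helpers are hypothetical):

from dataclasses import dataclass


@dataclass(frozen=True)
class CachePolicy:
    pattern: str          # "cache_aside", "write_through", "write_behind", or "none"
    ttl_seconds: int      # 0 means no read caching


CACHE_POLICIES = {
    "product":    CachePolicy("cache_aside",   300),
    "inventory":  CachePolicy("write_through",  30),
    "price":      CachePolicy("cache_aside",   600),
    "cart":       CachePolicy("write_through", 3600),
    "view_count": CachePolicy("write_behind",     0),
}


def policy_for(entity: str) -> CachePolicy:
    # Default to the safest choice: skip the cache entirely
    return CACHE_POLICIES.get(entity, CachePolicy("none", 0))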

Part II: Implementation

Chapter 4: Basic Cache-Aside Implementation

4.1 The Simplest Version

# Basic Cache-Aside Implementation
# WARNING: Not production-ready - for learning only

import json
from typing import Optional, Any


class SimpleCache:
    """
    Minimal cache-aside implementation.
    
    Good for understanding the pattern, not for production.
    """
    
    def __init__(self, redis_client, default_ttl: int = 300):
        self.redis = redis_client
        self.default_ttl = default_ttl
    
    async def get(self, key: str) -> Optional[Any]:
        """Get from cache, returns None on miss."""
        value = await self.redis.get(key)
        if value:
            return json.loads(value)
        return None
    
    async def set(self, key: str, value: Any, ttl: int = None) -> None:
        """Set in cache with TTL."""
        ttl = ttl or self.default_ttl
        await self.redis.setex(key, ttl, json.dumps(value))
    
    async def delete(self, key: str) -> None:
        """Delete from cache (invalidation)."""
        await self.redis.delete(key)


# Usage example
async def get_product(product_id: str, cache: SimpleCache, db) -> Optional[dict]:
    cache_key = f"product:{product_id}"
    
    # Check cache
    product = await cache.get(cache_key)
    if product:
        return product  # Cache hit!
    
    # Cache miss - load from database
    product = await db.fetch_one(
        "SELECT * FROM products WHERE id = $1",
        product_id
    )
    
    if product:
        # Store in cache for next time
        await cache.set(cache_key, dict(product))
    
    return dict(product) if product else None

4.2 Understanding the Flow

CACHE-ASIDE READ FLOW

Request: GET /products/123

Step 1: Build cache key
        cache_key = "product:123"

Step 2: Check Redis
        redis.get("product:123")
        
        ├── EXISTS (Hit):
        │   Return cached JSON
        │   Total time: ~2ms
        │
        └── NOT EXISTS (Miss):
            Continue to Step 3

Step 3: Query database
        SELECT * FROM products WHERE id = 123
        Total time: ~50ms

Step 4: Store in cache
        redis.setex("product:123", 300, json_data)
        TTL = 300 seconds

Step 5: Return to client
        Total time (miss): ~55ms
        Next request (hit): ~2ms
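Those two numbers combine into an expected latency once you know the hit ratio, which is a quick way to sanity-check whether a cache is worth it (the 2 ms and 55 ms figures come from the flow above):

HIT_MS = 2
MISS_MS = 55

def expected_latency_ms(hit_ratio: float) -> float:
    return hit_ratio * HIT_MS + (1 - hit_ratio) * MISS_MS

print(expected_latency_ms(0.99))   # ~2.5 ms
print(expected_latency_ms(0.80))   # ~12.6 ms
print(expected_latency_ms(0.0))    # 55 ms: a cold cache is all misses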

Chapter 5: Production-Ready Implementation

5.1 Requirements for Production

Our production implementation needs:

  1. Serialization — Handle complex objects
  2. Error handling — Cache failures shouldn't break the app
  3. Metrics — Track hit/miss ratios
  4. Key management — Versioned keys for safe deploys
  5. Bulk operations — Efficient multi-get/set
  6. Circuit breaker — Protect against cache outages

5.2 Full Production Implementation

# Production-Ready Cache-Aside Implementation

import asyncio
import json
import hashlib
import logging
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional, Dict, Any, List, Callable, TypeVar, Generic
from enum import Enum
import time

logger = logging.getLogger(__name__)

T = TypeVar('T')


# =============================================================================
# Configuration
# =============================================================================

@dataclass
class CacheConfig:
    """Configuration for the cache layer."""
    default_ttl: int = 300  # 5 minutes
    max_ttl: int = 86400    # 24 hours
    key_prefix: str = "app"
    key_version: str = "v1"  # Bump to invalidate all keys on deploy
    
    # Circuit breaker settings
    failure_threshold: int = 5
    recovery_timeout: int = 30
    
    # Timeouts
    get_timeout: float = 0.1   # 100ms
    set_timeout: float = 0.2   # 200ms


class CacheStatus(Enum):
    HIT = "hit"
    MISS = "miss"
    ERROR = "error"
    SKIP = "skip"  # Circuit breaker open


@dataclass
class CacheResult(Generic[T]):
    """Result of a cache operation."""
    status: CacheStatus
    value: Optional[T] = None
    latency_ms: float = 0
    error: Optional[str] = None


# =============================================================================
# Serialization
# =============================================================================

class CacheSerializer:
    """Handles serialization of cache values."""
    
    @staticmethod
    def serialize(value: Any) -> str:
        """Serialize value to JSON string."""
        return json.dumps(value, default=str, ensure_ascii=False)
    
    @staticmethod
    def deserialize(data: str) -> Any:
        """Deserialize JSON string to value."""
        return json.loads(data)


# =============================================================================
# Circuit Breaker
# =============================================================================

class CircuitBreaker:
    """
    Circuit breaker for cache operations.
    
    Prevents cascading failures when cache is unhealthy.
    """
    
    def __init__(self, failure_threshold: int = 5, recovery_timeout: int = 30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        
        self._failures = 0
        self._last_failure_time: Optional[float] = None
        self._state = "closed"  # closed, open, half-open
    
    def record_success(self):
        """Record a successful operation."""
        self._failures = 0
        self._state = "closed"
    
    def record_failure(self):
        """Record a failed operation."""
        self._failures += 1
        self._last_failure_time = time.time()
        
        if self._failures >= self.failure_threshold:
            self._state = "open"
            logger.warning(f"Circuit breaker opened after {self._failures} failures")
    
    def is_open(self) -> bool:
        """Check if circuit is open (should skip cache)."""
        if self._state == "closed":
            return False
        
        if self._state == "open":
            # Check if recovery timeout has passed
            if time.time() - self._last_failure_time > self.recovery_timeout:
                self._state = "half-open"
                return False
            return True
        
        return False  # half-open: allow one request through


# =============================================================================
# Metrics
# =============================================================================

@dataclass
class CacheMetrics:
    """Tracks cache performance metrics."""
    hits: int = 0
    misses: int = 0
    errors: int = 0
    skips: int = 0
    total_latency_ms: float = 0
    
    @property
    def total_requests(self) -> int:
        return self.hits + self.misses + self.errors + self.skips
    
    @property
    def hit_ratio(self) -> float:
        if self.total_requests == 0:
            return 0.0
        return self.hits / self.total_requests
    
    @property
    def avg_latency_ms(self) -> float:
        if self.total_requests == 0:
            return 0.0
        return self.total_latency_ms / self.total_requests
    
    def record(self, result: CacheResult):
        """Record a cache operation result."""
        self.total_latency_ms += result.latency_ms
        
        if result.status == CacheStatus.HIT:
            self.hits += 1
        elif result.status == CacheStatus.MISS:
            self.misses += 1
        elif result.status == CacheStatus.ERROR:
            self.errors += 1
        elif result.status == CacheStatus.SKIP:
            self.skips += 1


# =============================================================================
# Main Cache Implementation
# =============================================================================

class ProductionCache:
    """
    Production-ready cache-aside implementation.
    
    Features:
    - Automatic serialization
    - Circuit breaker for resilience
    - Metrics tracking
    - Versioned keys
    - Bulk operations
    - Graceful degradation
    """
    
    def __init__(
        self,
        redis_client,
        config: CacheConfig = None,
        metrics_client = None
    ):
        self.redis = redis_client
        self.config = config or CacheConfig()
        self.external_metrics = metrics_client
        
        self.serializer = CacheSerializer()
        self.circuit_breaker = CircuitBreaker(
            failure_threshold=self.config.failure_threshold,
            recovery_timeout=self.config.recovery_timeout
        )
        self.metrics = CacheMetrics()
    
    def _build_key(self, key: str) -> str:
        """Build full cache key with prefix and version."""
        return f"{self.config.key_prefix}:{self.config.key_version}:{key}"
    
    async def get(self, key: str) -> CacheResult[Any]:
        """
        Get value from cache.
        
        Returns CacheResult with status, value, and metrics.
        """
        start_time = time.time()
        full_key = self._build_key(key)
        
        # Check circuit breaker
        if self.circuit_breaker.is_open():
            result = CacheResult(status=CacheStatus.SKIP)
            self.metrics.record(result)
            return result
        
        try:
            # Get from Redis with timeout
            raw_value = await asyncio.wait_for(
                self.redis.get(full_key),
                timeout=self.config.get_timeout
            )
            
            latency_ms = (time.time() - start_time) * 1000
            
            if raw_value is None:
                result = CacheResult(
                    status=CacheStatus.MISS,
                    latency_ms=latency_ms
                )
            else:
                value = self.serializer.deserialize(raw_value)
                result = CacheResult(
                    status=CacheStatus.HIT,
                    value=value,
                    latency_ms=latency_ms
                )
            
            self.circuit_breaker.record_success()
            self.metrics.record(result)
            return result
            
        except asyncio.TimeoutError:
            logger.warning(f"Cache get timeout for key: {key}")
            self.circuit_breaker.record_failure()
            result = CacheResult(
                status=CacheStatus.ERROR,
                error="timeout",
                latency_ms=(time.time() - start_time) * 1000
            )
            self.metrics.record(result)
            return result
            
        except Exception as e:
            logger.error(f"Cache get error for key {key}: {e}")
            self.circuit_breaker.record_failure()
            result = CacheResult(
                status=CacheStatus.ERROR,
                error=str(e),
                latency_ms=(time.time() - start_time) * 1000
            )
            self.metrics.record(result)
            return result
    
    async def set(
        self,
        key: str,
        value: Any,
        ttl: int = None
    ) -> bool:
        """
        Set value in cache.
        
        Returns True on success, False on failure.
        Failures are logged but don't raise exceptions.
        """
        if self.circuit_breaker.is_open():
            return False
        
        full_key = self._build_key(key)
        ttl = min(ttl or self.config.default_ttl, self.config.max_ttl)
        
        try:
            serialized = self.serializer.serialize(value)
            
            await asyncio.wait_for(
                self.redis.setex(full_key, ttl, serialized),
                timeout=self.config.set_timeout
            )
            
            self.circuit_breaker.record_success()
            return True
            
        except asyncio.TimeoutError:
            logger.warning(f"Cache set timeout for key: {key}")
            self.circuit_breaker.record_failure()
            return False
            
        except Exception as e:
            logger.error(f"Cache set error for key {key}: {e}")
            self.circuit_breaker.record_failure()
            return False
    
    async def delete(self, key: str) -> bool:
        """Delete a key from cache (invalidation)."""
        if self.circuit_breaker.is_open():
            return False
        
        full_key = self._build_key(key)
        
        try:
            await self.redis.delete(full_key)
            self.circuit_breaker.record_success()
            return True
        except Exception as e:
            logger.error(f"Cache delete error for key {key}: {e}")
            self.circuit_breaker.record_failure()
            return False
    
    async def get_many(self, keys: List[str]) -> Dict[str, CacheResult]:
        """
        Get multiple values in a single round-trip.
        
        Much more efficient than multiple individual gets.
        """
        if self.circuit_breaker.is_open():
            return {key: CacheResult(status=CacheStatus.SKIP) for key in keys}
        
        start_time = time.time()
        full_keys = [self._build_key(k) for k in keys]
        
        try:
            # MGET for all keys at once
            raw_values = await asyncio.wait_for(
                self.redis.mget(full_keys),
                timeout=self.config.get_timeout * 2  # Allow more time for bulk
            )
            
            latency_ms = (time.time() - start_time) * 1000
            self.circuit_breaker.record_success()
            
            results = {}
            for key, raw_value in zip(keys, raw_values):
                if raw_value is None:
                    results[key] = CacheResult(
                        status=CacheStatus.MISS,
                        latency_ms=latency_ms / len(keys)
                    )
                else:
                    value = self.serializer.deserialize(raw_value)
                    results[key] = CacheResult(
                        status=CacheStatus.HIT,
                        value=value,
                        latency_ms=latency_ms / len(keys)
                    )
                self.metrics.record(results[key])
            
            return results
            
        except Exception as e:
            logger.error(f"Cache get_many error: {e}")
            self.circuit_breaker.record_failure()
            return {
                key: CacheResult(status=CacheStatus.ERROR, error=str(e))
                for key in keys
            }
    
    async def set_many(self, items: Dict[str, Any], ttl: int = None) -> bool:
        """Set multiple values in a single round-trip."""
        if self.circuit_breaker.is_open():
            return False
        
        ttl = min(ttl or self.config.default_ttl, self.config.max_ttl)
        
        try:
            pipe = self.redis.pipeline()
            
            for key, value in items.items():
                full_key = self._build_key(key)
                serialized = self.serializer.serialize(value)
                pipe.setex(full_key, ttl, serialized)
            
            await asyncio.wait_for(
                pipe.execute(),
                timeout=self.config.set_timeout * 2
            )
            
            self.circuit_breaker.record_success()
            return True
            
        except Exception as e:
            logger.error(f"Cache set_many error: {e}")
            self.circuit_breaker.record_failure()
            return False
    
    def get_stats(self) -> Dict[str, Any]:
        """Get cache statistics."""
        return {
            "hits": self.metrics.hits,
            "misses": self.metrics.misses,
            "errors": self.metrics.errors,
            "skips": self.metrics.skips,
            "hit_ratio": round(self.metrics.hit_ratio, 4),
            "avg_latency_ms": round(self.metrics.avg_latency_ms, 2),
            "circuit_breaker_state": self.circuit_breaker._state,
        }


# =============================================================================
# Repository Pattern with Cache
# =============================================================================

class CachedProductRepository:
    """
    Product repository with caching.
    
    Demonstrates the cache-aside pattern in a repository layer.
    """
    
    def __init__(
        self,
        db_pool,
        cache: ProductionCache,
        ttl_seconds: int = 300
    ):
        self.db = db_pool
        self.cache = cache
        self.ttl = ttl_seconds
    
    async def get_by_id(self, product_id: str) -> Optional[Dict]:
        """Get product by ID with caching."""
        cache_key = f"product:{product_id}"
        
        # Try cache first
        result = await self.cache.get(cache_key)
        
        if result.status == CacheStatus.HIT:
            logger.debug(f"Cache hit for product {product_id}")
            return result.value
        
        # Cache miss or error - load from database
        product = await self.db.fetch_one(
            "SELECT * FROM products WHERE id = $1",
            product_id
        )
        
        if product:
            product_dict = dict(product)
            # Populate cache (best effort)
            await self.cache.set(cache_key, product_dict, self.ttl)
            return product_dict
        
        return None
    
    async def get_by_ids(self, product_ids: List[str]) -> Dict[str, Dict]:
        """Get multiple products efficiently."""
        cache_keys = [f"product:{pid}" for pid in product_ids]
        
        # Try cache for all
        cache_results = await self.cache.get_many(cache_keys)
        
        # Separate hits and misses
        products = {}
        missing_ids = []
        
        for pid, key in zip(product_ids, cache_keys):
            result = cache_results[key]
            if result.status == CacheStatus.HIT:
                products[pid] = result.value
            else:
                missing_ids.append(pid)
        
        # Load missing from database
        if missing_ids:
            rows = await self.db.fetch(
                "SELECT * FROM products WHERE id = ANY($1)",
                missing_ids
            )
            
            to_cache = {}
            for row in rows:
                product = dict(row)
                pid = product['id']
                products[pid] = product
                to_cache[f"product:{pid}"] = product
            
            # Cache the missing ones (best effort)
            if to_cache:
                await self.cache.set_many(to_cache, self.ttl)
        
        return products
    
    async def update(self, product_id: str, data: Dict) -> Optional[Dict]:
        """Update product and invalidate cache."""
        # Update database
        product = await self.db.fetch_one(
            """
            UPDATE products 
            SET name = $2, price = $3, updated_at = NOW()
            WHERE id = $1
            RETURNING *
            """,
            product_id, data['name'], data['price']
        )
        
        if product:
            # Invalidate cache
            await self.cache.delete(f"product:{product_id}")
            return dict(product)
        
        return None
    
    async def delete(self, product_id: str) -> bool:
        """Delete product and invalidate cache."""
        result = await self.db.execute(
            "DELETE FROM products WHERE id = $1",
            product_id
        )
        
        # Always try to invalidate (even if delete found nothing)
        await self.cache.delete(f"product:{product_id}")
        
        return result == "DELETE 1"

Chapter 6: Edge Cases and Error Handling

6.1 Edge Case 1: Cache Failure on Read

SCENARIO: Redis is down, read request comes in

Without proper handling:
  1. Try to read from Redis
  2. Connection error
  3. Exception propagates
  4. User sees 500 error

With graceful degradation:
  1. Try to read from Redis
  2. Connection error → Circuit breaker opens
  3. Subsequent reads skip cache entirely
  4. All requests go to database
  5. System slower but functional
# Graceful degradation on cache failure
async def get_product_safe(product_id: str) -> Optional[dict]:
    cache_key = f"product:{product_id}"
    
    # Cache.get() returns CacheResult, never throws
    result = await cache.get(cache_key)
    
    if result.status == CacheStatus.HIT:
        return result.value
    
    # Miss, error, or skip - all go to database
    # The circuit breaker ensures we don't keep hammering a dead cache
    product = await db.fetch_one(
        "SELECT * FROM products WHERE id = $1",
        product_id
    )
    
    if product:
        # Only try to cache if circuit is closed
        if result.status != CacheStatus.SKIP:
            await cache.set(cache_key, dict(product))
        return dict(product)
    
    return None

6.2 Edge Case 2: Race Condition on Write

SCENARIO: Two requests update the same product simultaneously

Timeline:
  T0: Request A reads product (price: $100)
  T1: Request B reads product (price: $100)
  T2: Request A updates price to $90, invalidates cache
  T3: Request B updates price to $110, invalidates cache
  T4: Request C reads product - cache miss
  T5: Request C loads from DB (price: $110), caches it

This is fine! Last write wins, cache reflects latest DB state.

BUT what if invalidation fails?

  T0: Request A reads product (price: $100, cached)
  T1: Request A updates price to $90 in DB
  T2: Request A tries to invalidate cache - FAILS (Redis timeout)
  T3: Request B reads product - gets $100 from cache (stale!)

SOLUTION: Use TTL as safety net
  Even if invalidation fails, stale data expires eventually.
  Set TTL based on acceptable staleness window.
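A sketch of what that looks like in the update path: retry the invalidation a few times, then fall back to the TTL. The retry counts are arbitrary, and cache, db, and logger are assumed to exist as in the earlier examples:

import asyncio


async def update_price(product_id: str, new_price: float) -> None:
    # Step 1: the database is the source of truth
    await db.execute(
        "UPDATE products SET price = $2 WHERE id = $1",
        product_id, new_price
    )

    # Step 2: best-effort invalidation with a few retries
    cache_key = f"product:{product_id}"
    for attempt in range(3):
        try:
            await cache.delete(cache_key)
            return
        except Exception as e:
            logger.warning(f"Invalidation attempt {attempt + 1} failed: {e}")
            await asyncio.sleep(0.05 * (attempt + 1))

    # Step 3: give up; the entry's TTL bounds how long the stale price
    # can be served, so size the TTL from your staleness budget.
    logger.error(f"Could not invalidate {cache_key}; relying on TTL")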

6.3 Edge Case 3: Thundering Herd on Cold Start

SCENARIO: Service restarts, cache is empty

Normal state:
  1000 requests/sec, 99% cache hit ratio
  10 requests/sec hit database

After restart:
  1000 requests/sec, 0% cache hit ratio
  1000 requests/sec hit database
  Database overwhelmed!

SOLUTIONS:

1. CACHE WARMING (Pre-populate cache before serving traffic)
   async def warm_cache():
       popular_products = await db.fetch(
           "SELECT * FROM products ORDER BY view_count DESC LIMIT 1000"
       )
       for product in popular_products:
           await cache.set(f"product:{product['id']}", dict(product))
   
   # Call before accepting traffic
   await warm_cache()

2. GRADUAL TRAFFIC SHIFT (If using load balancer)
   New instance starts with 1% traffic
   Gradually increase as cache warms naturally

3. REQUEST COALESCING (Covered in Day 3)
   Multiple simultaneous requests for same key
   → Only one hits database
   → Others wait for result

6.4 Edge Case 4: Serialization Errors

SCENARIO: Object can't be serialized to JSON

product = {
    "id": "123",
    "created_at": datetime(2024, 1, 15),  # datetime not JSON serializable!
    "data": bytes([1, 2, 3])               # bytes not JSON serializable!
}

await cache.set("product:123", product)  # Fails!

SOLUTIONS:

1. CUSTOM SERIALIZER
   class DateTimeEncoder(json.JSONEncoder):
       def default(self, obj):
           if isinstance(obj, datetime):
               return obj.isoformat()
           if isinstance(obj, bytes):
               return base64.b64encode(obj).decode()
           return super().default(obj)

2. EXPLICIT CONVERSION BEFORE CACHING
   product_cacheable = {
       "id": product["id"],
       "created_at": product["created_at"].isoformat(),
       "data": base64.b64encode(product["data"]).decode()
   }

3. USE PICKLE (But be careful with security/versioning)
   # Not recommended for untrusted data
   await cache.set(key, pickle.dumps(product))
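Wiring option 1 into the cache layer is a one-line change to the serialization call. A self-contained sketch using the encoder from above:

import base64
import json
from datetime import datetime


class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode()
        return super().default(obj)


product = {
    "id": "123",
    "created_at": datetime(2024, 1, 15),
    "data": bytes([1, 2, 3]),
}

payload = json.dumps(product, cls=DateTimeEncoder)    # serializes cleanly now
# await cache.set("product:123", payload)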

6.5 Error Handling Matrix

Error               │ Impact                │ Handling                            │ Prevention
Cache timeout       │ Single request slower │ Skip cache, go to DB                │ Set appropriate timeouts
Cache down          │ All requests to DB    │ Circuit breaker, degrade gracefully │ Redis Cluster, replicas
Serialization error │ Cache write fails     │ Log, continue without cache         │ Custom serializer
Invalidation fails  │ Stale data served     │ TTL as safety net                   │ Retry invalidation
Cache full          │ Evictions happen      │ LRU evicts old data                 │ Monitor memory, scale
Key collision       │ Wrong data served     │ Versioned keys, proper naming       │ Key design review

Part III: Real-World Application

Chapter 7: How Big Tech Does It

7.1 Case Study: Amazon — Product Catalog Caching

AMAZON'S PRODUCT CACHE ARCHITECTURE

Scale:
  - 350+ million products
  - Millions of requests per second
  - Sub-100ms response time requirement

Architecture:

  User Request
       │
       ▼
  ┌─────────────────┐
  │   Edge Cache    │ (CloudFront)
  │   TTL: 60s      │ Static assets, some product data
  └────────┬────────┘
           │ Miss
           ▼
  ┌─────────────────┐
  │  Regional Cache │ (ElastiCache)
  │   TTL: 5min     │ Product details, prices
  └────────┬────────┘
           │ Miss
           ▼
  ┌─────────────────┐
  │  Product Service│
  │  In-memory cache│ Frequently accessed products
  └────────┬────────┘
           │ Miss
           ▼
  ┌─────────────────┐
  │    DynamoDB     │
  │  (Source of     │
  │   Truth)        │
  └─────────────────┘


Key Decisions:

1. TIERED CACHING
   Different TTLs at different layers
   Edge: Static content, long TTL
   Regional: Dynamic content, shorter TTL
   
2. CACHE-ASIDE FOR FLEXIBILITY
   Application controls cache logic
   Different products have different caching rules
   
3. INVENTORY SEPARATE FROM CATALOG
   Inventory changes frequently (seconds)
   Catalog changes rarely (days)
   Different caching strategies for each

4. EVENTUAL CONSISTENCY ACCEPTED
   User might see slightly stale price
   But cart/checkout uses real-time data

7.2 Case Study: Netflix — User Profile Caching

NETFLIX USER PROFILE CACHING

Challenge:
  - 200+ million subscribers
  - Each has personalized profile
  - Every request needs profile data
  - Profile rarely changes

Solution: EVCache (Their custom distributed cache)

Architecture:

  API Request (needs user context)
       │
       ▼
  ┌─────────────────┐
  │  API Service    │
  │                 │
  │  Check EVCache  │─────▶ HIT: Return profile
  │                 │
  │  Miss:          │
  │  Load from DB   │─────▶ Cassandra
  │  Cache for 24h  │
  └─────────────────┘


Key Innovations:

1. ZONE-AWARE CACHING
   Cache replicas in multiple availability zones
   Read from nearest, write to all
   Survives zone failure
   
2. CACHE WARMING ON DEPLOY
   Pre-populate cache before serving traffic
   Avoids cold-start thundering herd
   
3. LONG TTL FOR STABLE DATA
   User profile rarely changes
   24-hour TTL acceptable
   Event-driven invalidation for changes

4. CACHE-ASIDE WITH FALLBACK
   If EVCache fails, fall back to Cassandra
   Higher latency but still works

7.3 Case Study: Uber — Driver Location Caching

UBER DRIVER LOCATION CACHE

Challenge:
  - Millions of active drivers
  - Location updates every 4 seconds
  - Rider needs nearby drivers instantly
  - Data is extremely time-sensitive

This is the OPPOSITE of typical caching!

Architecture:

  Driver App
       │
       │ Location update every 4s
       ▼
  ┌─────────────────┐
  │  Location       │
  │  Service        │
  │                 │
  │  Write-through  │──────▶ Redis Cluster (Geo)
  │  to cache       │        GEOADD drivers:sf <lon> <lat> driver123
  │                 │
  │  Async to DB    │──────▶ Cassandra (historical)
  └─────────────────┘

  Rider Request: "Show nearby drivers"
       │
       ▼
  ┌─────────────────┐
  │  Matching       │
  │  Service        │
  │                 │
  │  GEORADIUS      │◀────── Redis Cluster
  │  drivers:sf     │        Returns drivers within 5km
  │  5km            │
  └─────────────────┘


Key Decisions:

1. CACHE IS THE PRIMARY STORE
   For real-time location, cache IS the source of truth
   Database is for history/analytics only
   
2. VERY SHORT TTL (or none)
   Location expires automatically
   If driver stops updating, they disappear
   
3. GEO-SHARDING
   Separate Redis cluster per city
   Reduces data size, improves locality
   
4. WRITE-THROUGH (not write-behind)
   Can't afford to lose location updates
   Every update goes to cache immediately
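A rough sketch of the Redis side of this flow. It assumes redis-py's asyncio client and its geoadd/geosearch wrappers; the key naming, city sharding, and 5 km radius are illustrative:

import redis.asyncio as redis

r = redis.Redis()


async def update_driver_location(city: str, driver_id: str, lon: float, lat: float):
    # Write-through: the geo set in Redis is the live location store
    await r.geoadd(f"drivers:{city}", (lon, lat, driver_id))


async def nearby_drivers(city: str, lon: float, lat: float, radius_km: float = 5):
    # Members of the geo set within radius_km of the rider
    return await r.geosearch(
        f"drivers:{city}",
        longitude=lon, latitude=lat,
        radius=radius_km, unit="km",
    )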

7.4 Summary: Industry Patterns

Company   │ Use Case        │ Pattern       │ TTL     │ Key Insight
Amazon    │ Product catalog │ Cache-aside   │ 5 min   │ Tiered caching
Netflix   │ User profiles   │ Cache-aside   │ 24 hr   │ Zone-aware
Uber      │ Driver location │ Write-through │ ~10 sec │ Cache as primary
Twitter   │ Timelines       │ Write-behind  │ N/A     │ Fan-out on write
Instagram │ Post counts     │ Write-behind  │ N/A     │ Approximate OK

Chapter 8: Common Mistakes to Avoid

8.1 Mistake 1: Caching Everything

❌ WRONG: Cache all the things!

async def get_anything(key: str) -> any:
    cached = await cache.get(key)
    if cached:
        return cached
    
    # Load from DB
    value = await db.fetch(key)
    await cache.set(key, value, ttl=3600)  # Cache for 1 hour
    return value

Problems:
  - Rarely accessed data wastes cache memory
  - Low hit ratio
  - Cache evictions hurt actually hot data


✅ CORRECT: Cache strategically

# Only cache frequently accessed, expensive to compute data
CACHE_CONFIG = {
    "product_details": {"ttl": 300, "cache": True},   # Hot, expensive
    "user_profile": {"ttl": 600, "cache": True},      # Hot, stable
    "order_history": {"ttl": 0, "cache": False},      # Rarely re-read
    "audit_log": {"ttl": 0, "cache": False},          # Write-once, read-rarely
}

async def get_with_strategy(entity: str, id: str) -> any:
    config = CACHE_CONFIG.get(entity, {"cache": False})
    
    if not config["cache"]:
        return await db.fetch(entity, id)
    
    # ... caching logic for entities that benefit

8.2 Mistake 2: Not Handling Cache Failures

❌ WRONG: Cache failure = Application failure

async def get_product(product_id: str) -> dict:
    cache_key = f"product:{product_id}"
    
    # If Redis is down, this throws an exception!
    cached = await redis.get(cache_key)
    
    if cached:
        return json.loads(cached)
    
    product = await db.fetch_one("SELECT * FROM products WHERE id = $1", product_id)
    await redis.setex(cache_key, 300, json.dumps(product))  # Also throws!
    return product


✅ CORRECT: Graceful degradation

async def get_product(product_id: str) -> dict:
    cache_key = f"product:{product_id}"
    
    # Try cache, but don't fail if it's down
    try:
        cached = await asyncio.wait_for(
            redis.get(cache_key),
            timeout=0.1  # 100ms max
        )
        if cached:
            return json.loads(cached)
    except Exception as e:
        logger.warning(f"Cache read failed: {e}")
        # Continue to database
    
    # Load from database
    product = await db.fetch_one("SELECT * FROM products WHERE id = $1", product_id)
    
    # Try to cache, but don't fail if we can't
    try:
        await redis.setex(cache_key, 300, json.dumps(dict(product)))
    except Exception as e:
        logger.warning(f"Cache write failed: {e}")
    
    return dict(product)
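
A step beyond per-request try/except is a small circuit breaker: after repeated cache failures, skip Redis entirely for a cooldown period instead of paying the timeout on every call. A minimal sketch; the class name and thresholds are illustrative, and redis, db, and json are the same objects used above:

import time

class CacheCircuitBreaker:
    """Skips the cache after repeated failures; retries after a cooldown."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        # Closed: use the cache. Open: skip it until the cooldown elapses.
        if self.failures < self.failure_threshold:
            return True
        return (time.time() - self.opened_at) > self.reset_timeout

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.time()


breaker = CacheCircuitBreaker()

async def get_product_with_breaker(product_id: str) -> dict:
    cache_key = f"product:{product_id}"

    if breaker.allow():
        try:
            cached = await redis.get(cache_key)
            breaker.record_success()
            if cached:
                return json.loads(cached)
        except Exception:
            breaker.record_failure()

    # Open circuit or cache miss: fall through to the database
    product = await db.fetch_one("SELECT * FROM products WHERE id = $1", product_id)
    return dict(product)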

8.3 Mistake 3: Ignoring Cache Metrics

❌ WRONG: Deploy and forget

# Set up cache
cache = Redis(host='redis')

# Use cache
await cache.get(key)
await cache.set(key, value)

# Hope for the best!


✅ CORRECT: Monitor and alert

# Track every cache operation
async def get_with_metrics(key: str) -> any:
    start = time.time()
    
    result = await cache.get(key)
    
    latency = (time.time() - start) * 1000
    
    if result:
        metrics.increment('cache.hit', tags={'key_prefix': key.split(':')[0]})
    else:
        metrics.increment('cache.miss', tags={'key_prefix': key.split(':')[0]})
    
    metrics.timing('cache.latency', latency)
    
    return result


# Alert on poor performance
ALERTS:
  - cache.hit_ratio < 0.8: "Cache hit ratio dropped below 80%"
  - cache.latency.p99 > 50ms: "Cache latency elevated"
  - cache.errors > 10/min: "Cache errors spiking"

8.4 Mistake 4: Wrong Key Design

❌ WRONG: Non-unique or collision-prone keys

# Key collision!
cache.set("user", user1)  # Overwritten by...
cache.set("user", user2)  # Different user, same key!

# Non-deterministic keys
cache.set(f"user:{user.email}", user)  # What if email changes?

# Keys without versioning
cache.set("product:123", product)  # Schema changes break deserialization


✅ CORRECT: Properly designed keys

# Unique, stable identifier
cache.set(f"user:{user.id}", user)

# Versioned for schema changes
cache.set(f"v2:product:{product.id}", product)  # Bump version on schema change

# Namespaced for clarity
cache.set(f"myapp:prod:user:{user.id}", user)

# Prefixed for bulk invalidation
# Can clear all products by SCANning "myapp:prod:product:*" and deleting
# the matches (Redis has no single delete-by-pattern command)
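
If these conventions are repeated in many places, a tiny helper keeps keys consistent; the names and the version constant here are illustrative:

CACHE_NAMESPACE = "myapp:prod"
CACHE_SCHEMA_VERSION = "v2"   # bump when the cached object's shape changes

def cache_key(entity: str, entity_id: str) -> str:
    # cache_key("product", "123") -> "myapp:prod:v2:product:123"
    return f"{CACHE_NAMESPACE}:{CACHE_SCHEMA_VERSION}:{entity}:{entity_id}"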

8.5 Mistake Checklist

Before deploying caching, verify:

  • Right pattern chosen — Cache-aside vs write-through for your use case
  • Appropriate TTL — Based on acceptable staleness
  • Failure handling — App works (slower) when cache is down
  • Invalidation strategy — How/when cache is cleared
  • Key design — Unique, versioned, namespaced
  • Metrics in place — Hit ratio, latency, errors tracked
  • Memory limits set — Redis maxmemory with eviction policy
  • Serialization tested — All data types serialize correctly
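
For the memory-limits item, the relevant redis.conf fragment is short; the size is illustrative, and allkeys-lru evicts the least recently used keys once the limit is reached:

# redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lru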

Part IV: Interview Preparation

Chapter 9: Interview Tips and Phrases

9.1 When to Bring Up Caching

Bring up caching when:

  • System has read-heavy workload (>10:1 read/write ratio)
  • Database is a bottleneck or scaling concern
  • Latency requirements are strict (<100ms)
  • Same data is requested repeatedly
  • Computing the data is expensive

Don't default to caching when:

  • Data changes frequently (real-time stock prices)
  • Every request is unique (search queries)
  • Consistency is critical and invalidation is complex
  • Data is write-heavy

9.2 Key Phrases to Use

INTRODUCING CACHING:

"For this read-heavy workload, I'd add a caching layer. With 
100,000 reads per second and only 100 writes, we'd see around 
a 99% hit ratio, reducing database load by two orders of 
magnitude."


EXPLAINING PATTERN CHOICE:

"I'd use cache-aside here because it gives us the most control. 
The application can decide what to cache and for how long. Plus, 
if the cache fails, we gracefully fall back to the database—the 
system is slower but not broken."


DISCUSSING TRADE-OFFS:

"The trade-off with caching is consistency versus latency. With 
a 5-minute TTL, users might see slightly stale product prices, 
but we reduce database load significantly. For checkout, we'd 
bypass the cache and hit the database directly to ensure accuracy."


ADDRESSING INVALIDATION:

"For invalidation, I'd use a hybrid approach: event-driven 
invalidation for immediate consistency when data changes, plus 
a short TTL as a safety net in case an invalidation event is 
lost. This gives us near real-time updates with protection 
against stale data."


HANDLING FAILURE QUESTIONS:

"If Redis goes down, the system degrades gracefully. The circuit 
breaker opens after a few failures, and all requests go directly 
to the database. Latency increases but functionality remains. 
Once Redis recovers, the circuit breaker allows test requests 
through, and normal operation resumes."

9.3 Questions to Ask Interviewer

  • "What's the read/write ratio for this data?"
  • "How fresh does the data need to be? Can users tolerate 1-minute staleness?"
  • "What's the current database load? Is it a bottleneck?"
  • "Is there existing cache infrastructure we should use?"

9.4 Common Follow-up Questions

Q: "What if the cache goes down?"
A: "Circuit breaker detects failures, subsequent requests go directly to database. Higher latency but still functional. We'd alert on this and investigate."

Q: "How do you handle cache stampede?"
A: "Request coalescing—multiple requests for the same key wait on a single database fetch. Also probabilistic early expiration to stagger cache misses."

Q: "How do you decide TTL?"
A: "Based on acceptable staleness. Product descriptions: 1 hour. Inventory counts: 30 seconds. Prices: 5 minutes with event-driven invalidation for flash sales."

Q: "Which caching pattern would you use?"
A: "Cache-aside for most cases—simple, resilient, flexible. Write-through only when consistency is critical. Write-behind only for high-volume, loss-tolerant writes like analytics."

Chapter 10: Practice Problems

Problem 1: E-commerce Product Page

Setup: Design caching for an e-commerce product page. Page shows: product details, current price, inventory count, reviews, and related products.

Requirements:

  • 50,000 product views per second
  • Price changes during flash sales
  • Inventory updated with every purchase
  • Reviews added ~100 per minute

Questions:

  1. What caching pattern would you use for each data type?
  2. What TTLs would you set?
  3. How would you handle a flash sale where prices change instantly?

Hints:

  • Different data has different freshness requirements
  • Inventory is most time-sensitive
  • Reviews rarely change and are expensive to compute
  • Consider event-driven invalidation for prices

Pattern by Data Type:

Data               Pattern       TTL       Invalidation
────────────────   ───────────   ───────   ─────────────────────────────
Product details    Cache-aside   1 hour    On update
Price              Cache-aside   5 min     Event-driven for flash sales
Inventory          Cache-aside   30 sec    Or skip cache entirely
Reviews            Cache-aside   15 min    Event-driven on new review
Related products   Cache-aside   1 hour    Daily recompute

Flash Sale Handling:

async def start_flash_sale(product_ids: List[str], sale_prices: Dict):
    # Update database
    await db.execute("UPDATE products SET price = ... WHERE id IN ...")
    
    # Publish invalidation event
    await kafka.produce("product-updates", {
        "type": "flash_sale_started",
        "product_ids": product_ids
    })


# Cache invalidation consumer (runs as a separate worker subscribed to the topic)
async def handle_flash_sale(event):
    for product_id in event["product_ids"]:
        await cache.delete(f"product:{product_id}")
        await cache.delete(f"price:{product_id}")

For inventory, consider not caching at all during high-activity periods:

async def get_inventory(product_id: str) -> int:
    if is_flash_sale_active(product_id):
        # During flash sale, always hit the database for accuracy
        row = await db.fetch_one(
            "SELECT quantity FROM inventory WHERE product_id = $1", product_id
        )
        return row["quantity"]
    
    # Normal operation: use cache
    return await cached_inventory(product_id)

Problem 2: User Session Store

Setup: Design a session cache for a web application with 10 million concurrent users.

Requirements:

  • Session data: user ID, permissions, preferences (~2KB each)
  • Sessions expire after 30 minutes of inactivity
  • User can be logged in on multiple devices
  • Must handle datacenter failover

Questions:

  1. How would you structure the cache keys?
  2. How would you handle session updates (sliding expiration)?
  3. What happens during datacenter failover?

Hints:

  • Sliding expiration means TTL resets on every access
  • Multiple devices = multiple sessions per user
  • Failover requires replication or persistence

Key Structure:

session:{session_id} → session data
user_sessions:{user_id} → set of session_ids (for logout-all)
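
The write path populates both keys, which is what makes logout-all possible later. A minimal sketch; create_session is a hypothetical helper and assumes the same async Redis client used below:

import json
import uuid

SESSION_TTL = 1800  # 30 minutes

async def create_session(user_id: str, data: dict) -> str:
    session_id = uuid.uuid4().hex

    pipe = redis.pipeline()
    pipe.setex(f"session:{session_id}", SESSION_TTL, json.dumps(data))
    # Track the session under the user so logout-all can find it
    pipe.sadd(f"user_sessions:{user_id}", session_id)
    # Simplification: the set's TTL is refreshed only on session creation;
    # a production system would also refresh it on session access
    pipe.expire(f"user_sessions:{user_id}", SESSION_TTL)
    await pipe.execute()

    return session_id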

Sliding Expiration:

async def get_session(session_id: str) -> Optional[dict]:
    key = f"session:{session_id}"
    
    # Get and refresh TTL in one operation
    pipe = redis.pipeline()
    pipe.get(key)
    pipe.expire(key, 1800)  # Reset to 30 minutes
    results = await pipe.execute()
    
    if results[0]:
        return json.loads(results[0])
    return None

Failover Strategy:

Option 1: Redis Cluster with replicas
  - Automatic failover
  - Some sessions may be lost during failover
  
Option 2: Redis with AOF persistence
  - Sessions survive restart (config fragment below)
  - Slightly higher latency
  
Option 3: Multi-region replication
  - Active-active or active-passive
  - Session available in either region
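
For Option 2, the redis.conf fragment is small; everysec trades at most about one second of acknowledged writes for much lower latency than fsync-on-every-write:

# redis.conf
appendonly yes
appendfsync everysec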

Logout All Devices:

async def logout_all(user_id: str):
    # Get all sessions for user
    session_ids = await redis.smembers(f"user_sessions:{user_id}")
    
    # Delete all sessions
    if session_ids:
        pipe = redis.pipeline()
        for sid in session_ids:
            pipe.delete(f"session:{sid}")
        pipe.delete(f"user_sessions:{user_id}")
        await pipe.execute()

Problem 3: API Rate Limiter

Setup: Design a distributed rate limiter using Redis. Limit: 100 requests per minute per API key.

Requirements:

  • Must be accurate (can't significantly over-allow)
  • Must be fast (<5ms overhead)
  • Works across multiple API servers
  • Handle clock skew between servers

Questions:

  1. Which caching pattern applies here?
  2. How would you implement sliding window rate limiting?
  3. What happens if Redis is down?

Hints:

  • This is write-heavy, not read-heavy
  • Consider Redis sorted sets for sliding window
  • Rate limiter failure mode: allow or deny?

Pattern: None of the four exactly—this is a specialized use case

  • Cache IS the authoritative store for rate limit state
  • Similar to write-through but cache-only (no database)

Sliding Window with Sorted Sets:

async def is_rate_limited(api_key: str, limit: int = 100, window: int = 60) -> bool:
    now = time.time()
    key = f"ratelimit:{api_key}"
    
    pipe = redis.pipeline()
    
    # Remove old entries outside window
    pipe.zremrangebyscore(key, 0, now - window)
    
    # Count current entries
    pipe.zcard(key)
    
    # Add current request
    pipe.zadd(key, {str(now): now})
    
    # Set expiry on the key
    pipe.expire(key, window)
    
    results = await pipe.execute()
    current_count = results[1]
    
    return current_count >= limit

# Usage
if await is_rate_limited(api_key):
    return Response(status=429, body="Too Many Requests")
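
One refinement worth mentioning: the pipeline batches the commands, but the limit decision happens on the client, so a rejected request is still added to the window. A server-side Lua script can check and conditionally add in one atomic step. A sketch, assuming redis-py's eval; the script and helper name are illustrative:

RATE_LIMIT_LUA = """
local key    = KEYS[1]
local now    = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit  = tonumber(ARGV[3])

redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
if redis.call('ZCARD', key) >= limit then
    return 1  -- rate limited; request is NOT recorded
end
redis.call('ZADD', key, now, tostring(now))
redis.call('EXPIRE', key, window)
return 0
"""

async def is_rate_limited_atomic(api_key: str, limit: int = 100, window: int = 60) -> bool:
    result = await redis.eval(
        RATE_LIMIT_LUA, 1, f"ratelimit:{api_key}",
        time.time(), window, limit
    )
    return result == 1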

Redis Down Strategy:

async def check_rate_limit(api_key: str) -> bool:
    try:
        return await is_rate_limited(api_key)
    except RedisError:
        # Option 1: Fail open (allow requests)
        logger.warning("Rate limiter unavailable, allowing request")
        return False
        
        # Option 2: Fail closed (deny requests)
        # return True
        
        # Option 3: Local fallback (less accurate)
        # return local_rate_limiter.check(api_key)

Chapter 11: Mock Interview Dialogue

Scenario: Design Product Catalog Cache

Interviewer: "We have an e-commerce platform with 1 million products. Product pages get 50,000 views per second. How would you design the caching layer?"

You: "Great question. Let me start by understanding the data and access patterns.

For product pages, I assume we're showing product details like name, description, and images—which rarely change—plus price and inventory which change more frequently. What's the breakdown of read vs write operations?"

Interviewer: "It's heavily read-dominated. Products are updated maybe 1,000 times per day total, but viewed millions of times."

You: "Perfect, that's an ideal caching scenario. With a 50,000:1 read-to-write ratio, even a 5-minute TTL would give us an excellent hit ratio.

I'd use the cache-aside pattern here. The application checks the cache first, and on a miss, loads from the database and populates the cache. This pattern gives us:

  1. Resilience — If Redis goes down, we fall back to the database. Slower, but functional.
  2. Simplicity — Easy to implement and reason about.
  3. Flexibility — We can cache different data with different TTLs.

For the cache key design, I'd use: product:{product_id}:v1 where v1 is the schema version. This lets us invalidate all cached products if we change the data structure."

Interviewer: "What about inventory? It changes with every purchase."

You: "Good point. Inventory is tricky because it changes frequently and accuracy matters—we don't want to show 'In Stock' when we're actually sold out.

I'd handle inventory differently:

For most products, I'd cache inventory with a 30-second TTL. Users might see slightly stale counts, but it self-corrects quickly.

For high-demand items during sales, I'd skip the cache entirely and hit the database. Better to be slightly slower and accurate than fast and wrong.

For the add-to-cart and checkout flow, I'd always check real-time inventory—never rely on cache here. The product page can show an approximation, but the purchase flow needs truth."

Interviewer: "What if we do a flash sale and prices change instantly? How do you invalidate?"

You: "For flash sales, TTL-based expiration isn't fast enough. I'd use event-driven invalidation.

When the sale starts:

  1. Update prices in the database
  2. Publish an event to Kafka: {type: 'flash_sale', product_ids: [...]}
  3. Cache invalidation service consumes the event
  4. Delete cache entries for affected products

Next request triggers a cache miss and loads the new price. This gives near-instant updates—typically under 1 second.

I'd keep a short TTL as a safety net too. If the invalidation event somehow fails, stale prices expire within 5 minutes anyway."

Interviewer: "How do you handle a cold start? What if you need to restart the cache cluster?"

You: "Cold start is a thundering herd waiting to happen. 50,000 requests per second, all hitting the database at once—that would be bad.

I'd use a few strategies:

  1. Cache warming — Before putting a new instance in rotation, run a job that pre-populates the cache with the top 10,000 most-viewed products. Covers 80% of traffic.

  2. Request coalescing — If multiple requests come in for the same product during a cache miss, only one actually queries the database. The others wait for that result. This is also called the 'single-flight' pattern (sketched after this exchange).

  3. Gradual traffic shift — If using a load balancer, start a new instance with 1% of traffic and slowly increase as the cache warms naturally.

For truly critical situations, I might also have a read replica database dedicated to cache rebuilding, so warming doesn't impact production traffic."
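
For reference, a minimal single-flight sketch: the first caller for a key performs the database fetch, and concurrent callers await the same in-flight future. The helper name is illustrative:

import asyncio
from typing import Dict

_in_flight: Dict[str, asyncio.Future] = {}

async def coalesced_fetch(key: str, loader):
    # If another request is already loading this key, wait for its result
    if key in _in_flight:
        return await _in_flight[key]

    future = asyncio.get_running_loop().create_future()
    _in_flight[key] = future
    try:
        value = await loader()
        future.set_result(value)
        return value
    except Exception as exc:
        future.set_exception(exc)
        raise
    finally:
        _in_flight.pop(key, None)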

Interviewer: "Good. What metrics would you monitor?"

You: "For the cache layer specifically:

Hit ratio — Should be >95% for this use case. Drop below 90% means something's wrong.

Latency — p50 should be <2ms, p99 <10ms. If p99 climbs, we might have hot keys or network issues.

Memory usage — Track against maxmemory. Alert at 80% so we can scale before evictions spike.

Eviction rate — Some eviction is normal with LRU, but sudden spikes indicate we need more capacity.

Error rate — Connection errors, timeouts. Feed into circuit breaker.

I'd have alerts on hit ratio dropping and latency spiking. Those are the early warning signs before users notice problems."


Summary

DAY 1 KEY TAKEAWAYS

CORE CONCEPT:
• Caching stores copies of data in faster storage
• Reduces latency, increases throughput, lowers cost
• But adds complexity: invalidation, consistency, failures

THE FOUR PATTERNS:

Cache-Aside (Lazy Loading):
  • App manages cache + database
  • Most common, most flexible
  • Use for: General purpose, read-heavy workloads

Read-Through:
  • Cache automatically loads from database
  • Simpler app code
  • Use for: Uniform data access patterns

Write-Through:
  • Writes go to both cache and database synchronously
  • Strong consistency
  • Use for: Critical data where stale reads are unacceptable

Write-Behind:
  • Writes to cache, async to database
  • Highest performance, highest risk
  • Use for: Analytics, counters, loss-tolerant data

TRADE-OFFS:
• Consistency vs Performance
• Complexity vs Control
• Cache dependency vs Graceful degradation

DEFAULT CHOICE:
• Start with Cache-Aside
• Add write-through for critical paths
• Consider write-behind for analytics

📚 Further Reading

Books

  • "Designing Data-Intensive Applications" by Martin Kleppmann — Chapter 5
  • "Redis in Action" by Josiah Carlson

End of Day 1: Caching Patterns

Tomorrow: Day 2 — Invalidation Strategies. We've learned how to put data into caches. Tomorrow, we learn the harder problem: how to get stale data out. We'll cover TTL, event-driven invalidation, and versioned keys—and why Phil Karlton called this one of the two hardest problems in computer science.