Himanshu Kukreja

Week 4 — Day 3: Thundering Herd

System Design Mastery Series


Preface

Yesterday, you learned how to invalidate stale cache entries. You can now keep your cache fresh.

But there's a dark side to invalidation that we didn't fully address:

BLACK FRIDAY, 12:00:00 AM

The moment the sale starts:

  Homepage cache TTL expires
  50,000 users refresh simultaneously
  
  Cache status: EMPTY
  
  Request #1:     Cache miss → Query database
  Request #2:     Cache miss → Query database
  Request #3:     Cache miss → Query database
  ...
  Request #50,000: Cache miss → Query database

  Database receives: 50,000 identical queries
  Database capacity: 5,000 queries/second
  
  Result:
    12:00:01 — Database CPU: 100%
    12:00:02 — Query latency: 50ms → 5,000ms
    12:00:03 — Connection pool exhausted
    12:00:04 — Database: "Too many connections"
    12:00:05 — Site down, customers angry, revenue lost

  All because a single cache key expired.

This is the thundering herd problem — and it's destroyed more Black Friday sales than any other caching issue.

Today, you'll learn how to tame the herd.


Part I: Foundations

Chapter 1: What Is Thundering Herd?

1.1 The Simple Definition

Thundering herd occurs when many requests simultaneously attempt to access an unavailable resource, overwhelming the system that provides it.

EVERYDAY ANALOGY: Store Opening

Black Friday, 6:00 AM
Store doors: LOCKED (cache empty)

  5,000 people waiting outside
  
  6:00:00 — Doors open
  6:00:01 — 5,000 people rush in simultaneously
  6:00:02 — Entrance bottleneck
  6:00:03 — People trampled, chaos
  
The store can handle 100 people entering per minute.
5,000 at once = disaster.

SOLUTION: Controlled entry
  - Open doors gradually
  - Let people in batches of 50
  - Others wait in organized queue
  
Same principle applies to cache misses.

1.2 When Thundering Herd Occurs

THUNDERING HERD TRIGGERS

1. CACHE EXPIRATION
   Popular key expires
   Many concurrent requests miss
   All hit database simultaneously

2. COLD START
   Service restarts, cache empty
   All traffic goes to database
   Database overwhelmed

3. CACHE FAILURE
   Redis goes down
   All requests bypass cache
   Database sees 100x normal load

4. INVALIDATION STORM
   Bulk invalidation (deployment, sale start)
   Many keys deleted at once
   Sudden spike in cache misses

5. HOT KEY CREATION
   New viral content
   Not in cache yet
   Massive traffic from first second

1.3 The Math of Thundering Herd

QUANTIFYING THE PROBLEM

Normal operation:
  Requests/second:        10,000
  Cache hit ratio:        99%
  Cache misses/second:    100
  Database queries/sec:   100 ✓

Cache expires for hot key (1% of traffic):
  Requests for hot key:   100/second
  All miss simultaneously
  Database queries/sec:   100 (normal) + 100 (hot key) = 200 ✓
  Still manageable!

Cache expires for VERY hot key (50% of traffic):
  Requests for hot key:   5,000/second
  All miss simultaneously
  Database queries/sec:   100 (normal) + 5,000 (hot key) = 5,100
  Database capacity:      1,000 queries/second
  Result: 5x overload 💥


THE AMPLIFICATION FACTOR

Without protection:
  1 cache miss = N database queries
  (where N = concurrent requests for same key)

With thundering herd protection:
  1 cache miss = 1 database query
  (other N-1 requests wait for result)
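
A quick back-of-the-envelope check of these numbers, using the illustrative figures from above (all values assumed):

# Illustrative arithmetic for the amplification factor
requests_per_sec = 10_000
hot_key_share = 0.5                 # 50% of traffic hits one key
db_capacity = 1_000                 # queries/second the database can absorb

concurrent_misses = int(requests_per_sec * hot_key_share)   # 5,000

queries_without_protection = concurrent_misses   # 1 miss -> N queries
queries_with_protection = 1                      # 1 miss -> 1 query

print(queries_without_protection / db_capacity)  # 5.0   -> 5x overload
print(queries_with_protection / db_capacity)     # 0.001 -> negligible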

1.4 Key Terminology

Term                        Definition
Thundering herd             Many requests hitting an uncached resource simultaneously
Cache stampede              Synonym for thundering herd
Dog-pile effect             Another synonym, common in Ruby/Python communities
Hot key                     Cache key with disproportionately high traffic
Request coalescing          Combining duplicate requests into one
Lock/mutex                  Mechanism to ensure only one request fetches data
Probabilistic expiration    Random early refresh to prevent synchronized expiry
Cache warming               Pre-populating cache before traffic hits

Chapter 2: Protection Strategies

2.1 Strategy 1: Locking (Mutex)

Only one request fetches from database; others wait.

LOCKING STRATEGY

Without lock:
  Request A ──▶ Cache miss ──▶ Query DB ──▶ Response
  Request B ──▶ Cache miss ──▶ Query DB ──▶ Response
  Request C ──▶ Cache miss ──▶ Query DB ──▶ Response
  
  3 identical database queries!

With lock:
  Request A ──▶ Cache miss ──▶ Acquire lock ──▶ Query DB ──▶ Cache result ──▶ Release lock
  Request B ──▶ Cache miss ──▶ Wait for lock...
  Request C ──▶ Cache miss ──▶ Wait for lock...
  
  Lock released:
  Request B ──▶ Read from cache ──▶ Response
  Request C ──▶ Read from cache ──▶ Response
  
  1 database query, 3 responses!

Implementation:

# Locking Strategy for Thundering Herd Prevention

import asyncio
from typing import Optional, Any, Callable
import json


class LockingCache:
    """
    Cache with mutex-based thundering herd protection.
    
    Only one request fetches from database on cache miss;
    others wait for the result.
    """
    
    def __init__(self, redis_client, default_ttl: int = 300):
        self.redis = redis_client
        self.default_ttl = default_ttl
        self.lock_ttl = 10  # Lock expires after 10 seconds (safety)
    
    async def get_or_fetch(
        self,
        key: str,
        fetch_func: Callable,
        ttl: int = None
    ) -> Any:
        """
        Get from cache, or fetch with lock protection.
        
        Args:
            key: Cache key
            fetch_func: Async function to fetch data if cache miss
            ttl: Cache TTL in seconds
        """
        ttl = ttl or self.default_ttl
        
        # Step 1: Try cache
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)
        
        # Step 2: Cache miss - try to acquire lock
        lock_key = f"lock:{key}"
        lock_acquired = await self.redis.set(
            lock_key,
            "1",
            nx=True,  # Only set if not exists
            ex=self.lock_ttl  # Expire lock after 10 seconds
        )
        
        if lock_acquired:
            # We got the lock - fetch and cache
            try:
                value = await fetch_func()
                
                # Cache the result
                await self.redis.setex(key, ttl, json.dumps(value))
                
                return value
            finally:
                # Release lock
                await self.redis.delete(lock_key)
        else:
            # Lock held by another request - wait and retry
            return await self._wait_for_cache(key, fetch_func, ttl)
    
    async def _wait_for_cache(
        self,
        key: str,
        fetch_func: Callable,
        ttl: int,
        max_wait: float = 5.0,
        poll_interval: float = 0.05
    ) -> Any:
        """
        Wait for another request to populate cache.
        
        Falls back to fetching if wait times out.
        """
        waited = 0
        
        while waited < max_wait:
            # Check if cache is now populated
            cached = await self.redis.get(key)
            if cached:
                return json.loads(cached)
            
            # Wait a bit and retry
            await asyncio.sleep(poll_interval)
            waited += poll_interval
        
        # Timeout - fetch ourselves (lock holder may have failed)
        value = await fetch_func()
        await self.redis.setex(key, ttl, json.dumps(value))
        return value


# Usage
cache = LockingCache(redis_client)

async def get_product(product_id: str) -> dict:
    return await cache.get_or_fetch(
        key=f"product:{product_id}",
        fetch_func=lambda: db.fetch_one(
            "SELECT * FROM products WHERE id = $1",
            product_id
        ),
        ttl=300
    )

Pros:

  • Guarantees only one database query
  • Simple mental model
  • Works well for moderate traffic

Cons:

  • Waiters block, increasing latency
  • Lock can become bottleneck
  • Risk of deadlock if lock holder crashes
  • Polling adds overhead

2.2 Strategy 2: Request Coalescing (Single Flight)

Multiple concurrent requests for the same key share a single fetch operation.

REQUEST COALESCING

Request A ──┐
Request B ──┼──▶ Single fetch_func() call ──▶ Single DB query
Request C ──┘                                       │
                                                    │
            ┌───────────────────────────────────────┘
            │
            ▼
  ┌─────────────────┐
  │ Result shared   │
  │ with all three  │
  │ requests        │
  └─────────────────┘

Unlike locking:
  - No polling/waiting
  - Requests "join" the in-flight operation
  - All get result simultaneously

Implementation:

# Request Coalescing (Single Flight Pattern)

import asyncio
from typing import Dict, Any, Callable, Optional
from dataclasses import dataclass
import json


@dataclass
class InflightRequest:
    """Represents an in-flight fetch operation."""
    future: asyncio.Future
    created_at: float


class CoalescingCache:
    """
    Cache with request coalescing for thundering herd prevention.
    
    Multiple requests for the same key share a single database fetch.
    This is more efficient than locking because there's no polling.
    """
    
    def __init__(self, redis_client, default_ttl: int = 300):
        self.redis = redis_client
        self.default_ttl = default_ttl
        
        # Track in-flight requests: key -> Future
        self._inflight: Dict[str, InflightRequest] = {}
        self._lock = asyncio.Lock()
    
    async def get_or_fetch(
        self,
        key: str,
        fetch_func: Callable,
        ttl: int = None
    ) -> Any:
        """
        Get from cache, or fetch with request coalescing.
        
        If another request is already fetching this key,
        we join that request instead of making a duplicate fetch.
        """
        ttl = ttl or self.default_ttl
        
        # Step 1: Try cache
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)
        
        # Step 2: Check for an in-flight request (or register as the fetcher).
        # Hold the lock only to read/update the map; await the shared future
        # outside the lock so misses on other keys aren't blocked behind it.
        async with self._lock:
            inflight = self._inflight.get(key)
            if inflight is None:
                # No in-flight request - we'll be the fetcher
                future = asyncio.get_event_loop().create_future()
                self._inflight[key] = InflightRequest(
                    future=future,
                    created_at=asyncio.get_event_loop().time()
                )
        
        if inflight is not None:
            # Another request is fetching - join it
            return await inflight.future
        
        # Step 3: Fetch data (we're the designated fetcher)
        try:
            value = await fetch_func()
            
            # Cache the result
            await self.redis.setex(key, ttl, json.dumps(value))
            
            # Notify all waiting requests
            future.set_result(value)
            
            return value
            
        except Exception as e:
            # Notify waiters of failure
            future.set_exception(e)
            raise
            
        finally:
            # Clean up in-flight tracking
            async with self._lock:
                self._inflight.pop(key, None)


# More robust implementation with timeout
class RobustCoalescingCache:
    """
    Production-ready coalescing cache with timeouts and cleanup.
    """
    
    def __init__(
        self,
        redis_client,
        default_ttl: int = 300,
        fetch_timeout: float = 5.0,
        cleanup_interval: float = 60.0
    ):
        self.redis = redis_client
        self.default_ttl = default_ttl
        self.fetch_timeout = fetch_timeout
        
        self._inflight: Dict[str, InflightRequest] = {}
        self._lock = asyncio.Lock()
        
        # Start background cleanup task (assumes an event loop is already running)
        asyncio.create_task(self._cleanup_loop(cleanup_interval))
    
    async def get_or_fetch(
        self,
        key: str,
        fetch_func: Callable,
        ttl: int = None
    ) -> Any:
        """Get from cache with coalescing and timeout."""
        ttl = ttl or self.default_ttl
        
        # Try cache first
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)
        
        # Check for an in-flight fetch or register as the fetcher.
        # The lock only guards the map; waiting happens outside it
        # so cache misses on other keys aren't blocked.
        async with self._lock:
            inflight = self._inflight.get(key)
            if inflight is None:
                future = asyncio.get_event_loop().create_future()
                self._inflight[key] = InflightRequest(
                    future=future,
                    created_at=asyncio.get_event_loop().time()
                )
        
        if inflight is not None:
            try:
                # Wait with timeout
                return await asyncio.wait_for(
                    asyncio.shield(inflight.future),
                    timeout=self.fetch_timeout
                )
            except asyncio.TimeoutError:
                # Fetcher is taking too long - take over as the fetcher
                # ourselves so later callers can still coalesce
                pass
            
            async with self._lock:
                future = asyncio.get_event_loop().create_future()
                self._inflight[key] = InflightRequest(
                    future=future,
                    created_at=asyncio.get_event_loop().time()
                )
        
        # Fetch with timeout
        try:
            value = await asyncio.wait_for(
                fetch_func(),
                timeout=self.fetch_timeout
            )
            
            await self.redis.setex(key, ttl, json.dumps(value))
            
            if not future.done():
                future.set_result(value)
            
            return value
            
        except Exception as e:
            if not future.done():
                future.set_exception(e)
            raise
            
        finally:
            async with self._lock:
                self._inflight.pop(key, None)
    
    async def _cleanup_loop(self, interval: float):
        """Clean up stale in-flight entries."""
        while True:
            await asyncio.sleep(interval)
            
            now = asyncio.get_event_loop().time()
            stale_keys = []
            
            async with self._lock:
                for key, inflight in self._inflight.items():
                    if now - inflight.created_at > self.fetch_timeout * 2:
                        stale_keys.append(key)
                
                for key in stale_keys:
                    self._inflight.pop(key, None)

Pros:

  • No polling overhead
  • All requests get result simultaneously
  • Lower latency than locking
  • Cleaner code

Cons:

  • More complex implementation
  • Memory overhead for tracking in-flight requests
  • Need to handle timeout for stale entries

2.3 Strategy 3: Probabilistic Early Expiration

Randomly refresh cache before it expires, preventing synchronized expiration.

PROBABILISTIC EARLY EXPIRATION

Standard TTL (Problem):
  All requests use same TTL
  All entries expire at same time
  All requests miss simultaneously = thundering herd

  Timeline:
  T=0s    Cache populated, TTL=60s
  T=60s   Cache expires
  T=60s   1000 requests all miss!


Probabilistic expiration (Solution):
  Each request has small chance of refreshing early
  Entries refresh at different times
  Never synchronized expiration

  Timeline:
  T=0s    Cache populated, TTL=60s
  T=55s   Request A: random < threshold → REFRESH (1 request)
  T=60s   Cache already fresh, no mass expiration
  
  
THE FORMULA

beta = 1.0  # Tuning parameter (higher = more aggressive refresh)
ttl_remaining = expiry_time - current_time

# XFetch-style check: the chance of an early refresh rises sharply as
# expiry approaches (the recompute-time factor is folded into beta here)
should_refresh = -beta * log(random()) > ttl_remaining

Or a simpler version:
should_refresh = random() < (1.0 - ttl_remaining / original_ttl) * 0.1

Implementation:

# Probabilistic Early Expiration

import asyncio
import json
import logging
import math
import random
import time
from typing import Any, Callable, Optional
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class CachedValue:
    """Cached value with expiration metadata."""
    value: Any
    created_at: float
    ttl: float
    
    @property
    def expires_at(self) -> float:
        return self.created_at + self.ttl
    
    @property
    def remaining_ttl(self) -> float:
        return max(0, self.expires_at - time.time())
    
    @property
    def ttl_fraction_remaining(self) -> float:
        """Fraction of TTL remaining (0.0 to 1.0)."""
        return self.remaining_ttl / self.ttl if self.ttl > 0 else 0


class ProbabilisticCache:
    """
    Cache with probabilistic early expiration.
    
    Randomly refreshes entries before they expire,
    preventing synchronized mass expiration.
    """
    
    def __init__(
        self,
        redis_client,
        default_ttl: int = 300,
        beta: float = 1.0  # Higher = more aggressive refresh
    ):
        self.redis = redis_client
        self.default_ttl = default_ttl
        self.beta = beta
    
    async def get_or_fetch(
        self,
        key: str,
        fetch_func: Callable,
        ttl: int = None
    ) -> Any:
        """
        Get from cache with probabilistic early refresh.
        """
        ttl = ttl or self.default_ttl
        
        # Get cached value with metadata
        cached_data = await self._get_with_metadata(key)
        
        if cached_data:
            # Check if we should refresh early
            if self._should_refresh_early(cached_data):
                # Refresh in background (don't block request)
                asyncio.create_task(
                    self._refresh_cache(key, fetch_func, ttl)
                )
            
            return cached_data.value
        
        # Cache miss - fetch and cache
        value = await fetch_func()
        await self._set_with_metadata(key, value, ttl)
        return value
    
    def _should_refresh_early(self, cached: CachedValue) -> bool:
        """
        Probabilistically decide if we should refresh early.
        
        Probability increases as expiration approaches.
        Uses an XFetch-style check (probabilistic early recomputation).
        """
        remaining = cached.remaining_ttl
        
        if remaining <= 0:
            return True  # Already expired
        
        # XFetch-style check: refresh when beta * (-log(random())) exceeds
        # the remaining TTL. Since -log(random()) is exponentially
        # distributed, refresh probability rises sharply near expiry.
        
        try:
            random_factor = -1 * math.log(random.random())
            threshold = self.beta * random_factor
            return threshold > remaining
        except ValueError:
            return False
    
    async def _get_with_metadata(self, key: str) -> Optional[CachedValue]:
        """Get cached value with expiration metadata."""
        # Store metadata alongside value
        pipe = self.redis.pipeline()
        pipe.get(f"{key}:data")
        pipe.get(f"{key}:meta")
        
        data, meta = await pipe.execute()
        
        if not data or not meta:
            return None
        
        meta = json.loads(meta)
        return CachedValue(
            value=json.loads(data),
            created_at=meta['created_at'],
            ttl=meta['ttl']
        )
    
    async def _set_with_metadata(self, key: str, value: Any, ttl: int):
        """Set cached value with metadata."""
        now = time.time()
        
        pipe = self.redis.pipeline()
        pipe.setex(f"{key}:data", ttl, json.dumps(value))
        pipe.setex(f"{key}:meta", ttl, json.dumps({
            'created_at': now,
            'ttl': ttl
        }))
        await pipe.execute()
    
    async def _refresh_cache(self, key: str, fetch_func: Callable, ttl: int):
        """Background refresh of cache entry."""
        try:
            value = await fetch_func()
            await self._set_with_metadata(key, value, ttl)
        except Exception as e:
            # Log but don't fail - old value still valid
            logger.warning(f"Background refresh failed for {key}: {e}")


# Simpler implementation using TTL with jitter
class JitteredTTLCache:
    """
    Simpler probabilistic expiration using TTL jitter.
    
    Instead of calculating probability, just randomize the TTL.
    """
    
    def __init__(
        self,
        redis_client,
        default_ttl: int = 300,
        jitter_percent: float = 0.1  # 10% jitter
    ):
        self.redis = redis_client
        self.default_ttl = default_ttl
        self.jitter_percent = jitter_percent
    
    def _jittered_ttl(self, base_ttl: int) -> int:
        """Add random jitter to TTL."""
        jitter_range = base_ttl * self.jitter_percent
        jitter = random.uniform(-jitter_range, jitter_range)
        return max(1, int(base_ttl + jitter))
    
    async def set(self, key: str, value: Any, ttl: int = None):
        """Set with jittered TTL."""
        ttl = ttl or self.default_ttl
        actual_ttl = self._jittered_ttl(ttl)
        
        await self.redis.setex(key, actual_ttl, json.dumps(value))
    
    async def get_or_fetch(
        self,
        key: str,
        fetch_func: Callable,
        ttl: int = None
    ) -> Any:
        """Get with jittered TTL on cache population."""
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)
        
        value = await fetch_func()
        await self.set(key, value, ttl)
        return value

Pros:

  • No coordination needed
  • Graceful degradation
  • Works well for distributed caches
  • Simple jitter version is very easy

Cons:

  • Not deterministic
  • Small chance of thundering herd still
  • Slightly reduced cache efficiency (early refreshes)

2.4 Strategy 4: Background Refresh (Proactive Refresh)

Never let cache expire — refresh before TTL.

BACKGROUND REFRESH

Standard approach (reactive):
  Request → Cache miss → Fetch → Cache → Response
  
  Problem: Cache miss = latency spike

Background refresh (proactive):
  Background job → Fetch → Update cache (before expiry)
  Request → Cache hit → Response
  
  No cache miss ever!


Timeline:

Standard:
  T=0     Cache populated, TTL=60s
  T=60s   Expires
  T=60.1s Request → MISS → Fetch (slow)

Background:
  T=0     Cache populated, TTL=60s
  T=45s   Background job refreshes (before expiry)
  T=60s   Would have expired, but already refreshed
  T=60.1s Request → HIT → Fast response

Implementation:

# Background Refresh Strategy

import asyncio
import json
import logging
from typing import Dict, Any, Callable, List
from dataclasses import dataclass, field
from datetime import datetime, timedelta

logger = logging.getLogger(__name__)


@dataclass
class RefreshConfig:
    """Configuration for a cached key that needs background refresh."""
    key: str
    fetch_func: Callable
    ttl: int
    refresh_interval: int  # Refresh this many seconds before expiry
    priority: int = 1  # Higher = more important


@dataclass
class CacheEntry:
    """Track cache entry for refresh scheduling."""
    key: str
    expires_at: datetime
    config: RefreshConfig


class BackgroundRefreshCache:
    """
    Cache with background refresh.
    
    A background task proactively refreshes entries before they expire.
    This ensures cache is always warm and no thundering herd occurs.
    """
    
    def __init__(
        self,
        redis_client,
        refresh_workers: int = 4
    ):
        self.redis = redis_client
        self.refresh_workers = refresh_workers
        
        # Registry of keys to refresh
        self._registered: Dict[str, RefreshConfig] = {}
        
        # Queue of pending refreshes
        self._refresh_queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
        
        self._running = False
    
    async def start(self):
        """Start background refresh workers."""
        self._running = True
        
        # Start scheduler
        asyncio.create_task(self._scheduler_loop())
        
        # Start workers
        for i in range(self.refresh_workers):
            asyncio.create_task(self._worker_loop(i))
    
    async def stop(self):
        """Stop background refresh."""
        self._running = False
    
    def register(
        self,
        key: str,
        fetch_func: Callable,
        ttl: int = 300,
        refresh_before: int = 30  # Refresh 30s before expiry
    ):
        """Register a key for background refresh."""
        self._registered[key] = RefreshConfig(
            key=key,
            fetch_func=fetch_func,
            ttl=ttl,
            refresh_interval=ttl - refresh_before
        )
    
    async def get(self, key: str) -> Any:
        """Get from cache (should always hit if registered)."""
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)
        
        # Miss - key might not be registered or first access
        if key in self._registered:
            config = self._registered[key]
            value = await config.fetch_func()
            await self._cache_and_schedule(key, value, config)
            return value
        
        return None
    
    async def _cache_and_schedule(
        self,
        key: str,
        value: Any,
        config: RefreshConfig
    ):
        """Cache value and schedule next refresh."""
        await self.redis.setex(key, config.ttl, json.dumps(value))
        
        # Schedule refresh before expiry
        refresh_at = datetime.utcnow() + timedelta(seconds=config.refresh_interval)
        
        await self._refresh_queue.put((
            refresh_at.timestamp(),  # Priority (earlier = higher priority)
            CacheEntry(key=key, expires_at=refresh_at, config=config)
        ))
    
    async def _scheduler_loop(self):
        """Schedule refreshes at the right time."""
        while self._running:
            try:
                # Peek at next item
                if self._refresh_queue.empty():
                    await asyncio.sleep(1)
                    continue
                
                priority, entry = await self._refresh_queue.get()
                
                # Wait until refresh time
                now = datetime.utcnow().timestamp()
                if priority > now:
                    # Put back and wait
                    await self._refresh_queue.put((priority, entry))
                    await asyncio.sleep(min(1, priority - now))
                    continue
                
                # Time to refresh - requeue for worker
                await self._do_refresh(entry)
                
            except Exception as e:
                logger.error(f"Scheduler error: {e}")
                await asyncio.sleep(1)
    
    async def _do_refresh(self, entry: CacheEntry):
        """Execute a cache refresh."""
        try:
            config = entry.config
            value = await config.fetch_func()
            await self._cache_and_schedule(entry.key, value, config)
            
            logger.debug(f"Refreshed cache: {entry.key}")
            
        except Exception as e:
            logger.error(f"Failed to refresh {entry.key}: {e}")
            # Reschedule with backoff
            await asyncio.sleep(5)
            await self._refresh_queue.put((
                datetime.utcnow().timestamp() + 5,
                entry
            ))
    
    async def _worker_loop(self, worker_id: int):
        """Worker that processes refresh queue."""
        # In this implementation, scheduler does the work
        # Workers could be used for parallel refreshes
        pass


# Usage
cache = BackgroundRefreshCache(redis_client)

# Register keys for background refresh
cache.register(
    key="homepage:featured",
    fetch_func=lambda: db.fetch("SELECT * FROM featured_products"),
    ttl=60,
    refresh_before=10  # Refresh 10s before expiry
)

cache.register(
    key="deals:page",
    fetch_func=lambda: db.fetch("SELECT * FROM active_deals"),
    ttl=300,
    refresh_before=30
)

await cache.start()

# Reads now always hit the cache (assuming registration happened before the first request)
featured = await cache.get("homepage:featured")

Pros:

  • No cache misses for registered keys
  • Consistent low latency
  • Predictable database load
  • No thundering herd possible

Cons:

  • Complex implementation
  • Must know keys in advance
  • Refreshes even if key not accessed
  • Not suitable for millions of keys

2.5 Strategy 5: Stale-While-Revalidate

Serve stale data immediately while refreshing in background.

STALE-WHILE-REVALIDATE

Standard cache miss:
  Request → Miss → Wait for fetch → Response (slow)

Stale-while-revalidate:
  Request → Stale hit → Response immediately (fast!)
            └──▶ Background refresh

User gets fast response (stale but recent)
Next user gets fresh data


Timeline:
  T=0     Cache populated
  T=60s   Cache "expires" but data kept as stale
  T=61s   Request comes in:
          - Return stale data immediately
          - Trigger background refresh
  T=62s   Background refresh completes
  T=63s   Next request gets fresh data

Implementation:

# Stale-While-Revalidate Pattern

import asyncio
import json
import logging
import time
from typing import Any, Callable, Optional
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class StaleableValue:
    """Value that can be served stale."""
    data: Any
    cached_at: float
    fresh_ttl: float    # Serve fresh until this
    stale_ttl: float    # Allow stale until this
    
    @property
    def is_fresh(self) -> bool:
        return time.time() < self.cached_at + self.fresh_ttl
    
    @property
    def is_stale_but_usable(self) -> bool:
        now = time.time()
        return (self.cached_at + self.fresh_ttl <= now < 
                self.cached_at + self.stale_ttl)
    
    @property
    def is_expired(self) -> bool:
        return time.time() >= self.cached_at + self.stale_ttl


class StaleWhileRevalidateCache:
    """
    Cache that serves stale content while revalidating.
    
    Provides fast responses even on cache "expiry" by serving
    slightly stale data while refreshing in background.
    """
    
    def __init__(
        self,
        redis_client,
        fresh_ttl: int = 60,     # Serve fresh for 60s
        stale_ttl: int = 300     # Allow stale for 5min total
    ):
        self.redis = redis_client
        self.fresh_ttl = fresh_ttl
        self.stale_ttl = stale_ttl
        
        # Track in-flight revalidations
        self._revalidating: set = set()
    
    async def get_or_fetch(
        self,
        key: str,
        fetch_func: Callable
    ) -> Any:
        """
        Get from cache with stale-while-revalidate.
        
        Returns:
            - Fresh data if available
            - Stale data (with background refresh) if within stale window
            - Fresh fetch if expired
        """
        cached = await self._get_with_metadata(key)
        
        if cached:
            if cached.is_fresh:
                # Fresh - return immediately
                return cached.data
            
            elif cached.is_stale_but_usable:
                # Stale but usable - return and refresh
                self._trigger_revalidation(key, fetch_func)
                return cached.data
        
        # Expired or not cached - must fetch
        value = await fetch_func()
        await self._set_with_metadata(key, value)
        return value
    
    def _trigger_revalidation(self, key: str, fetch_func: Callable):
        """Trigger background revalidation if not already running."""
        if key in self._revalidating:
            return  # Already revalidating
        
        self._revalidating.add(key)
        asyncio.create_task(self._revalidate(key, fetch_func))
    
    async def _revalidate(self, key: str, fetch_func: Callable):
        """Background revalidation."""
        try:
            value = await fetch_func()
            await self._set_with_metadata(key, value)
            logger.debug(f"Revalidated: {key}")
        except Exception as e:
            logger.warning(f"Revalidation failed for {key}: {e}")
        finally:
            self._revalidating.discard(key)
    
    async def _get_with_metadata(self, key: str) -> Optional[StaleableValue]:
        """Get cached value with staleness metadata."""
        raw = await self.redis.get(key)
        if not raw:
            return None
        
        data = json.loads(raw)
        return StaleableValue(
            data=data['value'],
            cached_at=data['cached_at'],
            fresh_ttl=data['fresh_ttl'],
            stale_ttl=data['stale_ttl']
        )
    
    async def _set_with_metadata(self, key: str, value: Any):
        """Set value with staleness metadata."""
        data = {
            'value': value,
            'cached_at': time.time(),
            'fresh_ttl': self.fresh_ttl,
            'stale_ttl': self.stale_ttl
        }
        
        # Set TTL to stale_ttl (maximum lifetime)
        await self.redis.setex(
            key,
            self.stale_ttl,
            json.dumps(data)
        )


# Usage
cache = StaleWhileRevalidateCache(
    redis_client,
    fresh_ttl=60,    # Fresh for 1 minute
    stale_ttl=300    # Stale-but-usable for 5 minutes total
)

async def get_product(product_id: str) -> dict:
    return await cache.get_or_fetch(
        key=f"product:{product_id}",
        fetch_func=lambda: db.fetch_product(product_id)
    )

Pros:

  • Fast responses even when "expired"
  • Smooth user experience
  • Natural load distribution
  • Used by CDNs (HTTP Cache-Control: stale-while-revalidate)

Cons:

  • Users may see slightly stale data
  • Complexity in managing dual TTLs
  • Need to track revalidation state

Chapter 3: Strategy Comparison

3.1 Comparison Matrix

Strategy                  Latency Impact          Complexity   Best For
Locking                   High (waiters block)    Low          Moderate traffic
Coalescing                Low (concurrent wait)   Medium       High traffic
Probabilistic             None (gradual)          Low          Distributed cache
Background                None (proactive)        High         Known hot keys
Stale-While-Revalidate    None (serve stale)      Medium       UX-critical

3.2 Decision Framework

CHOOSING A THUNDERING HERD STRATEGY

Question 1: Do you know your hot keys in advance?
  YES → Consider Background Refresh
  NO  → Continue to Question 2

Question 2: Can users tolerate briefly stale data?
  YES → Stale-While-Revalidate
  NO  → Continue to Question 3

Question 3: How high is your traffic?
  Very high (>10K/sec per key) → Request Coalescing
  Moderate (<10K/sec per key)  → Locking
  
Question 4: Is this a distributed multi-node cache?
  YES → Add Probabilistic Expiration on top
  NO  → Single-node strategies sufficient


COMBINED APPROACH (Production)

For homepage (known hot key):
  Background Refresh + Stale-While-Revalidate

For product pages (many keys):
  Request Coalescing + Probabilistic TTL

For user-specific data (many keys, less hot):
  Locking + Short TTL
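
One way to encode the framework above as a small helper — purely illustrative, with the strategy names mapping to the sections in this chapter:

from typing import List

def choose_strategy(
    known_hot_key: bool,
    stale_ok: bool,
    requests_per_sec_per_key: int,
    distributed_cache: bool
) -> List[str]:
    """Illustrative encoding of the decision framework above."""
    if known_hot_key:
        strategies = ["background_refresh"]
    elif stale_ok:
        strategies = ["stale_while_revalidate"]
    elif requests_per_sec_per_key > 10_000:
        strategies = ["request_coalescing"]
    else:
        strategies = ["locking"]
    
    if distributed_cache:
        # Layer probabilistic expiration on top for multi-node caches
        strategies.append("probabilistic_expiration")
    return strategies

# Example: product pages served from a multi-node cache
print(choose_strategy(False, True, 2_000, True))
# ['stale_while_revalidate', 'probabilistic_expiration']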

3.3 Combining Strategies

# Combined Thundering Herd Protection

class ProductionCache:
    """
    Production cache combining multiple strategies.
    
    - Request coalescing for all keys
    - Probabilistic expiration for distributed cache
    - Stale-while-revalidate for user experience
    - Background refresh for known hot keys
    """
    
    def __init__(self, redis_client):
        self.redis = redis_client
        
        # In-flight tracking for coalescing
        self._inflight: Dict[str, asyncio.Future] = {}
        self._lock = asyncio.Lock()
        
        # Hot keys for background refresh
        self._hot_keys: Dict[str, RefreshConfig] = {}
        
        # Configuration
        self.fresh_ttl = 60
        self.stale_ttl = 300
        self.jitter_percent = 0.1
    
    async def get_or_fetch(
        self,
        key: str,
        fetch_func: Callable,
        allow_stale: bool = True
    ) -> Any:
        """
        Get with combined protection strategies.
        """
        # Step 1: Try cache with stale-while-revalidate
        cached = await self._get_with_staleness(key)
        
        if cached:
            if cached.is_fresh:
                return cached.data
            
            if cached.is_stale_but_usable and allow_stale:
                # Serve stale, refresh in background
                asyncio.create_task(
                    self._coalesced_refresh(key, fetch_func)
                )
                return cached.data
        
        # Step 2: Need fresh data - use coalescing
        return await self._coalesced_fetch(key, fetch_func)
    
    async def _coalesced_fetch(
        self,
        key: str,
        fetch_func: Callable
    ) -> Any:
        """Fetch with request coalescing."""
        async with self._lock:
            if key in self._inflight:
                return await self._inflight[key]
            
            future = asyncio.get_event_loop().create_future()
            self._inflight[key] = future
        
        try:
            value = await fetch_func()
            await self._set_with_jitter(key, value)
            future.set_result(value)
            return value
        except Exception as e:
            future.set_exception(e)
            raise
        finally:
            async with self._lock:
                self._inflight.pop(key, None)
    
    async def _set_with_jitter(self, key: str, value: Any):
        """Set with probabilistic TTL jitter."""
        jitter = random.uniform(
            -self.fresh_ttl * self.jitter_percent,
            self.fresh_ttl * self.jitter_percent
        )
        actual_ttl = int(self.fresh_ttl + jitter)
        
        data = {
            'value': value,
            'cached_at': time.time(),
            'fresh_ttl': actual_ttl,
            'stale_ttl': self.stale_ttl
        }
        
        await self.redis.setex(key, self.stale_ttl, json.dumps(data))
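
The class above references two helpers that aren't shown. A minimal sketch of what they might look like — reusing the StaleableValue dataclass and logger from Section 2.5, which is an assumption about how you'd wire this up:

    async def _get_with_staleness(self, key: str) -> Optional[StaleableValue]:
        """Read back the wrapped value written by _set_with_jitter."""
        raw = await self.redis.get(key)
        if not raw:
            return None
        d = json.loads(raw)
        return StaleableValue(
            data=d['value'],
            cached_at=d['cached_at'],
            fresh_ttl=d['fresh_ttl'],
            stale_ttl=d['stale_ttl']
        )
    
    async def _coalesced_refresh(self, key: str, fetch_func: Callable):
        """Background refresh that still coalesces duplicate refreshes."""
        try:
            await self._coalesced_fetch(key, fetch_func)
        except Exception as e:
            # The old (stale) value is still served; just log the failure
            logger.warning(f"Background refresh failed for {key}: {e}")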

Part II: Implementation

Chapter 4: Production-Ready Implementation

4.1 Complete Thundering Herd Protection System

# Production Thundering Herd Protection

import asyncio
import json
import logging
import math
import random
import time
from dataclasses import dataclass, field
from typing import Dict, Any, Callable, Optional, Set
from enum import Enum

logger = logging.getLogger(__name__)


# =============================================================================
# Configuration
# =============================================================================

@dataclass
class ThunderingHerdConfig:
    """Configuration for thundering herd protection."""
    
    # Coalescing
    enable_coalescing: bool = True
    coalesce_timeout: float = 5.0
    
    # Probabilistic refresh
    enable_probabilistic: bool = True
    jitter_percent: float = 0.1
    early_refresh_beta: float = 1.0
    
    # Stale-while-revalidate
    enable_stale: bool = True
    fresh_ttl: int = 60
    stale_ttl: int = 300
    
    # Locking fallback
    enable_locking: bool = True
    lock_ttl: int = 10
    
    # Metrics
    enable_metrics: bool = True


# =============================================================================
# Cache Value Wrapper
# =============================================================================

@dataclass
class CacheValue:
    """Wrapped cache value with metadata."""
    data: Any
    cached_at: float
    fresh_until: float
    stale_until: float
    
    @property
    def is_fresh(self) -> bool:
        return time.time() < self.fresh_until
    
    @property
    def is_stale_usable(self) -> bool:
        now = time.time()
        return self.fresh_until <= now < self.stale_until
    
    @property
    def is_expired(self) -> bool:
        return time.time() >= self.stale_until
    
    @property
    def remaining_fresh_ttl(self) -> float:
        return max(0, self.fresh_until - time.time())
    
    def to_json(self) -> str:
        return json.dumps({
            'data': self.data,
            'cached_at': self.cached_at,
            'fresh_until': self.fresh_until,
            'stale_until': self.stale_until
        })
    
    @classmethod
    def from_json(cls, raw: str) -> 'CacheValue':
        d = json.loads(raw)
        return cls(
            data=d['data'],
            cached_at=d['cached_at'],
            fresh_until=d['fresh_until'],
            stale_until=d['stale_until']
        )


# =============================================================================
# Metrics
# =============================================================================

@dataclass
class ThunderingHerdMetrics:
    """Metrics for thundering herd protection."""
    cache_hits: int = 0
    cache_misses: int = 0
    stale_serves: int = 0
    coalesced_requests: int = 0
    locks_acquired: int = 0
    locks_waited: int = 0
    early_refreshes: int = 0
    background_refreshes: int = 0
    errors: int = 0
    
    def to_dict(self) -> dict:
        return {
            'hits': self.cache_hits,
            'misses': self.cache_misses,
            'stale_serves': self.stale_serves,
            'coalesced': self.coalesced_requests,
            'locks_acquired': self.locks_acquired,
            'locks_waited': self.locks_waited,
            'early_refreshes': self.early_refreshes,
            'errors': self.errors,
            'hit_ratio': self.cache_hits / max(1, self.cache_hits + self.cache_misses)
        }


# =============================================================================
# Main Implementation
# =============================================================================

class ThunderingHerdCache:
    """
    Production-ready cache with comprehensive thundering herd protection.
    
    Combines:
    - Request coalescing (primary)
    - Stale-while-revalidate (user experience)
    - Probabilistic early refresh (distributed cache)
    - Locking (fallback)
    """
    
    def __init__(
        self,
        redis_client,
        config: ThunderingHerdConfig = None
    ):
        self.redis = redis_client
        self.config = config or ThunderingHerdConfig()
        self.metrics = ThunderingHerdMetrics()
        
        # Coalescing state
        self._inflight: Dict[str, asyncio.Future] = {}
        self._inflight_lock = asyncio.Lock()
        
        # Revalidation state
        self._revalidating: Set[str] = set()
    
    async def get_or_fetch(
        self,
        key: str,
        fetch_func: Callable[[], Any],
        fresh_ttl: int = None,
        stale_ttl: int = None,
        allow_stale: bool = None
    ) -> Any:
        """
        Get from cache with thundering herd protection.
        
        Args:
            key: Cache key
            fetch_func: Async function to fetch data on miss
            fresh_ttl: Override fresh TTL
            stale_ttl: Override stale TTL
            allow_stale: Allow serving stale data
        
        Returns:
            Cached or fetched value
        """
        fresh_ttl = fresh_ttl or self.config.fresh_ttl
        stale_ttl = stale_ttl or self.config.stale_ttl
        allow_stale = allow_stale if allow_stale is not None else self.config.enable_stale
        
        # Step 1: Try cache
        cached = await self._get_cached(key)
        
        if cached:
            # Check freshness
            if cached.is_fresh:
                self.metrics.cache_hits += 1
                
                # Probabilistic early refresh
                if self.config.enable_probabilistic:
                    if self._should_refresh_early(cached):
                        self.metrics.early_refreshes += 1
                        asyncio.create_task(
                            self._background_refresh(key, fetch_func, fresh_ttl, stale_ttl)
                        )
                
                return cached.data
            
            # Stale but usable
            if cached.is_stale_usable and allow_stale:
                self.metrics.stale_serves += 1
                
                # Trigger background refresh
                asyncio.create_task(
                    self._background_refresh(key, fetch_func, fresh_ttl, stale_ttl)
                )
                
                return cached.data
        
        # Step 2: Cache miss - need fresh data
        self.metrics.cache_misses += 1
        
        if self.config.enable_coalescing:
            return await self._coalesced_fetch(key, fetch_func, fresh_ttl, stale_ttl)
        elif self.config.enable_locking:
            return await self._locked_fetch(key, fetch_func, fresh_ttl, stale_ttl)
        else:
            return await self._simple_fetch(key, fetch_func, fresh_ttl, stale_ttl)
    
    async def _get_cached(self, key: str) -> Optional[CacheValue]:
        """Get cached value with metadata."""
        try:
            raw = await self.redis.get(key)
            if raw:
                return CacheValue.from_json(raw)
        except Exception as e:
            logger.warning(f"Cache get error for {key}: {e}")
            self.metrics.errors += 1
        return None
    
    async def _set_cached(
        self,
        key: str,
        value: Any,
        fresh_ttl: int,
        stale_ttl: int
    ):
        """Set cached value with metadata."""
        # Add jitter if enabled
        if self.config.enable_probabilistic:
            jitter = fresh_ttl * self.config.jitter_percent
            fresh_ttl = int(fresh_ttl + random.uniform(-jitter, jitter))
        
        now = time.time()
        cached = CacheValue(
            data=value,
            cached_at=now,
            fresh_until=now + fresh_ttl,
            stale_until=now + stale_ttl
        )
        
        try:
            await self.redis.setex(key, stale_ttl, cached.to_json())
        except Exception as e:
            logger.warning(f"Cache set error for {key}: {e}")
            self.metrics.errors += 1
    
    def _should_refresh_early(self, cached: CacheValue) -> bool:
        """Probabilistically decide on early refresh."""
        remaining = cached.remaining_fresh_ttl
        if remaining <= 0:
            return True
        
        # XFetch algorithm
        try:
            beta = self.config.early_refresh_beta
            threshold = beta * (-1 * math.log(random.random()))
            return threshold > remaining
        except ValueError:
            return False
    
    async def _coalesced_fetch(
        self,
        key: str,
        fetch_func: Callable,
        fresh_ttl: int,
        stale_ttl: int
    ) -> Any:
        """Fetch with request coalescing."""
        async with self._inflight_lock:
            if key in self._inflight:
                # Join existing fetch
                self.metrics.coalesced_requests += 1
                try:
                    return await asyncio.wait_for(
                        asyncio.shield(self._inflight[key]),
                        timeout=self.config.coalesce_timeout
                    )
                except asyncio.TimeoutError:
                    # Timeout - fetch ourselves
                    pass
            
            # We're the fetcher
            future = asyncio.get_event_loop().create_future()
            self._inflight[key] = future
        
        try:
            value = await asyncio.wait_for(
                fetch_func(),
                timeout=self.config.coalesce_timeout
            )
            
            await self._set_cached(key, value, fresh_ttl, stale_ttl)
            
            if not future.done():
                future.set_result(value)
            
            return value
            
        except Exception as e:
            if not future.done():
                future.set_exception(e)
            raise
            
        finally:
            async with self._inflight_lock:
                self._inflight.pop(key, None)
    
    async def _locked_fetch(
        self,
        key: str,
        fetch_func: Callable,
        fresh_ttl: int,
        stale_ttl: int
    ) -> Any:
        """Fetch with lock protection."""
        lock_key = f"lock:{key}"
        
        # Try to acquire lock
        acquired = await self.redis.set(
            lock_key, "1",
            nx=True,
            ex=self.config.lock_ttl
        )
        
        if acquired:
            self.metrics.locks_acquired += 1
            try:
                value = await fetch_func()
                await self._set_cached(key, value, fresh_ttl, stale_ttl)
                return value
            finally:
                await self.redis.delete(lock_key)
        else:
            # Wait for lock holder
            self.metrics.locks_waited += 1
            return await self._wait_for_cache(key, fetch_func, fresh_ttl, stale_ttl)
    
    async def _wait_for_cache(
        self,
        key: str,
        fetch_func: Callable,
        fresh_ttl: int,
        stale_ttl: int,
        max_wait: float = 5.0
    ) -> Any:
        """Wait for another request to populate cache."""
        waited = 0
        poll_interval = 0.05
        
        while waited < max_wait:
            cached = await self._get_cached(key)
            if cached and not cached.is_expired:
                return cached.data
            
            await asyncio.sleep(poll_interval)
            waited += poll_interval
        
        # Timeout - fetch ourselves
        return await self._simple_fetch(key, fetch_func, fresh_ttl, stale_ttl)
    
    async def _simple_fetch(
        self,
        key: str,
        fetch_func: Callable,
        fresh_ttl: int,
        stale_ttl: int
    ) -> Any:
        """Simple fetch without protection (fallback)."""
        value = await fetch_func()
        await self._set_cached(key, value, fresh_ttl, stale_ttl)
        return value
    
    async def _background_refresh(
        self,
        key: str,
        fetch_func: Callable,
        fresh_ttl: int,
        stale_ttl: int
    ):
        """Background cache refresh."""
        if key in self._revalidating:
            return  # Already refreshing
        
        self._revalidating.add(key)
        self.metrics.background_refreshes += 1
        
        try:
            value = await fetch_func()
            await self._set_cached(key, value, fresh_ttl, stale_ttl)
            logger.debug(f"Background refresh completed: {key}")
        except Exception as e:
            logger.warning(f"Background refresh failed for {key}: {e}")
        finally:
            self._revalidating.discard(key)
    
    def get_metrics(self) -> dict:
        """Get thundering herd protection metrics."""
        return self.metrics.to_dict()




# =============================================================================
# Usage Example
# =============================================================================

async def example_usage():
    """Example usage of ThunderingHerdCache."""
    
    # Create cache with default config
    cache = ThunderingHerdCache(
        redis_client=redis,
        config=ThunderingHerdConfig(
            fresh_ttl=60,
            stale_ttl=300,
            enable_coalescing=True,
            enable_stale=True,
            enable_probabilistic=True
        )
    )
    
    # Get product (all protections active)
    product = await cache.get_or_fetch(
        key=f"product:{product_id}",
        fetch_func=lambda: db.fetch_product(product_id)
    )
    
    # Get critical data (no stale)
    inventory = await cache.get_or_fetch(
        key=f"inventory:{product_id}",
        fetch_func=lambda: db.fetch_inventory(product_id),
        allow_stale=False,
        fresh_ttl=30
    )
    
    # Check metrics
    print(cache.get_metrics())

Chapter 5: Edge Cases and Error Handling

5.1 Edge Case 1: Lock Holder Crashes

SCENARIO: Request acquires lock, then crashes

Timeline:
  T0: Request A acquires lock
  T1: Request B waits for lock
  T2: Request A crashes (lock not released!)
  T3: Request B waits... waits... waits...
  T4: Lock TTL expires (10 seconds)
  T5: Request C acquires lock

Problem: Request B waited 10 seconds for nothing

SOLUTION: Lock with TTL + Wait timeout

lock_ttl = 10 seconds (safety)
wait_timeout = 5 seconds (give up sooner)

If wait times out, fetch ourselves.
Duplicate fetches possible but bounded.
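
A related refinement (my addition, not described above): if the fetch outlives the lock TTL, a new request can legitimately acquire the lock, and the original holder's unconditional DELETE would then remove a lock it no longer owns. Storing a random token and releasing only on a match avoids this; a sketch assuming a redis-py-style async client:

import uuid
from typing import Optional

# Compare-and-delete: only the current owner can release the lock
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

async def acquire_lock(redis, lock_key: str, ttl: int = 10) -> Optional[str]:
    """Acquire the lock with a unique token; return the token or None."""
    token = str(uuid.uuid4())
    acquired = await redis.set(lock_key, token, nx=True, ex=ttl)
    return token if acquired else None

async def release_lock(redis, lock_key: str, token: str) -> bool:
    """Release only if we still own the lock."""
    return bool(await redis.eval(RELEASE_SCRIPT, 1, lock_key, token))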

5.2 Edge Case 2: Coalesced Request Timeout

SCENARIO: Fetch takes too long, waiters time out

Timeline:
  T0: Request A starts fetch (slow database)
  T0: Requests B, C, D join (coalesced)
  T5: Requests B, C, D timeout (5 second limit)
  T5: B, C, D each start their own fetch!
  T7: Request A completes, caches result
  T8: B, C, D complete, each caches result

Now we have 4 database queries instead of 1!

SOLUTION: Shield the future + stale fallback

# When timing out, check for stale data first
try:
    return await asyncio.wait_for(future, timeout=5.0)
except asyncio.TimeoutError:
    # Check if we have stale data
    cached = await get_stale(key)
    if cached:
        return cached.data  # Serve stale, don't query
    
    # No stale data - must fetch
    return await fetch_func()

5.3 Edge Case 3: Background Refresh Fails Repeatedly

SCENARIO: Database is having issues, refreshes fail

Timeline:
  T0: Cache populated
  T60: Cache goes stale
  T61: Background refresh attempted - FAILS
  T62: Another request, serve stale, try refresh - FAILS
  T63: More requests, more failed refreshes
  T300: Stale TTL expires, no more stale data!
  T301: All requests hit database - thundering herd!

SOLUTION: Extend stale TTL on refresh failure

async def background_refresh(key, fetch_func):
    try:
        value = await fetch_func()
        await set_cached(key, value, fresh_ttl, stale_ttl)
    except Exception as e:
        # Refresh failed - extend stale TTL
        cached = await get_cached(key)
        if cached:
            # Extend stale window by 60 seconds
            cached.stale_until = time.time() + 60
            await redis.setex(key, 60, cached.to_json())
        
        logger.warning(f"Refresh failed, extended stale: {e}")

5.4 Edge Case 4: Hot Key Moves to Different Server

SCENARIO: Consistent hashing rebalance moves hot key

Timeline:
  T0: Hot key "homepage" on Server A (warm cache)
  T1: Server C added to cluster
  T2: Consistent hashing moves "homepage" to Server C
  T3: Server C cache is COLD for "homepage"
  T4: All homepage requests hit database!

SOLUTION: Cache warming on topology change

# When cluster topology changes:
async def on_cluster_rebalance(moved_keys: List[str]):
    for key in moved_keys:
        if is_hot_key(key):
            # Pre-warm on new server
            value = await fetch_from_old_server_or_db(key)
            await cache.set(key, value)

5.5 Error Handling Matrix

Scenario            Impact              Handling
Lock holder crash   Waiters delayed     Lock TTL + wait timeout
Slow fetch          Coalesced timeout   Shield future + stale fallback
Refresh fails       Stale data used     Extend stale TTL
Redis down          No protection       Circuit breaker, rate limit DB (sketch below)
Topology change     Cold cache          Pre-warm hot keys
Memory pressure     Evictions           Monitor, scale, prioritize hot keys
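
For the "Redis down" row, a minimal sketch of rate-limiting the database fallback path — the concurrency limit and timeout are assumptions you would tune to your own capacity:

import asyncio

# Cap concurrent DB queries when the cache layer is unavailable, so a
# cache outage degrades into shed requests instead of a database meltdown.
DB_FALLBACK_LIMIT = asyncio.Semaphore(50)   # assumed headroom

async def fetch_without_cache(fetch_func, acquire_timeout: float = 0.5):
    """Bounded fallback path for cache outages."""
    try:
        # Bound the wait for a slot too, so requests don't queue forever
        await asyncio.wait_for(DB_FALLBACK_LIMIT.acquire(), timeout=acquire_timeout)
    except asyncio.TimeoutError:
        raise RuntimeError("database fallback saturated, shedding request")
    try:
        return await fetch_func()
    finally:
        DB_FALLBACK_LIMIT.release()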

Part III: Real-World Application

Chapter 7: How Big Tech Does It

7.1 Case Study: Instagram — Thundering Herd for Viral Posts

INSTAGRAM VIRAL POST HANDLING

Challenge:
  Celebrity posts can get 10M+ views in first minute
  Initial views all cache misses (not cached yet)
  Potential for massive thundering herd

Solution: LEASE-BASED COALESCING

How it works:
  1. First request gets a "lease" (token)
  2. Lease holder fetches from database
  3. Other requests see lease exists, wait
  4. Lease holder populates cache
  5. Waiters read from cache

Special handling for viral content:
  - Detect rapid request rate
  - Extend lease duration for popular content
  - Pre-warm cache for scheduled posts

Implementation sketch:
  
  async def get_post(post_id):
      # Check cache
      cached = await cache.get(f"post:{post_id}")
      if cached:
          return cached
      
      # Try to get lease
      lease = await cache.get_lease(f"post:{post_id}", ttl=2)
      
      if lease.is_owner:
          # We got the lease - fetch
          post = await db.get_post(post_id)
          await cache.set_with_lease(f"post:{post_id}", post, lease)
          return post
      else:
          # Wait for lease owner
          return await cache.wait_for_value(f"post:{post_id}", timeout=2)

7.2 Case Study: Netflix — Homepage Thundering Herd

NETFLIX HOMEPAGE CACHING

Challenge:
  Homepage is highly personalized
  But some elements are shared (trending, new releases)
  Peak traffic at 7-9 PM every day
  Can't afford cache miss storms

Solution: MULTI-LAYER WITH DIFFERENT STRATEGIES

Shared content (trending, categories):
  - Background refresh every 5 minutes
  - Never expires for user requests
  - Stale-while-revalidate if refresh fails

Personalized content (recommendations):
  - Request coalescing per user
  - Short TTL (1 minute) with probabilistic refresh
  - Fallback to non-personalized if slow

Architecture:

  User Request
       │
       ▼
  ┌─────────────────┐
  │  Personalized   │──▶ Short TTL, Coalescing
  │  Recommendations│
  └────────┬────────┘
           │
           ▼
  ┌─────────────────┐
  │  Shared Content │──▶ Background refresh, never miss
  │  (Trending)     │
  └────────┬────────┘
           │
           ▼
  ┌─────────────────┐
  │   Page Assembly │
  └─────────────────┘
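
A minimal sketch of the personalized layer described above: coalesce concurrent fetches per user, and fall back to the shared trending content if recommendations are slow. The fetch_recommendations and get_shared_trending callables (and the 500 ms budget) are illustrative placeholders, not Netflix's actual API:

  import asyncio
  from typing import Any, Awaitable, Callable, Dict

  inflight: Dict[str, asyncio.Task] = {}   # one in-flight recommendation fetch per user

  async def get_recommendations(user_id: str,
                                fetch_recommendations: Callable[[str], Awaitable[Any]],
                                get_shared_trending: Callable[[], Awaitable[Any]]):
      key = f"recs:{user_id}"

      # Coalesce: if a fetch for this user is already running, share it
      task = inflight.get(key)
      if task is None:
          task = asyncio.create_task(fetch_recommendations(user_id))
          inflight[key] = task
          task.add_done_callback(lambda _: inflight.pop(key, None))

      try:
          # Shield so one caller's timeout doesn't cancel the shared task
          return await asyncio.wait_for(asyncio.shield(task), timeout=0.5)
      except asyncio.TimeoutError:
          # Recommendations are slow - degrade to shared, non-personalized content
          return await get_shared_trending()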

7.3 Case Study: Amazon — Deal of the Day

AMAZON DEAL OF THE DAY

Challenge:
  Deal changes at midnight
  Millions of users refresh at 00:00:00
  Previous deal cache invalidated
  Massive thundering herd potential

Solution: PRE-WARMING + ATOMIC SWITCH

How it works:
  
  11:59:00 PM - 1 minute before deal change:
    1. Fetch new deal data
    2. Cache under different key: "deal:2024-01-16"
    3. Old deal still served: "deal:2024-01-15"
  
  12:00:00 AM - Deal change:
    1. Atomic key switch: "deal:current" → "deal:2024-01-16"
    2. No cache miss! New deal already cached
  
  12:00:01 AM:
    1. All requests hit warm cache
    2. Zero database load from deal change

Implementation:

  # Before midnight
  async def prepare_new_deal(date: str, deal_data: dict):
      await cache.set(f"deal:{date}", deal_data, ttl=86400)
  
  # At midnight
  async def switch_deal(new_date: str):
      await cache.set("deal:current_date", new_date)
  
  # Request handler
  async def get_current_deal():
      date = await cache.get("deal:current_date")
      return await cache.get(f"deal:{date}")  # Always hits!

7.4 Summary: Industry Patterns

Company   | Challenge        | Strategy
Instagram | Viral posts      | Lease-based coalescing
Netflix   | Homepage peak    | Background refresh + coalescing
Amazon    | Deal change      | Pre-warming + atomic switch
Twitter   | Trending topics  | Write-through + replication
Uber      | Driver locations | Very short TTL, no herd protection needed

Chapter 8: Common Mistakes to Avoid

8.1 Mistake 1: Not Protecting Hot Keys

❌ WRONG: Same strategy for all keys

cache_ttl = 300  # 5 minutes for everything

# Homepage (10K req/sec) - same as
# User settings (1 req/min)

Result: Homepage cache expires → 10K DB queries


✅ CORRECT: Identify and protect hot keys

HOT_KEYS = ["homepage", "featured_products", "trending"]

async def get(key: str):
    if key in HOT_KEYS:
        # Extra protection for hot keys
        return await cache.get_or_fetch(
            key,
            fetch_func,
            strategy="background_refresh"  # Never expires
        )
    else:
        # Standard protection for normal keys
        return await cache.get_or_fetch(
            key,
            fetch_func,
            strategy="coalescing"
        )

8.2 Mistake 2: Unbounded Wait in Coalescing

❌ WRONG: Wait forever for coalesced result

async def coalesced_fetch(key, fetch_func):
    if key in inflight:
        return await inflight[key]  # Wait forever!
    
    # If fetcher hangs, all waiters hang
    # System becomes unresponsive


✅ CORRECT: Timeout and fallback

async def coalesced_fetch(key, fetch_func):
    if key in inflight:
        try:
            return await asyncio.wait_for(
                asyncio.shield(inflight[key]),  # shield: a timeout here must not cancel the shared fetch
                timeout=5.0  # Don't wait forever
            )
        except asyncio.TimeoutError:
            # Fetcher is slow - try stale or fetch ourselves
            stale = await get_stale(key)
            if stale:
                return stale
            # Fall through to fetch
    
    # Normal fetch path
    ...

8.3 Mistake 3: Synchronized TTLs

❌ WRONG: All entries expire at same time

async def populate_cache_on_startup():
    products = await db.fetch_all_products()
    
    for product in products:
        await cache.set(
            f"product:{product.id}",
            product,
            ttl=300  # All expire in exactly 300 seconds!
        )

# 5 minutes later: ALL products expire simultaneously
# Massive thundering herd!


✅ CORRECT: Jittered TTLs

import random

async def populate_cache_on_startup():
    products = await db.fetch_all_products()
    
    for product in products:
        jitter = random.randint(-30, 30)  # ±30 seconds
        ttl = 300 + jitter
        
        await cache.set(
            f"product:{product.id}",
            product,
            ttl=ttl  # Spread expiration over 60 seconds
        )

8.4 Mistake 4: Ignoring Cold Start

❌ WRONG: Deploy and hope for the best

# New deployment, cache empty
# Full production traffic hits empty cache
# Database overwhelmed


✅ CORRECT: Cache warming before traffic

async def deploy_with_warming():
    # Step 1: Deploy new version (not receiving traffic yet)
    await deploy_new_version()
    
    # Step 2: Warm cache with hot keys
    hot_keys = ["homepage", "featured", "categories"]
    for key in hot_keys:
        value = await fetch_from_db(key)
        await cache.set(key, value)
    
    # Step 3: Warm cache with top N products
    top_products = await db.fetch_top_products(limit=1000)
    for product in top_products:
        await cache.set(f"product:{product.id}", product)
    
    # Step 4: Now safe to receive traffic
    await enable_traffic()

8.5 Mistake Checklist

Before deploying, verify:

  • Hot keys identified — Which keys get >1K req/sec?
  • Protection strategy chosen — Coalescing, locking, or background?
  • TTLs have jitter — Entries don't expire simultaneously
  • Timeouts configured — No unbounded waits
  • Cold start handled — Cache warming on deploy
  • Stale fallback exists — Can serve stale during issues
  • Metrics in place — Track coalesced requests, wait times
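
To make the last checklist item concrete, here is a minimal sketch of instrumentation around the coalescing path; the in-process Counter and list stand in for whatever metrics client (Prometheus, StatsD, ...) you actually export to:

import asyncio
import time
from collections import Counter

metrics = Counter()               # e.g. metrics["cache_hit"], metrics["coalesced_wait"]
coalesced_wait_seconds = []       # raw wait durations; export as a histogram in practice

async def wait_for_inflight(shared_task: asyncio.Task, timeout: float = 5.0):
    # Wait on another request's in-flight fetch and record how long we waited
    metrics["coalesced_wait"] += 1
    start = time.monotonic()
    try:
        # Shield so our timeout doesn't cancel the fetch other waiters depend on
        return await asyncio.wait_for(asyncio.shield(shared_task), timeout=timeout)
    finally:
        coalesced_wait_seconds.append(time.monotonic() - start)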

Part IV: Interview Preparation

Chapter 9: Interview Tips and Phrases

9.1 When to Discuss Thundering Herd

Bring up thundering herd when:

  • Designing a cache layer for high-traffic system
  • Interviewer asks "what happens when cache is empty?"
  • System has known hot spots (homepage, viral content)
  • Discussing cold start or deployment strategies

9.2 Key Phrases to Use

INTRODUCING THE PROBLEM:

"One risk with caching is the thundering herd problem. When 
a popular cache entry expires, thousands of requests might 
simultaneously hit the database. I'd want to protect against 
that."


EXPLAINING REQUEST COALESCING:

"My primary defense would be request coalescing. If multiple 
requests come in for the same missing key, only one actually 
queries the database. The others wait for that result. This 
turns 1000 database queries into 1."


EXPLAINING STALE-WHILE-REVALIDATE:

"For user experience, I'd add stale-while-revalidate. When 
cache expires, we serve the stale data immediately—which is 
usually fine for a few seconds—while refreshing in the 
background. Users get fast responses, and we avoid the 
thundering herd."


DISCUSSING HOT KEYS:

"For known hot keys like the homepage, I'd use background 
refresh. A background job refreshes the cache every 30 
seconds, well before the TTL expires. Those keys never 
actually expire from a user's perspective."


HANDLING FOLLOW-UP QUESTIONS:

"If the background refresh fails, we extend the stale 
window rather than letting the entry expire. This prevents 
a thundering herd when the database is already struggling."

9.3 Questions to Ask Interviewer

  • "What are the hottest cache keys in this system?"
  • "What's the peak request rate we need to handle?"
  • "Is serving slightly stale data acceptable?"
  • "How do we handle deployments and cold starts?"

9.4 Common Follow-up Questions

Q: "What if the database is slow?"
Good answer: "Stale-while-revalidate helps here. Users get cached data immediately. The slow database query happens in the background. We'd also have circuit breakers to prevent overloading a struggling database."

Q: "How do you handle cold start?"
Good answer: "Cache warming before enabling traffic. We pre-populate the cache with hot keys like homepage, top products, and active deals. Then gradually increase traffic to the new instance."

Q: "What if Redis goes down?"
Good answer: "Circuit breaker opens, requests go to database. We'd rate limit to protect the database—maybe serve a degraded experience rather than overwhelm it. And alert ops immediately."

Q: "How do you know which keys are hot?"
Good answer: "Metrics on cache access patterns. Track hit counts per key prefix. Also domain knowledge—homepage is always hot, deal pages are hot during sales. We can pre-identify these."

Chapter 10: Practice Problems

Problem 1: Black Friday Homepage

Setup: Your e-commerce homepage gets 50,000 requests/second during Black Friday. The homepage cache has a 60-second TTL. Design thundering herd protection.

Requirements:

  • Homepage data takes 500ms to compute (multiple DB queries)
  • Black Friday deals update every hour
  • Database can handle 100 queries/second
  • Users can tolerate 10-second stale data

Questions:

  1. What happens at TTL expiration without protection?
  2. How would you prevent the thundering herd?
  3. How do you handle the hourly deal update?

Hints:

  • 50,000 simultaneous requests vs 100 queries/second capacity
  • Consider background refresh for known hot key
  • Deal updates are scheduled — can you pre-warm?

Without protection:

  • TTL expires, 50,000 requests miss
  • All 50,000 try to query database
  • Database capacity: 100/sec
  • Time to clear backlog: 500 seconds
  • Result: Site effectively down

Protection strategy:

# 1. Background refresh (primary)
#    Homepage is known hot key - never let it expire

background_refresh_config = {
    "homepage": {
        "fetch_func": compute_homepage,
        "ttl": 60,
        "refresh_before": 10  # Refresh 10s before expiry
    }
}

# Background job runs every 50 seconds
# Homepage always warm, never expires from user perspective
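
# A minimal sketch of that job (hypothetical: compute_homepage and cache are
# the helpers assumed above, injected as arguments so the loop is reusable):

import asyncio

async def homepage_refresher(compute_homepage, cache,
                             period_s: int = 50, ttl_s: int = 60):
    while True:
        try:
            value = await compute_homepage()
            await cache.set("homepage", value, ttl=ttl_s)
        except Exception:
            # Refresh failed - the stale-while-revalidate config below keeps serving stale
            pass
        await asyncio.sleep(period_s)   # wake ~10s before the entry would expire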


# 2. Stale-while-revalidate (backup)
#    If background refresh fails, serve stale

cache_config = {
    "fresh_ttl": 60,     # Fresh for 60s
    "stale_ttl": 300     # Usable stale for 5min
}

# Even if refresh fails, users get data for 5 minutes


# 3. Hourly deal update
#    Pre-warm before the hour

async def update_deals():
    # At X:59:00 (1 min before deal change)
    new_homepage = await compute_homepage_with_new_deals()
    await cache.set("homepage:next_hour", new_homepage)
    
    # At X:00:00 (deal change time)
    await cache.rename("homepage:next_hour", "homepage")
    
    # No cache miss! New homepage already cached


# 4. Request coalescing (additional safety)
#    If somehow cache misses, coalesce requests

# Even if 1000 requests miss, only 1 queries database

Result: Zero thundering herd, smooth Black Friday.

Problem 2: Viral Tweet

Setup: A celebrity with 50 million followers tweets. Within seconds, millions of users want to see the tweet. The tweet wasn't in cache before (new content).

Requirements:

  • Tweet must be visible within 2 seconds of posting
  • System normally handles 10,000 tweets/second
  • Database can handle 50,000 reads/second
  • This tweet might get 100,000+ reads/second

Questions:

  1. How do you handle the initial cache miss for a viral tweet?
  2. How do you detect that a tweet is going viral?
  3. What's different about caching for celebrity accounts?

Hints:

  • Initial request MUST query database (not cached yet)
  • Request coalescing helps but needs quick detection
  • Celebrity accounts are known in advance

Challenge analysis:

  • New tweet → not in cache → first request(s) miss
  • Within 1 second, could have 100K requests
  • Without protection: 100K DB queries, overwhelm

Solution: Celebrity fast-path + coalescing

# 1. Celebrity accounts get special handling
CELEBRITY_ACCOUNTS = load_accounts_over_1M_followers()

async def on_new_tweet(tweet: Tweet):
    if tweet.author_id in CELEBRITY_ACCOUNTS:
        # Immediately cache new tweet
        await cache.set(
            f"tweet:{tweet.id}",
            tweet,
            ttl=3600
        )
        
        # Pre-warm in all regional caches
        await broadcast_to_regions(tweet)
    
    # Normal tweets cached on first read


# 2. Request coalescing for all tweets
async def get_tweet(tweet_id: str):
    # Check cache
    cached = await cache.get(f"tweet:{tweet_id}")
    if cached:
        return cached
    
    # Coalesced fetch
    return await coalescing_cache.get_or_fetch(
        key=f"tweet:{tweet_id}",
        fetch_func=lambda: db.get_tweet(tweet_id),
        ttl=3600
    )


# 3. Viral detection and promotion
from collections import Counter

class ViralDetector:
    def __init__(self):
        self.request_counts = Counter()
    
    async def track(self, tweet_id: str):
        self.request_counts[tweet_id] += 1
        
        if self.request_counts[tweet_id] > 1000:  # Threshold
            # Promote to hot key handling
            await self.promote_to_hot(tweet_id)
    
    async def promote_to_hot(self, tweet_id: str):
        # Extend TTL
        await cache.expire(f"tweet:{tweet_id}", 3600)
        
        # Replicate to edge caches
        await replicate_to_edge(tweet_id)
        
        # Alert for monitoring
        logger.info(f"Viral tweet detected: {tweet_id}")

Key insight: Pre-cache for known celebrities, fast coalescing for surprise virality.

Problem 3: Distributed Cache Cluster

Setup: You have a Redis cluster with 10 nodes. Cache keys are distributed via consistent hashing. One node goes down, causing keys to move to other nodes.

Requirements:

  • 1 million keys on the failed node
  • Keys are now "cold" on their new nodes
  • Traffic continues during failover
  • Database must not be overwhelmed

Questions:

  1. What happens to the 1 million keys when the node fails?
  2. How do you prevent thundering herd during failover?
  3. How would you design for this scenario?

Hints:

  • Consistent hashing redistributes keys
  • New nodes have cold cache for those keys
  • Not all keys are equally hot

Problem analysis:

  • Node fails, 1M keys moved to other nodes
  • Those nodes have cold cache for moved keys
  • If 1% of keys are hot (10K keys)
  • Each hot key gets 100 req/sec
  • Potential: 1M+ DB queries in first second

Solution: Multi-layer protection

# 1. Replica reads (primary defense)
#    Redis Cluster has replicas - promote before primary fails

# Configure Redis Cluster with replicas
# replica-serve-stale-data yes
# Replicas can serve reads even during failover
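
#    Client-side sketch (assuming redis-py's cluster client; its
#    read_from_replicas flag lets read commands be served by replicas -
#    check your client and version for the exact option):

from redis.asyncio.cluster import RedisCluster

cache = RedisCluster(
    host="redis-cluster.internal",   # hypothetical cluster endpoint
    port=6379,
    read_from_replicas=True,         # reads may be routed to replica nodes
)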


# 2. Request coalescing (application layer)
#    Even after failover, coalesce requests

# This is always on, helps during any cache miss


# 3. Graceful degradation
#    Limit database queries during failover

class CircuitBreaker:
    # Note: this is really a concurrency cap (bulkhead) on DB queries;
    # a full circuit breaker would also trip open after repeated failures.
    def __init__(self, max_concurrent: int = 1000):
        self.semaphore = asyncio.Semaphore(max_concurrent)
    
    async def query(self, func):
        async with self.semaphore:
            return await func()

# Only 1000 concurrent DB queries
# Others wait or get stale/error


# 4. Hot key warming on topology change

async def on_cluster_topology_change(moved_keys: List[str]):
    # Identify hot keys among moved keys
    hot_keys = [k for k in moved_keys if is_hot_key(k)]
    
    # Warm hot keys first (prioritized)
    for key in hot_keys[:100]:  # Top 100 hottest
        try:
            value = await db.fetch(key)
            await cache.set(key, value)
        except Exception:
            pass  # Best effort


# 5. Stale serving from old node (if possible)

# Before fully decommissioning old node:
async def get_with_fallback(key: str):
    # Try new node
    value = await new_node.get(key)
    if value:
        return value
    
    # Try old node (might still be up for reads)
    try:
        value = await old_node.get(key)
        if value:
            # Migrate to new node
            await new_node.set(key, value)
            return value
    except Exception:
        pass
    
    # Fall through to database
    return await coalesced_fetch(key)

Key insight: Multiple layers of defense, graceful degradation over total failure.


Chapter 11: Mock Interview Dialogue

Interviewer: "We're building a trending topics feature. It shows the top 10 trending hashtags, updated every minute. We expect 100,000 requests per second for this data. How would you design the caching?"

You: "Interesting problem. Let me make sure I understand the access pattern.

So we have a single piece of data—the top 10 trending topics—that:

  • Updates every minute
  • Gets 100,000 requests per second
  • Is identical for all users

This is a classic hot key problem. My primary concern is the thundering herd when the cache expires or updates."

Interviewer: "Exactly. Walk me through your approach."

You: "I'd use background refresh as the primary strategy. Here's why:

Since we know this key is hot and we know when it updates (every minute), we can be proactive. A background job computes the new trending topics and updates the cache before the old data expires.

Timeline:
  T=0s     Background job computes trending, caches with TTL=90s
  T=60s    Background job computes new trending, updates cache
  T=90s    Old entry would have expired, but it was already replaced at T=60s
  
  User requests always hit cache. Never a miss."

Interviewer: "What if the background job fails?"

You: "Good question. I'd have two safeguards:

First, stale-while-revalidate. Even if the background job misses a cycle, the cache entry is still valid—just stale. Users get the 1-minute-old trending topics while we retry the refresh. For trending topics, being 1-2 minutes stale is usually acceptable.

Second, extended TTL. I set the cache TTL to 90 seconds, not 60, even though we refresh every 60. This gives a 30-second buffer. If the job is delayed, users still get data.

If multiple refresh attempts fail, I'd alert ops and potentially serve a fallback—maybe the last successful trending topics with a 'slightly stale' indicator."

Interviewer: "What about the 100,000 requests per second? Can a single Redis key handle that?"

You: "That's a great point. A single Redis key can handle around 100K operations per second, so we're at the edge.

I'd do two things:

First, read replicas. Redis Cluster can route reads to replicas. With 3 replicas, we distribute the 100K reads across 4 nodes—25K each, easily handled.

Second, local caching. Each application server can cache the trending topics locally with a 5-second TTL. Most requests hit the local cache, never reaching Redis at all.

Request → Local Cache (hit 80%) → Redis (hit 99.9%) → Background Job

With 100 app servers and 5-second local TTL, each server makes 12 Redis requests per minute. That's only 1,200 Redis requests per minute total—trivial load."

Interviewer: "Interesting. How do you invalidate the local caches when trending updates?"

You: "For this use case, I wouldn't explicitly invalidate. The 5-second local TTL means users see updates within 5 seconds, which is acceptable for trending topics.

But if we needed faster propagation, I could use pub/sub:

  1. Background job updates Redis cache
  2. Background job publishes to 'trending-updated' channel
  3. App servers subscribe, clear local cache on message
  4. Next local request fetches from Redis

This gives near-instant propagation when needed."

Interviewer: "Last question: how would you handle a cold start? Say all your caches are empty."

You: "Cold start is where thundering herd is most dangerous. With 100K req/sec hitting an empty cache, we'd overwhelm any database.

My approach:

  1. Deployment ordering — The background job that computes trending runs before the API servers start accepting traffic. Cache is warm before first user request.

  2. Health check gate — API servers report unhealthy until the trending cache is populated. Load balancer won't route traffic to cold servers.

  3. Coalescing as fallback — Even if somehow cache is cold with traffic, request coalescing ensures only one computation runs. The other 99,999 requests wait for that result.

Combining these, a cold start scenario should be impossible in normal operation, and handled gracefully if it somehow occurs."
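
To make the two-tier lookup from this dialogue concrete, here is a minimal sketch: a per-process dictionary with a 5-second TTL in front of Redis, plus an optional pub/sub listener that drops the local copy when the background job publishes to a hypothetical "trending-updated" channel (assuming redis-py's asyncio client):

import time
from redis import asyncio as aioredis

redis_client = aioredis.Redis()                # assumed shared Redis endpoint
_local: dict[str, tuple[float, bytes]] = {}    # key -> (expires_at, value)
LOCAL_TTL = 5.0                                # seconds

async def get_trending():
    now = time.monotonic()
    entry = _local.get("trending")
    if entry and entry[0] > now:
        return entry[1]                        # local hit: Redis never sees this request

    value = await redis_client.get("trending") # kept warm by the background job
    if value is not None:
        _local["trending"] = (now + LOCAL_TTL, value)
    return value

async def listen_for_updates():
    # Optional fast path: clear the local copy as soon as an update is published
    pubsub = redis_client.pubsub()
    await pubsub.subscribe("trending-updated")
    async for message in pubsub.listen():
        if message["type"] == "message":
            _local.pop("trending", None)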


Summary

DAY 3 KEY TAKEAWAYS

THE PROBLEM:
• Thundering herd = many requests hit empty cache simultaneously
• Can overwhelm database (N requests → N queries)
• Occurs on: expiration, cold start, invalidation, hot key creation

PROTECTION STRATEGIES:

Locking (Mutex):
  • One request fetches, others wait
  • Simple but adds latency
  • Use for: Moderate traffic

Request Coalescing:
  • Duplicate requests share single fetch
  • No polling overhead
  • Use for: High traffic (primary strategy)

Probabilistic Expiration:
  • Random early refresh prevents synchronized expiry
  • No coordination needed
  • Use for: Distributed cache, many keys

Background Refresh:
  • Refresh before expiry, never miss
  • Most effective for known hot keys
  • Use for: Homepage, trending, deals

Stale-While-Revalidate:
  • Serve stale immediately, refresh async
  • Best user experience
  • Use for: When slight staleness is OK

COMBINING STRATEGIES (Production):

Hot keys (homepage):       Background refresh + stale fallback
Normal keys:               Coalescing + probabilistic TTL
Cold start:                Warming + coalescing + health gates

KEY PRINCIPLES:
• Identify hot keys in advance
• Add jitter to TTLs (prevent synchronized expiry)
• Always have timeouts (no unbounded waits)
• Monitor coalesced requests and wait times
• Plan for cold start explicitly

📚 Further Reading

Papers

  • "Scaling Memcache at Facebook" — Sections on leases and thundering herd
  • "XFetch: A Probabilistic Early Expiration Algorithm"

Books

  • "Designing Data-Intensive Applications" by Martin Kleppmann

End of Day 3: Thundering Herd

Tomorrow: Day 4 — Feed Caching. We'll tackle the "celebrity problem": how do you cache personalized feeds for millions of users when one celebrity's post needs to appear in 10 million feeds? You'll learn push-on-write vs pull-on-read and when to use each.