Week 4 — Day 3: Thundering Herd
System Design Mastery Series
Preface
Yesterday, you learned how to invalidate stale cache entries. You can now keep your cache fresh.
But there's a dark side to invalidation that we didn't fully address:
BLACK FRIDAY, 12:00:00 AM
The moment the sale starts:
Homepage cache TTL expires
50,000 users refresh simultaneously
Cache status: EMPTY
Request #1: Cache miss → Query database
Request #2: Cache miss → Query database
Request #3: Cache miss → Query database
...
Request #50,000: Cache miss → Query database
Database receives: 50,000 identical queries
Database capacity: 5,000 queries/second
Result:
12:00:01 — Database CPU: 100%
12:00:02 — Query latency: 50ms → 5,000ms
12:00:03 — Connection pool exhausted
12:00:04 — Database: "Too many connections"
12:00:05 — Site down, customers angry, revenue lost
All because a single cache key expired.
This is the thundering herd problem — and it's destroyed more Black Friday sales than any other caching issue.
Today, you'll learn how to tame the herd.
Part I: Foundations
Chapter 1: What Is Thundering Herd?
1.1 The Simple Definition
Thundering herd occurs when many requests simultaneously attempt to access an unavailable resource, overwhelming the system that provides it.
EVERYDAY ANALOGY: Store Opening
Black Friday, 6:00 AM
Store doors: LOCKED (cache empty)
5,000 people waiting outside
6:00:00 — Doors open
6:00:01 — 5,000 people rush in simultaneously
6:00:02 — Entrance bottleneck
6:00:03 — People trampled, chaos
The store can handle 100 people entering per minute.
5,000 at once = disaster.
SOLUTION: Controlled entry
- Open doors gradually
- Let people in batches of 50
- Others wait in organized queue
Same principle applies to cache misses.
1.2 When Thundering Herd Occurs
THUNDERING HERD TRIGGERS
1. CACHE EXPIRATION
Popular key expires
Many concurrent requests miss
All hit database simultaneously
2. COLD START
Service restarts, cache empty
All traffic goes to database
Database overwhelmed
3. CACHE FAILURE
Redis goes down
All requests bypass cache
Database sees 100x normal load
4. INVALIDATION STORM
Bulk invalidation (deployment, sale start)
Many keys deleted at once
Sudden spike in cache misses
5. HOT KEY CREATION
New viral content
Not in cache yet
Massive traffic from first second
1.3 The Math of Thundering Herd
QUANTIFYING THE PROBLEM
Normal operation:
Requests/second: 10,000
Cache hit ratio: 99%
Cache misses/second: 100
Database queries/sec: 100 ✓
Cache expires for hot key (1% of traffic):
Requests for hot key: 100/second
All miss simultaneously
Database queries/sec: 100 (normal) + 100 (hot key) = 200 ✓
Still manageable!
Cache expires for VERY hot key (50% of traffic):
Requests for hot key: 5,000/second
All miss simultaneously
Database queries/sec: 100 (normal) + 5,000 (hot key) = 5,100
Database capacity: 1,000 queries/second
Result: 5x overload 💥
THE AMPLIFICATION FACTOR
Without protection:
1 cache miss = N database queries
(where N = concurrent requests for same key)
With thundering herd protection:
1 cache miss = 1 database query
(other N-1 requests wait for result)
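To make the amplification concrete, here's a tiny back-of-the-envelope calculation. The request rate and fetch time are illustrative assumptions, not measurements from the scenario above:
# Amplification factor: DB queries triggered while a hot key is missing
hot_key_rps = 5_000        # requests/second for the hot key (assumed)
fetch_time_s = 0.2         # time to recompute the value from the DB (assumed)

# Every request that arrives during the fetch window sees a cache miss
concurrent_misses = int(hot_key_rps * fetch_time_s)   # 1,000

queries_without_protection = concurrent_misses        # 1 miss = N queries
queries_with_protection = 1                           # N-1 requests wait

print(queries_without_protection, queries_with_protection)  # 1000 1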
1.4 Key Terminology
| Term | Definition |
|---|---|
| Thundering herd | Many requests hitting uncached resource simultaneously |
| Cache stampede | Synonym for thundering herd |
| Dog-pile effect | Another synonym, common in Ruby/Python communities |
| Hot key | Cache key with disproportionately high traffic |
| Request coalescing | Combining duplicate requests into one |
| Lock/mutex | Mechanism to ensure only one request fetches data |
| Probabilistic expiration | Random early refresh to prevent synchronized expiry |
| Cache warming | Pre-populating cache before traffic hits |
Chapter 2: Protection Strategies
2.1 Strategy 1: Locking (Mutex)
Only one request fetches from database; others wait.
LOCKING STRATEGY
Without lock:
Request A ──▶ Cache miss ──▶ Query DB ──▶ Response
Request B ──▶ Cache miss ──▶ Query DB ──▶ Response
Request C ──▶ Cache miss ──▶ Query DB ──▶ Response
3 identical database queries!
With lock:
Request A ──▶ Cache miss ──▶ Acquire lock ──▶ Query DB ──▶ Cache result ──▶ Release lock
Request B ──▶ Cache miss ──▶ Wait for lock...
Request C ──▶ Cache miss ──▶ Wait for lock...
Lock released:
Request B ──▶ Read from cache ──▶ Response
Request C ──▶ Read from cache ──▶ Response
1 database query, 3 responses!
Implementation:
# Locking Strategy for Thundering Herd Prevention
import asyncio
from typing import Optional, Any, Callable
import json
class LockingCache:
"""
Cache with mutex-based thundering herd protection.
Only one request fetches from database on cache miss;
others wait for the result.
"""
def __init__(self, redis_client, default_ttl: int = 300):
self.redis = redis_client
self.default_ttl = default_ttl
self.lock_ttl = 10 # Lock expires after 10 seconds (safety)
async def get_or_fetch(
self,
key: str,
fetch_func: Callable,
ttl: int = None
) -> Any:
"""
Get from cache, or fetch with lock protection.
Args:
key: Cache key
fetch_func: Async function to fetch data if cache miss
ttl: Cache TTL in seconds
"""
ttl = ttl or self.default_ttl
# Step 1: Try cache
cached = await self.redis.get(key)
if cached:
return json.loads(cached)
# Step 2: Cache miss - try to acquire lock
lock_key = f"lock:{key}"
lock_acquired = await self.redis.set(
lock_key,
"1",
nx=True, # Only set if not exists
ex=self.lock_ttl # Expire lock after 10 seconds
)
if lock_acquired:
# We got the lock - fetch and cache
try:
value = await fetch_func()
# Cache the result
await self.redis.setex(key, ttl, json.dumps(value))
return value
finally:
# Release lock
await self.redis.delete(lock_key)
else:
# Lock held by another request - wait and retry
return await self._wait_for_cache(key, fetch_func, ttl)
async def _wait_for_cache(
self,
key: str,
fetch_func: Callable,
ttl: int,
max_wait: float = 5.0,
poll_interval: float = 0.05
) -> Any:
"""
Wait for another request to populate cache.
Falls back to fetching if wait times out.
"""
waited = 0
while waited < max_wait:
# Check if cache is now populated
cached = await self.redis.get(key)
if cached:
return json.loads(cached)
# Wait a bit and retry
await asyncio.sleep(poll_interval)
waited += poll_interval
# Timeout - fetch ourselves (lock holder may have failed)
value = await fetch_func()
await self.redis.setex(key, ttl, json.dumps(value))
return value
# Usage
cache = LockingCache(redis_client)
async def get_product(product_id: str) -> dict:
return await cache.get_or_fetch(
key=f"product:{product_id}",
fetch_func=lambda: db.fetch_one(
"SELECT * FROM products WHERE id = $1",
product_id
),
ttl=300
)
Pros:
- Guarantees only one database query
- Simple mental model
- Works well for moderate traffic
Cons:
- Waiters block, increasing latency
- Lock can become bottleneck
- Risk of deadlock if lock holder crashes
- Polling adds overhead
2.2 Strategy 2: Request Coalescing (Single Flight)
Multiple concurrent requests for the same key share a single fetch operation.
REQUEST COALESCING
Request A ──┐
Request B ──┼──▶ Single fetch_func() call ──▶ Single DB query
Request C ──┘ │
│
┌───────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Result shared │
│ with all three │
│ requests │
└─────────────────┘
Unlike locking:
- No polling/waiting
- Requests "join" the in-flight operation
- All get result simultaneously
Implementation:
# Request Coalescing (Single Flight Pattern)
import asyncio
from typing import Dict, Any, Callable, Optional
from dataclasses import dataclass
import json
@dataclass
class InflightRequest:
"""Represents an in-flight fetch operation."""
future: asyncio.Future
created_at: float
class CoalescingCache:
"""
Cache with request coalescing for thundering herd prevention.
Multiple requests for the same key share a single database fetch.
This is more efficient than locking because there's no polling.
"""
def __init__(self, redis_client, default_ttl: int = 300):
self.redis = redis_client
self.default_ttl = default_ttl
# Track in-flight requests: key -> Future
self._inflight: Dict[str, InflightRequest] = {}
self._lock = asyncio.Lock()
async def get_or_fetch(
self,
key: str,
fetch_func: Callable,
ttl: int = None
) -> Any:
"""
Get from cache, or fetch with request coalescing.
If another request is already fetching this key,
we join that request instead of making a duplicate fetch.
"""
ttl = ttl or self.default_ttl
# Step 1: Try cache
cached = await self.redis.get(key)
if cached:
return json.loads(cached)
        # Step 2: Check for in-flight request
        # (register atomically, but wait OUTSIDE the lock so other keys proceed)
        async with self._lock:
            existing = self._inflight.get(key)
            if existing is None:
                # No in-flight request - we'll be the fetcher
                future = asyncio.get_event_loop().create_future()
                self._inflight[key] = InflightRequest(
                    future=future,
                    created_at=asyncio.get_event_loop().time()
                )
        if existing is not None:
            # Another request is fetching - join it
            return await existing.future
# Step 3: Fetch data (we're the designated fetcher)
try:
value = await fetch_func()
# Cache the result
await self.redis.setex(key, ttl, json.dumps(value))
# Notify all waiting requests
future.set_result(value)
return value
except Exception as e:
# Notify waiters of failure
future.set_exception(e)
raise
finally:
# Clean up in-flight tracking
async with self._lock:
self._inflight.pop(key, None)
# More robust implementation with timeout
class RobustCoalescingCache:
"""
Production-ready coalescing cache with timeouts and cleanup.
"""
def __init__(
self,
redis_client,
default_ttl: int = 300,
fetch_timeout: float = 5.0,
cleanup_interval: float = 60.0
):
self.redis = redis_client
self.default_ttl = default_ttl
self.fetch_timeout = fetch_timeout
self._inflight: Dict[str, InflightRequest] = {}
self._lock = asyncio.Lock()
# Start cleanup task
asyncio.create_task(self._cleanup_loop(cleanup_interval))
async def get_or_fetch(
self,
key: str,
fetch_func: Callable,
ttl: int = None
) -> Any:
"""Get from cache with coalescing and timeout."""
ttl = ttl or self.default_ttl
# Try cache first
cached = await self.redis.get(key)
if cached:
return json.loads(cached)
        # Check for an in-flight fetch, or register as the fetcher (atomic);
        # any waiting happens OUTSIDE the lock so other keys are not blocked
        async with self._lock:
            existing = self._inflight.get(key)
            if existing is None:
                future = asyncio.get_event_loop().create_future()
                self._inflight[key] = InflightRequest(
                    future=future,
                    created_at=asyncio.get_event_loop().time()
                )
        if existing is not None:
            try:
                # Wait with timeout
                return await asyncio.wait_for(
                    asyncio.shield(existing.future),
                    timeout=self.fetch_timeout
                )
            except asyncio.TimeoutError:
                # Fetcher is taking too long, fetch ourselves
                pass
            # Take over as the fetcher after the timeout
            async with self._lock:
                future = asyncio.get_event_loop().create_future()
                self._inflight[key] = InflightRequest(
                    future=future,
                    created_at=asyncio.get_event_loop().time()
                )
# Fetch with timeout
try:
value = await asyncio.wait_for(
fetch_func(),
timeout=self.fetch_timeout
)
await self.redis.setex(key, ttl, json.dumps(value))
if not future.done():
future.set_result(value)
return value
except Exception as e:
if not future.done():
future.set_exception(e)
raise
finally:
async with self._lock:
self._inflight.pop(key, None)
async def _cleanup_loop(self, interval: float):
"""Clean up stale in-flight entries."""
while True:
await asyncio.sleep(interval)
now = asyncio.get_event_loop().time()
stale_keys = []
async with self._lock:
for key, inflight in self._inflight.items():
if now - inflight.created_at > self.fetch_timeout * 2:
stale_keys.append(key)
for key in stale_keys:
self._inflight.pop(key, None)
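# Usage (a sketch, assuming the same redis_client and db helpers as in
# the locking example above)
coalescing_cache = CoalescingCache(redis_client)

async def get_product(product_id: str) -> dict:
    return await coalescing_cache.get_or_fetch(
        key=f"product:{product_id}",
        fetch_func=lambda: db.fetch_one(
            "SELECT * FROM products WHERE id = $1",
            product_id
        ),
        ttl=300
    )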
Pros:
- No polling overhead
- All requests get result simultaneously
- Lower latency than locking
- Cleaner code
Cons:
- More complex implementation
- Memory overhead for tracking in-flight requests
- Need to handle timeout for stale entries
2.3 Strategy 3: Probabilistic Early Expiration
Randomly refresh cache before it expires, preventing synchronized expiration.
PROBABILISTIC EARLY EXPIRATION
Standard TTL (Problem):
All requests use same TTL
All entries expire at same time
All requests miss simultaneously = thundering herd
Timeline:
T=0s Cache populated, TTL=60s
T=60s Cache expires
T=60s 1000 requests all miss!
Probabilistic expiration (Solution):
Each request has small chance of refreshing early
Entries refresh at different times
Never synchronized expiration
Timeline:
T=0s Cache populated, TTL=60s
T=55s Request A: random < threshold → REFRESH (1 request)
T=60s Cache already fresh, no mass expiration
THE FORMULA
beta = 1.0   # Tuning parameter (higher = more aggressive early refresh)
ttl_remaining = expiry_time - current_time
# Probability of an early refresh rises sharply as expiry approaches
should_refresh = beta * -log(random()) > ttl_remaining
# (The full XFetch formula also multiplies by delta, the expected time to
#  recompute the value; with delta = 1 second it reduces to the line above.)
Or a simpler version:
should_refresh = random() < (1.0 - ttl_remaining / original_ttl) * 0.1
Implementation:
# Probabilistic Early Expiration
import asyncio
import json
import logging
import math
import random
import time
from typing import Any, Callable, Optional
from dataclasses import dataclass

logger = logging.getLogger(__name__)
@dataclass
class CachedValue:
"""Cached value with expiration metadata."""
value: Any
created_at: float
ttl: float
@property
def expires_at(self) -> float:
return self.created_at + self.ttl
@property
def remaining_ttl(self) -> float:
return max(0, self.expires_at - time.time())
@property
def ttl_fraction_remaining(self) -> float:
"""Fraction of TTL remaining (0.0 to 1.0)."""
return self.remaining_ttl / self.ttl if self.ttl > 0 else 0
class ProbabilisticCache:
"""
Cache with probabilistic early expiration.
Randomly refreshes entries before they expire,
preventing synchronized mass expiration.
"""
def __init__(
self,
redis_client,
default_ttl: int = 300,
beta: float = 1.0 # Higher = more aggressive refresh
):
self.redis = redis_client
self.default_ttl = default_ttl
self.beta = beta
async def get_or_fetch(
self,
key: str,
fetch_func: Callable,
ttl: int = None
) -> Any:
"""
Get from cache with probabilistic early refresh.
"""
ttl = ttl or self.default_ttl
# Get cached value with metadata
cached_data = await self._get_with_metadata(key)
if cached_data:
# Check if we should refresh early
if self._should_refresh_early(cached_data):
# Refresh in background (don't block request)
asyncio.create_task(
self._refresh_cache(key, fetch_func, ttl)
)
return cached_data.value
# Cache miss - fetch and cache
value = await fetch_func()
await self._set_with_metadata(key, value, ttl)
return value
def _should_refresh_early(self, cached: CachedValue) -> bool:
"""
Probabilistically decide if we should refresh early.
Probability increases as expiration approaches.
        Uses the XFetch approach (probabilistic early recomputation).
"""
remaining = cached.remaining_ttl
if remaining <= 0:
return True # Already expired
# XFetch probability calculation
# P(refresh) = beta * log(random()) * -1 / remaining
# This creates exponentially increasing probability near expiry
try:
random_factor = -1 * math.log(random.random())
threshold = self.beta * random_factor
return threshold > remaining
except ValueError:
return False
async def _get_with_metadata(self, key: str) -> Optional[CachedValue]:
"""Get cached value with expiration metadata."""
# Store metadata alongside value
pipe = self.redis.pipeline()
pipe.get(f"{key}:data")
pipe.get(f"{key}:meta")
data, meta = await pipe.execute()
if not data or not meta:
return None
meta = json.loads(meta)
return CachedValue(
value=json.loads(data),
created_at=meta['created_at'],
ttl=meta['ttl']
)
async def _set_with_metadata(self, key: str, value: Any, ttl: int):
"""Set cached value with metadata."""
now = time.time()
pipe = self.redis.pipeline()
pipe.setex(f"{key}:data", ttl, json.dumps(value))
pipe.setex(f"{key}:meta", ttl, json.dumps({
'created_at': now,
'ttl': ttl
}))
await pipe.execute()
async def _refresh_cache(self, key: str, fetch_func: Callable, ttl: int):
"""Background refresh of cache entry."""
try:
value = await fetch_func()
await self._set_with_metadata(key, value, ttl)
except Exception as e:
# Log but don't fail - old value still valid
logger.warning(f"Background refresh failed for {key}: {e}")
# Simpler implementation using TTL with jitter
class JitteredTTLCache:
"""
Simpler probabilistic expiration using TTL jitter.
Instead of calculating probability, just randomize the TTL.
"""
def __init__(
self,
redis_client,
default_ttl: int = 300,
jitter_percent: float = 0.1 # 10% jitter
):
self.redis = redis_client
self.default_ttl = default_ttl
self.jitter_percent = jitter_percent
def _jittered_ttl(self, base_ttl: int) -> int:
"""Add random jitter to TTL."""
jitter_range = base_ttl * self.jitter_percent
jitter = random.uniform(-jitter_range, jitter_range)
return max(1, int(base_ttl + jitter))
async def set(self, key: str, value: Any, ttl: int = None):
"""Set with jittered TTL."""
ttl = ttl or self.default_ttl
actual_ttl = self._jittered_ttl(ttl)
await self.redis.setex(key, actual_ttl, json.dumps(value))
async def get_or_fetch(
self,
key: str,
fetch_func: Callable,
ttl: int = None
) -> Any:
"""Get with jittered TTL on cache population."""
cached = await self.redis.get(key)
if cached:
return json.loads(cached)
value = await fetch_func()
await self.set(key, value, ttl)
return value
Pros:
- No coordination needed
- Graceful degradation
- Works well for distributed caches
- Simple jitter version is very easy
Cons:
- Not deterministic
- Small chance of thundering herd still
- Slightly reduced cache efficiency (early refreshes)
2.4 Strategy 4: Background Refresh (Proactive Refresh)
Never let cache expire — refresh before TTL.
BACKGROUND REFRESH
Standard approach (reactive):
Request → Cache miss → Fetch → Cache → Response
Problem: Cache miss = latency spike
Background refresh (proactive):
Background job → Fetch → Update cache (before expiry)
Request → Cache hit → Response
No cache miss ever!
Timeline:
Standard:
T=0 Cache populated, TTL=60s
T=60s Expires
T=60.1s Request → MISS → Fetch (slow)
Background:
T=0 Cache populated, TTL=60s
T=45s Background job refreshes (before expiry)
T=60s Would have expired, but already refreshed
T=60.1s Request → HIT → Fast response
Implementation:
# Background Refresh Strategy
import asyncio
import json
import logging
from typing import Dict, Any, Callable, List
from dataclasses import dataclass, field
from datetime import datetime, timedelta

logger = logging.getLogger(__name__)
@dataclass
class RefreshConfig:
"""Configuration for a cached key that needs background refresh."""
key: str
fetch_func: Callable
ttl: int
refresh_interval: int # Refresh this many seconds before expiry
priority: int = 1 # Higher = more important
@dataclass
class CacheEntry:
"""Track cache entry for refresh scheduling."""
key: str
expires_at: datetime
config: RefreshConfig
class BackgroundRefreshCache:
"""
Cache with background refresh.
A background task proactively refreshes entries before they expire.
This ensures cache is always warm and no thundering herd occurs.
"""
def __init__(
self,
redis_client,
refresh_workers: int = 4
):
self.redis = redis_client
self.refresh_workers = refresh_workers
# Registry of keys to refresh
self._registered: Dict[str, RefreshConfig] = {}
# Queue of pending refreshes
self._refresh_queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
self._running = False
async def start(self):
"""Start background refresh workers."""
self._running = True
# Start scheduler
asyncio.create_task(self._scheduler_loop())
# Start workers
for i in range(self.refresh_workers):
asyncio.create_task(self._worker_loop(i))
async def stop(self):
"""Stop background refresh."""
self._running = False
def register(
self,
key: str,
fetch_func: Callable,
ttl: int = 300,
refresh_before: int = 30 # Refresh 30s before expiry
):
"""Register a key for background refresh."""
self._registered[key] = RefreshConfig(
key=key,
fetch_func=fetch_func,
ttl=ttl,
refresh_interval=ttl - refresh_before
)
async def get(self, key: str) -> Any:
"""Get from cache (should always hit if registered)."""
cached = await self.redis.get(key)
if cached:
return json.loads(cached)
# Miss - key might not be registered or first access
if key in self._registered:
config = self._registered[key]
value = await config.fetch_func()
await self._cache_and_schedule(key, value, config)
return value
return None
async def _cache_and_schedule(
self,
key: str,
value: Any,
config: RefreshConfig
):
"""Cache value and schedule next refresh."""
await self.redis.setex(key, config.ttl, json.dumps(value))
# Schedule refresh before expiry
refresh_at = datetime.utcnow() + timedelta(seconds=config.refresh_interval)
await self._refresh_queue.put((
refresh_at.timestamp(), # Priority (earlier = higher priority)
CacheEntry(key=key, expires_at=refresh_at, config=config)
))
async def _scheduler_loop(self):
"""Schedule refreshes at the right time."""
while self._running:
try:
# Peek at next item
if self._refresh_queue.empty():
await asyncio.sleep(1)
continue
priority, entry = await self._refresh_queue.get()
# Wait until refresh time
now = datetime.utcnow().timestamp()
if priority > now:
# Put back and wait
await self._refresh_queue.put((priority, entry))
await asyncio.sleep(min(1, priority - now))
continue
# Time to refresh - requeue for worker
await self._do_refresh(entry)
except Exception as e:
logger.error(f"Scheduler error: {e}")
await asyncio.sleep(1)
async def _do_refresh(self, entry: CacheEntry):
"""Execute a cache refresh."""
try:
config = entry.config
value = await config.fetch_func()
await self._cache_and_schedule(entry.key, value, config)
logger.debug(f"Refreshed cache: {entry.key}")
except Exception as e:
logger.error(f"Failed to refresh {entry.key}: {e}")
# Reschedule with backoff
await asyncio.sleep(5)
await self._refresh_queue.put((
datetime.utcnow().timestamp() + 5,
entry
))
async def _worker_loop(self, worker_id: int):
"""Worker that processes refresh queue."""
# In this implementation, scheduler does the work
# Workers could be used for parallel refreshes
pass
# Usage
cache = BackgroundRefreshCache(redis_client)
# Register keys for background refresh
cache.register(
key="homepage:featured",
fetch_func=lambda: db.fetch("SELECT * FROM featured_products"),
ttl=60,
refresh_before=10 # Refresh 10s before expiry
)
cache.register(
key="deals:page",
fetch_func=lambda: db.fetch("SELECT * FROM active_deals"),
ttl=300,
refresh_before=30
)
await cache.start()
# Reads now always hit the cache (assuming registration happened before the first request)
featured = await cache.get("homepage:featured")
Pros:
- No cache misses for registered keys
- Consistent low latency
- Predictable database load
- No thundering herd possible
Cons:
- Complex implementation
- Must know keys in advance
- Refreshes even if key not accessed
- Not suitable for millions of keys
2.5 Strategy 5: Stale-While-Revalidate
Serve stale data immediately while refreshing in background.
STALE-WHILE-REVALIDATE
Standard cache miss:
Request → Miss → Wait for fetch → Response (slow)
Stale-while-revalidate:
Request → Stale hit → Response immediately (fast!)
└──▶ Background refresh
User gets fast response (stale but recent)
Next user gets fresh data
Timeline:
T=0 Cache populated
T=60s Cache "expires" but data kept as stale
T=61s Request comes in:
- Return stale data immediately
- Trigger background refresh
T=62s Background refresh completes
T=63s Next request gets fresh data
Implementation:
# Stale-While-Revalidate Pattern
import asyncio
import json
import logging
import time
from typing import Any, Callable, Optional
from dataclasses import dataclass

logger = logging.getLogger(__name__)
@dataclass
class StaleableValue:
"""Value that can be served stale."""
data: Any
cached_at: float
fresh_ttl: float # Serve fresh until this
stale_ttl: float # Allow stale until this
@property
def is_fresh(self) -> bool:
return time.time() < self.cached_at + self.fresh_ttl
@property
def is_stale_but_usable(self) -> bool:
now = time.time()
return (self.cached_at + self.fresh_ttl <= now <
self.cached_at + self.stale_ttl)
@property
def is_expired(self) -> bool:
return time.time() >= self.cached_at + self.stale_ttl
class StaleWhileRevalidateCache:
"""
Cache that serves stale content while revalidating.
Provides fast responses even on cache "expiry" by serving
slightly stale data while refreshing in background.
"""
def __init__(
self,
redis_client,
fresh_ttl: int = 60, # Serve fresh for 60s
stale_ttl: int = 300 # Allow stale for 5min total
):
self.redis = redis_client
self.fresh_ttl = fresh_ttl
self.stale_ttl = stale_ttl
# Track in-flight revalidations
self._revalidating: set = set()
async def get_or_fetch(
self,
key: str,
fetch_func: Callable
) -> Any:
"""
Get from cache with stale-while-revalidate.
Returns:
- Fresh data if available
- Stale data (with background refresh) if within stale window
- Fresh fetch if expired
"""
cached = await self._get_with_metadata(key)
if cached:
if cached.is_fresh:
# Fresh - return immediately
return cached.data
elif cached.is_stale_but_usable:
# Stale but usable - return and refresh
self._trigger_revalidation(key, fetch_func)
return cached.data
# Expired or not cached - must fetch
value = await fetch_func()
await self._set_with_metadata(key, value)
return value
def _trigger_revalidation(self, key: str, fetch_func: Callable):
"""Trigger background revalidation if not already running."""
if key in self._revalidating:
return # Already revalidating
self._revalidating.add(key)
asyncio.create_task(self._revalidate(key, fetch_func))
async def _revalidate(self, key: str, fetch_func: Callable):
"""Background revalidation."""
try:
value = await fetch_func()
await self._set_with_metadata(key, value)
logger.debug(f"Revalidated: {key}")
except Exception as e:
logger.warning(f"Revalidation failed for {key}: {e}")
finally:
self._revalidating.discard(key)
async def _get_with_metadata(self, key: str) -> Optional[StaleableValue]:
"""Get cached value with staleness metadata."""
raw = await self.redis.get(key)
if not raw:
return None
data = json.loads(raw)
return StaleableValue(
data=data['value'],
cached_at=data['cached_at'],
fresh_ttl=data['fresh_ttl'],
stale_ttl=data['stale_ttl']
)
async def _set_with_metadata(self, key: str, value: Any):
"""Set value with staleness metadata."""
data = {
'value': value,
'cached_at': time.time(),
'fresh_ttl': self.fresh_ttl,
'stale_ttl': self.stale_ttl
}
# Set TTL to stale_ttl (maximum lifetime)
await self.redis.setex(
key,
self.stale_ttl,
json.dumps(data)
)
# Usage
cache = StaleWhileRevalidateCache(
redis_client,
fresh_ttl=60, # Fresh for 1 minute
stale_ttl=300 # Stale-but-usable for 5 minutes total
)
async def get_product(product_id: str) -> dict:
return await cache.get_or_fetch(
key=f"product:{product_id}",
fetch_func=lambda: db.fetch_product(product_id)
)
Pros:
- Fast responses even when "expired"
- Smooth user experience
- Natural load distribution
- Used by CDNs (HTTP Cache-Control: stale-while-revalidate)
Cons:
- Users may see slightly stale data
- Complexity in managing dual TTLs
- Need to track revalidation state
Chapter 3: Strategy Comparison
3.1 Comparison Matrix
| Strategy | Latency Impact | Complexity | Best For |
|---|---|---|---|
| Locking | High (waiters block) | Low | Moderate traffic |
| Coalescing | Low (concurrent wait) | Medium | High traffic |
| Probabilistic | None (gradual) | Low | Distributed cache |
| Background | None (proactive) | High | Known hot keys |
| Stale-While-Revalidate | None (serve stale) | Medium | UX-critical |
3.2 Decision Framework
CHOOSING A THUNDERING HERD STRATEGY
Question 1: Do you know your hot keys in advance?
YES → Consider Background Refresh
NO → Continue to Question 2
Question 2: Can users tolerate briefly stale data?
YES → Stale-While-Revalidate
NO → Continue to Question 3
Question 3: How high is your traffic?
Very high (>10K/sec per key) → Request Coalescing
Moderate (<10K/sec per key) → Locking
Question 4: Is this a distributed multi-node cache?
YES → Add Probabilistic Expiration on top
NO → Single-node strategies sufficient
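The decision tree above can be sketched as a small helper. This is illustrative only; the threshold and strategy names are assumptions, and real systems usually combine strategies, as the combined approach below and Section 3.3 show.
# Rough encoding of the decision framework (illustrative thresholds and names)
def choose_strategy(
    known_hot_key: bool,
    stale_tolerable: bool,
    peak_rps_per_key: int,
    distributed_cache: bool,
) -> list:
    strategies = []
    if known_hot_key:
        strategies.append("background_refresh")
    elif stale_tolerable:
        strategies.append("stale_while_revalidate")
    elif peak_rps_per_key > 10_000:
        strategies.append("request_coalescing")
    else:
        strategies.append("locking")
    if distributed_cache:
        strategies.append("probabilistic_expiration")   # layered on top
    return strategies

print(choose_strategy(False, True, 5_000, True))
# ['stale_while_revalidate', 'probabilistic_expiration']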
COMBINED APPROACH (Production)
For homepage (known hot key):
Background Refresh + Stale-While-Revalidate
For product pages (many keys):
Request Coalescing + Probabilistic TTL
For user-specific data (many keys, less hot):
Locking + Short TTL
3.3 Combining Strategies
# Combined Thundering Herd Protection
class ProductionCache:
"""
Production cache combining multiple strategies.
- Request coalescing for all keys
- Probabilistic expiration for distributed cache
- Stale-while-revalidate for user experience
- Background refresh for known hot keys
"""
def __init__(self, redis_client):
self.redis = redis_client
# In-flight tracking for coalescing
self._inflight: Dict[str, asyncio.Future] = {}
self._lock = asyncio.Lock()
# Hot keys for background refresh
self._hot_keys: Dict[str, RefreshConfig] = {}
# Configuration
self.fresh_ttl = 60
self.stale_ttl = 300
self.jitter_percent = 0.1
async def get_or_fetch(
self,
key: str,
fetch_func: Callable,
allow_stale: bool = True
) -> Any:
"""
Get with combined protection strategies.
"""
# Step 1: Try cache with stale-while-revalidate
cached = await self._get_with_staleness(key)
if cached:
if cached.is_fresh:
return cached.data
if cached.is_stale_but_usable and allow_stale:
# Serve stale, refresh in background
asyncio.create_task(
self._coalesced_refresh(key, fetch_func)
)
return cached.data
# Step 2: Need fresh data - use coalescing
return await self._coalesced_fetch(key, fetch_func)
async def _coalesced_fetch(
self,
key: str,
fetch_func: Callable
) -> Any:
"""Fetch with request coalescing."""
async with self._lock:
if key in self._inflight:
return await self._inflight[key]
future = asyncio.get_event_loop().create_future()
self._inflight[key] = future
try:
value = await fetch_func()
await self._set_with_jitter(key, value)
future.set_result(value)
return value
except Exception as e:
future.set_exception(e)
raise
finally:
async with self._lock:
self._inflight.pop(key, None)
async def _set_with_jitter(self, key: str, value: Any):
"""Set with probabilistic TTL jitter."""
jitter = random.uniform(
-self.fresh_ttl * self.jitter_percent,
self.fresh_ttl * self.jitter_percent
)
actual_ttl = int(self.fresh_ttl + jitter)
data = {
'value': value,
'cached_at': time.time(),
'fresh_ttl': actual_ttl,
'stale_ttl': self.stale_ttl
}
await self.redis.setex(key, self.stale_ttl, json.dumps(data))
Part II: Implementation
Chapter 4: Production-Ready Implementation
4.1 Complete Thundering Herd Protection System
# Production Thundering Herd Protection
import asyncio
import json
import logging
import math
import random
import time
from dataclasses import dataclass, field
from typing import Dict, Any, Callable, Optional, Set
from enum import Enum
logger = logging.getLogger(__name__)
# =============================================================================
# Configuration
# =============================================================================
@dataclass
class ThunderingHerdConfig:
"""Configuration for thundering herd protection."""
# Coalescing
enable_coalescing: bool = True
coalesce_timeout: float = 5.0
# Probabilistic refresh
enable_probabilistic: bool = True
jitter_percent: float = 0.1
early_refresh_beta: float = 1.0
# Stale-while-revalidate
enable_stale: bool = True
fresh_ttl: int = 60
stale_ttl: int = 300
# Locking fallback
enable_locking: bool = True
lock_ttl: int = 10
# Metrics
enable_metrics: bool = True
# =============================================================================
# Cache Value Wrapper
# =============================================================================
@dataclass
class CacheValue:
"""Wrapped cache value with metadata."""
data: Any
cached_at: float
fresh_until: float
stale_until: float
@property
def is_fresh(self) -> bool:
return time.time() < self.fresh_until
@property
def is_stale_usable(self) -> bool:
now = time.time()
return self.fresh_until <= now < self.stale_until
@property
def is_expired(self) -> bool:
return time.time() >= self.stale_until
@property
def remaining_fresh_ttl(self) -> float:
return max(0, self.fresh_until - time.time())
def to_json(self) -> str:
return json.dumps({
'data': self.data,
'cached_at': self.cached_at,
'fresh_until': self.fresh_until,
'stale_until': self.stale_until
})
@classmethod
def from_json(cls, raw: str) -> 'CacheValue':
d = json.loads(raw)
return cls(
data=d['data'],
cached_at=d['cached_at'],
fresh_until=d['fresh_until'],
stale_until=d['stale_until']
)
# =============================================================================
# Metrics
# =============================================================================
@dataclass
class ThunderingHerdMetrics:
"""Metrics for thundering herd protection."""
cache_hits: int = 0
cache_misses: int = 0
stale_serves: int = 0
coalesced_requests: int = 0
locks_acquired: int = 0
locks_waited: int = 0
early_refreshes: int = 0
background_refreshes: int = 0
errors: int = 0
def to_dict(self) -> dict:
return {
'hits': self.cache_hits,
'misses': self.cache_misses,
'stale_serves': self.stale_serves,
'coalesced': self.coalesced_requests,
'locks_acquired': self.locks_acquired,
'locks_waited': self.locks_waited,
            'early_refreshes': self.early_refreshes,
            'background_refreshes': self.background_refreshes,
'errors': self.errors,
'hit_ratio': self.cache_hits / max(1, self.cache_hits + self.cache_misses)
}
# =============================================================================
# Main Implementation
# =============================================================================
class ThunderingHerdCache:
"""
Production-ready cache with comprehensive thundering herd protection.
Combines:
- Request coalescing (primary)
- Stale-while-revalidate (user experience)
- Probabilistic early refresh (distributed cache)
- Locking (fallback)
"""
def __init__(
self,
redis_client,
config: ThunderingHerdConfig = None
):
self.redis = redis_client
self.config = config or ThunderingHerdConfig()
self.metrics = ThunderingHerdMetrics()
# Coalescing state
self._inflight: Dict[str, asyncio.Future] = {}
self._inflight_lock = asyncio.Lock()
# Revalidation state
self._revalidating: Set[str] = set()
async def get_or_fetch(
self,
key: str,
fetch_func: Callable[[], Any],
fresh_ttl: int = None,
stale_ttl: int = None,
allow_stale: bool = None
) -> Any:
"""
Get from cache with thundering herd protection.
Args:
key: Cache key
fetch_func: Async function to fetch data on miss
fresh_ttl: Override fresh TTL
stale_ttl: Override stale TTL
allow_stale: Allow serving stale data
Returns:
Cached or fetched value
"""
fresh_ttl = fresh_ttl or self.config.fresh_ttl
stale_ttl = stale_ttl or self.config.stale_ttl
allow_stale = allow_stale if allow_stale is not None else self.config.enable_stale
# Step 1: Try cache
cached = await self._get_cached(key)
if cached:
# Check freshness
if cached.is_fresh:
self.metrics.cache_hits += 1
# Probabilistic early refresh
if self.config.enable_probabilistic:
if self._should_refresh_early(cached):
self.metrics.early_refreshes += 1
asyncio.create_task(
self._background_refresh(key, fetch_func, fresh_ttl, stale_ttl)
)
return cached.data
# Stale but usable
if cached.is_stale_usable and allow_stale:
self.metrics.stale_serves += 1
# Trigger background refresh
asyncio.create_task(
self._background_refresh(key, fetch_func, fresh_ttl, stale_ttl)
)
return cached.data
# Step 2: Cache miss - need fresh data
self.metrics.cache_misses += 1
if self.config.enable_coalescing:
return await self._coalesced_fetch(key, fetch_func, fresh_ttl, stale_ttl)
elif self.config.enable_locking:
return await self._locked_fetch(key, fetch_func, fresh_ttl, stale_ttl)
else:
return await self._simple_fetch(key, fetch_func, fresh_ttl, stale_ttl)
async def _get_cached(self, key: str) -> Optional[CacheValue]:
"""Get cached value with metadata."""
try:
raw = await self.redis.get(key)
if raw:
return CacheValue.from_json(raw)
except Exception as e:
logger.warning(f"Cache get error for {key}: {e}")
self.metrics.errors += 1
return None
async def _set_cached(
self,
key: str,
value: Any,
fresh_ttl: int,
stale_ttl: int
):
"""Set cached value with metadata."""
# Add jitter if enabled
if self.config.enable_probabilistic:
jitter = fresh_ttl * self.config.jitter_percent
fresh_ttl = int(fresh_ttl + random.uniform(-jitter, jitter))
now = time.time()
cached = CacheValue(
data=value,
cached_at=now,
fresh_until=now + fresh_ttl,
stale_until=now + stale_ttl
)
try:
await self.redis.setex(key, stale_ttl, cached.to_json())
except Exception as e:
logger.warning(f"Cache set error for {key}: {e}")
self.metrics.errors += 1
def _should_refresh_early(self, cached: CacheValue) -> bool:
"""Probabilistically decide on early refresh."""
remaining = cached.remaining_fresh_ttl
if remaining <= 0:
return True
# XFetch algorithm
try:
beta = self.config.early_refresh_beta
threshold = beta * (-1 * math.log(random.random()))
return threshold > remaining
except ValueError:
return False
async def _coalesced_fetch(
self,
key: str,
fetch_func: Callable,
fresh_ttl: int,
stale_ttl: int
) -> Any:
"""Fetch with request coalescing."""
async with self._inflight_lock:
if key in self._inflight:
# Join existing fetch
self.metrics.coalesced_requests += 1
try:
return await asyncio.wait_for(
asyncio.shield(self._inflight[key]),
timeout=self.config.coalesce_timeout
)
except asyncio.TimeoutError:
# Timeout - fetch ourselves
pass
# We're the fetcher
future = asyncio.get_event_loop().create_future()
self._inflight[key] = future
try:
value = await asyncio.wait_for(
fetch_func(),
timeout=self.config.coalesce_timeout
)
await self._set_cached(key, value, fresh_ttl, stale_ttl)
if not future.done():
future.set_result(value)
return value
except Exception as e:
if not future.done():
future.set_exception(e)
raise
finally:
async with self._inflight_lock:
self._inflight.pop(key, None)
async def _locked_fetch(
self,
key: str,
fetch_func: Callable,
fresh_ttl: int,
stale_ttl: int
) -> Any:
"""Fetch with lock protection."""
lock_key = f"lock:{key}"
# Try to acquire lock
acquired = await self.redis.set(
lock_key, "1",
nx=True,
ex=self.config.lock_ttl
)
if acquired:
self.metrics.locks_acquired += 1
try:
value = await fetch_func()
await self._set_cached(key, value, fresh_ttl, stale_ttl)
return value
finally:
await self.redis.delete(lock_key)
else:
# Wait for lock holder
self.metrics.locks_waited += 1
return await self._wait_for_cache(key, fetch_func, fresh_ttl, stale_ttl)
async def _wait_for_cache(
self,
key: str,
fetch_func: Callable,
fresh_ttl: int,
stale_ttl: int,
max_wait: float = 5.0
) -> Any:
"""Wait for another request to populate cache."""
waited = 0
poll_interval = 0.05
while waited < max_wait:
cached = await self._get_cached(key)
if cached and not cached.is_expired:
return cached.data
await asyncio.sleep(poll_interval)
waited += poll_interval
# Timeout - fetch ourselves
return await self._simple_fetch(key, fetch_func, fresh_ttl, stale_ttl)
async def _simple_fetch(
self,
key: str,
fetch_func: Callable,
fresh_ttl: int,
stale_ttl: int
) -> Any:
"""Simple fetch without protection (fallback)."""
value = await fetch_func()
await self._set_cached(key, value, fresh_ttl, stale_ttl)
return value
async def _background_refresh(
self,
key: str,
fetch_func: Callable,
fresh_ttl: int,
stale_ttl: int
):
"""Background cache refresh."""
if key in self._revalidating:
return # Already refreshing
self._revalidating.add(key)
self.metrics.background_refreshes += 1
try:
value = await fetch_func()
await self._set_cached(key, value, fresh_ttl, stale_ttl)
logger.debug(f"Background refresh completed: {key}")
except Exception as e:
logger.warning(f"Background refresh failed for {key}: {e}")
finally:
self._revalidating.discard(key)
def get_metrics(self) -> dict:
"""Get thundering herd protection metrics."""
return self.metrics.to_dict()
# =============================================================================
# Usage Example
# =============================================================================
async def example_usage():
"""Example usage of ThunderingHerdCache."""
# Create cache with default config
cache = ThunderingHerdCache(
redis_client=redis,
config=ThunderingHerdConfig(
fresh_ttl=60,
stale_ttl=300,
enable_coalescing=True,
enable_stale=True,
enable_probabilistic=True
)
)
# Get product (all protections active)
product = await cache.get_or_fetch(
key=f"product:{product_id}",
fetch_func=lambda: db.fetch_product(product_id)
)
# Get critical data (no stale)
inventory = await cache.get_or_fetch(
key=f"inventory:{product_id}",
fetch_func=lambda: db.fetch_inventory(product_id),
allow_stale=False,
fresh_ttl=30
)
# Check metrics
print(cache.get_metrics())
Chapter 5: Edge Cases and Error Handling
5.1 Edge Case 1: Lock Holder Crashes
SCENARIO: Request acquires lock, then crashes
Timeline:
T0: Request A acquires lock
T1: Request B waits for lock
T2: Request A crashes (lock not released!)
T3: Request B waits... waits... waits...
T4: Lock TTL expires (10 seconds)
T5: Request C acquires lock
Problem: Request B waited 10 seconds for nothing
SOLUTION: Lock with TTL + Wait timeout
lock_ttl = 10 seconds (safety)
wait_timeout = 5 seconds (give up sooner)
If wait times out, fetch ourselves.
Duplicate fetches possible but bounded.
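A minimal sketch of that combination, assuming an async Redis client configured with decode_responses=True. The ownership token and the non-atomic check-before-delete are simplifications beyond what the text above prescribes:
# Lock TTL + wait timeout, with a token so we never delete someone else's lock
import asyncio
import json
import uuid

async def fetch_with_safe_lock(redis, key, fetch_func,
                               ttl=300, lock_ttl=10, wait_timeout=5.0):
    lock_key, token = f"lock:{key}", str(uuid.uuid4())
    if await redis.set(lock_key, token, nx=True, ex=lock_ttl):
        try:
            value = await fetch_func()
            await redis.setex(key, ttl, json.dumps(value))
            return value
        finally:
            # Release only if we still own the lock (it may have expired);
            # a production version would do this check-and-delete atomically
            if await redis.get(lock_key) == token:
                await redis.delete(lock_key)
    # Lock held elsewhere: wait up to wait_timeout for the cache to fill
    waited = 0.0
    while waited < wait_timeout:
        cached = await redis.get(key)
        if cached:
            return json.loads(cached)
        await asyncio.sleep(0.05)
        waited += 0.05
    # Wait timed out (holder likely crashed) - bounded duplicate fetch
    value = await fetch_func()
    await redis.setex(key, ttl, json.dumps(value))
    return value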
5.2 Edge Case 2: Coalesced Request Timeout
SCENARIO: Fetch takes too long, waiters time out
Timeline:
T0: Request A starts fetch (slow database)
T0: Requests B, C, D join (coalesced)
T5: Requests B, C, D timeout (5 second limit)
T5: B, C, D each start their own fetch!
T7: Request A completes, caches result
T8: B, C, D complete, each caches result
Now we have 4 database queries instead of 1!
SOLUTION: Shield the future + stale fallback
# When timing out, check for stale data first
try:
return await asyncio.wait_for(future, timeout=5.0)
except asyncio.TimeoutError:
# Check if we have stale data
cached = await get_stale(key)
if cached:
return cached.data # Serve stale, don't query
# No stale data - must fetch
return await fetch_func()
5.3 Edge Case 3: Background Refresh Fails Repeatedly
SCENARIO: Database is having issues, refreshes fail
Timeline:
T0: Cache populated
T60: Cache goes stale
T61: Background refresh attempted - FAILS
T62: Another request, serve stale, try refresh - FAILS
T63: More requests, more failed refreshes
T300: Stale TTL expires, no more stale data!
T301: All requests hit database - thundering herd!
SOLUTION: Extend stale TTL on refresh failure
async def background_refresh(key, fetch_func):
try:
value = await fetch_func()
await set_cached(key, value, fresh_ttl, stale_ttl)
except Exception as e:
# Refresh failed - extend stale TTL
cached = await get_cached(key)
if cached:
# Extend stale window by 60 seconds
cached.stale_until = time.time() + 60
await redis.setex(key, 60, cached.to_json())
logger.warning(f"Refresh failed, extended stale: {e}")
5.4 Edge Case 4: Hot Key Moves to Different Server
SCENARIO: Consistent hashing rebalance moves hot key
Timeline:
T0: Hot key "homepage" on Server A (warm cache)
T1: Server C added to cluster
T2: Consistent hashing moves "homepage" to Server C
T3: Server C cache is COLD for "homepage"
T4: All homepage requests hit database!
SOLUTION: Cache warming on topology change
# When cluster topology changes:
async def on_cluster_rebalance(moved_keys: List[str]):
for key in moved_keys:
if is_hot_key(key):
# Pre-warm on new server
value = await fetch_from_old_server_or_db(key)
await cache.set(key, value)
5.5 Error Handling Matrix
| Scenario | Impact | Handling |
|---|---|---|
| Lock holder crash | Waiters delayed | Lock TTL + wait timeout |
| Slow fetch | Coalesced timeout | Shield future + stale fallback |
| Refresh fails | Stale data used | Extend stale TTL |
| Redis down | No protection | Circuit breaker, rate limit DB |
| Topology change | Cold cache | Pre-warm hot keys |
| Memory pressure | Evictions | Monitor, scale, prioritize hot keys |
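For the "Redis down" row, here is a minimal sketch of rate-limiting the database when the cache layer is unreachable. The cap and the broad exception handling are assumptions; in practice, narrow the except clause to your client's connection errors:
# Protect the database when the cache layer is unavailable
import asyncio

db_semaphore = asyncio.Semaphore(500)   # cap on concurrent DB queries (assumed)

async def get_with_degradation(key, fetch_func, cache_get):
    try:
        cached = await cache_get(key)
        if cached is not None:
            return cached
    except Exception:
        pass   # cache unreachable - fall through, but protect the database
    if db_semaphore.locked():
        # Already at capacity: fail fast (or serve a default) instead of piling on
        raise RuntimeError("degraded: database at capacity")
    async with db_semaphore:
        return await fetch_func()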
Part III: Real-World Application
Chapter 7: How Big Tech Does It
7.1 Case Study: Instagram — Thundering Herd for Viral Posts
INSTAGRAM VIRAL POST HANDLING
Challenge:
Celebrity posts can get 10M+ views in first minute
Initial views all cache misses (not cached yet)
Potential for massive thundering herd
Solution: LEASE-BASED COALESCING
How it works:
1. First request gets a "lease" (token)
2. Lease holder fetches from database
3. Other requests see lease exists, wait
4. Lease holder populates cache
5. Waiters read from cache
Special handling for viral content:
- Detect rapid request rate
- Extend lease duration for popular content
- Pre-warm cache for scheduled posts
Implementation sketch:
async def get_post(post_id):
# Check cache
cached = await cache.get(f"post:{post_id}")
if cached:
return cached
# Try to get lease
lease = await cache.get_lease(f"post:{post_id}", ttl=2)
if lease.is_owner:
# We got the lease - fetch
post = await db.get_post(post_id)
await cache.set_with_lease(f"post:{post_id}", post, lease)
return post
else:
# Wait for lease owner
return await cache.wait_for_value(f"post:{post_id}", timeout=2)
7.2 Case Study: Netflix — Homepage Thundering Herd
NETFLIX HOMEPAGE CACHING
Challenge:
Homepage is highly personalized
But some elements are shared (trending, new releases)
Peak traffic at 7-9 PM every day
Can't afford cache miss storms
Solution: MULTI-LAYER WITH DIFFERENT STRATEGIES
Shared content (trending, categories):
- Background refresh every 5 minutes
- Never expires for user requests
- Stale-while-revalidate if refresh fails
Personalized content (recommendations):
- Request coalescing per user
- Short TTL (1 minute) with probabilistic refresh
- Fallback to non-personalized if slow
Architecture:
User Request
│
▼
┌─────────────────┐
│ Personalized │──▶ Short TTL, Coalescing
│ Recommendations│
└────────┬────────┘
│
▼
┌─────────────────┐
│ Shared Content │──▶ Background refresh, never miss
│ (Trending) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Page Assembly │
└─────────────────┘
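An illustrative sketch of that layering (not Netflix code): shared_cache, coalescing_cache, and recommender are assumed helpers following the patterns from Chapter 2.
# Assemble the page: shared rows are always warm, personalized rows degrade
import asyncio

async def assemble_homepage(user_id: str) -> dict:
    # Shared content: kept warm by background refresh, plain reads
    trending = await shared_cache.get("trending")
    new_releases = await shared_cache.get("new_releases")

    # Personalized content: coalesced fetch with a short timeout and fallback
    try:
        recs = await asyncio.wait_for(
            coalescing_cache.get_or_fetch(
                key=f"recs:{user_id}",
                fetch_func=lambda: recommender.fetch(user_id),
                ttl=60
            ),
            timeout=0.3   # assumed latency budget for personalization
        )
    except asyncio.TimeoutError:
        recs = trending   # degrade to non-personalized rows

    return {
        "trending": trending,
        "new_releases": new_releases,
        "recommendations": recs
    }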
7.3 Case Study: Amazon — Deal of the Day
AMAZON DEAL OF THE DAY
Challenge:
Deal changes at midnight
Millions of users refresh at 00:00:00
Previous deal cache invalidated
Massive thundering herd potential
Solution: PRE-WARMING + ATOMIC SWITCH
How it works:
11:59:00 PM - 1 minute before deal change:
1. Fetch new deal data
2. Cache under different key: "deal:2024-01-16"
3. Old deal still served: "deal:2024-01-15"
12:00:00 AM - Deal change:
1. Atomic key switch: "deal:current" → "deal:2024-01-16"
2. No cache miss! New deal already cached
12:00:01 AM:
1. All requests hit warm cache
2. Zero database load from deal change
Implementation:
# Before midnight
async def prepare_new_deal(date: str, deal_data: dict):
await cache.set(f"deal:{date}", deal_data, ttl=86400)
# At midnight
async def switch_deal(new_date: str):
await cache.set("deal:current_date", new_date)
# Request handler
async def get_current_deal():
date = await cache.get("deal:current_date")
return await cache.get(f"deal:{date}") # Always hits!
7.4 Summary: Industry Patterns
| Company | Challenge | Strategy |
|---|---|---|
| Instagram | Viral posts | Lease-based coalescing |
| Netflix | Homepage peak | Background refresh + coalescing |
| Amazon | Deal change | Pre-warming + atomic switch |
| | Trending topics | Write-through + replication |
| Uber | Driver locations | Very short TTL, no herd protection needed |
Chapter 8: Common Mistakes to Avoid
8.1 Mistake 1: Not Protecting Hot Keys
❌ WRONG: Same strategy for all keys
cache_ttl = 300 # 5 minutes for everything
# Homepage (10K req/sec) - same as
# User settings (1 req/min)
Result: Homepage cache expires → 10K DB queries
✅ CORRECT: Identify and protect hot keys
HOT_KEYS = ["homepage", "featured_products", "trending"]
async def get(key: str):
if key in HOT_KEYS:
# Extra protection for hot keys
return await cache.get_or_fetch(
key,
fetch_func,
strategy="background_refresh" # Never expires
)
else:
# Standard protection for normal keys
return await cache.get_or_fetch(
key,
fetch_func,
strategy="coalescing"
)
8.2 Mistake 2: Unbounded Wait in Coalescing
❌ WRONG: Wait forever for coalesced result
async def coalesced_fetch(key, fetch_func):
if key in inflight:
return await inflight[key] # Wait forever!
# If fetcher hangs, all waiters hang
# System becomes unresponsive
✅ CORRECT: Timeout and fallback
async def coalesced_fetch(key, fetch_func):
if key in inflight:
try:
return await asyncio.wait_for(
inflight[key],
timeout=5.0 # Don't wait forever
)
except asyncio.TimeoutError:
# Fetcher is slow - try stale or fetch ourselves
stale = await get_stale(key)
if stale:
return stale
# Fall through to fetch
# Normal fetch path
...
8.3 Mistake 3: Synchronized TTLs
❌ WRONG: All entries expire at same time
async def populate_cache_on_startup():
products = await db.fetch_all_products()
for product in products:
await cache.set(
f"product:{product.id}",
product,
ttl=300 # All expire in exactly 300 seconds!
)
# 5 minutes later: ALL products expire simultaneously
# Massive thundering herd!
✅ CORRECT: Jittered TTLs
async def populate_cache_on_startup():
products = await db.fetch_all_products()
for product in products:
jitter = random.randint(-30, 30) # ±30 seconds
ttl = 300 + jitter
await cache.set(
f"product:{product.id}",
product,
ttl=ttl # Spread expiration over 60 seconds
)
8.4 Mistake 4: Ignoring Cold Start
❌ WRONG: Deploy and hope for the best
# New deployment, cache empty
# Full production traffic hits empty cache
# Database overwhelmed
✅ CORRECT: Cache warming before traffic
async def deploy_with_warming():
# Step 1: Deploy new version (not receiving traffic yet)
await deploy_new_version()
# Step 2: Warm cache with hot keys
hot_keys = ["homepage", "featured", "categories"]
for key in hot_keys:
value = await fetch_from_db(key)
await cache.set(key, value)
# Step 3: Warm cache with top N products
top_products = await db.fetch_top_products(limit=1000)
for product in top_products:
await cache.set(f"product:{product.id}", product)
# Step 4: Now safe to receive traffic
await enable_traffic()
8.5 Mistake Checklist
Before deploying, verify:
- Hot keys identified — Which keys get >1K req/sec?
- Protection strategy chosen — Coalescing, locking, or background?
- TTLs have jitter — Entries don't expire simultaneously
- Timeouts configured — No unbounded waits
- Cold start handled — Cache warming on deploy
- Stale fallback exists — Can serve stale during issues
- Metrics in place — Track coalesced requests, wait times
Part IV: Interview Preparation
Chapter 9: Interview Tips and Phrases
9.1 When to Discuss Thundering Herd
Bring up thundering herd when:
- Designing a cache layer for high-traffic system
- Interviewer asks "what happens when cache is empty?"
- System has known hot spots (homepage, viral content)
- Discussing cold start or deployment strategies
9.2 Key Phrases to Use
INTRODUCING THE PROBLEM:
"One risk with caching is the thundering herd problem. When
a popular cache entry expires, thousands of requests might
simultaneously hit the database. I'd want to protect against
that."
EXPLAINING REQUEST COALESCING:
"My primary defense would be request coalescing. If multiple
requests come in for the same missing key, only one actually
queries the database. The others wait for that result. This
turns 1000 database queries into 1."
EXPLAINING STALE-WHILE-REVALIDATE:
"For user experience, I'd add stale-while-revalidate. When
cache expires, we serve the stale data immediately—which is
usually fine for a few seconds—while refreshing in the
background. Users get fast responses, and we avoid the
thundering herd."
DISCUSSING HOT KEYS:
"For known hot keys like the homepage, I'd use background
refresh. A background job refreshes the cache every 30
seconds, well before the TTL expires. Those keys never
actually expire from a user's perspective."
HANDLING FOLLOW-UP QUESTIONS:
"If the background refresh fails, we extend the stale
window rather than letting the entry expire. This prevents
a thundering herd when the database is already struggling."
9.3 Questions to Ask Interviewer
- "What are the hottest cache keys in this system?"
- "What's the peak request rate we need to handle?"
- "Is serving slightly stale data acceptable?"
- "How do we handle deployments and cold starts?"
9.4 Common Follow-up Questions
| Question | Good Answer |
|---|---|
| "What if the database is slow?" | "Stale-while-revalidate helps here. Users get cached data immediately. The slow database query happens in the background. We'd also have circuit breakers to prevent overloading a struggling database." |
| "How do you handle cold start?" | "Cache warming before enabling traffic. We pre-populate the cache with hot keys like homepage, top products, and active deals. Then gradually increase traffic to the new instance." |
| "What if Redis goes down?" | "Circuit breaker opens, requests go to database. We'd rate limit to protect the database—maybe serve a degraded experience rather than overwhelm it. And alert ops immediately." |
| "How do you know which keys are hot?" | "Metrics on cache access patterns. Track hit counts per key prefix. Also domain knowledge—homepage is always hot, deal pages are hot during sales. We can pre-identify these." |
Chapter 10: Practice Problems
Problem 1: Black Friday Homepage
Setup: Your e-commerce homepage gets 50,000 requests/second during Black Friday. The homepage cache has a 60-second TTL. Design thundering herd protection.
Requirements:
- Homepage data takes 500ms to compute (multiple DB queries)
- Black Friday deals update every hour
- Database can handle 100 queries/second
- Users can tolerate 10-second stale data
Questions:
- What happens at TTL expiration without protection?
- How would you prevent the thundering herd?
- How do you handle the hourly deal update?
Hints:
- 50,000 simultaneous requests vs 100 queries/second capacity
- Consider background refresh for known hot key
- Deal updates are scheduled — can you pre-warm?
Without protection:
- TTL expires, 50,000 requests miss
- All 50,000 try to query database
- Database capacity: 100/sec
- Time to clear backlog: 500 seconds
- Result: Site effectively down
Protection strategy:
# 1. Background refresh (primary)
# Homepage is known hot key - never let it expire
background_refresh_config = {
"homepage": {
"fetch_func": compute_homepage,
"ttl": 60,
"refresh_before": 10 # Refresh 10s before expiry
}
}
# Background job runs every 50 seconds
# Homepage always warm, never expires from user perspective
# 2. Stale-while-revalidate (backup)
# If background refresh fails, serve stale
cache_config = {
"fresh_ttl": 60, # Fresh for 60s
"stale_ttl": 300 # Usable stale for 5min
}
# Even if refresh fails, users get data for 5 minutes
# 3. Hourly deal update
# Pre-warm before the hour
async def update_deals():
# At X:59:00 (1 min before deal change)
new_homepage = await compute_homepage_with_new_deals()
await cache.set("homepage:next_hour", new_homepage)
# At X:00:00 (deal change time)
await cache.rename("homepage:next_hour", "homepage")
# No cache miss! New homepage already cached
# 4. Request coalescing (additional safety)
# If somehow cache misses, coalesce requests
# Even if 1000 requests miss, only 1 queries database
Result: Zero thundering herd, smooth Black Friday.
Problem 2: Viral Tweet
Setup: A celebrity with 50 million followers tweets. Within seconds, millions of users want to see the tweet. The tweet wasn't in cache before (new content).
Requirements:
- Tweet must be visible within 2 seconds of posting
- System normally handles 10,000 tweets/second
- Database can handle 50,000 reads/second
- This tweet might get 100,000+ reads/second
Questions:
- How do you handle the initial cache miss for a viral tweet?
- How do you detect that a tweet is going viral?
- What's different about caching for celebrity accounts?
Hints:
- Initial request MUST query database (not cached yet)
- Request coalescing helps but needs quick detection
- Celebrity accounts are known in advance
Challenge analysis:
- New tweet → not in cache → first request(s) miss
- Within 1 second, could have 100K requests
- Without protection: 100K DB queries, overwhelm
Solution: Celebrity fast-path + coalescing
# 1. Celebrity accounts get special handling
CELEBRITY_ACCOUNTS = load_accounts_over_1M_followers()
async def on_new_tweet(tweet: Tweet):
if tweet.author_id in CELEBRITY_ACCOUNTS:
# Immediately cache new tweet
await cache.set(
f"tweet:{tweet.id}",
tweet,
ttl=3600
)
# Pre-warm in all regional caches
await broadcast_to_regions(tweet)
# Normal tweets cached on first read
# 2. Request coalescing for all tweets
async def get_tweet(tweet_id: str):
# Check cache
cached = await cache.get(f"tweet:{tweet_id}")
if cached:
return cached
# Coalesced fetch
return await coalescing_cache.get_or_fetch(
key=f"tweet:{tweet_id}",
fetch_func=lambda: db.get_tweet(tweet_id),
ttl=3600
)
# 3. Viral detection and promotion
from collections import Counter

class ViralDetector:
def __init__(self):
self.request_counts = Counter()
async def track(self, tweet_id: str):
self.request_counts[tweet_id] += 1
if self.request_counts[tweet_id] > 1000: # Threshold
# Promote to hot key handling
await self.promote_to_hot(tweet_id)
async def promote_to_hot(self, tweet_id: str):
# Extend TTL
await cache.expire(f"tweet:{tweet_id}", 3600)
# Replicate to edge caches
await replicate_to_edge(tweet_id)
# Alert for monitoring
logger.info(f"Viral tweet detected: {tweet_id}")
Key insight: Pre-cache for known celebrities, fast coalescing for surprise virality.
Problem 3: Distributed Cache Cluster
Setup: You have a Redis cluster with 10 nodes. Cache keys are distributed via consistent hashing. One node goes down, causing keys to move to other nodes.
Requirements:
- 1 million keys on the failed node
- Keys are now "cold" on their new nodes
- Traffic continues during failover
- Database must not be overwhelmed
Questions:
- What happens to the 1 million keys when the node fails?
- How do you prevent thundering herd during failover?
- How would you design for this scenario?
Hints:
- Consistent hashing redistributes keys
- New nodes have cold cache for those keys
- Not all keys are equally hot
Problem analysis:
- Node fails, 1M keys moved to other nodes
- Those nodes have cold cache for moved keys
- If 1% of keys are hot (10K keys)
- Each hot key gets 100 req/sec
- Potential: 1M+ DB queries in first second
Solution: Multi-layer protection
# 1. Replica reads (primary defense)
# Redis Cluster keeps replicas - a replica is promoted when the primary fails
# Configure Redis Cluster with replicas
# replica-serve-stale-data yes
# Replicas can serve reads even during failover
# 2. Request coalescing (application layer)
# Even after failover, coalesce requests
# This is always on, helps during any cache miss
# 3. Graceful degradation
# Cap concurrent database queries during failover (a semaphore bulkhead:
# overflow requests wait instead of piling onto the database)
import asyncio

class ConcurrencyLimiter:
    def __init__(self, max_concurrent: int = 1000):
        self.semaphore = asyncio.Semaphore(max_concurrent)
    async def query(self, func):
        async with self.semaphore:
            return await func()
# Only 1000 concurrent DB queries
# Others wait or get stale/error
# 4. Hot key warming on topology change
from typing import List

async def on_cluster_topology_change(moved_keys: List[str]):
    # Identify hot keys among the moved keys (see the HotKeyTracker sketch below)
    hot_keys = [k for k in moved_keys if is_hot_key(k)]
    # Warm up to 100 hot keys first, best effort (sort by request rate if tracked)
    for key in hot_keys[:100]:
try:
value = await db.fetch(key)
await cache.set(key, value)
except Exception:
pass # Best effort
# 5. Stale serving from old node (if possible)
# Before fully decommissioning old node:
async def get_with_fallback(key: str):
# Try new node
value = await new_node.get(key)
if value:
return value
# Try old node (might still be up for reads)
try:
value = await old_node.get(key)
if value:
# Migrate to new node
await new_node.set(key, value)
return value
    except Exception:
pass
# Fall through to database
return await coalesced_fetch(key)
Key insight: Multiple layers of defense, graceful degradation over total failure.
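`is_hot_key` in step 4 is left undefined above; one possible backing for it is a simple fixed-window request counter (the window and threshold values are illustrative):

import time
from collections import Counter

class HotKeyTracker:
    """Flag keys whose request count in the current window exceeds a threshold."""
    def __init__(self, window_seconds: int = 10, hot_threshold: int = 500):
        self.window_seconds = window_seconds
        self.hot_threshold = hot_threshold
        self.window_start = time.monotonic()
        self.counts = Counter()
    def record(self, key: str) -> None:
        self._maybe_rotate()
        self.counts[key] += 1
    def is_hot_key(self, key: str) -> bool:
        self._maybe_rotate()
        return self.counts[key] >= self.hot_threshold
    def _maybe_rotate(self) -> None:
        # Reset counts when the window rolls over (a sliding window would be smoother)
        now = time.monotonic()
        if now - self.window_start >= self.window_seconds:
            self.counts.clear()
            self.window_start = now

Call `record()` on every cache read; the warming code above can then use `is_hot_key()` to decide which moved keys to reload first.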
Chapter 11: Mock Interview Dialogue
Scenario: Design Cache for Trending Topics
Interviewer: "We're building a trending topics feature. It shows the top 10 trending hashtags, updated every minute. We expect 100,000 requests per second for this data. How would you design the caching?"
You: "Interesting problem. Let me make sure I understand the access pattern.
So we have a single piece of data—the top 10 trending topics—that:
- Updates every minute
- Gets 100,000 requests per second
- Is identical for all users
This is a classic hot key problem. My primary concern is the thundering herd when the cache expires or updates."
Interviewer: "Exactly. Walk me through your approach."
You: "I'd use background refresh as the primary strategy. Here's why:
Since we know this key is hot and we know when it updates (every minute), we can be proactive. A background job computes the new trending topics and updates the cache before the old data expires.
Timeline:
T=0s   Background job computes trending, caches it with TTL=90s
T=60s  Background job computes new trending, overwrites the cache entry
T=90s  The old entry would only now have expired, but it was replaced 30s earlier
User requests always hit the cache. Never a miss."
Interviewer: "What if the background job fails?"
You: "Good question. I'd have two safeguards:
First, stale-while-revalidate. Even if the background job misses a cycle, the cache entry is still valid—just stale. Users get the 1-minute-old trending topics while we retry the refresh. For trending topics, being 1-2 minutes stale is usually acceptable.
Second, extended TTL. I set the cache TTL to 90 seconds, not 60, even though we refresh every 60. This gives a 30-second buffer. If the job is delayed, users still get data.
If multiple refresh attempts fail, I'd alert ops and potentially serve a fallback—maybe the last successful trending topics with a 'slightly stale' indicator."
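(A minimal sketch of that stale-while-revalidate read path; the fresh/stale windows mirror the config from Problem 1, and the helper names are illustrative.)

import asyncio
import time

FRESH_TTL = 60   # seconds the entry counts as fresh
STALE_TTL = 300  # extra seconds we will still serve it while refreshing

async def get_with_swr(key: str, fetch_func, cache):
    entry = await cache.get(key)  # entries stored as (cached_at, value)
    if entry is not None:
        cached_at, value = entry
        age = time.time() - cached_at
        if age < FRESH_TTL:
            return value  # fresh: serve directly
        if age < FRESH_TTL + STALE_TTL:
            # Stale but usable: serve immediately, refresh in the background
            asyncio.create_task(refresh(key, fetch_func, cache))
            return value
    # Missing or too old: fetch synchronously (coalescing applies here too)
    return await refresh(key, fetch_func, cache)

async def refresh(key: str, fetch_func, cache):
    value = await fetch_func()
    await cache.set(key, (time.time(), value), ttl=FRESH_TTL + STALE_TTL)
    return value

In production you would wrap `refresh()` in the same coalescing helper so concurrent stale hits do not trigger duplicate background refreshes.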
Interviewer: "What about the 100,000 requests per second? Can a single Redis key handle that?"
You: "That's a great point. A single Redis key can handle around 100K operations per second, so we're at the edge.
I'd do two things:
First, read replicas. Redis Cluster can route reads to replicas. With 3 replicas, we distribute the 100K reads across 4 nodes—25K each, easily handled.
Second, local caching. Each application server can cache the trending topics locally with a 5-second TTL. Most requests hit the local cache, never reaching Redis at all.
Request → Local Cache (hit ~99.9%) → Redis (hit ~99.9% of the rest) → Background Job
With 100 app servers and 5-second local TTL, each server makes 12 Redis requests per minute. That's only 1,200 Redis requests per minute total—trivial load."
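(A rough sketch of that two-tier read path; `redis_cache` and the key name are illustrative, and the Redis entry is assumed to be kept warm by the background job.)

import time

_local: dict[str, tuple[float, object]] = {}  # key -> (expires_at, value)
LOCAL_TTL = 5  # seconds, as described above

async def get_trending(redis_cache):
    entry = _local.get("trending")
    if entry and entry[0] > time.monotonic():
        return entry[1]  # served from the in-process cache
    value = await redis_cache.get("trending")  # Redis stays warm via the background job
    if value is not None:
        _local["trending"] = (time.monotonic() + LOCAL_TTL, value)
    return value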
Interviewer: "Interesting. How do you invalidate the local caches when trending updates?"
You: "For this use case, I wouldn't explicitly invalidate. The 5-second local TTL means users see updates within 5 seconds, which is acceptable for trending topics.
But if we needed faster propagation, I could use pub/sub:
- Background job updates Redis cache
- Background job publishes to 'trending-updated' channel
- App servers subscribe, clear local cache on message
- Next local request fetches from Redis
This gives near-instant propagation when needed."
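(A minimal sketch of that pub/sub propagation, assuming redis-py's asyncio client; the channel name is illustrative and `_local` is the in-process cache from the previous sketch.)

import redis.asyncio as redis

r = redis.Redis()

# Publisher side: the background job calls this right after updating the Redis cache
async def announce_trending_update():
    await r.publish("trending-updated", "refresh")

# Subscriber side: each app server runs this task and drops its local copy on message
async def listen_for_trending_updates():
    pubsub = r.pubsub()
    await pubsub.subscribe("trending-updated")
    async for message in pubsub.listen():
        if message["type"] == "message":
            _local.pop("trending", None)  # next request re-reads from Redis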
Interviewer: "Last question: how would you handle a cold start? Say all your caches are empty."
You: "Cold start is where thundering herd is most dangerous. With 100K req/sec hitting an empty cache, we'd overwhelm any database.
My approach:
- Deployment ordering — The background job that computes trending runs before the API servers start accepting traffic. The cache is warm before the first user request.
- Health check gate — API servers report unhealthy until the trending cache is populated. The load balancer won't route traffic to cold servers.
- Coalescing as fallback — Even if the cache is somehow cold under traffic, request coalescing ensures only one computation runs. The other 99,999 requests wait for that result.
Combining these, a cold start scenario should be impossible in normal operation, and handled gracefully if it somehow occurs."
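(To make the health-check gate concrete, a minimal framework-agnostic sketch; wire `is_ready` to whatever endpoint the load balancer probes.)

async def is_ready(redis_cache) -> bool:
    """Readiness gate: report healthy only once the trending cache is populated."""
    try:
        value = await redis_cache.get("trending")
        return value is not None
    except Exception:
        return False  # Redis unreachable: keep traffic away from this instance

# A /healthz handler would return 200 when is_ready() is True and 503 otherwise,
# so the load balancer never routes users to a cold server.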
Summary
DAY 3 KEY TAKEAWAYS
THE PROBLEM:
• Thundering herd = many requests hit empty cache simultaneously
• Can overwhelm database (N requests → N queries)
• Occurs on: expiration, cold start, invalidation, hot key creation
PROTECTION STRATEGIES:
Locking (Mutex):
• One request fetches, others wait
• Simple but adds latency
• Use for: Moderate traffic
Request Coalescing:
• Duplicate requests share single fetch
• No polling overhead
• Use for: High traffic (primary strategy)
Probabilistic Expiration:
• Random early refresh prevents synchronized expiry
• No coordination needed
• Use for: Distributed cache, many keys
Background Refresh:
• Refresh before expiry, never miss
• Most effective for known hot keys
• Use for: Homepage, trending, deals
Stale-While-Revalidate:
• Serve stale immediately, refresh async
• Best user experience
• Use for: When slight staleness is OK
COMBINING STRATEGIES (Production):
Hot keys (homepage): Background refresh + stale fallback
Normal keys: Coalescing + probabilistic TTL
Cold start: Warming + coalescing + health gates
KEY PRINCIPLES:
• Identify hot keys in advance
• Add jitter to TTLs (prevent synchronized expiry)
• Always have timeouts (no unbounded waits)
• Monitor coalesced requests and wait times
• Plan for cold start explicitly
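To make the jitter and probabilistic-expiration principles concrete, a small sketch (the spread and beta values are illustrative; the early-refresh check follows the XFetch formulation cited under Further Reading):

import math
import random
import time

def jittered_ttl(base_ttl: int, spread: float = 0.1) -> int:
    """Spread expiries so keys written at the same moment don't all expire together."""
    return int(base_ttl * (1 + random.uniform(0, spread)))

def should_refresh_early(expires_at: float, compute_seconds: float, beta: float = 1.0) -> bool:
    """XFetch-style check: refresh probabilistically, more often as expiry approaches."""
    # log(rand) is negative, so this adds a random head start to the current time
    return time.time() - compute_seconds * beta * math.log(random.random()) >= expires_at

Typical usage: write with `cache.set(key, value, ttl=jittered_ttl(300))`, and on each cache hit trigger an asynchronous refresh when `should_refresh_early(...)` returns True.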
📚 Further Reading
Official Documentation
- Redis Best Practices: https://redis.io/docs/management/optimization/
- Memcached Wiki: https://github.com/memcached/memcached/wiki
Engineering Blogs
- Instagram Thundering Herd: https://instagram-engineering.com/thundering-herds-promises-82191c8af57d
- Facebook Lease-based Caching: https://research.facebook.com/publications/scaling-memcache-at-facebook/
Papers
- "Scaling Memcache at Facebook" — Sections on leases and thundering herd
- "XFetch: A Probabilistic Early Expiration Algorithm"
Books
- "Designing Data-Intensive Applications" by Martin Kleppmann
End of Day 3: Thundering Herd
Tomorrow: Day 4 — Feed Caching. We'll tackle the "celebrity problem": how do you cache personalized feeds for millions of users when one celebrity's post needs to appear in 10 million feeds? You'll learn push-on-write vs pull-on-read and when to use each.