Himanshu Kukreja

Week 2 — Day 1: Timeout Hell

System Design Mastery Series


Preface

Imagine you're at a restaurant. You order food, and the waiter says "it'll be ready soon." How long do you wait before asking what's going on? 5 minutes? 30 minutes? An hour?

If you wait too long, you waste your evening. If you give up too quickly, your food might have been 30 seconds away.

This is the timeout problem.

In distributed systems, your service calls other services. Those services might be slow. They might be dead. They might be "thinking about it." How long do you wait before giving up?

Get it wrong, and you'll either:

  • Wait forever while your users stare at a loading spinner
  • Give up too early and fail requests that would have succeeded
  • Create cascade failures that take down your entire system

Today, we learn to set timeouts correctly. It's harder than it sounds.


Part I: Foundations

Chapter 1: Why Timeouts Are Hard

1.1 The Basic Problem

When you make a network call, four things can happen:

1. SUCCESS (fast):    Request → Response in 50ms     ✓ Great!
2. SUCCESS (slow):    Request → Response in 5000ms   ? Is this ok?
3. FAILURE (clear):   Request → Error response       ✓ At least we know
4. FAILURE (silent):  Request → ... nothing ...      ? How long do we wait?

Case 4 is the killer. The server might be:

  • Processing your request (just slow)
  • Dead and will never respond
  • Alive but your request got lost
  • Responding, but the response got lost

You can't tell the difference. All you know is: no response yet.

1.2 A Simple Analogy: The Pizza Delivery

You order pizza. Estimated delivery: 30 minutes.

Timeline:
  0 min:  Order placed
  30 min: No pizza yet. Wait more?
  45 min: No pizza yet. Call the restaurant?
  60 min: No pizza yet. Assume it's lost?
  
If you call at 30 min:
  → Maybe the driver is 2 minutes away. You look impatient.
  
If you wait until 60 min:
  → Maybe the driver crashed at minute 5. You wasted an hour.

The "right" answer depends on:
  - How reliable is this restaurant usually?
  - How hungry are you?
  - Can you order from somewhere else?

Timeouts are the same. You need to decide:

  • How long is "too long" for this service?
  • What do you do when you give up?
  • Can you try an alternative?

1.3 The Real Danger: Resource Exhaustion

Here's what makes timeouts critical in production:

Your server has 100 threads to handle requests.
Each request calls a downstream service.

Normal day:
  - Downstream responds in 100ms
  - Each thread handles 10 requests/second
  - Capacity: 1000 requests/second ✓

Bad day (downstream is slow):
  - Downstream responds in 10 seconds
  - Each thread handles 0.1 requests/second
  - Capacity: 10 requests/second ✗

  All 100 threads are waiting for slow responses.
  New requests queue up.
  Queue fills up.
  Your service starts rejecting everything.
  Your service looks dead to YOUR callers.

A slow downstream service can make YOU look dead. This is a cascade failure.
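You can sanity-check this arithmetic yourself. A minimal sketch (the thread count and latencies are the illustrative numbers above):

# Throughput of a blocking thread pool: threads ÷ downstream latency
def capacity_rps(threads: int, downstream_latency_s: float) -> float:
    """Requests/second the pool sustains while each call blocks a thread."""
    return threads / downstream_latency_s

print(capacity_rps(100, 0.1))   # 1000.0 req/s on a normal day (100ms calls)
print(capacity_rps(100, 10.0))  # 10.0 req/s on a bad day (10s calls)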

1.4 The Golden Rule

"A service that's slow is worse than a service that's down."

Why? When a service is down:

  • You get an error immediately
  • You can fall back to Plan B
  • Your threads aren't blocked

When a service is slow:

  • You don't know if it will eventually respond
  • Your threads are stuck waiting
  • You're slowly drowning

Timeouts convert "slow" into "down" so you can deal with it.


Chapter 2: Understanding Latency

Before setting timeouts, you need to understand latency.

2.1 Percentiles Matter, Averages Lie

Service A response times (1000 requests):
  - 950 requests: 50ms
  - 40 requests:  200ms
  - 9 requests:   500ms
  - 1 request:    5000ms

Average: ~65ms
P50 (median): 50ms
P95: 200ms
P99: 500ms
P99.9: 5000ms

Which number do you use for timeouts?

If you set timeout = average (65ms):

  • 5% of requests will timeout even when the service is healthy!
  • That's 50 failed requests per 1000. Unacceptable.

If you set timeout = P99 (500ms):

  • Only 1% of healthy requests timeout
  • But those 1% might just need a bit more time

If you set timeout = P99.9 (5000ms):

  • Almost no healthy requests timeout
  • But slow requests tie up your threads for 5 seconds

There's no perfect answer. You're trading:

  • False timeouts (good requests killed) vs
  • Resource waste (threads waiting too long)
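These numbers are easy to verify. A quick sanity check that recreates the distribution above:

# Recreate the sample distribution: 950×50ms, 40×200ms, 9×500ms, 1×5000ms
samples = sorted([50] * 950 + [200] * 40 + [500] * 9 + [5000])

average = sum(samples) / len(samples)      # 65.0 (dragged up by one outlier)
p50 = samples[int(len(samples) * 0.50)]    # 50
p95 = samples[int(len(samples) * 0.95)]    # 200
p99 = samples[int(len(samples) * 0.99)]    # 500
p999 = samples[int(len(samples) * 0.999)]  # 5000

print(average, p50, p95, p99, p999)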

2.2 The Latency Distribution

Most services have a "long tail" distribution:

          ▲ Number of requests
          │
    ██████│
    ██████│
    ██████│████
    ██████│████
    ██████│████████
    ██████│██████████████████████____________________
          └─────────────────────────────────────────▶ Response time
          50ms        200ms       500ms      5000ms
          
Most requests are fast, but some are VERY slow.

Why does this happen?

  • Garbage collection pauses
  • Database query hitting cold cache
  • Network congestion
  • Unlucky thread scheduling
  • The service itself calling a slow dependency

2.3 Know Your Numbers

Before setting any timeout, measure your dependencies:

from dataclasses import dataclass
from typing import List

@dataclass
class LatencyStats:
    p50: float
    p95: float
    p99: float
    p999: float
    max: float
    
    @classmethod
    def from_samples(cls, samples: List[float]) -> 'LatencyStats':
        sorted_samples = sorted(samples)
        n = len(sorted_samples)
        
        def pct(p: float) -> float:
            # Clamp the index so small sample sets can't run past the end
            return sorted_samples[min(int(n * p), n - 1)]
        
        return cls(
            p50=pct(0.50),
            p95=pct(0.95),
            p99=pct(0.99),
            p999=pct(0.999),
            max=sorted_samples[-1]
        )

# Example: measuring a service (measure_call is a placeholder for your own
# instrumented call that returns latency in milliseconds)
samples = [measure_call() for _ in range(10000)]
stats = LatencyStats.from_samples(samples)

print(f"P50: {stats.p50:.1f}ms")    # Typical response
print(f"P95: {stats.p95:.1f}ms")    # Most responses
print(f"P99: {stats.p99:.1f}ms")    # Almost all responses
print(f"P99.9: {stats.p999:.1f}ms") # Even outliers
print(f"Max: {stats.max:.1f}ms")    # Worst case seen

Chapter 3: Timeout Strategies

3.1 Strategy 1: Fixed Timeout

The simplest approach: pick a number and stick with it.

import requests

def call_service(url: str) -> dict:
    response = requests.get(url, timeout=5.0)  # Always 5 seconds
    return response.json()

How to pick the number:

Rule of thumb: timeout = P99 × 2 to 3

If P99 = 200ms:
  timeout = 400-600ms
  
This means:
  - 99% of healthy requests complete
  - Slow requests get some extra time
  - But not SO much time that you're stuck
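The rule of thumb is easy to encode as a helper (a sketch; the clamp bounds are illustrative defaults, not prescribed values):

def timeout_from_p99(p99_ms: float, factor: float = 2.5,
                     floor_ms: float = 100.0, ceiling_ms: float = 10_000.0) -> float:
    """Rule of thumb: timeout = P99 × 2-3, clamped to sane bounds."""
    return max(floor_ms, min(ceiling_ms, p99_ms * factor))

print(timeout_from_p99(200))  # 500.0ms for a service with P99 = 200ms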

Pros:

  • Simple to implement
  • Easy to understand
  • Predictable behavior

Cons:

  • Doesn't adapt to changes
  • Same timeout for peak hours and off-hours
  • Can't account for temporary slowdowns

3.2 Strategy 2: Timeout Budget

When you call multiple services, divide your time wisely.

Your API has 5 second SLA (must respond within 5s).
You call 3 services:
  - Fraud check: P99 = 200ms
  - Bank API: P99 = 2000ms  
  - Notification: P99 = 100ms

Bad approach - 5s for everyone:
  If fraud check takes 5s (slow day), you have 0s left for bank!
  
Good approach - budget:
  Total budget: 4.5s (leave 500ms for your own processing)
  
  Fraud check: 500ms (2.5x P99)
  Bank API: 3500ms (1.75x P99)  
  Notification: 500ms (5x P99)
  Total: 4500ms ✓
from dataclasses import dataclass
from typing import Optional
import time

@dataclass
class TimeoutBudget:
    """Manages timeout budget across multiple calls."""
    
    total_budget_ms: float
    start_time: Optional[float] = None
    
    def __post_init__(self):
        self.start_time = time.time() * 1000  # Current time in ms
    
    def remaining(self) -> float:
        """How much time is left?"""
        elapsed = (time.time() * 1000) - self.start_time
        return max(0, self.total_budget_ms - elapsed)
    
    def get_timeout(self, default: float, minimum: float = 100) -> float:
        """Get timeout for next call, respecting remaining budget."""
        remaining = self.remaining()
        
        if remaining <= minimum:
            raise TimeoutError("Budget exhausted")
        
        # Use the default if we have plenty of time; otherwise use what's
        # left, keeping `minimum` ms in reserve
        return min(default, remaining - minimum)

# Usage
def process_payment(user_id: str, amount: float):
    budget = TimeoutBudget(total_budget_ms=4500)
    
    # Call 1: Fraud check (want 500ms, but respect budget)
    fraud_timeout = budget.get_timeout(default=500)
    fraud_result = fraud_check(user_id, amount, timeout=fraud_timeout)
    
    # Call 2: Bank API (want 3500ms, but respect remaining budget)
    bank_timeout = budget.get_timeout(default=3500)
    bank_result = bank_charge(user_id, amount, timeout=bank_timeout)
    
    # Call 3: Notification (want 500ms, but respect remaining budget)
    notify_timeout = budget.get_timeout(default=500)
    send_notification(user_id, bank_result, timeout=notify_timeout)
    
    return bank_result

3.3 Strategy 3: Deadline Propagation

Pass the deadline through your entire call chain.

User → API Gateway → Payment Service → Bank API

Without deadline propagation:
  User: "I'll wait 5 seconds"
  Gateway: Sets 5s timeout to Payment
  Payment: Sets 5s timeout to Bank
  
  If Gateway→Payment takes 3s, Payment still waits 5s for Bank.
  Total: Could be 8 seconds! User gave up at 5s.

With deadline propagation:
  User: "Deadline: 10:00:05.000"
  Gateway: Passes deadline to Payment
  Payment: "I have until 10:00:05.000. It's now 10:00:02.000."
           "I have 3 seconds left for Bank."
           Sets 2.5s timeout to Bank (keep 500ms buffer).
from datetime import datetime, timedelta
from typing import Optional

class Deadline:
    """Represents an absolute deadline that propagates through calls."""
    
    def __init__(self, deadline: datetime):
        self.deadline = deadline
    
    @classmethod
    def from_timeout(cls, timeout_ms: float) -> 'Deadline':
        """Create deadline from relative timeout."""
        return cls(datetime.now() + timedelta(milliseconds=timeout_ms))
    
    def remaining_ms(self) -> float:
        """Milliseconds until deadline."""
        remaining = (self.deadline - datetime.now()).total_seconds() * 1000
        return max(0, remaining)
    
    def is_expired(self) -> bool:
        return datetime.now() >= self.deadline
    
    def to_header(self) -> str:
        """Convert to HTTP header for propagation."""
        return self.deadline.isoformat()
    
    @classmethod
    def from_header(cls, header: str) -> Optional['Deadline']:
        """Parse from HTTP header."""
        try:
            return cls(datetime.fromisoformat(header))
        except ValueError:  # malformed header
            return None

# In your API handler
def payment_handler(request):
    # Get deadline from incoming request (or create default)
    deadline_header = request.headers.get('X-Deadline')
    if deadline_header:
        deadline = Deadline.from_header(deadline_header)
    else:
        deadline = Deadline.from_timeout(5000)  # Default 5s
    
    # Check if already expired
    if deadline.is_expired():
        return error_response("Request deadline exceeded")
    
    # Use remaining time for downstream call
    remaining = deadline.remaining_ms()
    if remaining < 100:
        return error_response("Insufficient time remaining")
    
    # Call downstream with propagated deadline
    response = call_bank_api(
        timeout=remaining - 100,  # Keep 100ms buffer
        headers={'X-Deadline': deadline.to_header()}
    )
    
    return response

3.4 Strategy 4: Adaptive Timeouts

Automatically adjust timeouts based on observed latency.

Basic idea:
  - Track recent response times
  - Set timeout = recent_P99 × safety_factor
  - Timeout adapts as service gets faster or slower

Example:
  Hour 1: Service P99 = 100ms → timeout = 300ms
  Hour 2: Service P99 = 500ms → timeout = 1500ms (service degraded)
  Hour 3: Service P99 = 80ms  → timeout = 240ms (service recovered)
from collections import deque
import threading
import time

class AdaptiveTimeout:
    """
    Adjusts timeout based on recent response times.
    
    Uses exponentially weighted moving average for smooth adaptation.
    """
    
    def __init__(
        self,
        initial_timeout_ms: float = 1000,
        min_timeout_ms: float = 100,
        max_timeout_ms: float = 30000,
        safety_factor: float = 3.0,
        window_size: int = 100
    ):
        self.min_timeout = min_timeout_ms
        self.max_timeout = max_timeout_ms
        self.safety_factor = safety_factor
        
        # Track recent latencies
        self.latencies = deque(maxlen=window_size)
        self.lock = threading.Lock()
        
        # Initialize with default
        self.current_timeout = initial_timeout_ms
    
    def record_latency(self, latency_ms: float):
        """Record observed latency."""
        with self.lock:
            self.latencies.append(latency_ms)
            self._update_timeout()
    
    def record_timeout(self):
        """Record that a timeout occurred."""
        with self.lock:
            # Treat timeout as latency = current_timeout
            # This pushes the adaptive timeout higher
            self.latencies.append(self.current_timeout)
            self._update_timeout()
    
    def _update_timeout(self):
        """Recalculate timeout based on recent data."""
        if len(self.latencies) < 10:
            return  # Not enough data
        
        # Calculate P99 of recent latencies
        sorted_latencies = sorted(self.latencies)
        p99_index = int(len(sorted_latencies) * 0.99)
        p99 = sorted_latencies[p99_index]
        
        # New timeout = P99 × safety factor, clamped to bounds
        new_timeout = p99 * self.safety_factor
        self.current_timeout = max(self.min_timeout, 
                                    min(self.max_timeout, new_timeout))
    
    def get_timeout(self) -> float:
        """Get current timeout value."""
        return self.current_timeout

# Usage
fraud_check_timeout = AdaptiveTimeout(
    initial_timeout_ms=500,
    min_timeout_ms=100,
    max_timeout_ms=2000,
    safety_factor=2.5
)

def call_fraud_check(data: dict) -> dict:
    timeout = fraud_check_timeout.get_timeout()
    start = time.time()
    
    try:
        result = http_client.post('/fraud/check', json=data, timeout=timeout/1000)
        latency = (time.time() - start) * 1000
        fraud_check_timeout.record_latency(latency)
        return result.json()
    
    except TimeoutError:
        fraud_check_timeout.record_timeout()
        raise

Pros of adaptive timeouts:

  • Automatically adjusts to changing conditions
  • No need to manually tune after deployment
  • Can detect degradation early

Cons of adaptive timeouts:

  • Can oscillate if not tuned carefully
  • Cold start problem (no data yet)
  • May mask real problems (just keeps increasing timeout)

Chapter 4: Cascade Failures

4.1 How Timeouts Cause Cascades

The most dangerous failure mode:

Setup:
  Service A calls Service B calls Service C
  Each has 100 threads
  
Normal:
  A → B → C
  100ms total
  Everyone happy

C gets slow (P99 goes from 50ms to 10s):

  Second 1:
    B's threads are waiting for C
    50 of B's 100 threads are blocked
    
  Second 5:
    All of B's threads waiting for C
    B can't accept new requests from A
    
  Second 6:
    A's threads are waiting for B
    A starts to back up
    
  Second 10:
    A, B, and C are all "slow"
    User sees timeout
    
  The user only called A. They don't even know C exists!

4.2 The Thread Pool Death Spiral

                    ┌─────────────────────────────────────────┐
                    │            The Death Spiral             │
                    │                                         │
                    │   Downstream        Thread pool fills   │
                    │   gets slow    ──▶  waiting for it      │
                    │       │                   │             │
                    │       │                   ▼             │
                    │       │           Can't accept new      │
                    │       │           requests              │
                    │       │                   │             │
                    │       │                   ▼             │
                    │       │           OUR service looks     │
                    │       │           slow to callers       │
                    │       │                   │             │
                    │       ▼                   ▼             │
                    │   THEIR callers' thread pools fill      │
                    │                                         │
                    │          Cascade continues...           │
                    └─────────────────────────────────────────┘

4.3 Preventing Cascades

Solution 1: Proper timeouts (today's focus)

Don't wait forever. Give up and free the thread.

# Bad: No timeout
response = requests.get(url)  # Could wait forever

# Good: Explicit timeout
response = requests.get(url, timeout=2.0)  # Give up after 2s

Solution 2: Bulkheads (isolate failures)

Don't let one slow dependency consume all threads.

from concurrent.futures import ThreadPoolExecutor

# Dedicated thread pool per dependency
fraud_check_pool = ThreadPoolExecutor(max_workers=20)
bank_api_pool = ThreadPoolExecutor(max_workers=50)
notification_pool = ThreadPoolExecutor(max_workers=10)

# If bank_api is slow, it only affects its 50 threads
# fraud_check and notification can still work
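The bulkhead still needs a timeout on the wait itself, or your request threads just queue behind the full pool. A sketch of combining the two (charge_bank is a hypothetical stand-in for the real client call):

from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as PoolTimeout

bank_api_pool = ThreadPoolExecutor(max_workers=50)

def charge_bank(data: dict) -> dict:
    ...  # hypothetical stand-in for the real HTTP call to the bank

def call_bank_bulkheaded(data: dict) -> dict:
    # The call runs on the bank's dedicated pool; our thread waits at most 3.5s
    future = bank_api_pool.submit(charge_bank, data)
    try:
        return future.result(timeout=3.5)
    except PoolTimeout:
        future.cancel()  # best effort: an already-running call isn't interrupted
        raise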

Solution 3: Circuit breakers (Day 3)

Stop calling a failing service entirely.

Solution 4: Load shedding

When overloaded, reject some requests immediately.

if active_requests > MAX_CONCURRENT:
    return Response("Service overloaded", status=503)

Chapter 5: Connection vs Read Timeouts

5.1 Two Different Timeouts

Most HTTP clients have two timeout settings:

┌──────────────────────────────────────────────────────────────────────┐
│                           Request Timeline                           │
│                                                                      │
│  Start ──▶ TCP Connect ──▶ Send Request ──▶ Wait ──▶ Read Response   │
│  │                   │     │                                     │   │
│  └─Connection Timeout┘     └────────────Read Timeout─────────────┘   │
│                                                                      │
│  OR: Total timeout (some libraries)                                  │
│  └─────────────────────────Total Timeout─────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘

Connection timeout: How long to wait for TCP handshake

  • Usually should be SHORT (1-5 seconds)
  • If you can't connect in 5s, something is very wrong
  • Network issue, firewall, service down

Read timeout: How long to wait for response after connected

  • Depends on what the service does
  • Can be longer for complex operations
  • This is usually what people mean by "timeout"
import requests

# Separate timeouts
response = requests.get(
    url,
    timeout=(3.0, 10.0)  # (connect_timeout, read_timeout)
)

# Caution: in requests, a single float is not a total deadline;
# it applies to the connect and read phases separately
response = requests.get(
    url,
    timeout=10.0  # 10s to connect, then 10s to read
)

5.2 Common Mistake: Forgetting Connection Timeout

# Risky: one long value means the TCP handshake also gets 30s
response = requests.get(url, timeout=30.0)

# If the server is unreachable (firewall, black-hole IP), you can wait
# the full 30s just to learn you'll never connect.
# With no timeout at all, requests may hang indefinitely.

# Better: short connection, longer read
response = requests.get(url, timeout=(3.0, 30.0))

Part II: The Design Challenge

Chapter 6: Payment Service Design

6.1 The System

You're building a payment service. When a user clicks "Pay $99", you need to:

┌──────────────────────────────────────────────────────────────────────┐
│                         Payment Flow                                  │
│                                                                       │
│  User clicks                                                         │
│  "Pay $99"                                                           │
│      │                                                               │
│      ▼                                                               │
│  ┌─────────────────┐                                                 │
│  │ Your Payment    │                                                 │
│  │ Service         │                                                 │
│  └────────┬────────┘                                                 │
│           │                                                          │
│           ├────────────────┐                                         │
│           │                │                                         │
│           ▼                │                                         │
│  ┌─────────────────┐       │                                         │
│  │ 1. Fraud Check  │       │  "Is this transaction suspicious?"      │
│  │    P99: 200ms   │       │                                         │
│  └────────┬────────┘       │                                         │
│           │                │                                         │
│           ▼                │                                         │
│  ┌─────────────────┐       │                                         │
│  │ 2. Bank API     │       │  "Charge the card"                      │
│  │    P99: 2000ms  │       │                                         │
│  └────────┬────────┘       │                                         │
│           │                │                                         │
│           ▼                ▼                                         │
│  ┌─────────────────────────────┐                                     │
│  │ 3. Notification Service     │  "Send confirmation email"          │
│  │    P99: 100ms              │                                      │
│  └─────────────────────────────┘                                     │
│                                                                       │
└──────────────────────────────────────────────────────────────────────┘

6.2 Requirements

  • User SLA: Response within 5 seconds
  • Success rate: 99.9% of valid payments should succeed
  • No hanging: User should never wait more than 5s
  • Graceful degradation: Some failures are acceptable, but handle them well

6.3 Service Characteristics

Service        P50     P99      P99.9    Notes
Fraud Check    50ms    200ms    500ms    Fast ML model, occasionally slow
Bank API       500ms   2000ms   5000ms   External; we don't control it
Notification   20ms    100ms    200ms    Internal, very reliable

6.4 The Naive Implementation (Don't Do This)

def process_payment_naive(user_id: str, amount: float) -> dict:
    """
    WRONG: No timeouts, no error handling.
    """
    # Step 1: Fraud check (could hang forever)
    fraud_result = requests.post(
        'http://fraud-service/check',
        json={'user_id': user_id, 'amount': amount}
    )
    
    if fraud_result.json()['is_fraudulent']:
        return {'status': 'rejected', 'reason': 'fraud'}
    
    # Step 2: Charge bank (could hang forever)
    bank_result = requests.post(
        'http://bank-api/charge',
        json={'user_id': user_id, 'amount': amount}
    )
    
    # Step 3: Send notification (could hang forever)
    requests.post(
        'http://notification-service/send',
        json={'user_id': user_id, 'message': 'Payment successful'}
    )
    
    return {'status': 'success', 'transaction_id': bank_result.json()['id']}

Problems:

  1. No timeouts — any service can hang forever
  2. No error handling — exceptions crash the request
  3. Notification failure fails the whole payment
  4. User could wait minutes for a response

6.5 First Improvement: Add Basic Timeouts

import requests
from requests.exceptions import Timeout, RequestException

def process_payment_v2(user_id: str, amount: float) -> dict:
    """
    Better: Has timeouts, but not well thought out.
    """
    try:
        # Step 1: Fraud check
        fraud_result = requests.post(
            'http://fraud-service/check',
            json={'user_id': user_id, 'amount': amount},
            timeout=5.0  # 5 second timeout
        )
        
        if fraud_result.json()['is_fraudulent']:
            return {'status': 'rejected', 'reason': 'fraud'}
        
        # Step 2: Charge bank
        bank_result = requests.post(
            'http://bank-api/charge',
            json={'user_id': user_id, 'amount': amount},
            timeout=5.0  # 5 second timeout
        )
        
        # Step 3: Send notification
        requests.post(
            'http://notification-service/send',
            json={'user_id': user_id, 'message': 'Payment successful'},
            timeout=5.0  # 5 second timeout
        )
        
        return {'status': 'success', 'transaction_id': bank_result.json()['id']}
    
    except Timeout:
        return {'status': 'error', 'reason': 'timeout'}
    except RequestException as e:
        return {'status': 'error', 'reason': str(e)}

Better, but still problems:

  1. Each service gets 5s — total could be 15s!
  2. Same timeout for fast service (fraud) and slow service (bank)
  3. Notification failure still fails the payment
  4. User SLA is 5s, but we could take up to 15s

6.6 Proper Implementation: Timeout Budget

import requests
from requests.exceptions import Timeout, RequestException
from dataclasses import dataclass
from typing import Optional
import time
import logging

logger = logging.getLogger(__name__)

@dataclass
class PaymentResult:
    status: str  # 'success', 'rejected', 'error'
    transaction_id: Optional[str] = None
    reason: Optional[str] = None

class PaymentService:
    """
    Payment service with proper timeout management.
    """
    
    # Service configuration
    TOTAL_BUDGET_MS = 4500  # Leave 500ms buffer from 5s SLA
    
    # Individual service timeouts (default, can be reduced by budget)
    FRAUD_TIMEOUT_MS = 600    # P99 is 200ms, so 3x
    BANK_TIMEOUT_MS = 3500    # P99 is 2000ms, so 1.75x (it's slow)
    NOTIFY_TIMEOUT_MS = 300   # P99 is 100ms, so 3x
    
    def __init__(self, fraud_url: str, bank_url: str, notify_url: str):
        self.fraud_url = fraud_url
        self.bank_url = bank_url
        self.notify_url = notify_url
    
    def process_payment(self, user_id: str, amount: float) -> PaymentResult:
        """
        Process a payment with proper timeout budget management.
        """
        start_time = time.time()
        
        def remaining_budget() -> float:
            elapsed = (time.time() - start_time) * 1000
            return max(0, self.TOTAL_BUDGET_MS - elapsed)
        
        def get_timeout(default_ms: float, min_ms: float = 100) -> float:
            """Get timeout respecting remaining budget."""
            remaining = remaining_budget()
            if remaining < min_ms:
                raise TimeoutError("Budget exhausted")
            return min(default_ms, remaining - min_ms) / 1000  # Convert to seconds
        
        # =====================================================================
        # Step 1: Fraud Check (REQUIRED)
        # =====================================================================
        try:
            timeout = get_timeout(self.FRAUD_TIMEOUT_MS)
            logger.info(f"Fraud check with timeout={timeout:.2f}s")
            
            fraud_response = requests.post(
                f'{self.fraud_url}/check',
                json={'user_id': user_id, 'amount': amount},
                timeout=(1.0, timeout)  # 1s connect, variable read
            )
            fraud_response.raise_for_status()
            fraud_result = fraud_response.json()
            
            if fraud_result.get('is_fraudulent'):
                return PaymentResult(
                    status='rejected',
                    reason='Transaction flagged as potentially fraudulent'
                )
                
        except (Timeout, TimeoutError):  # request timed out or budget exhausted
            logger.warning("Fraud check timed out or budget exhausted")
            # DECISION: Fraud check timeout = reject (be safe)
            return PaymentResult(
                status='error',
                reason='Unable to verify transaction safety. Please try again.'
            )
        except RequestException as e:
            logger.error(f"Fraud check failed: {e}")
            return PaymentResult(status='error', reason='Verification service unavailable')
        
        # =====================================================================
        # Step 2: Bank Charge (REQUIRED)
        # =====================================================================
        try:
            timeout = get_timeout(self.BANK_TIMEOUT_MS)
            logger.info(f"Bank charge with timeout={timeout:.2f}s")
            
            bank_response = requests.post(
                f'{self.bank_url}/charge',
                json={'user_id': user_id, 'amount': amount},
                timeout=(2.0, timeout)  # 2s connect (external), variable read
            )
            bank_response.raise_for_status()
            bank_result = bank_response.json()
            transaction_id = bank_result['transaction_id']
            
        except (Timeout, TimeoutError):  # request timed out or budget exhausted
            logger.warning("Bank API timed out or budget exhausted")
            # DECISION: Bank timeout = ambiguous! Did it charge or not?
            # We'll cover this in Day 2 (idempotency)
            return PaymentResult(
                status='error',
                reason='Payment processing timed out. Please check your statement before retrying.'
            )
        except RequestException as e:
            logger.error(f"Bank charge failed: {e}")
            return PaymentResult(status='error', reason='Payment processing failed')
        
        # =====================================================================
        # Step 3: Notification (OPTIONAL - fire and forget)
        # =====================================================================
        try:
            timeout = get_timeout(self.NOTIFY_TIMEOUT_MS, min_ms=50)
            logger.info(f"Notification with timeout={timeout:.2f}s")
            
            requests.post(
                f'{self.notify_url}/send',
                json={
                    'user_id': user_id,
                    'template': 'payment_success',
                    'data': {'amount': amount, 'transaction_id': transaction_id}
                },
                timeout=(0.5, timeout)
            )
        except Exception as e:
            # Notification failure should NOT fail the payment
            logger.warning(f"Notification failed (non-critical): {e}")
            # Could queue for retry, but payment still succeeded
        
        # =====================================================================
        # Success!
        # =====================================================================
        return PaymentResult(
            status='success',
            transaction_id=transaction_id
        )

6.7 What We Did Right

  1. Total budget: 4.5s total, leaving buffer for our own processing

  2. Proportional timeouts:

    • Fraud: 600ms (fast service gets less time)
    • Bank: 3500ms (slow service gets more time)
    • Notification: 300ms (fast, and not critical)
  3. Budget tracking: Each call checks remaining budget

  4. Separate criticality:

    • Fraud: Required, timeout = reject (safe default)
    • Bank: Required, timeout = ambiguous error
    • Notification: Optional, failure logged but ignored
  5. Connection vs read timeouts: Separate values, shorter for internal services


Chapter 7: The "Bank is Slow Today" Scenario

7.1 The Challenge

Your payment service has been running smoothly. Then one day:

Normal day:
  Bank API P99: 2000ms
  Your timeout: 3500ms
  Everything works!

Today:
  Bank API P99: 8000ms (they're having issues)
  Your timeout: 3500ms
  
  What happens?

7.2 The Cascade Begins

Minute 1:
  - 80% of bank calls timeout at 3.5s
  - Users see "timeout" errors
  - Users retry (making it worse)

Minute 2:
  - Your threads are all waiting for bank
  - New requests queue up
  - Queue grows to 1000 requests

Minute 3:
  - Queue is full, you start rejecting
  - Users see 503 errors
  - You look completely down

Minute 5:
  - Bank's P99 goes back to 2s
  - But you're still overwhelmed by backed-up requests
  - Recovery takes another 10 minutes

7.3 What Should Happen

Minute 1:
  - 80% of bank calls timeout
  - Circuit breaker notices pattern
  - Circuit opens → fail fast

Minute 2:
  - New requests fail immediately (no waiting)
  - Users see "Bank temporarily unavailable"
  - Threads are free for other work

Minute 3:
  - Circuit breaker tries one request (half-open)
  - Still fails → stays open
  - Or succeeds → closes circuit

Minute 5:
  - Bank recovers
  - Circuit breaker detects recovery
  - Normal operation resumes

Total impact: 5 minutes of "please try again"
vs: 15+ minutes of complete outage

We'll implement this in Day 3 (Circuit Breakers).

7.4 Immediate Mitigations (Without Circuit Breaker)

Even without a circuit breaker, you can help:

Mitigation 1: Return cached/default values

def get_fraud_score(user_id: str) -> float:
    try:
        return fraud_service.check(user_id, timeout=0.5)
    except Timeout:
        # Return conservative default
        logger.warning(f"Fraud check timeout, using default for {user_id}")
        return 0.5  # Medium risk, will require additional verification

Mitigation 2: Fail fast when queue is long

from threading import Semaphore

# Only allow 50 concurrent bank calls
bank_semaphore = Semaphore(50)

def call_bank_api(data: dict) -> dict:
    # Try to acquire permit (don't wait if all taken)
    acquired = bank_semaphore.acquire(blocking=False)
    
    if not acquired:
        # All permits taken = bank is backed up
        raise ServiceOverloadedError("Bank API at capacity")
    
    try:
        return requests.post(BANK_URL, json=data, timeout=3.5).json()
    finally:
        bank_semaphore.release()

Mitigation 3: Shed load early

from collections import deque
import time

class LoadShedder:
    """Reject requests when response time degrades."""
    
    def __init__(self, target_latency_ms: float = 100, window_size: int = 100):
        self.target_latency = target_latency_ms
        self.recent_latencies = deque(maxlen=window_size)
    
    def record(self, latency_ms: float):
        self.recent_latencies.append(latency_ms)
    
    def should_shed(self) -> bool:
        if len(self.recent_latencies) < 10:
            return False
        
        avg_latency = sum(self.recent_latencies) / len(self.recent_latencies)
        return avg_latency > self.target_latency * 3  # 3x target = shedding

shedder = LoadShedder(target_latency_ms=200)

@app.route('/payment')
def payment_handler():
    if shedder.should_shed():
        return Response("Service temporarily overloaded", status=503)
    
    start = time.time()
    result = process_payment(request.json)
    shedder.record((time.time() - start) * 1000)
    
    return result

Part III: Advanced Topics

Chapter 8: Adaptive Timeouts Deep Dive

8.1 When to Use Adaptive Timeouts

Good candidates:

  • Services with variable latency (batch processing, ML inference)
  • External APIs you don't control
  • Services during migration/transition

Bad candidates:

  • Services with strict SLAs (you need hard limits)
  • Security-critical paths (predictability > optimization)
  • Services where latency is a bug signal

8.2 Implementation with Smoothing

import threading
from collections import deque

class SmoothAdaptiveTimeout:
    """
    Adaptive timeout with exponential smoothing to prevent oscillation.
    """
    
    def __init__(
        self,
        initial_timeout_ms: float = 1000,
        min_timeout_ms: float = 100,
        max_timeout_ms: float = 30000,
        percentile: float = 0.99,
        safety_factor: float = 2.0,
        smoothing_factor: float = 0.1,  # How fast to adapt (0-1)
        window_size: int = 1000
    ):
        self.min_timeout = min_timeout_ms
        self.max_timeout = max_timeout_ms
        self.percentile = percentile
        self.safety_factor = safety_factor
        self.smoothing_factor = smoothing_factor
        
        self.samples = deque(maxlen=window_size)
        self.current_timeout = initial_timeout_ms
        self.lock = threading.Lock()
        
        # Track health
        self.timeout_count = 0
        self.success_count = 0
    
    def record_success(self, latency_ms: float):
        """Record successful call with observed latency."""
        with self.lock:
            self.samples.append(latency_ms)
            self.success_count += 1
            self._update_timeout()
    
    def record_timeout(self):
        """Record timeout occurrence."""
        with self.lock:
            # Record as if latency was at current timeout
            self.samples.append(self.current_timeout)
            self.timeout_count += 1
            # Increase timeout more aggressively on timeouts
            self.current_timeout = min(
                self.max_timeout,
                self.current_timeout * 1.5
            )
    
    def _update_timeout(self):
        """Update timeout based on recent samples."""
        if len(self.samples) < 20:
            return  # Need minimum data
        
        # Calculate target percentile
        sorted_samples = sorted(self.samples)
        idx = int(len(sorted_samples) * self.percentile)
        observed_percentile = sorted_samples[idx]
        
        # Target timeout
        target = observed_percentile * self.safety_factor
        target = max(self.min_timeout, min(self.max_timeout, target))
        
        # Smooth update (exponential moving average)
        self.current_timeout = (
            self.smoothing_factor * target +
            (1 - self.smoothing_factor) * self.current_timeout
        )
    
    def get_timeout(self) -> float:
        """Get current timeout in milliseconds."""
        return self.current_timeout
    
    def get_stats(self) -> dict:
        """Get health statistics."""
        total = self.timeout_count + self.success_count
        return {
            'current_timeout_ms': self.current_timeout,
            'success_rate': self.success_count / total if total > 0 else 1.0,
            'timeout_rate': self.timeout_count / total if total > 0 else 0.0,
            'sample_count': len(self.samples)
        }

8.3 The Risk of Adaptive Timeouts

Scenario: Service has a bug causing slow responses

With fixed timeout (500ms):
  - Requests timeout
  - Users see errors
  - Alert fires: "High timeout rate"
  - Team investigates
  - Bug found in 30 minutes

With adaptive timeout:
  - Timeout automatically increases to 2000ms
  - Requests "succeed" (slowly)
  - No alert (no timeouts!)
  - Users complain about slowness
  - Bug found in 3 hours (after user complaints)

Solution: Alert on the timeout value itself

# Prometheus alert
- alert: AdaptiveTimeoutTooHigh
  expr: adaptive_timeout_current_ms > 2000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Adaptive timeout has increased significantly"
    description: "Service {{ $labels.service }} timeout is {{ $value }}ms"

Chapter 9: Timeout Patterns in Practice

9.1 The Hedged Request Pattern

If a request is slow, send a second request to a different server:

Request flow:
  1. Send request to Server A
  2. Wait P95 time (e.g., 50ms)
  3. If no response, ALSO send to Server B
  4. Use whichever responds first
  5. Cancel the other

This turns one slow request into at most 2 requests,
but dramatically cuts tail latency.
import asyncio
import aiohttp
from typing import Optional

async def hedged_request(
    urls: list[str],
    hedge_delay_ms: float = 50,
    timeout_ms: float = 1000
) -> Optional[dict]:
    """
    Send request to first URL, hedge to second if slow.
    """
    
    async def fetch(url: str) -> dict:
        client_timeout = aiohttp.ClientTimeout(total=timeout_ms / 1000)
        async with aiohttp.ClientSession(timeout=client_timeout) as session:
            async with session.get(url) as resp:
                return await resp.json()
    
    # Start first request
    primary_task = asyncio.create_task(fetch(urls[0]))
    
    # Wait for either response or hedge delay
    done, pending = await asyncio.wait(
        [primary_task],
        timeout=hedge_delay_ms / 1000
    )
    
    if done:
        # Primary responded quickly
        return primary_task.result()
    
    # Primary is slow, start hedge
    hedge_task = asyncio.create_task(fetch(urls[1]))
    
    # Wait for either to complete
    done, pending = await asyncio.wait(
        [primary_task, hedge_task],
        timeout=(timeout_ms - hedge_delay_ms) / 1000,
        return_when=asyncio.FIRST_COMPLETED
    )
    
    # Cancel the slower one
    for task in pending:
        task.cancel()
    
    if done:
        return done.pop().result()
    
    raise TimeoutError("Both requests timed out")

9.2 The Backup Request Pattern

Similar to hedging, but wait longer before backup:

async def backup_request(
    primary_url: str,
    backup_url: str,
    primary_timeout_ms: float = 2000,
    backup_timeout_ms: float = 2000
) -> dict:
    """
    Try primary first, fall back to backup on failure.
    """
    
    try:
        return await fetch_with_timeout(primary_url, primary_timeout_ms)
    except (TimeoutError, RequestException) as e:
        logger.warning(f"Primary failed: {e}, trying backup")
        return await fetch_with_timeout(backup_url, backup_timeout_ms)

9.3 Per-Request Deadline

Set deadline at edge, propagate through system:

from datetime import datetime, timedelta
from contextvars import ContextVar

# Store deadline in context
request_deadline: ContextVar[datetime] = ContextVar('request_deadline')

class DeadlineMiddleware:
    """Middleware to set and propagate deadlines."""
    
    def __init__(self, app, default_timeout_ms: float = 5000):
        self.app = app
        self.default_timeout = timedelta(milliseconds=default_timeout_ms)
    
    async def __call__(self, request, call_next):
        # Check for incoming deadline header
        deadline_header = request.headers.get('X-Deadline')
        
        if deadline_header:
            deadline = datetime.fromisoformat(deadline_header)
        else:
            deadline = datetime.now() + self.default_timeout
        
        # Set in context for all downstream code
        token = request_deadline.set(deadline)
        
        try:
            return await call_next(request)
        finally:
            request_deadline.reset(token)

def get_remaining_timeout() -> float:
    """Get remaining time until deadline in seconds."""
    deadline = request_deadline.get()
    remaining = (deadline - datetime.now()).total_seconds()
    return max(0, remaining)

# Usage in any function
def call_downstream_service():
    timeout = get_remaining_timeout()
    if timeout < 0.1:
        raise DeadlineExceeded()
    
    return requests.get(url, timeout=timeout - 0.1)  # Keep 100ms buffer

Part IV: Discussion and Trade-offs

Chapter 10: The Hard Questions

10.1 "Your service calls 3 downstream services. How do you set timeouts?"

Strong Answer:

"First, I need to know:

  • My SLA to callers (let's say 5 seconds)
  • Each service's latency profile (P50, P99, P99.9)
  • Each service's criticality (required vs optional)

Then I'd create a timeout budget:

Total budget: 4.5s (90% of SLA for buffer)

Service A (P99=200ms): 600ms timeout (3x P99)
Service B (P99=500ms): 1500ms timeout (3x P99)
Service C (P99=100ms): 300ms timeout (3x P99)
Buffer: 2100ms

If services can be parallelized:
  Total = max(A, B, C) + buffer
  
If sequential:
  Total = A + B + C + buffer
  Adjust if it exceeds budget
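A quick way to sanity-check the arithmetic (a sketch with the numbers above):

timeouts = {'A': 600, 'B': 1500, 'C': 300}  # 3× each service's P99
BUFFER_MS = 2100

sequential = sum(timeouts.values()) + BUFFER_MS  # 4500ms, exactly the 4.5s budget
parallel = max(timeouts.values()) + BUFFER_MS    # 3600ms, comfortable headroom
print(sequential, parallel)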

I'd also consider:

  • Should any services fail open? (return default on timeout)
  • Are any services optional? (proceed without them)
  • Can I parallelize independent calls?

The key insight is that timeout = P99 is wrong — that means 1% of healthy requests will always timeout. You want P99 × 2-3 to give headroom."

10.2 "Bank API is slow today — what happens to your users?"

Strong Answer:

"With our timeout budget, the bank gets 3.5s. If it's responding in 5s+:

Immediate impact:

  • 80%+ of payments timeout
  • Users see 'payment processing timed out'
  • They're unsure if they were charged (this is bad!)

Without mitigation:

  • Our threads are blocked waiting for bank
  • Our service becomes slow for ALL requests
  • Eventually we look completely down

With proper mitigation:

  1. Fail fast: Timeout fires, free the thread immediately
  2. Clear messaging: 'Bank temporarily slow. Check your statement before retrying.'
  3. Bulkhead: Limit concurrent bank calls so other traffic isn't affected
  4. Circuit breaker (Day 3): After 10 timeouts, fail immediately without waiting

The key insight is: slow is worse than down. A 5s timeout that fires is better than waiting 30s to find out the service is down."

10.3 "Would you use adaptive timeouts? What's the risk?"

Strong Answer:

"Adaptive timeouts adjust based on observed latency. I'd use them selectively.

Where I'd use them:

  • External APIs I don't control
  • Services with natural latency variation (ML inference, batch jobs)
  • During migrations where latency is expected to change

Where I wouldn't:

  • Critical paths with strict SLAs (need hard guarantees)
  • Security-sensitive operations (predictability matters)
  • Services where latency increase = bug (want alerts, not adaptation)

The main risk: Adaptive timeouts can mask problems. If a service has a bug causing slow responses, adaptive timeouts will just increase — no alert fires, no one investigates. Meanwhile, users experience degradation.

Mitigation:

  • Alert when adaptive timeout exceeds a threshold
  • Set a hard maximum that adaptive can't exceed
  • Monitor the timeout value as a metric, not just timeout rate

The key insight is: adaptive timeouts optimize for availability, but can hide reliability problems."


Chapter 11: Session Summary

What You Should Know Now

After this session, you should be able to:

  1. Explain why timeouts are critical — convert slow to down, prevent cascade failures
  2. Calculate timeout budgets — divide time among multiple services
  3. Choose timeout values — P99 × 2-3, not average
  4. Handle timeout scenarios — what to do when each service times out
  5. Prevent cascade failures — bulkheads, fail fast, load shedding

Key Trade-offs to Remember

Decision             Trade-off
Shorter timeout      Fewer hung threads     vs  More false timeouts
Longer timeout       More requests succeed  vs  Resource exhaustion risk
Adaptive timeout     Self-tuning            vs  Can hide problems
Fixed timeout        Predictable            vs  Doesn't adapt to changes
Per-service timeout  Granular control       vs  Complex configuration

Questions to Ask in Every Design

  1. What's my SLA to callers?
  2. What's each dependency's latency profile (P50, P99, P99.9)?
  3. What happens when each dependency times out?
  4. Can I parallelize calls to reduce total latency?
  5. What's the failure mode for each timeout?

Part V: Interview Questions and Answers

Chapter 12: Real-World Interview Scenarios

12.1 Conceptual Questions

Question 1: "What's wrong with setting all timeouts to 30 seconds to be safe?"

Interviewer's Intent: Testing understanding of resource exhaustion.

Strong Answer:

"A 30-second timeout feels safe because you're unlikely to kill requests that would have succeeded. But it creates serious problems:

Resource exhaustion: If a service is down, each request waits 30 seconds. With 100 threads and 50 requests/second, in 2 seconds all threads are waiting. New requests queue up. In 60 seconds, you have 3000 queued requests. Your service is effectively down.
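The queue math in that paragraph, written out (a sketch):

THREADS = 100
ARRIVAL_RPS = 50

saturation_s = THREADS / ARRIVAL_RPS           # 2.0s until every thread is stuck
queued_after_60s = ARRIVAL_RPS * 60 - THREADS  # 2900 requests waiting (~3000)
print(saturation_s, queued_after_60s)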

User experience: Nobody waits 30 seconds for a web page. Users abandon at 3-5 seconds. A 30-second timeout means you're holding onto a request for a user who's already gone.

Cascade failures: Your callers also have timeouts. If their timeout is 5 seconds and yours is 30, you'll timeout on them but keep working — wasting resources.

The right approach is timeout = P99 × 2-3, plus a hard maximum that reflects user patience (usually 5-10 seconds). If a service hasn't responded by then, the user has already left anyway."


Question 2: "Explain connection timeout vs read timeout."

Interviewer's Intent: Testing depth of knowledge.

Strong Answer:

"These are two different phases of an HTTP request:

Connection timeout: How long to wait for TCP handshake to complete. This is the time to establish a connection, before any data is sent. Should be short — 1-5 seconds. If you can't connect in 5 seconds, something is fundamentally wrong (wrong IP, firewall, service down).

Read timeout: How long to wait for response data after connection is established. This is the actual processing time. Depends on what the service does — could be 100ms for a cache hit, 30 seconds for a complex query.

Common mistake: Setting only read timeout and forgetting connection timeout. Some libraries have no default, so you might wait indefinitely for a TCP handshake to a black-hole IP address.

# Always set both explicitly
requests.get(url, timeout=(3.0, 10.0))  # 3s connect, 10s read

# Caution: a single float is not a total deadline in requests;
# it sets the connect and read timeouts separately
requests.get(url, timeout=13.0)

The key insight is: connection failures are almost always fast (service down, network broken), so the timeout can be short. Read failures might be slow (service overloaded), so need more headroom."


Question 3: "How do you choose timeout values when you don't have historical latency data?"

Interviewer's Intent: Testing practical problem-solving.

Strong Answer:

"This is the cold-start problem. Several strategies:

Start conservative, then tune:

  • Pick reasonable defaults (1s for internal services, 5s for external)
  • Add comprehensive latency metrics
  • After 24-48 hours of production traffic, analyze P99
  • Adjust to P99 × 2-3

Use benchmarks:

  • Load test the service before launch
  • Measure latency under expected load
  • Use those P99 numbers

Ask the service owners:

  • What's your expected P99?
  • What's your SLA?
  • What happens at high load?

Match user expectations:

  • Users expect pages to load in 2-3 seconds
  • If the page needs 5 API calls, each has ~400-500ms budget
  • Work backwards from user patience

Start with adaptive timeouts:

  • Let the system learn latency patterns
  • Set a hard maximum as safety
  • Monitor and alert on timeout value drift
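For that last option, a sketch reusing the AdaptiveTimeout class from Chapter 3, with the hard ceiling made explicit (the bounds are illustrative):

# Cold start: begin with a conservative guess; observed latency takes over,
# and max_timeout_ms is the hard ceiling adaptation can never exceed.
checkout_timeout = AdaptiveTimeout(
    initial_timeout_ms=1000,  # conservative default until samples arrive
    min_timeout_ms=100,
    max_timeout_ms=5000,
    safety_factor=2.5
)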

The key insight is: any timeout is better than no timeout. A wrong timeout causes degraded experience. No timeout causes complete outages."


12.2 Design Questions

Question 4: "Design timeout strategy for an e-commerce checkout that calls: inventory, payment, and shipping estimate."

Interviewer's Intent: Testing end-to-end thinking.

Strong Answer:

"Let me work through this systematically.

Requirements clarification:

  • User SLA: 5 seconds for checkout
  • Inventory: P99 = 100ms, required
  • Payment: P99 = 2000ms, required
  • Shipping: P99 = 500ms, nice-to-have (can show 'calculated later')

Timeout budget:

Total budget: 4500ms (leave 500ms for processing)

Inventory: 300ms (3x P99)
Payment: 4000ms (2x P99, dominates budget)
Shipping: 0ms (parallel or async)

Problem: 300 + 4000 = 4300ms, tight!

Optimization — parallelize where possible:

┌─────────────────────────────────────────┐
│  Start                                   │
│    │                                     │
│    ├──────────────┐                      │
│    ▼              ▼                      │
│  Inventory    Shipping                   │
│  (300ms)      (1500ms)                   │
│    │              │                      │
│    ▼              │                      │
│  Payment          │                      │
│  (4000ms)         │                      │
│    │              │                      │
│    ▼              ▼                      │
│  Done (use whatever shipping             │
│        returned, or show TBD)            │
└─────────────────────────────────────────┘

Total time: 300 (inventory) + 4000 (payment) = 4300ms
Shipping calculated in parallel, doesn't add to critical path

Failure handling:

  • Inventory timeout: Fail checkout (can't sell what we don't have)
  • Payment timeout: Ambiguous error, need idempotency (Day 2)
  • Shipping timeout: Proceed, show 'shipping calculated at confirmation'

Implementation:

async def checkout(cart, payment_info):
    budget = TimeoutBudget(4500)
    
    # Start shipping estimate in background (non-critical)
    shipping_task = asyncio.create_task(
        get_shipping_estimate(cart, timeout=1500)
    )
    
    # Inventory check (required)
    try:
        inventory_ok = await check_inventory(cart, timeout=budget.get(300))
        if not inventory_ok:
            return CheckoutResult(error='Items out of stock')
    except Timeout:
        return CheckoutResult(error='Unable to verify inventory')
    
    # Payment (required)
    try:
        payment_result = await process_payment(
            payment_info, 
            timeout=budget.get(4000)
        )
    except Timeout:
        return CheckoutResult(error='Payment processing timed out')
    
    # Get shipping if ready, else placeholder
    try:
        shipping = await asyncio.wait_for(shipping_task, timeout=0.1)
    except Exception:  # estimate failed or not ready; not critical
        shipping = 'Will be calculated'
    
    return CheckoutResult(success=True, shipping=shipping)


Question 5: "Your microservice calls 10 downstream services. Some are fast (P99=50ms), some are slow (P99=5s). Design the timeout strategy."

Interviewer's Intent: Testing scalable thinking.

Strong Answer:

"With 10 services, I need structure. Let me categorize and systematize.

Step 1: Categorize services

Fast & Critical:  Auth, Config (P99 < 100ms, must succeed)
Fast & Optional:  Analytics, Logging (P99 < 100ms, can skip)
Slow & Critical:  Database, Search (P99 > 1s, must succeed)
Slow & Optional:  Recommendations (P99 > 1s, can skip)

Step 2: Group by dependency type

Configuration:

TIMEOUT_CONFIG = {
    'auth':            {'p99': 50,   'factor': 3,   'required': True},
    'config':          {'p99': 30,   'factor': 3,   'required': True},
    'analytics':       {'p99': 80,   'factor': 2,   'required': False},
    'logging':         {'p99': 40,   'factor': 2,   'required': False},
    'database':        {'p99': 200,  'factor': 3,   'required': True},
    'search':          {'p99': 1000, 'factor': 2,   'required': True},
    'recommendations': {'p99': 2000, 'factor': 1.5, 'required': False},
    # ... etc
}

def get_timeout(service: str) -> float:
    config = TIMEOUT_CONFIG[service]
    return config['p99'] * config['factor']

Step 3: Parallelize where possible

Required fast calls (auth, config): do first, sequentially (critical path)
Independent slow calls (search, recommendations): parallelize
Optional calls: fire-and-forget or best-effort

Step 4: Budget management

class ServiceOrchestrator:
    def __init__(self, total_budget_ms: float = 5000):
        self.budget = TimeoutBudget(total_budget_ms)
    
    async def execute(self, request):
        # Phase 1: Critical fast services (sequential)
        auth = await self.call('auth')
        config = await self.call('config')
        
        # Phase 2: All other services (parallel)
        tasks = {
            'database': self.call('database'),
            'search': self.call('search'),
            'recommendations': self.call('recommendations'),
            'analytics': self.call_optional('analytics'),
        }
        
        results = await asyncio.gather(
            *tasks.values(),
            return_exceptions=True
        )
        
        # Phase 3: Combine results, handle failures
        return self.combine_results(dict(zip(tasks.keys(), results)))

Key principles:

  1. Categorize by criticality and latency
  2. Configure timeouts per-service based on P99
  3. Parallelize independent calls
  4. Fail gracefully for optional services
  5. Hard budget prevents total time explosion"

12.3 Scenario-Based Questions

Question 6: "It's Black Friday. Your payment service P99 goes from 2s to 10s. What do you do?"

Interviewer's Intent: Testing incident response.

Strong Answer:

"This is a cascade failure in progress. Here's my response:

**Immediate (0-5 minutes)**:

1. Confirm the issue: check metrics — is it our service or the bank?
2. Check current impact: how many requests are timing out? How backed up is the queue?
3. Enable the circuit breaker (if not already): stop waiting 10s for each failure.

**If it's the bank (external)**:

```
Option A: Reduce traffic to bank
  - Open circuit breaker
  - Queue payments for later processing
  - Show users 'We'll process your payment shortly'

Option B: Wait it out
  - If bank says 'we're on it, 30 min fix'
  - Keep circuit open
  - Drain queue slowly when recovered
```

**If it's us (internal)**:

```
Check: Are we overloaded? (CPU, memory, connections)
  - Yes: Scale horizontally, shed load
  - No: Check for recent deploy (rollback?)

Check: Is it one endpoint or all?
  - One: Disable or rate-limit that endpoint
  - All: More systemic issue
```

**Communication**:

- Update the status page: 'Payment processing delays'
- Alert customer support: 'Users may see timeout errors'
- Don't say 'we're investigating' for more than 15 minutes without an update

**Post-incident**:

- Why didn't we detect this earlier?
- Should we have a lower timeout for Black Friday? (Shorter timeout = faster fail = smaller queue)
- Do we need more capacity headroom for peak days?

The key insight: Black Friday is predictable. We should have load-tested this scenario. If we didn't, that's the real issue to fix."
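
For concreteness, Option A's "queue for later" fallback might look like this minimal sketch; the `breaker`, `charge`, and `deferred_queue` interfaces are assumptions, not code from earlier in the chapter:

```python
import asyncio

async def charge_or_defer(payment, breaker, charge, deferred_queue):
    """Try the bank; if the circuit is open or the call times out, queue instead."""
    if not breaker.is_open():  # assumed circuit-breaker interface
        try:
            return await asyncio.wait_for(charge(payment), timeout=2.0)
        except asyncio.TimeoutError:
            breaker.record_failure()  # feed the breaker so it opens under sustained slowness
    # Circuit open, or this call timed out: defer and reassure the user
    await deferred_queue.put(payment)
    return "We'll process your payment shortly"
```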


#### Question 7: "How would you monitor and alert on timeout issues?"

**Interviewer's Intent**: Testing operational thinking.

**Strong Answer**:

"I'd monitor at multiple levels:

**Level 1: Per-service timeout metrics**

```python
# Prometheus metrics
timeout_total = Counter(
    'service_timeouts_total',
    'Total timeout occurrences',
    ['service', 'endpoint']
)

latency_histogram = Histogram(
    'service_latency_seconds',
    'Request latency',
    ['service', 'endpoint'],
    buckets=[.01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]
)

```

**Level 2: Alerts**

```yaml
# High timeout rate
- alert: HighTimeoutRate
  expr: rate(service_timeouts_total[5m]) / rate(service_requests_total[5m]) > 0.05
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: 'Timeout rate > 5% for {{ $labels.service }}'

# Latency degradation
- alert: LatencyDegradation
  expr: histogram_quantile(0.99, rate(service_latency_seconds_bucket[5m])) > 2
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: 'P99 latency > 2s for {{ $labels.service }}'

# Approaching timeout
- alert: LatencyNearTimeout
  expr: |
    histogram_quantile(0.99, rate(service_latency_seconds_bucket[5m])) 
    > (service_timeout_seconds * 0.8)
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: 'P99 is 80% of timeout for {{ $labels.service }}'

```

**Level 3: Dashboard**

- Real-time P50, P95, P99 by service
- Timeout rate trend
- Queue depth (if requests are backing up)
- Comparison to baseline (same time last week)

Key insight: Alert on 'latency approaching timeout' not just 'timeouts occurred'. By the time timeouts are happening, users are already affected. Catching the trend earlier lets you act proactively."
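
One wrinkle in the third alert: Prometheus doesn't know your configured timeouts on its own, so `service_timeout_seconds` has to be exported by the service itself. A minimal sketch (the config dict is illustrative); note that the real PromQL comparison would also need label matching such as `on (service)`:

```python
from prometheus_client import Gauge

# Export each configured timeout so alert rules can compare observed P99 against it
service_timeout_seconds = Gauge(
    'service_timeout_seconds',
    'Configured timeout per downstream service',
    ['service']
)

# Illustrative values; in practice these come from wherever your timeouts live
TIMEOUTS = {'fraud-service': 0.6, 'bank-api': 3.5, 'notification-service': 0.3}

for name, seconds in TIMEOUTS.items():
    service_timeout_seconds.labels(service=name).set(seconds)
```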


### 12.4 Deep-Dive Questions

#### Question 8: "Compare deadline propagation vs per-hop timeouts."

**Interviewer's Intent**: Testing depth of distributed systems knowledge.

**Strong Answer**:

"Both solve the same problem—preventing requests from living forever—but differently.

**Per-hop timeouts**:

```
User → A (5s timeout) → B (5s timeout) → C (5s timeout)

Total possible time: 5 + 5 + 5 = 15 seconds
User might have given up at 5 seconds!
```

Each service independently decides how long to wait. This is simple to implement, but it can overshoot the end-to-end SLA.

**Deadline propagation**:

```
User → A (deadline: T+5s) → B (deadline: T+5s) → C (deadline: T+5s)

At each hop:
  'How much time do I have until T+5s?'
  'Set my downstream timeout to that minus buffer'

Total time: Always < 5 seconds
```

The deadline is absolute, so everyone respects the same end time.

**Trade-offs**:

| Aspect | Per-Hop | Deadline Propagation |
|--------|---------|----------------------|
| Implementation | Simpler | Requires header/context passing |
| Accuracy | Can overshoot | Respects end-to-end SLA |
| Debugging | Each hop independent | Need to trace deadline |
| Clock dependency | No | Requires synchronized clocks |
| Flexibility | Each service decides | End-to-end controlled |

**My recommendation**:

Use deadline propagation for user-facing requests where SLA matters. The user doesn't care that each service met its 5s timeout—they care that the page loaded in 3 seconds.

Use per-hop timeouts for internal/async work where there's no user waiting.

**Best practice**: gRPC has built-in deadline propagation. If you're using HTTP, add an X-Deadline header."
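
A minimal sketch of that HTTP approach, assuming the deadline travels as absolute epoch seconds in an `X-Deadline` header (a convention from this chapter, not a standard):

```python
import time
import httpx

DEADLINE_HEADER = 'X-Deadline'  # absolute epoch seconds
BUFFER_S = 0.05                 # leave time to build our own response

def read_deadline(headers: dict, default_budget_s: float = 5.0) -> float:
    """Parse the incoming deadline, or start a fresh one for edge requests."""
    raw = headers.get(DEADLINE_HEADER)
    return float(raw) if raw else time.time() + default_budget_s

async def call_downstream(url: str, deadline: float) -> httpx.Response:
    """Forward the deadline unchanged; set our timeout to the time remaining."""
    remaining = deadline - time.time() - BUFFER_S
    if remaining <= 0:
        raise TimeoutError('Deadline already passed; fail fast instead of calling')
    async with httpx.AsyncClient() as client:
        return await client.get(
            url,
            headers={DEADLINE_HEADER: str(deadline)},
            timeout=remaining,
        )
```

Because the deadline is an absolute wall-clock time, this inherits the synchronized-clocks dependency from the trade-offs table.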


#### Question 9: "Explain how to implement and test timeout behavior."

**Interviewer's Intent**: Testing quality and completeness.

**Strong Answer**:

"Timeout implementation needs both code and tests.

**Implementation**:

```python
import asyncio
from typing import Optional

import httpx

class TimeoutClient:
    def __init__(
        self,
        connect_timeout: float = 3.0,
        read_timeout: float = 10.0,
        total_timeout: Optional[float] = None
    ):
        # httpx.Timeout needs a default (or all four classes set explicitly);
        # passing read_timeout as the default also covers the write timeout
        self.timeout = httpx.Timeout(
            read_timeout,
            connect=connect_timeout,
            pool=5.0,  # Time to wait for connection from pool
        )
        self.total_timeout = total_timeout
    
    async def get(self, url: str, **kwargs) -> httpx.Response:
        # Allow override per-request
        timeout = kwargs.pop('timeout', self.timeout)
        
        async with httpx.AsyncClient(timeout=timeout) as client:
            if self.total_timeout:
                # Wrap in total timeout
                return await asyncio.wait_for(
                    client.get(url, **kwargs),
                    timeout=self.total_timeout
                )
            return await client.get(url, **kwargs)

```

**Testing approach**:

1. **Unit tests with mocked delays**:

```python
import asyncio
import pytest
# `mock_server` and `Response` are illustrative test helpers

@pytest.mark.asyncio
async def test_timeout_fires():
    async def slow_handler(request):
        await asyncio.sleep(5)  # Longer than timeout
        return Response(200)

    with mock_server(slow_handler):
        client = TimeoutClient(read_timeout=1.0)

        with pytest.raises(TimeoutError):
            await client.get('http://mock-server/slow')
```

2. **Integration tests with chaos**:

```python
def test_timeout_under_network_delay():
    # Use toxiproxy or similar to add 2s latency
    with toxiproxy.add_latency('service-a', latency_ms=2000):
        result = payment_service.process(payment_data)

        # Should timeout and return error (not hang)
        assert result.status == 'error'
        assert 'timeout' in result.reason.lower()
```

3. **Load tests**:

```python
def test_no_cascade_under_slow_dependency():
    # Make one dependency slow
    with slow_service('bank-api', latency_ms=5000):
        # Fire 1000 requests
        results = load_test(requests=1000, concurrency=100)

        # Verify:
        # - Requests completed (not hung)
        assert all(r.completed for r in results)
        # - Time was bounded
        assert max(r.duration for r in results) < 6.0
        # - Threads weren't exhausted
        assert service.available_threads() > 50
```

4. **Chaos engineering in production**:

```python
# Gremlin or Chaos Monkey integration
def chaos_test_timeout_resilience():
    # Add latency to 10% of payment-service calls
    gremlin.attack(
        target='payment-service',
        attack='latency',
        latency_ms=3000,
        percentage=10
    )

    # Monitor: timeout rate should go up, but not cascade
    # Alert should fire for timeout rate
    # Service should remain healthy overall
```

The key insight is: timeout behavior is critical-path code—test it as thoroughly as business logic."


## Chapter 13: Interview Preparation Checklist

Before your interview, make sure you can:

**Concepts**

- Explain why slow is worse than down
- Describe P50, P95, P99 and why averages lie
- Explain connection vs read timeout
- Describe cascade failures and how timeouts cause them

**Implementation**

- Calculate a timeout budget for multiple services
- Implement adaptive timeouts
- Implement deadline propagation
- Choose between fixed, adaptive, and deadline approaches

**Operations**

- Design timeout monitoring and alerting
- Handle the "dependency is slow today" scenario
- Test timeout behavior effectively

## Exercises

### Exercise 1: Timeout Budget Calculator

Build a tool that:

- Takes a list of services with their P99 latencies
- Takes a total SLA requirement
- Outputs recommended timeouts for each service
- Handles both sequential and parallel calls
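
A possible starting point, using the P99-times-a-factor heuristic from this chapter (the names and the proportional-scaling strategy are suggestions, not prescribed by the exercise):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Service:
    name: str
    p99_ms: float
    parallel_group: Optional[int] = None  # services sharing a group run concurrently

def recommend_timeouts(services: list[Service], sla_ms: float,
                       factor: float = 2.0) -> dict[str, float]:
    """Suggest per-service timeouts (P99 * factor), scaled down to fit the SLA."""
    timeouts = {s.name: s.p99_ms * factor for s in services}

    # Worst case: sequential calls add up; a parallel group costs its slowest member
    groups: dict[object, float] = {}
    for s in services:
        key = s.parallel_group if s.parallel_group is not None else s.name
        groups[key] = max(groups.get(key, 0.0), timeouts[s.name])
    worst_case = sum(groups.values())

    if worst_case > sla_ms:
        scale = sla_ms / worst_case  # shrink proportionally to fit the budget
        timeouts = {name: t * scale for name, t in timeouts.items()}
    return timeouts

# Sequential auth/config, then search and recommendations in parallel
print(recommend_timeouts(
    [Service('auth', 50), Service('config', 30),
     Service('search', 1000, parallel_group=1),
     Service('recommendations', 2000, parallel_group=1)],
    sla_ms=5000,
))
```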

### Exercise 2: Adaptive Timeout Implementation

Implement an adaptive timeout system that:

- Tracks latency percentiles in a sliding window
- Adjusts the timeout based on observed P99
- Has configurable min/max bounds
- Alerts when the timeout exceeds a threshold
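
One way to begin, assuming a simple sorted-window percentile (fine at exercise scale; production code would use a sketch structure like t-digest):

```python
import collections
import logging

class AdaptiveTimeout:
    """Sliding-window P99 tracker with a bounded, self-adjusting timeout."""

    def __init__(self, window: int = 1000, factor: float = 2.0,
                 min_s: float = 0.1, max_s: float = 10.0, alert_s: float = 5.0):
        self.samples = collections.deque(maxlen=window)  # sliding window
        self.factor, self.min_s, self.max_s, self.alert_s = factor, min_s, max_s, alert_s

    def record(self, latency_s: float) -> None:
        self.samples.append(latency_s)

    def p99(self) -> float:
        ordered = sorted(self.samples)
        return ordered[int(len(ordered) * 0.99)] if ordered else self.min_s

    def timeout(self) -> float:
        # Observed P99 times a safety factor, clamped to [min_s, max_s]
        value = min(self.max_s, max(self.min_s, self.p99() * self.factor))
        if value > self.alert_s:  # bullet 4: alert when the timeout drifts too high
            logging.warning('adaptive timeout %.2fs exceeds threshold %.2fs',
                            value, self.alert_s)
        return value
```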

### Exercise 3: Cascade Failure Simulation

Build a simulation with:

- 3 services in a chain (A → B → C)
- Configurable latency and timeout for each
- Ability to inject slow responses
- Visualization of thread utilization during cascade
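
A skeleton to build on: it injects slowness at C and counts failures surfacing at A (the thread-utilization visualization is left to you):

```python
import asyncio
import random

async def service(name: str, work_s: float, downstream=None, timeout_s: float = 1.0):
    """One hop in the A -> B -> C chain: call downstream (bounded), then do local work."""
    if downstream is not None:
        try:
            await asyncio.wait_for(downstream(), timeout=timeout_s)
        except asyncio.TimeoutError:
            raise TimeoutError(f'{name}: downstream timed out after {timeout_s}s')
    await asyncio.sleep(work_s)  # local processing time

async def main():
    # Inject slowness at C (half the calls take 3s) and watch failures at A
    c = lambda: service('C', work_s=random.choice([0.05, 3.0]))
    b = lambda: service('B', work_s=0.05, downstream=c, timeout_s=1.0)
    a = lambda: service('A', work_s=0.05, downstream=b, timeout_s=1.5)
    results = await asyncio.gather(*(a() for _ in range(20)), return_exceptions=True)
    failed = sum(isinstance(r, Exception) for r in results)
    print(f'{failed}/20 requests failed')

asyncio.run(main())
```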

## Further Reading

- "Release It!" by Michael Nygard: chapter on Stability Patterns (Timeouts, Circuit Breakers)
- Google SRE Book: chapter on Handling Overload
- Amazon Builders' Library: "Timeouts, retries, and backoff with jitter"
- Hystrix wiki: Netflix's circuit breaker library documentation (now in maintenance mode)

## Appendix: Complete Payment Service Code

```python
"""
Production-ready payment service with proper timeout handling.
Demonstrates concepts from Day 1: Timeout Hell.
"""

import asyncio
import logging
import time
from dataclasses import dataclass, field
from typing import Optional, Dict, Any
from enum import Enum
import httpx
from prometheus_client import Counter, Histogram, Gauge

# =============================================================================
# Metrics
# =============================================================================

request_latency = Histogram(
    'payment_service_request_seconds',
    'Request latency by operation',
    ['operation'],
    buckets=[.01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]
)

timeout_counter = Counter(
    'payment_service_timeouts_total',
    'Timeout occurrences by service',
    ['service']
)

active_requests = Gauge(
    'payment_service_active_requests',
    'Currently active requests'
)

# =============================================================================
# Configuration
# =============================================================================

@dataclass
class ServiceConfig:
    url: str
    connect_timeout: float  # seconds
    read_timeout: float     # seconds
    required: bool          # Is this service required for success?

@dataclass 
class PaymentConfig:
    total_budget_ms: float = 4500
    
    fraud_service: ServiceConfig = field(default_factory=lambda: ServiceConfig(
        url='http://fraud-service',
        connect_timeout=1.0,
        read_timeout=0.6,
        required=True
    ))
    
    bank_api: ServiceConfig = field(default_factory=lambda: ServiceConfig(
        url='http://bank-api',
        connect_timeout=2.0,
        read_timeout=3.5,
        required=True
    ))
    
    notification_service: ServiceConfig = field(default_factory=lambda: ServiceConfig(
        url='http://notification-service',
        connect_timeout=0.5,
        read_timeout=0.3,
        required=False
    ))

# =============================================================================
# Timeout Budget
# =============================================================================

class TimeoutBudget:
    """Manages timeout budget across multiple service calls."""
    
    def __init__(self, total_ms: float):
        self.total_ms = total_ms
        self.start_time = time.monotonic()
    
    def remaining_ms(self) -> float:
        elapsed = (time.monotonic() - self.start_time) * 1000
        return max(0, self.total_ms - elapsed)
    
    def get_timeout(self, default_ms: float, min_ms: float = 100) -> float:
        """Get timeout respecting remaining budget."""
        remaining = self.remaining_ms()
        
        if remaining <= min_ms:
            raise BudgetExhaustedError(
                f"Budget exhausted. Remaining: {remaining:.0f}ms"
            )
        
        # Use the lesser of the default and the remaining budget (minus buffer),
        # converting milliseconds to the seconds httpx expects
        return min(default_ms, remaining - min_ms) / 1000
    
    def is_exhausted(self) -> bool:
        return self.remaining_ms() < 100

class BudgetExhaustedError(Exception):
    pass

# =============================================================================
# Result Types
# =============================================================================

class PaymentStatus(Enum):
    SUCCESS = 'success'
    REJECTED = 'rejected'
    ERROR = 'error'

@dataclass
class PaymentResult:
    status: PaymentStatus
    transaction_id: Optional[str] = None
    error_message: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

# =============================================================================
# Payment Service
# =============================================================================

class PaymentService:
    """
    Payment service demonstrating proper timeout management.
    """
    
    def __init__(self, config: Optional[PaymentConfig] = None):
        self.config = config or PaymentConfig()
        self.logger = logging.getLogger('payment_service')
    
    async def process_payment(
        self,
        user_id: str,
        amount: float,
        idempotency_key: str  # We'll use this in Day 2
    ) -> PaymentResult:
        """
        Process a payment with proper timeout budget management.
        """
        active_requests.inc()
        start_time = time.monotonic()
        
        try:
            budget = TimeoutBudget(self.config.total_budget_ms)
            
            # Step 1: Fraud Check
            fraud_result = await self._check_fraud(budget, user_id, amount)
            if fraud_result.status != PaymentStatus.SUCCESS:
                return fraud_result
            
            # Step 2: Bank Charge
            bank_result = await self._charge_bank(budget, user_id, amount)
            if bank_result.status != PaymentStatus.SUCCESS:
                return bank_result
            
            transaction_id = bank_result.transaction_id
            
            # Step 3: Notification (fire and forget)
            asyncio.create_task(
                self._send_notification(user_id, amount, transaction_id)
            )
            
            return PaymentResult(
                status=PaymentStatus.SUCCESS,
                transaction_id=transaction_id
            )
        
        except BudgetExhaustedError as e:
            self.logger.error(f"Budget exhausted: {e}")
            return PaymentResult(
                status=PaymentStatus.ERROR,
                error_message="Request took too long. Please try again."
            )
        
        finally:
            active_requests.dec()
            duration = time.monotonic() - start_time
            request_latency.labels(operation='process_payment').observe(duration)
    
    async def _check_fraud(
        self,
        budget: TimeoutBudget,
        user_id: str,
        amount: float
    ) -> PaymentResult:
        """Check transaction for fraud."""
        config = self.config.fraud_service
        
        try:
            timeout = budget.get_timeout(config.read_timeout * 1000)
            self.logger.info(f"Fraud check with timeout={timeout:.2f}s")
            
            async with httpx.AsyncClient() as client:
                response = await client.post(
                    f"{config.url}/check",
                    json={'user_id': user_id, 'amount': amount},
                    # httpx.Timeout needs a default (or all four classes set);
                    # the budgeted value covers read/write/pool here
                    timeout=httpx.Timeout(
                        timeout,
                        connect=config.connect_timeout
                    )
                )
                response.raise_for_status()
                data = response.json()
                
                if data.get('is_fraudulent'):
                    return PaymentResult(
                        status=PaymentStatus.REJECTED,
                        error_message='Transaction flagged as suspicious'
                    )
                
                return PaymentResult(status=PaymentStatus.SUCCESS)
        
        except httpx.TimeoutException:
            timeout_counter.labels(service='fraud').inc()
            self.logger.warning("Fraud check timed out")
            
            # Fraud timeout = reject (be safe)
            return PaymentResult(
                status=PaymentStatus.ERROR,
                error_message='Unable to verify transaction safety'
            )
        
        except httpx.HTTPError as e:
            self.logger.error(f"Fraud check failed: {e}")
            return PaymentResult(
                status=PaymentStatus.ERROR,
                error_message='Verification service unavailable'
            )
    
    async def _charge_bank(
        self,
        budget: TimeoutBudget,
        user_id: str,
        amount: float
    ) -> PaymentResult:
        """Charge the bank account."""
        config = self.config.bank_api
        
        try:
            timeout = budget.get_timeout(config.read_timeout * 1000)
            self.logger.info(f"Bank charge with timeout={timeout:.2f}s")
            
            async with httpx.AsyncClient() as client:
                response = await client.post(
                    f"{config.url}/charge",
                    json={'user_id': user_id, 'amount': amount},
                    # As above: budgeted value as the default, connect overridden
                    timeout=httpx.Timeout(
                        timeout,
                        connect=config.connect_timeout
                    )
                )
                response.raise_for_status()
                data = response.json()
                
                return PaymentResult(
                    status=PaymentStatus.SUCCESS,
                    transaction_id=data['transaction_id']
                )
        
        except httpx.TimeoutException:
            timeout_counter.labels(service='bank').inc()
            self.logger.warning("Bank API timed out")
            
            # Bank timeout = ambiguous! (Day 2 covers idempotency)
            return PaymentResult(
                status=PaymentStatus.ERROR,
                error_message='Payment processing timed out. Check your statement before retrying.'
            )
        
        except httpx.HTTPStatusError as e:
            self.logger.error(f"Bank charge failed: {e}")
            
            if e.response.status_code == 402:
                return PaymentResult(
                    status=PaymentStatus.REJECTED,
                    error_message='Insufficient funds'
                )
            
            return PaymentResult(
                status=PaymentStatus.ERROR,
                error_message='Payment processing failed'
            )
        
        except httpx.HTTPError as e:
            self.logger.error(f"Bank charge error: {e}")
            return PaymentResult(
                status=PaymentStatus.ERROR,
                error_message='Payment service unavailable'
            )
    
    async def _send_notification(
        self,
        user_id: str,
        amount: float,
        transaction_id: str
    ):
        """Send notification (non-critical)."""
        config = self.config.notification_service
        
        try:
            async with httpx.AsyncClient() as client:
                await client.post(
                    f"{config.url}/send",
                    json={
                        'user_id': user_id,
                        'template': 'payment_success',
                        'data': {
                            'amount': amount,
                            'transaction_id': transaction_id
                        }
                    },
                    # read_timeout as the default covers read/write/pool
                    timeout=httpx.Timeout(
                        config.read_timeout,
                        connect=config.connect_timeout
                    )
                )
        except Exception as e:
            # Notification failure is NOT critical
            self.logger.warning(f"Notification failed (non-critical): {e}")
            # Could queue for retry here


# =============================================================================
# Example Usage
# =============================================================================

async def main():
    service = PaymentService()
    
    result = await service.process_payment(
        user_id='user_123',
        amount=99.99,
        idempotency_key='order_456_attempt_1'
    )
    
    print(f"Result: {result}")

if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    asyncio.run(main())
```

---

**End of Day 1: Timeout Hell**

Tomorrow: Day 2 — Idempotency in Practice. We'll solve the problem we left open today: when the bank API times out, did the payment go through or not? How do we ensure users are never charged twice?