Bonus Problem 9: Uber
The Real-Time Marketplace That Moves 30 Million People Every Day
How Do You Match Millions of Riders with Drivers in Under 3 Seconds?
Imagine this challenge: Every second, thousands of people open an app expecting to see available cars near them. Within 3 seconds, you must find the optimal driver from potentially hundreds nearby, calculate the fare, estimate arrival time, and begin the match, all while the driver and rider are both moving.
This is Uber: a real-time, location-based marketplace operating in 10,000+ cities across 70 countries. It's not just a taxi app. It's a masterclass in geospatial indexing, dynamic pricing, and building systems that make split-second decisions at global scale.
THE UBER SCALE (2025)
═════════════════════

MARKETPLACE
───────────
  Monthly Active Users:     156 Million
  Active Drivers:           7.4 Million
  Daily Trips:              30 Million
  Cities:                   10,000+
  Countries:                70

REAL-TIME
─────────
  Location updates/second:  Millions
  Match latency target:     < 3 seconds
  ETA accuracy:             ~95% within 3 minutes
  Driver ping response:     < 15 seconds

DATA
────
  Events processed/day:     Hundreds of billions
  Data pipelines:           200,000+
  ML models in production:  Thousands
  Kafka messages/day:       Trillions

INFRASTRUCTURE
──────────────
  Microservices:            Thousands
  Data centers:             Multiple (hybrid cloud)
  Engineers:                ~6,000

BUSINESS (2024)
───────────────
  Gross Bookings:           $163 Billion
  Revenue:                  $43.9 Billion
  Trips completed:          ~10 Billion (cumulative)
This is the system we'll design today, and along the way we'll discover how Uber matches riders with drivers faster than you can blink.
The Interview Begins
You walk into the interview room. The interviewer smiles and gestures to the whiteboard.
Interviewer: "Thanks for coming in. Today we're going to design a ride-hailing platform like Uber. I'm interested in how you think about real-time matching, geospatial systems, and dynamic pricing. Please think out loudβthis is collaborative."
They write on the whiteboard:
Design a Ride-Hailing Platform

Requirements:
- Match riders with nearby drivers in real-time (< 3 seconds)
- Track driver locations continuously
- Calculate accurate ETAs and fares
- Implement dynamic (surge) pricing
- Support multiple ride types (UberX, Pool, Black, etc.)
- Handle global scale across thousands of cities
Interviewer: "Take a few minutes to think about this, then walk me through your approach. We have about 45 minutes."
Phase 1: Requirements Clarification (5 minutes)
Before diving in, you take a breath and start asking questions.
Your Questions
You: "Before I start designing, let me clarify some requirements. What's our target scaleβhow many concurrent users and trips?"
Interviewer: "Think global scale. 150+ million monthly users, 7+ million drivers, 30 million trips per day."
You: "For the matching process, what's the acceptable latency? How fast should a rider see available cars?"
Interviewer: "Under 3 seconds from request to match. Users should see nearby drivers update in real-time as they move."
You: "How frequently do drivers report their location?"
Interviewer: "Every 4 seconds while online. That's millions of location updates per second globally."
You: "What about the ETA calculation? How accurate does it need to be?"
Interviewer: "Within 3 minutes of actual arrival time, 95% of the time. This is critical for user trust."
You: "Perfect. Let me summarize the requirements."
Functional Requirements
1. RIDER EXPERIENCE
- See nearby available drivers on map
- Request a ride with pickup and destination
- Get fare estimate before confirming
- Track driver en route to pickup
- Track trip progress to destination
- Rate driver after trip
2. DRIVER EXPERIENCE
- Go online/offline
- Receive ride requests with accept/decline
- Navigate to pickup and destination
- See earnings and trip history
- Receive surge notifications
3. MATCHING
- Find optimal driver for rider request
- Consider distance, ETA, driver rating
- Support ride types (X, Pool, Black, XL)
- Handle driver cancellations gracefully
4. PRICING
- Calculate base fare (time + distance)
- Apply dynamic surge pricing
- Show upfront pricing to riders
- Split fares for Pool rides
Non-Functional Requirements
1. LATENCY
- Match request to driver assignment: < 3 seconds
- Location update processing: < 100ms
- ETA calculation: < 500ms
- Fare calculation: < 200ms
2. AVAILABILITY
- 99.99% for ride requests
- Graceful degradation (show cached data if needed)
- Multi-region failover
3. SCALE
- 30 million trips/day
- Millions of location updates/second
- Peak handling (New Year's Eve, major events)
4. CONSISTENCY
- Strong consistency for trip state
- Eventual consistency for driver locations
- Exactly-once trip creation
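The "exactly-once trip creation" requirement is usually met with an idempotency key: the client generates a request ID, and retries carrying the same ID return the original trip instead of creating a duplicate. A minimal sketch, with an in-memory dict standing in for the trip database (all names here are illustrative):

```python
# Hypothetical sketch of idempotent trip creation.
import uuid

class TripStore:
    """In-memory stand-in for the trip database (illustrative only)."""
    def __init__(self):
        self._by_request_id = {}  # request_id -> trip record

    def create_trip(self, request_id: str, rider_id: str, driver_id: str) -> dict:
        # If this request_id was already processed, return the existing
        # trip instead of creating a second one.
        existing = self._by_request_id.get(request_id)
        if existing is not None:
            return existing
        trip = {
            "trip_id": str(uuid.uuid4()),
            "request_id": request_id,
            "rider_id": rider_id,
            "driver_id": driver_id,
            "state": "CREATED",
        }
        self._by_request_id[request_id] = trip
        return trip

store = TripStore()
first = store.create_trip("req-123", "rider-1", "driver-9")
retry = store.create_trip("req-123", "rider-1", "driver-9")  # network retry
assert first["trip_id"] == retry["trip_id"]  # no duplicate trip created
```

In production the `request_id` lookup would be a unique-constraint insert in the trip database, so the dedup survives process restarts.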
Phase 2: Back of the Envelope Estimation (5 minutes)
You: "Let me work through the numbers to understand the scale."
Traffic Estimation
RIDE REQUESTS
Daily trips: 30 million
Peak multiplier: 3x (evening rush, events)
Peak trips/hour: 30M × 3 / 24 = 3.75 million
Peak trips/second: ~1,000 trips/second
LOCATION UPDATES
Active drivers: 7.4 million
Online at any time: ~20% = 1.5 million
Update frequency: Every 4 seconds
Location updates/second: 1.5M / 4 = 375,000/second
Peak (1.5x): ~500,000 updates/second
Storage Estimation
TRIP DATA
Trips per day: 30 million
Trip record size: ~5 KB (locations, timestamps, fare)
Daily trip storage: 30M × 5 KB = 150 GB/day
Yearly: ~55 TB/year
LOCATION HISTORY
Updates per second: 500,000
Bytes per update: ~100 bytes (lat, lng, timestamp, driver_id)
Daily location data: 500K × 100 × 86,400 = 4.3 TB/day
Retention (7 days): ~30 TB
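The arithmetic above is worth sanity-checking in code; this snippet just replays the estimates (the 375K figure in the text rounds 7.4M online drivers up to 1.5M):

```python
# Back-of-the-envelope numbers from the estimation above, verified.
daily_trips = 30_000_000
peak_trips_per_hour = daily_trips * 3 / 24            # 3x peak multiplier
peak_trips_per_sec = peak_trips_per_hour / 3600
assert round(peak_trips_per_hour) == 3_750_000
assert 1_000 <= peak_trips_per_sec <= 1_100           # ~1,000 trips/second

online_drivers = 7_400_000 * 0.20                     # ~20% online at once
updates_per_sec = online_drivers / 4                  # one ping every 4 s
assert updates_per_sec == 370_000                     # text rounds to 375K

trip_storage_gb = daily_trips * 5 / 1e6               # 5 KB/trip -> GB/day
loc_storage_tb = 500_000 * 100 * 86_400 / 1e12        # 100 B/update -> TB/day
assert trip_storage_gb == 150.0
assert abs(loc_storage_tb - 4.32) < 0.01
```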
Key Metrics Summary
ESTIMATION SUMMARY

TRAFFIC
├── Peak trip requests:      1,000/second
├── Location updates:        500,000/second
└── API requests (total):    ~5 million/second

STORAGE
├── Trip data:               150 GB/day
├── Location data:           4.3 TB/day
└── Total (with retention):  ~100 TB active

COMPUTE
├── Matching decisions:      1,000/second
├── ETA calculations:        10,000/second
└── Surge calculations:      Per cell, every 30 seconds
Phase 3: High-Level Design (10 minutes)
You: "Now let me sketch out the high-level architecture."
System Architecture
UBER HIGH-LEVEL ARCHITECTURE

CLIENTS
├── Rider App
└── Driver App
        │
        │  WebSocket + HTTP
        ▼
API GATEWAY
    Rate Limiting | Auth | Routing | Load Balancing
        │
        ▼
REAL-TIME LAYER
├── WebSocket Manager
├── Location Service
└── Push Service
        │
        ▼
CORE SERVICES
├── Matching Service
├── Trip Service
├── Pricing Service
├── ETA Service
├── Map Service
└── Payment Service
        │
        ▼
DATA LAYER
├── Spanner   (trips)
├── Redis     (live driver locations)
├── Cassandra (location history)
└── Kafka     (events)

GEOSPATIAL INDEX
└── H3 Hexagonal Grid (driver location index)
Data Flow: Requesting a Ride
RIDE REQUEST FLOW
1. RIDER OPENS APP
   ├── WebSocket connection established
   └── See nearby drivers (from Location Service)

2. RIDER REQUESTS RIDE
   ├── POST /ride-request {pickup, destination, ride_type}
   └── API Gateway → Matching Service

3. MATCHING SERVICE
   ├── Query Location Service for nearby drivers
   ├── Filter by ride type, rating, availability
   ├── Calculate ETA for each candidate
   ├── Score and rank candidates
   └── Select optimal driver

4. DRIVER NOTIFICATION
   ├── Push notification via WebSocket
   └── Driver has 15 seconds to accept

5. IF ACCEPTED
   ├── Trip Service creates trip record
   ├── Pricing Service calculates fare
   ├── Both rider and driver get trip details
   └── Navigation begins

6. DURING TRIP
   ├── Driver location updates every 4 seconds
   ├── ETA recalculated continuously
   └── Both apps show real-time progress

7. TRIP COMPLETION
   ├── Calculate final fare
   ├── Process payment
   ├── Request ratings
   └── Update driver availability

TOTAL LATENCY: < 3 seconds (request to driver notification)
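The flow above implies a trip lifecycle with a fixed set of legal transitions. A minimal sketch of that state machine (state and function names are illustrative, not Uber's actual schema):

```python
# Trip lifecycle implied by the ride request flow, as a state machine.
from enum import Enum

class TripState(Enum):
    REQUESTED = "requested"
    MATCHED = "matched"
    EN_ROUTE_TO_PICKUP = "en_route_to_pickup"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    CANCELLED = "cancelled"

# Allowed transitions; anything else is rejected.
TRANSITIONS = {
    TripState.REQUESTED: {TripState.MATCHED, TripState.CANCELLED},
    TripState.MATCHED: {TripState.EN_ROUTE_TO_PICKUP, TripState.CANCELLED},
    TripState.EN_ROUTE_TO_PICKUP: {TripState.IN_PROGRESS, TripState.CANCELLED},
    TripState.IN_PROGRESS: {TripState.COMPLETED},
    TripState.COMPLETED: set(),       # terminal
    TripState.CANCELLED: set(),       # terminal
}

def advance(current: TripState, target: TripState) -> TripState:
    """Move a trip to `target`, raising on an illegal transition."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target

state = advance(TripState.REQUESTED, TripState.MATCHED)
assert state is TripState.MATCHED
```

Guarding transitions like this is what keeps a double-delivered "trip completed" event from corrupting trip state, which matters given the strong-consistency requirement on trips.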
Phase 4: Deep Dives (20 minutes)
Interviewer: "Let's dive deeper. Tell me about the geospatial indexing system."
Deep Dive 1: H3 Geospatial Indexing (Week 1 Concepts)
The Problem
GEOSPATIAL CHALLENGE
Finding nearby drivers:
- "Find all drivers within 2km of this location"
- Naive approach: Check distance to ALL drivers
- With 1.5M online drivers: Too slow
Requirements:
- Sub-100ms lookups
- Works globally (not just one city)
- Handles drivers that are constantly moving
- Efficient for both sparse and dense areas
The Solution: H3 Hexagonal Grid
H3 HEXAGONAL GRID SYSTEM

WHY HEXAGONS?
─────────────
  Squares:   2 different neighbor distances (edge vs corner)
  Triangles: 3 different neighbor distances
  Hexagons:  1 uniform distance to ALL neighbors ✓

  Every hexagon has exactly six neighbors, and each neighbor's center
  is the same distance from the cell's own center. That makes
  distance-by-ring ("k-ring") queries uniform in every direction.

H3 RESOLUTION LEVELS
────────────────────
  Resolution 0:  ~4.25M km² avg cell  (continent scale)
  Resolution 7:  ~5.16 km²            (city district)
  Resolution 9:  ~0.105 km²           (a few street blocks) ← Uber typically uses
  Resolution 15: ~0.9 m²              (precise location)

HIERARCHICAL STRUCTURE
──────────────────────
  Each hexagon subdivides into ~7 child hexagons.
  Parent cells contain children (useful for aggregation).

  Resolution 7 (city)
  └── Resolution 8 (neighborhood)
      └── Resolution 9 (street block)
          └── Resolution 10 (building level)
Implementation
# geospatial/h3_location_index.py
"""
H3-Based Location Index for Uber

Uses Uber's open-source H3 hexagonal grid system for
efficient geospatial queries.
"""
from dataclasses import dataclass
from math import radians, sin, cos, sqrt, atan2
from typing import List, Set

import h3
import redis


@dataclass
class DriverLocation:
    driver_id: str
    latitude: float
    longitude: float
    timestamp: float
    ride_types: List[str]
    rating: float
    is_available: bool


class H3LocationIndex:
    """
    Location index using H3 hexagonal grid.

    Key insight: Convert 2D lat/lng to 1D H3 cell ID
    for O(1) lookups instead of O(n) distance calculations.

    Applies concepts:
    - Week 1: Partitioning strategies
    - Week 4: Caching for low-latency lookups
    """

    # Resolution 9: ~0.1 km² cells, good for urban areas
    DEFAULT_RESOLUTION = 9

    def __init__(self, redis_client: redis.Redis):
        # The client should be created with decode_responses=True so
        # hash/set values come back as str (the code below compares strings).
        self.redis = redis_client
        self.resolution = self.DEFAULT_RESOLUTION

    def update_driver_location(self, driver: DriverLocation) -> None:
        """
        Update driver location in the index.

        Called every 4 seconds per driver.
        Must be extremely fast (< 10ms).
        """
        # Convert lat/lng to H3 cell
        cell_id = h3.latlng_to_cell(
            driver.latitude,
            driver.longitude,
            self.resolution
        )

        # Remove from old cell (if moved)
        old_cell = self.redis.hget(f"driver:{driver.driver_id}", "cell")
        if old_cell and old_cell != cell_id:
            self.redis.srem(f"cell:{old_cell}", driver.driver_id)

        # Add to new cell
        self.redis.sadd(f"cell:{cell_id}", driver.driver_id)

        # Store driver details
        self.redis.hset(f"driver:{driver.driver_id}", mapping={
            "cell": cell_id,
            "lat": driver.latitude,
            "lng": driver.longitude,
            "ts": driver.timestamp,
            "available": int(driver.is_available),
            "rating": driver.rating
        })

        # Set TTL for automatic cleanup if driver goes offline
        self.redis.expire(f"driver:{driver.driver_id}", 30)

    def find_nearby_drivers(
        self,
        latitude: float,
        longitude: float,
        radius_km: float = 2.0,
        ride_type: str = "UberX",
        limit: int = 10
    ) -> List[DriverLocation]:
        """
        Find available drivers near a location.

        Uses H3 k-ring to get nearby cells, then fetches
        drivers from those cells.
        """
        # Get center cell
        center_cell = h3.latlng_to_cell(
            latitude, longitude, self.resolution
        )

        # Calculate k-ring size based on radius.
        # At resolution 9 (edge length ~174 m), neighboring cell centers
        # are roughly 0.3 km apart, so k rings cover about k * 0.3 km.
        k = int(radius_km / 0.3) + 1

        # Get all cells in radius (k-ring)
        nearby_cells = h3.grid_disk(center_cell, k)

        # Fetch drivers from all cells
        candidates = []
        for cell_id in nearby_cells:
            driver_ids = self.redis.smembers(f"cell:{cell_id}")
            for driver_id in driver_ids:
                driver_data = self.redis.hgetall(f"driver:{driver_id}")
                # "available" is stored as "0"/"1"; a plain truthiness
                # check would wrongly treat the string "0" as available.
                if driver_data and driver_data.get("available") == "1":
                    candidates.append(self._to_driver_location(
                        driver_id, driver_data
                    ))

        # Sort by distance and return top matches
        candidates.sort(
            key=lambda d: self._haversine_distance(
                latitude, longitude, d.latitude, d.longitude
            )
        )
        return candidates[:limit]

    def get_drivers_in_cell(self, cell_id: str) -> Set[str]:
        """Get all driver IDs in a specific H3 cell."""
        return self.redis.smembers(f"cell:{cell_id}")

    def _to_driver_location(
        self, driver_id: str, data: dict
    ) -> DriverLocation:
        """Rehydrate a DriverLocation from its Redis hash."""
        return DriverLocation(
            driver_id=driver_id,
            latitude=float(data["lat"]),
            longitude=float(data["lng"]),
            timestamp=float(data["ts"]),
            ride_types=[],  # ride types not stored in this sketch
            rating=float(data["rating"]),
            is_available=data["available"] == "1"
        )

    def _haversine_distance(
        self, lat1: float, lng1: float,
        lat2: float, lng2: float
    ) -> float:
        """Calculate great-circle distance between two points in km."""
        R = 6371  # Earth's radius in km
        lat1, lng1, lat2, lng2 = map(
            radians, [lat1, lng1, lat2, lng2]
        )
        dlat = lat2 - lat1
        dlng = lng2 - lng1
        a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
        c = 2 * atan2(sqrt(a), sqrt(1 - a))
        return R * c
Deep Dive 2: Dynamic Surge Pricing (Week 5 Concepts)
Interviewer: "How does surge pricing work?"
The Problem
SUPPLY-DEMAND IMBALANCE
Scenario: Friday night, 11 PM, downtown
- Rider requests: 500/minute
- Available drivers: 50
- Result: 10-minute wait times, unhappy users
Without surge pricing:
- Drivers don't know demand is high
- No incentive to drive to busy areas
- Riders wait forever or can't get a ride
With surge pricing:
- Higher prices attract more drivers
- Some riders defer (reducing demand)
- Market reaches equilibrium faster
The Solution: Real-Time Surge Calculation
SURGE PRICING ALGORITHM

INPUT SIGNALS
─────────────
  • Ride requests in H3 cell (last 5 minutes)
  • Available drivers in cell + neighbors
  • Historical demand patterns (time, day, events)
  • Current trip completion rate
  • Weather data

SURGE CALCULATION
─────────────────
  demand_supply_ratio = requests / available_drivers

  If ratio < 1.0  → surge = 1.0 (no surge)
  If ratio = 2.0  → surge ≈ 1.5x
  If ratio = 4.0  → surge ≈ 2.5x
  If ratio > 10.0 → surge = cap (e.g., 5.0x)

SMOOTHING
─────────
  • Don't change surge too rapidly (confusing for users)
  • Use exponential moving average
  • Gradually ramp up and down

VISUALIZATION
─────────────
  Picture the city map tiled with H3 cells, each colored by its own
  surge multiplier:

    1.0x = No surge     (gray)
    1.5x = Light surge  (yellow)
    2.5x = Medium surge (orange)
    3.0x = High surge   (red)

  Hot cells (downtown, stadiums) light up while neighboring cells a
  few blocks away can stay at 1.0x.
Implementation
# pricing/surge_service.py
"""
Dynamic Surge Pricing Service

Calculates real-time surge multipliers based on
supply-demand imbalance in each H3 cell.
"""
from dataclasses import dataclass
from datetime import datetime

import h3


@dataclass
class SurgeConfig:
    min_surge: float = 1.0
    max_surge: float = 5.0
    update_interval_seconds: int = 30
    smoothing_factor: float = 0.3  # For exponential moving average
    demand_window_minutes: int = 5


@dataclass
class CellSurge:
    cell_id: str
    surge_multiplier: float
    demand_count: int
    supply_count: int
    updated_at: datetime


class SurgePricingService:
    """
    Calculate surge pricing per H3 cell.

    Key principles:
    - Higher demand → higher surge
    - More drivers → lower surge
    - Smooth transitions (no sudden jumps)
    - Respect max cap (emergencies, PR concerns)

    Applies concepts:
    - Week 5: Eventual consistency (surge propagates)
    - Week 8: Real-time analytics pipeline
    """

    def __init__(
        self,
        config: SurgeConfig,
        location_index,
        demand_tracker,
        cache
    ):
        self.config = config
        self.location_index = location_index
        self.demand_tracker = demand_tracker
        self.cache = cache

        # Surge curve parameters (tuned via ML)
        self.surge_curve = [
            (1.0, 1.0),    # ratio 1.0 → surge 1.0
            (1.5, 1.25),   # ratio 1.5 → surge 1.25
            (2.0, 1.5),    # ratio 2.0 → surge 1.5
            (3.0, 2.0),    # ratio 3.0 → surge 2.0
            (5.0, 3.0),    # ratio 5.0 → surge 3.0
            (10.0, 5.0),   # ratio 10.0 → surge 5.0 (cap)
        ]

    def get_surge_for_location(
        self,
        latitude: float,
        longitude: float
    ) -> float:
        """Get current surge multiplier for a location."""
        cell_id = h3.latlng_to_cell(latitude, longitude, 9)

        # Check cache first
        cached = self.cache.get(f"surge:{cell_id}")
        if cached:
            return float(cached)

        # Calculate fresh surge
        surge = self._calculate_cell_surge(cell_id)

        # Cache for 30 seconds
        self.cache.setex(f"surge:{cell_id}", 30, surge)
        return surge

    def _calculate_cell_surge(self, cell_id: str) -> float:
        """Calculate surge for a specific H3 cell."""
        # Get demand (requests in last 5 minutes)
        demand = self.demand_tracker.get_request_count(
            cell_id,
            window_minutes=self.config.demand_window_minutes
        )

        # Get supply (available drivers in cell + neighbors)
        nearby_cells = h3.grid_disk(cell_id, 1)  # Cell + 6 neighbors
        supply = sum(
            len(self.location_index.get_drivers_in_cell(c))
            for c in nearby_cells
        )

        # Prevent division by zero
        if supply == 0:
            return self.config.max_surge if demand > 0 else self.config.min_surge

        # Calculate demand/supply ratio
        ratio = demand / supply

        # Map ratio to surge using curve
        raw_surge = self._ratio_to_surge(ratio)

        # Apply smoothing (don't change too fast)
        previous_surge = self.cache.get(f"surge:{cell_id}")
        if previous_surge:
            raw_surge = self._smooth_surge(
                float(previous_surge),
                raw_surge
            )

        # Clamp to bounds
        return max(
            self.config.min_surge,
            min(self.config.max_surge, raw_surge)
        )

    def _ratio_to_surge(self, ratio: float) -> float:
        """Convert demand/supply ratio to surge multiplier."""
        # Below the first curve point: supply covers demand, no surge.
        # (Without this check, sub-1.0 ratios would fall through the
        # loop and wrongly return the cap.)
        if ratio <= self.surge_curve[0][0]:
            return self.surge_curve[0][1]

        # Linear interpolation between curve points
        for i in range(len(self.surge_curve) - 1):
            r1, s1 = self.surge_curve[i]
            r2, s2 = self.surge_curve[i + 1]
            if r1 <= ratio <= r2:
                t = (ratio - r1) / (r2 - r1)
                return s1 + t * (s2 - s1)

        # Above max ratio
        return self.surge_curve[-1][1]

    def _smooth_surge(
        self,
        previous: float,
        current: float
    ) -> float:
        """Apply exponential moving average for smooth transitions."""
        alpha = self.config.smoothing_factor
        return alpha * current + (1 - alpha) * previous

    def calculate_fare(
        self,
        distance_km: float,
        duration_minutes: float,
        surge: float,
        ride_type: str = "UberX"
    ) -> float:
        """Calculate total fare with surge."""
        # Base rates (vary by city/ride type)
        rates = {
            "UberX": {"base": 2.50, "per_km": 1.20, "per_min": 0.25},
            "UberXL": {"base": 4.00, "per_km": 1.80, "per_min": 0.35},
            "UberBlack": {"base": 8.00, "per_km": 3.00, "per_min": 0.50},
        }
        rate = rates.get(ride_type, rates["UberX"])

        base_fare = (
            rate["base"] +
            rate["per_km"] * distance_km +
            rate["per_min"] * duration_minutes
        )

        # Apply surge
        surged_fare = base_fare * surge

        # Minimum fare
        return max(surged_fare, 5.00)
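The two numeric pieces of the surge service, the piecewise-linear ratio-to-surge curve and the EMA smoothing, are easy to verify in isolation. A standalone version using the same illustrative curve points as the service above:

```python
# Standalone ratio->surge mapping and EMA smoothing, mirroring
# SurgePricingService's curve (values are illustrative, tuned via ML
# in a real deployment).
SURGE_CURVE = [(1.0, 1.0), (1.5, 1.25), (2.0, 1.5),
               (3.0, 2.0), (5.0, 3.0), (10.0, 5.0)]

def ratio_to_surge(ratio: float) -> float:
    if ratio <= SURGE_CURVE[0][0]:
        return SURGE_CURVE[0][1]              # supply covers demand: no surge
    for (r1, s1), (r2, s2) in zip(SURGE_CURVE, SURGE_CURVE[1:]):
        if r1 <= ratio <= r2:
            t = (ratio - r1) / (r2 - r1)      # linear interpolation
            return s1 + t * (s2 - s1)
    return SURGE_CURVE[-1][1]                 # above max ratio: capped

def smooth(previous: float, current: float, alpha: float = 0.3) -> float:
    """Exponential moving average: step only 30% toward the new value."""
    return alpha * current + (1 - alpha) * previous

assert ratio_to_surge(0.5) == 1.0    # more drivers than requests
assert ratio_to_surge(2.0) == 1.5    # exact curve point
assert ratio_to_surge(2.5) == 1.75   # halfway between 1.5x and 2.0x
assert ratio_to_surge(50.0) == 5.0   # capped
assert abs(smooth(1.0, 2.0) - 1.3) < 1e-9
```

The interpolation keeps surge continuous in the demand/supply ratio, so a single extra request never causes a price cliff; the EMA keeps it continuous in time.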
Deep Dive 3: Matching Algorithm (Week 1 Concepts)
Interviewer: "How do you match riders with the optimal driver?"
The Problem
MATCHING CHALLENGE
Given:
- 1 rider requesting a ride
- 50 nearby available drivers
Find the "best" match considering:
- Distance (affects ETA and driver cost)
- Driver rating
- Driver's acceptance rate
- Ride type compatibility
- Fairness (drivers waiting longer get priority)
The Solution: Weighted Scoring with Dispatch
MATCHING ALGORITHM

STEP 1: CANDIDATE SELECTION
───────────────────────────
  Query H3 index for drivers within radius
  Filter: available, correct ride type, not already pinged

STEP 2: SCORING
───────────────
  For each candidate, calculate:

    score = w1 × ETA_score +
            w2 × rating_score +
            w3 × acceptance_score +
            w4 × wait_time_score

  Where:
  • ETA_score        = 1 - (ETA / max_ETA)          [closer is better]
  • rating_score     = (rating - 4.0) / 1.0         [higher is better]
  • acceptance_score = acceptance_rate              [higher is better]
  • wait_time_score  = min(wait_minutes / 30, 1)    [longer wait = priority]

STEP 3: DISPATCH
────────────────
  Send request to highest-scoring driver
  Wait up to 15 seconds for response
  If declined/timeout → next best driver

OPTIMIZATION: BATCH MATCHING
────────────────────────────
  For high-volume periods:
  • Collect requests for 2-second windows
  • Solve assignment problem (Hungarian algorithm)
  • Globally optimize rider-driver pairs
Implementation
# matching/dispatch_service.py
"""
Rider-Driver Matching Service

Finds optimal driver for each ride request using
weighted scoring and ETA estimation.
"""
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class MatchResult(Enum):
    MATCHED = "matched"
    NO_DRIVERS = "no_drivers"
    ALL_DECLINED = "all_declined"
    TIMEOUT = "timeout"


@dataclass
class MatchRequest:
    request_id: str
    rider_id: str
    pickup_lat: float
    pickup_lng: float
    destination_lat: float
    destination_lng: float
    ride_type: str


@dataclass
class DriverCandidate:
    driver_id: str
    latitude: float
    longitude: float
    rating: float
    acceptance_rate: float
    wait_time_seconds: int
    eta_seconds: int
    score: float = 0.0


class MatchingService:
    """
    Match riders with optimal drivers.

    Uses weighted scoring to balance multiple factors:
    - ETA (user experience)
    - Driver rating (quality)
    - Acceptance rate (reliability)
    - Wait time (fairness to drivers)

    Applies concepts:
    - Week 1: Partitioning (H3-based driver lookup)
    - Week 2: Timeouts and retries
    """

    # Scoring weights (tuned via ML/experimentation)
    WEIGHTS = {
        "eta": 0.4,
        "rating": 0.2,
        "acceptance": 0.2,
        "wait_time": 0.2
    }

    DRIVER_RESPONSE_TIMEOUT = 15  # seconds
    MAX_CANDIDATES = 10

    def __init__(
        self,
        location_index,
        eta_service,
        driver_service,
        notification_service
    ):
        self.location_index = location_index
        self.eta_service = eta_service
        self.driver_service = driver_service
        self.notifications = notification_service

    async def find_match(
        self,
        request: MatchRequest
    ) -> Tuple[MatchResult, Optional[str]]:
        """
        Find and dispatch to optimal driver.

        Returns (result, driver_id or None)
        """
        # Step 1: Find nearby candidates
        candidates = await self._get_candidates(request)
        if not candidates:
            return (MatchResult.NO_DRIVERS, None)

        # Step 2: Score and rank candidates
        scored = await self._score_candidates(request, candidates)
        scored.sort(key=lambda c: c.score, reverse=True)

        # Step 3: Dispatch to drivers in order
        for candidate in scored[:self.MAX_CANDIDATES]:
            accepted = await self._dispatch_to_driver(
                request, candidate
            )
            if accepted:
                return (MatchResult.MATCHED, candidate.driver_id)

        return (MatchResult.ALL_DECLINED, None)

    async def _get_candidates(
        self,
        request: MatchRequest
    ) -> List[DriverCandidate]:
        """Get available drivers near pickup location."""
        drivers = self.location_index.find_nearby_drivers(
            request.pickup_lat,
            request.pickup_lng,
            radius_km=3.0,
            ride_type=request.ride_type,
            limit=50
        )

        candidates = []
        for driver in drivers:
            if not driver.is_available:
                continue

            # Get driver stats
            stats = await self.driver_service.get_stats(driver.driver_id)

            # Calculate ETA
            eta = await self.eta_service.calculate(
                driver.latitude, driver.longitude,
                request.pickup_lat, request.pickup_lng
            )

            candidates.append(DriverCandidate(
                driver_id=driver.driver_id,
                latitude=driver.latitude,
                longitude=driver.longitude,
                rating=driver.rating,
                acceptance_rate=stats.acceptance_rate,
                wait_time_seconds=stats.wait_time_seconds,
                eta_seconds=eta
            ))
        return candidates

    async def _score_candidates(
        self,
        request: MatchRequest,
        candidates: List[DriverCandidate]
    ) -> List[DriverCandidate]:
        """Calculate match score for each candidate."""
        max_eta = max(c.eta_seconds for c in candidates) or 1

        for candidate in candidates:
            # Normalize scores to 0-1 range
            eta_score = 1 - (candidate.eta_seconds / max_eta)
            rating_score = (candidate.rating - 4.0) / 1.0
            acceptance_score = candidate.acceptance_rate
            wait_score = min(candidate.wait_time_seconds / 1800, 1.0)

            # Weighted sum
            candidate.score = (
                self.WEIGHTS["eta"] * eta_score +
                self.WEIGHTS["rating"] * max(0, rating_score) +
                self.WEIGHTS["acceptance"] * acceptance_score +
                self.WEIGHTS["wait_time"] * wait_score
            )
        return candidates

    async def _dispatch_to_driver(
        self,
        request: MatchRequest,
        candidate: DriverCandidate
    ) -> bool:
        """Send ride request to driver and wait for response."""
        # Mark driver as unavailable (prevent double-dispatch)
        await self.driver_service.set_pending(candidate.driver_id)

        accepted = False
        try:
            # Send push notification
            await self.notifications.send_ride_request(
                driver_id=candidate.driver_id,
                request_id=request.request_id,
                pickup_address="123 Main St",  # Would resolve address
                eta_seconds=candidate.eta_seconds
            )

            # Wait for response with timeout
            response = await asyncio.wait_for(
                self.driver_service.wait_for_response(
                    candidate.driver_id,
                    request.request_id
                ),
                timeout=self.DRIVER_RESPONSE_TIMEOUT
            )
            accepted = (response == "accepted")
            return accepted
        except asyncio.TimeoutError:
            return False
        finally:
            # Release the driver only if the match didn't go through;
            # an accepted driver stays locked to this trip
            if not accepted:
                await self.driver_service.clear_pending(candidate.driver_id)
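The batch-matching optimization mentioned earlier is an assignment problem: pick one driver per rider so the total pickup ETA is minimized. Production systems use the Hungarian algorithm (O(n³)); for a toy batch the same objective can be shown by brute force over permutations. The ETA matrix below is made up for illustration:

```python
# Brute-force assignment over a small batch, showing why global
# optimization beats greedy per-rider matching.
from itertools import permutations

def best_assignment(eta):
    """eta[r][d] = pickup ETA (minutes) if rider r gets driver d.
    Returns (driver index per rider, total ETA) minimizing the sum."""
    n = len(eta)
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):       # one driver per rider
        cost = sum(eta[r][perm[r]] for r in range(n))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost

# 3 riders x 3 drivers. Greedy matching (each rider grabs the closest
# remaining driver in turn) yields 2 + 7 + 4 = 13 total minutes; the
# global optimum reshuffles to 3 + 3 + 4 = 10.
eta = [
    [2, 3, 8],
    [3, 7, 9],
    [7, 8, 4],
]
assignment, total = best_assignment(eta)
assert total == 10
assert assignment == (1, 0, 2)
```

Collecting requests into 2-second windows trades a little per-rider latency for a globally better set of matches, which is exactly the trade the box above describes.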
Deep Dive 4: ETA Prediction (Week 8 Concepts)
Interviewer: "How do you calculate accurate ETAs?"
The Problem
ETA CHALLENGES
Simple distance/speed calculation fails because:
- Traffic varies by time of day
- Accidents, road closures
- Weather affects speed
- One-way streets
- Pickup location accessibility (building entrance vs street)
- Historical patterns matter
Uber's DeepETA model:
- 95% accuracy within 3 minutes
- Uses ML to combine multiple signals
Implementation
# routing/eta_service.py
"""
ETA Prediction Service

Combines routing engine with ML model for accurate
arrival time predictions.
"""
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ETAResult:
    eta_seconds: int
    distance_meters: int
    confidence: float
    route_polyline: str


class ETAService:
    """
    Calculate estimated time of arrival.

    Combines:
    - Graph-based shortest path (baseline)
    - Real-time traffic data
    - ML model adjustments (DeepETA)
    - Historical patterns

    Applies concepts:
    - Week 8: ML pipeline integration
    - Week 4: Caching for repeated queries
    """

    def __init__(
        self,
        routing_engine,
        traffic_service,
        ml_model,
        cache
    ):
        self.routing = routing_engine
        self.traffic = traffic_service
        self.model = ml_model
        self.cache = cache

    async def calculate(
        self,
        origin_lat: float,
        origin_lng: float,
        dest_lat: float,
        dest_lng: float,
        departure_time: Optional[datetime] = None
    ) -> ETAResult:
        """Calculate ETA from origin to destination."""
        departure_time = departure_time or datetime.utcnow()

        # Check cache for recent identical query
        cache_key = self._make_cache_key(
            origin_lat, origin_lng, dest_lat, dest_lng
        )
        cached = self.cache.get(cache_key)
        if cached:
            return cached

        # Step 1: Get baseline route from routing engine
        route = await self.routing.get_route(
            origin_lat, origin_lng,
            dest_lat, dest_lng
        )

        # Step 2: Get current traffic conditions
        traffic_factor = await self.traffic.get_factor(
            route.segments,
            departure_time
        )

        # Step 3: ML model adjustment
        features = self._extract_features(
            route, traffic_factor, departure_time
        )
        ml_adjustment = self.model.predict(features)

        # Step 4: Combine estimates
        base_eta = route.duration_seconds
        traffic_eta = base_eta * traffic_factor
        final_eta = int(traffic_eta * ml_adjustment)

        result = ETAResult(
            eta_seconds=final_eta,
            distance_meters=route.distance_meters,
            confidence=0.95 if traffic_factor < 1.5 else 0.85,
            route_polyline=route.polyline
        )

        # Cache for 60 seconds (the cache layer is assumed to
        # serialize the dataclass, e.g. as JSON)
        self.cache.setex(cache_key, 60, result)
        return result

    def _make_cache_key(
        self,
        origin_lat: float, origin_lng: float,
        dest_lat: float, dest_lng: float
    ) -> str:
        """Round coordinates so nearby queries share a cache entry."""
        return (
            f"eta:{origin_lat:.4f}:{origin_lng:.4f}:"
            f"{dest_lat:.4f}:{dest_lng:.4f}"
        )

    def _extract_features(
        self,
        route,
        traffic_factor: float,
        departure_time: datetime
    ) -> dict:
        """Extract features for ML model."""
        return {
            "distance_km": route.distance_meters / 1000,
            "base_duration_min": route.duration_seconds / 60,
            "traffic_factor": traffic_factor,
            "hour_of_day": departure_time.hour,
            "day_of_week": departure_time.weekday(),
            "is_rush_hour": self._is_rush_hour(departure_time),
            "num_turns": route.num_turns,
            "num_traffic_lights": route.num_traffic_lights,
            "highway_fraction": route.highway_fraction,
        }

    def _is_rush_hour(self, dt: datetime) -> bool:
        """Check if time is during typical rush hour."""
        hour = dt.hour
        is_weekday = dt.weekday() < 5
        morning_rush = 7 <= hour <= 9
        evening_rush = 17 <= hour <= 19
        return is_weekday and (morning_rush or evening_rush)
Phase 5: Scaling and Edge Cases
Scaling Strategies
SCALING STRATEGIES

GEOGRAPHIC SHARDING
───────────────────
  • Each city/region is a shard
  • Location data stays local (low latency)
  • Cross-region trips handled specially

EVENT-DRIVEN ARCHITECTURE
─────────────────────────
  • All state changes → Kafka events
  • Services react to events asynchronously
  • Enables real-time analytics pipeline

CELL-BASED LOAD BALANCING
─────────────────────────
  • Hot H3 cells (downtown) get dedicated capacity
  • Cold cells share resources
  • Dynamic rebalancing based on demand

HYBRID CLOUD
────────────
  • Core services in Uber data centers
  • Burst capacity in cloud (GCP, AWS)
  • Multi-region active-active
Edge Cases
EDGE CASE 1: Driver Goes Offline Mid-Match
Scenario: Driver selected, then loses connectivity
Handling:
- 15-second timeout for response
- Automatic retry with next best driver
- Driver marked unavailable after 30s silence
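The timeout-and-failover loop can be sketched as follows. The `send_offer` hook, the `Driver` class, and the return convention (True/False/None for accepted/declined/no response) are assumptions for illustration:

```python
import time

RESPONSE_TIMEOUT_S = 15   # driver must accept within 15 seconds

class Driver:
    def __init__(self, driver_id):
        self.driver_id = driver_id
        self.unresponsive_since = None

def dispatch(ranked_drivers, send_offer, now=time.monotonic):
    """Offer the trip to drivers in ranked order. `send_offer(driver,
    timeout)` is an assumed hook returning True (accepted), False
    (declined), or None (no response before the timeout)."""
    for driver in ranked_drivers:
        response = send_offer(driver, RESPONSE_TIMEOUT_S)
        if response is True:
            return driver                      # matched
        if response is None:
            driver.unresponsive_since = now()  # start the 30s offline clock
    return None                                # no taker: widen the search

# Simulated offers: first driver silent, second accepts
drivers = [Driver("d1"), Driver("d2")]
matched = dispatch(drivers, lambda d, t: True if d.driver_id == "d2" else None)
print(matched.driver_id)  # d2
```

A background sweeper would then mark any driver whose `unresponsive_since` exceeds 30 seconds as unavailable, so they stop appearing in candidate lists.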
EDGE CASE 2: Mass Event Ending (Concert, Sports)
Scenario: 50,000 people request rides simultaneously
Handling:
- Surge pricing activates (attracts drivers)
- Virtual queue system for fairness
- Pre-positioning drivers before event ends
- Predictive scaling based on event schedule
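The virtual queue is essentially a FIFO that decouples request arrival from matching, so a 50,000-person spike is drained in order instead of thundering the matcher. A minimal sketch (class and method names are mine):

```python
from collections import deque

class VirtualQueue:
    """FIFO queue for demand spikes: riders join and are matched in
    arrival order as drivers free up, which is the fairness guarantee."""
    def __init__(self):
        self._waiting = deque()

    def join(self, rider_id: str) -> int:
        self._waiting.append(rider_id)
        return len(self._waiting)      # queue position shown to the rider

    def match_next(self, driver_id: str):
        if not self._waiting:
            return None
        return (self._waiting.popleft(), driver_id)

q = VirtualQueue()
q.join("rider-a"); q.join("rider-b")
print(q.match_next("driver-1"))   # ('rider-a', 'driver-1'), FIFO order
```

Showing the rider their queue position also manages expectations, which matters as much as the matching itself during a mass event.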
EDGE CASE 3: GPS Drift/Inaccuracy
Scenario: Driver appears in wrong H3 cell
Handling:
- Kalman filter smooths location jumps
- Query neighboring cells (k-ring)
- Fall back to larger radius search
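The k-ring fallback is an expanding search: query the rider's own cell first, then successively larger rings until enough drivers are found. A sketch with a toy one-dimensional grid standing in for H3's `grid_ring` (the real system would call the h3 library; `toy_ring` is purely illustrative):

```python
def expanding_search(origin, drivers_by_cell, ring, max_k=3, want=5):
    """Query the origin cell first, then rings of increasing radius k,
    until `want` drivers are found or max_k is exhausted. `ring(cell, k)`
    stands in for H3's grid_ring (cells exactly k steps away)."""
    found = []
    for k in range(max_k + 1):
        for cell in ring(origin, k):
            found.extend(drivers_by_cell.get(cell, []))
        if len(found) >= want:
            break
    return found[:want]

def toy_ring(cell, k):
    # Toy 1-D grid: integer cells, neighbours sit exactly k steps away
    return [cell] if k == 0 else [cell - k, cell + k]

drivers = {0: ["d1"], 2: ["d2", "d3"]}
print(expanding_search(0, drivers, toy_ring, max_k=3, want=3))  # ['d1', 'd2', 'd3']
```

Because rings at distance k are queried only once, a driver placed in the wrong cell by GPS drift is still picked up on the next iteration instead of being missed entirely.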
EDGE CASE 4: Payment Failure
Scenario: Card declined after trip completion
Handling:
- Retry with backup payment method
- Grace period before blocking rider
- Driver always gets paid (Uber absorbs risk)
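The retry-with-backup behavior can be sketched as a simple ordered walk over the rider's stored payment methods. The `charge` gateway hook is an assumption; driver payout is intentionally outside this loop, matching the "driver always gets paid" rule:

```python
def settle_fare(fare_cents, payment_methods, charge):
    """Attempt collection against each stored method in order.
    `charge(method, amount)` is an assumed gateway hook returning True on
    success. Driver payout is decoupled and proceeds either way."""
    for method in payment_methods:
        if charge(method, fare_cents):
            return method        # collected from this method
    return None                  # all failed: enter grace period / dunning

# Primary card declines, backup card succeeds
used = settle_fare(1850, ["visa-1234", "mc-9876"],
                   lambda method, amount: method == "mc-9876")
print(used)  # mc-9876
```

A `None` result would kick off the grace-period flow (retry later, then restrict new trips) rather than clawing anything back from the driver.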
Phase 6: Monitoring and Operations
Uber Monitoring Dashboard

Marketplace health:
- Trips completing/minute: ~21,000 (current: 20,834)
- Average ETA accuracy: 94.7%
- Match success rate: 98.2%
- Average surge: 1.15x

Latency:
- Match time (p50): 1.8s (on target)
- Match time (p99): 4.2s (warning)
- Location update processing: 45ms (healthy)
- ETA calculation: 180ms (healthy)

Supply-demand:
- Online drivers: 1.2M
- Active trips: 890K
- Pending requests: 12K
- Surge zones (>1.5x): 847

Infrastructure:
- Kafka lag: < 100ms (healthy)
- Redis hit rate: 99.2% (healthy)
- API error rate: 0.02% (healthy)
- WebSocket connections: 45M
Alerting Strategy
CRITICAL (PagerDuty):
- Match success rate < 90%
- Trip completion rate < 95%
- Location service latency > 500ms
- Payment processing failures > 1%
WARNING (Slack):
- Surge zones > 2000 (unusual demand)
- Driver online rate dropping > 10%
- ETA accuracy < 90%
- Kafka consumer lag > 1 minute
INFO (Dashboard):
- Regional traffic patterns
- Driver earnings distribution
- New city launch metrics
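The tiered thresholds above map naturally to a small rule table evaluated per metric. A minimal sketch (metric names and the rule-table layout are illustrative, the thresholds mirror the strategy listed above):

```python
# Illustrative thresholds mirroring the alerting strategy above
CRITICAL_RULES = {
    "match_success_rate": lambda v: v < 0.90,
    "trip_completion_rate": lambda v: v < 0.95,
    "location_latency_ms": lambda v: v > 500,
    "payment_failure_rate": lambda v: v > 0.01,
}
WARNING_RULES = {
    "surge_zones": lambda v: v > 2000,
    "eta_accuracy": lambda v: v < 0.90,
    "kafka_lag_s": lambda v: v > 60,
}

def alert_level(metric: str, value: float) -> str:
    """Classify a metric reading into the three alerting tiers."""
    if metric in CRITICAL_RULES and CRITICAL_RULES[metric](value):
        return "CRITICAL"   # page on-call via PagerDuty
    if metric in WARNING_RULES and WARNING_RULES[metric](value):
        return "WARNING"    # post to Slack
    return "INFO"           # dashboard only

print(alert_level("match_success_rate", 0.88))  # CRITICAL
print(alert_level("surge_zones", 2500))         # WARNING
```

Keeping the rules as data rather than code makes it easy to tune thresholds per region without a deploy.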
Interview Conclusion
Interviewer: "Excellent work. You've covered the geospatial system, pricing, and matching really well. Any questions?"
You: "Thank you! I'm curious about how Uber handles the transition when a driver crosses city boundariesβdoes the surge pricing change mid-trip?"
Interviewer: "Great question. The fare is locked at request time based on pickup location. If surge changes during the trip, the rider pays what was quoted upfront. We introduced upfront pricing specifically to avoid surprises."
Summary: Concepts Applied
Week 1 (Data at Scale):
- H3 partitioning: geospatial sharding by hexagon
- Geographic sharding: city-level data isolation
- Hot key handling: high-demand cells get dedicated capacity

Week 2 (Failure-First Design):
- Timeouts: 15s driver response window
- Retries: automatic failover to the next driver
- Circuit breakers: protect downstream services

Week 3 (Messaging & Async):
- Kafka: trillions of events per day
- Event-driven: all state changes published as events
- Real-time processing: Flink for surge calculation

Week 4 (Caching):
- Redis: driver locations for sub-ms lookups
- ETA cache: avoid redundant routing calculations
- Surge cache: 30-second TTL for stability

Week 5 (Consistency):
- Strong: trip state (Spanner)
- Eventual: driver locations, surge pricing
- Optimistic locking: prevent double-dispatch

Week 8 (Analytics Pipeline):
- Real-time: surge pricing, fraud detection
- Batch: ML model training, demand forecasting
- ML integration: DeepETA, demand prediction

Week 10 (Production Readiness):
- SLOs: < 3s match, 95% ETA accuracy
- Multi-region: active-active deployment
- Observability: real-time marketplace health
Why Uber Is an Engineering Marvel

Real-time scale:
- 30 million trips matched daily
- Millions of location updates per second
- Sub-3-second matching latency

Innovation:
- H3: open-sourced hexagonal geospatial indexing system
- DeepETA: ML-based ETA model (~95% accuracy)
- Dynamic pricing: real-time market equilibrium

Global complexity:
- 10,000+ cities with local regulations
- Multiple products: Rides, Eats, Freight
- 7.4 million independent driver-partners

Data platform:
- 200,000+ data pipelines
- Petabytes processed daily
- Thousands of ML models in production
Sources
Official Uber Engineering:
- Uber Engineering Blog: https://www.uber.com/blog/engineering/
- H3 Geospatial Indexing: https://www.uber.com/blog/h3/
- Fulfillment Platform Re-architecture: https://www.uber.com/blog/fulfillment-platform-rearchitecture/
- Tech Stack Foundation: https://www.uber.com/blog/tech-stack-part-one-foundation/
H3 Documentation:
- H3 GitHub: https://github.com/uber/h3
- H3 Documentation: https://h3geo.org/
Architecture:
- Astronomer - Uber's 200K Data Pipelines: https://www.astronomer.io/blog/airflow-in-action-uber/
- ByteByteGo - Uber System Design: https://blog.bytebytego.com/
Pricing:
- Uber Marketplace Surge Pricing: https://www.uber.com/us/en/marketplace/pricing/surge-pricing/
- Uber Dynamic Pricing: https://www.uber.com/en-GB/blog/uber-dynamic-pricing/
Self-Assessment Checklist
After studying this case study, you should be able to:
- Explain why hexagons are superior to squares for geospatial grids
- Design an H3-based location index with O(1) lookups
- Implement a surge pricing algorithm with demand-supply balancing
- Design a driver-rider matching system with weighted scoring
- Calculate ETAs combining routing, traffic, and ML models
- Handle edge cases: GPS drift, mass events, driver cancellations
- Design for global scale with geographic sharding
- Implement real-time location tracking with WebSockets
- Balance consistency needs (strong for trips, eventual for locations)
- Build observability for a real-time marketplace
- Articulate trade-offs in two-sided marketplace design
This case study covers Uber's architecture for matching millions of riders with drivers in real time across 10,000+ cities worldwide.