Himanshu Kukreja

Bonus Problem 4: Visa/Mastercard Global Payment Network

The Invisible Infrastructure Moving $28 Trillion Annually


🎯 65,000 Transactions Per Second: Every Second, Every Day

Imagine building a system where a farmer in rural Nebraska can swipe a card at 2 AM, and within 2 seconds, a bank in Tokyo has approved the transaction, a merchant in Dubai has been guaranteed payment, and fraud detection AI has analyzed 500+ risk attributes, all while maintaining 99.9999% uptime.

Now imagine this happens 65,000 times. Per second. Across 200+ countries. In 160+ currencies. With $0 tolerance for double-charging.

This is VisaNet and Mastercard's payment network, the largest real-time financial system ever built, processing over $28 trillion annually and touching nearly every card transaction on Earth.

┌──────────────────────────────────────────────────────────────────────────┐
│                                                                          │
│                  THE VISA/MASTERCARD SCALE (2025)                        │
│                                                                          │
│   VOLUME                                                                 │
│   ──────                                                                 │
│   Visa Transactions:          293 Billion/year (FY2024)                  │
│   Mastercard Transactions:    197 Billion/year (FY2024)                  │
│   Combined Daily Volume:      1.3+ Billion transactions/day              │
│   Peak Capacity:              65,000+ messages/second                    │
│                                                                          │
│   SCALE                                                                  │
│   ─────                                                                  │
│   Visa Cards:                 4.8 Billion credentials globally           │
│   Mastercard Cards:           3.16 Billion cards worldwide               │
│   Merchant Locations:         150+ Million (Visa)                        │
│   Countries:                  200+ territories                           │
│   Currencies:                 160+ supported                             │
│                                                                          │
│   PERFORMANCE                                                            │
│   ───────────                                                            │
│   Authorization Latency:      < 2 seconds end-to-end                     │
│   Fraud Detection:            < 1 millisecond per transaction            │
│   Network Uptime:             99.9999% (six 9s)                          │
│   Fraud Prevented (Visa):     $40+ Billion/year                          │
│                                                                          │
│   FINANCIAL                                                              │
│   ─────────                                                              │
│   Visa Payment Volume:        $16+ Trillion/year                         │
│   Combined Volume:            $28+ Trillion/year                         │
│   Visa Revenue:               $35.9 Billion (FY2024)                     │
│   Mastercard Revenue:         $28.2 Billion (FY2024)                     │
│                                                                          │
│   INFRASTRUCTURE                                                         │
│   ──────────────                                                         │
│   Visa Data Centers:          7 globally synchronized                    │
│   Security Events/Day:        22 Billion monitored                       │
│   AI Risk Attributes:         500+ analyzed per transaction              │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

This is the system we'll design today, and along the way we'll see how the world's commerce backbone actually works.


The Interview Begins

You walk into the interview room. The interviewer smiles and gestures to the whiteboard.

Interviewer: "Thanks for coming in. Today we're going to design a global card payment network, something like Visa or Mastercard. I want to understand how you'd architect a system that can authorize a payment in under 2 seconds, anywhere in the world, while preventing billions in fraud. Ready?"

They write on the whiteboard:

╔══════════════════════════════════════════════════════════════════════════╗
║                                                                          ║
║                    Design a Global Payment Network                       ║
║                                                                          ║
║   A credit/debit card network connecting cardholders, merchants,         ║
║   issuing banks, and acquiring banks globally.                           ║
║                                                                          ║
║   Requirements:                                                          ║
║   • Process 50,000+ transactions per second peak                         ║
║   • Authorization response in < 2 seconds                                ║
║   • Real-time fraud detection with < 0.1% false positive rate            ║
║   • Support 200+ countries and 160+ currencies                           ║
║   • 99.999%+ availability (five 9s minimum)                              ║
║   • Never double-charge, never lose transactions                         ║
║                                                                          ║
╚══════════════════════════════════════════════════════════════════════════╝

Interviewer: "Take a few minutes to think about this, then walk me through your approach. We have about 45 minutes."


Phase 1: Requirements Clarification (5 minutes)

Before diving in, you take a breath and start asking questions. This is crucial: never assume.

Your Questions

You: "Before I start designing, I'd like to clarify a few requirements. First, can you describe the basic flow? Who are the participants in a card transaction?"

Interviewer: "Great question. It's a four-party model: the cardholder who swipes their card, the merchant who accepts the payment, the acquiring bank that works with the merchant, and the issuing bank that issued the card to the cardholder. Our network sits in the middle, routing messages between the acquirer and issuer."

You: "So the network doesn't hold funds or issue cards; it's purely a message routing and processing layer?"

Interviewer: "Exactly. We route authorization requests, handle clearing and settlement, and provide value-added services like fraud detection. The actual money movement happens between banks."

You: "What's the latency budget for authorization? And how is it distributed?"

Interviewer: "End-to-end from card swipe to response should be under 2 seconds. But the merchant terminal, acquirer systems, and issuer systems all consume time. Our network portion should be well under 200 milliseconds, ideally much faster."

You: "For fraud detection, how much latency can we add to the critical path?"

Interviewer: "It must be inline with authorization, adding no more than 1-2 milliseconds. We can't slow down legitimate transactions."

You: "What happens if the issuing bank is unreachable during authorization?"

Interviewer: "Good question. We need 'stand-in processing': the ability to make authorization decisions on behalf of an unreachable issuer within pre-agreed parameters."

You: "One more: what's the settlement timeline?"

Interviewer: "Clearing happens within hours of the transaction. Settlement (the actual money movement) typically happens T+1 to T+2 through banking systems. But that's separate from the real-time authorization path."

Functional Requirements

1. AUTHORIZATION (Real-time)
   • Route authorization requests from acquirer to issuer
   • Return approve/decline within 200ms network time
   • Support stand-in processing when issuer unreachable
   • Handle multiple card types: credit, debit, prepaid

2. CLEARING (Near real-time)
   • Collect transaction details from acquirers
   • Calculate interchange fees per transaction
   • Prepare settlement positions for all parties
   • Handle adjustments, chargebacks, reversals

3. SETTLEMENT (Batch)
   • Calculate net positions between all banks
   • Facilitate fund transfers between banks
   • Generate settlement reports and reconciliation

4. FRAUD DETECTION (Real-time)
   • Score every transaction for fraud risk
   • Provide risk score to issuer for decision
   • Detect patterns: velocity, geography, merchant type
   • Block enumeration attacks (card testing)

5. VALUE-ADDED SERVICES
   • Tokenization for secure card storage
   • 3D Secure for online authentication
   • Currency conversion
   • Loyalty and rewards integration

Non-Functional Requirements

1. SCALE
   • 65,000 transactions per second peak capacity
   • ~490 billion transactions per year
   • 200+ countries, 160+ currencies
   • 15,000+ connected financial institutions (banks, processors)

2. LATENCY
   • Authorization: < 200ms network portion (p99)
   • Fraud scoring: < 1ms per transaction
   • End-to-end: < 2 seconds including all parties

3. AVAILABILITY
   • 99.9999% uptime (about 32 seconds of downtime per year at most)
   • Zero data loss for financial transactions
   • Geographic redundancy across continents

4. CONSISTENCY
   • Exactly-once processing for authorizations
   • No double-charging under any failure scenario
   • Full audit trail for every transaction

5. SECURITY
   • PCI-DSS Level 1 compliance
   • End-to-end encryption
   • Hardware security modules for keys
   • Real-time threat detection
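
The exactly-once and no-double-charge requirements are often approached with idempotent processing: a retried request carrying the same key gets the stored outcome back instead of being processed a second time. The sketch below is a minimal in-memory illustration, not how any real network implements it. The key fields (acquirer ID, terminal ID, trace number, transmission time) and the names `AuthRequest` and `IdempotentAuthorizer` are assumptions made for the example; a production system would keep the results in a durable, replicated store.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str, str]


@dataclass(frozen=True)
class AuthRequest:
    acquirer_id: str
    terminal_id: str
    trace_number: str     # per-terminal sequence number (assumed field)
    transmitted_at: str   # MMDDhhmmss timestamp (assumed field)
    amount_minor: int     # amount in minor units, e.g. cents

    def idempotency_key(self) -> Key:
        # The amount is deliberately excluded: a retry must match on identity.
        return (self.acquirer_id, self.terminal_id,
                self.trace_number, self.transmitted_at)


class IdempotentAuthorizer:
    """Wraps a decision function so each key is decided at most once."""

    def __init__(self, decide: Callable[[AuthRequest], str]):
        self._decide = decide
        self._results: Dict[Key, str] = {}   # in-memory only, for illustration

    def authorize(self, request: AuthRequest) -> str:
        key = request.idempotency_key()
        if key in self._results:
            return self._results[key]        # replay the stored outcome
        result = self._decide(request)
        self._results[key] = result
        return result
```

Calling `authorize` twice with the same request invokes the wrapped decision function once; the second call only reads the stored result.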

Phase 2: Back of the Envelope Estimation (5 minutes)

You: "Let me work through the numbers to understand the scale we're dealing with."

Transaction Volume Estimation

TRANSACTION VOLUME

Base numbers (2024/2025 actual):
  Visa annual transactions:           293 billion
  Mastercard annual transactions:     197 billion
  Combined:                           ~490 billion/year

Daily calculation:
  490B / 365 days =                   ~1.34 billion/day
  Per hour average:                   ~56 million/hour
  Per second average:                 ~15,500/second

Peak calculation:
  Peak events: Black Friday, Cyber Monday, Singles Day
  Peak multiplier:                    ~4x average
  Peak TPS:                           ~62,000/second
  Design capacity (headroom):         ~100,000/second
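
The arithmetic above can be reproduced in a few lines. This is only a sanity check of the estimate; the function name `traffic_estimate` is made up for the example, and the 4x peak multiplier is the same assumption used in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def traffic_estimate(annual_transactions: float, peak_multiplier: float = 4.0) -> dict:
    """Turn an annual transaction count into daily, hourly, and per-second rates."""
    per_day = annual_transactions / 365
    avg_tps = per_day / SECONDS_PER_DAY
    return {
        "per_day": per_day,
        "per_hour": per_day / 24,
        "avg_tps": avg_tps,
        "peak_tps": avg_tps * peak_multiplier,
    }


# Visa + Mastercard annual transaction counts from the table above.
estimate = traffic_estimate(293e9 + 197e9)
for name, value in estimate.items():
    print(f"{name:>9}: {value:,.0f}")
```

The result lands at roughly 1.34 billion per day, about 15.5K per second on average, and about 62K per second at peak, which is why the design capacity is rounded up to 100K.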

Message Size Estimation

ISO 8583 MESSAGE SIZE

Authorization request fields:
  Message Type Indicator:             4 bytes
  Bitmap:                             16 bytes
  Primary Account Number (PAN):       19 bytes
  Processing Code:                    6 bytes
  Transaction Amount:                 12 bytes
  Transmission DateTime:              10 bytes
  Merchant Category Code:             4 bytes
  Acquiring Institution ID:           11 bytes
  Card Acceptor Terminal ID:          8 bytes
  Additional data fields:             ~200 bytes

Typical message size:                 300-500 bytes
With encryption overhead:             ~600 bytes

Bandwidth calculation:
  100K TPS × 600 bytes × 2 (req+resp)
  =                                   ~120 MB/second
  =                                   ~1 Gbps per data center
  With replication (3x):              ~3 Gbps network capacity
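
As a quick check of the bandwidth figures, the same calculation in code. The helper name `bandwidth_gbps` is hypothetical, and decimal units (1 Gbps = 10^9 bits per second) are assumed.

```python
def bandwidth_gbps(tps: float, message_bytes: int,
                   messages_per_txn: int = 2, replication_factor: int = 1) -> float:
    """Sustained bandwidth in Gbps for a given message rate and size."""
    bytes_per_second = tps * message_bytes * messages_per_txn * replication_factor
    return bytes_per_second * 8 / 1e9


single = bandwidth_gbps(100_000, 600)                           # request + response
replicated = bandwidth_gbps(100_000, 600, replication_factor=3)
print(f"per data center: {single:.2f} Gbps, with 3x replication: {replicated:.2f} Gbps")
```

This gives 0.96 Gbps per data center and 2.88 Gbps with 3x replication, matching the rounded ~1 Gbps and ~3 Gbps figures.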

Storage Estimation

STORAGE FOR TRANSACTION RECORDS

Per transaction stored:
  Authorization record:               ~1 KB
  Clearing record:                    ~2 KB
  Settlement details:                 ~500 bytes
  Total per transaction:              ~3.5 KB

Daily storage:
  1.34B transactions × 3.5 KB =       ~4.7 TB/day

Annual storage:
  4.7 TB × 365 =                      ~1.7 PB/year

With 7-year retention:                ~12 PB
With replication (3x):                ~36 PB total storage
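
The storage numbers follow the same pattern. A small sketch, assuming decimal units (1 KB = 1,000 bytes) and a hypothetical helper named `storage_estimate`:

```python
TB = 1e12
PB = 1e15


def storage_estimate(txns_per_day: float, bytes_per_txn: float,
                     retention_years: int = 7, replication: int = 3) -> dict:
    """Daily, yearly, retained, and replicated storage for transaction records."""
    daily_bytes = txns_per_day * bytes_per_txn
    yearly_bytes = daily_bytes * 365
    retained_bytes = yearly_bytes * retention_years
    return {
        "daily_tb": daily_bytes / TB,
        "yearly_pb": yearly_bytes / PB,
        "retained_pb": retained_bytes / PB,
        "replicated_pb": retained_bytes * replication / PB,
    }


# 1.34 billion transactions/day at ~3.5 KB each (auth + clearing + settlement).
storage = storage_estimate(1.34e9, 3_500)
print({name: round(value, 2) for name, value in storage.items()})
```

The results come out at about 4.7 TB per day, 1.7 PB per year, 12 PB over seven years, and 36 PB once replicated three ways.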

Key Metrics Summary

┌──────────────────────────────────────────────────────────────────────────┐
│                    ESTIMATION SUMMARY                                    │
│                                                                          │
│  TRAFFIC                                                                 │
│  ├── Peak transactions:       100,000 /second (design capacity)          │
│  ├── Daily transactions:      1.34 billion                               │
│  └── Annual transactions:     490 billion                                │
│                                                                          │
│  STORAGE                                                                 │
│  ├── Per day:                 4.7 TB                                     │
│  ├── Per year:                1.7 PB                                     │
│  └── 7-year retention:        ~36 PB (with replication)                  │
│                                                                          │
│  BANDWIDTH                                                               │
│  ├── Per data center:         ~1 Gbps sustained                          │
│  └── Peak burst:              ~5 Gbps                                    │
│                                                                          │
│  INFRASTRUCTURE (rough)                                                  │
│  ├── Data centers:            4-7 globally synchronized                  │
│  ├── Connected banks:         15,000+ financial institutions             │
│  └── Network coverage:        10+ million miles of telecom               │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Phase 3: High-Level Design (10 minutes)

You: "Now let me sketch out the high-level architecture. The key insight is that this is a four-party model with the payment network in the center."

The Four-Party Model

┌──────────────────────────────────────────────────────────────────────────┐
│                    THE FOUR-PARTY MODEL                                  │
│                                                                          │
│                                                                          │
│   CARDHOLDER                                         MERCHANT            │
│   ┌─────────────┐                                   ┌─────────────┐      │
│   │             │   1. Swipes/Taps card             │             │      │
│   │   Consumer  │ ────────────────────────────────▶ │   Retailer  │      │
│   │             │                                   │             │      │
│   └──────┬──────┘                                   └──────┬──────┘      │
│          │                                                 │             │
│          │ Has card from                                   │ Has account │
│          │                                                 │ with        │
│          ▼                                                 ▼             │
│   ┌─────────────┐                                   ┌─────────────┐      │
│   │             │   4. Settlement ($$$)             │             │      │
│   │   ISSUING   │ ────────────────────────────────▶ │  ACQUIRING  │      │
│   │    BANK     │                                   │    BANK     │      │
│   │             │                                   │             │      │
│   └──────┬──────┘                                   └──────┬──────┘      │
│          │                                                 │             │
│          │ 3. Auth Response                2. Auth Request │             │
│          │                                                 │             │
│          │         ┌─────────────────────┐                 │             │
│          └────────▶│                     │◀────────────────┘             │
│                    │   PAYMENT NETWORK   │                               │
│                    │   (Visa/Mastercard) │                               │
│                    │                     │                               │
│                    └─────────────────────┘                               │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

System Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                         HIGH-LEVEL ARCHITECTURE                         │
│                                                                         │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                    ACQUIRING SIDE                                 │  │
│  │                                                                   │  │
│  │   ┌─────────┐    ┌─────────┐    ┌─────────────────┐               │  │
│  │   │ POS     │───▶│ Acquirer│───▶│ Acquirer        │               │  │
│  │   │Terminal │    │Processor│    │ Gateway         │               │  │
│  │   └─────────┘    └─────────┘    └────────┬────────┘               │  │
│  └──────────────────────────────────────────┼────────────────────────┘  │
│                                             │                           │
│                                             ▼                           │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                    PAYMENT NETWORK CORE                           │  │
│  │                                                                   │  │
│  │   ┌─────────────────────────────────────────────────────────────┐ │  │
│  │   │                   Message Gateway                           │ │  │
│  │   │              (ISO 8583 Protocol Handler)                    │ │  │
│  │   └─────────────────────────┬───────────────────────────────────┘ │  │
│  │                             │                                     │  │
│  │              ┌──────────────┼──────────────┐                      │  │
│  │              ▼              ▼              ▼                      │  │
│  │   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐               │  │
│  │   │   Router/   │  │   Fraud     │  │   Stand-In  │               │  │
│  │   │   Switch    │  │  Detection  │  │  Processing │               │  │
│  │   └──────┬──────┘  └──────┬──────┘  └─────────────┘               │  │
│  │          │                │                                       │  │
│  │          ▼                ▼                                       │  │
│  │   ┌─────────────────────────────────────────────────────────────┐ │  │
│  │   │              Authorization Processing                       │ │  │
│  │   │         (Transaction Validation, Enrichment)                │ │  │
│  │   └─────────────────────────┬───────────────────────────────────┘ │  │
│  │                             │                                     │  │
│  │   ┌─────────────────────────┴───────────────────────────────────┐ │  │
│  │   │                                                             │ │  │
│  │   │   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐    │ │  │
│  │   │   │PostgreSQL│  │ Redis    │  │ Kafka    │  │ Analytics│    │ │  │
│  │   │   │ (Records)│  │ (Cache)  │  │ (Events) │  │ (OLAP)   │    │ │  │
│  │   │   └──────────┘  └──────────┘  └──────────┘  └──────────┘    │ │  │
│  │   │                    Data Layer                               │ │  │
│  │   └─────────────────────────────────────────────────────────────┘ │  │
│  │                                                                   │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                             │                           │
│                                             ▼                           │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                    ISSUING SIDE                                   │  │
│  │                                                                   │  │
│  │   ┌─────────────────┐    ┌─────────┐    ┌─────────┐               │  │
│  │   │ Issuer          │───▶│ Issuer  │───▶│ Core    │               │  │
│  │   │ Gateway         │    │Processor│    │ Banking │               │  │
│  │   └─────────────────┘    └─────────┘    └─────────┘               │  │
│  │                                                                   │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Component Breakdown

You: "Let me walk through each major component..."

1. Message Gateway

Purpose: Protocol handling for ISO 8583 messages from connected banks and processors.

Key responsibilities:

  • Parse and validate incoming ISO 8583 messages
  • Handle multiple message versions and network variations
  • Encrypt/decrypt sensitive data fields
  • Route to appropriate internal services

Technology choice: Custom high-performance message parser, likely C++ or specialized hardware for latency-critical path.
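
To make the parsing responsibility concrete: an ISO 8583 message carries a bitmap that tells the parser which data elements follow. The sketch below builds and reads only the 8-byte primary bitmap (fields 2-64) in binary form; real deployments also use a secondary bitmap for fields 65-128 and vary in encoding (binary, hex, BCD) by network, so treat this as a simplified illustration. The function names are made up for the example.

```python
from typing import Iterable, List


def build_primary_bitmap(fields: Iterable[int]) -> bytes:
    """Set one bit per present data element; field 1 is the most significant bit."""
    bitmap = 0
    for field in fields:
        if not 2 <= field <= 64:
            raise ValueError(f"field {field} is outside the primary bitmap (2-64)")
        bitmap |= 1 << (64 - field)
    return bitmap.to_bytes(8, "big")


def fields_present(bitmap: bytes) -> List[int]:
    """Return the data element numbers whose bits are set in an 8-byte bitmap."""
    if len(bitmap) != 8:
        raise ValueError("primary bitmap must be exactly 8 bytes")
    value = int.from_bytes(bitmap, "big")
    return [field for field in range(1, 65) if value & (1 << (64 - field))]


# Data elements from the size estimate: PAN (2), processing code (3),
# amount (4), transmission date/time (7), merchant type (18),
# acquiring institution ID (32), terminal ID (41).
auth_fields = [2, 3, 4, 7, 18, 32, 41]
bitmap = build_primary_bitmap(auth_fields)
print(bitmap.hex())             # 7200400100800000
print(fields_present(bitmap))   # [2, 3, 4, 7, 18, 32, 41]
```

A gateway would read the bitmap first and then decode each present field by its fixed or length-prefixed format, rejecting messages whose bitmap and body disagree.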

2. Router/Switch

Purpose: Route authorization requests to the correct issuing bank based on card BIN (Bank Identification Number).

Key responsibilities:

  • BIN lookup to identify issuing bank
  • Select optimal route (primary, fallback)
  • Load balance across issuer connections
  • Detect and route around failures
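
The routing responsibilities above can be sketched as a longest-prefix lookup over an in-memory BIN table, followed by a walk through the issuer's endpoints in priority order. This is a simplified illustration: the BIN values, issuer IDs, and endpoint names are invented, and `BinRouter` is a hypothetical class, not a real network component.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class IssuerRoute:
    issuer_id: str
    endpoints: Tuple[str, ...]   # ordered: primary first, then fallbacks


class BinRouter:
    """Longest-prefix BIN lookup over a plain dict (assumes 8- and 6-digit BINs)."""

    def __init__(self, table: Dict[str, IssuerRoute],
                 prefix_lengths: Tuple[int, ...] = (8, 6)):
        self._table = table
        self._prefix_lengths = sorted(prefix_lengths, reverse=True)

    def lookup(self, pan: str) -> Optional[IssuerRoute]:
        for length in self._prefix_lengths:      # try the longest prefix first
            route = self._table.get(pan[:length])
            if route is not None:
                return route
        return None

    def select_endpoint(self, pan: str, healthy: Set[str]) -> Optional[str]:
        """Pick the first healthy endpoint, or None so the caller can stand in."""
        route = self.lookup(pan)
        if route is None:
            return None
        for endpoint in route.endpoints:
            if endpoint in healthy:
                return endpoint
        return None


router = BinRouter({
    "400000": IssuerRoute("bank-a", ("a-primary", "a-backup")),
    "40000012": IssuerRoute("bank-b", ("b-primary",)),
})
print(router.lookup("4000001234567890").issuer_id)                   # bank-b
print(router.select_endpoint("4000009999999999", {"a-backup"}))      # a-backup
```

Returning `None` when no endpoint is healthy keeps the router simple and leaves the stand-in decision to a separate component.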

3. Fraud Detection Engine

Purpose: Score every transaction for fraud risk in real-time.

Key responsibilities:

  • Analyze 500+ risk attributes per transaction
  • Return risk score within 1 millisecond
  • Feed scores to issuer for decision support
  • Detect velocity patterns, geographic anomalies
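
Velocity is one of the simpler signals to illustrate. The sketch below counts transactions per card token inside a sliding time window and maps the count to a crude risk value. It shows a single hand-written feature, not the 500+ attribute model described above; the 60-second window, the threshold of 5, and the names `VelocityCounter` and `velocity_risk` are all assumptions for the example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class VelocityCounter:
    """Counts events per card token in a sliding window (timestamps in seconds).

    Assumes timestamps arrive in non-decreasing order for each token.
    """

    def __init__(self, window_seconds: float = 60.0):
        self._window = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, card_token: str, timestamp: float) -> int:
        events = self._events[card_token]
        events.append(timestamp)
        # Drop events that have fallen out of the window.
        while events and events[0] <= timestamp - self._window:
            events.popleft()
        return len(events)


def velocity_risk(count: int, threshold: int = 5) -> float:
    """Map a window count to a 0.0-1.0 risk value, saturating at the threshold."""
    return min(1.0, count / threshold)


counter = VelocityCounter(window_seconds=60.0)
for t in (0.0, 10.0, 59.0, 61.0):
    count = counter.record("tok_123", t)
    print(t, count, velocity_risk(count))
```

Because each update touches only a small in-memory deque, a feature like this fits comfortably inside a sub-millisecond scoring budget.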

4. Stand-In Processing

Purpose: Make authorization decisions when issuer is unreachable.

Key responsibilities:

  • Maintain issuer-defined rules and limits
  • Track card-level velocity and spending
  • Approve/decline within pre-agreed parameters
  • Queue transactions for later issuer reconciliation
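
A minimal sketch of these responsibilities, assuming each issuer has registered just two limits: a per-transaction ceiling and a per-card daily total. Real stand-in parameters are richer and issuer-specific; `StandInRules`, `StandInProcessor`, and the limit values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StandInRules:
    max_amount_minor: int        # per-transaction ceiling, in minor units
    max_daily_total_minor: int   # per-card cumulative ceiling for the day


@dataclass
class StandInProcessor:
    rules: Dict[str, StandInRules]
    daily_totals: Dict[str, int] = field(default_factory=dict)
    pending_advice: List[Tuple[str, str, int, str]] = field(default_factory=list)

    def decide(self, issuer_id: str, card_token: str, amount_minor: int) -> str:
        issuer_rules = self.rules.get(issuer_id)
        decision = "DECLINE"                      # no agreed rules means decline
        if issuer_rules is not None:
            spent = self.daily_totals.get(card_token, 0)
            within_limits = (
                amount_minor <= issuer_rules.max_amount_minor
                and spent + amount_minor <= issuer_rules.max_daily_total_minor
            )
            if within_limits:
                self.daily_totals[card_token] = spent + amount_minor
                decision = "APPROVE"
        # Every stand-in decision is queued so the issuer can reconcile later.
        self.pending_advice.append((issuer_id, card_token, amount_minor, decision))
        return decision


processor = StandInProcessor({"bank-a": StandInRules(10_000, 15_000)})
print(processor.decide("bank-a", "tok_1", 8_000))   # APPROVE
print(processor.decide("bank-a", "tok_1", 8_000))   # DECLINE (daily total exceeded)
```

Declining by default when an issuer has no rules on file keeps the network from taking on risk the issuer never agreed to.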

5. Clearing & Settlement Engine

Purpose: Process non-real-time clearing and facilitate settlement.

Key responsibilities:

  • Collect and validate clearing records
  • Calculate interchange fees per transaction
  • Net settlement positions across banks
  • Generate settlement files and reports
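
Netting is the core of the settlement step: instead of moving money per transaction, the network sums what each bank owes and is owed. The sketch below assumes a single currency, interchange already computed per record and retained by the issuer, and no network fees; `ClearingRecord` and `net_positions` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ClearingRecord:
    issuer: str
    acquirer: str
    amount_minor: int        # purchase amount in minor units
    interchange_minor: int   # fee kept by the issuer (assumed pre-computed)


def net_positions(records: Iterable[ClearingRecord]) -> Dict[str, int]:
    """Net amount per bank: negative means the bank pays in, positive means it receives."""
    positions: Dict[str, int] = defaultdict(int)
    for record in records:
        transfer = record.amount_minor - record.interchange_minor
        positions[record.issuer] -= transfer
        positions[record.acquirer] += transfer
    return dict(positions)


records = [
    ClearingRecord("bank-a", "bank-b", 10_000, 150),   # bank-a cardholder at bank-b merchant
    ClearingRecord("bank-b", "bank-a", 4_000, 60),     # and the reverse
]
print(net_positions(records))   # {'bank-a': -5910, 'bank-b': 5910}
```

The positions always sum to zero, which makes a cheap reconciliation check before settlement files are generated.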

Data Flow

You: "Let me trace through a typical authorization flow..."

AUTHORIZATION FLOW (< 2 seconds end-to-end)

Step 1: Card Presented (0-200ms)
        Cardholder ──▶ POS Terminal ──▶ Acquirer Processor
        • Card data captured
        • Offline PIN verified by the chip, if used (CVV and online PIN are checked by the issuer)
        • ISO 8583 message constructed

Step 2: Acquirer to Network (200-400ms)
        Acquirer ──▶ Message Gateway ──▶ Router
        • Message validated and parsed
        • BIN lookup to identify issuer
        • Transaction enriched (merchant data)

Step 3: Fraud Scoring (400-401ms)
        Router ──▶ Fraud Engine ──▶ Router
        • 500+ attributes analyzed
        • Risk score calculated
        • Score attached to message

Step 4: Issuer Authorization (401-1400ms)
        Router ──▶ Issuer Gateway ──▶ Issuer Core Banking
        • Credit limit checked
        • Fraud rules evaluated
        • Authorize/Decline decision

Step 5: Response Return (1400-1800ms)
        Issuer ──▶ Network ──▶ Acquirer ──▶ POS
        • Auth code generated
        • Response transmitted back
        • Receipt printed

Step 6: Clearing (Later, batch)
        Acquirer ──▶ Network ──▶ Issuer
        • Final transaction details
        • Interchange calculated
        • Disputes window opens

Step 7: Settlement (T+1 to T+2)
        Network calculates net positions
        Banks transfer funds
        Merchant credited
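
The real-time part of the flow (steps 1-5) can be checked against the 2-second budget with a few lines. The step windows are copied from the flow above; the helper names are made up for the example.

```python
from typing import Dict, List, Tuple

# (step name, start ms, end ms) for the real-time steps 1-5.
STEP_WINDOWS_MS: List[Tuple[str, int, int]] = [
    ("card presented", 0, 200),
    ("acquirer to network", 200, 400),
    ("fraud scoring", 400, 401),
    ("issuer authorization", 401, 1400),
    ("response return", 1400, 1800),
]


def step_durations(steps: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Duration of each step in milliseconds."""
    return {name: end - start for name, start, end in steps}


def budget_headroom(steps: List[Tuple[str, int, int]], budget_ms: int = 2000) -> int:
    """Validate that the windows are contiguous and return the unused budget."""
    previous_end = 0
    for name, start, end in steps:
        if start != previous_end or end <= start:
            raise ValueError(f"step '{name}' does not follow the previous step")
        previous_end = end
    return budget_ms - previous_end


print(step_durations(STEP_WINDOWS_MS))
print("headroom (ms):", budget_headroom(STEP_WINDOWS_MS))
```

The issuer leg takes 999 ms of the 1,800 ms total, leaving 200 ms of headroom, which shows where most of the end-to-end latency sits.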

Phase 4: Deep Dives (20 minutes)

Interviewer: "Great high-level design. Let's dive deeper into a few areas. Tell me more about how you'd handle the real-time authorization at 65,000 TPS."


Deep Dive 1: Real-Time Authorization at Scale (Week 1-2 Concepts)

You: "This is the heart of the system. Let me explain how we achieve sub-200ms latency at 65,000 TPS."

The Problem

AUTHORIZATION LATENCY CHALLENGE

Without proper optimization:
  Network RTT to bank:              50-100ms
  Message parsing:                  10-20ms
  BIN lookup (if naive):            5-10ms
  Fraud scoring:                    50-100ms (if not optimized)
  Database writes:                  20-50ms
  Total:                            135-280ms in the network alone

But we need:
  ✓ End-to-end including acquirer + issuer: < 2 seconds
  ✓ Network portion only: < 200ms
  ✓ Fraud scoring: < 1ms
  ✓ Zero message loss

The Solution: Ultra-Low Latency Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                    LOW-LATENCY AUTHORIZATION PATH                       │
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                    NETWORK EDGE                                 │   │
│   │                                                                 │   │
│   │   Dedicated leased lines (not internet)                         │   │
│   │   MPLS VPN with predictable latency                             │   │
│   │   Multiple redundant paths per connection                       │   │
│   │                                                                 │   │
│   └───────────────────────────┬─────────────────────────────────────┘   │
│                               │                                         │
│                               ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                    MESSAGE PROCESSING                           │   │
│   │                                                                 │   │
│   │   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │   │
│   │   │ FPGA/ASIC    │───▶│ Memory-only  │───▶│ Pre-computed │      │   │
│   │   │ Parser       │    │ Processing   │    │ BIN Tables   │      │   │
│   │   └──────────────┘    └──────────────┘    └──────────────┘      │   │
│   │                                                                 │   │
│   │   • Zero-copy message handling                                  │   │
│   │   • BIN lookup in < 1μs (in-memory hash)                        │   │
│   │   • No disk I/O on critical path                                │   │
│   │                                                                 │   │
│   └───────────────────────────┬─────────────────────────────────────┘   │
│                               │                                         │
│                               ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                    ASYNC PERSISTENCE                            │   │
│   │                                                                 │   │
│   │   Authorization response returns BEFORE disk write              │   │
│   │   WAL ensures durability (Week 1: Write-ahead logs)             │   │
│   │   Async replication to standby data centers                     │   │
│   │                                                                 │   │
│   └─────────────────────────────────────────────────────────────────┘   │
β”‚                                                                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
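
To make the persistence stage more concrete, here is a minimal sketch of a group-commit log: concurrent appends are collected and flushed with a single fsync. The class name `GroupCommitLog`, the `max_batch` parameter, and the 4-byte length-prefix framing are illustrative assumptions, not details of any real payment network. Note one deliberate difference from the diagram: this sketch acknowledges an append only after the fsync completes, which is the conservative variant.

```python
import asyncio
import os


class GroupCommitLog:
    """Append-only log that batches concurrent appends into one fsync."""

    def __init__(self, path: str, max_batch: int = 64):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._pending = []          # list of (record bytes, completion future)
        self._max_batch = max_batch
        self._flusher = None        # background flush task, created on demand
        self.fsync_count = 0        # exposed so batching can be observed

    async def append(self, record: bytes) -> None:
        """Return once the record has been written and fsynced."""
        done = asyncio.get_running_loop().create_future()
        self._pending.append((record, done))
        if self._flusher is None or self._flusher.done():
            self._flusher = asyncio.create_task(self._flush())
        await done

    async def _flush(self) -> None:
        # Yield once so appends issued in the same loop iteration join the batch.
        await asyncio.sleep(0)
        while self._pending:
            batch = self._pending[: self._max_batch]
            del self._pending[: self._max_batch]
            # Length-prefix each record so a reader can split the log again.
            payload = b"".join(
                len(rec).to_bytes(4, "big") + rec for rec, _ in batch
            )
            try:
                # The blocking write and fsync run in a worker thread.
                await asyncio.to_thread(self._write_and_sync, payload)
            except OSError as exc:
                for _, done in batch:
                    if not done.done():
                        done.set_exception(exc)
                continue
            for _, done in batch:
                if not done.done():
                    done.set_result(None)

    def _write_and_sync(self, payload: bytes) -> None:
        os.write(self._fd, payload)
        os.fsync(self._fd)
        self.fsync_count += 1

    def close(self) -> None:
        os.close(self._fd)
```

Ten appends issued together should share one fsync instead of paying for ten. A design that answers before the sync, as the diagram describes, would return from `append` without awaiting the future and accept a small window of possible loss.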

Implementation

# Real-Time Authorization Service
# Applies: Week 1 (Partitioning), Week 2 (Timeouts, Idempotency)

from dataclasses import dataclass
from typing import Optional
import asyncio
import time
from enum import Enum


class AuthDecision(Enum):
    APPROVED = "00"
    DECLINED_INSUFFICIENT_FUNDS = "51"
    DECLINED_EXPIRED_CARD = "54"
    DECLINED_FRAUD = "59"
    SYSTEM_ERROR = "96"
    ISSUER_UNAVAILABLE = "91"


@dataclass
class AuthorizationRequest:
    """ISO 8583-based authorization request."""
    message_type: str  # "0100" for auth request
    pan: str  # Primary Account Number (card number)
    amount: int  # In smallest currency unit (cents)
    currency_code: str  # "840" for USD
    merchant_id: str
    terminal_id: str
    mcc: str  # Merchant Category Code
    transaction_id: str  # Unique ID for idempotency
    timestamp: float


@dataclass
class AuthorizationResponse:
    """Authorization response with decision."""
    transaction_id: str
    decision: AuthDecision
    auth_code: Optional[str] = None
    risk_score: Optional[int] = None
    processing_time_ms: float = 0


class BINLookupService:
    """
    Ultra-fast BIN lookup using in-memory hash table.
    
    BIN (Bank Identification Number) is the first 6-8 digits of the card number.
    It is used to route the request to the correct issuing bank.
    
    Applies: Week 1, Day 1 - Hash partitioning for O(1) lookup
    """
    
    def __init__(self):
        # Pre-loaded hash map of BIN -> Issuer routing info
        # In production: 500K+ BIN ranges loaded at startup
        self._bin_table: dict[str, IssuerRoute] = {}
        self._load_bin_table()
    
    def _load_bin_table(self):
        """Load BIN table into memory at startup."""
        # Example: Load from database into memory
        # Real system has 500K+ entries
        pass
    
    def lookup(self, pan: str) -> Optional['IssuerRoute']:
        """
        O(1) lookup of issuer routing information.
        
        Takes < 1 microsecond with in-memory hash.
        """
        # Try 8-digit BIN first, then 6-digit
        bin_8 = pan[:8]
        bin_6 = pan[:6]
        
        return self._bin_table.get(bin_8) or self._bin_table.get(bin_6)


@dataclass
class IssuerRoute:
    """Routing information for an issuer."""
    issuer_id: str
    primary_endpoint: str
    backup_endpoint: str
    timeout_ms: int
    supports_standin: bool


class AuthorizationService:
    """
    Core authorization service targeting < 200ms for the network's own
    processing (the issuer round trip is budgeted separately).
    
    Key optimizations:
    1. No disk I/O on critical path
    2. In-memory BIN lookup
    3. Async persistence after response
    4. Pre-established connections to issuers
    
    Applies:
    - Week 1, Day 1: Partitioning (BIN-based routing)
    - Week 2, Day 1: Timeout management
    - Week 2, Day 2: Idempotency keys
    """
    
    def __init__(
        self,
        bin_service: BINLookupService,
        fraud_service: 'FraudDetectionService',
        issuer_gateway: 'IssuerGateway',
        standin_service: 'StandInService',
        wal: 'WriteAheadLog'
    ):
        self.bin_service = bin_service
        self.fraud_service = fraud_service
        self.issuer_gateway = issuer_gateway
        self.standin_service = standin_service
        self.wal = wal
        
        # Idempotency cache (Week 2, Day 2)
        self._idempotency_cache: dict[str, AuthorizationResponse] = {}
    
    async def authorize(
        self, 
        request: AuthorizationRequest
    ) -> AuthorizationResponse:
        """
        Process authorization request with strict latency SLA.
        
        Target: < 200ms for network processing portion.
        """
        start_time = time.monotonic()
        
        # Step 1: Check idempotency (< 0.1ms)
        # Prevents double-charging on retries
        cached = self._idempotency_cache.get(request.transaction_id)
        if cached is not None:
            return cached
        
        # Step 2: Start the WAL append without blocking on it.
        # It runs concurrently with Steps 3-5 and is awaited in Step 8,
        # just before the response is returned.
        wal_future = asyncio.create_task(
            self.wal.append(request)
        )
        
        # Step 3: BIN lookup (< 0.001ms)
        route = self.bin_service.lookup(request.pan)
        if not route:
            # Let the WAL append finish so the task is not left dangling
            await wal_future
            return self._error_response(
                request, AuthDecision.SYSTEM_ERROR
            )
        
        # Step 4: Fraud scoring (< 1ms)
        # Runs before the issuer call because the score is sent with the request
        risk_score = await self.fraud_service.score(request)
        
        # Step 5: Route to issuer with timeout (variable, ~100-1000ms)
        try:
            response = await asyncio.wait_for(
                self.issuer_gateway.authorize(request, route, risk_score),
                timeout=route.timeout_ms / 1000.0
            )
        except asyncio.TimeoutError:
            # Issuer timeout - use stand-in processing
            if route.supports_standin:
                response = await self.standin_service.authorize(
                    request, risk_score
                )
            else:
                response = self._error_response(
                    request, AuthDecision.ISSUER_UNAVAILABLE
                )
        
        # Step 6: Calculate processing time
        processing_time = (time.monotonic() - start_time) * 1000
        response.processing_time_ms = processing_time
        
        # Step 7: Cache for idempotency. A production cache would expire
        # entries (e.g. 24-hour TTL); this plain dict never evicts.
        self._idempotency_cache[request.transaction_id] = response
        
        # Step 8: Ensure WAL write completed
        await wal_future
        
        return response
    
    def _error_response(
        self, 
        request: AuthorizationRequest,
        decision: AuthDecision
    ) -> AuthorizationResponse:
        return AuthorizationResponse(
            transaction_id=request.transaction_id,
            decision=decision
        )


class WriteAheadLog:
    """
    Write-ahead log for transaction durability.
    
    Applies: Week 1 - WAL for durability before processing.
    
    Key insight: We write to WAL but don't wait for sync
    before sending response. WAL ensures we can recover
    any in-flight transactions after crash.
    """
    
    async def append(self, request: AuthorizationRequest) -> None:
        """
        Append request to WAL.
        
        In production: Write to local SSD with group commit
        for batching multiple transactions per fsync.
        """
        # Serialize and write to durable storage
        pass


# =============================================================================
# Network Layer: ISO 8583 Message Handling
# =============================================================================

class ISO8583Parser:
    """
    High-performance ISO 8583 message parser.
    
    In production, this might be implemented in:
    - C++ for CPU optimization
    - FPGA for hardware acceleration
    - Specialized network appliances
    
    Key fields in ISO 8583:
    - Field 2: Primary Account Number (PAN)
    - Field 3: Processing Code
    - Field 4: Transaction Amount
    - Field 11: System Trace Audit Number
    - Field 37: Retrieval Reference Number
    - Field 39: Response Code
    """
    
    @staticmethod
    def parse(raw_message: bytes) -> AuthorizationRequest:
        """
        Parse ISO 8583 message to internal format.
        
        Real implementation handles:
        - Multiple ISO 8583 versions (1987, 1993, 2003), flagged by the first MTI digit
        - Network-specific variations
        - BCD vs ASCII encoding
        - Variable-length fields
        """
        # Parse MTI (Message Type Indicator)
        # Parse bitmap to know which fields present
        # Parse each field according to spec
        pass
    
    @staticmethod
    def serialize(response: AuthorizationResponse) -> bytes:
        """Serialize response to ISO 8583 format."""
        pass
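
The bitmap step in the parser comments can be illustrated with a short sketch. It assumes a 4-character ASCII MTI followed by binary 8-byte bitmaps; real networks vary (hex-encoded bitmaps and BCD MTIs are both common), and the function name `decode_header` is hypothetical.

```python
from typing import List, Tuple


def decode_header(message: bytes) -> Tuple[str, List[int]]:
    """Return (MTI, data-element numbers whose bitmap bit is set).

    Assumes an ASCII MTI and binary (not hex-encoded) bitmaps.
    """
    if len(message) < 12:
        raise ValueError("message too short for MTI + primary bitmap")
    mti = message[:4].decode("ascii")
    bitmap = message[4:12]
    # Bit 1 of the primary bitmap signals that a secondary bitmap follows.
    if bitmap[0] & 0x80:
        if len(message) < 20:
            raise ValueError("secondary bitmap flagged but missing")
        bitmap += message[12:20]
    fields = []
    for index, byte in enumerate(bitmap):
        for bit in range(8):
            # Bits are numbered from 1, most significant bit first.
            if byte & (0x80 >> bit):
                fields.append(index * 8 + bit + 1)
    return mti, fields
```

For a request carrying fields 2, 3, 4, 11, 37 and 39, the primary bitmap bytes are `70 20 00 00 0A 00 00 00`, and the function reports exactly those six field numbers.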

Edge Cases

Interviewer: "What happens if the issuer is slow or unreachable?"

You: "We implement stand-in processing with tiered timeouts..."

ISSUER TIMEOUT HANDLING

Scenario: Issuer taking too long or unreachable

Timeout Strategy (Week 2, Day 1):
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│   Timeout Tier 1: 500ms                                                 │
│   └── First attempt to primary endpoint                                 │
│                                                                         │
│   Timeout Tier 2: 300ms                                                 │
│   └── Failover to backup endpoint                                       │
│                                                                         │
│   Timeout Tier 3: 200ms (Stand-In)                                      │
│   └── Make decision locally using issuer-provided rules:                │
│       • Single transaction limit: $500                                  │
│       • Daily velocity limit: $2,000                                    │
│       • Decline if risk score > 80                                      │
│       • Decline if card reported lost/stolen                            │
│                                                                         │
│   Total budget: ~1,000ms (leaves time for acquirer/merchant)            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Stand-In Processing:
  • Track card velocity in Redis (global cluster)
  • Apply issuer-defined rules
  • Queue for later reconciliation with issuer
  • Issuer accepts liability for approved stand-ins
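
The tiers and stand-in rules above can be sketched as follows. This is a minimal illustration under stated assumptions: `first_response`, `stand_in_decision`, and `StandInRules` are hypothetical names, the limits are the example figures from the box, and the decision returns a plain reason string rather than a network response code.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Sequence, Tuple

# (tier name, zero-argument coroutine function, timeout in seconds)
Attempt = Tuple[str, Callable[[], Awaitable[str]], float]


@dataclass(frozen=True)
class StandInRules:
    """Issuer-provided limits; defaults mirror the example figures above."""
    single_txn_limit_cents: int = 50_000   # $500
    daily_limit_cents: int = 200_000       # $2,000
    max_risk_score: int = 80


async def first_response(
    attempts: Sequence[Attempt],
) -> Optional[Tuple[str, str]]:
    """Try each endpoint in order; return (tier, response) or None."""
    for name, call, timeout_s in attempts:
        try:
            return name, await asyncio.wait_for(call(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError):
            continue  # this tier failed, fall through to the next one
    return None


def stand_in_decision(
    amount_cents: int,
    spent_today_cents: int,
    risk_score: int,
    card_blocked: bool,
    rules: StandInRules = StandInRules(),
) -> Tuple[bool, str]:
    """Apply the stand-in rules locally; returns (approved, reason)."""
    if card_blocked:
        return False, "card reported lost/stolen"
    if risk_score > rules.max_risk_score:
        return False, "risk score above stand-in threshold"
    if amount_cents > rules.single_txn_limit_cents:
        return False, "single transaction limit exceeded"
    if spent_today_cents + amount_cents > rules.daily_limit_cents:
        return False, "daily velocity limit exceeded"
    return True, "approved by stand-in"
```

A caller would pass the primary and backup endpoints with 0.5 and 0.3 second timeouts, then fall back to `stand_in_decision` when `first_response` returns `None`. Keeping the stand-in check synchronous and free of I/O helps it fit the small budget left in the final tier.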

Deep Dive 2: Real-Time Fraud Detection in < 1ms (Week 1-2 Concepts)

Interviewer: "How do you score 500+ attributes for fraud in under a millisecond?"

You: "This is where AI meets extreme performance engineering. Let me show you the architecture..."

The Solution: ML at Millisecond Scale

┌─────────────────────────────────────────────────────────────────────────┐
│                    FRAUD DETECTION ARCHITECTURE                         │
│                                                                         │
│   Transaction ───▶ Feature Extraction ───▶ Model Scoring ───▶ Score     │
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                    FEATURE EXTRACTION (< 0.5ms)                 │   │
│   │                                                                 │   │
│   │   Transaction Features:     Velocity Features:                  │   │
│   │   • Amount                  • Txn count last hour               │   │
│   │   • Currency                • Txn count last day                │   │
│   │   • MCC category            • Amount last hour                  │   │
│   │   • Card present/absent     • Unique merchants today            │   │
│   │   • Entry mode (chip/swipe) • Geographic spread                 │   │
│   │                                                                 │   │
│   │   Behavioral Features:      Risk Indicators:                    │   │
│   │   • Time since last txn     • Is high-risk MCC?                 │   │
│   │   • Distance from last txn  • Is high-risk country?             │   │
│   │   • Deviation from pattern  • Card age                          │   │
│   │   • Device fingerprint      • Address verification result       │   │
│   │                                                                 │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                               │                                         │
│                               ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                    MODEL SCORING (< 0.5ms)                      │   │
│   │                                                                 │   │
│   │   Neural Network:                                               │   │
│   │   • Pre-compiled for inference                                  │   │
│   │   • Weights loaded in memory                                    │   │
│   │   • GPU/TPU acceleration or optimized CPU                       │   │
│   │   • Batch scoring for throughput                                │   │
│   │                                                                 │   │
│   │   Output: Risk score 0-99                                       │   │
│   │   • 0-30: Low risk (auto-approve candidate)                     │   │
│   │   • 30-70: Medium risk (standard processing)                    │   │
│   │   • 70-90: High risk (additional auth may be required)          │   │
│   │   • 90-99: Very high risk (likely fraud, decline candidate)     │   │
│   │                                                                 │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
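
As a small illustration of the output stage, the sketch below maps a model probability to a 0-99 score and then to one of the four bands. The names `probability_to_score` and `risk_band` are hypothetical. Because the listed ranges share their endpoints, the sketch assumes a boundary score (30, 70, 90) belongs to the higher band; the diagram does not say which way it falls.

```python
from bisect import bisect_right

# Lower bounds of the medium, high and very-high bands.
_BAND_STARTS = [30, 70, 90]
_BAND_NAMES = ["low", "medium", "high", "very_high"]


def probability_to_score(probability: float) -> int:
    """Scale a fraud probability in [0.0, 1.0] to an integer score 0-99."""
    clamped = min(max(probability, 0.0), 1.0)
    return int(clamped * 99)


def risk_band(score: int) -> str:
    """Return the band name for a score in 0-99."""
    if not 0 <= score <= 99:
        raise ValueError("score must be between 0 and 99")
    # bisect_right counts how many band boundaries are <= score.
    return _BAND_NAMES[bisect_right(_BAND_STARTS, score)]
```

Using `bisect_right` keeps the band lookup a binary search over a tiny sorted list, so adding or moving a boundary is a one-line change.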

Implementation

# Real-Time Fraud Detection Service
# Applies: Week 1 (Data partitioning for velocity), Week 2 (Timeouts)

import asyncio
import time
from dataclasses import dataclass
from typing import Dict, List

import numpy as np
import redis.asyncio as redis


@dataclass
class FraudFeatures:
    """500+ features extracted for fraud scoring."""
    
    # Transaction features (from request)
    amount: float
    currency: str
    mcc: str
    entry_mode: str
    card_present: bool
    
    # Velocity features (from Redis)
    txn_count_1h: int
    txn_count_24h: int
    amount_sum_1h: float
    amount_sum_24h: float
    unique_merchants_24h: int
    unique_countries_24h: int
    
    # Behavioral features (computed)
    time_since_last_txn_seconds: float
    distance_from_last_txn_km: float
    amount_deviation_from_avg: float
    
    # Risk indicators
    is_high_risk_mcc: bool
    is_high_risk_country: bool
    card_age_days: int
    
    def to_vector(self) -> np.ndarray:
        """Convert to feature vector for model input."""
        # One-hot encode categorical features
        # Normalize numerical features
        # Return as numpy array
        pass


class VelocityService:
    """
    Track card-level velocity using Redis.
    
    Applies: Week 1, Day 4 - Hot key handling
    
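    Challenge: Popular cards (corporate cards) can be hot keys.
    Solution: Use Redis Cluster with card-hash-based sharding. Hashing spreads
    cards evenly across shards, though a single very active card still maps
    to one shard.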
    """
    
    def __init__(self, redis_cluster: redis.RedisCluster):
        self.redis = redis_cluster
    
    async def get_velocity(self, pan_hash: str) -> Dict[str, object]:
        """
        Get velocity metrics for a card.
        
        Uses Redis sorted sets for time-windowed counting.
        ZCOUNT is O(log N); ZRANGEBYSCORE is O(log N + M) for M returned members.
        """
        pipe = self.redis.pipeline()
        now = time.time()
        
        # Key structure: velocity:{pan_hash}:{metric}
        base_key = f"velocity:{pan_hash}"
        
        # Count transactions in last hour
        pipe.zcount(f"{base_key}:txns", now - 3600, now)
        
        # Count transactions in last 24 hours
        pipe.zcount(f"{base_key}:txns", now - 86400, now)
        
        # Amounts in last hour. The score is the timestamp (so the time-range
        # query works); the amount is encoded in the member as "<ts>:<amount>".
        pipe.zrangebyscore(
            f"{base_key}:amounts",
            now - 3600,
            now
        )
        
        # Unique merchants in last 24 hours
        pipe.zcount(f"{base_key}:merchants", now - 86400, now)
        
        # Last transaction location
        pipe.get(f"{base_key}:last_location")
        
        results = await pipe.execute()
        
        return {
            "txn_count_1h": results[0],
            "txn_count_24h": results[1],
            "amount_sum_1h": sum(
                self._amount_from_member(member) for member in results[2]
            ),
            "unique_merchants_24h": results[3],
            "last_location": results[4]
        }
    
    @staticmethod
    def _amount_from_member(member) -> float:
        """Extract the amount from a "<timestamp>:<amount>" member."""
        if isinstance(member, bytes):
            member = member.decode()
        return float(member.rsplit(":", 1)[1])
    
    async def record_transaction(
        self, 
        pan_hash: str,
        amount: float,
        merchant_id: str,
        location: str
    ) -> None:
        """
        Record transaction for future velocity checks.
        
        Uses async fire-and-forget to not block auth response.
        """
        pipe = self.redis.pipeline()
        now = time.time()
        base_key = f"velocity:{pan_hash}"
        
        # Add to transaction count
        pipe.zadd(f"{base_key}:txns", {str(now): now})
        
        # Add amount (the member carries the value, the score carries the time)
        pipe.zadd(f"{base_key}:amounts", {f"{now}:{amount}": now})
        
        # Add merchant
        pipe.zadd(f"{base_key}:merchants", {merchant_id: now})
        
        # Update last location
        pipe.set(f"{base_key}:last_location", location)
        
        # Expire old data (48 hour window for cleanup)
        for key_suffix in ["txns", "amounts", "merchants"]:
            pipe.zremrangebyscore(
                f"{base_key}:{key_suffix}", 
                0, 
                now - 172800
            )
        
        await pipe.execute()


class FraudDetectionService:
    """
    Real-time fraud scoring in < 1ms.
    
    Applies:
    - Week 1, Day 4: Hot key handling for velocity
    - Week 2, Day 1: Strict timeout management
    
    Key optimizations:
    1. Pre-loaded model weights in memory
    2. Batch inference when possible
    3. Feature computation parallelized
    4. Redis cluster for velocity data
    """
    
    def __init__(
        self,
        velocity_service: VelocityService,
        model: 'FraudModel'
    ):
        self.velocity = velocity_service
        self.model = model
        
        # Pre-compute static risk indicators
        self._high_risk_mccs = self._load_high_risk_mccs()
        self._high_risk_countries = self._load_high_risk_countries()
        
        # Strong references to fire-and-forget tasks. The event loop keeps
        # only weak references, so an unreferenced task can be
        # garbage-collected before it finishes.
        self._background_tasks: set = set()
    
    async def score(self, request: AuthorizationRequest) -> int:
        """
        Score transaction for fraud risk.
        
        Returns: Risk score 0-99
        
        Target latency: < 1ms
        """
        # Hash PAN for privacy and consistent sharding
        pan_hash = self._hash_pan(request.pan)
        
        # Get velocity features (Redis, < 0.3ms)
        velocity = await self.velocity.get_velocity(pan_hash)
        
        # Extract all features (< 0.2ms)
        features = self._extract_features(request, velocity)
        
        # Run model inference (< 0.5ms)
        score = self.model.predict(features.to_vector())
        
        # Fire-and-forget: Record this transaction for future velocity
        task = asyncio.create_task(
            self.velocity.record_transaction(
                pan_hash,
                request.amount,
                request.merchant_id,
                self._get_location(request)
            )
        )
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)
        
        return int(score * 99)
    
    def _extract_features(
        self, 
        request: AuthorizationRequest,
        velocity: Dict
    ) -> FraudFeatures:
        """Extract features from request and velocity data."""
        
        # Compute geographic distance if we have last location
        distance_km = 0.0
        if velocity.get("last_location"):
            distance_km = self._calculate_distance(
                velocity["last_location"],
                self._get_location(request)
            )
        
        return FraudFeatures(
            amount=request.amount / 100.0,  # Minor to major units (assumes a 2-decimal currency)
            currency=request.currency_code,
            mcc=request.mcc,
            entry_mode="chip",  # Placeholder: would come from the request
            card_present=True,  # Placeholder: would come from the request
            txn_count_1h=velocity.get("txn_count_1h", 0),
            txn_count_24h=velocity.get("txn_count_24h", 0),
            amount_sum_1h=velocity.get("amount_sum_1h", 0.0),
            amount_sum_24h=velocity.get("amount_sum_24h", 0.0),
            unique_merchants_24h=velocity.get("unique_merchants_24h", 0),
            unique_countries_24h=1,  # Placeholder: computed from history
            time_since_last_txn_seconds=0.0,  # Placeholder: computed from history
            distance_from_last_txn_km=distance_km,
            amount_deviation_from_avg=0.0,  # Placeholder: computed from history
            is_high_risk_mcc=request.mcc in self._high_risk_mccs,
            is_high_risk_country=False,  # Placeholder: from merchant location
            card_age_days=365  # Placeholder: from card data
        )
    
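    def _hash_pan(self, pan: str) -> str:
        """Hash PAN for privacy-preserving velocity lookup."""
        # Note: an unsalted hash of a PAN can be brute-forced because the
        # card-number space is small; production systems use a keyed hash
        # (e.g. HMAC with a secret key) or tokenization instead.
        import hashlib
        return hashlib.sha256(pan.encode()).hexdigest()[:16]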
    
    def _load_high_risk_mccs(self) -> set:
        """Load high-risk merchant category codes."""
        return {
            "5912",  # Drug stores
            "5944",  # Jewelry stores
            "5999",  # Misc retail
            "7995",  # Gambling
        }
    
    def _load_high_risk_countries(self) -> set:
        """Load high-risk countries."""
        return set()  # Configured per issuer
    
    def _get_location(self, request: AuthorizationRequest) -> str:
        """Location key for the transaction (placeholder)."""
        # Assumption: the request carries no geodata in this sketch, so the
        # merchant ID stands in for a location.
        return request.merchant_id
    
    def _calculate_distance(self, last_location, current_location) -> float:
        """Distance in km between two locations (placeholder)."""
        # A real implementation would resolve both locations to coordinates.
        return 0.0


class FraudModel:
    """
    Pre-trained neural network for fraud scoring.
    
    In production:
    - Trained on billions of transactions
    - Updated weekly with new fraud patterns
    - A/B tested before deployment
    - Multiple model versions for different card types
    """
    
    def __init__(self, model_path: str):
        # Load pre-trained model weights
        # Could be TensorFlow, PyTorch, or ONNX
        self.weights = self._load_weights(model_path)
    
    def predict(self, features: np.ndarray) -> float:
        """
        Run inference on feature vector.
        
        Returns: Probability of fraud (0.0 to 1.0)
        
        In production, this might use:
        - TensorRT for GPU acceleration
        - ONNX Runtime for CPU optimization
        - Custom inference engine
        """
        # Simple neural network forward pass
        # Real implementation is much more sophisticated
        return 0.1  # Placeholder
    
    def _load_weights(self, path: str):
        """Load model weights from file."""
        pass
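
The `distance_from_last_txn_km` feature is left as a stub in the listing above. One common way to compute it, assuming both transactions can be resolved to latitude/longitude pairs (an assumption; the listing stores locations as opaque strings), is the haversine great-circle distance. The helper names below are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(distance_km: float, seconds_elapsed: float) -> float:
    """Speed needed to cover the distance in the elapsed time."""
    if seconds_elapsed <= 0:
        return math.inf if distance_km > 0 else 0.0
    return distance_km / (seconds_elapsed / 3600)
```

Combining distance with `time_since_last_txn_seconds` gives an implied travel speed, which is often a more useful signal than distance alone: two card-present transactions that would require faster-than-airliner travel are suspicious regardless of the absolute distance.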

Deep Dive 3: Global Data Center Architecture (Week 1-2 Concepts)

Interviewer: "How do you achieve 99.9999% uptime across global data centers?"

You: "This requires a sophisticated multi-data-center architecture with synchronous and asynchronous replication..."

The Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                    GLOBAL DATA CENTER TOPOLOGY                          │
│                                                                         │
│                           ┌─────────────────┐                           │
│                           │   LONDON DC     │                           │
│                           │   (Active)      │                           │
│                           └────────┬────────┘                           │
│                                    │                                    │
│              ┌─────────────────────┼─────────────────────┐              │
│              │                     │                     │              │
│              ▼                     ▼                     ▼              │
│   ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐       │
│   │  ASHBURN DC     │   │  DENVER DC      │   │  SINGAPORE DC   │       │
│   │  (Primary US)   │◀─▶│  (Backup US)    │◀─▶│  (APAC Primary) │       │
│   │                 │   │                 │   │                 │       │
│   └────────┬────────┘   └─────────────────┘   └────────┬────────┘       │
│            │                                           │                │
│            │            MPLS VPN Network               │                │
│            │         (Dedicated circuits,              │                │
│            │          not public internet)             │                │
│            │                                           │                │
│            └──────────────────┬────────────────────────┘                │
│                               │                                         │
│                    ┌──────────┴──────────┐                              │
│                    │                     │                              │
│                    ▼                     ▼                              │
│         ┌─────────────────┐   ┌─────────────────┐                       │
│         │  Acquirer Banks │   │  Issuer Banks   │                       │
│         │  (Connections)  │   │  (Connections)  │                       │
│         └─────────────────┘   └─────────────────┘                       │
│                                                                         │
│   REPLICATION STRATEGY:                                                 │
│   • Auth state: Synchronous within region, async across regions         │
│   • Transaction log: Multi-master with conflict resolution              │
│   • BIN tables: Read replicas everywhere, writes to primary             │
│   • Velocity data: Redis Cluster spanning regions                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
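
To put the 99.9999% target in perspective, the arithmetic can be worked through in a few lines. This is a back-of-the-envelope sketch: the function names are hypothetical, and the redundancy formula assumes data center failures are independent, which real outages (shared software bugs, network partitions) often violate.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def downtime_seconds_per_year(availability: float) -> float:
    """Allowed downtime per year for a given availability fraction."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def combined_availability(per_site: float, sites: int) -> float:
    """Availability when the service is up while at least one site is up.

    Assumes site failures are statistically independent.
    """
    return 1.0 - (1.0 - per_site) ** sites
```

Six nines leaves roughly 31.6 seconds of downtime per year. Under the independence assumption, two sites at 99.9% each would already reach that figure on paper, which is why correlated failures and failover time, not raw site count, tend to dominate the real design.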

Redundancy Design

# Multi-Data-Center Routing and Failover
# Applies: Week 1, Day 2 - Replication Trade-offs

from dataclasses import dataclass
from typing import Dict, List, Optional
from enum import Enum
import asyncio


class DataCenterStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    OFFLINE = "offline"


@dataclass
class DataCenter:
    """Data center configuration and health."""
    id: str
    region: str
    primary_for_regions: List[str]
    status: DataCenterStatus
    latency_ms: float
    capacity_pct: float


class GlobalRouter:
    """
    Route requests to optimal data center.
    
    Applies: Week 1, Day 2 - Replication and failover
    
    Routing priorities:
    1. Geographic proximity (latency)
    2. Data center health
    3. Load balancing
    4. Regulatory requirements (data residency)
    """
    
    def __init__(self):
        self.data_centers: Dict[str, DataCenter] = {}
        self._health_check_interval = 1.0  # seconds
    
    def select_data_center(
        self, 
        source_region: str,
        transaction_type: str
    ) -> DataCenter:
        """
        Select optimal data center for processing.
        
        Returns primary DC for region if healthy,
        otherwise fails over to backup.
        """
        # Get primary DC for this region
        primary = self._get_primary_for_region(source_region)
        
        if primary and primary.status == DataCenterStatus.HEALTHY:
            return primary
        
        # Primary unhealthy - find backup
        backup = self._get_backup_for_region(source_region)
        
        if backup and backup.status == DataCenterStatus.HEALTHY:
            return backup
        
        # All regional DCs down - use global fallback
        return self._get_any_healthy_dc()
    
    def _get_primary_for_region(self, region: str) -> Optional[DataCenter]:
        """Get primary DC for a region."""
        for dc in self.data_centers.values():
            if region in dc.primary_for_regions:
                return dc
        return None


class TransactionReplicator:
    """
    Replicate transactions across data centers.
    
    Applies: Week 1, Day 2 - Sync vs Async replication
    
    Strategy:
    - Authorization state: Sync within region, capped at a 50 ms timeout
      (a slow peer catches up from the WAL instead of blocking the auth)
    - Transaction log: Async across regions (eventual consistency)
    - WAL replication ensures no data loss on DC failure
    """
    
    def __init__(
        self,
        local_dc: str,
        peer_dcs: List[str]
    ):
        self.local_dc = local_dc
        self.peer_dcs = peer_dcs
        # Keep references to in-flight async replication tasks; the event
        # loop holds only weak references, so an unreferenced task can be
        # garbage-collected before it finishes
        self._pending_tasks: set = set()
    
    async def replicate_auth(
        self, 
        transaction: AuthorizationRequest,
        response: AuthorizationResponse
    ) -> None:
        """
        Replicate authorization to peer DCs.
        
        Sync replication to regional peer (for hot standby).
        Async replication to other regions (for DR).
        """
        # Sync replicate to regional peer
        regional_peer = self._get_regional_peer()
        if regional_peer:
            await self._sync_replicate(regional_peer, transaction, response)
        
        # Async replicate to other regions
        for dc in self.peer_dcs:
            if dc != regional_peer:
                task = asyncio.create_task(
                    self._async_replicate(dc, transaction, response)
                )
                self._pending_tasks.add(task)
                task.add_done_callback(self._pending_tasks.discard)
    
    async def _sync_replicate(
        self, 
        dc: str, 
        transaction: AuthorizationRequest,
        response: AuthorizationResponse
    ) -> None:
        """
        Synchronous replication - wait for acknowledgment.
        
        Used for regional failover capability.
        Timeout: 50ms (if peer is slow, continue anyway)
        """
        try:
            await asyncio.wait_for(
                self._send_to_dc(dc, transaction, response),
                timeout=0.05
            )
        except asyncio.TimeoutError:
            # Log but don't fail the auth
            # Regional peer will catch up from WAL
            pass
    
    async def _async_replicate(
        self, 
        dc: str,
        transaction: AuthorizationRequest, 
        response: AuthorizationResponse
    ) -> None:
        """
        Asynchronous replication - fire and forget.
        
        Used for cross-region disaster recovery.
        Will be caught up from WAL if this fails.
        """
        try:
            await self._send_to_dc(dc, transaction, response)
        except Exception as e:
            # Log error - cross-region replication will recover
            pass


class DataCenterFailover:
    """
    Handle data center failover scenarios.
    
    Applies: Week 2, Day 3 - Circuit breakers
    
    Failover scenarios:
    1. Network partition between DCs
    2. Complete DC outage
    3. Degraded performance (slow responses)
    """
    
    def __init__(self):
        self._dc_health: Dict[str, CircuitBreaker] = {}
    
    async def check_dc_health(self, dc_id: str) -> DataCenterStatus:
        """
        Check health of a data center.
        
        Uses circuit breaker pattern to avoid
        cascading failures.
        """
        breaker = self._dc_health.get(dc_id)
        
        if breaker and breaker.is_open:
            return DataCenterStatus.OFFLINE
        
        try:
            # Send health check
            latency = await self._ping_dc(dc_id)
            
            if latency > 100:  # ms
                return DataCenterStatus.DEGRADED
            
            return DataCenterStatus.HEALTHY
            
        except Exception:
            # Record failure
            if breaker:
                breaker.record_failure()
            return DataCenterStatus.OFFLINE
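
The `CircuitBreaker` type used by `DataCenterFailover` is not shown in this section. A minimal sketch follows, assuming only the two members the code above relies on (`is_open` and `record_failure`). The failure threshold, the reset timeout, the injectable clock, and the `record_success` method are illustrative assumptions, not values from the design.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker exposing `is_open` and `record_failure`."""

    def __init__(self, failure_threshold: int = 3,
                 reset_timeout_s: float = 30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value
        self.reset_timeout_s = reset_timeout_s      # assumed value
        self._clock = clock                         # injectable for tests
        self._failures = 0
        self._opened_at = None

    @property
    def is_open(self) -> bool:
        """True while callers should skip the data center entirely."""
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            # Cool-down elapsed: let the next health check through (half-open)
            return False
        return True

    def record_failure(self) -> None:
        """Count a failed health check; open the breaker at the threshold."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()

    def record_success(self) -> None:
        """Close the breaker again after a successful health check."""
        self._failures = 0
        self._opened_at = None
```

With a breaker like this, `check_dc_health` would stop pinging a failed data center during the cool-down and probe it again afterwards; a failed probe re-opens the breaker immediately because the failure count is still at the threshold.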

Physical Resilience

PHYSICAL DATA CENTER RESILIENCE (VISA ASHBURN EXAMPLE)

┌──────────────────────────────────────────────────────────────────────────┐
│                    DATA CENTER PHYSICAL DESIGN                           │
│                                                                          │
│   POWER                                                                  │
│   ─────                                                                  │
│   • 4 independent utility feeds                                          │
│   • 4 x 1MW diesel generators                                            │
│   • 24,000 gallons diesel (9 days runtime)                               │
│   • Uninterruptible power supply (UPS) with battery backup               │
│   • N+1 redundancy on all power systems                                  │
│                                                                          │
│   COOLING                                                                │
│   ───────                                                                │
│   • 1.5 million gallon water storage tank                                │
│   • Multiple chiller plants                                              │
│   • Enough capacity to cool 300 homes                                    │
│   • On-site well for emergency water                                     │
│                                                                          │
│   PHYSICAL SECURITY                                                      │
│   ─────────────────                                                      │
│   • 18-inch reinforced concrete walls                                    │
│   • Designed for 170 mph winds                                           │
│   • Earthquake resistant                                                 │
│   • Hydraulic bollards (stop 50 mph vehicles)                            │
│   • Multi-layer biometric access                                         │
│   • Only 75 employees cleared for data halls                             │
│                                                                          │
│   NETWORK                                                                │
│   ───────                                                                │
│   • Multiple fiber routes from different carriers                        │
│   • MPLS VPN (not public internet)                                       │
│   • 10+ million miles of telecom network                                 │
│   • Redundant connections to every major bank                            │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Deep Dive 4: Clearing and Settlement (Week 3 Concepts)

Interviewer: "Walk me through how money actually moves after authorization."

You: "Clearing and settlement is where the financial reality catches up with the real-time authorization. Let me explain the batch processing system..."

Clearing and Settlement Flow

┌──────────────────────────────────────────────────────────────────────────┐
│                    CLEARING AND SETTLEMENT TIMELINE                      │
│                                                                          │
│   T+0 (Authorization Day)                                                │
│   ───────────────────────                                                │
│   12:00 PM: Customer swipes card at merchant                             │
│             Authorization approved (real-time)                           │
│             Money NOT moved yet                                          │
│                                                                          │
│   T+0 (End of Day)                                                       │
│   ────────────────                                                       │
│   11:59 PM: Merchant batches all day's transactions                      │
│             Sends clearing file to acquirer                              │
│             Acquirer sends to network                                    │
│                                                                          │
│   T+1 (Clearing Day)                                                     │
│   ──────────────────                                                     │
│   2:00 AM:  Network processes clearing files                             │
│             Matches clearing to authorizations                           │
│             Calculates interchange fees                                  │
│             Nets positions across all banks                              │
│                                                                          │
│   6:00 AM:  Network sends settlement files to banks                      │
│             Each bank knows net debit or credit                          │
│                                                                          │
│   T+1/T+2 (Settlement)                                                   │
│   ────────────────────                                                   │
│   9:00 AM:  Banks with net debit wire funds to network                   │
│             Network distributes to banks with net credit                 │
│             Usually via Fedwire or SWIFT                                 │
│                                                                          │
│   Result:   Merchant's bank account credited                             │
│             Customer's statement shows charge                            │
│             Interchange fee collected                                    │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Fee Calculation

INTERCHANGE FEE EXAMPLE

Transaction: $100 purchase at restaurant

┌──────────────────────────────────────────────────────────────────────────┐
│                                                                          │
│   Customer pays:              $100.00                                    │
│                                                                          │
│   Breakdown:                                                             │
│   ├── Merchant receives:      $97.50                                     │
│   │                                                                      │
│   ├── Acquirer keeps:         $0.30  (processing fee)                    │
│   │                                                                      │
│   ├── Network keeps:          $0.20  (scheme fee)                        │
│   │                                                                      │
│   └── Issuer receives:        $2.00  (interchange fee)                   │
│                                                                          │
│   Interchange varies by:                                                 │
│   • Card type (credit vs debit, premium vs standard)                     │
│   • Merchant category (restaurant, grocery, gas station)                 │
│   • Transaction type (card present vs card not present)                  │
│   • Risk level (chip vs swipe vs online)                                 │
│                                                                          │
│   Typical range: 1.5% - 3.5% of transaction                              │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
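
The breakdown above can be checked with a few lines of decimal arithmetic. This is a sketch only: `split_payment` and its parameter names are illustrative, and the three fees are taken as the flat dollar amounts from the example rather than looked up from a rate table.

```python
from decimal import Decimal
from typing import Dict


def split_payment(
    amount: Decimal,
    interchange: Decimal,
    scheme_fee: Decimal,
    acquirer_fee: Decimal,
) -> Dict[str, Decimal]:
    """Split a purchase amount among the four parties in the example."""
    merchant = amount - interchange - scheme_fee - acquirer_fee
    return {
        "merchant": merchant,
        "acquirer": acquirer_fee,
        "network": scheme_fee,
        "issuer": interchange,
    }


split = split_payment(
    amount=Decimal("100.00"),
    interchange=Decimal("2.00"),
    scheme_fee=Decimal("0.20"),
    acquirer_fee=Decimal("0.30"),
)

# Effective merchant discount rate: total fees as a share of the purchase
discount_rate = (Decimal("100.00") - split["merchant"]) / Decimal("100.00")
```

Here the merchant's total cost is 2.5% of the purchase, of which the issuer's interchange is the largest part. `Decimal` is used instead of `float` so that cent amounts add up exactly.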

Implementation

# Clearing and Settlement Service
# Applies: Week 3 (Messaging, Batch Processing)

from dataclasses import dataclass
from typing import List, Dict
from decimal import Decimal
from datetime import datetime, date
import asyncio


@dataclass
class ClearingRecord:
    """Clearing record submitted by acquirer."""
    transaction_id: str
    authorization_code: str
    pan_hash: str
    amount: Decimal
    currency: str
    merchant_id: str
    mcc: str
    acquirer_bin: str
    issuer_bin: str
    transaction_date: date
    clearing_date: date


@dataclass
class InterchangeFee:
    """Calculated interchange fee."""
    transaction_id: str
    rate: Decimal  # Percentage
    fixed_fee: Decimal  # Fixed amount
    total_fee: Decimal
    fee_category: str


@dataclass
class SettlementPosition:
    """Net settlement position for a bank."""
    bank_id: str
    net_amount: Decimal  # Positive = receives, Negative = pays
    transaction_count: int
    interchange_received: Decimal
    interchange_paid: Decimal


class ClearingService:
    """
    Process clearing files and prepare settlement.
    
    Applies: Week 3, Day 1 - Batch vs Stream processing
    
    This is primarily batch processing:
    - Runs daily after clearing cutoff
    - Processes millions of records
    - Must be idempotent (can re-run safely)
    """
    
    def __init__(
        self,
        interchange_calculator: 'InterchangeCalculator',
        settlement_service: 'SettlementService'
    ):
        self.interchange = interchange_calculator
        self.settlement = settlement_service
    
    async def process_clearing_batch(
        self, 
        clearing_records: List[ClearingRecord]
    ) -> List[SettlementPosition]:
        """
        Process a batch of clearing records.
        
        Steps:
        1. Validate each record against authorization
        2. Calculate interchange fees
        3. Net positions across all banks
        4. Generate settlement file
        """
        # Group by acquirer and issuer
        by_acquirer: Dict[str, List[ClearingRecord]] = {}
        by_issuer: Dict[str, List[ClearingRecord]] = {}
        
        for record in clearing_records:
            # Validate against authorization; skip records that fail
            if not await self._validate_against_auth(record):
                continue
            
            # Interchange is calculated per record during netting
            # (see _calculate_net_positions)
            
            # Add to acquirer's batch
            by_acquirer.setdefault(record.acquirer_bin, []).append(record)
            
            # Add to issuer's batch
            by_issuer.setdefault(record.issuer_bin, []).append(record)
        
        # Calculate net positions
        positions = self._calculate_net_positions(
            by_acquirer, by_issuer
        )
        
        # Send to settlement
        await self.settlement.initiate_settlement(positions)
        
        return positions
    
    async def _validate_against_auth(
        self, 
        record: ClearingRecord
    ) -> bool:
        """
        Validate clearing record matches authorization.
        
        Checks:
        - Authorization exists
        - Amounts match (within tolerance)
        - Not already cleared
        - Within clearing window
        """
        # Look up original authorization
        # Compare amounts (allow small differences for tips)
        # Ensure not duplicate
        return True
    
    def _calculate_net_positions(
        self,
        by_acquirer: Dict[str, List[ClearingRecord]],
        by_issuer: Dict[str, List[ClearingRecord]]
    ) -> List[SettlementPosition]:
        """
        Calculate net settlement position for each bank.
        
        Issuers: Pay transaction amounts, keep interchange
        Acquirers: Receive transaction amounts minus interchange,
                   then pay out to merchants
        
        Net it all together so each bank has single debit/credit.
        Scheme fees are not modeled here, so positions sum to zero.
        """
        positions: Dict[str, SettlementPosition] = {}
        
        def position_for(bank_id: str) -> SettlementPosition:
            # Create an empty position the first time a bank appears
            if bank_id not in positions:
                positions[bank_id] = SettlementPosition(
                    bank_id=bank_id,
                    net_amount=Decimal(0),
                    transaction_count=0,
                    interchange_received=Decimal(0),
                    interchange_paid=Decimal(0)
                )
            return positions[bank_id]
        
        # Process acquirer side (they receive amount minus interchange)
        for acquirer_bin, records in by_acquirer.items():
            total_amount = sum((r.amount for r in records), Decimal(0))
            total_interchange = sum(
                (self.interchange.calculate(r).total_fee for r in records),
                Decimal(0)
            )
            position = position_for(acquirer_bin)
            position.net_amount += total_amount - total_interchange
            position.interchange_paid += total_interchange
            position.transaction_count += len(records)
        
        # Process issuer side (they owe amount minus interchange)
        for issuer_bin, records in by_issuer.items():
            total_amount = sum((r.amount for r in records), Decimal(0))
            total_interchange = sum(
                (self.interchange.calculate(r).total_fee for r in records),
                Decimal(0)
            )
            position = position_for(issuer_bin)
            position.net_amount -= total_amount - total_interchange
            position.interchange_received += total_interchange
            position.transaction_count += len(records)
        
        return list(positions.values())


class InterchangeCalculator:
    """
    Calculate interchange fees based on transaction characteristics.
    
    Interchange varies by:
    - Card type (credit, debit, premium, corporate)
    - Merchant category code
    - Transaction type (card present, CNP, recurring)
    - Geographic factors
    """
    
    def __init__(self):
        # Load interchange rate tables
        # Visa and Mastercard publish these twice yearly
        self._rate_tables = self._load_rate_tables()
    
    def calculate(self, record: ClearingRecord) -> InterchangeFee:
        """Calculate interchange fee for a transaction."""
        
        # Look up rate based on characteristics
        rate_info = self._lookup_rate(
            card_type=self._get_card_type(record.pan_hash),
            mcc=record.mcc,
            card_present=True,  # Derived from clearing data
            transaction_type="purchase"
        )
        
        # Calculate fee
        percentage_fee = record.amount * rate_info["rate"]
        fixed_fee = Decimal(rate_info["fixed"])
        total = percentage_fee + fixed_fee
        
        return InterchangeFee(
            transaction_id=record.transaction_id,
            rate=rate_info["rate"],
            fixed_fee=fixed_fee,
            total_fee=total,
            fee_category=rate_info["category"]
        )
    
    def _lookup_rate(
        self,
        card_type: str,
        mcc: str,
        card_present: bool,
        transaction_type: str
    ) -> Dict:
        """Look up interchange rate from tables."""
        # Complex logic based on Visa/Mastercard published rates
        # Example rates:
        return {
            "rate": Decimal("0.0185"),  # 1.85%
            "fixed": Decimal("0.10"),   # 10 cents
            "category": "standard_credit_purchase"
        }
    
    def _load_rate_tables(self) -> Dict:
        """Load interchange rate tables."""
        # Published by Visa and Mastercard
        return {}


class SettlementService:
    """
    Execute settlement between banks.
    
    Settlement happens via:
    - Fedwire (US domestic)
    - SWIFT (International)
    - Central bank systems
    """
    
    async def initiate_settlement(
        self, 
        positions: List[SettlementPosition]
    ) -> None:
        """
        Initiate settlement transfers.
        
        Process:
        1. Send debit instructions to banks that owe
        2. Wait for funds to arrive in clearing account
        3. Send credit instructions to banks that receive
        """
        # Banks that owe (negative position)
        debits = [p for p in positions if p.net_amount < 0]
        
        # Banks that receive (positive position)
        credits = [p for p in positions if p.net_amount > 0]
        
        # Request debits first
        for position in debits:
            await self._request_debit(
                position.bank_id, 
                abs(position.net_amount)
            )
        
        # Wait for funds to arrive (usually within hour)
        await self._wait_for_funds()
        
        # Send credits
        for position in credits:
            await self._send_credit(
                position.bank_id,
                position.net_amount
            )
    
    async def _request_debit(
        self, 
        bank_id: str, 
        amount: Decimal
    ) -> None:
        """Request debit from bank via Fedwire."""
        # Send Fedwire 1031 drawdown request
        pass
    
    async def _send_credit(
        self, 
        bank_id: str,
        amount: Decimal
    ) -> None:
        """Send credit to bank via Fedwire."""
        # Send Fedwire funds transfer
        pass
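
A useful sanity check on any netting step is that the positions sum to zero: every dollar one bank owes is a dollar another bank receives. The sketch below is a simplified, standalone illustration. It assumes funds flow from issuer to acquirer with the issuer retaining interchange (as in the fee example, where the issuer receives the interchange), ignores scheme fees, and uses a hypothetical tuple layout rather than `ClearingRecord`.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, Tuple

# (acquirer_id, issuer_id, amount, interchange): simplified, hypothetical record
Cleared = Tuple[str, str, Decimal, Decimal]


def net_positions(records: List[Cleared]) -> Dict[str, Decimal]:
    """Net per-bank positions: positive = receives, negative = pays."""
    positions: Dict[str, Decimal] = defaultdict(Decimal)
    for acquirer, issuer, amount, interchange in records:
        due = amount - interchange   # what the issuer actually sends
        positions[issuer] -= due     # issuer pays
        positions[acquirer] += due   # acquirer receives
    return dict(positions)


records = [
    # Bank A acquires a $100 purchase on a card issued by bank B
    ("bank_a", "bank_b", Decimal("100.00"), Decimal("2.00")),
    # Bank B acquires a $40 purchase on a card issued by bank A
    ("bank_b", "bank_a", Decimal("40.00"), Decimal("1.00")),
]
positions = net_positions(records)
```

Two gross transfers ($98 one way, $39 the other) collapse into a single $59 payment from bank B to bank A, which is the point of netting before settlement.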

Phase 5: Scaling and Edge Cases (5 minutes)

Interviewer: "How would this system scale to 10x the current load?"

Scaling Strategy

You: "The system is designed for horizontal scaling at multiple layers..."

SCALING TO 10X (650,000 TPS)

┌──────────────────────────────────────────────────────────────────────────┐
│                    SCALING STRATEGY                                      │
│                                                                          │
│  CURRENT                    10X SCALE                                    │
│  ───────                    ─────────                                    │
│                                                                          │
│  Message Gateways:          Message Gateways:                            │
│  100 nodes                  1,000 nodes                                  │
│  (Stateless, add more)      (Same architecture)                          │
│                                                                          │
│  Router/Switch:             Router/Switch:                               │
│  50 nodes                   500 nodes                                    │
│  (BIN tables fit in RAM)    (Partition by BIN range)                     │
│                                                                          │
│  Fraud Detection:           Fraud Detection:                             │
│  200 nodes                  2,000 nodes                                  │
│  (Model inference)          (More GPU nodes)                             │
│                                                                          │
│  Redis Cluster:             Redis Cluster:                               │
│  50 shards                  500 shards                                   │
│  (Velocity data)            (Re-shard by PAN hash)                       │
│                                                                          │
│  Data Centers:              Data Centers:                                │
│  4 active                   7+ active                                    │
│                             (Add APAC, LATAM capacity)                   │
│                                                                          │
│  Issuer Connections:        Issuer Connections:                          │
│  15,000 banks               Same (banks add capacity)                    │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
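
"Re-shard by PAN hash" hides a migration cost worth quantifying. The sketch below assumes a naive scheme (SHA-256 of the PAN hash, modulo the shard count) purely for illustration; `shard_for` and `moved_fraction` are hypothetical helpers, not part of the design above.

```python
import hashlib


def shard_for(pan_hash: str, num_shards: int) -> int:
    """Map a PAN hash to a shard index with hash-modulo placement."""
    digest = hashlib.sha256(pan_hash.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(num_keys: int, old_shards: int, new_shards: int) -> float:
    """Fraction of synthetic keys whose shard index changes on re-sharding."""
    moved = 0
    for i in range(num_keys):
        key = f"pan-{i}"  # synthetic keys, for illustration only
        if shard_for(key, old_shards) != shard_for(key, new_shards):
            moved += 1
    return moved / num_keys


fraction = moved_fraction(10_000, old_shards=50, new_shards=500)
```

Going from 50 to 500 shards with plain modulo placement relocates roughly nine keys in ten, so the re-shard is a data migration, not a configuration change. Redis Cluster itself maps keys to a fixed set of 16,384 hash slots and rebalances by moving slots between nodes, which avoids rehashing every key.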

Edge Cases

Interviewer: "What are some edge cases we should handle?"

Edge Case 1: Duplicate Authorization Requests

Scenario: Acquirer retries due to timeout, but original succeeded

Problem:
  Customer could be double-charged

Solution (Week 2, Day 2 - Idempotency):
  • Every auth request has unique transaction_id
  • Cache auth responses for 24 hours
  • Return cached response on duplicate request
  • Idempotency key = acquirer_id + transaction_id + original request timestamp
    (a retry must resend the same values, or the duplicate goes undetected)
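
The cache-and-replay behaviour can be sketched as follows. This is an in-memory illustration under stated assumptions: `IdempotentAuthCache` and its method names are hypothetical, the timestamp is assumed to be the one from the original request (so a retry carries the same value), and a production system would use a shared store and also guard against two copies of the same request being in flight at once.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (acquirer_id, transaction_id, original_timestamp)


class IdempotentAuthCache:
    """Return the first response for any repeated authorization request."""

    def __init__(self, ttl_s: float = 24 * 3600, clock=time.monotonic):
        self._ttl_s = ttl_s      # 24 hours, as in the bullet list above
        self._clock = clock      # injectable for tests
        self._store: Dict[Key, Tuple[float, str]] = {}

    def authorize(
        self,
        acquirer_id: str,
        transaction_id: str,
        original_timestamp: str,
        process: Callable[[], str],
    ) -> str:
        key = (acquirer_id, transaction_id, original_timestamp)
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self._ttl_s:
            return cached[1]     # duplicate: replay the stored response
        response = process()     # first sighting: run the real authorization
        self._store[key] = (now, response)
        return response
```

A retry after a timeout then receives the original approval instead of triggering a second hold on the cardholder's account.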

Edge Case 2: Authorization/Clearing Mismatch

Scenario: Clearing amount differs from authorization

Examples:
  • Restaurant: Auth $50, Clearing $60 (tip added)
  • Gas station: Auth $100 (hold), Clearing $42 (actual pump)
  • Hotel: Auth $500, Clearing $650 (incidentals)

Solution:
  • Allow clearing within tolerance of auth
  • Partial clearing allowed
  • Over-tolerance triggers issuer notification
  • Some MCCs have special rules (gas: auth $1, clear actual)
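
A tolerance check along these lines might look like the sketch below. The MCC codes are the standard ones for restaurants (5812), automated fuel dispensers (5542), and lodging (7011), but the tolerance percentages are assumed figures for illustration, and `check_clearing_amount` is a hypothetical helper.

```python
from decimal import Decimal

# Assumed over-authorization tolerances per merchant category (illustrative)
TOLERANCE_BY_MCC = {
    "5812": Decimal("0.20"),  # restaurants: room for a tip
    "7011": Decimal("0.15"),  # lodging: room for incidentals
}
DEFAULT_TOLERANCE = Decimal("0")


def check_clearing_amount(
    auth_amount: Decimal, clearing_amount: Decimal, mcc: str
) -> str:
    """Return 'accept' or 'notify_issuer' for a clearing record."""
    if clearing_amount <= auth_amount:
        return "accept"  # full or partial clearing of the authorized amount
    tolerance = TOLERANCE_BY_MCC.get(mcc, DEFAULT_TOLERANCE)
    limit = auth_amount * (1 + tolerance)
    return "accept" if clearing_amount <= limit else "notify_issuer"
```

Applied to the three examples: the restaurant's $60 on a $50 authorization sits exactly at the assumed 20% limit and is accepted, the $42 fuel purchase is a partial clearing and is accepted, and the hotel's $650 on a $500 authorization exceeds the assumed 15% limit and triggers issuer notification.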

Edge Case 3: Issuer Timeout During High Volume

Scenario: Black Friday, issuer can't keep up

Problem:
  • Issuer latency increases from 200ms to 5s
  • Customers abandon purchases
  • Merchant loses sales

Solution (Stand-In Processing):
  • Detect issuer degradation (latency > threshold)
  • Switch to stand-in mode for that issuer
  • Apply issuer-defined rules:
    - Single txn limit: $500
    - Velocity limit: 10 txn/hour
    - Decline if card flagged
  • Queue for later reconciliation
  • Issuer accepts liability for stand-in approvals
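
The stand-in rules listed above translate into a small decision function. The sketch below uses the $500 and 10-per-hour limits from the list; the class names, the in-memory sliding window, and the reconciliation queue as a plain list are simplifying assumptions.

```python
from collections import deque
from dataclasses import dataclass
from decimal import Decimal
from typing import Deque, Dict, List, Set, Tuple


@dataclass
class StandInRules:
    single_txn_limit: Decimal = Decimal("500")
    max_txn_per_hour: int = 10


class StandInProcessor:
    """Approve or decline on the issuer's behalf while it is degraded."""

    def __init__(self, rules: StandInRules, flagged_cards: Set[str]):
        self.rules = rules
        self.flagged_cards = flagged_cards
        self._recent: Dict[str, Deque[float]] = {}  # approval times per card
        self.reconciliation_queue: List[Tuple[str, Decimal, float]] = []

    def decide(self, pan_hash: str, amount: Decimal, now_s: float) -> bool:
        if pan_hash in self.flagged_cards:
            return False
        if amount > self.rules.single_txn_limit:
            return False
        window = self._recent.setdefault(pan_hash, deque())
        while window and now_s - window[0] >= 3600:
            window.popleft()  # drop approvals older than one hour
        if len(window) >= self.rules.max_txn_per_hour:
            return False
        window.append(now_s)
        # Approved in stand-in: queue for reconciliation with the issuer
        self.reconciliation_queue.append((pan_hash, amount, now_s))
        return True
```

Only approvals count toward the velocity window here; whether declined attempts should also count is a policy choice the issuer would specify.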

Failure Scenarios

Failure                Detection               Impact                Recovery
─────────────────────  ──────────────────────  ────────────────────  ───────────────────────────────────
Data center outage     Health checks, latency  Route to backup DC    Automatic failover < 30s
Redis cluster failure  Connection errors       Velocity checks fail  Fall back to conservative limits
Fraud model timeout    Latency > 1ms           Transaction delays    Bypass fraud (use stand-in rules)
Issuer unreachable     Timeouts                Can't authorize       Stand-in processing
Network partition      Split-brain detection   Inconsistent state    Prefer availability, reconcile later

Phase 6: Monitoring and Operations

Interviewer: "How would you monitor this system in production?"

Key Metrics

┌──────────────────────────────────────────────────────────────────────────┐
│                    MONITORING DASHBOARD                                  │
│                                                                          │
│  AUTHORIZATION HEALTH                                                    │
│  ├── TPS current:            45,231 /sec    [████████████████░░░░] 70%   │
│  ├── Auth latency p99:       127ms          [████░░░░░░░░░░░░░░░░] OK    │
│  ├── Approval rate:          96.2%          [██████████████████░░] OK    │
│  └── Stand-in rate:          0.3%           [█░░░░░░░░░░░░░░░░░░░] OK    │
│                                                                          │
│  FRAUD DETECTION                                                         │
│  ├── Scoring latency p99:    0.8ms          [████░░░░░░░░░░░░░░░░] OK    │
│  ├── High-risk flagged:      0.5%           [█░░░░░░░░░░░░░░░░░░░] OK    │
│  └── False positive rate:    0.08%          [█░░░░░░░░░░░░░░░░░░░] OK    │
│                                                                          │
│  DATA CENTER HEALTH                                                      │
│  ├── Ashburn:                ✓ Healthy      [CPU: 45% | Mem: 62%]        │
│  ├── Denver:                 ✓ Healthy      [CPU: 38% | Mem: 58%]        │
│  ├── London:                 ✓ Healthy      [CPU: 52% | Mem: 65%]        │
│  └── Singapore:              ✓ Healthy      [CPU: 41% | Mem: 55%]        │
│                                                                          │
│  ISSUER CONNECTIVITY                                                     │
│  ├── Connected issuers:      14,892 / 15,000                             │
│  ├── Degraded issuers:       23                                          │
│  └── Offline issuers:        5                                           │
│                                                                          │
│  CLEARING & SETTLEMENT                                                   │
│  ├── Pending clearing:       2.3M records                                │
│  ├── Settlement status:      T+1 complete                                │
│  └── Unmatched auths:        0.02%                                       │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

Alerting Strategy

CRITICAL (PagerDuty, immediate):
  • Authorization latency p99 > 500ms
  • Approval rate drop > 5% in 5 minutes
  • Data center offline
  • Stand-in rate > 5%
  • Security breach detected

WARNING (Slack, 15 min):
  • Authorization latency p99 > 200ms
  • Single issuer degraded > 5 minutes
  • Fraud score latency > 2ms
  • Clearing match rate < 99%

INFO (Dashboard only):
  • TPS fluctuations
  • Routine issuer timeouts
  • Scheduled maintenance
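
The threshold-based rules above can be expressed as a small classifier. The sketch covers only the rules that compare a single metric against a fixed threshold; rate-of-change rules (approval rate drop over 5 minutes, issuer degraded for more than 5 minutes) need time-series state and are left out. `Snapshot` and `classify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Point-in-time metrics, using the units from the dashboard."""
    auth_latency_p99_ms: float
    stand_in_rate_pct: float
    fraud_latency_ms: float
    clearing_match_rate_pct: float
    dc_offline: bool = False


def classify(s: Snapshot) -> str:
    """Return the highest severity triggered by the snapshot."""
    if s.auth_latency_p99_ms > 500 or s.stand_in_rate_pct > 5 or s.dc_offline:
        return "CRITICAL"
    if (
        s.auth_latency_p99_ms > 200
        or s.fraud_latency_ms > 2
        or s.clearing_match_rate_pct < 99
    ):
        return "WARNING"
    return "INFO"
```

Checking critical conditions first means a snapshot that trips both tiers pages on-call once rather than also posting a warning.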

Runbook: High Authorization Latency

RUNBOOK: Authorization Latency Spike

SYMPTOMS:
  • p99 latency > 200ms
  • Customer complaints about slow checkout
  • Acquirer timeout rates increasing

DIAGNOSIS:
  1. Check issuer latency breakdown:
     $ auth-latency-breakdown --last 5m
     
  2. Identify slow issuers:
     $ issuer-latency-report --threshold 500ms
     
  3. Check fraud scoring latency:
     $ fraud-latency-percentiles --last 5m
     
  4. Check DC health:
     $ dc-health-status --all

RESOLUTION:
  If single issuer slow:
    1. Enable stand-in for that issuer
    2. Alert issuer operations team
    3. Monitor stand-in approval quality
    
  If fraud scoring slow:
    1. Check model serving cluster health
    2. Scale up GPU nodes if needed
    3. Consider bypassing fraud scoring for low-risk transactions
    
  If DC overloaded:
    1. Shift traffic to backup DC
    2. Scale up compute in affected DC
    3. Investigate traffic spike cause

ESCALATION:
  • Network Operations Center (NOC)
  • Risk Operations Center (ROC) if fraud-related
  • Issuer relations if bank-specific
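
The diagnosis-to-resolution mapping in this runbook can be sketched as a small triage function. This is an illustrative Python sketch under stated assumptions: the `triage` function and its inputs are hypothetical, the 500ms issuer threshold is borrowed from the `issuer-latency-report` step, the 2ms fraud threshold from the alerting list, and the 0.85 data center utilization cutoff is an assumed placeholder that does not appear in the runbook.

```python
from typing import Dict, List

ISSUER_SLOW_MS = 500       # from the issuer-latency-report threshold
FRAUD_SLOW_MS = 2          # from the WARNING alert on fraud score latency
DC_OVERLOAD_UTIL = 0.85    # assumption: not specified in the runbook


def triage(issuer_p99_ms: Dict[str, float],
           fraud_p99_ms: float,
           dc_utilization: Dict[str, float]) -> List[str]:
    """Map diagnosis readings to the runbook's resolution steps, in order."""
    actions: List[str] = []

    # Slow issuer: fall back to stand-in and tell the issuer.
    for issuer, p99 in sorted(issuer_p99_ms.items()):
        if p99 > ISSUER_SLOW_MS:
            actions += [
                f"enable stand-in for issuer {issuer}",
                f"alert operations team for issuer {issuer}",
                f"monitor stand-in approval quality for issuer {issuer}",
            ]

    # Slow fraud scoring: look at the model serving tier.
    if fraud_p99_ms > FRAUD_SLOW_MS:
        actions += [
            "check model serving cluster health",
            "scale up GPU nodes if needed",
            "consider bypassing fraud scoring for low-risk transactions",
        ]

    # Overloaded data center: move load first, then add capacity.
    for dc, util in sorted(dc_utilization.items()):
        if util > DC_OVERLOAD_UTIL:
            actions += [
                f"shift traffic from {dc} to backup DC",
                f"scale up compute in {dc}",
                f"investigate traffic spike cause in {dc}",
            ]

    if not actions:
        # Assumed fallback: nothing crossed a threshold, so hand off to a human.
        actions.append("no component over threshold; escalate to NOC")
    return actions


if __name__ == "__main__":
    for step in triage({"issuer_a": 120, "issuer_b": 900}, 0.8, {"dc-east": 0.4}):
        print(step)
```

In practice the inputs would come from the diagnosis commands above; here they are passed in as plain dictionaries so the decision logic can be read and tested on its own.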

Interview Conclusion

Interviewer: "Excellent work. You've demonstrated strong understanding of payment systems, handled the scale requirements well, and made good trade-off decisions around consistency and availability. Any questions for me?"

You: "Thank you! I'd love to understand how Visa actually handles bringing a new data center online. How do you migrate traffic without impacting transactions?"

Interviewer: "Great question. We typically..."


Summary: Concepts Applied from 10-Week Course

┌──────────────────────────────────────────────────────────────────────────┐
│                                                                          │
│          CONCEPTS FROM 10-WEEK COURSE IN VISA/MASTERCARD DESIGN          │
│                                                                          │
│  WEEK 1: DATA AT SCALE                                                   │
│  ├── Partitioning: BIN-based routing for O(1) issuer lookup              │
│  ├── Replication: Multi-DC sync/async for availability                   │
│  ├── Hot Keys: Velocity tracking with sharded Redis                      │
│  └── WAL: Transaction durability before response                         │
│                                                                          │
│  WEEK 2: FAILURE-FIRST DESIGN                                            │
│  ├── Timeouts: Tiered timeout strategy for issuer calls                  │
│  ├── Idempotency: Transaction IDs prevent double-charging                │
│  ├── Circuit Breakers: Data center health monitoring                     │
│  └── Stand-In: Graceful degradation when issuer unavailable              │
│                                                                          │
│  WEEK 3: MESSAGING & ASYNC                                               │
│  ├── Batch Processing: Clearing runs as nightly batch                    │
│  ├── Transactional Outbox: Settlement file generation                    │
│  └── Event Sourcing: Transaction log as source of truth                  │
│                                                                          │
│  WEEK 4: CACHING                                                         │
│  ├── In-Memory: BIN tables for sub-microsecond lookup                    │
│  └── Velocity Cache: Redis for fraud detection features                  │
│                                                                          │
│  WEEK 5: CONSISTENCY                                                     │
│  ├── Eventually Consistent: Cross-region replication                     │
│  ├── Strong Consistency: Regional sync replication                       │
│  └── Conflict Resolution: Auth/clearing reconciliation                   │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────┐
│                                                                          │
│          WHY VISA/MASTERCARD IS AN ENGINEERING MARVEL                    │
│                                                                          │
│  SCALE                                                                   │
│  ─────                                                                   │
│  • 65,000 TPS sustained capacity                                         │
│  • ~490 billion transactions per year (combined)                         │
│  • $28+ trillion in payment volume                                       │
│  • 200+ countries, 160+ currencies                                       │
│                                                                          │
│  RELIABILITY                                                             │
│  ───────────                                                             │
│  • 99.9999% uptime (32 seconds/year downtime)                            │
│  • Zero tolerance for double-charging                                    │
│  • Global redundancy across 7 data centers                               │
│  • Automatic failover in < 30 seconds                                    │
│                                                                          │
│  SPEED                                                                   │
│  ─────                                                                   │
│  • < 2 second end-to-end authorization                                   │
│  • < 1 millisecond fraud scoring                                         │
│  • < 200ms network processing                                            │
│  • Real-time across the globe                                            │
│                                                                          │
│  SECURITY                                                                │
│  ────────                                                                │
│  • $40+ billion fraud prevented annually (Visa alone)                    │
│  • 500+ attributes analyzed per transaction                              │
│  • AI models trained on billions of transactions                         │
│  • 22 billion security events monitored daily                            │
│                                                                          │
│  KEY LESSONS                                                             │
│  ───────────                                                             │
│  1. Latency is king - every millisecond matters at scale                 │
│  2. Availability > consistency for auth (reconcile later)                │
│  3. In-memory processing for critical path                               │
│  4. Redundancy at every layer - backups have backups                     │
│  5. Stand-in processing: graceful degradation over failure               │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
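
The reliability and scale figures in this summary can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the transaction counts are the FY2024 figures quoted at the start of this problem (293 billion for Visa, 197 billion for Mastercard), and the average assumes load spread evenly over a non-leap year, which real traffic is not.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 (non-leap year)


def downtime_budget_seconds(availability_pct: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return SECONDS_PER_YEAR * (1 - availability_pct / 100)


def average_tps(transactions_per_year: float) -> float:
    """Mean transactions per second if load were spread evenly over the year."""
    return transactions_per_year / SECONDS_PER_YEAR


if __name__ == "__main__":
    combined = 293e9 + 197e9  # Visa + Mastercard, FY2024 figures quoted earlier
    avg = average_tps(combined)
    print(f"six-nines downtime budget: {downtime_budget_seconds(99.9999):.1f} s/year")
    print(f"average load: {avg:,.0f} TPS")
    print(f"headroom at 65,000 TPS capacity: {65_000 / avg:.1f}x")
```

Six nines works out to about 31.5 seconds of downtime per year, which matches the rounded 32-second figure. The combined average load is roughly 15,500 TPS, so a 65,000 TPS capacity leaves about a 4x margin for peaks such as holiday shopping.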

Further Reading

Engineering Talks (Highly Recommended):

  • Visa Technology - Various talks on VisaNet architecture at tech conferences
  • Stripe Engineering - Payment infrastructure insights (complementary perspective)

Books:

  • "Designing Data-Intensive Applications" by Martin Kleppmann - Chapters on replication, partitioning, and consistency
  • "System Design Interview Vol 2" by Alex Xu - Payment system design patterns
  • "Building Microservices" by Sam Newman - Distributed systems patterns applicable to payment networks

Related Systems to Study:

  • SWIFT Network: International bank messaging (different scale, complementary)
  • India's UPI: Modern payment rail with different architecture (see Bonus Problem 1)
  • China's UnionPay: Largest card network by transaction count
  • RTP/FedNow: Real-time payment rails (newer, different model)

Self-Assessment Checklist

After studying this design, you should be able to:

  • Explain the four-party model and role of payment networks
  • Design a system handling 65,000+ TPS with < 200ms latency
  • Implement real-time fraud detection in < 1 millisecond
  • Describe ISO 8583 message format and key fields
  • Design multi-data-center architecture for 99.9999% uptime
  • Explain authorization vs clearing vs settlement
  • Calculate interchange fees and net settlement positions
  • Implement idempotency for financial transactions
  • Design stand-in processing for issuer unavailability
  • Explain trade-offs between consistency and availability in payments

This case study demonstrates how Visa and Mastercard built the world's largest real-time financial system, processing $28+ trillion annually with millisecond-level fraud detection and six-nines availability.