Bonus Problem 4: Visa/Mastercard Global Payment Network
The Invisible Infrastructure Moving $28 Trillion Annually
🎯 65,000 Transactions Per Second: Every Second, Every Day
Imagine building a system where a farmer in rural Nebraska can swipe a card at 2 AM, and within 2 seconds, a bank in Tokyo has approved the transaction, a merchant in Dubai has been guaranteed payment, and fraud detection AI has analyzed 500+ risk attributes, all while maintaining 99.9999% uptime.
Now imagine this happens 65,000 times. Per second. Across 200+ countries. In 160+ currencies. With $0 tolerance for double-charging.
This is VisaNet and Mastercard's payment network: the largest real-time financial systems ever built, together processing over $28 trillion annually and touching a large share of the world's card transactions.
────────────────────────────────────────────────────────────────────────
                  THE VISA/MASTERCARD SCALE (2025)

  VOLUME
  ──────
  Visa Transactions:        293 Billion/year (FY2024)
  Mastercard Transactions:  197 Billion/year (FY2024)
  Combined Daily Volume:    1.3+ Billion transactions/day
  Peak Capacity:            65,000+ messages/second

  SCALE
  ─────
  Visa Cards:               4.8 Billion credentials globally
  Mastercard Cards:         3.16 Billion cards worldwide
  Merchant Locations:       150+ Million (Visa)
  Countries:                200+ territories
  Currencies:               160+ supported

  PERFORMANCE
  ───────────
  Authorization Latency:    < 2 seconds end-to-end
  Fraud Detection:          < 1 millisecond per transaction
  Network Uptime:           99.9999% (six 9s)
  Fraud Prevented (Visa):   $40+ Billion/year

  FINANCIAL
  ─────────
  Visa Payment Volume:      $16+ Trillion/year
  Combined Volume:          $28+ Trillion/year
  Visa Revenue:             $35.9 Billion (FY2024)
  Mastercard Revenue:       $28.2 Billion (FY2024)

  INFRASTRUCTURE
  ──────────────
  Visa Data Centers:        7 globally synchronized
  Security Events/Day:      22 Billion monitored
  AI Risk Attributes:       500+ analyzed per transaction
────────────────────────────────────────────────────────────────────────
This is the system we'll design today, and along the way we'll see how the world's commerce backbone actually works.
The Interview Begins
You walk into the interview room. The interviewer smiles and gestures to the whiteboard.
Interviewer: "Thanks for coming in. Today we're going to design a global card payment network, something like Visa or Mastercard. I want to understand how you'd architect a system that can authorize a payment in under 2 seconds, anywhere in the world, while preventing billions in fraud. Ready?"
They write on the whiteboard:
────────────────────────────────────────────────────────────────────────
  Design a Global Payment Network

  A credit/debit card network connecting cardholders, merchants,
  issuing banks, and acquiring banks globally.

  Requirements:
  • Process 50,000+ transactions per second peak
  • Authorization response in < 2 seconds
  • Real-time fraud detection with < 0.1% false positive rate
  • Support 200+ countries and 160+ currencies
  • 99.999%+ availability (five 9s minimum)
  • Never double-charge, never lose transactions
────────────────────────────────────────────────────────────────────────
Interviewer: "Take a few minutes to think about this, then walk me through your approach. We have about 45 minutes."
Phase 1: Requirements Clarification (5 minutes)
Before diving in, you take a breath and start asking questions. This is crucial: never assume.
Your Questions
You: "Before I start designing, I'd like to clarify a few requirements. First, can you describe the basic flow? Who are the participants in a card transaction?"
Interviewer: "Great question. It's a four-party model: the cardholder who swipes their card, the merchant who accepts the payment, the acquiring bank that works with the merchant, and the issuing bank that issued the card to the cardholder. Our network sits in the middle, routing messages between the acquirer and issuer."
You: "So the network doesn't hold funds or issue cards; it's purely a message routing and processing layer?"
Interviewer: "Exactly. We route authorization requests, handle clearing and settlement, and provide value-added services like fraud detection. The actual money movement happens between banks."
You: "What's the latency budget for authorization? And how is it distributed?"
Interviewer: "End-to-end from card swipe to response should be under 2 seconds. But the merchant terminal, acquirer systems, and issuer systems all consume time. Our network portion should be well under 200 milliseconds, ideally much faster."
You: "For fraud detection, how much latency can we add to the critical path?"
Interviewer: "It must be inline with authorization and add no more than 1-2 milliseconds. We can't slow down legitimate transactions."
You: "What happens if the issuing bank is unreachable during authorization?"
Interviewer: "Good question. We need 'stand-in processing': the ability to make authorization decisions on behalf of an unreachable issuer within pre-agreed parameters."
You: "One more: what's the settlement timeline?"
Interviewer: "Clearing happens within hours of the transaction. Settlement, the actual money movement, typically happens T+1 to T+2 through banking systems. But that's separate from the real-time authorization path."
Functional Requirements
1. AUTHORIZATION (Real-time)
   • Route authorization requests from acquirer to issuer
   • Return approve/decline within 200ms network time
   • Support stand-in processing when issuer unreachable
   • Handle multiple card types: credit, debit, prepaid
2. CLEARING (Near real-time)
   • Collect transaction details from acquirers
   • Calculate interchange fees per transaction
   • Prepare settlement positions for all parties
   • Handle adjustments, chargebacks, reversals
3. SETTLEMENT (Batch)
   • Calculate net positions between all banks
   • Facilitate fund transfers between banks
   • Generate settlement reports and reconciliation
4. FRAUD DETECTION (Real-time)
   • Score every transaction for fraud risk
   • Provide risk score to issuer for decision
   • Detect patterns: velocity, geography, merchant type
   • Block enumeration attacks (card testing)
5. VALUE-ADDED SERVICES
   • Tokenization for secure card storage
   • 3D Secure for online authentication
   • Currency conversion
   • Loyalty and rewards integration
Non-Functional Requirements
1. SCALE
   • 65,000 transactions per second peak capacity
   • ~500 billion transactions per year (490 billion combined in FY2024)
   • 200+ countries, 160+ currencies
   • Millions of connected endpoints (banks, processors)
2. LATENCY
   • Authorization: < 200ms network portion (p99)
   • Fraud scoring: < 1ms per transaction
   • End-to-end: < 2 seconds including all parties
3. AVAILABILITY
   • 99.9999% uptime (about 31.5 seconds of downtime per year at most)
   • Zero data loss for financial transactions
   • Geographic redundancy across continents
4. CONSISTENCY
   • Exactly-once processing for authorizations
   • No double-charging under any failure scenario
   • Full audit trail for every transaction
5. SECURITY
   • PCI-DSS Level 1 compliance
   • End-to-end encryption
   • Hardware security modules for keys
   • Real-time threat detection
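The availability target is easier to reason about as a downtime budget. The short sketch below converts an availability percentage into allowed seconds of downtime per year; the function name is illustrative and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a non-leap year


def downtime_seconds_per_year(availability_pct: float) -> float:
    """Allowed downtime per year, in seconds, for a given availability."""
    return SECONDS_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {downtime_seconds_per_year(pct):,.1f} s/year")
```

Each extra nine cuts the budget by a factor of ten, so six nines leaves roughly half a minute per year for every failure, deployment, and failover combined.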
Phase 2: Back of the Envelope Estimation (5 minutes)
You: "Let me work through the numbers to understand the scale we're dealing with."
Transaction Volume Estimation
TRANSACTION VOLUME
Base numbers (2024/2025 actual):
Visa annual transactions: 293 billion
Mastercard annual transactions: 197 billion
Combined: ~490 billion/year
Daily calculation:
490B / 365 days = ~1.34 billion/day
Per hour average: ~56 million/hour
Per second average: ~15,500/second
Peak calculation:
Peak events: Black Friday, Cyber Monday, Singles Day
Peak multiplier: ~4x average
Peak TPS: ~62,000/second
Design capacity (headroom): ~100,000/second
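The volume arithmetic above can be reproduced in a few lines. This is a sketch using the figures quoted in the estimate; the constant names are illustrative, and the 4x peak multiplier is the assumption stated above rather than a measured value.

```python
VISA_TXNS_PER_YEAR = 293e9
MASTERCARD_TXNS_PER_YEAR = 197e9
PEAK_MULTIPLIER = 4  # assumed ratio of peak to average load

combined_per_year = VISA_TXNS_PER_YEAR + MASTERCARD_TXNS_PER_YEAR
per_day = combined_per_year / 365
per_hour = per_day / 24
avg_tps = per_day / 86_400          # seconds per day
peak_tps = avg_tps * PEAK_MULTIPLIER

print(f"per day:  {per_day / 1e9:.2f} billion")
print(f"per hour: {per_hour / 1e6:.0f} million")
print(f"avg TPS:  {avg_tps:,.0f}")
print(f"peak TPS: {peak_tps:,.0f}")
```

Rounding the peak up to 100,000 TPS leaves roughly 60% headroom over the estimated peak.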
Message Size Estimation
ISO 8583 MESSAGE SIZE
Authorization request fields:
Message Type Indicator: 4 bytes
Bitmap: 16 bytes
Primary Account Number (PAN): 19 bytes
Processing Code: 6 bytes
Transaction Amount: 12 bytes
Transmission DateTime: 10 bytes
Merchant Category Code: 4 bytes
Acquiring Institution ID: 11 bytes
Card Acceptor Terminal ID: 8 bytes
Additional data fields: ~200 bytes
Typical message size: 300-500 bytes
With encryption overhead: ~600 bytes
Bandwidth calculation:
100K TPS × 600 bytes × 2 (req+resp)
= ~120 MB/second
= ~1 Gbps per data center
With replication (3x): ~3 Gbps network capacity
Storage Estimation
STORAGE FOR TRANSACTION RECORDS
Per transaction stored:
Authorization record: ~1 KB
Clearing record: ~2 KB
Settlement details: ~500 bytes
Total per transaction: ~3.5 KB
Daily storage:
1.34B transactions × 3.5 KB = ~4.7 TB/day
Annual storage:
4.7 TB Γ 365 = ~1.7 PB/year
With 7-year retention: ~12 PB
With replication (3x): ~36 PB total storage
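The bandwidth and storage estimates can be checked the same way. The sketch below assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and reuses the per-message and per-transaction sizes from the estimates above; the variable names are illustrative.

```python
# Bandwidth at design capacity
DESIGN_TPS = 100_000
MESSAGE_BYTES = 600                      # with encryption overhead
bytes_per_second = DESIGN_TPS * MESSAGE_BYTES * 2   # request + response
gbps = bytes_per_second * 8 / 1e9

# Storage for transaction records
DAILY_TXNS = 1.34e9
BYTES_PER_TXN = 3.5e3                    # auth + clearing + settlement
daily_tb = DAILY_TXNS * BYTES_PER_TXN / 1e12
annual_pb = daily_tb * 365 / 1_000
seven_year_pb = annual_pb * 7
replicated_pb = seven_year_pb * 3

print(f"{bytes_per_second / 1e6:.0f} MB/s = {gbps:.2f} Gbps")
print(f"{daily_tb:.1f} TB/day, {annual_pb:.1f} PB/year")
print(f"{seven_year_pb:.0f} PB over 7 years, {replicated_pb:.0f} PB with 3x replication")
```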
Key Metrics Summary
────────────────────────────────────────────────────────────────────────
                         ESTIMATION SUMMARY

  TRAFFIC
  ├── Peak transactions:    100,000/second (design capacity)
  ├── Daily transactions:   1.34 billion
  └── Annual transactions:  490 billion

  STORAGE
  ├── Per day:              4.7 TB
  ├── Per year:             1.7 PB
  └── 7-year retention:     ~36 PB (with replication)

  BANDWIDTH
  ├── Per data center:      ~1 Gbps sustained
  └── Peak burst:           ~5 Gbps

  INFRASTRUCTURE (rough)
  ├── Data centers:         4-7 globally synchronized
  ├── Connected banks:      15,000+ financial institutions
  └── Network coverage:     10+ million miles of telecom
────────────────────────────────────────────────────────────────────────
Phase 3: High-Level Design (10 minutes)
You: "Now let me sketch out the high-level architecture. The key insight is that this is a four-party model with the payment network in the center."
The Four-Party Model
────────────────────────────────────────────────────────────────────────
                        THE FOUR-PARTY MODEL

    CARDHOLDER                                          MERCHANT
  ┌─────────────┐                                    ┌─────────────┐
  │             │        1. Swipes/Taps card         │             │
  │  Consumer   │ ─────────────────────────────────▶ │  Retailer   │
  │             │                                    │             │
  └──────┬──────┘                                    └──────┬──────┘
         │                                                  │
         │ Has card from                        Has account │
         │                                      with        │
         ▼                                                  ▼
  ┌─────────────┐                                    ┌─────────────┐
  │             │        4. Settlement ($$$)         │             │
  │   ISSUING   │ ────────────────────────────────── │  ACQUIRING  │
  │    BANK     │                                    │    BANK     │
  │             │                                    │             │
  └──────┬──────┘                                    └──────┬──────┘
         │                                                  │
         │ 3. Auth Response                 2. Auth Request │
         │                                                  │
         │             ┌─────────────────────┐              │
         └────────────▶│                     │◀─────────────┘
                       │   PAYMENT NETWORK   │
                       │  (Visa/Mastercard)  │
                       │                     │
                       └─────────────────────┘
────────────────────────────────────────────────────────────────────────
System Architecture
────────────────────────────────────────────────────────────────────────
                       HIGH-LEVEL ARCHITECTURE

  ┌──────────────────────────────────────────────────────────────┐
  │                        ACQUIRING SIDE                        │
  │                                                              │
  │   ┌──────────┐    ┌───────────┐    ┌─────────────────┐       │
  │   │   POS    │───▶│ Acquirer  │───▶│    Acquirer     │       │
  │   │ Terminal │    │ Processor │    │     Gateway     │       │
  │   └──────────┘    └───────────┘    └────────┬────────┘       │
  └─────────────────────────────────────────────┼────────────────┘
                                                │
                                                ▼
  ┌──────────────────────────────────────────────────────────────┐
  │                     PAYMENT NETWORK CORE                     │
  │                                                              │
  │   ┌──────────────────────────────────────────────────────┐   │
  │   │                   Message Gateway                    │   │
  │   │             (ISO 8583 Protocol Handler)              │   │
  │   └───────────────────────────┬──────────────────────────┘   │
  │                               │                              │
  │           ┌───────────────────┼───────────────────┐          │
  │           ▼                   ▼                   ▼          │
  │   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
  │   │   Router/    │    │    Fraud     │    │   Stand-In   │   │
  │   │    Switch    │    │  Detection   │    │  Processing  │   │
  │   └───────┬──────┘    └───────┬──────┘    └──────────────┘   │
  │           │                   │                              │
  │           ▼                   ▼                              │
  │   ┌──────────────────────────────────────────────────────┐   │
  │   │               Authorization Processing               │   │
  │   │         (Transaction Validation, Enrichment)         │   │
  │   └───────────────────────────┬──────────────────────────┘   │
  │                               │                              │
  │   ┌───────────────────────────┴──────────────────────────┐   │
  │   │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐  │   │
  │   │ │PostgreSQL│ │  Redis   │ │  Kafka   │ │ Analytics│  │   │
  │   │ │ (Records)│ │ (Cache)  │ │ (Events) │ │  (OLAP)  │  │   │
  │   │ └──────────┘ └──────────┘ └──────────┘ └──────────┘  │   │
  │   │                      Data Layer                      │   │
  │   └──────────────────────────────────────────────────────┘   │
  │                                                              │
  └────────────┬─────────────────────────────────────────────────┘
               │
               ▼
  ┌──────────────────────────────────────────────────────────────┐
  │                         ISSUING SIDE                         │
  │                                                              │
  │   ┌─────────────────┐    ┌───────────┐    ┌──────────┐       │
  │   │     Issuer      │───▶│  Issuer   │───▶│   Core   │       │
  │   │     Gateway     │    │ Processor │    │ Banking  │       │
  │   └─────────────────┘    └───────────┘    └──────────┘       │
  │                                                              │
  └──────────────────────────────────────────────────────────────┘
────────────────────────────────────────────────────────────────────────
Component Breakdown
You: "Let me walk through each major component..."
1. Message Gateway
Purpose: Protocol handling for ISO 8583 messages from connected banks and processors.
Key responsibilities:
- Parse and validate incoming ISO 8583 messages
- Handle multiple message versions and network variations
- Encrypt/decrypt sensitive data fields
- Route to appropriate internal services
Technology choice: Custom high-performance message parser, likely C++ or specialized hardware for latency-critical path.
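To make the gateway's parsing job concrete, here is a minimal sketch of one step: decoding an ISO 8583 bitmap to find which data elements a message carries. It assumes the bitmap arrives as raw binary (some networks send it hex-encoded instead), and the function names are illustrative, not part of any real library.

```python
def has_secondary_bitmap(primary: bytes) -> bool:
    """Bit 1 of the primary bitmap flags that a secondary bitmap follows."""
    return bool(primary[0] & 0x80)


def present_fields(bitmap: bytes) -> list[int]:
    """Return the data element numbers flagged in a binary ISO 8583 bitmap.

    Bit 1 is the most significant bit of the first byte. An 8-byte bitmap
    covers fields 1-64; a 16-byte one (primary + secondary) covers 1-128.
    Field 1 is the secondary-bitmap indicator, so it is not reported here.
    """
    if len(bitmap) not in (8, 16):
        raise ValueError("expected an 8-byte or 16-byte bitmap")
    fields = []
    for byte_index, byte in enumerate(bitmap):
        for bit_index in range(8):
            if byte & (0x80 >> bit_index):
                field = byte_index * 8 + bit_index + 1
                if field != 1:
                    fields.append(field)
    return fields


# Example: PAN (2), processing code (3), amount (4), STAN (11),
# retrieval reference number (37), response code (39)
example = bytes.fromhex("702000000A000000")
print(present_fields(example))
```

A production parser would follow this with per-field decoding (fixed versus variable length, BCD versus ASCII), which is where most of the network-specific variation lives.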
2. Router/Switch
Purpose: Route authorization requests to the correct issuing bank based on card BIN (Bank Identification Number).
Key responsibilities:
- BIN lookup to identify issuing bank
- Select optimal route (primary, fallback)
- Load balance across issuer connections
- Detect and route around failures
3. Fraud Detection Engine
Purpose: Score every transaction for fraud risk in real-time.
Key responsibilities:
- Analyze 500+ risk attributes per transaction
- Return risk score within 1 millisecond
- Feed scores to issuer for decision support
- Detect velocity patterns, geographic anomalies
4. Stand-In Processing
Purpose: Make authorization decisions when issuer is unreachable.
Key responsibilities:
- Maintain issuer-defined rules and limits
- Track card-level velocity and spending
- Approve/decline within pre-agreed parameters
- Queue transactions for later issuer reconciliation
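The stand-in responsibilities above boil down to a small, deterministic rule check. The sketch below is one way to express it; `StandInRules`, its default limits, and the decline reasons are hypothetical placeholders for whatever each issuer actually configures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandInRules:
    """Hypothetical issuer-configured limits, in minor units (cents)."""
    max_single_amount: int = 50_000    # $500 per transaction
    max_daily_amount: int = 200_000    # $2,000 per card per day
    max_risk_score: int = 80           # decline above this score


def stand_in_decision(
    amount: int,
    spent_today: int,
    risk_score: int,
    card_blocked: bool,
    rules: StandInRules = StandInRules(),
) -> tuple[bool, str]:
    """Approve or decline on the issuer's behalf; returns (approved, reason)."""
    if card_blocked:
        return False, "card reported lost or stolen"
    if risk_score > rules.max_risk_score:
        return False, "risk score above stand-in threshold"
    if amount > rules.max_single_amount:
        return False, "single transaction limit exceeded"
    if spent_today + amount > rules.max_daily_amount:
        return False, "daily velocity limit exceeded"
    return True, "approved in stand-in"


print(stand_in_decision(amount=12_000, spent_today=50_000,
                        risk_score=35, card_blocked=False))
```

Keeping the check free of I/O means the only remote dependency in stand-in mode is the velocity lookup that supplies `spent_today`.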
5. Clearing & Settlement Engine
Purpose: Process non-real-time clearing and facilitate settlement.
Key responsibilities:
- Collect and validate clearing records
- Calculate interchange fees per transaction
- Net settlement positions across banks
- Generate settlement files and reports
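Netting is the step that turns billions of individual transactions into one transfer per bank. The sketch below assumes each cleared record is a tuple of issuer, acquirer, amount, and interchange (all in cents), and that the issuer keeps the interchange; the record layout and bank names are illustrative.

```python
from collections import defaultdict


def net_positions(cleared):
    """Compute each bank's net settlement position in cents.

    `cleared` is an iterable of (issuer_id, acquirer_id, amount, interchange).
    A negative position means the bank pays in; positive means it receives.
    """
    positions = defaultdict(int)
    for issuer, acquirer, amount, interchange in cleared:
        owed = amount - interchange   # issuer retains the interchange fee
        positions[issuer] -= owed
        positions[acquirer] += owed
    return dict(positions)


records = [
    ("bank_a", "bank_b", 10_000, 150),
    ("bank_b", "bank_a", 4_000, 60),
    ("bank_a", "bank_c", 2_500, 40),
]
positions = net_positions(records)
print(positions)
```

Because every debit has a matching credit, the positions always sum to zero, which makes a cheap reconciliation check before settlement files are generated.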
Data Flow
You: "Let me trace through a typical authorization flow..."
AUTHORIZATION FLOW (< 2 seconds end-to-end)

Step 1: Card Presented (0-200ms)
  Cardholder ──▶ POS Terminal ──▶ Acquirer Processor
    • Card data captured
    • Cardholder verified (offline PIN on chip, or PIN/CVV sent on for issuer checks)
    • ISO 8583 message constructed

Step 2: Acquirer to Network (200-400ms)
  Acquirer ──▶ Message Gateway ──▶ Router
    • Message validated and parsed
    • BIN lookup to identify issuer
    • Transaction enriched (merchant data)

Step 3: Fraud Scoring (400-401ms)
  Router ──▶ Fraud Engine ──▶ Router
    • 500+ attributes analyzed
    • Risk score calculated
    • Score attached to message

Step 4: Issuer Authorization (401-1400ms)
  Router ──▶ Issuer Gateway ──▶ Issuer Core Banking
    • Credit limit checked
    • Fraud rules evaluated
    • Authorize/Decline decision

Step 5: Response Return (1400-1800ms)
  Issuer ──▶ Network ──▶ Acquirer ──▶ POS
    • Auth code generated
    • Response transmitted back
    • Receipt printed

Step 6: Clearing (Later, batch)
  Acquirer ──▶ Network ──▶ Issuer
    • Final transaction details
    • Interchange calculated
    • Disputes window opens

Step 7: Settlement (T+1 to T+2)
    • Network calculates net positions
    • Banks transfer funds
    • Merchant credited
Phase 4: Deep Dives (20 minutes)
Interviewer: "Great high-level design. Let's dive deeper into a few areas. Tell me more about how you'd handle the real-time authorization at 65,000 TPS."
Deep Dive 1: Real-Time Authorization at Scale (Week 1-2 Concepts)
You: "This is the heart of the system. Let me explain how we achieve sub-200ms latency at 65,000 TPS."
The Problem
AUTHORIZATION LATENCY CHALLENGE
Without proper optimization:
  Network RTT to bank:    50-100ms
  Message parsing:        10-20ms
  BIN lookup (if naive):  5-10ms
  Fraud scoring:          50-100ms (if not optimized)
  Database writes:        20-50ms
  Total:                  135-280ms in the network portion alone
But we need:
  ✓ End-to-end including acquirer + issuer: < 2 seconds
  ✓ Network portion only: < 200ms
  ✓ Fraud scoring: < 1ms
  ✓ Zero message loss
The Solution: Ultra-Low Latency Architecture
────────────────────────────────────────────────────────────────────────
                    LOW-LATENCY AUTHORIZATION PATH

  ┌──────────────────────────────────────────────────────────────┐
  │                         NETWORK EDGE                         │
  │                                                              │
  │   Dedicated leased lines (not internet)                      │
  │   MPLS VPN with predictable latency                          │
  │   Multiple redundant paths per connection                    │
  │                                                              │
  └───────────────────────────────┬──────────────────────────────┘
                                  │
                                  ▼
  ┌──────────────────────────────────────────────────────────────┐
  │                      MESSAGE PROCESSING                      │
  │                                                              │
  │   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
  │   │  FPGA/ASIC   │───▶│ Memory-only  │───▶│ Pre-computed │   │
  │   │    Parser    │    │  Processing  │    │  BIN Tables  │   │
  │   └──────────────┘    └──────────────┘    └──────────────┘   │
  │                                                              │
  │   • Zero-copy message handling                               │
  │   • BIN lookup in < 1μs (in-memory hash)                     │
  │   • No disk I/O on critical path                             │
  │                                                              │
  └───────────────────────────────┬──────────────────────────────┘
                                  │
                                  ▼
  ┌──────────────────────────────────────────────────────────────┐
  │                      ASYNC PERSISTENCE                       │
  │                                                              │
  │   Authorization response returns BEFORE disk write           │
  │   WAL ensures durability (Week 1: Write-ahead logs)          │
  │   Async replication to standby data centers                  │
  │                                                              │
  └──────────────────────────────────────────────────────────────┘
────────────────────────────────────────────────────────────────────────
Implementation
# Real-Time Authorization Service
# Applies: Week 1 (Partitioning), Week 2 (Timeouts, Idempotency)

import asyncio
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AuthDecision(Enum):
    """ISO 8583 response codes used in this sketch."""
    APPROVED = "00"
    DECLINED_INSUFFICIENT_FUNDS = "51"
    DECLINED_EXPIRED_CARD = "54"
    DECLINED_FRAUD = "59"
    SYSTEM_ERROR = "96"
    ISSUER_UNAVAILABLE = "91"


@dataclass
class AuthorizationRequest:
    """ISO 8583-based authorization request."""
    message_type: str       # "0100" for auth request
    pan: str                # Primary Account Number (card number)
    amount: int             # In smallest currency unit (cents)
    currency_code: str      # "840" for USD
    merchant_id: str
    terminal_id: str
    mcc: str                # Merchant Category Code
    transaction_id: str     # Unique ID for idempotency
    timestamp: float


@dataclass
class AuthorizationResponse:
    """Authorization response with decision."""
    transaction_id: str
    decision: AuthDecision
    auth_code: Optional[str] = None
    risk_score: Optional[int] = None
    processing_time_ms: float = 0
class BINLookupService:
    """
    Ultra-fast BIN lookup using in-memory hash table.
    BIN (Bank Identification Number) is the first 6-8 digits of card.
    Used to route to correct issuing bank.
    Applies: Week 1, Day 1 - Hash partitioning for O(1) lookup
    """

    def __init__(self):
        # Pre-loaded hash map of BIN -> Issuer routing info
        # In production: 500K+ BIN ranges loaded at startup
        self._bin_table: dict[str, "IssuerRoute"] = {}
        self._load_bin_table()

    def _load_bin_table(self):
        """Load BIN table into memory at startup."""
        # Example: Load from database into memory
        # Real system has 500K+ entries
        pass

    def lookup(self, pan: str) -> Optional["IssuerRoute"]:
        """
        O(1) lookup of issuer routing information.
        Takes < 1 microsecond with in-memory hash.
        """
        # Try 8-digit BIN first, then 6-digit
        bin_8 = pan[:8]
        bin_6 = pan[:6]
        return self._bin_table.get(bin_8) or self._bin_table.get(bin_6)


@dataclass
class IssuerRoute:
    """Routing information for an issuer."""
    issuer_id: str
    primary_endpoint: str
    backup_endpoint: str
    timeout_ms: int
    supports_standin: bool
class AuthorizationService:
    """
    Core authorization service achieving < 200ms latency.
    Key optimizations:
    1. No disk I/O on critical path
    2. In-memory BIN lookup
    3. Async persistence after response
    4. Pre-established connections to issuers
    Applies:
    - Week 1, Day 1: Partitioning (BIN-based routing)
    - Week 2, Day 1: Timeout management
    - Week 2, Day 2: Idempotency keys
    """

    def __init__(
        self,
        bin_service: BINLookupService,
        fraud_service: 'FraudDetectionService',
        issuer_gateway: 'IssuerGateway',
        standin_service: 'StandInService',
        wal: 'WriteAheadLog'
    ):
        self.bin_service = bin_service
        self.fraud_service = fraud_service
        self.issuer_gateway = issuer_gateway
        self.standin_service = standin_service
        self.wal = wal
        # Idempotency cache (Week 2, Day 2)
        self._idempotency_cache: dict[str, AuthorizationResponse] = {}

    async def authorize(
        self,
        request: AuthorizationRequest
    ) -> AuthorizationResponse:
        """
        Process authorization request with strict latency SLA.
        Target: < 200ms for network processing portion.
        """
        start_time = time.monotonic()

        # Step 1: Check idempotency (< 0.1ms)
        # Prevents double-charging on retries
        cached = self._idempotency_cache.get(request.transaction_id)
        if cached:
            return cached

        # Step 2: Start the WAL append (async, non-blocking).
        # It overlaps with scoring and routing and is awaited in Step 8,
        # so the response is only returned once the append has completed.
        wal_future = asyncio.create_task(
            self.wal.append(request)
        )

        # Step 3: BIN lookup (< 0.001ms)
        route = self.bin_service.lookup(request.pan)
        if not route:
            return self._error_response(
                request, AuthDecision.SYSTEM_ERROR
            )

        # Step 4: Fraud scoring (< 1ms)
        # Runs inline on the critical path before the issuer call
        risk_score = await self.fraud_service.score(request)

        # Step 5: Route to issuer with timeout (variable, ~100-1000ms)
        try:
            response = await asyncio.wait_for(
                self.issuer_gateway.authorize(request, route, risk_score),
                timeout=route.timeout_ms / 1000.0
            )
        except asyncio.TimeoutError:
            # Issuer timeout - use stand-in processing
            if route.supports_standin:
                response = await self.standin_service.authorize(
                    request, risk_score
                )
            else:
                response = self._error_response(
                    request, AuthDecision.ISSUER_UNAVAILABLE
                )

        # Step 6: Calculate processing time
        processing_time = (time.monotonic() - start_time) * 1000
        response.processing_time_ms = processing_time

        # Step 7: Cache for idempotency
        # (a plain dict never expires; production would evict after ~24 hours)
        self._idempotency_cache[request.transaction_id] = response

        # Step 8: Ensure WAL write completed
        await wal_future
        return response

    def _error_response(
        self,
        request: AuthorizationRequest,
        decision: AuthDecision
    ) -> AuthorizationResponse:
        return AuthorizationResponse(
            transaction_id=request.transaction_id,
            decision=decision
        )
class WriteAheadLog:
    """
    Write-ahead log for transaction durability.
    Applies: Week 1 - WAL for durability before processing.
    Key insight: We write to WAL but don't wait for sync
    before sending response. WAL ensures we can recover
    any in-flight transactions after crash.
    """

    async def append(self, request: AuthorizationRequest) -> None:
        """
        Append request to WAL.
        In production: Write to local SSD with group commit
        for batching multiple transactions per fsync.
        """
        # Serialize and write to durable storage
        pass


# =============================================================================
# Network Layer: ISO 8583 Message Handling
# =============================================================================

class ISO8583Parser:
    """
    High-performance ISO 8583 message parser.
    In production, this might be implemented in:
    - C++ for CPU optimization
    - FPGA for hardware acceleration
    - Specialized network appliances
    Key fields in ISO 8583:
    - Field 2: Primary Account Number (PAN)
    - Field 3: Processing Code
    - Field 4: Transaction Amount
    - Field 11: System Trace Audit Number
    - Field 37: Retrieval Reference Number
    - Field 39: Response Code
    """

    @staticmethod
    def parse(raw_message: bytes) -> AuthorizationRequest:
        """
        Parse ISO 8583 message to internal format.
        Real implementation handles:
        - Multiple ISO 8583 versions (1987, 1993, 2003)
        - Network-specific variations
        - BCD vs ASCII encoding
        - Variable-length fields
        """
        # Parse MTI (Message Type Indicator)
        # Parse bitmap to know which fields present
        # Parse each field according to spec
        pass

    @staticmethod
    def serialize(response: AuthorizationResponse) -> bytes:
        """Serialize response to ISO 8583 format."""
        pass
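The authorization service above keeps its idempotency entries in a plain dict, which never expires anything. A minimal sketch of a time-bounded alternative follows; `TTLCache` and its injectable clock are illustrative names, and this in-process version is only a stand-in for the shared store a multi-node network would need so that a retry landing on another node is still recognized.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """In-memory cache whose entries expire after a fixed time-to-live."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._entries: dict[str, tuple[float, V]] = {}

    def get(self, key: str) -> Optional[V]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]     # lazily evict on read
            return None
        return value

    def put(self, key: str, value: V) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


# Usage: a 24-hour window keyed by transaction ID
cache: TTLCache[str] = TTLCache(ttl_seconds=24 * 3600)
cache.put("txn-123", "approved")
print(cache.get("txn-123"))
```

Lazy eviction keeps reads cheap, but keys that are never read again stay in memory, so a real deployment would also need a periodic sweep or a store with native expiry.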
Edge Cases
Interviewer: "What happens if the issuer is slow or unreachable?"
You: "We implement stand-in processing with tiered timeouts..."
ISSUER TIMEOUT HANDLING

Scenario: Issuer taking too long or unreachable

Timeout Strategy (Week 2, Day 1):
────────────────────────────────────────────────────────────────────────
  Timeout Tier 1: 500ms
  └── First attempt to primary endpoint

  Timeout Tier 2: 300ms
  └── Failover to backup endpoint

  Timeout Tier 3: 200ms (Stand-In)
  └── Make decision locally using issuer-provided rules:
      • Single transaction limit: $500
      • Daily velocity limit: $2,000
      • Decline if risk score > 80
      • Decline if card reported lost/stolen

  Total budget: ~1,000ms (leaves time for acquirer/merchant)
────────────────────────────────────────────────────────────────────────

Stand-In Processing:
  • Track card velocity in Redis (global cluster)
  • Apply issuer-defined rules
  • Queue for later reconciliation with issuer
  • Issuer accepts liability for approved stand-ins
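The tiered strategy can be sketched with `asyncio.wait_for`, trying the primary endpoint, then the backup, then falling back to a local decision. The function and parameter names below are illustrative, the callables stand in for real issuer connections, and the demo uses tiny timeouts so it finishes quickly.

```python
import asyncio


async def authorize_with_failover(
    call_primary,
    call_backup,
    stand_in,
    primary_timeout: float = 0.5,   # Tier 1: 500ms
    backup_timeout: float = 0.3,    # Tier 2: 300ms
    standin_timeout: float = 0.2,   # Tier 3: 200ms
):
    """Try each issuer endpoint within its budget, then decide locally."""
    for call, timeout in (
        (call_primary, primary_timeout),
        (call_backup, backup_timeout),
    ):
        try:
            return await asyncio.wait_for(call(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError):
            continue  # this tier failed; move on to the next one
    return await asyncio.wait_for(stand_in(), timeout=standin_timeout)


async def _demo():
    async def slow_primary():
        await asyncio.sleep(1)          # simulates an unresponsive issuer
        return "issuer-primary"

    async def broken_backup():
        raise ConnectionError("backup endpoint unreachable")

    async def local_decision():
        return "stand-in"

    return await authorize_with_failover(
        slow_primary, broken_backup, local_decision,
        primary_timeout=0.01, backup_timeout=0.01,
    )


result = asyncio.run(_demo())
print(result)
```

Giving each tier its own budget keeps the worst case bounded by the sum of the three timeouts, regardless of how the issuer fails.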
Deep Dive 2: Real-Time Fraud Detection in < 1ms (Week 1-2 Concepts)
Interviewer: "How do you score 500+ attributes for fraud in under a millisecond?"
You: "This is where AI meets extreme performance engineering. Let me show you the architecture..."
The Solution: ML at Millisecond Scale
────────────────────────────────────────────────────────────────────────
                    FRAUD DETECTION ARCHITECTURE

  Transaction ──▶ Feature Extraction ──▶ Model Scoring ──▶ Score

  FEATURE EXTRACTION (< 0.5ms)

    Transaction Features:         Velocity Features:
    • Amount                      • Txn count last hour
    • Currency                    • Txn count last day
    • MCC category                • Amount last hour
    • Card present/absent         • Unique merchants today
    • Entry mode (chip/swipe)     • Geographic spread

    Behavioral Features:          Risk Indicators:
    • Time since last txn         • Is high-risk MCC?
    • Distance from last txn      • Is high-risk country?
    • Deviation from pattern      • Card age
    • Device fingerprint          • Address verification result

                        │
                        ▼

  MODEL SCORING (< 0.5ms)

    Neural Network:
    • Pre-compiled for inference
    • Weights loaded in memory
    • GPU/TPU acceleration or optimized CPU
    • Batch scoring for throughput

    Output: Risk score 0-99
    • 0-29:  Low risk (auto-approve candidate)
    • 30-69: Medium risk (standard processing)
    • 70-89: High risk (additional auth may be required)
    • 90-99: Very high risk (likely fraud, decline candidate)
────────────────────────────────────────────────────────────────────────
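One of the behavioral features above, distance from the last transaction, is cheap to compute and catches "impossible travel". The sketch below uses the haversine great-circle formula; the function names are illustrative, and it assumes both transactions carry latitude and longitude in degrees, which depends on merchant location data being available.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(distance_km: float, seconds_between: float) -> float:
    """Speed a cardholder would need to make both transactions in person."""
    if seconds_between <= 0:
        return math.inf
    return distance_km / (seconds_between / 3600)


# New York, then London 30 minutes later
distance = haversine_km(40.7128, -74.0060, 51.5074, -0.1278)
speed = implied_speed_kmh(distance, 30 * 60)
print(f"{distance:,.0f} km apart, implied speed {speed:,.0f} km/h")
```

An implied speed far above airliner cruising speed is a strong signal for card-present transactions, though it says little about e-commerce, where the merchant's location is unrelated to the cardholder's.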
Implementation
# Real-Time Fraud Detection Service
# Applies: Week 1 (Data partitioning for velocity), Week 2 (Timeouts)

import asyncio
import time
from dataclasses import dataclass
from typing import Any, Dict

import numpy as np
import redis.asyncio as redis


@dataclass
class FraudFeatures:
    """500+ features extracted for fraud scoring."""
    # Transaction features (from request)
    amount: float
    currency: str
    mcc: str
    entry_mode: str
    card_present: bool
    # Velocity features (from Redis)
    txn_count_1h: int
    txn_count_24h: int
    amount_sum_1h: float
    amount_sum_24h: float
    unique_merchants_24h: int
    unique_countries_24h: int
    # Behavioral features (computed)
    time_since_last_txn_seconds: float
    distance_from_last_txn_km: float
    amount_deviation_from_avg: float
    # Risk indicators
    is_high_risk_mcc: bool
    is_high_risk_country: bool
    card_age_days: int

    def to_vector(self) -> np.ndarray:
        """Convert to feature vector for model input."""
        # One-hot encode categorical features
        # Normalize numerical features
        # Return as numpy array
        pass


class VelocityService:
    """
    Track card-level velocity using Redis.
    Applies: Week 1, Day 4 - Hot key handling
    Challenge: Popular cards (corporate cards) can be hot keys.
    Solution: Use Redis Cluster with card-hash-based sharding.
    """

    def __init__(self, redis_cluster: redis.RedisCluster):
        self.redis = redis_cluster

    async def get_velocity(self, pan_hash: str) -> Dict[str, Any]:
        """
        Get velocity metrics for a card.
        Uses Redis sorted sets for time-windowed counting.
        Counts are O(log N); the amount range read is O(log N + M)
        for M matching entries.
        """
        pipe = self.redis.pipeline()
        now = time.time()
        # Key structure: velocity:{pan_hash}:{metric}
        base_key = f"velocity:{pan_hash}"
        # Count transactions in last hour
        pipe.zcount(f"{base_key}:txns", now - 3600, now)
        # Count transactions in last 24 hours
        pipe.zcount(f"{base_key}:txns", now - 86400, now)
        # Amounts in last hour. Members are "timestamp:amount" strings
        # scored by timestamp, so the time-range query selects correctly.
        pipe.zrangebyscore(f"{base_key}:amounts", now - 3600, now)
        # Unique merchants in last 24 hours
        pipe.zcount(f"{base_key}:merchants", now - 86400, now)
        # Last transaction location
        pipe.get(f"{base_key}:last_location")
        results = await pipe.execute()
        return {
            "txn_count_1h": results[0],
            "txn_count_24h": results[1],
            "amount_sum_1h": sum(self._parse_amount(m) for m in results[2]),
            "unique_merchants_24h": results[3],
            "last_location": results[4]
        }

    @staticmethod
    def _parse_amount(member) -> float:
        """Extract the amount from a "timestamp:amount" member."""
        if isinstance(member, bytes):
            member = member.decode()
        return float(member.rsplit(":", 1)[1])

    async def record_transaction(
        self,
        pan_hash: str,
        amount: float,
        merchant_id: str,
        location: str
    ) -> None:
        """
        Record transaction for future velocity checks.
        Uses async fire-and-forget to not block auth response.
        """
        pipe = self.redis.pipeline()
        now = time.time()
        base_key = f"velocity:{pan_hash}"
        # Add to transaction count
        pipe.zadd(f"{base_key}:txns", {str(now): now})
        # Add amount (score must be the timestamp so windowed reads work)
        pipe.zadd(f"{base_key}:amounts", {f"{now}:{amount}": now})
        # Add merchant
        pipe.zadd(f"{base_key}:merchants", {merchant_id: now})
        # Update last location
        pipe.set(f"{base_key}:last_location", location)
        # Expire old data (48 hour window for cleanup)
        for key_suffix in ["txns", "amounts", "merchants"]:
            pipe.zremrangebyscore(
                f"{base_key}:{key_suffix}",
                0,
                now - 172800
            )
        await pipe.execute()
class FraudDetectionService:
    """
    Real-time fraud scoring in < 1ms.
    Applies:
    - Week 1, Day 4: Hot key handling for velocity
    - Week 2, Day 1: Strict timeout management
    Key optimizations:
    1. Pre-loaded model weights in memory
    2. Batch inference when possible
    3. Feature computation parallelized
    4. Redis cluster for velocity data
    """

    def __init__(
        self,
        velocity_service: VelocityService,
        model: 'FraudModel'
    ):
        self.velocity = velocity_service
        self.model = model
        # Pre-compute static risk indicators
        self._high_risk_mccs = self._load_high_risk_mccs()
        self._high_risk_countries = self._load_high_risk_countries()

    async def score(self, request: AuthorizationRequest) -> int:
        """
        Score transaction for fraud risk.
        Returns: Risk score 0-99
        Target latency: < 1ms
        """
        # Hash PAN for privacy and consistent sharding
        pan_hash = self._hash_pan(request.pan)
        # Get velocity features (Redis, < 0.3ms)
        velocity = await self.velocity.get_velocity(pan_hash)
        # Extract all features (< 0.2ms)
        features = self._extract_features(request, velocity)
        # Run model inference (< 0.5ms)
        score = self.model.predict(features.to_vector())
        # Fire-and-forget: Record this transaction for future velocity
        asyncio.create_task(
            self.velocity.record_transaction(
                pan_hash,
                request.amount,
                request.merchant_id,
                self._get_location(request)
            )
        )
        return int(score * 99)

    def _extract_features(
        self,
        request: AuthorizationRequest,
        velocity: Dict
    ) -> FraudFeatures:
        """Extract features from request and velocity data."""
        # Compute geographic distance if we have last location
        distance_km = 0.0
        if velocity.get("last_location"):
            distance_km = self._calculate_distance(
                velocity["last_location"],
                self._get_location(request)
            )
        return FraudFeatures(
            amount=request.amount / 100.0,  # Convert cents to dollars
            currency=request.currency_code,
            mcc=request.mcc,
            entry_mode="chip",  # From request
            card_present=True,  # From request
            txn_count_1h=velocity.get("txn_count_1h", 0),
            txn_count_24h=velocity.get("txn_count_24h", 0),
            amount_sum_1h=velocity.get("amount_sum_1h", 0.0),
            amount_sum_24h=velocity.get("amount_sum_24h", 0.0),
            unique_merchants_24h=velocity.get("unique_merchants_24h", 0),
            unique_countries_24h=1,  # Computed from history
            time_since_last_txn_seconds=0.0,  # Computed
            distance_from_last_txn_km=distance_km,
            amount_deviation_from_avg=0.0,  # Computed
            is_high_risk_mcc=request.mcc in self._high_risk_mccs,
            is_high_risk_country=False,  # From merchant location
            card_age_days=365  # From card data
        )

    def _hash_pan(self, pan: str) -> str:
        """Hash PAN for privacy-preserving velocity lookup."""
        import hashlib
        return hashlib.sha256(pan.encode()).hexdigest()[:16]

    def _load_high_risk_mccs(self) -> set:
        """Load high-risk merchant category codes."""
        return {
            "5912",  # Drug stores
            "5944",  # Jewelry stores
            "5999",  # Misc retail
            "7995",  # Gambling
        }

    def _load_high_risk_countries(self) -> set:
        """Load high-risk countries."""
        return set()  # Configured per issuer
class FraudModel:
    """
    Pre-trained neural network for fraud scoring.

    In production:
    - Trained on billions of transactions
    - Updated weekly with new fraud patterns
    - A/B tested before deployment
    - Multiple model versions for different card types
    """

    def __init__(self, model_path: str):
        # Load pre-trained model weights
        # Could be TensorFlow, PyTorch, or ONNX
        self.weights = self._load_weights(model_path)

    def predict(self, features: np.ndarray) -> float:
        """
        Run inference on feature vector.

        Returns: Probability of fraud (0.0 to 1.0)

        In production, this might use:
        - TensorRT for GPU acceleration
        - ONNX Runtime for CPU optimization
        - Custom inference engine
        """
        # Simple neural network forward pass
        # Real implementation is much more sophisticated
        return 0.1  # Placeholder

    def _load_weights(self, path: str):
        """Load model weights from file."""
        pass
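The feature extractor calls `self._calculate_distance(...)` without showing it. One way to fill that gap is the haversine great-circle formula. This is a minimal sketch, assuming a location is a `(latitude, longitude)` pair in degrees (the original does not specify the location format); `haversine_km` and `implied_speed_kmh` are hypothetical helper names, not part of the original design.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Assumed format: (latitude, longitude) in degrees
Location = Tuple[float, float]


def haversine_km(a: Location, b: Location) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def implied_speed_kmh(distance_km: float, seconds_since_last_txn: float) -> float:
    """Speed a cardholder would need to cover the distance between two transactions."""
    if seconds_since_last_txn <= 0:
        return float("inf") if distance_km > 0 else 0.0
    return distance_km / (seconds_since_last_txn / 3600.0)
```

Combining `distance_from_last_txn_km` with `time_since_last_txn_seconds` in this way gives an "impossible travel" signal: two card-present transactions that imply a speed far above airliner cruise are worth a higher risk score.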
Deep Dive 3: Global Data Center Architecture (Week 1-2 Concepts)
Interviewer: "How do you achieve 99.9999% uptime across global data centers?"
You: "This requires a sophisticated multi-data-center architecture with synchronous and asynchronous replication..."
The Architecture
GLOBAL DATA CENTER TOPOLOGY

                         ┌─────────────────┐
                         │    LONDON DC    │
                         │    (Active)     │
                         └────────┬────────┘
                                  │
            ┌─────────────────────┼─────────────────────┐
            │                     │                     │
            ▼                     ▼                     ▼
   ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
   │   ASHBURN DC    │   │    DENVER DC    │   │  SINGAPORE DC   │
   │  (Primary US)   │◀─▶│   (Backup US)   │◀─▶│ (APAC Primary)  │
   └────────┬────────┘   └─────────────────┘   └────────┬────────┘
            │                                           │
            │             MPLS VPN Network              │
            │           (Dedicated circuits,            │
            │            not public internet)           │
            │                                           │
            └─────────────────────┬─────────────────────┘
                                  │
                       ┌──────────┴──────────┐
                       │                     │
                       ▼                     ▼
              ┌─────────────────┐   ┌─────────────────┐
              │ Acquirer Banks  │   │  Issuer Banks   │
              │  (Connections)  │   │  (Connections)  │
              └─────────────────┘   └─────────────────┘

REPLICATION STRATEGY:
• Auth state: Synchronous within region, async across regions
• Transaction log: Multi-master with conflict resolution
• BIN tables: Read replicas everywhere, writes to primary
• Velocity data: Redis Cluster spanning regions
Redundancy Design
# Multi-Data-Center Routing and Failover
# Applies: Week 1, Day 2 - Replication Trade-offs
from dataclasses import dataclass
from typing import Dict, List, Optional
from enum import Enum
import asyncio


class DataCenterStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    OFFLINE = "offline"


@dataclass
class DataCenter:
    """Data center configuration and health."""
    id: str
    region: str
    primary_for_regions: List[str]
    status: DataCenterStatus
    latency_ms: float
    capacity_pct: float


class GlobalRouter:
    """
    Route requests to optimal data center.

    Applies: Week 1, Day 2 - Replication and failover

    Routing priorities:
    1. Geographic proximity (latency)
    2. Data center health
    3. Load balancing
    4. Regulatory requirements (data residency)
    """

    def __init__(self):
        self.data_centers: Dict[str, DataCenter] = {}
        self._health_check_interval = 1.0  # seconds

    def select_data_center(
        self,
        source_region: str,
        transaction_type: str
    ) -> DataCenter:
        """
        Select optimal data center for processing.

        Returns primary DC for region if healthy,
        otherwise fails over to backup.
        """
        # Get primary DC for this region
        primary = self._get_primary_for_region(source_region)
        if primary and primary.status == DataCenterStatus.HEALTHY:
            return primary
        # Primary unhealthy - find backup
        backup = self._get_backup_for_region(source_region)
        if backup and backup.status == DataCenterStatus.HEALTHY:
            return backup
        # All regional DCs down - use global fallback
        return self._get_any_healthy_dc()

    def _get_primary_for_region(self, region: str) -> Optional[DataCenter]:
        """Get primary DC for a region."""
        for dc in self.data_centers.values():
            if region in dc.primary_for_regions:
                return dc
        return None

    def _get_backup_for_region(self, region: str) -> Optional[DataCenter]:
        """
        Get backup DC for a region.

        Assumed convention: a backup is located in the region
        but is not listed as its primary.
        """
        for dc in self.data_centers.values():
            if dc.region == region and region not in dc.primary_for_regions:
                return dc
        return None

    def _get_any_healthy_dc(self) -> DataCenter:
        """Global fallback: lowest-latency healthy DC; raises if none is healthy."""
        healthy = [
            dc for dc in self.data_centers.values()
            if dc.status == DataCenterStatus.HEALTHY
        ]
        if not healthy:
            raise RuntimeError("No healthy data center available")
        return min(healthy, key=lambda dc: dc.latency_ms)
class TransactionReplicator:
    """
    Replicate transactions across data centers.

    Applies: Week 1, Day 2 - Sync vs Async replication

    Strategy:
    - Authorization state: Sync within region (strong consistency)
    - Transaction log: Async across regions (eventual consistency)
    - WAL replication ensures no data loss on DC failure
    """

    def __init__(
        self,
        local_dc: str,
        peer_dcs: List[str]
    ):
        self.local_dc = local_dc
        self.peer_dcs = peer_dcs
        # Strong references to in-flight async replication tasks, so they
        # are not garbage-collected before they complete
        self._pending_tasks: set = set()

    async def replicate_auth(
        self,
        transaction: AuthorizationRequest,
        response: AuthorizationResponse
    ) -> None:
        """
        Replicate authorization to peer DCs.

        Sync replication to regional peer (for hot standby).
        Async replication to other regions (for DR).
        """
        # Sync replicate to regional peer
        regional_peer = self._get_regional_peer()
        if regional_peer:
            await self._sync_replicate(regional_peer, transaction, response)
        # Async replicate to other regions
        for dc in self.peer_dcs:
            if dc != regional_peer:
                task = asyncio.create_task(
                    self._async_replicate(dc, transaction, response)
                )
                self._pending_tasks.add(task)
                task.add_done_callback(self._pending_tasks.discard)

    async def _sync_replicate(
        self,
        dc: str,
        transaction: AuthorizationRequest,
        response: AuthorizationResponse
    ) -> None:
        """
        Synchronous replication with a bounded wait for acknowledgment.

        Used for regional failover capability.
        Timeout: 50ms. If the peer is slow, the auth continues anyway,
        so the peer is only guaranteed current once it has caught up
        from the WAL.
        """
        try:
            await asyncio.wait_for(
                self._send_to_dc(dc, transaction, response),
                timeout=0.05
            )
        except asyncio.TimeoutError:
            # Log but don't fail the auth
            # Regional peer will catch up from WAL
            pass

    async def _async_replicate(
        self,
        dc: str,
        transaction: AuthorizationRequest,
        response: AuthorizationResponse
    ) -> None:
        """
        Asynchronous replication - fire and forget.

        Used for cross-region disaster recovery.
        Will be caught up from WAL if this fails.
        """
        try:
            await self._send_to_dc(dc, transaction, response)
        except Exception:
            # Log error - cross-region replication will recover from WAL
            pass
class DataCenterFailover:
    """
    Handle data center failover scenarios.

    Applies: Week 2, Day 3 - Circuit breakers

    Failover scenarios:
    1. Network partition between DCs
    2. Complete DC outage
    3. Degraded performance (slow responses)
    """

    def __init__(self):
        self._dc_health: Dict[str, CircuitBreaker] = {}

    async def check_dc_health(self, dc_id: str) -> DataCenterStatus:
        """
        Check health of a data center.

        Uses circuit breaker pattern to avoid
        cascading failures.
        """
        breaker = self._dc_health.get(dc_id)
        if breaker and breaker.is_open:
            return DataCenterStatus.OFFLINE
        try:
            # Send health check
            latency = await self._ping_dc(dc_id)
            if latency > 100:  # ms
                return DataCenterStatus.DEGRADED
            return DataCenterStatus.HEALTHY
        except Exception:
            # Record failure
            if breaker:
                breaker.record_failure()
            return DataCenterStatus.OFFLINE
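`DataCenterFailover` relies on a `CircuitBreaker` with an `is_open` property and a `record_failure()` method, but the class itself is not shown in this section. The following is a minimal sketch of what such a breaker could look like. The threshold, cooldown, injectable `clock`, and the `record_success()` method are assumptions made for illustration, not the course's actual implementation.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after N consecutive failures; lets a trial call through after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # After the cooldown the breaker is "half-open": report closed so
        # the next health check is actually sent
        return self._clock() - self._opened_at < self.reset_timeout_s

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # (Re)start the cooldown; a failed trial call re-opens the breaker
            self._opened_at = self._clock()

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None
```

With a breaker like this, `check_dc_health` would also want to call `record_success()` after a successful ping, so that one good probe in the half-open state closes the breaker again.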
Physical Resilience
PHYSICAL DATA CENTER RESILIENCE (VISA ASHBURN EXAMPLE)
────────────────────────────────────────────────────────────────────────────
DATA CENTER PHYSICAL DESIGN

POWER
─────
• 4 independent utility feeds
• 4 x 1MW diesel generators
• 24,000 gallons diesel (9 days runtime)
• Uninterruptible power supply (UPS) with battery backup
• N+1 redundancy on all power systems

COOLING
───────
• 1.5 million gallon water storage tank
• Multiple chiller plants
• Enough capacity to cool 300 homes
• On-site well for emergency water

PHYSICAL SECURITY
─────────────────
• 18-inch reinforced concrete walls
• Designed for 170 mph winds
• Earthquake resistant
• Hydraulic bollards (stop 50 mph vehicles)
• Multi-layer biometric access
• Only 75 employees cleared for data halls

NETWORK
───────
• Multiple fiber routes from different carriers
• MPLS VPN (not public internet)
• 10+ million miles of telecom network
• Redundant connections to every major bank
────────────────────────────────────────────────────────────────────────────
Deep Dive 4: Clearing and Settlement (Week 3 Concepts)
Interviewer: "Walk me through how money actually moves after authorization."
You: "Clearing and settlement is where the financial reality catches up with the real-time authorization. Let me explain the batch processing system..."
Clearing and Settlement Flow
────────────────────────────────────────────────────────────────────────────
CLEARING AND SETTLEMENT TIMELINE

T+0 (Authorization Day)
───────────────────────
12:00 PM: Customer swipes card at merchant
          Authorization approved (real-time)
          Money NOT moved yet

T+0 (End of Day)
────────────────
11:59 PM: Merchant batches all day's transactions
          Sends clearing file to acquirer
          Acquirer sends to network

T+1 (Clearing Day)
──────────────────
2:00 AM:  Network processes clearing files
          Matches clearing to authorizations
          Calculates interchange fees
          Nets positions across all banks

6:00 AM:  Network sends settlement files to banks
          Each bank knows net debit or credit

T+1/T+2 (Settlement)
────────────────────
9:00 AM:  Banks with net debit wire funds to network
          Network distributes to banks with net credit
          Usually via Fedwire or SWIFT

Result:   Merchant's bank account credited
          Customer's statement shows charge
          Interchange fee collected
────────────────────────────────────────────────────────────────────────────
Fee Calculation
INTERCHANGE FEE EXAMPLE
Transaction: $100 purchase at restaurant
────────────────────────────────────────────────────────────────────────────
Customer pays: $100.00

Breakdown:
├── Merchant receives: $97.50
│
├── Acquirer keeps:    $0.30 (processing fee)
│
├── Network keeps:     $0.20 (scheme fee)
│
└── Issuer receives:   $2.00 (interchange fee)

Interchange varies by:
• Card type (credit vs debit, premium vs standard)
• Merchant category (restaurant, grocery, gas station)
• Transaction type (card present vs card not present)
• Risk level (chip vs swipe vs online)

Typical range: 1.5% - 3.5% of transaction
────────────────────────────────────────────────────────────────────────────
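The $100 breakdown above can be reproduced with exact decimal arithmetic. This is a small sketch, assuming the 2% interchange rate and the flat $0.30 and $0.20 fees from the example; `split_payment` is a hypothetical helper name, and real fee schedules are considerably more detailed.

```python
from decimal import Decimal
from typing import Dict

CENT = Decimal("0.01")


def split_payment(
    amount: Decimal,
    interchange_rate: Decimal,
    acquirer_fee: Decimal,
    scheme_fee: Decimal,
) -> Dict[str, Decimal]:
    """Split a purchase amount among merchant, acquirer, network, and issuer."""
    interchange = (amount * interchange_rate).quantize(CENT)
    merchant = amount - interchange - acquirer_fee - scheme_fee
    return {
        "merchant": merchant,
        "acquirer": acquirer_fee,
        "network": scheme_fee,
        "issuer": interchange,
    }


shares = split_payment(
    amount=Decimal("100.00"),
    interchange_rate=Decimal("0.02"),  # 2% interchange, as in the example
    acquirer_fee=Decimal("0.30"),
    scheme_fee=Decimal("0.20"),
)
```

Using `Decimal` rather than `float` keeps the four shares summing to exactly the amount the customer paid, which matters once millions of these are netted together.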
Implementation
# Clearing and Settlement Service
# Applies: Week 3 (Messaging, Batch Processing)
from dataclasses import dataclass
from typing import List, Dict
from decimal import Decimal
from datetime import datetime, date
import asyncio


@dataclass
class ClearingRecord:
    """Clearing record submitted by acquirer."""
    transaction_id: str
    authorization_code: str
    pan_hash: str
    amount: Decimal
    currency: str
    merchant_id: str
    mcc: str
    acquirer_bin: str
    issuer_bin: str
    transaction_date: date
    clearing_date: date


@dataclass
class InterchangeFee:
    """Calculated interchange fee."""
    transaction_id: str
    rate: Decimal  # Fraction of the amount (0.0185 = 1.85%)
    fixed_fee: Decimal  # Fixed amount
    total_fee: Decimal
    fee_category: str


@dataclass
class SettlementPosition:
    """Net settlement position for a bank."""
    bank_id: str
    net_amount: Decimal  # Positive = receives, Negative = pays
    transaction_count: int
    interchange_received: Decimal
    interchange_paid: Decimal
class ClearingService:
    """
    Process clearing files and prepare settlement.

    Applies: Week 3, Day 1 - Batch vs Stream processing

    This is primarily batch processing:
    - Runs daily after clearing cutoff
    - Processes millions of records
    - Must be idempotent (can re-run safely)
    """

    def __init__(
        self,
        interchange_calculator: 'InterchangeCalculator',
        settlement_service: 'SettlementService'
    ):
        self.interchange = interchange_calculator
        self.settlement = settlement_service

    async def process_clearing_batch(
        self,
        clearing_records: List[ClearingRecord]
    ) -> List[SettlementPosition]:
        """
        Process a batch of clearing records.

        Steps:
        1. Validate each record against authorization
        2. Calculate interchange fees
        3. Net positions across all banks
        4. Generate settlement file
        """
        # Group validated records by acquirer and issuer
        by_acquirer: Dict[str, List[ClearingRecord]] = {}
        by_issuer: Dict[str, List[ClearingRecord]] = {}
        for record in clearing_records:
            # Validate against authorization; skip records that fail
            if not await self._validate_against_auth(record):
                continue
            by_acquirer.setdefault(record.acquirer_bin, []).append(record)
            by_issuer.setdefault(record.issuer_bin, []).append(record)
        # Calculate net positions (interchange is computed per record there)
        positions = self._calculate_net_positions(
            by_acquirer, by_issuer
        )
        # Send to settlement
        await self.settlement.initiate_settlement(positions)
        return positions

    async def _validate_against_auth(
        self,
        record: ClearingRecord
    ) -> bool:
        """
        Validate clearing record matches authorization.

        Checks:
        - Authorization exists
        - Amounts match (within tolerance)
        - Not already cleared
        - Within clearing window
        """
        # Look up original authorization
        # Compare amounts (allow small differences for tips)
        # Ensure not duplicate
        return True  # Placeholder

    def _calculate_net_positions(
        self,
        by_acquirer: Dict[str, List[ClearingRecord]],
        by_issuer: Dict[str, List[ClearingRecord]]
    ) -> List[SettlementPosition]:
        """
        Calculate net settlement position for each bank.

        Issuers: Pay the transaction amounts, minus the interchange they keep
        Acquirers: Receive the transaction amounts minus interchange,
                   then pay their merchants
        Net it all together so each bank has a single debit/credit.
        Scheme fees are ignored here, so all positions sum to zero.
        """
        positions: Dict[str, SettlementPosition] = {}

        def position_for(bank_id: str) -> SettlementPosition:
            if bank_id not in positions:
                positions[bank_id] = SettlementPosition(
                    bank_id=bank_id,
                    net_amount=Decimal(0),
                    transaction_count=0,
                    interchange_received=Decimal(0),
                    interchange_paid=Decimal(0)
                )
            return positions[bank_id]

        def totals(records: List[ClearingRecord]):
            total_amount = sum((r.amount for r in records), Decimal(0))
            total_interchange = sum(
                (self.interchange.calculate(r).total_fee for r in records),
                Decimal(0)
            )
            return total_amount, total_interchange

        # Acquirer side: receives the amounts, less interchange (positive)
        for acquirer_bin, records in by_acquirer.items():
            total_amount, total_interchange = totals(records)
            position = position_for(acquirer_bin)
            position.net_amount += total_amount - total_interchange
            position.transaction_count += len(records)
            position.interchange_paid += total_interchange

        # Issuer side: pays the amounts, less interchange (negative)
        for issuer_bin, records in by_issuer.items():
            total_amount, total_interchange = totals(records)
            position = position_for(issuer_bin)
            position.net_amount -= total_amount - total_interchange
            position.interchange_received += total_interchange

        return list(positions.values())
class InterchangeCalculator:
    """
    Calculate interchange fees based on transaction characteristics.

    Interchange varies by:
    - Card type (credit, debit, premium, corporate)
    - Merchant category code
    - Transaction type (card present, CNP, recurring)
    - Geographic factors
    """

    def __init__(self):
        # Load interchange rate tables
        # Visa and Mastercard publish these twice yearly
        self._rate_tables = self._load_rate_tables()

    def calculate(self, record: ClearingRecord) -> InterchangeFee:
        """Calculate interchange fee for a transaction."""
        # Look up rate based on characteristics
        rate_info = self._lookup_rate(
            card_type=self._get_card_type(record.pan_hash),
            mcc=record.mcc,
            card_present=True,  # Placeholder: derive from clearing data
            transaction_type="purchase"
        )
        # Calculate fee: percentage of the amount plus a fixed component
        percentage_fee = record.amount * rate_info["rate"]
        fixed_fee = Decimal(rate_info["fixed"])
        total = percentage_fee + fixed_fee
        return InterchangeFee(
            transaction_id=record.transaction_id,
            rate=rate_info["rate"],
            fixed_fee=fixed_fee,
            total_fee=total,
            fee_category=rate_info["category"]
        )

    def _get_card_type(self, pan_hash: str) -> str:
        """Resolve the card product for a card."""
        # Placeholder: a real lookup would use card/account-range data
        return "credit"

    def _lookup_rate(
        self,
        card_type: str,
        mcc: str,
        card_present: bool,
        transaction_type: str
    ) -> Dict:
        """Look up interchange rate from tables."""
        # Complex logic based on Visa/Mastercard published rates
        # Example rates:
        return {
            "rate": Decimal("0.0185"),  # 1.85%
            "fixed": Decimal("0.10"),  # 10 cents
            "category": "standard_credit_purchase"
        }

    def _load_rate_tables(self) -> Dict:
        """Load interchange rate tables."""
        # Published by Visa and Mastercard
        return {}
class SettlementService:
    """
    Execute settlement between banks.

    Settlement happens via:
    - Fedwire (US domestic)
    - SWIFT (International)
    - Central bank systems
    """

    async def initiate_settlement(
        self,
        positions: List[SettlementPosition]
    ) -> None:
        """
        Initiate settlement transfers.

        Process:
        1. Send debit instructions to banks that owe
        2. Wait for funds to arrive in clearing account
        3. Send credit instructions to banks that receive
        """
        # Banks that owe (negative position)
        debits = [p for p in positions if p.net_amount < 0]
        # Banks that receive (positive position)
        credits = [p for p in positions if p.net_amount > 0]
        # Request debits first
        for position in debits:
            await self._request_debit(
                position.bank_id,
                abs(position.net_amount)
            )
        # Wait for funds to arrive (usually within an hour)
        await self._wait_for_funds()
        # Send credits
        for position in credits:
            await self._send_credit(
                position.bank_id,
                position.net_amount
            )

    async def _request_debit(
        self,
        bank_id: str,
        amount: Decimal
    ) -> None:
        """Request debit from bank via Fedwire."""
        # Send Fedwire 1031 drawdown request
        pass

    async def _send_credit(
        self,
        bank_id: str,
        amount: Decimal
    ) -> None:
        """Send credit to bank via Fedwire."""
        # Send Fedwire funds transfer
        pass
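To make the netting step concrete, here is a small self-contained sketch of multilateral netting, separate from the service classes. It assumes the usual four-party flow, in which the issuer pays the transaction amount less interchange and the acquirer receives it, and it ignores scheme fees. The `Cleared` tuple and `net_positions` function are hypothetical names for illustration.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, NamedTuple


class Cleared(NamedTuple):
    acquirer: str
    issuer: str
    amount: Decimal
    interchange: Decimal


def net_positions(records: List[Cleared]) -> Dict[str, Decimal]:
    """Net position per bank: positive = receives, negative = pays."""
    net: Dict[str, Decimal] = defaultdict(Decimal)  # Decimal() is zero
    for r in records:
        due = r.amount - r.interchange  # what the issuer owes the acquirer
        net[r.issuer] -= due
        net[r.acquirer] += due
    return dict(net)


batch = [
    # Bank B's cardholder spends $100 at a merchant acquired by Bank A
    Cleared("bank_a", "bank_b", Decimal("100.00"), Decimal("2.00")),
    # Bank A's cardholder spends $40 at a merchant acquired by Bank B
    Cleared("bank_b", "bank_a", Decimal("40.00"), Decimal("0.80")),
]
positions = net_positions(batch)
```

Two gross obligations ($98.00 one way, $39.20 the other) collapse into a single $58.80 transfer from Bank B to Bank A, and the positions always sum to zero, which is a useful invariant to assert before any settlement instruction is sent.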
Phase 5: Scaling and Edge Cases (5 minutes)
Interviewer: "How would this system scale to 10x the current load?"
Scaling Strategy
You: "The system is designed for horizontal scaling at multiple layers..."
SCALING TO 10X (650,000 TPS)
────────────────────────────────────────────────────────────────────────────
SCALING STRATEGY

CURRENT                          10X SCALE
───────                          ─────────

Message Gateways:                Message Gateways:
100 nodes                        1,000 nodes
(Stateless, add more)            (Same architecture)

Router/Switch:                   Router/Switch:
50 nodes                         500 nodes
(BIN tables fit in RAM)          (Partition by BIN range)

Fraud Detection:                 Fraud Detection:
200 nodes                        2,000 nodes
(Model inference)                (More GPU nodes)

Redis Cluster:                   Redis Cluster:
50 shards                        500 shards
(Velocity data)                  (Re-shard by PAN hash)

Data Centers:                    Data Centers:
4 active                         7+ active
                                 (Add APAC, LATAM capacity)

Issuer Connections:              Issuer Connections:
15,000 banks                     Same (banks add capacity)
────────────────────────────────────────────────────────────────────────────
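Going from 50 to 500 Redis shards does not require rehashing every key. Redis Cluster maps each key to one of 16,384 fixed hash slots (CRC16 of the key, or of its `{hash tag}` if present), and re-sharding only changes which node owns which slots. The sketch below reproduces the key-to-slot mapping with the standard library. The contiguous `slot_owner` layout and the sample key names are assumptions for illustration, since real clusters assign slots explicitly.

```python
import binascii

HASH_SLOTS = 16384  # fixed slot count in Redis Cluster


def key_slot(key: str) -> int:
    """Hash slot for a key, honouring a non-empty {hash tag} if present."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # crc_hqx with initial value 0 is CRC-16/XMODEM, the variant Redis uses
    return binascii.crc_hqx(key.encode(), 0) % HASH_SLOTS


def slot_owner(slot: int, shard_count: int) -> int:
    """Assumed layout: slots split into equal contiguous ranges per shard."""
    return slot * shard_count // HASH_SLOTS


# Hash tags keep all velocity keys for one card in the same slot, which
# multi-key pipelines and transactions need in cluster mode
txns_slot = key_slot("velocity:{ab12cd34}:txns")
amounts_slot = key_slot("velocity:{ab12cd34}:amounts")
```

The slot of a key never changes when shards are added; only `slot_owner` does. That is why the 10X column can say "re-shard by PAN hash" without implying a full data reload.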
Edge Cases
Interviewer: "What are some edge cases we should handle?"
Edge Case 1: Duplicate Authorization Requests
Scenario: Acquirer retries due to a timeout, but the original request succeeded
Problem:
• Customer could be double-charged
Solution (Week 2, Day 2 - Idempotency):
• Every auth request carries a unique transaction_id
• Cache auth responses for 24 hours
• Return the cached response on a duplicate request
• Idempotency key = acquirer_id + transaction_id + original transmission timestamp (the timestamp must be carried unchanged on retries, or the duplicate will not match)
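The duplicate-handling rule can be sketched as a small response cache with a time-to-live. This is a single-process illustration under stated assumptions: the key here is `(acquirer_id, transaction_id)`, the class and function names are hypothetical, and a production system would need a shared store with an atomic set-if-absent so that two concurrent duplicates cannot both reach the issuer.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (acquirer_id, transaction_id)


class IdempotencyCache:
    """Remembers authorization responses for a fixed time window."""

    def __init__(
        self,
        ttl_s: float = 24 * 3600,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._ttl_s = ttl_s
        self._clock = clock
        self._store: Dict[Key, Tuple[str, float]] = {}

    def get(self, key: Key) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        response, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: treat as never seen
            return None
        return response

    def put(self, key: Key, response: str) -> None:
        self._store[key] = (response, self._clock() + self._ttl_s)


def authorize_once(
    cache: IdempotencyCache,
    acquirer_id: str,
    transaction_id: str,
    authorize: Callable[[], str],
) -> str:
    """Return the cached response for a retry; otherwise authorize and cache."""
    key = (acquirer_id, transaction_id)
    cached = cache.get(key)
    if cached is not None:
        return cached
    response = authorize()
    cache.put(key, response)
    return response
```

A retry inside the window gets the original answer back without the issuer being asked again, which is what prevents the double charge.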
Edge Case 2: Authorization/Clearing Mismatch
Scenario: Clearing amount differs from authorization
Examples:
• Restaurant: Auth $50, Clearing $60 (tip added)
• Gas station: Auth $100 (hold), Clearing $42 (actual pump)
• Hotel: Auth $500, Clearing $650 (incidentals)
Solution:
• Allow clearing within tolerance of auth
• Partial clearing allowed
• Over-tolerance triggers issuer notification
• Some MCCs have special rules (gas: auth $1, clear actual)
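A tolerance check along these lines could look like the sketch below. The percentages are invented for illustration (actual tolerances come from network rules and vary by merchant category), and `check_clearing` is a hypothetical name. The MCCs used are the standard codes for restaurants (5812), automated fuel dispensers (5542), and lodging (7011).

```python
from decimal import Decimal

# Hypothetical over-authorization tolerances by MCC; not real network rules
OVER_AUTH_TOLERANCE = {
    "5812": Decimal("0.20"),  # Restaurants: room for a tip
    "7011": Decimal("0.15"),  # Lodging: room for incidentals
}
DEFAULT_TOLERANCE = Decimal("0")


def check_clearing(auth_amount: Decimal, clearing_amount: Decimal, mcc: str) -> str:
    """Classify a clearing amount against its authorization."""
    if clearing_amount == auth_amount:
        return "match"
    if clearing_amount < auth_amount:
        return "partial"  # e.g. fuel hold cleared for the pumped amount
    tolerance = OVER_AUTH_TOLERANCE.get(mcc, DEFAULT_TOLERANCE)
    limit = auth_amount * (1 + tolerance)
    return "within_tolerance" if clearing_amount <= limit else "notify_issuer"
```

Under these made-up tolerances, the restaurant example ($50 authorized, $60 cleared) sits exactly on the limit, the gas station example is a partial clearing, and the hotel example ($500 authorized, $650 cleared) exceeds the limit and triggers an issuer notification.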
Edge Case 3: Issuer Timeout During High Volume
Scenario: Black Friday, issuer can't keep up
Problem:
• Issuer latency increases from 200ms to 5s
• Customers abandon purchases
• Merchant loses sales
Solution (Stand-In Processing):
• Detect issuer degradation (latency > threshold)
• Switch to stand-in mode for that issuer
• Apply issuer-defined rules:
  - Single txn limit: $500
  - Velocity limit: 10 txn/hour
  - Decline if card flagged
• Queue for later reconciliation
• Issuer accepts liability for stand-in approvals
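The issuer-defined rules listed above translate directly into a small decision function. This is a sketch using the example limits from the list; the `StandInRules` dataclass, the function name, and the reason strings are hypothetical, and a real stand-in profile would be configured per issuer and usually per BIN range.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Tuple


@dataclass(frozen=True)
class StandInRules:
    single_txn_limit: Decimal = Decimal("500")  # from the example rules
    max_txns_per_hour: int = 10


def stand_in_decision(
    amount: Decimal,
    txn_count_1h: int,
    card_flagged: bool,
    rules: StandInRules = StandInRules(),
) -> Tuple[str, str]:
    """Decide on the issuer's behalf; returns (decision, reason)."""
    if card_flagged:
        return "DECLINE", "card_flagged"
    if amount > rules.single_txn_limit:
        return "DECLINE", "over_single_txn_limit"
    if txn_count_1h >= rules.max_txns_per_hour:
        return "DECLINE", "velocity_limit"
    return "APPROVE", "stand_in"
```

Every stand-in approval would also be queued for reconciliation so the issuer can post it once it recovers.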
Failure Scenarios
| Failure | Detection | Impact | Recovery |
|---|---|---|---|
| Data center outage | Health checks, latency | Route to backup DC | Automatic failover < 30s |
| Redis cluster failure | Connection errors | Velocity checks fail | Fall back to conservative limits |
| Fraud model timeout | Latency > 1ms | Transaction delays | Bypass fraud (use stand-in rules) |
| Issuer unreachable | Timeouts | Can't authorize | Stand-in processing |
| Network partition | Split-brain detection | Inconsistent state | Prefer availability, reconcile later |
Phase 6: Monitoring and Operations
Interviewer: "How would you monitor this system in production?"
Key Metrics
────────────────────────────────────────────────────────────────────────────
MONITORING DASHBOARD

AUTHORIZATION HEALTH
├── TPS current:           45,231 /sec    70%
├── Auth latency p99:      127ms          OK
├── Approval rate:         96.2%          OK
└── Stand-in rate:         0.3%           OK

FRAUD DETECTION
├── Scoring latency p99:   0.8ms          OK
├── High-risk flagged:     0.5%           OK
└── False positive rate:   0.08%          OK

DATA CENTER HEALTH
├── Ashburn:               Healthy        [CPU: 45% | Mem: 62%]
├── Denver:                Healthy        [CPU: 38% | Mem: 58%]
├── London:                Healthy        [CPU: 52% | Mem: 65%]
└── Singapore:             Healthy        [CPU: 41% | Mem: 55%]

ISSUER CONNECTIVITY
├── Connected issuers:     14,892 / 15,000
├── Degraded issuers:      23
└── Offline issuers:       5

CLEARING & SETTLEMENT
├── Pending clearing:      2.3M records
├── Settlement status:     T+1 complete
└── Unmatched auths:       0.02%
────────────────────────────────────────────────────────────────────────────
Alerting Strategy
CRITICAL (PagerDuty, immediate):
• Authorization latency p99 > 500ms
• Approval rate drop > 5% in 5 minutes
• Data center offline
• Stand-in rate > 5%
• Security breach detected
WARNING (Slack, 15 min):
• Authorization latency p99 > 200ms
• Single issuer degraded > 5 minutes
• Fraud score latency > 2ms
• Clearing match rate < 99%
INFO (Dashboard only):
• TPS fluctuations
• Routine issuer timeouts
• Scheduled maintenance
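The tiers above can be expressed as a simple classifier over a metrics snapshot. This sketch encodes the numeric thresholds from the list. The `Metrics` field names are hypothetical, and the "security breach detected" condition is left out because it would come from a separate detection system rather than a metric threshold.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    auth_latency_p99_ms: float
    approval_rate_drop_pct_5m: float
    stand_in_rate_pct: float
    fraud_latency_ms: float
    clearing_match_rate_pct: float
    issuer_degraded_minutes: float = 0.0
    dc_offline: bool = False


def classify(m: Metrics) -> str:
    """Map a metrics snapshot to the highest applicable alert tier."""
    if (
        m.auth_latency_p99_ms > 500
        or m.approval_rate_drop_pct_5m > 5
        or m.dc_offline
        or m.stand_in_rate_pct > 5
    ):
        return "CRITICAL"
    if (
        m.auth_latency_p99_ms > 200
        or m.issuer_degraded_minutes > 5
        or m.fraud_latency_ms > 2
        or m.clearing_match_rate_pct < 99
    ):
        return "WARNING"
    return "INFO"
```

Checking the critical conditions first means a snapshot that trips both tiers pages once, rather than producing a page and a Slack message for the same incident.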
Runbook: High Authorization Latency
RUNBOOK: Authorization Latency Spike
SYMPTOMS:
• p99 latency > 200ms
• Customer complaints about slow checkout
• Acquirer timeout rates increasing
DIAGNOSIS:
1. Check issuer latency breakdown:
   $ auth-latency-breakdown --last 5m
2. Identify slow issuers:
   $ issuer-latency-report --threshold 500ms
3. Check fraud scoring latency:
   $ fraud-latency-percentiles --last 5m
4. Check DC health:
   $ dc-health-status --all
RESOLUTION:
If single issuer slow:
  1. Enable stand-in for that issuer
  2. Alert issuer operations team
  3. Monitor stand-in approval quality
If fraud scoring slow:
  1. Check model serving cluster health
  2. Scale up GPU nodes if needed
  3. Consider bypassing fraud scoring for low-risk transactions
If DC overloaded:
  1. Shift traffic to backup DC
  2. Scale up compute in affected DC
  3. Investigate traffic spike cause
ESCALATION:
• Network Operations Center (NOC)
• Risk Operations Center (ROC) if fraud-related
• Issuer relations if bank-specific
Interview Conclusion
Interviewer: "Excellent work. You've demonstrated strong understanding of payment systems, handled the scale requirements well, and made good trade-off decisions around consistency and availability. Any questions for me?"
You: "Thank you! I'd love to understand how Visa handles bringing a new data center online. How do you migrate traffic to it without impacting transactions?"
Interviewer: "Great question. We typically..."
Summary: Concepts Applied from 10-Week Course
────────────────────────────────────────────────────────────────────────────
CONCEPTS FROM 10-WEEK COURSE IN VISA/MASTERCARD DESIGN

WEEK 1: DATA AT SCALE
├── Partitioning: BIN-based routing for O(1) issuer lookup
├── Replication: Multi-DC sync/async for availability
├── Hot Keys: Velocity tracking with sharded Redis
└── WAL: Transaction durability before response

WEEK 2: FAILURE-FIRST DESIGN
├── Timeouts: Tiered timeout strategy for issuer calls
├── Idempotency: Transaction IDs prevent double-charging
├── Circuit Breakers: Data center health monitoring
└── Stand-In: Graceful degradation when issuer unavailable

WEEK 3: MESSAGING & ASYNC
├── Batch Processing: Clearing runs as nightly batch
├── Transactional Outbox: Settlement file generation
└── Event Sourcing: Transaction log as source of truth

WEEK 4: CACHING
├── In-Memory: BIN tables for sub-microsecond lookup
└── Velocity Cache: Redis for fraud detection features

WEEK 5: CONSISTENCY
├── Eventually Consistent: Cross-region replication
├── Strong Consistency: Regional sync replication
└── Conflict Resolution: Auth/clearing reconciliation
────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────
WHY VISA/MASTERCARD IS AN ENGINEERING MARVEL

SCALE
─────
• 65,000+ messages/second peak capacity
• ~490 billion transactions per year (Visa + Mastercard, FY2024)
• $28+ trillion in payment volume
• 200+ countries, 160+ currencies

RELIABILITY
───────────
• 99.9999% uptime (about 32 seconds/year downtime)
• Zero tolerance for double-charging
• Global redundancy across 7 data centers
• Automatic failover in < 30 seconds

SPEED
─────
• < 2 second end-to-end authorization
• < 1 millisecond fraud scoring
• < 200ms network processing
• Real-time across the globe

SECURITY
────────
• $40+ billion fraud prevented annually (Visa alone)
• 500+ attributes analyzed per transaction
• AI models trained on billions of transactions
• 22 billion security events monitored daily

KEY LESSONS
───────────
1. Latency is king - every millisecond matters at scale
2. Availability > consistency for auth (reconcile later)
3. In-memory processing for critical path
4. Redundancy at every layer - backups have backups
5. Stand-in processing: graceful degradation over failure
────────────────────────────────────────────────────────────────────────────
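The "32 seconds/year" figure follows directly from the availability percentage, and the same arithmetic shows why redundancy is the practical route to six nines. This is a small sketch; `combined_availability` assumes the redundant sites fail independently, which is optimistic for real data centers that share software, operators, and networks.

```python
from typing import Iterable

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600


def downtime_seconds_per_year(availability_pct: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return (1 - availability_pct / 100) * SECONDS_PER_YEAR


def combined_availability(site_availability_pcts: Iterable[float]) -> float:
    """Availability of redundant sites where any one surviving site suffices.

    Assumes failures are independent (an idealization).
    """
    all_down = 1.0
    for pct in site_availability_pcts:
        all_down *= 1 - pct / 100
    return (1 - all_down) * 100


six_nines_budget = downtime_seconds_per_year(99.9999)
two_sites = combined_availability([99.9, 99.9])
```

Six nines leaves a budget of roughly 31.6 seconds per year, and under the independence assumption two sites at 99.9% each already reach 99.9999% combined. In practice, correlated failures are what make the last few nines expensive.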
Sources
Official Documentation:
- Visa Developer Portal - VisaNet Connect APIs: https://developer.visa.com/capabilities/visanet-connect-acceptance
- Visa Corporate - Technology & Infrastructure: https://corporate.visa.com
- Mastercard Developer Documentation: https://developer.mastercard.com
- ISO 8583 Standard Specification: https://www.iso.org/standard/31628.html
Statistics and Data:
- Visa FY2024 Annual Report - Financials: https://annualreport.visa.com/financials/default.aspx
- Mastercard Inc. Form 10-K 2024: https://www.sec.gov/Archives/edgar/data/1141391/000114139125000011/ma-20241231.htm
- Capital One Shopping - Credit Card Statistics: https://capitaloneshopping.com/research/number-of-credit-card-transactions/
- CoinLaw - Visa Statistics 2025: https://coinlaw.io/visa-statistics/
- CoinLaw - Mastercard Statistics 2025: https://coinlaw.io/mastercard-statistics/
Architecture and Technical:
- Visa Perspectives - Inside Visa's Engine of Global Commerce: https://corporate.visa.com/en/sites/visa-perspectives/security-trust/inside-visa-global-commerce-engine.html
- Increase - Visa: Half a Century of High Availability: https://increase.com/articles/visa-redundancy
- Increase - ISO 8583: The Language of Credit Cards: https://increase.com/articles/iso-8583-the-language-of-credit-cards
- UniBul - How Visa's Payment System Works: https://blog.unibulmerchantservices.com/how-visas-payment-system-works/
- Network Computing - Inside Visa's Data Center: https://www.networkcomputing.com/data-center-networking/inside-visa-s-data-center
- Philadelphia Fed - Clearing and Settlement of Interbank Card Transactions: https://www.philadelphiafed.org/-/media/frbp/assets/consumer-finance/discussion-papers/d-2013-october-clearing-settlement.pdf
Security and Fraud:
- Visa - AI-Powered Fraud Solutions: https://www.visaacceptance.com/en-us/solutions/ai-driven-fraud-management.html
- Visa - Advanced Authorization: https://corporate.visa.com/en/solutions/secure-card-payments.html
- Visa Newsroom - $25 Billion Fraud Prevented: https://usa.visa.com/about-visa/newsroom/press-releases.releaseId.16421.html
- Visa - Generative AI Fraud Solution: https://investor.visa.com/news/news-details/2024/Visa-Announces-Generative-AI-Powered-Fraud-Solution-to-Combat-Account-Attacks/default.aspx
Payment Industry:
- Wikipedia - ISO 8583: https://en.wikipedia.org/wiki/ISO_8583
- Wikipedia - Four Corners Model: https://en.wikipedia.org/wiki/Four_Corners_Model_for_Payment_Security
- Cryptomathic - Four Corners Model: https://www.cryptomathic.com/blog/cardholder-merchant-issuer-acquirer-the-four-corners-model-for-payment-security-and-key-management
- IR Guide - Introduction to ISO 8583: https://www.ir.com/guides/introduction-to-iso-8583
Further Reading
Official Documentation:
- Visa Developer Portal: https://developer.visa.com (Complete API documentation for VisaNet integration)
- Mastercard Developers: https://developer.mastercard.com (Payment gateway and processing APIs)
- PCI Security Standards Council: https://www.pcisecuritystandards.org (Security compliance requirements)
Engineering Talks (Highly Recommended):
- Visa Technology - Various talks on VisaNet architecture at tech conferences
- Stripe Engineering - Payment infrastructure insights (complementary perspective)
Books:
- "Designing Data-Intensive Applications" by Martin Kleppmann - Chapters on replication, partitioning, and consistency
- "System Design Interview Vol 2" by Alex Xu - Payment system design patterns
- "Building Microservices" by Sam Newman - Distributed systems patterns applicable to payment networks
Related Systems to Study:
- SWIFT Network: International bank messaging (different scale, complementary)
- India's UPI: Modern payment rail with different architecture (see Bonus Problem 1)
- China's UnionPay: Largest card network by number of cards issued
- RTP/FedNow: Real-time payment rails (newer, different model)
Self-Assessment Checklist
After studying this design, you should be able to:
- Explain the four-party model and role of payment networks
- Design a system handling 65,000+ TPS with < 200ms latency
- Implement real-time fraud detection in < 1 millisecond
- Describe ISO 8583 message format and key fields
- Design multi-data-center architecture for 99.9999% uptime
- Explain authorization vs clearing vs settlement
- Calculate interchange fees and net settlement positions
- Implement idempotency for financial transactions
- Design stand-in processing for issuer unavailability
- Explain trade-offs between consistency and availability in payments
This case study demonstrates how Visa and Mastercard built the world's largest real-time financial system, processing $28+ trillion annually with millisecond-level fraud detection and six-nines availability.