Himanshu Kukreja

Bonus Problem 3: Aadhaar (UIDAI)

The World's Largest Biometric Identity System


🪪 Identity at Billion Scale

Imagine this challenge: You need to uniquely identify 1.4 billion people.

Not just assign them a number, but guarantee that each person appears exactly once in your system. No duplicates. No fakes. Every identity verifiable in under 200 milliseconds.

You'll need to match billions of fingerprints against billions of other fingerprints. Trillions of biometric comparisons. Every single day.

This is Aadhaar, the largest biometric identity system ever built.

THE AADHAAR SCALE (2025)

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│   ENROLLMENT                                                            │
│   ──────────                                                            │
│   Total Enrolled:           1.38+ Billion people                        │
│   Coverage:                 99.9% of adult Indian population            │
│   Biometric Data:           ~15 Petabytes                               │
│   (10 fingerprints + 2 iris scans + photo per person)                   │
│                                                                         │
│   AUTHENTICATION                                                        │
│   ──────────────                                                        │
│   Daily Authentications:    90+ Million                                 │
│   Monthly Authentications:  2.5+ Billion                                │
│   Cumulative (to date):     150+ Billion authentications                │
│   e-KYC Transactions:       45+ Million/month                           │
│   Face Authentication:      18+ Million/month (AI-powered)              │
│                                                                         │
│   PERFORMANCE                                                           │
│   ───────────                                                           │
│   Authentication Latency:   < 200ms                                     │
│   Availability:             99.9%+                                      │
│   Active Entities (AUAs):   550+                                        │
│                                                                         │
│   DEDUPLICATION                                                         │
│   ─────────────                                                         │
│   Biometric Matches/Day:    600+ Trillion (at peak)                     │
│   ABIS Vendors:             3 (for redundancy)                          │
│   Duplicate Detection:      99.965% accuracy                            │
│                                                                         │
│   IMPACT                                                                │
│   ──────                                                                │
│   DBT Savings:              ₹3.5+ Lakh Crore ($42B+) saved              │
│   Ghost Beneficiaries:      Millions eliminated                         │
│   Bank Accounts Linked:     788+ Million                                │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

This is the system we'll design today, and we'll unpack the engineering behind proving "you are you" at planetary scale.


The Interview Begins

You're interviewing at a government technology agency. The Chief Architect draws on the whiteboard:

Interviewer: "India's Aadhaar is often cited as the world's most ambitious digital identity project. I want you to design a biometric identity system that can scale to a billion people. Walk me through how you'd approach it."

╔═════════════════════════════════════════════════════════════════════════╗
β•‘                                                                         β•‘
β•‘        Design a National Biometric Identity System                      β•‘
β•‘                                                                         β•‘
β•‘   Build an identity system that can:                                    β•‘
β•‘                                                                         β•‘
β•‘   Requirements:                                                         β•‘
β•‘   β€’ Enroll 1+ billion residents with biometrics                         β•‘
β•‘   β€’ Guarantee uniqueness (no duplicates in the system)                  β•‘
β•‘   β€’ Authenticate identity in real-time (< 500ms)                        β•‘
β•‘   β€’ Handle 100+ million authentications per day                         β•‘
β•‘   β€’ Work across 640,000 villages with unreliable connectivity           β•‘
β•‘   β€’ Protect biometric data with highest security                        β•‘
β•‘   β€’ Provide e-KYC (Know Your Customer) service                          β•‘
β•‘   β€’ Support multiple authentication modes (fingerprint, iris, OTP)      β•‘
β•‘   β€’ 99.9%+ availability                                                 β•‘
β•‘                                                                         β•‘
β•‘   Constraint: This is a government project with vendor neutrality       β•‘
β•‘                                                                         β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

Interviewer: "The uniqueness guarantee is the hardest part. You need to prove that each of 1.4 billion people appears exactly once. That's never been done before at this scale."


Phase 1: Requirements Clarification

You: "Let me understand the specific challenges before designing."

Your Questions

You: "First, what biometric modalities are we capturing? And what's the expected quality given enrollment happens in remote villages?"

Interviewer: "10 fingerprints, 2 iris scans, and a photograph. Quality will vary: many manual laborers have worn fingerprints, the elderly may have faded prints, and some people lack fingers. The system must handle all cases."

You: "For uniqueness, what's the acceptable error rate? False accepts (enrolling duplicates) vs false rejects (wrongly rejecting unique people)?"

Interviewer: "This is critical. A false accept means someone gets two identities β€” they can claim benefits twice. A false reject means a legitimate person can't get enrolled. Both are bad, but false accepts are worse for a welfare system."

You: "What about authentication? Is it 1:1 matching (verify this person is who they claim) or 1:N (find this person in the database)?"

Interviewer: "Authentication is always 1:1. They provide their Aadhaar number plus a biometric, and we verify that it matches the stored data. Deduplication during enrollment is 1:N: we search the entire database to make sure they're not already enrolled."

You: "What infrastructure can we assume in remote areas?"

Interviewer: "Minimal. Many areas have no internet, unreliable power, extreme temperatures. Enrollment must work offline. Authentication needs connectivity but should degrade gracefully."

Requirements Summary

Functional Requirements:

1. ENROLLMENT
   • Capture demographics (name, address, DOB, gender)
   • Capture biometrics (10 fingerprints, 2 iris scans, 1 photo)
   • Verify supporting documents (proof of identity/address)
   • Perform deduplication (1:N match against the entire database)
   • Generate unique 12-digit Aadhaar number
   • Print and mail physical Aadhaar letter

2. AUTHENTICATION
   • Demographic authentication (name/address matching)
   • Biometric authentication (fingerprint, iris, face)
   • OTP authentication (via registered mobile)
   • Multi-factor authentication (combinations)
   • Return only Yes/No (no PII in response)

3. e-KYC (Know Your Customer)
   • Return verified identity data after authentication
   • Digitally signed response
   • Replace paper-based KYC for banks, telecom, etc.

4. UPDATE
   • Demographics update (address, phone, etc.)
   • Biometrics update (for degraded prints)
   • Document-based or operator-assisted updates

5. PRIVACY FEATURES
   • Virtual ID (16-digit temporary alias for Aadhaar)
   • Masked Aadhaar (only the last 4 digits shown)
   • Authentication history (user can see who queried)
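The 12-digit number and its masked form can be sketched as follows. Aadhaar numbers are commonly documented as ending in a Verhoeff check digit, a scheme that catches every single-digit typo and every swap of two adjacent digits. Everything else here is an assumption for illustration: `new_number` and `mask` are hypothetical helpers, the random 11-digit payload with no leading 0 or 1 only mimics the public format, and the `XXXX XXXX 1234` masking layout is illustrative.

```python
import secrets

# Verhoeff tables: D is multiplication in the dihedral group D5,
# P is the position-dependent permutation, INV holds group inverses.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_check_digit(payload: str) -> str:
    """Check digit to append to a string of decimal digits."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return str(INV[c])


def verhoeff_is_valid(number: str) -> bool:
    """True if the final digit is a correct Verhoeff check digit for the rest."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def new_number(issued: set) -> str:
    """Hypothetical issuer: random 11-digit payload + check digit, never reused."""
    while True:
        payload = str(secrets.choice(range(2, 10))) + "".join(
            str(secrets.randbelow(10)) for _ in range(10)
        )
        number = payload + verhoeff_check_digit(payload)
        if number not in issued:  # guarantees distinct numbers, not distinct people
            issued.add(number)
            return number


def mask(number: str) -> str:
    """Masked form that reveals only the last four digits."""
    return f"XXXX XXXX {number[-4:]}"
```

In this sketch, random rather than sequential payloads keep enrollment order and region out of the number. Note that `issued` only guarantees that numbers are distinct; guaranteeing that *people* are distinct is the job of biometric deduplication.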

Non-Functional Requirements:

SCALE
• 1.4 billion enrolled residents
• 90+ million authentications/day
• Average: ~1,000 authentications/second; peak: ~3,000/second
• 15+ petabytes of biometric data

LATENCY
• Authentication: < 200ms (1:1 match)
• Deduplication: minutes (1:N against billions)

ACCURACY
• False Positive Identification Rate (FPIR): < 0.0035% (unique people wrongly flagged as duplicates)
• False Negative Identification Rate (FNIR): < 0.035% (duplicates missed)

AVAILABILITY
• 99.9%+ uptime
• Geo-distributed for disaster recovery

SECURITY
• 2048-bit PKI encryption
• Data encrypted at rest and in transit
• HSM for key management
• No biometric data leaves CIDR
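Both accuracy targets are ratios computed over labeled test probes. In the interview's terms, FNIR measures false accepts (a real duplicate slips through and gets a second identity), and FPIR measures false rejects (a genuinely new person is flagged as a duplicate). This is a minimal sketch. The evaluation counts below are invented only to exercise the formulas; they are not UIDAI results.

```python
FPIR_TARGET = 0.0035 / 100  # < 0.0035%
FNIR_TARGET = 0.035 / 100   # < 0.035%


def fpir(wrongly_flagged: int, unique_probes: int) -> float:
    """Share of genuinely new people whom deduplication flagged as duplicates."""
    return wrongly_flagged / unique_probes


def fnir(missed_duplicates: int, duplicate_probes: int) -> float:
    """Share of real duplicate enrollments that deduplication failed to catch."""
    return missed_duplicates / duplicate_probes


def meets_targets(fpir_value: float, fnir_value: float) -> bool:
    return fpir_value < FPIR_TARGET and fnir_value < FNIR_TARGET


# Hypothetical test campaign: 1,000,000 unique probes, 20,000 planted duplicates.
observed_fpir = fpir(wrongly_flagged=30, unique_probes=1_000_000)   # 0.003%
observed_fnir = fnir(missed_duplicates=5, duplicate_probes=20_000)  # 0.025%
print(meets_targets(observed_fpir, observed_fnir))  # True
```

Even a small rate matters at this scale. Applied across roughly 1.4 billion enrollments, an FPIR exactly at the target would still send about 49,000 genuine residents (1.4 × 10^9 × 0.000035) for manual adjudication, which shows why a human review path is needed.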

Phase 2: Back of the Envelope Estimation

You: "Let me work through the numbers to understand the computational challenge."

The Deduplication Challenge

THE IMPOSSIBLE MATH

To guarantee uniqueness, every new enrollment must be
compared against EVERY existing record.

For 1 billion people with 10 fingerprints each:

Fingerprint templates to search for each new enrollment:
  1,000,000,000 people × 10 fingers = 10 billion templates

If we enroll 1 million new people per day (counting just one probe
finger per stored template; matching all 10 probe fingers is 10× worse):
  1,000,000 × 10 billion = 10,000,000,000,000,000 comparisons/day
                         = 10 quadrillion matches/day!

At traditional matching speed (100,000 matches/sec):
  10^16 / 10^5 = 10^11 seconds
               ≈ 3,170 years of compute per day of enrollment!

This is computationally impossible with brute force.
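The brute-force arithmetic can be checked in a few lines. `brute_force_cost` is a hypothetical helper. Its `probe_fingers` parameter makes the single-probe assumption explicit, and the 365.25-day year is a convention chosen for this sketch.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_cost(enrolled: int, fingers_enrolled: int, probe_fingers: int,
                     enrollments_per_day: int, matches_per_sec: float):
    """Return (comparisons per day, single-matcher years of compute per day)."""
    per_enrollment = enrolled * fingers_enrolled * probe_fingers
    per_day = per_enrollment * enrollments_per_day
    years = per_day / matches_per_sec / SECONDS_PER_YEAR
    return per_day, years


# One probe finger vs. 10 billion templates, 1M enrollments/day, 100k matches/s
print(brute_force_cost(1_000_000_000, 10, 1, 1_000_000, 100_000))
# All 10 probe fingers vs. 1.4 billion people at 1M matches/s
print(brute_force_cost(1_400_000_000, 10, 10, 1_000_000, 1_000_000))
```

The first call reproduces 10^16 comparisons and roughly 3,170 years. The second gives 1.4 × 10^17 comparisons per day and roughly 4,400 years, even with a matcher ten times faster. Matching every finger pair only makes brute force less viable.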

The Solution: Multi-Modal + Multi-ABIS

MAKING DEDUPLICATION TRACTABLE

1. DEMOGRAPHIC PRE-FILTER
   Before biometric matching, filter by:
   • Name phonetics
   • Date of birth
   • Gender
   • Geographic region

   This reduces the search space by 99%+

2. MULTI-MODAL BIOMETRICS
   Using fingerprint + iris together:
   • Fingerprint alone: ~1 in 10^6 chance of a false match
   • Iris alone: ~1 in 10^12 chance of a false match
   • Combined: ~1 in 10^18 (if the two modalities err independently)

   The combination allows lower thresholds per modality

3. HIERARCHICAL MATCHING
   • First: Fast, approximate match (GPU-accelerated)
   • If potential match: Detailed matching
   • If still ambiguous: Human adjudication

4. THREE ABIS VENDORS
   • Each vendor runs independent deduplication
   • Consensus required (2 of 3 agree)
   • Different algorithms catch different edge cases
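The hierarchical idea (cheap filter, then expensive matcher, then human fallback) can be sketched as a three-stage cascade. This is a toy under stated assumptions. Templates are sets of integer features, `shares_any` and `jaccard` stand in for real GPU and minutiae matchers, and every threshold is arbitrary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List

Template = FrozenSet[int]  # toy stand-in for a biometric template


@dataclass
class Decision:
    verdict: str  # "UNIQUE", "DUPLICATE" or "MANUAL_REVIEW"
    candidates: List[str] = field(default_factory=list)


def shares_any(a: Template, b: Template) -> float:
    """Cheap coarse score: 1.0 if the templates share any feature."""
    return 1.0 if a & b else 0.0


def jaccard(a: Template, b: Template) -> float:
    """More expensive fine score in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hierarchical_match(probe: Template, gallery: Dict[str, Template],
                       coarse: Callable[[Template, Template], float] = shares_any,
                       fine: Callable[[Template, Template], float] = jaccard,
                       coarse_cutoff: float = 1.0, match_at: float = 0.8,
                       review_band: float = 0.2) -> Decision:
    # Stage 1: run the cheap scorer over every (pre-filtered) record.
    shortlist = [rid for rid, t in gallery.items() if coarse(probe, t) >= coarse_cutoff]
    # Stage 2: run the expensive scorer only on the shortlist.
    scored = [(rid, fine(probe, gallery[rid])) for rid in shortlist]
    hits = [rid for rid, s in scored if s >= match_at]
    if hits:
        return Decision("DUPLICATE", hits)
    # Stage 3: near-misses go to a human instead of being silently accepted.
    borderline = [rid for rid, s in scored if s >= match_at - review_band]
    if borderline:
        return Decision("MANUAL_REVIEW", borderline)
    return Decision("UNIQUE")


gallery = {"A": frozenset({1, 2, 3, 4, 5}), "B": frozenset({6, 7, 8, 9, 10}),
           "C": frozenset({1, 2, 3, 11, 12})}
print(hierarchical_match(frozenset({1, 2, 3, 4, 5}), gallery).verdict)   # DUPLICATE
print(hierarchical_match(frozenset({6, 7, 8, 9, 13}), gallery).verdict)  # MANUAL_REVIEW
print(hierarchical_match(frozenset({20, 21, 22}), gallery).verdict)      # UNIQUE
```

The cost saving comes from stage 2 touching only the shortlist. The review band is what turns "still ambiguous" into a human decision rather than a silent accept.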

Storage Estimation

BIOMETRIC DATA STORAGE

Per person:
  10 fingerprints:    ~100KB (10 × 10KB template)
  2 iris scans:       ~100KB (2 × 50KB template)
  1 photograph:       ~50KB
  Demographics:       ~2KB
  Metadata:           ~5KB
  ─────────────────────────────
  Total per person:   ~257KB

For 1.4 billion people:
  1.4B × 257KB = 360 TB (templates only)

Raw biometric images (archived):
  Per person: ~5MB (high-res captures)
  Total: 1.4B × 5MB = 7 PB

With replication (3x):
  3 × (7 PB + 0.36 PB) ≈ 22 PB total storage
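The storage arithmetic can be reproduced directly. This minimal sketch uses decimal units (1 KB = 1,000 bytes), as the estimate does. Replicating the templates along with the raw images is an assumption of the sketch.

```python
KB, MB, TB, PB = 10**3, 10**6, 10**12, 10**15

# Per-person footprint from the breakdown above (decimal units).
TEMPLATE_BYTES = (10 * 10 + 2 * 50 + 50 + 2 + 5) * KB  # 257 KB
RAW_BYTES = 5 * MB


def storage_estimate(people: int, template_bytes: int, raw_bytes: int, replicas: int):
    """Return (template bytes, raw-image bytes, replicated total bytes)."""
    templates = people * template_bytes
    raw = people * raw_bytes
    return templates, raw, (templates + raw) * replicas


templates, raw, total = storage_estimate(1_400_000_000, TEMPLATE_BYTES, RAW_BYTES, 3)
print(f"templates={templates / TB:.1f} TB raw={raw / PB:.1f} PB total={total / PB:.1f} PB")
# templates=359.8 TB raw=7.0 PB total=22.1 PB
```

The raw images dominate: the templates add only about 5% on top of them, so the replication factor applied to the archive drives the final figure.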

Authentication Traffic

AUTHENTICATION LOAD

Daily authentications:      90,000,000
Seconds per day:           86,400
Average TPS:               ~1,040 auth/second

Peak multiplier:           3x
Peak TPS:                  ~3,000 auth/second

Each authentication requires:
  1. Decrypt request (PKI)
  2. Lookup Aadhaar record
  3. Biometric 1:1 match
  4. Sign response
  
Time budget: 200ms total
  Network: 50ms
  Crypto: 30ms
  Lookup: 20ms
  Match: 100ms
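The traffic numbers and the latency budget can be checked together, and Little's law turns them into a concurrency figure for capacity planning. This is a minimal sketch: `auth_load` is a hypothetical helper, and it assumes each request spends the full 200 ms budget in the system.

```python
SECONDS_PER_DAY = 86_400

# Latency budget from the breakdown above, in milliseconds.
BUDGET_MS = {"network": 50, "crypto": 30, "lookup": 20, "match": 100}


def auth_load(daily_auths: int, peak_multiplier: float):
    """Average and peak transactions per second."""
    avg_tps = daily_auths / SECONDS_PER_DAY
    return avg_tps, avg_tps * peak_multiplier


avg_tps, peak_tps = auth_load(90_000_000, 3)
latency_s = sum(BUDGET_MS.values()) / 1000

# Little's law: requests in flight = arrival rate x time each request spends in the system.
in_flight = peak_tps * latency_s
print(round(avg_tps), round(peak_tps), round(in_flight))  # 1042 3125 625
```

Roughly 625 concurrent requests at peak is a starting point for sizing matcher pools and connection limits. Headroom for retries and failover comes on top of that.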

Phase 3: High-Level Architecture

You: "Aadhaar's architecture follows four key principles: openness, linear scalability, strong security, and vendor neutrality."

The Aadhaar Ecosystem

AADHAAR ARCHITECTURE OVERVIEW

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│                         ENROLLMENT ECOSYSTEM                            │
│                                                                         │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐                │
│  │  Resident   │────▶│ Enrollment  │────▶│  Registrar  │                │
│  │  (Village)  │     │   Agency    │     │(State Govt) │                │
│  └─────────────┘     └──────┬──────┘     └──────┬──────┘                │
│                             │                   │                       │
│                    Encrypted packet      Verification                   │
│                             │                   │                       │
│                             ▼                   ▼                       │
│                      ┌─────────────────────────────┐                    │
│                      │      CIDR (Central DB)      │                    │
│                      │ ┌───────┐┌───────┐┌───────┐ │                    │
│                      │ │ABIS 1 ││ABIS 2 ││ABIS 3 │ │ Parallel dedup     │
│                      │ └───┬───┘└───┬───┘└───┬───┘ │                    │
│                      │     └────────┼────────┘     │                    │
│                      │              ▼              │                    │
│                      │       ┌─────────────┐       │                    │
│                      │       │  Consensus  │       │ 2-of-3 consensus   │
│                      │       └─────────────┘       │                    │
│                      └─────────────────────────────┘                    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│                       AUTHENTICATION ECOSYSTEM                          │
│                                                                         │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐                │
│  │  Resident   │────▶│ Service     │────▶│    AUA      │                │
│  │  (at bank,  │     │ Point       │     │(Auth User   │                │
│  │   telecom)  │     │ Device      │     │  Agency)    │                │
│  └─────────────┘     └─────────────┘     └──────┬──────┘                │
│                                                 │                       │
│                                           Encrypted PID                 │
│                                                 │                       │
│                                                 ▼                       │
│                                          ┌─────────────┐                │
│                                          │    ASA      │                │
│                                          │(Auth Service│                │
│                                          │  Agency)    │                │
│                                          └──────┬──────┘                │
│                                                 │                       │
│                                          Secure leased line             │
│                                                 │                       │
│                                                 ▼                       │
│                                     ┌───────────────────────┐           │
│                                     │        CIDR           │           │
│                                     │   (1:1 matching)      │           │
│                                     │                       │           │
│                                     │  Returns: Yes/No      │           │
│                                     │  (no PII returned)    │           │
│                                     └───────────────────────┘           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
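The key property of the authentication path, a 1:1 check that answers only Yes/No, can be sketched as follows. This is not UIDAI's API. `ToyCIDR`, `AuthResponse`, the `unknown-number` error string and the set-based similarity score are all hypothetical, and the sketch leaves out encryption, AUA licensing checks and the ASA hop.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

Template = FrozenSet[int]  # toy stand-in for a stored biometric template


@dataclass(frozen=True)
class AuthResponse:
    txn_id: str
    ok: bool                     # the only identity signal returned: Yes/No
    error: Optional[str] = None  # never a name, address or biometric


class ToyCIDR:
    def __init__(self, threshold: float = 0.8):
        self._templates: Dict[str, Template] = {}
        self.threshold = threshold
        # (aadhaar, aua_id, txn_id, ok): lets a resident see who queried them
        self.history: List[Tuple[str, str, str, bool]] = []

    def enroll(self, aadhaar: str, template: Template) -> None:
        self._templates[aadhaar] = template

    def authenticate(self, aadhaar: str, probe: Template, aua_id: str) -> AuthResponse:
        txn_id = uuid.uuid4().hex
        stored = self._templates.get(aadhaar)  # 1:1 lookup by number, no 1:N search
        if stored is None:
            response = AuthResponse(txn_id, False, "unknown-number")
        else:
            union = probe | stored
            score = len(probe & stored) / len(union) if union else 0.0
            response = AuthResponse(txn_id, score >= self.threshold)
        self.history.append((aadhaar, aua_id, txn_id, response.ok))
        return response


cidr = ToyCIDR()
cidr.enroll("234123412346", frozenset(range(10)))
print(cidr.authenticate("234123412346", frozenset(range(9)), "bank-01").ok)  # True
print(cidr.authenticate("234123412346", frozenset(range(5)), "bank-01").ok)  # False
```

Because the lookup is keyed by the number the resident supplies, each request costs one comparison rather than a search over 1.4 billion records. That cost difference is what separates a 200 ms authentication from a multi-day enrollment.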

Key Components

COMPONENT BREAKDOWN

1. ENROLLMENT CLIENT
   • Runs on laptops in villages
   • Captures biometrics (certified devices only)
   • Works OFFLINE (syncs when connected)
   • Encrypts everything at capture time
   • Operator + Supervisor biometric signatures

2. REGISTRAR
   • State governments, banks, oil companies
   • Responsible for enrollment agencies
   • First-level quality checks
   • Uploads packets to CIDR

3. CIDR (Central Identities Data Repository)
   • The "crown jewels": all biometric data
   • Two data centers (Bengaluru + Manesar)
   • Active-active configuration
   • NEVER exposed directly to the internet
   • Only UIDAI has access

4. ABIS (Automated Biometric Identification System)
   • Three independent vendors (TCS+Neurotechnology, etc.)
   • Each runs complete deduplication
   • Consensus-based decision
   • Vendor-neutral API integration

5. AUA (Authentication User Agency)
   • Banks, telecom, insurance companies
   • Licensed to use authentication
   • Must follow UIDAI security guidelines
   • Audited regularly

6. ASA (Authentication Service Agency)
   • Secure network intermediary
   • Connects AUAs to CIDR
   • Dedicated leased lines (not the internet)
   • 27 licensed ASAs in India
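The enrollment client's "works offline, syncs when connected" behavior amounts to a store-and-forward outbox. Below is a minimal sketch using Python's built-in `sqlite3`. The `EnrollmentOutbox` class, its table layout and the `upload` callback are hypothetical, and the sketch assumes packets are already encrypted before they reach the outbox.

```python
import sqlite3
from typing import Callable


class EnrollmentOutbox:
    """Hypothetical store-and-forward queue for an offline enrollment laptop."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        with self.db:
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS outbox ("
                " packet_id TEXT PRIMARY KEY,"
                " body BLOB NOT NULL,"
                " uploaded INTEGER NOT NULL DEFAULT 0)"
            )

    def save(self, packet_id: str, body: bytes) -> None:
        """Persist an (already encrypted) packet; re-saving the same id is a no-op."""
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO outbox (packet_id, body) VALUES (?, ?)",
                (packet_id, body),
            )

    def sync(self, upload: Callable[[str, bytes], None]) -> int:
        """Upload pending packets in capture order; stop at the first network failure."""
        sent = 0
        rows = self.db.execute(
            "SELECT packet_id, body FROM outbox WHERE uploaded = 0 ORDER BY rowid"
        ).fetchall()
        for packet_id, body in rows:
            try:
                upload(packet_id, body)
            except ConnectionError:
                break  # still offline: keep the rest for the next sync
            with self.db:
                self.db.execute(
                    "UPDATE outbox SET uploaded = 1 WHERE packet_id = ?", (packet_id,)
                )
            sent += 1
        return sent

    def pending(self) -> int:
        return self.db.execute(
            "SELECT COUNT(*) FROM outbox WHERE uploaded = 0"
        ).fetchone()[0]
```

A packet is marked uploaded only after the callback returns, so a crash in between can resend it. The receiving side should therefore treat `packet_id` as an idempotency key (at-least-once delivery).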

Data Flow: Enrollment

ENROLLMENT FLOW

Resident visits enrollment center with documents

Step 1: CAPTURE
┌────────────────────────────────────────────────────────────────────────┐
│                                                                        │
│  Enrollment Station (Offline-capable laptop)                           │
│                                                                        │
│  1. Operator logs in with biometric                                    │
│  2. Captures resident's demographics                                   │
│  3. Scans proof documents (Ration card, Voter ID, etc.)                │
│  4. Captures 10 fingerprints (slaps + thumbs)                          │
│  5. Captures 2 iris scans                                              │
│  6. Captures photograph                                                │
│  7. Resident reviews and confirms                                      │
│  8. Operator signs packet biometrically                                │
│  9. Supervisor approves (for exceptions)                               │
│                                                                        │
│  Output: Encrypted enrollment packet (3-5 MB)                          │
│          Contains HMAC for tamper detection                            │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
Step 2: UPLOAD
┌────────────────────────────────────────────────────────────────────────┐
│                                                                        │
│  Registrar Backend                                                     │
│                                                                        │
│  1. Receives packets via SFTP or encrypted USB                         │
│  2. Validates packet structure and signatures                          │
│  3. Queues for CIDR upload                                             │
│  4. Uploads via secure channel to CIDR                                 │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
Step 3: DEDUPLICATION
┌────────────────────────────────────────────────────────────────────────┐
│                                                                        │
│  CIDR Processing                                                       │
│                                                                        │
│   1. Decrypt packet (only CIDR can decrypt)                            │
│   2. Extract biometric templates                                       │
│   3. Demographic pre-filter (reduce search space)                      │
│   4. Send to ABIS 1, ABIS 2, ABIS 3 in parallel                        │
│   5. Each ABIS returns: UNIQUE / DUPLICATE / MANUAL_REVIEW             │
│   6. Consensus: 2 of 3 must agree                                      │
│   7. If DUPLICATE or no majority: Manual adjudication                  │
│   8. If UNIQUE: Generate Aadhaar number                                │
│   9. Store in database                                                 │
│  10. Queue letter for printing                                         │
│                                                                        │
│  Timeline: 3-90 days (longer if manual review is needed)               │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
Step 4: DELIVERY
┌────────────────────────────────────────────────────────────────────────┐
│                                                                        │
│  Aadhaar letter printed and mailed to resident's address               │
│  Contains: 12-digit Aadhaar number + QR code                           │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
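The HMAC in the enrollment packet is written at capture and checked by the registrar. That step can be sketched with Python's standard `hmac` module under three assumptions: the packet is already encrypted (the `payload` field stands in for ciphertext), fields are serialized as canonical JSON, and the client and verifier share the key. How Aadhaar actually provisions that key is not covered here.

```python
import hashlib
import hmac
import json


def seal_packet(fields: dict, key: bytes) -> dict:
    """Serialize fields canonically and attach an HMAC-SHA256 tag."""
    body = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "hmac": tag}


def verify_packet(packet: dict, key: bytes) -> bool:
    """Registrar-side check: recompute the tag and compare in constant time."""
    expected = hmac.new(key, packet["body"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, packet["hmac"])


key = b"\x01" * 32  # hypothetical shared key; real key handling is out of scope
packet = seal_packet({"packet_id": "P1", "operator": "op-7", "payload": "<ciphertext>"}, key)
print(verify_packet(packet, key))  # True
tampered = dict(packet, body=packet["body"].replace("op-7", "op-8"))
print(verify_packet(tampered, key))  # False
```

`hmac.compare_digest` avoids leaking timing information during the comparison. A shared-key HMAC proves integrity to anyone holding the key, but it cannot prove *which* operator produced the packet. That is the role of the operator and supervisor biometric sign-off in Step 1.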

Phase 4: Deep Dives

Deep Dive 1: Biometric Deduplication at Billion Scale

Week 1 concepts: Partitioning, sharding. Week 3 concepts: Async processing.

You: "Deduplication is the hardest problem in Aadhaar. You must compare each new person against 1.4 billion existing records."

THE DEDUPLICATION CHALLENGE

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  Without optimization:                                                  │
│                                                                         │
│  New enrollment: 1 person                                               │
│  Existing database: 1,400,000,000 people                                │
│  Fingers per person: 10                                                 │
│  Iris per person: 2                                                     │
│                                                                         │
│  Fingerprint comparisons:                                               │
│    10 (new) × 10 (existing) × 1.4B = 140 billion comparisons            │
│                                                                         │
│  At 1 million matches/second: 140,000 seconds ≈ 39 hours per person     │
│                                                                         │
│  For 1 million enrollments/day: ~4,400 years of compute/day             │
│                                                                         │
│  This is impossible. We need smarter approaches.                        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

How Aadhaar Makes It Tractable:

DEDUPLICATION OPTIMIZATION STRATEGIES

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                                                                         β”‚
β”‚  LAYER 1: DEMOGRAPHIC BLOCKING                                          β”‚
β”‚  ─────────────────────────────                                          β”‚
β”‚                                                                         β”‚
β”‚  Before biometric matching, partition by:                               β”‚
β”‚  β€’ Gender (2 partitions)                                                β”‚
β”‚  β€’ Age range (10-year buckets = 10 partitions)                          β”‚
β”‚  β€’ State (36 partitions)                                                β”‚
β”‚  β€’ Name phonetic hash (100 partitions)                                  β”‚
β”‚                                                                         β”‚
β”‚  Effective reduction: 2 Γ— 10 Γ— 36 Γ— 100 = 72,000x smaller search        β”‚
β”‚                                                                         β”‚
β”‚  1.4B / 72,000 = 19,444 candidate matches per enrollment                β”‚
β”‚  (vs 1.4 billion without blocking)                                      β”‚
β”‚                                                                         β”‚
β”‚  ─────────────────────────────────────────────────────────────────────  β”‚
β”‚                                                                         β”‚
β”‚  LAYER 2: MULTI-MODAL FUSION                                            β”‚
β”‚  ───────────────────────────                                            β”‚
β”‚                                                                         β”‚
β”‚  Fingerprint (10 fingers) + Iris (2 eyes) combined:                     β”‚
β”‚                                                                         β”‚
β”‚  Fingerprint score (0-100) Γ— weight +                                   β”‚
β”‚  Iris score (0-100) Γ— weight =                                          β”‚
β”‚  Final fusion score                                                     β”‚
β”‚                                                                         β”‚
β”‚  Using both modalities:                                                 β”‚
│  • Handles worn fingerprints (use iris)                                 │
│  • Handles cataracts (use fingerprint)                                  │
│  • Much higher accuracy than either alone                               │
│                                                                         │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  LAYER 3: THREE-WAY ABIS CONSENSUS                                      │
│  ─────────────────────────────────                                      │
│                                                                         │
│  Three independent vendors run deduplication:                           │
│                                                                         │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐                            │
│  │ ABIS 1    │  │ ABIS 2    │  │ ABIS 3    │                            │
│  │(Vendor A) │  │(Vendor B) │  │(Vendor C) │                            │
│  └────┬──────┘  └────┬──────┘  └────┬──────┘                            │
│       │              │              │                                   │
│       └──────────────┼──────────────┘                                   │
│                      ▼                                                  │
│               ┌───────────┐                                             │
│               │ Consensus │                                             │
│               │  Engine   │                                             │
│               └───────────┘                                             │
│                                                                         │
│  Decision rules:                                                        │
│  • 3/3 UNIQUE → Accept enrollment                                       │
│  • 3/3 DUPLICATE → Reject enrollment                                    │
│  • 2/3 agree → Follow majority                                          │
│  • Mixed/uncertain → Manual adjudication                                │
│                                                                         │
│  Why three vendors?                                                     │
│  • Different algorithms catch different edge cases                      │
│  • No single vendor lock-in                                             │
│  • Higher accuracy through consensus                                    │
│  • Continuous quality competition                                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
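
The fingerprint-plus-iris fusion from Layer 2 can be sketched as score-level fusion. Each matcher reports a 0-100 similarity, a weighted average combines whichever modalities produced a usable capture, and the fused score is then banded with the 80/30 thresholds that the orchestrator below declares. The equal weights and the `ModalityScore` type are assumptions for illustration. The vendors' actual fusion formulas are not described here.

```python
from dataclasses import dataclass

# Assumed, illustrative weights; real ABIS fusion formulas are vendor-specific.
WEIGHTS = {"fingerprint": 0.5, "iris": 0.5}
DUPLICATE_THRESHOLD = 80   # fused score above this: likely the same person
UNIQUE_THRESHOLD = 30      # fused score below this: likely a new person


@dataclass
class ModalityScore:
    modality: str      # "fingerprint" or "iris"
    score: float       # 0-100 similarity from that modality's matcher
    usable: bool       # False when the capture was too poor to score


def fuse(scores: list[ModalityScore]) -> float:
    """Weighted average over usable modalities, renormalized to 0-100."""
    usable = [s for s in scores if s.usable]
    if not usable:
        raise ValueError("no usable biometric modality")
    total_weight = sum(WEIGHTS[s.modality] for s in usable)
    return sum(WEIGHTS[s.modality] * s.score for s in usable) / total_weight


def classify(fused: float) -> str:
    """Band a fused score into duplicate / unique / needs review."""
    if fused > DUPLICATE_THRESHOLD:
        return "duplicate"
    if fused < UNIQUE_THRESHOLD:
        return "unique"
    return "review"


# Worn fingerprints: only the iris score counts ("use iris" above)
worn = [ModalityScore("fingerprint", 0, False), ModalityScore("iris", 92, True)]
print(fuse(worn), classify(fuse(worn)))   # 92.0 duplicate
```

Renormalizing by the weights of the usable modalities keeps the fused score on the same 0-100 scale whether one or both captures succeeded, so one pair of thresholds works for both cases.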
# deduplication/abis_orchestrator.py

"""
ABIS (Automated Biometric Identification System) Orchestration

Aadhaar uses three independent ABIS vendors for deduplication.
This provides redundancy, accuracy, and vendor neutrality.
"""

from dataclasses import dataclass
from typing import List, Optional, Tuple
from enum import Enum
import asyncio


class DeduplicationResult(Enum):
    UNIQUE = "unique"           # No duplicates found
    DUPLICATE = "duplicate"     # Duplicate found
    MANUAL_REVIEW = "review"    # Uncertain, needs human review


@dataclass
class ABISMatch:
    candidate_aadhaar: str
    fingerprint_score: float      # 0-100
    iris_score: float             # 0-100
    fusion_score: float           # Combined score
    confidence: str               # HIGH, MEDIUM, LOW


@dataclass
class ABISResponse:
    vendor_id: str
    result: DeduplicationResult
    matches: List[ABISMatch]
    processing_time_ms: int


class ABISOrchestrator:
    """
    Orchestrates deduplication across three ABIS vendors.
    
    Aadhaar's multi-ABIS approach:
    1. Send enrollment to all three vendors in parallel
    2. Each vendor searches against their copy of the database
    3. Consensus determines final result
    """
    
    def __init__(
        self,
        abis_clients: List,  # Three ABIS vendor clients
        demographic_filter,
        manual_review_queue
    ):
        self.abis_clients = abis_clients
        self.demo_filter = demographic_filter
        self.review_queue = manual_review_queue
        
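        # Fusion-score bands (> 80 duplicate, < 30 unique, in between
        # uncertain). Not applied in this class: each ABIS returns its
        # own classification and the consensus step only counts votes.
        self.duplicate_threshold = 80
        self.unique_threshold = 30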
    
    async def deduplicate(
        self,
        enrollment_packet: dict
    ) -> Tuple[DeduplicationResult, Optional[str]]:
        """
        Main deduplication flow.
        
        Returns:
            (result, duplicate_aadhaar if found)
        """
        # Step 1: Demographic blocking to reduce search space
        candidate_pool = await self.demo_filter.get_candidates(
            gender=enrollment_packet['gender'],
            dob=enrollment_packet['dob'],
            state=enrollment_packet['state'],
            name_phonetic=enrollment_packet['name_phonetic']
        )
        
        # Log the reduction achieved (skipped for an empty pool,
        # which would otherwise raise ZeroDivisionError)
        if candidate_pool:
            reduction_ratio = 1_400_000_000 / len(candidate_pool)
            print(f"Demographic blocking: {len(candidate_pool):,} candidates")
            print(f"Search space reduced by {reduction_ratio:,.0f}x")
        
        # Step 2: Send to all three ABIS in parallel
        abis_tasks = [
            client.search(
                fingerprints=enrollment_packet['fingerprints'],
                irises=enrollment_packet['irises'],
                candidate_pool=candidate_pool
            )
            for client in self.abis_clients
        ]
        
        responses: List[ABISResponse] = await asyncio.gather(*abis_tasks)
        
        # Step 3: Consensus decision
        result, duplicate_aadhaar = self._consensus_decision(responses)
        
        # Step 4: Send unresolved cases to a human adjudicator
        # (`enqueue` is a hypothetical method on the injected queue)
        if result == DeduplicationResult.MANUAL_REVIEW:
            await self.review_queue.enqueue(enrollment_packet, responses)
        
        return (result, duplicate_aadhaar)
    
    def _consensus_decision(
        self,
        responses: List[ABISResponse]
    ) -> Tuple[DeduplicationResult, Optional[str]]:
        """
        Three-way consensus logic.
        
        Aadhaar requires 2/3 agreement for an automated decision.
        Mixed results go to manual adjudication.
        """
        unique_count = sum(
            1 for r in responses if r.result == DeduplicationResult.UNIQUE
        )
        duplicate_count = sum(
            1 for r in responses if r.result == DeduplicationResult.DUPLICATE
        )
        
        # Case 1: All three agree UNIQUE
        if unique_count == 3:
            return (DeduplicationResult.UNIQUE, None)
        
        # Case 2: All three agree DUPLICATE
        if duplicate_count == 3:
            # Find the matching Aadhaar (should be same across all)
            duplicate_aadhaar = self._find_common_match(responses)
            return (DeduplicationResult.DUPLICATE, duplicate_aadhaar)
        
        # Case 3: 2/3 agree UNIQUE
        if unique_count >= 2:
            return (DeduplicationResult.UNIQUE, None)
        
        # Case 4: 2/3 agree DUPLICATE
        if duplicate_count >= 2:
            duplicate_aadhaar = self._find_common_match(responses)
            return (DeduplicationResult.DUPLICATE, duplicate_aadhaar)
        
        # Case 5: No consensus; deduplicate() queues this for a
        # human adjudicator
        return (DeduplicationResult.MANUAL_REVIEW, None)
    
    def _find_common_match(
        self,
        responses: List[ABISResponse]
    ) -> Optional[str]:
        """
        Find the Aadhaar number that multiple ABIS agree is a duplicate.
        """
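        # Count top matches only from vendors that voted DUPLICATE;
        # a UNIQUE voter's best candidate is not evidence of a match
        match_counts = {}
        for response in responses:
            if response.result == DeduplicationResult.DUPLICATE and response.matches:
                top_match = response.matches[0]
                aadhaar = top_match.candidate_aadhaar
                match_counts[aadhaar] = match_counts.get(aadhaar, 0) + 1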
        
        # Return the one with most agreement
        if match_counts:
            return max(match_counts, key=match_counts.get)
        return None


class DemographicBlockingFilter:
    """
    Pre-filters the biometric search space using demographics.
    
    This is crucial for making billion-scale deduplication tractable.
    """
    
    def __init__(self, database):
        self.db = database
    
    async def get_candidates(
        self,
        gender: str,
        dob: str,
        state: str,
        name_phonetic: str
    ) -> List[str]:
        """
        Get candidate Aadhaar numbers that match demographic criteria.
        
        Reduces 1.4 billion to ~20,000-50,000 candidates.
        """
        # Birth-year window: DOB assumed to be ISO "YYYY-MM-DD",
        # searched +/- 5 years around the recorded year
        birth_year = int(dob[:4])
        birth_year_min = birth_year - 5
        birth_year_max = birth_year + 5
        
        # Query with demographic filters
        candidates = await self.db.query("""
            SELECT aadhaar_number 
            FROM residents
            WHERE gender = ?
              AND birth_year BETWEEN ? AND ?
              AND state_code = ?
              AND name_phonetic_hash = ?
        """, [gender, birth_year_min, birth_year_max,
              state, self._phonetic_hash(name_phonetic)])
        
        return [row['aadhaar_number'] for row in candidates]
    
    def _phonetic_hash(self, name: str) -> str:
        """
        Generate phonetic hash for name matching.
        
        Handles spelling variations:
        "Rahul" and "Rahool" → same hash
        "Priya" and "Priyaa" → same hash
        """
        # Simplified phonetic algorithm (actual uses Soundex variant)
        # Remove vowels, normalize consonants
        consonants = ''.join(c for c in name.upper() if c not in 'AEIOU')
        return consonants[:4].ljust(4, '0')
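
The vowel-stripping `_phonetic_hash` above is deliberately crude: it keeps every consonant exactly as written, so "Anand" (NND0) and "Anant" (NNT0) land in different blocks. Its comment points to a Soundex variant. For comparison, here is classic American Soundex, which also folds similar-sounding consonants (D/T, M/N, C/K/S) into one digit. Whether UIDAI's variant follows these exact rules is not stated here, so treat this as the textbook algorithm only.

```python
# Classic American Soundex digit classes
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for ch in letters
}


def soundex(name: str) -> str:
    """Return the 4-character Soundex code (first letter + 3 digits)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        # Vowels (and Y) reset `prev`, so a repeated code after a vowel is
        # kept; H and W do not, so codes on either side of them merge.
        if ch not in "HW":
            prev = code
    # Pad with zeros and truncate to exactly four characters
    return (first + "".join(digits) + "000")[:4]


print(soundex("Anand"), soundex("Anant"))    # A553 A553
print(soundex("Rahul"), soundex("Rahool"))   # R400 R400
```

Folding consonants widens each block slightly, which trades a larger candidate pool for fewer duplicates slipping past blocking because of a spelling variation.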

Deep Dive 2: Authentication in 200 Milliseconds

Week 2 concepts: Timeouts, latency budgets. Week 4 concepts: Caching.

You: "Authentication must verify identity in under 200ms. This is 1:1 matching, which is much simpler than deduplication but still demanding at scale."

AUTHENTICATION LATENCY BUDGET

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  Total Budget: 200ms                                                    │
│                                                                         │
│  ┌────────────────────────────────────────────────────────────────────┐ │
│  │                        BREAKDOWN                                   │ │
│  │                                                                    │ │
│  │  Network (AUA → ASA → CIDR → ASA → AUA):      50ms                 │ │
│  │  ├── AUA to ASA:                              15ms                 │ │
│  │  ├── ASA to CIDR:                             10ms                 │ │
│  │  ├── CIDR to ASA:                             10ms                 │ │
│  │  └── ASA to AUA:                              15ms                 │ │
│  │                                                                    │ │
│  │  Cryptographic operations:                    30ms                 │ │
│  │  ├── Decrypt PID block (at CIDR):             10ms                 │ │
│  │  ├── Signature verification:                   5ms                 │ │
│  │  └── Sign response:                           15ms                 │ │
│  │                                                                    │ │
│  │  Database lookup:                             20ms                 │ │
│  │  ├── Find Aadhaar record:                     10ms                 │ │
│  │  └── Load biometric template:                 10ms                 │ │
│  │                                                                    │ │
│  │  Biometric 1:1 matching:                      80ms                 │ │
│  │  ├── Fingerprint match:                       40ms                 │ │
│  │  └── Iris match (if used):                    40ms                 │ │
│  │                                                                    │ │
│  │  Processing overhead:                         20ms                 │ │
│  │                                                                    │ │
│  │  Total:                                      200ms                 │ │
│  │                                                                    │ │
│  └────────────────────────────────────────────────────────────────────┘ │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
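
One way to enforce the budget above inside an asyncio service is to give each stage its own deadline with `asyncio.wait_for`, so a slow dependency fails fast instead of consuming the whole 200ms. The stage names, `run_stage`, and `BudgetExceeded` are assumptions for this sketch; the per-stage numbers are copied from the breakdown. The 50ms network share is spent outside CIDR, so in practice only the remaining stages would be wrapped.

```python
import asyncio

# Per-stage budgets (ms) copied from the breakdown above
STAGE_BUDGET_MS = {
    "network": 50,
    "crypto": 30,
    "db_lookup": 20,
    "biometric_match": 80,
    "overhead": 20,
}
TOTAL_BUDGET_MS = 200
assert sum(STAGE_BUDGET_MS.values()) == TOTAL_BUDGET_MS


class BudgetExceeded(Exception):
    """Raised when a stage overruns its share of the latency budget."""


async def run_stage(stage: str, coro):
    """Await `coro`, cancelling it if it exceeds the stage's budget."""
    budget_ms = STAGE_BUDGET_MS[stage]
    try:
        return await asyncio.wait_for(coro, timeout=budget_ms / 1000)
    except asyncio.TimeoutError:
        raise BudgetExceeded(f"{stage} exceeded {budget_ms}ms") from None


async def fake_db_lookup(delay_s: float) -> str:
    await asyncio.sleep(delay_s)   # stands in for a real template fetch
    return "template"


async def demo():
    fast = await run_stage("db_lookup", fake_db_lookup(0))
    try:
        await run_stage("db_lookup", fake_db_lookup(0.5))
        slow = "completed"
    except BudgetExceeded as exc:
        slow = str(exc)
    return fast, slow


print(asyncio.run(demo()))   # ('template', 'db_lookup exceeded 20ms')
```

The 200ms total counts both fingerprint and iris matching (40ms each), so a single-modality request leaves roughly 40ms of slack that a service could hand to a retry.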
# authentication/auth_service.py

"""
Aadhaar Authentication Service

Processes 90+ million authentications per day with <200ms latency.
Returns only Yes/No and never exposes biometric or demographic data.
"""

from dataclasses import dataclass
from typing import Optional
from enum import Enum
from datetime import datetime, timedelta
import asyncio


class ValidationError(Exception):
    """Raised when an AuthRequest fails format or consent validation."""


@dataclass
class MatchResult:
    success: bool
    error_code: Optional[str] = None   # Set only when success is False


class AuthMode(Enum):
    DEMOGRAPHIC = "demo"           # Name/address matching
    OTP = "otp"                    # One-time password
    FINGERPRINT = "fmr"            # Fingerprint biometric
    IRIS = "iir"                   # Iris biometric
    FACE = "fid"                   # Face biometric
    MULTI_FACTOR = "mf"            # Combination


@dataclass
class AuthRequest:
    aadhaar_number: str            # Or VID (Virtual ID)
    auth_mode: AuthMode
    pid_block: bytes               # Encrypted biometric/OTP
    session_key: bytes             # pid_block's AES key, RSA-encrypted for CIDR
    aua_code: str                  # Authentication User Agency
    timestamp: datetime
    transaction_id: str
    consent: bool                  # Must be True


@dataclass
class AuthResponse:
    transaction_id: str
    status: str                    # "y" (yes) or "n" (no)
    error_code: Optional[str]      # If status is "n"
    auth_token: Optional[str]      # Unique token for this auth
    timestamp: datetime
    # NOTE: No PII is ever returned!


class AadhaarAuthService:
    """
    Core authentication service running in CIDR.
    
    Security principles:
    1. All requests encrypted with 2048-bit PKI
    2. Biometric data decrypted only inside CIDR
    3. Response is only Yes/No (no data leakage)
    4. All transactions logged for audit
    """
    
    def __init__(
        self,
        biometric_db,
        biometric_matcher,
        otp_service,
        hsm_client,          # Hardware Security Module
        audit_logger
    ):
        self.db = biometric_db
        self.matcher = biometric_matcher
        self.otp = otp_service
        self.hsm = hsm_client
        self.audit = audit_logger
        
        # Performance tuning: bounded LRU cache with a TTL for hot Aadhaars.
        # Dicts preserve insertion order, so the first key is always the
        # least recently used entry.
        self.template_cache = {}
        self.cache_max_entries = 100_000   # Illustrative capacity
        self.cache_ttl = 300               # 5 minutes
        
        # Keep references to fire-and-forget audit tasks so they are
        # not garbage-collected before they complete
        self._background_tasks = set()
    
    async def authenticate(
        self,
        request: AuthRequest
    ) -> AuthResponse:
        """
        Main authentication flow.
        
        Must complete in <200ms.
        """
        start_time = datetime.utcnow()
        
        try:
            # Step 1: Validate request format
            self._validate_request(request)
            
            # Step 2: Decrypt PID block using HSM
            # Only CIDR's HSM holds the private key that unwraps the session key
            pid_data = await self.hsm.decrypt_pid(
                request.pid_block, request.session_key
            )
            
            # Step 3: Resolve Aadhaar number (handle VID)
            aadhaar = await self._resolve_aadhaar(request.aadhaar_number)
            
            # Step 4: Load resident's template (with caching)
            resident_template = await self._load_template(aadhaar)
            
            # Step 5: Perform matching based on auth mode
            match_result = await self._perform_match(
                mode=request.auth_mode,
                pid_data=pid_data,
                stored_template=resident_template
            )
            
            # Step 6: Generate response
            response = AuthResponse(
                transaction_id=request.transaction_id,
                status="y" if match_result.success else "n",
                error_code=match_result.error_code,
                auth_token=self._generate_token(aadhaar, request.aua_code),
                timestamp=datetime.utcnow()
            )
            
            # Step 7: Audit logging (async, don't wait)
            task = asyncio.create_task(self.audit.log(
                aadhaar=aadhaar,
                aua_code=request.aua_code,
                auth_mode=request.auth_mode,
                result=response.status,
                latency_ms=(datetime.utcnow() - start_time).total_seconds() * 1000
            ))
            self._background_tasks.add(task)
            task.add_done_callback(self._background_tasks.discard)
            
            return response
            
        except ValidationError as e:
            return self._error_response(request, str(e))
        except Exception:
            # Never expose internal errors
            return self._error_response(request, "INTERNAL_ERROR")
    
    async def _load_template(self, aadhaar: str) -> dict:
        """
        Load biometric template with caching.
        
        Hot Aadhaars (frequently authenticated) stay cached; the least
        recently used entry is evicted once the cache is full.
        """
        now = datetime.utcnow()
        
        # Check cache first; pop and re-insert to mark as most recently used
        cached = self.template_cache.pop(aadhaar, None)
        if cached is not None and cached['expires'] > now:
            self.template_cache[aadhaar] = cached
            return cached['template']
        
        # Cache miss (or expired entry): load from database
        template = await self.db.get_template(aadhaar)
        
        # Evict the least recently used entry when at capacity
        if len(self.template_cache) >= self.cache_max_entries:
            del self.template_cache[next(iter(self.template_cache))]
        
        self.template_cache[aadhaar] = {
            'template': template,
            'expires': now + timedelta(seconds=self.cache_ttl)
        }
        
        return template
    
    async def _perform_match(
        self,
        mode: AuthMode,
        pid_data: dict,
        stored_template: dict
    ) -> MatchResult:
        """
        Perform biometric/demographic/OTP matching.
        """
        if mode == AuthMode.FINGERPRINT:
            # 1:1 fingerprint matching
            return await self.matcher.match_fingerprint(
                captured=pid_data['fingerprint'],
                stored=stored_template['fingerprints'],
                finger_position=pid_data.get('position', 'ANY')
            )
        
        elif mode == AuthMode.IRIS:
            # 1:1 iris matching
            return await self.matcher.match_iris(
                captured=pid_data['iris'],
                stored=stored_template['irises']
            )
        
        elif mode == AuthMode.FACE:
            # AI-powered face matching
            return await self.matcher.match_face(
                captured=pid_data['face_image'],
                stored=stored_template['photo'],
                liveness_check=pid_data.get('liveness_data')
            )
        
        elif mode == AuthMode.OTP:
            # Verify OTP sent to registered mobile
            return await self.otp.verify(
                aadhaar=stored_template['aadhaar'],
                submitted_otp=pid_data['otp']
            )
        
        elif mode == AuthMode.DEMOGRAPHIC:
            # Fuzzy matching on name/address
            return self._demographic_match(
                submitted=pid_data['demographics'],
                stored=stored_template['demographics']
            )
        
        # MULTI_FACTOR (a combination of the modes above) is not
        # implemented here; reject it rather than silently return None
        raise ValidationError(f"Unsupported auth mode: {mode.value}")
    
    def _demographic_match(self, submitted: dict, stored: dict) -> MatchResult:
        """
        Fuzzy matching for name and address.
        
        Handles variations like:
        - "Raj Kumar" vs "Rajkumar"  
        - "Bangalore" vs "Bengaluru"
        - Hindi transliterations
        """
        name_score = self._fuzzy_name_match(
            submitted.get('name', ''),
            stored['name']
        )
        
        address_score = self._fuzzy_address_match(
            submitted.get('address', ''),
            stored['address']
        )
        
        # The requester supplies the match threshold (0-100; 100 = exact).
        # Name must always pass; address is checked only if it was submitted.
        threshold = submitted.get('match_threshold', 100)
        name_ok = name_score >= threshold
        address_ok = not submitted.get('address') or address_score >= threshold
        
        if name_ok and address_ok:
            return MatchResult(success=True)
        
        return MatchResult(success=False, error_code="DEMO_MISMATCH")
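
`_fuzzy_name_match` is left undefined above. A minimal stand-in, assuming it should return a 0-100 score comparable with the request's `match_threshold`, normalizes case and drops spaces and punctuation (so "Raj Kumar" and "Rajkumar" compare equal), then scores the result with the standard library's `difflib.SequenceMatcher.ratio()`. The function names here are hypothetical. This handles Latin-script variations only; cross-script transliteration and place-name aliases such as Bangalore/Bengaluru would need separate lookup tables.

```python
import difflib


def normalize_name(name: str) -> str:
    """Casefold and keep only letters/digits, so spacing, case, and
    punctuation differences ("Raj Kumar" vs "Rajkumar") disappear."""
    return "".join(c for c in name.casefold() if c.isalnum())


def fuzzy_name_score(submitted: str, stored: str) -> int:
    """Return a 0-100 similarity score (100 = identical after normalizing)."""
    a, b = normalize_name(submitted), normalize_name(stored)
    if not a or not b:
        return 0
    return round(difflib.SequenceMatcher(None, a, b).ratio() * 100)


print(fuzzy_name_score("Raj Kumar", "Rajkumar"))   # 100
print(fuzzy_name_score("Rahul", "Rahool"))         # 73
```

With the default threshold of 100 in `_demographic_match`, only the first pair would pass; a request that sets a lower threshold, say 70, would also accept the second.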

Deep Dive 3: Security Architecture for Protecting a Billion Biometrics

Week 9 concepts: Security, encryption, zero-trust.

You: "Aadhaar stores the biometrics of 1.4 billion people. This is the ultimate honeypot. Security is existential."

AADHAAR SECURITY LAYERS

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  LAYER 1: ENCRYPTION EVERYWHERE                                         │
│  ──────────────────────────────                                         │
│                                                                         │
│  At Capture:                                                            │
│  • Biometrics encrypted on enrollment device                            │
│  • 2048-bit PKI encryption                                              │
│  • Only CIDR has private key to decrypt                                 │
│                                                                         │
│  In Transit:                                                            │
│  • All communication over encrypted channels                            │
│  • TLS 1.2+ mandatory                                                   │
│  • Dedicated leased lines (not public internet)                         │
│                                                                         │
│  At Rest:                                                               │
│  • AES-256 encryption for stored data                                   │
│  • Even within CIDR, data is encrypted                                  │
│  • Keys stored in HSM (Hardware Security Module)                        │
│                                                                         │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  LAYER 2: TAMPER DETECTION                                              │
│  ─────────────────────────                                              │
│                                                                         │
│  Every enrollment packet includes:                                      │
│  • HMAC for integrity verification                                      │
│  • Operator's biometric signature                                       │
│  • Supervisor's biometric signature (for exceptions)                    │
│  • GPS coordinates of enrollment station                                │
│  • Timestamp                                                            │
│  • Device ID                                                            │
│                                                                         │
│  Any tampering is detectable and traceable                              │
│                                                                         │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  LAYER 3: ACCESS CONTROL                                                │
│  ───────────────────────                                                │
│                                                                         │
│  CIDR access:                                                           │
│  • Only UIDAI employees (very few)                                      │
│  • Multi-factor authentication required                                 │
│  • All access logged and audited                                        │
│  • No external access to raw biometrics                                 │
│                                                                         │
│  Partner access (AUA/ASA):                                              │
│  • Can only call authentication API                                     │
│  • Cannot query or download data                                        │
│  • Rate limited per entity                                              │
│  • Licensed and audited                                                 │
│                                                                         │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  LAYER 4: RESPONSE MINIMIZATION                                         │
│  ──────────────────────────────                                         │
│                                                                         │
│  Authentication returns ONLY:                                           │
│  • Yes (match) or No (no match)                                         │
│  • Transaction ID                                                       │
│  • Timestamp                                                            │
│                                                                         │
│  NEVER returns:                                                         │
│  • Biometric data                                                       │
│  • Demographic data (unless e-KYC with consent)                         │
│  • Match scores                                                         │
│  • Reason for failure (in detail)                                       │
│                                                                         │
│  This prevents information leakage                                      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
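
Layer 2's integrity check can be sketched with Python's standard `hmac` module. The enrollment station computes an HMAC-SHA256 over a canonical serialization of the packet fields, and CIDR recomputes it on arrival, so changing any field (the GPS coordinates, for example) is detected. The field names, the JSON canonicalization, and the per-device shared key are assumptions for illustration; the operator and supervisor biometric signatures are not modeled.

```python
import hashlib
import hmac
import json


def canonical_bytes(fields: dict) -> bytes:
    """Deterministic serialization (sorted keys, no extra whitespace) so
    the station and CIDR hash exactly the same bytes."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()


def seal_packet(fields: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 computed over all other fields."""
    mac = hmac.new(key, canonical_bytes(fields), hashlib.sha256).hexdigest()
    return {**fields, "hmac": mac}


def verify_packet(packet: dict, key: bytes) -> bool:
    """Recompute the HMAC; any edited field changes the result."""
    fields = {k: v for k, v in packet.items() if k != "hmac"}
    expected = hmac.new(key, canonical_bytes(fields), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking, via timing, how many characters matched
    return hmac.compare_digest(expected, packet.get("hmac", ""))


key = b"per-device-secret"   # hypothetical key provisioned to the station
packet = seal_packet(
    {"device_id": "DEV-0042", "gps": [12.97, 77.59],
     "timestamp": "2025-01-15T10:30:00Z", "operator_id": "OP-17"},
    key,
)
print(verify_packet(packet, key))                              # True
print(verify_packet({**packet, "gps": [28.61, 77.21]}, key))   # False
```

An HMAC proves only that someone holding the key produced the packet; tying a packet to a specific operator is what the biometric signatures in the list above are for.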

Privacy Features Added Over Time:

PRIVACY ENHANCEMENTS

1. VIRTUAL ID (VID): Introduced 2018
   ─────────────────────────────────
   Problem: Every authentication exposes Aadhaar number
   Solution: 16-digit temporary ID that maps to Aadhaar
   
   Resident generates VID → Uses VID instead of Aadhaar
   Each VID is:
   β€’ Revocable (generate new anytime)
   β€’ Mappable only by CIDR
   β€’ Usable for authentication
   
   AUA never sees actual Aadhaar number

2. TOKENIZATION: For recurring services
   ─────────────────────────────────────
   Problem: Same Aadhaar used at multiple services
            Services could collude to track user
   
   Solution: Each AUA gets a unique token for each Aadhaar
   
   Aadhaar 1234-5678-9012 →
     Bank A: Token ABC123
     Telecom B: Token XYZ789
     Insurance C: Token PQR456
   
   Services cannot correlate tokens

3. MASKED AADHAAR
   ───────────────
   Display: XXXX-XXXX-9012
   Only last 4 digits visible
   Used for documents that need to show Aadhaar reference

4. AUTHENTICATION HISTORY
   ───────────────────────
   Resident can see:
   β€’ Who authenticated their Aadhaar
   β€’ When
   β€’ What type of authentication
   
   Provides transparency and detects misuse
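
Tokenization (item 2) and masking (item 3) can be sketched in a few lines. The keyed-HMAC token derivation below is an assumption chosen to show the property described above, not UIDAI's published scheme: each AUA always receives the same token for a given resident, but without CIDR's secret two AUAs cannot tell whether their tokens belong to the same person. The secret and the AUA codes are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret that only CIDR would hold; never shared with AUAs
CIDR_TOKEN_SECRET = b"cidr-only-secret"


def _digits(aadhaar: str) -> str:
    """Strip separators so '1234-5678-9012' and '123456789012' agree."""
    digits = "".join(c for c in aadhaar if c.isdigit())
    if len(digits) != 12:
        raise ValueError("an Aadhaar number has 12 digits")
    return digits


def aua_token(aadhaar: str, aua_code: str,
              secret: bytes = CIDR_TOKEN_SECRET) -> str:
    """Stable per-(resident, AUA) token that other AUAs cannot link."""
    message = f"{aua_code}|{_digits(aadhaar)}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:24]


def mask_aadhaar(aadhaar: str) -> str:
    """Keep only the last 4 of the 12 digits visible."""
    return f"XXXX-XXXX-{_digits(aadhaar)[-4:]}"


aadhaar = "1234-5678-9012"
print(mask_aadhaar(aadhaar))                                            # XXXX-XXXX-9012
print(aua_token(aadhaar, "BANK_A") == aua_token(aadhaar, "BANK_A"))     # True
print(aua_token(aadhaar, "BANK_A") == aua_token(aadhaar, "TELECOM_B"))  # False
```

Because the token depends on the AUA code, Bank A and Telecom B each get a stable identifier for their own records, which is exactly what defeats the cross-service correlation described in item 2.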
# security/encryption_service.py

"""
Aadhaar Encryption Service

All biometric data is encrypted at the point of capture.
Only CIDR can decrypt using HSM-protected private keys.
"""

from datetime import datetime
from typing import Tuple
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
import hashlib
import hmac
import os


class PIDBlockEncryption:
    """
    PID (Personal Identity Data) Block encryption.
    
    The PID block contains biometric/OTP data captured at
    the authentication point. It's encrypted before transmission.
    
    Encryption scheme:
    1. Generate random session key (AES-256)
    2. Encrypt biometric data with session key
    3. Encrypt session key with UIDAI's public key (RSA-2048)
    4. Add HMAC for integrity
    """
    
    def __init__(self, uidai_public_key: bytes):
        self.uidai_public_key = serialization.load_pem_public_key(
            uidai_public_key
        )
    
    def encrypt_pid(
        self,
        biometric_data: bytes,
        timestamp: str,
        device_id: str
    ) -> Tuple[bytes, bytes]:
        """
        Encrypt PID block for transmission to CIDR.
        
        Returns:
            (encrypted_data, encrypted_session_key)
        """
        # Generate random session key
        session_key = os.urandom(32)  # 256 bits
        iv = os.urandom(12)  # 96 bits for GCM
        
        # Create PID plaintext with metadata
        pid_plaintext = self._create_pid_xml(
            biometric_data=biometric_data,
            timestamp=timestamp,
            device_id=device_id
        )
        
        # Encrypt with AES-256-GCM
        cipher = Cipher(algorithms.AES(session_key), modes.GCM(iv))
        encryptor = cipher.encryptor()
        ciphertext = encryptor.update(pid_plaintext) + encryptor.finalize()
        
        # Encrypt session key with UIDAI's RSA public key
        encrypted_session_key = self.uidai_public_key.encrypt(
            session_key + iv,  # Include IV
            padding.OAEP(
                mgf=padding.MGF1(algorithm=hashes.SHA256()),
                algorithm=hashes.SHA256(),
                label=None
            )
        )
        
        # Add HMAC for integrity
        hmac_value = hmac.new(
            session_key,
            ciphertext,
            hashlib.sha256
        ).digest()
        
        return (ciphertext + encryptor.tag + hmac_value, 
                encrypted_session_key)


class HSMKeyManager:
    """
    Hardware Security Module interface.
    
    All cryptographic keys are stored and used within HSM.
    Keys never leave the HSM in plaintext.
    
    UIDAI uses FIPS 140-2 Level 3 certified HSMs.
    """
    
    def __init__(self, hsm_connection):
        self.hsm = hsm_connection
    
    async def decrypt_pid(self, encrypted_pid: bytes, 
                          encrypted_key: bytes) -> dict:
        """
        Decrypt PID block inside HSM.
        
        The private key never leaves the HSM.
        Decryption happens entirely within the HSM.
        """
        # Send to HSM for decryption
        decrypted = await self.hsm.decrypt(
            data=encrypted_pid,
            encrypted_key=encrypted_key,
            key_id="UIDAI_AUTH_PRIVATE_KEY",
            algorithm="RSA-OAEP-256"
        )
        
        return self._parse_pid_xml(decrypted)
    
    async def sign_response(self, response_data: bytes) -> bytes:
        """
        Sign authentication response using HSM.
        
        All responses are digitally signed so AUAs can
        verify they came from authentic CIDR.
        """
        signature = await self.hsm.sign(
            data=response_data,
            key_id="UIDAI_SIGNING_KEY",
            algorithm="RSA-SHA256"
        )
        
        return signature


class AadhaarDataVault:
    """
    Secure storage for Aadhaar numbers at AUA/KUA.
    
    UIDAI mandates that any entity storing Aadhaar numbers
    must use an "Aadhaar Data Vault" with:
    - AES-256 encryption
    - Keys in HSM
    - Access logging
    - No plaintext storage
    """
    
    def __init__(self, hsm_client, database):
        self.hsm = hsm_client
        self.db = database
    
    async def store_aadhaar(
        self,
        reference_id: str,  # Your internal customer ID
        aadhaar_number: str
    ):
        """
        Store Aadhaar number securely.
        """
        # Encrypt inside the HSM; the vault key never leaves it
        encrypted_aadhaar = await self.hsm.encrypt(
            data=aadhaar_number.encode(),
            key_id="AADHAAR_VAULT_KEY"
        )
        
        # Store only encrypted value
        await self.db.insert({
            'reference_id': reference_id,
            'encrypted_aadhaar': encrypted_aadhaar,
            'created_at': datetime.utcnow()
        })
    
    async def get_aadhaar(self, reference_id: str) -> str:
        """
        Retrieve and decrypt Aadhaar number.
        """
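        record = await self.db.get(reference_id)
        if record is None:
            raise KeyError(f"No vault entry for reference_id={reference_id}")
        
        # Log the access before decrypting so every read attempt is audited
        await self._log_access(reference_id)
        
        # Decrypt in HSM
        aadhaar = await self.hsm.decrypt(
            data=record['encrypted_aadhaar'],
            key_id="AADHAAR_VAULT_KEY"
        )
        
        return aadhaar.decode()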
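A vault can reject malformed input cheaply before `store_aadhaar` sends anything to the HSM. Aadhaar numbers are commonly documented as 12 digits. The first digit is not 0 or 1, and the last digit is a Verhoeff check digit. The sketch below assumes those rules. The function names are hypothetical, and passing the check says nothing about whether a number was actually issued.

```python
# Sketch: structural checks on an Aadhaar number before it is stored.
# Assumed format: 12 ASCII digits, first digit not 0 or 1,
# last digit a Verhoeff check digit.

# Multiplication table of the dihedral group D5
VERHOEFF_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# Position-dependent permutation (row = position mod 8)
VERHOEFF_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
# Inverse of each element in D5
VERHOEFF_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_check_digit(payload: str) -> str:
    """Compute the Verhoeff check digit to append to a numeric string."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = VERHOEFF_D[c][VERHOEFF_P[(i + 1) % 8][int(ch)]]
    return str(VERHOEFF_INV[c])


def verhoeff_validate(number: str) -> bool:
    """True if the trailing digit is a valid Verhoeff checksum."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = VERHOEFF_D[c][VERHOEFF_P[i % 8][int(ch)]]
    return c == 0


def is_valid_aadhaar_format(number: str) -> bool:
    """Cheap structural check; it does NOT prove the number was issued."""
    return (
        len(number) == 12
        and number.isascii()
        and number.isdigit()
        and number[0] not in "01"
        and verhoeff_validate(number)
    )


payload = "23456789012"
aadhaar_like = payload + verhoeff_check_digit(payload)
print(aadhaar_like, is_valid_aadhaar_format(aadhaar_like))
```

Verhoeff catches every single-digit substitution and every adjacent transposition. A typo made while keying the number in is therefore rejected before it ever reaches the vault.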

Deep Dive 4: Face Authentication - AI at Scale

Week 10 concepts: Operational excellence, innovation.

You: "Face authentication was introduced in 2021 and has grown rapidly since. It's AI-powered and was developed in-house by UIDAI."

FACE AUTHENTICATION EVOLUTION

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  WHY FACE AUTHENTICATION?                                               │
│  ─────────────────────────                                              │
│                                                                         │
│  Problems with fingerprint:                                             │
│  • Manual laborers: Worn fingerprints                                   │
│  • Elderly: Faded fingerprints                                          │
│  • Amputees: Missing fingers                                            │
│  • Skin conditions: Temporary issues                                    │
│  • COVID-19: Hygiene concerns with touch-based                          │
│                                                                         │
│  Face authentication advantages:                                        │
│  • Contactless (post-COVID preference)                                  │
│  • Works on any smartphone                                              │
│  • More inclusive (no physical requirements)                            │
│  • Convenient for remote authentication                                 │
│                                                                         │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  GROWTH TRAJECTORY                                                      │
│  ─────────────────                                                      │
│                                                                         │
│  Oct 2021:  Launch                                                      │
│  Dec 2023:  100 crore (1 billion) cumulative                            │
│  Jan 2025:  12 crore (120 million) per month                            │
│  Sep 2025:  1.5 crore (15 million) per DAY (record)                     │
│                                                                         │
│  Adoption: 150+ government and private entities                         │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
# authentication/face_auth.py

"""
UIDAI Face Authentication Service

AI/ML-powered face matching developed in-house by UIDAI.
Uses liveness detection to prevent spoofing.
"""

from dataclasses import dataclass
from typing import Optional
import numpy as np


@dataclass
class LivenessResult:
    is_live: bool
    confidence: float = 0.0
    method: Optional[str] = None   # "PASSIVE", "BLINK", "HEAD_TURN"


@dataclass
class FaceAuthRequest:
    aadhaar_or_vid: str
    face_image: bytes              # Captured selfie
    liveness_data: dict            # Blink detection, head movement
    device_info: dict              # Camera specs, OS version


@dataclass
class FaceAuthResult:
    match: bool
    confidence: float              # 0-100
    liveness_verified: bool
    error_code: Optional[str] = None


class FaceAuthenticationService:
    """
    Face authentication with liveness detection.
    
    Architecture:
    1. Liveness detection (prevent photo/video attacks)
    2. Face detection and alignment
    3. Feature extraction (deep learning)
    4. 1:1 matching against stored photo
    """
    
    def __init__(
        self,
        face_detector,
        liveness_model,
        face_encoder,
        match_threshold: float = 0.85
    ):
        self.detector = face_detector
        self.liveness = liveness_model
        self.encoder = face_encoder
        self.threshold = match_threshold
    
    async def authenticate(
        self,
        request: FaceAuthRequest,
        stored_photo: bytes
    ) -> FaceAuthResult:
        """
        Perform face authentication.
        
        Steps:
        1. Verify liveness (not a photo/video)
        2. Detect and align faces
        3. Extract embeddings
        4. Compare embeddings
        """
        # Step 1: Liveness verification
        liveness_result = await self._verify_liveness(
            image=request.face_image,
            liveness_data=request.liveness_data
        )
        
        if not liveness_result.is_live:
            return FaceAuthResult(
                match=False,
                confidence=0,
                liveness_verified=False,
                error_code="LIVENESS_FAILED"
            )
        
        # Step 2: Face detection and quality check
        captured_face = await self._detect_and_align(request.face_image)
        stored_face = await self._detect_and_align(stored_photo)
        
        # Both faces are needed: a missing reference face would crash the encoder
        if captured_face is None or stored_face is None:
            return FaceAuthResult(
                match=False,
                confidence=0,
                liveness_verified=True,
                error_code="FACE_NOT_DETECTED"
            )
        
        # Step 3: Extract face embeddings
        captured_embedding = await self.encoder.encode(captured_face)
        stored_embedding = await self.encoder.encode(stored_face)
        
        # Step 4: Compare embeddings
        similarity = self._cosine_similarity(
            captured_embedding,
            stored_embedding
        )
        
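        # Map cosine similarity from [-1, 1] onto a 0-100 confidence scale.
        # With the default threshold of 0.85, a match needs similarity >= 0.7.
        confidence = (similarity + 1) / 2 * 100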
        
        return FaceAuthResult(
            match=confidence >= self.threshold * 100,
            confidence=confidence,
            liveness_verified=True
        )
    
    async def _verify_liveness(
        self,
        image: bytes,
        liveness_data: dict
    ) -> LivenessResult:
        """
        Verify the face is from a live person.
        
        Liveness checks:
        1. Texture analysis (detect printed photos)
        2. Depth estimation (detect flat screens)
        3. Eye blink detection (detect videos)
        4. Random challenge-response (head movement)
        """
        # Challenge: User was asked to blink/move head
        challenge_type = liveness_data.get('challenge_type')
        frames = liveness_data.get('frames', [])
        
        if challenge_type == 'BLINK':
            # Detect eye blink across frames
            return await self.liveness.detect_blink(frames)
        
        elif challenge_type == 'HEAD_TURN':
            # Detect head movement
            return await self.liveness.detect_head_movement(frames)
        
        elif challenge_type == 'PASSIVE':
            # Passive liveness (texture + depth analysis)
            return await self.liveness.passive_check(image)
        
        return LivenessResult(is_live=False)
    
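    async def _detect_and_align(self, image: bytes):
        """
        Detect the face and return an aligned crop, or None if no face.
        
        Assumes the injected detector exposes an async detect_and_align().
        """
        return await self.detector.detect_and_align(image)
    
    def _cosine_similarity(
        self,
        embedding1: np.ndarray,
        embedding2: np.ndarray
    ) -> float:
        """
        Cosine similarity between face embeddings.
        
        Range: -1 to 1 (higher = more similar)
        """
        norm1 = np.linalg.norm(embedding1)
        norm2 = np.linalg.norm(embedding2)
        if norm1 == 0 or norm2 == 0:
            return 0.0  # Degenerate embedding: treat as no similarity
        
        return float(np.dot(embedding1, embedding2) / (norm1 * norm2))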


class LivenessDetector:
    """
    Deep learning model for liveness detection.
    
    Trained to distinguish:
    - Real faces
    - Printed photos
    - Screen displays (replay attacks)
    - 3D masks
    """
    
    def __init__(self, model_path: str):
        self.model = self._load_model(model_path)
    
    async def passive_check(self, image: bytes) -> LivenessResult:
        """
        Passive liveness without user action.
        
        Analyzes:
        - Moiré patterns (screen artifacts)
        - Color distribution
        - Texture frequency
        - Specular reflection
        """
        features = self._extract_liveness_features(image)
        
        # Model outputs P(live): real (1) vs spoof (0). .item() unwraps a
        # size-1 array or NumPy scalar into a plain Python float.
        prediction = float(np.asarray(self.model.predict(features)).item())
        
        return LivenessResult(
            is_live=prediction > 0.7,
            confidence=prediction,
            method="PASSIVE"
        )
    
    async def detect_blink(self, frames: list) -> LivenessResult:
        """
        Detect eye blink across video frames.
        
        A real person blinks; a photo doesn't.
        """
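        eye_aspect_ratios = []
        
        for frame in frames:
            # Detect eye landmarks
            eyes = self._detect_eyes(frame)
            if eyes:
                eye_aspect_ratios.append(self._eye_aspect_ratio(eyes))
        
        # Too few usable frames to judge
        if len(eye_aspect_ratios) <= 5:
            return LivenessResult(is_live=False, method="BLINK")
        
        # A blink is open -> closed -> open: EAR must dip well below the
        # open-eye baseline and then recover. A bare max-min range would
        # also trigger on a single head-pose change.
        baseline = max(eye_aspect_ratios)
        closed = [ear < baseline - 0.15 for ear in eye_aspect_ratios]
        if True not in closed:
            return LivenessResult(is_live=False, method="BLINK")
        first_closed = closed.index(True)
        reopened = first_closed > 0 and not all(closed[first_closed:])
        
        return LivenessResult(is_live=reopened, method="BLINK")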
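`detect_blink` above relies on an `_eye_aspect_ratio` helper that is never defined. A common definition is the eye aspect ratio (EAR) from Soukupová and Čech (2016). It uses six landmarks per eye, with p1 and p4 at the corners: EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|). EAR stays roughly constant while the eye is open and drops toward zero when it closes.

The standalone sketch below computes EAR and counts a blink as a run of low-EAR frames followed by reopening. The 0.2 threshold and two-frame minimum are illustrative assumptions, not UIDAI parameters.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR for six landmarks ordered p1..p6 (p1/p4 = eye corners)."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def count_blinks(
    ears: List[float],
    closed_threshold: float = 0.2,   # assumed "eye closed" cut-off
    min_closed_frames: int = 2,      # assumed minimum blink length
) -> int:
    """Count closed-then-reopened episodes lasting >= min_closed_frames."""
    blinks = 0
    closed_run = 0
    for ear in ears:
        if ear < closed_threshold:
            closed_run += 1
        else:
            # Eye reopened: count the episode only if it lasted long enough
            if closed_run >= min_closed_frames:
                blinks += 1
            closed_run = 0
    return blinks  # a closure that never reopens is not counted


open_eye = [(0, 0), (2, 1), (4, 1), (6, 0), (4, -1), (2, -1)]
closed_eye = [(0, 0), (2, 0.2), (4, 0.2), (6, 0), (4, -0.2), (2, -0.2)]
frames = [open_eye, open_eye, closed_eye, closed_eye, open_eye, open_eye]
ears = [eye_aspect_ratio(e) for e in frames]
print(count_blinks(ears))
```

A printed photo yields a flat EAR series and zero blinks. A replayed video can still contain real blinks, which is why the passive texture and depth checks and the random challenge type matter as well.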

Phase 5: Scaling and Edge Cases

Interviewer: "Aadhaar went from 0 to 1 billion enrollments in about 6 years. How did they scale the enrollment infrastructure?"

Enrollment at Village Scale

ENROLLMENT INFRASTRUCTURE

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  THE CHALLENGE                                                          │
│  ─────────────                                                          │
│                                                                         │
│  • 640,000+ villages in India                                           │
│  • Many with no electricity, no internet                                │
│  • Extreme temperatures (deserts, mountains)                            │
│  • Low literacy levels                                                  │
│  • Need to enroll 1+ million people per day                             │
│                                                                         │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  THE SOLUTION: MOBILE ENROLLMENT CAMPS                                  │
│  ─────────────────────────────────────                                  │
│                                                                         │
│  Equipment transported (sometimes by donkeys!):                         │
│  • Ruggedized laptops                                                   │
│  • USB fingerprint scanners                                             │
│  • Iris cameras                                                         │
│  • Web cameras (for photos)                                             │
│  • Portable generators                                                  │
│  • Tables, chairs, canopies                                             │
│                                                                         │
│  Personnel:                                                             │
│  • 150,000+ certified operators                                         │
│  • Supervisors for exception handling                                   │
│  • ~50 enrollments per station per day                                  │
│                                                                         │
│  Peak capacity:                                                         │
│  • 60,000-80,000 enrollment stations                                    │
│  • 1+ million new enrollments per day                                   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
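The figures in the box can be sanity-checked with simple arithmetic. At ~50 enrollments per station per day, 60,000-80,000 stations give a theoretical 3-4 million enrollments per day. That is several times the 1+ million cited as the peak. The gap is plausibly absorbed by camp setup, travel, and operator availability, but that is an inference, not a sourced figure. The helper names below are hypothetical.

```python
def daily_capacity(stations: int, per_station_per_day: int) -> int:
    """Theoretical enrollments per day if every station runs at full rate."""
    return stations * per_station_per_day


def stations_needed(target_per_day: int, per_station_per_day: int) -> int:
    """Minimum stations to hit a daily target (integer ceiling division)."""
    return -(-target_per_day // per_station_per_day)


def days_to_enroll(population: int, per_day: int) -> int:
    """Days to enroll a population at a sustained daily rate (rounded up)."""
    return -(-population // per_day)


for stations in (60_000, 80_000):
    print(stations, "stations ->", daily_capacity(stations, 50), "enrollments/day")

print("stations for 1M/day:", stations_needed(1_000_000, 50))
print("days for 1B at 1M/day:", days_to_enroll(1_000_000_000, 1_000_000))
```

At a sustained 1 million per day, a billion enrollments take about 1,000 days, which is under three years. The roughly six years Aadhaar actually needed reflects the ramp-up from zero stations to peak capacity.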

Edge Cases

EDGE CASE 1: Biometric Exceptions

Problem: Not everyone can provide all biometrics
- Missing fingers (accidents, leprosy)
- Cataract patients (can't capture iris)
- Worn fingerprints (manual laborers)
- Children (biometrics change as they grow)

Solution:
├── Multi-modal: If fingerprint fails, use iris
├── Best-finger approach: Use whichever fingers work
├── Exception handling: Supervisor approval for special cases
├── Child enrollment: Mandatory biometric update at 5 and 15 years
└── Best Available Data (BAD) mode for extreme cases

EDGE CASE 2: Offline Enrollment

Problem: No internet in remote villages
Solution:
├── Enrollment client works entirely offline
├── Packets stored locally (encrypted)
├── Sync when connectivity available
├── USB-based upload via registrar
└── 30-day buffer for packet upload

EDGE CASE 3: Duplicate Enrollment Attempts

Problem: People trying to get multiple Aadhaars for fraud
Solution:
├── Three-way ABIS consensus
├── Manual adjudication for borderline cases
├── Reject if duplicate found
├── Audit trail for investigation
└── Criminal penalties for fraud

EDGE CASE 4: Authentication Failures

Problem: Genuine person fails authentication
Causes:
├── Worn fingerprints (temporary or permanent)
├── Cuts/injuries on fingers
├── Wet/dirty fingers
├── Sensor quality issues
└── Aging (biometrics change over time)

Solution:
├── Try multiple fingers
├── Fall back to iris
├── Fall back to OTP
├── Face authentication option
├── Biometric update facility
└── Exception handling mode for genuine failures

EDGE CASE 5: System Under Attack

Problem: DDoS or brute-force attacks
Mitigation:
├── Rate limiting per AUA
├── IP whitelisting for ASAs
├── No direct internet exposure
├── Anomaly detection
└── Fallback to degraded mode

Phase 6: Monitoring and Operations

You: "With 90 million daily authentications, operational excellence is critical."

MONITORING DASHBOARD (illustrative figures)

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  REAL-TIME METRICS                                                      │
│                                                                         │
│  Authentication Rate                                                    │
│  ├── Current TPS:          1,042 auth/sec                               │
│  ├── Today's Total:        67.2M authentications                        │
│  ├── Success Rate:         99.2%                                        │
│  └── p99 Latency:          185ms                                        │
│                                                                         │
│  By Authentication Type                                                 │
│  ├── Fingerprint:          58%     [████████████           ]            │
│  ├── OTP:                  22%     [████                   ]            │
│  ├── Face:                 12%     [██                     ]            │
│  ├── Iris:                  5%     [█                      ]            │
│  └── Demographic:           3%     [                       ]            │
│                                                                         │
│  Top AUAs (by volume)                                                   │
│  ├── SBI Bank:             12.3M    [████████              ]            │
│  ├── NPCI/UPI:              9.8M    [██████                ]            │
│  ├── Jio Telecom:           7.2M    [█████                 ]            │
│  ├── HDFC Bank:             5.6M    [████                  ]            │
│  └── Others:               32.3M                                        │
│                                                                         │
│  Infrastructure Health                                                  │
│  ├── CIDR Bengaluru:       ✓ Healthy    (CPU: 45%)                      │
│  ├── CIDR Manesar:         ✓ Healthy    (CPU: 42%)                      │
│  ├── ABIS 1:               ✓ Healthy    (Queue: 234)                    │
│  ├── ABIS 2:               ✓ Healthy    (Queue: 189)                    │
│  └── ABIS 3:               ✓ Healthy    (Queue: 267)                    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
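Two of the headline dashboard numbers fall out of simple formulas.

- **Current TPS.** 90 million authentications spread evenly over 86,400 seconds is about 1,042 per second, which matches the Current TPS figure. Real traffic is burstier, so peak TPS is higher.
- **p99 latency.** This is the value below which 99% of requests complete. The sketch uses the nearest-rank definition over raw samples. Production systems usually use streaming estimators such as HdrHistogram or t-digest rather than sorting every sample.

```python
import math
from typing import Sequence


def average_tps(daily_total: int, seconds_per_day: int = 86_400) -> float:
    """Mean transactions per second implied by a daily volume."""
    return daily_total / seconds_per_day


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


print(round(average_tps(90_000_000)))
print(percentile_nearest_rank(list(range(1, 101)), 99))
```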

Interview Conclusion

Interviewer: "Excellent walkthrough. A few quick questions:"

Interviewer: "What's the hardest part of building a system like Aadhaar?"

You: "Three things:

  1. Biometric deduplication at scale: Proving uniqueness against 1.4 billion records requires clever algorithms, demographic blocking, and multi-ABIS consensus. Naive all-pairs brute force is computationally prohibitive.

  2. Inclusive design: The system must work for manual laborers with worn fingerprints, elderly people with cataracts, amputees, and people in villages with no electricity. You can't just optimize for the happy path.

  3. Security of the crown jewels: If the CIDR is breached, the biometrics of 1.4 billion people are exposed. There's no 'password reset' for fingerprints. The security architecture must be bulletproof."
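To see why blocking matters, consider the all-pairs case. Comparing every pair among 1.4 billion records would take about 1.4e9 × 1.4e9 / 2 ≈ 9.8 × 10^17 comparisons.

Demographic blocking can be sketched as an index keyed on coarse attributes, so a new enrollee is compared only against residents in the same block. The key below (gender, five-year birth bucket, state) and the neighbor-bucket search are assumptions for illustration. Demographics are self-declared and can be wrong, so any real blocking scheme must be loose enough that a genuine duplicate is not filtered out before biometric matching.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Set, Tuple

BlockKey = Tuple[str, int, str]


@dataclass(frozen=True)
class Resident:
    resident_id: str
    gender: str
    birth_year: int
    state: str


class BlockingIndex:
    """Hypothetical demographic blocking index (gender, birth bucket, state)."""

    def __init__(self, bucket_years: int = 5) -> None:
        self.bucket_years = bucket_years
        self.blocks: DefaultDict[BlockKey, List[str]] = defaultdict(list)

    def _key(self, r: Resident) -> BlockKey:
        return (r.gender, r.birth_year // self.bucket_years, r.state)

    def add(self, r: Resident) -> None:
        self.blocks[self._key(r)].append(r.resident_id)

    def candidates(self, r: Resident) -> Set[str]:
        """IDs to send for biometric matching: own block plus adjacent buckets."""
        gender, bucket, state = self._key(r)
        found: Set[str] = set()
        for b in (bucket - 1, bucket, bucket + 1):  # tolerate mis-declared ages
            found.update(self.blocks.get((gender, b, state), []))
        return found


idx = BlockingIndex()
for r in [Resident("a", "F", 1990, "KA"), Resident("b", "F", 1996, "KA"),
          Resident("c", "M", 1990, "KA"), Resident("d", "F", 1990, "MH"),
          Resident("e", "F", 1980, "KA")]:
    idx.add(r)
print(sorted(idx.candidates(Resident("new", "F", 1991, "KA"))))
```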

Interviewer: "What lessons from Aadhaar apply to other large-scale systems?"

You: "Several key lessons:

  1. API-first thinking: Aadhaar was designed as a platform from day one. The authentication API has enabled hundreds of services; UPI, DigiLocker, and ABDM all build on or integrate with Aadhaar's identity layer.

  2. Vendor neutrality: Three ABIS vendors, open APIs, no lock-in. This enabled continuous improvement and prevented any single vendor from becoming critical.

  3. Offline-first design: For systems that must work in challenging environments, assume no connectivity and design for sync.

  4. Minimal data principle: Aadhaar authentication returns only Yes/No. This minimizes privacy exposure and attack surface.

  5. Invest in inclusion: Adding face authentication made the system accessible to millions who struggled with fingerprints. Inclusion isn't just ethics; it's good engineering."
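The minimal-data principle shows up directly in the response shape. The relying party learns whether the match succeeded, plus enough to audit the call, and nothing else about the resident.

This sketch's field names loosely echo the Auth API's `ret`/`txn`/`err` attributes. It uses an HMAC as a stand-in for the RSA signature CIDR applies in `sign_response`. As a result, unlike a real deployment, the verifier here shares the signing secret.

```python
import hashlib
import hmac
import json
from typing import Optional


def _canonical(body: dict) -> bytes:
    # Stable serialization so signer and verifier hash identical bytes
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def build_auth_response(txn: str, matched: bool, key: bytes,
                        err: Optional[str] = None) -> dict:
    """Yes/No result only: no name, address, photo, or biometric data."""
    body = {"txn": txn, "ret": "y" if matched else "n", "err": err}
    body["sig"] = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return body


def verify_auth_response(resp: dict, key: bytes) -> bool:
    """Recompute the HMAC over everything except the signature itself."""
    body = {k: v for k, v in resp.items() if k != "sig"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(resp.get("sig", ""), expected)


resp = build_auth_response("TXN-1", True, b"demo-key")
print(resp["ret"], verify_auth_response(resp, b"demo-key"))
```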


Summary: Concepts Applied from 10-Week Course

┌────────────────────────────────────────────────────────────────────────┐
│                                                                        │
│             CONCEPTS FROM 10-WEEK COURSE IN AADHAAR DESIGN             │
│                                                                        │
│  WEEK 1: DATA AT SCALE                                                 │
│  ├── Partitioning: Demographic blocking reduces search space           │
│  ├── Sharding: CIDR distributed across data centers                    │
│  └── Replication: Active-active for disaster recovery                  │
│                                                                        │
│  WEEK 2: FAILURE-FIRST DESIGN                                          │
│  ├── Offline-first: Enrollment works without connectivity              │
│  ├── Graceful degradation: Fall back to OTP if biometric fails         │
│  ├── Timeouts: 200ms latency budget for authentication                 │
│  └── Retry: Multi-finger, multi-modality attempts                      │
│                                                                        │
│  WEEK 3: MESSAGING & ASYNC                                             │
│  ├── Queue-based: Enrollment packets queued for processing             │
│  ├── Async deduplication: Days/weeks for complex cases                 │
│  └── Audit trail: All transactions logged asynchronously               │
│                                                                        │
│  WEEK 4: CACHING                                                       │
│  ├── Template caching: Hot Aadhaars cached for fast auth               │
│  ├── Session caching: Reduce database lookups                          │
│  └── Pre-computed: Demographic indices for blocking                    │
│                                                                        │
│  WEEK 5: CONSISTENCY                                                   │
│  ├── Strong consistency: Deduplication must be accurate                │
│  ├── Consensus: 2-of-3 ABIS agreement for decisions                    │
│  └── Uniqueness guarantee: Core requirement of the system              │
│                                                                        │
│  WEEK 9: SECURITY                                                      │
│  ├── 2048-bit PKI encryption throughout                                │
│  ├── HSM for key management                                            │
│  ├── Zero-trust architecture                                           │
│  ├── Virtual ID for privacy                                            │
│  └── Tokenization to prevent tracking                                  │
│                                                                        │
│  WEEK 10: OPERATIONS                                                   │
│  ├── Multi-vendor: Three ABIS for resilience                           │
│  ├── SLOs: <200ms auth, 99.9% availability                             │
│  ├── Audit: Complete traceability of all operations                    │
│  └── Continuous innovation: Face auth, VID, e-KYC                      │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
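The tokenization bullet can be illustrated with a keyed hash. The same resident maps to a stable token within one AUA, so that AUA can recognize repeat customers. Across AUAs the tokens are unrelated, so two AUAs cannot join their records on them.

This is a sketch under assumptions: the HMAC-SHA256 construction, the secret, and the 64-character hex output are illustrative. The format of UIDAI's actual UID tokens is not reproduced here.

```python
import hashlib
import hmac


def uid_token(aadhaar_number: str, aua_code: str, secret: bytes) -> str:
    """AUA-scoped pseudonym: stable per (resident, AUA), unlinkable across AUAs."""
    # The secret stays with the issuer; without it, tokens cannot be
    # recomputed or brute-forced from the 12-digit number space.
    msg = f"{aua_code}|{aadhaar_number}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


secret = b"issuer-secret"
print(uid_token("234567890123", "AUA_A", secret)[:16])
print(uid_token("234567890123", "AUA_B", secret)[:16])
```

A plain unkeyed hash would not work here. With only 10^12 possible numbers, anyone could hash them all and reverse the tokens.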

Why Aadhaar Matters

╔═════════════════════════════════════════════════════════════════════════╗
β•‘                                                                         β•‘
β•‘               WHY AADHAAR IS AN ENGINEERING MARVEL                      β•‘
β•‘                                                                         β•‘
β•‘  SCALE                                                                  β•‘
β•‘  ─────                                                                  β•‘
β•‘  β€’ Largest biometric database in history (1.4 billion)                  β•‘
β•‘  β€’ 150+ billion authentications (and counting)                          β•‘
β•‘  β€’ 90+ million authentications per day                                  β•‘
β•‘  β€’ Deduplication: Never done at billion scale before                    β•‘
β•‘                                                                         β•‘
β•‘  INCLUSION                                                              β•‘
β•‘  ─────────                                                              β•‘
β•‘  β€’ Works in 640,000 villages                                            β•‘
β•‘  β€’ Handles worn fingerprints, cataracts, amputees                       β•‘
β•‘  β€’ Multiple modalities (finger, iris, face, OTP)                        β•‘
β•‘  β€’ Low-cost enrollment (donkeys carrying equipment!)                    β•‘
β•‘                                                                         β•‘
β•‘  IMPACT                                                                 β•‘
β•‘  ──────                                                                 β•‘
β•‘  β€’ β‚Ή3.5+ lakh crore in DBT savings                                      β•‘
β•‘  β€’ Millions of ghost beneficiaries eliminated                           β•‘
β•‘  β€’ Foundation for UPI, DigiLocker, ABDM, CoWIN                          β•‘
β•‘  β€’ "India Stack" model studied globally                                 β•‘
β•‘                                                                         β•‘
β•‘  INNOVATION                                                             β•‘
β•‘  ──────────                                                             β•‘
β•‘  β€’ First multi-ABIS system in the world                                 β•‘
β•‘  β€’ Virtual ID for privacy (before it was trendy)                        β•‘
β•‘  β€’ AI-powered face authentication at scale                              β•‘
β•‘  β€’ Consent-based data sharing (e-KYC)                                   β•‘
β•‘                                                                         β•‘
β•‘  ════════════════════════════════════════════════════════════════════   β•‘
β•‘                                                                         β•‘
β•‘  "Aadhaar proved that identity infrastructure can be built as           β•‘
β•‘   a public good, at billion scale, with inclusion at its core."         β•‘
β•‘                                                                         β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

Self-Assessment Checklist

After studying this case study, you should be able to:

Architecture:

  • Design a biometric enrollment system with offline capability
  • Explain deduplication challenges at billion scale
  • Design authentication with sub-200ms latency

Distributed Systems:

  • Apply demographic blocking to reduce search space
  • Design multi-vendor consensus systems
  • Handle offline-first data sync

Security:

  • Implement end-to-end encryption with HSM
  • Design privacy-preserving authentication (VID, tokenization)
  • Apply zero-trust principles to sensitive data

Inclusion:

  • Design for biometric edge cases
  • Provide multiple authentication fallbacks
  • Build systems for challenging infrastructure environments


Further Reading

Engineering Talks:

  • Dr. Pramod Varma - "Architecting World's Largest Biometric Identity System" (Strata+Hadoop World 2014)
  • Nandan Nilekani - Various talks on India Stack and Aadhaar design philosophy

Engineering Blogs:

  • ByteByteGo: System design breakdowns
  • High Scalability: Case studies on large-scale systems
  • Biometric Update: Aadhaar and biometric technology coverage

Books:

  • "Designing Data-Intensive Applications" by Martin Kleppmann - Distributed systems fundamentals
  • "Aadhaar: A Biometric History of India's 12-Digit Revolution" by N.S. Ramnath and Charles Assisi

Related Systems to Study:

  • Estonia e-Residency: European digital identity model
  • Singapore SingPass: National digital identity
  • GOV.UK Verify: Federated identity approach (contrast to centralized)
  • MOSIP: Open-source identity platform (inspired by Aadhaar)

Research Papers:

  • "India's Aadhaar: Structure, Security, and Vulnerabilities" - IACR ePrint 2022/481
  • IEEE/ACM papers on biometric deduplication at scale

End of Bonus Problem 3: Aadhaar (UIDAI)

"1.4 billion unique identities. 150 billion authentications. The foundation of India's digital public infrastructure."