Bonus Problem 3: Aadhaar (UIDAI)
The World's Largest Biometric Identity System
🪪 Identity at Billion Scale
Imagine this challenge: You need to uniquely identify 1.4 billion people.
Not just assign them a number, but guarantee that each person appears exactly once in your system. No duplicates. No fakes. Every identity verifiable in under 200 milliseconds.
You'll need to match billions of fingerprints against billions of other fingerprints. Trillions of biometric comparisons. Every single day.
This is Aadhaar, and it's the largest biometric identity system ever built.
THE AADHAAR SCALE (2025)
───────────────────────────────────────────────────────────────────────────

  ENROLLMENT
  ──────────
  Total Enrolled:            1.38+ Billion people
  Coverage:                  99.9% of adult Indian population
  Biometric Data:            ~15 Petabytes
                             (10 fingerprints + 2 iris scans + photo per person)

  AUTHENTICATION
  ──────────────
  Daily Authentications:     90+ Million
  Monthly Authentications:   2.5+ Billion
  Cumulative (to date):      150+ Billion authentications
  e-KYC Transactions:        45+ Million/month
  Face Authentication:       18+ Million/month (AI-powered)

  PERFORMANCE
  ───────────
  Authentication Latency:    < 200ms
  Availability:              99.9%+
  Active Entities (AUAs):    550+

  DEDUPLICATION
  ─────────────
  Biometric Matches/Day:     600+ Trillion (at peak)
  ABIS Vendors:              3 (for redundancy)
  Duplicate Detection:       99.965% accuracy

  IMPACT
  ──────
  DBT Savings:               ₹3.5+ Lakh Crore ($42B+) saved
  Ghost Beneficiaries:       Millions eliminated
  Bank Accounts Linked:      788+ Million

───────────────────────────────────────────────────────────────────────────
This is the system we'll design today, and along the way we'll come to understand the engineering marvel behind proving "you are you" at planetary scale.
The Interview Begins
You're interviewing at a government technology agency. The Chief Architect draws on the whiteboard:
Interviewer: "India's Aadhaar is often cited as the world's most ambitious digital identity project. I want you to design a biometric identity system that can scale to a billion people. Walk me through how you'd approach it."
───────────────────────────────────────────────────────────────────────────

  Design a National Biometric Identity System

  Build an identity system that can:

  Requirements:
    • Enroll 1+ billion residents with biometrics
    • Guarantee uniqueness (no duplicates in the system)
    • Authenticate identity in real-time (< 500ms)
    • Handle 100+ million authentications per day
    • Work across 640,000 villages with unreliable connectivity
    • Protect biometric data with highest security
    • Provide e-KYC (Know Your Customer) service
    • Support multiple authentication modes (fingerprint, iris, OTP)
    • 99.9%+ availability

  Constraint: This is a government project with vendor neutrality

───────────────────────────────────────────────────────────────────────────
Interviewer: "The uniqueness guarantee is the hardest part. You need to prove that each of 1.4 billion people appears exactly once. That's never been done before at this scale."
Phase 1: Requirements Clarification
You: "Let me understand the specific challenges before designing."
Your Questions
You: "First, what biometric modalities are we capturing? And what's the expected quality given enrollment happens in remote villages?"
Interviewer: "10 fingerprints, 2 iris scans, and a photograph. Quality will vary: many manual laborers have worn fingerprints, the elderly may have faded prints, and some people lack fingers. The system must handle all cases."
You: "For uniqueness, what's the acceptable error rate? False accepts (enrolling duplicates) vs false rejects (wrongly rejecting unique people)?"
Interviewer: "This is critical. A false accept means someone gets two identities, so they can claim benefits twice. A false reject means a legitimate person can't get enrolled. Both are bad, but false accepts are worse for a welfare system."
You: "What about authentication? Is it 1:1 matching (verify this person is who they claim) or 1:N (find this person in the database)?"
Interviewer: "Authentication is always 1:1: they provide their Aadhaar number plus a biometric, and we verify it matches the stored data. Deduplication during enrollment is 1:N: we search the entire database to ensure they're not already enrolled."
You: "What infrastructure can we assume in remote areas?"
Interviewer: "Minimal. Many areas have no internet, unreliable power, extreme temperatures. Enrollment must work offline. Authentication needs connectivity but should degrade gracefully."
Requirements Summary
Functional Requirements:

1. ENROLLMENT
   • Capture demographics (name, address, DOB, gender)
   • Capture biometrics (10 fingerprints, 2 iris, 1 photo)
   • Verify supporting documents (proof of identity/address)
   • Perform deduplication (1:N match against entire database)
   • Generate unique 12-digit Aadhaar number
   • Print and mail physical Aadhaar letter

2. AUTHENTICATION
   • Demographic authentication (name/address matching)
   • Biometric authentication (fingerprint, iris, face)
   • OTP authentication (via registered mobile)
   • Multi-factor authentication (combinations)
   • Return only Yes/No (no PII in response)

3. e-KYC (Know Your Customer)
   • Return verified identity data after authentication
   • Digitally signed response
   • Replace paper-based KYC for banks, telecom, etc.

4. UPDATE
   • Demographics update (address, phone, etc.)
   • Biometrics update (for degraded prints)
   • Document-based or operator-assisted updates

5. PRIVACY FEATURES
   • Virtual ID (16-digit temporary alias for Aadhaar)
   • Masked Aadhaar (partially hidden number)
   • Authentication history (user can see who queried)
Non-Functional Requirements:

SCALE
   • 1.4 billion enrolled residents
   • 90+ million authentications/day
   • Average: ~1,000 authentications/second; peak: ~3,000/second
   • 15+ petabytes of biometric data

LATENCY
   • Authentication: < 200ms (1:1 match)
   • Deduplication: minutes (1:N against billions)

ACCURACY
   • False Positive Identification Rate (FPIR): < 0.0035%
   • False Negative Identification Rate (FNIR): < 0.035%

AVAILABILITY
   • 99.9%+ uptime
   • Geo-distributed for disaster recovery

SECURITY
   • 2048-bit PKI encryption
   • Data encrypted at rest and in transit
   • HSM for key management
   • No biometric data leaves CIDR
Phase 2: Back of the Envelope Estimation
You: "Let me work through the numbers to understand the computational challenge."
The Deduplication Challenge
THE IMPOSSIBLE MATH

To guarantee uniqueness, every new enrollment must be
compared against EVERY existing record.

For 1 billion people with 10 fingerprints each:

Fingerprint comparisons for new enrollment:
  1,000,000,000 people × 10 fingers = 10 billion templates

If we enroll 1 million new people per day:
  1,000,000 × 10 billion = 10,000,000,000,000,000 comparisons/day
                         = 10 quadrillion matches/day!

At traditional matching speed (100,000 matches/sec):
  10^16 / 10^5 = 10^11 seconds
               ≈ 3,170 years of compute for each day of enrollment!

Brute force is computationally infeasible at this scale.
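The arithmetic above is easy to reproduce and to rerun with other assumptions. The following is a minimal sketch, not part of any UIDAI tooling; the function name `brute_force_cost` and its parameters are hypothetical, and the inputs are the same round numbers used in the estimate.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_cost(enrolled, fingers, daily_enrollments, matches_per_sec):
    """Return (comparisons per day, compute-years per day) for a naive 1:N search."""
    templates = enrolled * fingers                # stored fingerprint templates
    comparisons = daily_enrollments * templates   # one probe per new person
    seconds = comparisons / matches_per_sec       # on a single matcher
    return comparisons, seconds / SECONDS_PER_YEAR


comparisons, years = brute_force_cost(
    enrolled=1_000_000_000,
    fingers=10,
    daily_enrollments=1_000_000,
    matches_per_sec=100_000,
)
print(f"{comparisons:.1e} comparisons/day, about {years:,.0f} compute-years per day")
```

Raising `matches_per_sec` by a factor of a thousand still leaves years of compute per enrollment day, which is why the design has to shrink the search space rather than only add hardware.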
The Solution: Multi-Modal + Multi-ABIS
MAKING DEDUPLICATION TRACTABLE

1. DEMOGRAPHIC PRE-FILTER
   Before biometric matching, filter by:
   • Name phonetics
   • Date of birth
   • Gender
   • Geographic region
   This reduces search space by 99%+

2. MULTI-MODAL BIOMETRICS
   Using fingerprint + iris together (illustrative figures):
   • Fingerprint alone: 1 in 10^6 uniqueness
   • Iris alone: 1 in 10^12 uniqueness
   • Combined: 1 in 10^18 uniqueness (assuming the two are independent)
   The combination allows lower thresholds per modality

3. HIERARCHICAL MATCHING
   • First: Fast, approximate match (GPU-accelerated)
   • If potential match: Detailed matching
   • If still ambiguous: Human adjudication

4. THREE ABIS VENDORS
   • Each vendor runs independent deduplication
   • Consensus required (2 of 3 agree)
   • Different algorithms catch different edge cases
Storage Estimation
BIOMETRIC DATA STORAGE

Per person:
  10 fingerprints:    ~100KB (10 × 10KB template)
  2 iris scans:       ~100KB (2 × 50KB template)
  1 photograph:       ~50KB
  Demographics:       ~2KB
  Metadata:           ~5KB
  ─────────────────────────────
  Total per person:   ~257KB

For 1.4 billion people:
  1.4B × 257KB ≈ 360 TB (templates, photo, and demographics)

Raw biometric images (archived):
  Per person: ~5MB (high-res captures)
  Total: 1.4B × 5MB = 7 PB

With replication (3x):
  3 × (7 PB + 0.36 PB) ≈ 22 PB total storage
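The storage estimate can be checked with a few lines of arithmetic. This is a sketch using the per-person sizes listed above and decimal units (1 KB = 1,000 bytes); the function name `storage_estimate` is hypothetical. With 3x replication applied to both templates and raw images, the total lands at roughly 22 PB.

```python
KB, MB, TB, PB = 10**3, 10**6, 10**12, 10**15

# Per-person record sizes in KB, taken from the estimate in the text.
PER_PERSON_KB = {
    "fingerprints": 10 * 10,
    "irises": 2 * 50,
    "photo": 50,
    "demographics": 2,
    "metadata": 5,
}


def storage_estimate(residents, raw_mb_per_person=5, replicas=3):
    """Return (template bytes, raw image bytes, replicated total bytes)."""
    record_bytes = sum(PER_PERSON_KB.values()) * KB
    templates = residents * record_bytes
    raw = residents * raw_mb_per_person * MB
    return templates, raw, (templates + raw) * replicas


templates, raw, total = storage_estimate(1_400_000_000)
print(f"records: {templates / TB:.0f} TB, raw: {raw / PB:.0f} PB, "
      f"replicated total: {total / PB:.1f} PB")
```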
Authentication Traffic
AUTHENTICATION LOAD
Daily authentications: 90,000,000
Seconds per day: 86,400
Average TPS: ~1,040 auth/second
Peak multiplier: 3x
Peak TPS: ~3,000 auth/second
Each authentication requires:
1. Decrypt request (PKI)
2. Lookup Aadhaar record
3. Biometric 1:1 match
4. Sign response
Time budget: 200ms total
Network: 50ms
Crypto: 30ms
Lookup: 20ms
Match: 100ms
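To turn the traffic figures into capacity numbers, the sketch below recomputes average and peak throughput and then applies Little's law (requests in flight = arrival rate × latency) to estimate concurrency at peak. The 3x peak multiplier and the 200ms latency come from the estimate above; the function name `auth_load` is hypothetical.

```python
SECONDS_PER_DAY = 86_400

# Latency budget per stage in milliseconds, as listed in the text.
BUDGET_MS = {"network": 50, "crypto": 30, "lookup": 20, "match": 100}


def auth_load(daily_auths, peak_multiplier=3.0, latency_s=0.2):
    """Return (average TPS, peak TPS, requests in flight at peak)."""
    avg_tps = daily_auths / SECONDS_PER_DAY
    peak_tps = avg_tps * peak_multiplier
    in_flight = peak_tps * latency_s   # Little's law: L = lambda * W
    return avg_tps, peak_tps, in_flight


avg_tps, peak_tps, in_flight = auth_load(90_000_000)
print(f"avg {avg_tps:,.0f} TPS, peak {peak_tps:,.0f} TPS, "
      f"about {in_flight:,.0f} requests in flight at peak")
print(f"budget total: {sum(BUDGET_MS.values())} ms")
```

A few hundred concurrent requests is modest for a modern service tier, which suggests the hard part of authentication is the per-request latency budget and the security boundary rather than raw throughput.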
Phase 3: High-Level Architecture
You: "Aadhaar's architecture follows four key principles: openness, linear scalability, strong security, and vendor neutrality."
The Aadhaar Ecosystem
AADHAAR ARCHITECTURE OVERVIEW
───────────────────────────────────────────────────────────────────────────

  ENROLLMENT ECOSYSTEM

  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
  │  Resident   │────▶│ Enrollment  │────▶│  Registrar  │
  │  (Village)  │     │   Agency    │     │(State Govt) │
  └─────────────┘     └──────┬──────┘     └──────┬──────┘
                             │                   │
                     Encrypted packet      Verification
                             │                   │
                             ▼                   ▼
                        ┌─────────────────────────────┐
                        │      CIDR (Central DB)      │
                        │  ┌─────────┐ ┌─────────┐    │
                        │  │ ABIS 1  │ │ ABIS 2  │    │  Deduplication
                        │  └────┬────┘ └────┬────┘    │
                        │       └─────┬─────┘         │
                        │             ▼               │
                        │        ┌─────────┐          │
                        │        │ ABIS 3  │          │  3-way consensus
                        │        └─────────┘          │
                        └─────────────────────────────┘

───────────────────────────────────────────────────────────────────────────
───────────────────────────────────────────────────────────────────────────

  AUTHENTICATION ECOSYSTEM

  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
  │  Resident   │────▶│   Service   │────▶│     AUA     │
  │  (at bank,  │     │    Point    │     │ (Auth User  │
  │  telecom)   │     │   Device    │     │   Agency)   │
  └─────────────┘     └─────────────┘     └──────┬──────┘
                                                 │
                                           Encrypted PID
                                                 │
                                                 ▼
                                          ┌─────────────┐
                                          │     ASA     │
                                          │(Auth Service│
                                          │   Agency)   │
                                          └──────┬──────┘
                                                 │
                                         Secure leased line
                                                 │
                                                 ▼
                                     ┌───────────────────────┐
                                     │         CIDR          │
                                     │    (1:1 matching)     │
                                     │                       │
                                     │   Returns: Yes/No     │
                                     │  (no PII returned)    │
                                     └───────────────────────┘

───────────────────────────────────────────────────────────────────────────
Key Components
COMPONENT BREAKDOWN

1. ENROLLMENT CLIENT
   • Runs on laptops in villages
   • Captures biometrics (certified devices only)
   • Works OFFLINE (syncs when connected)
   • Encrypts everything at capture time
   • Operator + Supervisor biometric signatures

2. REGISTRAR
   • State governments, banks, oil companies
   • Responsible for enrollment agencies
   • First-level quality checks
   • Uploads packets to CIDR

3. CIDR (Central Identities Data Repository)
   • The "crown jewels": all biometric data
   • Two data centers (Bengaluru + Manesar)
   • Active-active configuration
   • NEVER exposed to internet directly
   • Only UIDAI has access

4. ABIS (Automated Biometric Identification System)
   • Three independent vendors (TCS+Neurotechnology, etc.)
   • Each runs complete deduplication
   • Consensus-based decision
   • Vendor-neutral API integration

5. AUA (Authentication User Agency)
   • Banks, telecom, insurance companies
   • Licensed to use authentication
   • Must follow UIDAI security guidelines
   • Audited regularly

6. ASA (Authentication Service Agency)
   • Secure network intermediary
   • Connects AUAs to CIDR
   • Dedicated leased lines (not internet)
   • 27 licensed ASAs in India
Data Flow: Enrollment
ENROLLMENT FLOW

Resident visits enrollment center with documents

Step 1: CAPTURE
──────────────────────────────────────────────────────────────────────────
  Enrollment Station (Offline-capable laptop)

  1. Operator logs in with biometric
  2. Captures resident's demographics
  3. Scans proof documents (Ration card, Voter ID, etc.)
  4. Captures 10 fingerprints (slaps + thumbs)
  5. Captures 2 iris scans
  6. Captures photograph
  7. Resident reviews and confirms
  8. Operator signs packet biometrically
  9. Supervisor approves (for exceptions)

  Output: Encrypted enrollment packet (3-5 MB)
          Contains HMAC for tamper detection
──────────────────────────────────────────────────────────────────────────
        │
        ▼
Step 2: UPLOAD
──────────────────────────────────────────────────────────────────────────
  Registrar Backend

  1. Receives packets via SFTP or encrypted USB
  2. Validates packet structure and signatures
  3. Queues for CIDR upload
  4. Uploads via secure channel to CIDR
──────────────────────────────────────────────────────────────────────────
        │
        ▼
Step 3: DEDUPLICATION
──────────────────────────────────────────────────────────────────────────
  CIDR Processing

  1. Decrypt packet (only CIDR can decrypt)
  2. Extract biometric templates
  3. Demographic pre-filter (reduce search space)
  4. Send to ABIS 1, ABIS 2, ABIS 3 in parallel
  5. Each ABIS returns: UNIQUE / DUPLICATE / MANUAL_REVIEW
  6. Consensus: 2 of 3 must agree
  7. If DUPLICATE: Manual adjudication
  8. If UNIQUE: Generate Aadhaar number
  9. Store in database
  10. Queue letter for printing

  Timeline: 3-90 days (depending on duplicates)
──────────────────────────────────────────────────────────────────────────
        │
        ▼
Step 4: DELIVERY
──────────────────────────────────────────────────────────────────────────
  Aadhaar Letter printed and mailed to resident's address
  Contains: 12-digit Aadhaar number + QR code
──────────────────────────────────────────────────────────────────────────
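The capture step says each packet carries an HMAC for tamper detection. The sketch below shows only that integrity check, using Python's standard `hmac` module; it is not the actual packet format. The function names `seal_packet` and `verify_packet`, the JSON layout, and the way the key is provisioned are all assumptions made for illustration, and the encryption of the packet is left out.

```python
import hashlib
import hmac
import json


def seal_packet(payload: dict, key: bytes) -> dict:
    """Serialize the payload deterministically and attach an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "hmac": tag}


def verify_packet(packet: dict, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, packet["body"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, packet["hmac"])


key = b"station-provisioned-secret"   # hypothetical key, for illustration only
sealed = seal_packet({"name": "Asha", "station_id": "ST-001"}, key)

tampered = dict(sealed, body=sealed["body"].replace("Asha", "Usha"))
print(verify_packet(sealed, key), verify_packet(tampered, key))
```

An HMAC only proves the packet was not altered by someone without the key. It says nothing about who captured it, which is why the flow also requires operator and supervisor biometric sign-off.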
Phase 4: Deep Dives
Deep Dive 1: Biometric Deduplication at Billion Scale
Week 1 concepts: Partitioning, sharding. Week 3 concepts: Async processing.
You: "Deduplication is the hardest problem in Aadhaar. You must compare each new person against 1.4 billion existing records."
THE DEDUPLICATION CHALLENGE
───────────────────────────────────────────────────────────────────────────

  Without optimization:

  New enrollment:        1 person
  Existing database:     1,400,000,000 people
  Fingers per person:    10
  Iris per person:       2

  Fingerprint comparisons (every new finger against every stored finger):
    10 (new) × 10 (existing) × 1.4B = 140 billion comparisons

  At 1 million matches/second: 140,000 seconds ≈ 1.6 days per enrollment

  For 1 million enrollments/day: about 4,400 years of compute per day

  This is impossible. We need smarter approaches.

───────────────────────────────────────────────────────────────────────────
How Aadhaar Makes It Tractable:
DEDUPLICATION OPTIMIZATION STRATEGIES
───────────────────────────────────────────────────────────────────────────

  LAYER 1: DEMOGRAPHIC BLOCKING
  ─────────────────────────────

  Before biometric matching, partition by:
    • Gender (2 partitions)
    • Age range (10-year buckets = 10 partitions)
    • State (36 partitions)
    • Name phonetic hash (100 partitions)

  Effective reduction: 2 × 10 × 36 × 100 = 72,000x smaller search

  1.4B / 72,000 ≈ 19,444 candidate matches per enrollment,
  assuming evenly sized partitions
  (vs 1.4 billion without blocking)

  ─────────────────────────────────────────────────────────────────────

  LAYER 2: MULTI-MODAL FUSION
  ───────────────────────────

  Fingerprint (10 fingers) + Iris (2 eyes) combined:

    Fingerprint score (0-100) × weight +
    Iris score (0-100) × weight =
    Final fusion score

  Using both modalities:
    • Handles worn fingerprints (use iris)
    • Handles cataracts (use fingerprint)
    • Much higher accuracy than either alone

  ─────────────────────────────────────────────────────────────────────

  LAYER 3: THREE-WAY ABIS CONSENSUS
  ─────────────────────────────────

  Three independent vendors run deduplication:

    ┌───────────┐  ┌───────────┐  ┌───────────┐
    │  ABIS 1   │  │  ABIS 2   │  │  ABIS 3   │
    │(Vendor A) │  │(Vendor B) │  │(Vendor C) │
    └─────┬─────┘  └─────┬─────┘  └─────┬─────┘
          │              │              │
          └──────────────┼──────────────┘
                         ▼
                   ┌───────────┐
                   │ Consensus │
                   │  Engine   │
                   └───────────┘

  Decision rules:
    • 3/3 UNIQUE       → Accept enrollment
    • 3/3 DUPLICATE    → Reject enrollment
    • 2/3 agree        → Follow majority
    • Mixed/uncertain  → Manual adjudication

  Why three vendors?
    • Different algorithms catch different edge cases
    • No single vendor lock-in
    • Higher accuracy through consensus
    • Continuous quality competition

───────────────────────────────────────────────────────────────────────────
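Layers 1 and 2 can be made concrete with a small sketch. It recomputes the blocking factor, then shows one way to fuse fingerprint and iris scores when a modality is missing, for example worn fingerprints. The equal weights and the names `fuse_scores` and `classify` are assumptions for illustration; the 80 and 30 thresholds are the ones used by the orchestrator code in this section, and real ABIS vendors tune their own fusion.

```python
from typing import Optional

# Layer 1: blocking factor from the partition counts in the text.
PARTITIONS = {"gender": 2, "age_bucket": 10, "state": 36, "name_hash": 100}
blocking_factor = 1
for count in PARTITIONS.values():
    blocking_factor *= count
candidates = 1_400_000_000 // blocking_factor   # assumes evenly sized partitions


def fuse_scores(fingerprint: Optional[float], iris: Optional[float],
                w_finger: float = 0.5, w_iris: float = 0.5) -> float:
    """Weighted average of available scores; weights renormalize if one is missing."""
    parts = [(s, w) for s, w in ((fingerprint, w_finger), (iris, w_iris))
             if s is not None]
    if not parts:
        raise ValueError("at least one modality score is required")
    total_weight = sum(w for _, w in parts)
    return sum(s * w for s, w in parts) / total_weight


def classify(score: float, duplicate_threshold: float = 80,
             unique_threshold: float = 30) -> str:
    """Map a fusion score to a decision, leaving the middle band to humans."""
    if score > duplicate_threshold:
        return "DUPLICATE"
    if score < unique_threshold:
        return "UNIQUE"
    return "MANUAL_REVIEW"


print(blocking_factor, candidates)
print(classify(fuse_scores(90, 95)), classify(fuse_scores(None, 20)),
      classify(fuse_scores(60, 40)))
```

The band between the two thresholds is a deliberate cost: widening it sends more enrollments to human adjudicators, narrowing it raises the automated error rate.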
# deduplication/abis_orchestrator.py
"""
ABIS (Automated Biometric Identification System) Orchestration

Aadhaar uses three independent ABIS vendors for deduplication.
This provides redundancy, accuracy, and vendor neutrality.
"""
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class DeduplicationResult(Enum):
    UNIQUE = "unique"          # No duplicates found
    DUPLICATE = "duplicate"    # Duplicate found
    MANUAL_REVIEW = "review"   # Uncertain, needs human review


@dataclass
class ABISMatch:
    candidate_aadhaar: str
    fingerprint_score: float   # 0-100
    iris_score: float          # 0-100
    fusion_score: float        # Combined score
    confidence: str            # HIGH, MEDIUM, LOW


@dataclass
class ABISResponse:
    vendor_id: str
    result: DeduplicationResult
    matches: List[ABISMatch]
    processing_time_ms: int
class ABISOrchestrator:
    """
    Orchestrates deduplication across three ABIS vendors.

    Aadhaar's multi-ABIS approach:
    1. Send enrollment to all three vendors in parallel
    2. Each vendor searches against their copy of the database
    3. Consensus determines final result
    """

    def __init__(
        self,
        abis_clients: List,      # Three ABIS vendor clients
        demographic_filter,
        manual_review_queue
    ):
        self.abis_clients = abis_clients
        self.demo_filter = demographic_filter
        self.review_queue = manual_review_queue
        # Thresholds for decision
        self.duplicate_threshold = 80   # Fusion score > 80 = duplicate
        self.unique_threshold = 30      # Fusion score < 30 = unique

    async def deduplicate(
        self,
        enrollment_packet: dict
    ) -> Tuple[DeduplicationResult, Optional[str]]:
        """
        Main deduplication flow.

        Returns:
            (result, duplicate_aadhaar if found)
        """
        # Step 1: Demographic blocking to reduce search space
        candidate_pool = await self.demo_filter.get_candidates(
            gender=enrollment_packet['gender'],
            dob=enrollment_packet['dob'],
            state=enrollment_packet['state'],
            name_phonetic=enrollment_packet['name_phonetic']
        )

        # Log the reduction achieved (guard against an empty candidate pool)
        reduction_ratio = 1_400_000_000 / max(len(candidate_pool), 1)
        print(f"Demographic blocking: {len(candidate_pool):,} candidates")
        print(f"Search space reduced by {reduction_ratio:,.0f}x")

        # Step 2: Send to all three ABIS in parallel
        abis_tasks = [
            client.search(
                fingerprints=enrollment_packet['fingerprints'],
                irises=enrollment_packet['irises'],
                candidate_pool=candidate_pool
            )
            for client in self.abis_clients
        ]
        responses: List[ABISResponse] = await asyncio.gather(*abis_tasks)

        # Step 3: Consensus decision
        return self._consensus_decision(responses)

    def _consensus_decision(
        self,
        responses: List[ABISResponse]
    ) -> Tuple[DeduplicationResult, Optional[str]]:
        """
        Three-way consensus logic.

        Aadhaar requires 2/3 agreement for automated decision.
        Mixed results go to manual adjudication.
        """
        unique_count = sum(
            1 for r in responses if r.result == DeduplicationResult.UNIQUE
        )
        duplicate_count = sum(
            1 for r in responses if r.result == DeduplicationResult.DUPLICATE
        )

        # Case 1: All three agree UNIQUE
        if unique_count == 3:
            return (DeduplicationResult.UNIQUE, None)

        # Case 2: All three agree DUPLICATE
        if duplicate_count == 3:
            # Find the matching Aadhaar (should be same across all)
            duplicate_aadhaar = self._find_common_match(responses)
            return (DeduplicationResult.DUPLICATE, duplicate_aadhaar)

        # Case 3: 2/3 agree UNIQUE
        if unique_count >= 2:
            return (DeduplicationResult.UNIQUE, None)

        # Case 4: 2/3 agree DUPLICATE
        if duplicate_count >= 2:
            duplicate_aadhaar = self._find_common_match(responses)
            return (DeduplicationResult.DUPLICATE, duplicate_aadhaar)

        # Case 5: No consensus, so this goes to manual review.
        # The caller is expected to queue it for a human adjudicator.
        return (DeduplicationResult.MANUAL_REVIEW, None)

    def _find_common_match(
        self,
        responses: List[ABISResponse]
    ) -> Optional[str]:
        """
        Find the Aadhaar number that multiple ABIS agree is a duplicate.
        """
        # Count how many vendors report each Aadhaar as their top match
        match_counts = {}
        for response in responses:
            if response.matches:
                top_match = response.matches[0]
                aadhaar = top_match.candidate_aadhaar
                match_counts[aadhaar] = match_counts.get(aadhaar, 0) + 1

        # Return the one with most agreement
        if match_counts:
            return max(match_counts, key=match_counts.get)
        return None
class DemographicBlockingFilter:
    """
    Pre-filters the biometric search space using demographics.

    This is crucial for making billion-scale deduplication tractable.
    """

    def __init__(self, database):
        self.db = database

    async def get_candidates(
        self,
        gender: str,
        dob: str,
        state: str,
        name_phonetic: str
    ) -> List[str]:
        """
        Get candidate Aadhaar numbers that match demographic criteria.

        Reduces 1.4 billion to ~20,000-50,000 candidates.
        """
        # Parse DOB (expects the year in the first four characters)
        # and search a window of five years either side
        birth_year = int(dob[:4])
        age_range_start = birth_year - 5
        age_range_end = birth_year + 5

        # Query with demographic filters
        candidates = await self.db.query("""
            SELECT aadhaar_number
            FROM residents
            WHERE gender = ?
              AND birth_year BETWEEN ? AND ?
              AND state_code = ?
              AND name_phonetic_hash = ?
        """, [gender, age_range_start, age_range_end,
              state, self._phonetic_hash(name_phonetic)])
        return [row['aadhaar_number'] for row in candidates]

    def _phonetic_hash(self, name: str) -> str:
        """
        Generate phonetic hash for name matching.

        Handles spelling variations:
            "Rahul" and "Rahool" -> same hash
            "Priya" and "Priyaa" -> same hash
        """
        # Simplified phonetic algorithm (actual uses Soundex variant)
        # Remove vowels, keep the first four remaining characters
        consonants = ''.join(c for c in name.upper() if c not in 'AEIOU')
        return consonants[:4].ljust(4, '0')
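It is worth running the simplified phonetic hash on a few names to see where it helps and where it fails. The standalone function below copies the logic of `_phonetic_hash` above. The example names are illustrative only.

```python
def phonetic_hash(name: str) -> str:
    """Same simplified logic as _phonetic_hash: drop vowels, keep 4 characters."""
    consonants = ''.join(c for c in name.upper() if c not in 'AEIOU')
    return consonants[:4].ljust(4, '0')


# Spelling variants that collapse to one block, as intended.
print(phonetic_hash("Rahul"), phonetic_hash("Rahool"))
print(phonetic_hash("Priya"), phonetic_hash("Priyaa"))

# Unrelated names that also share a block: harmless, just a larger pool.
print(phonetic_hash("Ravi"), phonetic_hash("Arav"))

# Variants that land in different blocks: a blocking miss.
print(phonetic_hash("Lakshmi"), phonetic_hash("Laxmi"))
```

The last case is the dangerous one. If a duplicate falls into a different block, whether through a spelling variant or a misreported state or date of birth, it is never compared biometrically at all. A design along these lines would probably want several overlapping blocking keys, or a periodic wider search, rather than trusting a single demographic filter.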
Deep Dive 2: Authentication in 200 Milliseconds
Week 2 concepts: Timeouts, latency budgets. Week 4 concepts: Caching.
You: "Authentication must verify identity in under 200ms. This is 1:1 matching, which is much simpler than deduplication but still demanding at scale."
AUTHENTICATION LATENCY BUDGET
───────────────────────────────────────────────────────────────────────────

  Total Budget: 200ms

  BREAKDOWN

  Network (AUA → ASA → CIDR → ASA → AUA):    50ms
  ├── AUA to ASA:                            15ms
  ├── ASA to CIDR:                           10ms
  ├── CIDR to ASA:                           10ms
  └── ASA to AUA:                            15ms

  Cryptographic operations:                  30ms
  ├── Decrypt PID block (at CIDR):           10ms
  ├── Signature verification:                 5ms
  └── Sign response:                         15ms

  Database lookup:                           20ms
  ├── Find Aadhaar record:                   10ms
  └── Load biometric template:               10ms

  Biometric 1:1 matching:                    80ms
  ├── Fingerprint match:                     40ms
  └── Iris match (if used):                  40ms

  Processing overhead:                       20ms

  Total:                                    200ms

───────────────────────────────────────────────────────────────────────────
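A latency budget only matters if something enforces it. One way to do that, sketched below, is to wrap each stage in `asyncio.wait_for` with its share of the budget so that a slow stage fails fast instead of consuming the whole request. The stage names and budgets follow the breakdown above; `StageTimeout`, `with_budget`, and the simulated stages are hypothetical stand-ins for real work.

```python
import asyncio

# Per-stage budgets in milliseconds, from the breakdown in the text.
STAGE_BUDGET_MS = {"decrypt": 30, "lookup": 20, "match": 80}


class StageTimeout(Exception):
    """Raised when a stage overruns its share of the latency budget."""

    def __init__(self, stage: str, budget_ms: int):
        super().__init__(f"{stage} exceeded its {budget_ms} ms budget")
        self.stage = stage


async def with_budget(stage: str, awaitable):
    """Run one stage, cancelling it if it overruns its budget."""
    budget_ms = STAGE_BUDGET_MS[stage]
    try:
        return await asyncio.wait_for(awaitable, timeout=budget_ms / 1000)
    except asyncio.TimeoutError:
        raise StageTimeout(stage, budget_ms) from None


async def simulated_stage(delay_ms: float, result):
    """Stand-in for real work such as an HSM call or a database read."""
    await asyncio.sleep(delay_ms / 1000)
    return result


async def authenticate(lookup_delay_ms: float = 0) -> str:
    await with_budget("decrypt", simulated_stage(0, b"pid"))
    await with_budget("lookup", simulated_stage(lookup_delay_ms, {"template": "..."}))
    matched = await with_budget("match", simulated_stage(0, True))
    return "y" if matched else "n"


print(asyncio.run(authenticate()))
try:
    asyncio.run(authenticate(lookup_delay_ms=500))
except StageTimeout as exc:
    print("failed fast:", exc)
```

Fixed per-stage timeouts are the simplest option. A common refinement is to pass a single deadline through the call chain so that time saved in an early stage can be spent in a later one.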
# authentication/auth_service.py
"""
Aadhaar Authentication Service

Processes 90+ million authentications per day with <200ms latency.
Returns only Yes/No and never exposes biometric or demographic data.
"""
import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class AuthMode(Enum):
    DEMOGRAPHIC = "demo"    # Name/address matching
    OTP = "otp"             # One-time password
    FINGERPRINT = "fmr"     # Fingerprint biometric
    IRIS = "iir"            # Iris biometric
    FACE = "fid"            # Face biometric
    MULTI_FACTOR = "mf"     # Combination


class ValidationError(Exception):
    """Raised when an incoming request is malformed."""


@dataclass
class MatchResult:
    # Result type used by the matchers below (shape assumed from its usage)
    success: bool
    error_code: Optional[str] = None


@dataclass
class AuthRequest:
    aadhaar_number: str     # Or VID (Virtual ID)
    auth_mode: AuthMode
    pid_block: bytes        # Encrypted biometric/OTP
    aua_code: str           # Authentication User Agency
    timestamp: datetime
    transaction_id: str
    consent: bool           # Must be True


@dataclass
class AuthResponse:
    transaction_id: str
    status: str                    # "y" (yes) or "n" (no)
    error_code: Optional[str]      # If status is "n"
    auth_token: Optional[str]      # Unique token for this auth
    timestamp: datetime
    # NOTE: No PII is ever returned!
class AadhaarAuthService:
    """
    Core authentication service running in CIDR.

    Security principles:
    1. All requests encrypted with 2048-bit PKI
    2. Biometric data decrypted only inside CIDR
    3. Response is only Yes/No (no data leakage)
    4. All transactions logged for audit

    Helper methods such as _validate_request, _resolve_aadhaar,
    _generate_token, and _error_response are not shown here.
    """

    def __init__(
        self,
        biometric_db,
        biometric_matcher,
        otp_service,
        hsm_client,      # Hardware Security Module
        audit_logger
    ):
        self.db = biometric_db
        self.matcher = biometric_matcher
        self.otp = otp_service
        self.hsm = hsm_client
        self.audit = audit_logger
        # Performance tuning
        # Plain dict with a TTL; a production cache would also bound its size (e.g. LRU)
        self.template_cache = {}
        self.cache_ttl = 300     # 5 minutes

    async def authenticate(
        self,
        request: AuthRequest
    ) -> AuthResponse:
        """
        Main authentication flow.

        Must complete in <200ms.
        """
        start_time = datetime.utcnow()
        try:
            # Step 1: Validate request format
            self._validate_request(request)

            # Step 2: Decrypt PID block using HSM
            # Only CIDR's HSM can decrypt
            pid_data = await self.hsm.decrypt_pid(request.pid_block)

            # Step 3: Resolve Aadhaar number (handle VID)
            aadhaar = await self._resolve_aadhaar(request.aadhaar_number)

            # Step 4: Load resident's template (with caching)
            resident_template = await self._load_template(aadhaar)

            # Step 5: Perform matching based on auth mode
            match_result = await self._perform_match(
                mode=request.auth_mode,
                pid_data=pid_data,
                stored_template=resident_template
            )

            # Step 6: Generate response
            response = AuthResponse(
                transaction_id=request.transaction_id,
                status="y" if match_result.success else "n",
                error_code=match_result.error_code,
                auth_token=self._generate_token(aadhaar, request.aua_code),
                timestamp=datetime.utcnow()
            )

            # Step 7: Audit logging (fire-and-forget, so it stays off the latency path)
            asyncio.create_task(self.audit.log(
                aadhaar=aadhaar,
                aua_code=request.aua_code,
                auth_mode=request.auth_mode,
                result=response.status,
                latency_ms=(datetime.utcnow() - start_time).total_seconds() * 1000
            ))
            return response
        except ValidationError as e:
            return self._error_response(request, str(e))
        except Exception:
            # Never expose internal errors
            return self._error_response(request, "INTERNAL_ERROR")
    async def _load_template(self, aadhaar: str) -> dict:
        """
        Load biometric template with caching.

        Hot Aadhaars (frequently authenticated) are cached.
        """
        # Check cache first
        if aadhaar in self.template_cache:
            cached = self.template_cache[aadhaar]
            if cached['expires'] > datetime.utcnow():
                return cached['template']

        # Cache miss (or expired entry): load from database
        template = await self.db.get_template(aadhaar)

        # Cache for hot Aadhaars
        self.template_cache[aadhaar] = {
            'template': template,
            'expires': datetime.utcnow() + timedelta(seconds=self.cache_ttl)
        }
        return template

    async def _perform_match(
        self,
        mode: AuthMode,
        pid_data: dict,
        stored_template: dict
    ) -> MatchResult:
        """
        Perform biometric/demographic/OTP matching.
        """
        if mode == AuthMode.FINGERPRINT:
            # 1:1 fingerprint matching
            return await self.matcher.match_fingerprint(
                captured=pid_data['fingerprint'],
                stored=stored_template['fingerprints'],
                finger_position=pid_data.get('position', 'ANY')
            )
        elif mode == AuthMode.IRIS:
            # 1:1 iris matching
            return await self.matcher.match_iris(
                captured=pid_data['iris'],
                stored=stored_template['irises']
            )
        elif mode == AuthMode.FACE:
            # AI-powered face matching
            return await self.matcher.match_face(
                captured=pid_data['face_image'],
                stored=stored_template['photo'],
                liveness_check=pid_data.get('liveness_data')
            )
        elif mode == AuthMode.OTP:
            # Verify OTP sent to registered mobile
            return await self.otp.verify(
                aadhaar=stored_template['aadhaar'],
                submitted_otp=pid_data['otp']
            )
        elif mode == AuthMode.DEMOGRAPHIC:
            # Fuzzy matching on name/address
            return self._demographic_match(
                submitted=pid_data['demographics'],
                stored=stored_template['demographics']
            )
        # MULTI_FACTOR (or any unknown mode) is not handled in this excerpt.
        # Fail closed rather than returning None; the error code is a placeholder.
        return MatchResult(success=False, error_code="UNSUPPORTED_AUTH_MODE")

    def _demographic_match(self, submitted: dict, stored: dict) -> MatchResult:
        """
        Fuzzy matching for name and address.

        Handles variations like:
        - "Raj Kumar" vs "Rajkumar"
        - "Bangalore" vs "Bengaluru"
        - Hindi transliterations
        """
        name_score = self._fuzzy_name_match(
            submitted.get('name', ''),
            stored['name']
        )
        # Computed for completeness; this simplified version decides on name only
        address_score = self._fuzzy_address_match(
            submitted.get('address', ''),
            stored['address']
        )

        # Threshold is supplied with the request (default 100 = exact match)
        if name_score >= submitted.get('match_threshold', 100):
            return MatchResult(success=True)
        return MatchResult(success=False, error_code="DEMO_MISMATCH")
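The service above describes its template cache as LRU, but a plain dict never evicts anything. A bounded cache with both a TTL and least-recently-used eviction might look like the sketch below, built on `collections.OrderedDict`. The class name `TTLLRUCache` and its default sizes are assumptions, and whether decrypted templates should be cached in memory at all is a security decision this sketch does not settle.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Bounded cache: entries expire after a TTL and the oldest-used is evicted."""

    def __init__(self, max_entries=100_000, ttl_seconds=300, clock=time.monotonic):
        self._data = OrderedDict()     # key -> (value, expiry time)
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so tests can control time

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires <= self._clock():
            del self._data[key]        # expired: drop it and report a miss
            return None
        self._data.move_to_end(key)    # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)   # evict least recently used


now = [0.0]
cache = TTLLRUCache(max_entries=2, ttl_seconds=300, clock=lambda: now[0])
cache.put("a", "A")
cache.put("b", "B")
cache.get("a")          # touch "a" so that "b" becomes the eviction candidate
cache.put("c", "C")     # exceeds max_entries, evicts "b"
print(cache.get("a"), cache.get("b"), cache.get("c"))
```

Using a monotonic clock instead of wall-clock time keeps expiry correct if the system clock is adjusted.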
Deep Dive 3: Security Architecture: Protecting a Billion Biometrics
Week 9 concepts: Security, encryption, zero-trust.
You: "Aadhaar stores the biometrics of 1.4 billion people. This is the ultimate honeypot. Security is existential."
AADHAAR SECURITY LAYERS
───────────────────────────────────────────────────────────────────────────

  LAYER 1: ENCRYPTION EVERYWHERE
  ──────────────────────────────

  At Capture:
    • Biometrics encrypted on enrollment device
    • 2048-bit PKI encryption
    • Only CIDR has private key to decrypt

  In Transit:
    • All communication over encrypted channels
    • TLS 1.2+ mandatory
    • Dedicated leased lines (not public internet)

  At Rest:
    • AES-256 encryption for stored data
    • Even within CIDR, data is encrypted
    • Keys stored in HSM (Hardware Security Module)

  ─────────────────────────────────────────────────────────────────────

  LAYER 2: TAMPER DETECTION
  ─────────────────────────

  Every enrollment packet includes:
    • HMAC for integrity verification
    • Operator's biometric signature
    • Supervisor's biometric signature (for exceptions)
    • GPS coordinates of enrollment station
    • Timestamp
    • Device ID

  Any tampering is detectable and traceable

  ─────────────────────────────────────────────────────────────────────

  LAYER 3: ACCESS CONTROL
  ───────────────────────

  CIDR access:
    • Only UIDAI employees (very few)
    • Multi-factor authentication required
    • All access logged and audited
    • No external access to raw biometrics

  Partner access (AUA/ASA):
    • Can only call authentication API
    • Cannot query or download data
    • Rate limited per entity
    • Licensed and audited

  ─────────────────────────────────────────────────────────────────────

  LAYER 4: RESPONSE MINIMIZATION
  ──────────────────────────────

  Authentication returns ONLY:
    • Yes (match) or No (no match)
    • Transaction ID
    • Timestamp

  NEVER returns:
    • Biometric data
    • Demographic data (unless e-KYC with consent)
    • Match scores
    • Reason for failure (in detail)

  This prevents information leakage

───────────────────────────────────────────────────────────────────────────
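Layer 3 says partner access is rate limited per entity. A token bucket per AUA code is one common way to implement that, sketched below. The class names, the rate, and the burst size are all hypothetical, and a real deployment would keep this state in a shared store rather than in one process.

```python
import time


class TokenBucket:
    """Allows `burst` requests at once, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerEntityLimiter:
    """One independent bucket per AUA code."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self._rate, self._burst, self._clock = rate_per_sec, burst, clock
        self._buckets = {}

    def allow(self, aua_code: str) -> bool:
        bucket = self._buckets.get(aua_code)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._burst, self._clock)
            self._buckets[aua_code] = bucket
        return bucket.allow()


now = [0.0]
limiter = PerEntityLimiter(rate_per_sec=1, burst=2, clock=lambda: now[0])
first_three = [limiter.allow("BANK_A") for _ in range(3)]
other_entity = limiter.allow("TELCO_B")
now[0] += 1.0                      # one second later, one token has refilled
after_refill = [limiter.allow("BANK_A") for _ in range(2)]
print(first_three, other_entity, after_refill)
```

Keeping buckets separate per entity means one misbehaving or compromised partner cannot exhaust capacity for the others, and an unusual request pattern from a single AUA becomes visible.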
Privacy Features Added Over Time:
PRIVACY ENHANCEMENTS

1. VIRTUAL ID (VID): Introduced 2018
────────────────────────────────────
Problem: Every authentication exposes the Aadhaar number
Solution: 16-digit temporary ID that maps to the Aadhaar number
  Resident generates VID → uses VID instead of Aadhaar
  Each VID is:
    • Revocable (generate a new one anytime)
    • Mappable only by CIDR
    • Usable for authentication
  AUA never sees the actual Aadhaar number

2. TOKENIZATION: For recurring services
───────────────────────────────────────
Problem: Same Aadhaar used at multiple services
         Services could collude to track the user
Solution: Each AUA gets a unique token for each Aadhaar
  Aadhaar 1234-5678-9012 →
    Bank A:      Token ABC123
    Telecom B:   Token XYZ789
    Insurance C: Token PQR456
  Services cannot correlate tokens

3. MASKED AADHAAR
─────────────────
Display: XXXX-XXXX-9012
Only the last 4 digits are visible
Used for documents that need to show an Aadhaar reference

4. AUTHENTICATION HISTORY
─────────────────────────
Resident can see:
  • Who authenticated their Aadhaar
  • When
  • What type of authentication
Provides transparency and helps detect misuse
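Tokenization and masking can be sketched in a few lines. The keyed-hash construction below is an assumption chosen for illustration; the text above does not say how UIDAI actually derives tokens. `aua_token`, `mask_aadhaar`, and the `secret` value are hypothetical.

```python
import hashlib
import hmac


def aua_token(aadhaar: str, aua_code: str, secret: bytes) -> str:
    """Derive a per-AUA token; `secret` is assumed to stay inside the CIDR."""
    digits = aadhaar.replace("-", "")
    message = f"{aua_code}:{digits}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def mask_aadhaar(aadhaar: str) -> str:
    """Show only the last four digits, as in XXXX-XXXX-9012."""
    digits = aadhaar.replace("-", "")
    if len(digits) != 12 or not digits.isdigit():
        raise ValueError("expected a 12-digit Aadhaar number")
    return f"XXXX-XXXX-{digits[-4:]}"


secret = b"demo-secret-not-for-production"
bank = aua_token("1234-5678-9012", "BANK_A", secret)
telecom = aua_token("1234-5678-9012", "TELECOM_B", secret)
print(bank != telecom)                   # each service sees a different token
print(mask_aadhaar("1234-5678-9012"))    # XXXX-XXXX-9012
```

Mixing the AUA code into a keyed hash means the same resident yields unrelated tokens at different services, and without the secret two services cannot check whether their tokens refer to the same person.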
# security/encryption_service.py
"""
Aadhaar Encryption Service

All biometric data is encrypted at the point of capture.
Only CIDR can decrypt using HSM-protected private keys.
"""
import hashlib
import hmac
import os
from datetime import datetime
from typing import Tuple

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


class PIDBlockEncryption:
    """
    PID (Personal Identity Data) Block encryption.

    The PID block contains biometric/OTP data captured at
    the authentication point. It's encrypted before transmission.

    Encryption scheme:
    1. Generate random session key (AES-256)
    2. Encrypt biometric data with session key
    3. Encrypt session key with UIDAI's public key (RSA-2048)
    4. Add HMAC for integrity
    """

    def __init__(self, uidai_public_key: bytes):
        # uidai_public_key is the PEM-encoded RSA public key
        self.uidai_public_key = serialization.load_pem_public_key(
            uidai_public_key
        )

    def encrypt_pid(
        self,
        biometric_data: bytes,
        timestamp: str,
        device_id: str
    ) -> Tuple[bytes, bytes]:
        """
        Encrypt PID block for transmission to CIDR.

        Returns:
            (encrypted_data, encrypted_session_key), where encrypted_data
            is ciphertext + 16-byte GCM tag + 32-byte HMAC
        """
        # Generate random session key and nonce
        session_key = os.urandom(32)  # 256 bits
        iv = os.urandom(12)  # 96 bits for GCM

        # Create PID plaintext with metadata
        # (_create_pid_xml is not shown in this excerpt)
        pid_plaintext = self._create_pid_xml(
            biometric_data=biometric_data,
            timestamp=timestamp,
            device_id=device_id
        )

        # Encrypt with AES-256-GCM
        cipher = Cipher(algorithms.AES(session_key), modes.GCM(iv))
        encryptor = cipher.encryptor()
        ciphertext = encryptor.update(pid_plaintext) + encryptor.finalize()

        # Encrypt session key with UIDAI's RSA public key
        encrypted_session_key = self.uidai_public_key.encrypt(
            session_key + iv,  # Include IV
            padding.OAEP(
                mgf=padding.MGF1(algorithm=hashes.SHA256()),
                algorithm=hashes.SHA256(),
                label=None
            )
        )

        # Add HMAC for integrity. The GCM tag already authenticates the
        # ciphertext; the HMAC is kept to mirror step 4 of the scheme above.
        hmac_value = hmac.new(
            session_key,
            ciphertext,
            hashlib.sha256
        ).digest()

        return (ciphertext + encryptor.tag + hmac_value,
                encrypted_session_key)


class HSMKeyManager:
    """
    Hardware Security Module interface.

    All cryptographic keys are stored and used within HSM.
    Keys never leave the HSM in plaintext.
    UIDAI uses FIPS 140-2 Level 3 certified HSMs.
    """

    def __init__(self, hsm_connection):
        self.hsm = hsm_connection

    async def decrypt_pid(self, encrypted_pid: bytes,
                          encrypted_key: bytes) -> dict:
        """
        Decrypt PID block inside HSM.

        The private key never leaves the HSM.
        Decryption happens entirely within the HSM.
        """
        # Send to HSM for decryption
        decrypted = await self.hsm.decrypt(
            data=encrypted_pid,
            encrypted_key=encrypted_key,
            key_id="UIDAI_AUTH_PRIVATE_KEY",
            algorithm="RSA-OAEP-256"
        )
        # (_parse_pid_xml is not shown in this excerpt)
        return self._parse_pid_xml(decrypted)

    async def sign_response(self, response_data: bytes) -> bytes:
        """
        Sign authentication response using HSM.

        All responses are digitally signed so AUAs can
        verify they came from authentic CIDR.
        """
        signature = await self.hsm.sign(
            data=response_data,
            key_id="UIDAI_SIGNING_KEY",
            algorithm="RSA-SHA256"
        )
        return signature


class AadhaarDataVault:
    """
    Secure storage for Aadhaar numbers at AUA/KUA.

    UIDAI mandates that any entity storing Aadhaar numbers
    must use an "Aadhaar Data Vault" with:
    - AES-256 encryption
    - Keys in HSM
    - Access logging
    - No plaintext storage
    """

    def __init__(self, hsm_client, database):
        self.hsm = hsm_client
        self.db = database

    async def store_aadhaar(
        self,
        reference_id: str,  # Your internal customer ID
        aadhaar_number: str
    ):
        """
        Store Aadhaar number securely.
        """
        # Encrypt inside the HSM; the vault key never leaves it
        encrypted_aadhaar = await self.hsm.encrypt(
            data=aadhaar_number.encode(),
            key_id="AADHAAR_VAULT_KEY"
        )

        # Store only encrypted value
        await self.db.insert({
            'reference_id': reference_id,
            'encrypted_aadhaar': encrypted_aadhaar,
            'created_at': datetime.utcnow()
        })

    async def get_aadhaar(self, reference_id: str) -> str:
        """
        Retrieve and decrypt Aadhaar number.
        """
        record = await self.db.get(reference_id)

        # Decrypt in HSM
        aadhaar = await self.hsm.decrypt(
            data=record['encrypted_aadhaar'],
            key_id="AADHAAR_VAULT_KEY"
        )

        # Log access (_log_access is not shown in this excerpt)
        await self._log_access(reference_id)
        return aadhaar.decode()
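On the receiving side, the integrity tag appended by `encrypt_pid` would be checked before any decryption is attempted. The sketch below assumes the layout returned above (ciphertext, then a 16-byte GCM tag, then a 32-byte HMAC-SHA256) and uses only the standard library; `split_and_verify` is a hypothetical helper, and in the design described here this check would run inside the HSM boundary.

```python
import hashlib
import hmac
from typing import Tuple

GCM_TAG_LEN = 16    # bytes; default AES-GCM tag size
HMAC_LEN = 32       # bytes; SHA-256 digest size


def split_and_verify(blob: bytes, session_key: bytes) -> Tuple[bytes, bytes]:
    """Return (ciphertext, gcm_tag) if the trailing HMAC is valid."""
    trailer = GCM_TAG_LEN + HMAC_LEN
    if len(blob) < trailer:
        raise ValueError("blob too short")
    ciphertext = blob[:-trailer]
    gcm_tag = blob[-trailer:-HMAC_LEN]
    received_mac = blob[-HMAC_LEN:]
    expected_mac = hmac.new(session_key, ciphertext, hashlib.sha256).digest()
    # compare_digest runs in constant time, avoiding a timing side channel.
    if not hmac.compare_digest(received_mac, expected_mac):
        raise ValueError("HMAC mismatch: payload was modified in transit")
    return ciphertext, gcm_tag


key = b"k" * 32
ct, tag = b"ciphertext-bytes", b"t" * GCM_TAG_LEN
blob = ct + tag + hmac.new(key, ct, hashlib.sha256).digest()
print(split_and_verify(blob, key) == (ct, tag))   # True
```

Rejecting a bad MAC before decrypting keeps malformed or tampered input away from the more expensive and more sensitive decryption path.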
Deep Dive 4: Face Authentication - AI at Scale
Week 10 concepts: Operational excellence, innovation.
You: "Face authentication was introduced in 2021 and has grown rapidly: as the trajectory below shows, it reached 12 crore (120 million) transactions per month by January 2025. It's AI-powered and developed in-house by UIDAI."
FACE AUTHENTICATION EVOLUTION
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  WHY FACE AUTHENTICATION?                                               │
│  ────────────────────────                                               │
│                                                                         │
│  Problems with fingerprint:                                             │
│  • Manual laborers: Worn fingerprints                                   │
│  • Elderly: Faded fingerprints                                          │
│  • Amputees: Missing fingers                                            │
│  • Skin conditions: Temporary issues                                    │
│  • COVID-19: Hygiene concerns with touch-based sensors                  │
│                                                                         │
│  Face authentication advantages:                                        │
│  • Contactless (post-COVID preference)                                  │
│  • Works on any smartphone                                              │
│  • More inclusive (no physical requirements)                            │
│  • Convenient for remote authentication                                 │
│                                                                         │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  GROWTH TRAJECTORY                                                      │
│  ─────────────────                                                      │
│                                                                         │
│  Oct 2021: Launch                                                       │
│  Dec 2023: 100 crore (1 billion) cumulative                             │
│  Jan 2025: 12 crore (120 million) per month                             │
│  Sep 2025: 1.5 crore (15 million) per DAY (record)                      │
│                                                                         │
│  Adoption: 150+ government and private entities                         │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
# authentication/face_auth.py
"""
UIDAI Face Authentication Service

AI/ML-powered face matching developed in-house by UIDAI.
Uses liveness detection to prevent spoofing.
"""
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class FaceAuthRequest:
    aadhaar_or_vid: str
    face_image: bytes    # Captured selfie
    liveness_data: dict  # Blink detection, head movement
    device_info: dict    # Camera specs, OS version


@dataclass
class FaceAuthResult:
    match: bool
    confidence: float  # 0-100
    liveness_verified: bool
    error_code: Optional[str] = None


@dataclass
class LivenessResult:
    # Result type used by the liveness checks below
    is_live: bool
    confidence: float = 0.0
    method: str = ""


class FaceAuthenticationService:
    """
    Face authentication with liveness detection.

    Architecture:
    1. Liveness detection (prevent photo/video attacks)
    2. Face detection and alignment
    3. Feature extraction (deep learning)
    4. 1:1 matching against stored photo
    """

    def __init__(
        self,
        face_detector,
        liveness_model,
        face_encoder,
        match_threshold: float = 0.85
    ):
        self.detector = face_detector
        self.liveness = liveness_model
        self.encoder = face_encoder
        # Fraction of the 0-100 confidence scale: 0.85 means confidence >= 85,
        # i.e. cosine similarity >= 0.70 under the mapping used below.
        self.threshold = match_threshold

    async def authenticate(
        self,
        request: FaceAuthRequest,
        stored_photo: bytes
    ) -> FaceAuthResult:
        """
        Perform face authentication.

        Steps:
        1. Verify liveness (not a photo/video)
        2. Detect and align faces
        3. Extract embeddings
        4. Compare embeddings
        """
        # Step 1: Liveness verification
        liveness_result = await self._verify_liveness(
            image=request.face_image,
            liveness_data=request.liveness_data
        )
        if not liveness_result.is_live:
            return FaceAuthResult(
                match=False,
                confidence=0.0,
                liveness_verified=False,
                error_code="LIVENESS_FAILED"
            )

        # Step 2: Face detection and quality check
        # (_detect_and_align is not shown in this excerpt)
        captured_face = await self._detect_and_align(request.face_image)
        stored_face = await self._detect_and_align(stored_photo)
        # Both faces are needed; encoding a missing face would fail below
        if captured_face is None or stored_face is None:
            return FaceAuthResult(
                match=False,
                confidence=0.0,
                liveness_verified=True,
                error_code="FACE_NOT_DETECTED"
            )

        # Step 3: Extract face embeddings
        captured_embedding = await self.encoder.encode(captured_face)
        stored_embedding = await self.encoder.encode(stored_face)

        # Step 4: Compare embeddings
        similarity = self._cosine_similarity(
            captured_embedding,
            stored_embedding
        )

        # Map similarity from [-1, 1] to a 0-100 confidence
        confidence = (similarity + 1) / 2 * 100
        return FaceAuthResult(
            match=confidence >= self.threshold * 100,
            confidence=confidence,
            liveness_verified=True
        )

    async def _verify_liveness(
        self,
        image: bytes,
        liveness_data: dict
    ) -> LivenessResult:
        """
        Verify the face is from a live person.

        Liveness checks:
        1. Texture analysis (detect printed photos)
        2. Depth estimation (detect flat screens)
        3. Eye blink detection (detect videos)
        4. Random challenge-response (head movement)
        """
        # Challenge: User was asked to blink/move head
        challenge_type = liveness_data.get('challenge_type')
        frames = liveness_data.get('frames', [])

        if challenge_type == 'BLINK':
            # Detect eye blink across frames
            return await self.liveness.detect_blink(frames)
        elif challenge_type == 'HEAD_TURN':
            # Detect head movement
            # (detect_head_movement is not shown in this excerpt)
            return await self.liveness.detect_head_movement(frames)
        elif challenge_type == 'PASSIVE':
            # Passive liveness (texture + depth analysis)
            return await self.liveness.passive_check(image)

        # Unknown or missing challenge type: fail closed
        return LivenessResult(is_live=False)

    def _cosine_similarity(
        self,
        embedding1: np.ndarray,
        embedding2: np.ndarray
    ) -> float:
        """
        Cosine similarity between face embeddings.

        Range: -1 to 1 (higher = more similar)
        """
        dot_product = np.dot(embedding1, embedding2)
        norm1 = np.linalg.norm(embedding1)
        norm2 = np.linalg.norm(embedding2)
        # A zero-length embedding has no direction; treat it as no similarity
        if norm1 == 0 or norm2 == 0:
            return 0.0
        return float(dot_product / (norm1 * norm2))


class LivenessDetector:
    """
    Deep learning model for liveness detection.

    Trained to distinguish:
    - Real faces
    - Printed photos
    - Screen displays (replay attacks)
    - 3D masks
    """

    def __init__(self, model_path: str):
        # (_load_model is not shown in this excerpt)
        self.model = self._load_model(model_path)

    async def passive_check(self, image: bytes) -> LivenessResult:
        """
        Passive liveness without user action.

        Analyzes:
        - Moiré patterns (screen artifacts)
        - Color distribution
        - Texture frequency
        - Specular reflection
        """
        features = self._extract_liveness_features(image)

        # Model predicts a score between spoof (0) and real (1)
        prediction = self.model.predict(features)
        return LivenessResult(
            is_live=prediction > 0.7,
            confidence=prediction,
            method="PASSIVE"
        )

    async def detect_blink(self, frames: list) -> LivenessResult:
        """
        Detect eye blink across video frames.

        A real person blinks; a photo doesn't.
        """
        eye_aspect_ratios = []
        for frame in frames:
            # Detect eyes
            eyes = self._detect_eyes(frame)
            if eyes:
                ear = self._eye_aspect_ratio(eyes)
                eye_aspect_ratios.append(ear)

        # Blink = dip in eye aspect ratio
        if len(eye_aspect_ratios) > 5:
            min_ear = min(eye_aspect_ratios)
            max_ear = max(eye_aspect_ratios)
            # Significant dip indicates blink
            if (max_ear - min_ear) > 0.15:
                return LivenessResult(is_live=True, method="BLINK")

        return LivenessResult(is_live=False, method="BLINK")
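The listing calls `_eye_aspect_ratio` without defining it. One common definition, sketched here as an assumption about what that helper computes, takes six landmarks per eye (p1 to p6, with p1 and p4 at the corners) and divides the summed lid-to-lid distances by twice the eye width. The landmark detector is out of scope, and `has_blink` simply mirrors the frame-count and dip check used in `detect_blink`.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR from six landmarks ordered p1..p6.

    p1 and p4 are the horizontal eye corners; (p2, p6) and (p3, p5)
    are the two upper/lower lid pairs.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def has_blink(ears: Sequence[float], min_frames: int = 6,
              dip: float = 0.15) -> bool:
    """Enough frames, and a large enough drop in EAR across them."""
    return len(ears) >= min_frames and (max(ears) - min(ears)) > dip


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
print(round(eye_aspect_ratio(open_eye), 3))                  # 0.667
print(has_blink([0.30, 0.31, 0.29, 0.10, 0.28, 0.30]))       # True
```

The ratio falls toward zero as the lids close, so a blink shows up as a short dip in the per-frame series, while a printed photo produces a nearly flat one.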
Phase 5: Scaling and Edge Cases
Interviewer: "Aadhaar went from 0 to 1 billion enrollments in about 6 years. How did they scale the enrollment infrastructure?"
Enrollment at Village Scale
ENROLLMENT INFRASTRUCTURE
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  THE CHALLENGE                                                          │
│  ─────────────                                                          │
│                                                                         │
│  • 640,000+ villages in India                                           │
│  • Many with no electricity, no internet                                │
│  • Extreme temperatures (deserts, mountains)                            │
│  • Low literacy levels                                                  │
│  • Need to enroll 1+ million people per day                             │
│                                                                         │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  THE SOLUTION: MOBILE ENROLLMENT CAMPS                                  │
│  ─────────────────────────────────────                                  │
│                                                                         │
│  Equipment transported (sometimes by donkeys!):                         │
│  • Ruggedized laptops                                                   │
│  • USB fingerprint scanners                                             │
│  • Iris cameras                                                         │
│  • Web cameras (for photos)                                             │
│  • Portable generators                                                  │
│  • Tables, chairs, canopies                                             │
│                                                                         │
│  Personnel:                                                             │
│  • 150,000+ certified operators                                         │
│  • Supervisors for exception handling                                   │
│  • ~50 enrollments per station per day                                  │
│                                                                         │
│  Peak capacity:                                                         │
│  • 60,000-80,000 enrollment stations                                    │
│  • 1+ million new enrollments per day                                   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
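A quick back-of-envelope check ties the capacity figures above together. It uses only the numbers stated in the diagram; the constant names are illustrative.

```python
ENROLLMENTS_PER_STATION_PER_DAY = 50     # "~50" per station per day
PEAK_STATIONS = (60_000, 80_000)         # stated peak range
TARGET_PER_DAY = 1_000_000               # "1+ million" per day

low, high = (n * ENROLLMENTS_PER_STATION_PER_DAY for n in PEAK_STATIONS)
# Ceiling division: stations needed to hit the daily target
stations_needed = -(-TARGET_PER_DAY // ENROLLMENTS_PER_STATION_PER_DAY)

print(f"Theoretical peak: {low:,} to {high:,} enrollments/day")
print(f"Stations needed for {TARGET_PER_DAY:,}/day: {stations_needed:,}")
```

At the stated per-station rate, about 20,000 active stations already cover one million enrollments a day, so a fleet of 60,000 to 80,000 leaves a threefold to fourfold margin for stations that are idle, in transit, or without power on a given day.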
Edge Cases
EDGE CASE 1: Biometric Exceptions
Problem: Not everyone can provide all biometrics
  - Missing fingers (accidents, leprosy)
  - Cataract patients (can't capture iris)
  - Worn fingerprints (manual laborers)
  - Children (biometrics change as they grow)
Solution:
├── Multi-modal: If fingerprint fails, use iris
├── Best-finger approach: Use whichever fingers work
├── Exception handling: Supervisor approval for special cases
├── Child enrollment: Mandatory biometric update at 5 and 15 years
└── Best Available Data (BAD) mode for extreme cases
EDGE CASE 2: Offline Enrollment
Problem: No internet in remote villages
Solution:
├── Enrollment client works entirely offline
├── Packets stored locally (encrypted)
├── Sync when connectivity available
├── USB-based upload via registrar
└── 30-day buffer for packet upload
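The store-locally-then-sync pattern can be sketched as a small queue. This is an illustrative model only: `Packet`, `OfflineQueue`, and the `upload` callback are hypothetical names, and payloads are assumed to be encrypted before they are queued.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List

UPLOAD_WINDOW = timedelta(days=30)    # the 30-day buffer mentioned above


@dataclass
class Packet:
    packet_id: str
    encrypted_payload: bytes          # assumed encrypted before queuing
    captured_at: datetime


@dataclass
class OfflineQueue:
    pending: List[Packet] = field(default_factory=list)

    def enqueue(self, packet: Packet) -> None:
        self.pending.append(packet)

    def overdue(self, now: datetime) -> List[Packet]:
        """Packets that have exceeded the upload window."""
        return [p for p in self.pending if now - p.captured_at > UPLOAD_WINDOW]

    def sync(self, upload: Callable[[Packet], bool]) -> int:
        """Try to upload every pending packet; keep the ones that fail."""
        still_pending = [p for p in self.pending if not upload(p)]
        sent = len(self.pending) - len(still_pending)
        self.pending = still_pending
        return sent


queue = OfflineQueue()
queue.enqueue(Packet("pkt-1", b"...", datetime(2025, 1, 1)))
queue.enqueue(Packet("pkt-2", b"...", datetime(2025, 1, 20)))
print(queue.sync(lambda p: p.packet_id == "pkt-1"))                   # 1
print([p.packet_id for p in queue.overdue(datetime(2025, 2, 25))])    # ['pkt-2']
```

A packet leaves the queue only after its upload is acknowledged, so a dropped connection mid-sync costs a retry rather than an enrollment.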
EDGE CASE 3: Duplicate Enrollment Attempts
Problem: People trying to get multiple Aadhaars for fraud
Solution:
├── Three-way ABIS consensus
├── Manual adjudication for borderline cases
├── Reject if duplicate found
├── Audit trail for investigation
└── Criminal penalties for fraud
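The consensus-plus-adjudication rule can be sketched as a vote over three match scores. The thresholds and the 0-to-1 score scale are assumptions made for this example, not UIDAI values; `abis_consensus` and `Decision` are hypothetical names.

```python
from enum import Enum
from typing import Sequence


class Decision(Enum):
    UNIQUE = "unique"
    DUPLICATE = "duplicate"
    MANUAL_REVIEW = "manual_review"


def abis_consensus(
    scores: Sequence[float],
    duplicate_at: float = 0.90,     # illustrative threshold
    borderline_at: float = 0.70,    # illustrative threshold
) -> Decision:
    """Combine three ABIS scores (0..1) for the best candidate match."""
    if len(scores) != 3:
        raise ValueError("expected one score from each of three ABIS engines")
    duplicate_votes = sum(s >= duplicate_at for s in scores)
    borderline_votes = sum(borderline_at <= s < duplicate_at for s in scores)
    if duplicate_votes >= 2:
        return Decision.DUPLICATE          # 2-of-3 agreement
    if duplicate_votes == 1 or borderline_votes >= 1:
        return Decision.MANUAL_REVIEW      # engines disagree or are unsure
    return Decision.UNIQUE


print(abis_consensus([0.95, 0.93, 0.40]))   # Decision.DUPLICATE
print(abis_consensus([0.95, 0.30, 0.20]))   # Decision.MANUAL_REVIEW
print(abis_consensus([0.10, 0.20, 0.05]))   # Decision.UNIQUE
```

Routing disagreement to a human instead of breaking the tie automatically reflects the cost asymmetry: a wrongly issued duplicate and a wrongly rejected genuine resident are both expensive errors.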
EDGE CASE 4: Authentication Failures
Problem: Genuine person fails authentication
Causes:
├── Worn fingerprints (temporary or permanent)
├── Cuts/injuries on fingers
├── Wet/dirty fingers
├── Sensor quality issues
└── Aging (biometrics change over time)
Solution:
├── Try multiple fingers
├── Fall back to iris
├── Fall back to OTP
├── Face authentication option
├── Biometric update facility
└── Exception handling mode for genuine failures
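The fallback sequence can be sketched as an ordered chain of attempts. `authenticate_with_fallback` is a hypothetical helper, and each attempt is modelled as a zero-argument callable that returns True on success.

```python
from typing import Callable, List, Optional, Tuple

Attempt = Callable[[], bool]


def authenticate_with_fallback(
    attempts: List[Tuple[str, Attempt]],
) -> Tuple[Optional[str], List[str]]:
    """Try each modality in order; return (winner, modalities that failed)."""
    failed: List[str] = []
    for name, attempt in attempts:
        if attempt():
            return name, failed
        failed.append(name)
    return None, failed    # caller would route this to exception handling


chain = [
    ("fingerprint", lambda: False),   # e.g. worn fingerprints
    ("iris", lambda: False),          # e.g. cataract
    ("otp", lambda: True),
    ("face", lambda: True),
]
print(authenticate_with_fallback(chain))   # ('otp', ['fingerprint', 'iris'])
```

Returning the list of failed modalities alongside the winner gives operations a signal for who may need a biometric update, without changing the yes/no answer sent to the requesting entity.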
EDGE CASE 5: System Under Attack
Problem: DDoS or brute-force attacks
Mitigation:
├── Rate limiting per AUA
├── IP whitelisting for ASAs
├── No direct internet exposure
├── Anomaly detection
└── Fallback to degraded mode
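Per-AUA rate limiting is often implemented with a token bucket. The sketch below is one such implementation under assumed parameters; the class names, rates, and the injectable `clock` (used so the behaviour is deterministic in tests) are illustrative.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerEntityLimiter:
    """One independent bucket per AUA code."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self._rate, self._capacity, self._clock = rate, capacity, clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, aua_code: str) -> bool:
        if aua_code not in self._buckets:
            self._buckets[aua_code] = TokenBucket(
                self._rate, self._capacity, self._clock)
        return self._buckets[aua_code].allow()


limiter = PerEntityLimiter(rate=100.0, capacity=200.0)
print(limiter.allow("BANK_A"))   # True: the bucket starts full
```

Keeping a separate bucket per entity means one misbehaving or compromised AUA exhausts only its own budget and cannot starve the others.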
Phase 6: Monitoring and Operations
You: "With 90 million daily authentications, operational excellence is critical."
MONITORING DASHBOARD
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  REAL-TIME METRICS                                                      │
│                                                                         │
│  Authentication Rate                                                    │
│  ├── Current TPS:    1,042 auth/sec                                     │
│  ├── Today's Total:  67.2M authentications                              │
│  ├── Success Rate:   99.2%                                              │
│  └── p99 Latency:    185ms                                              │
│                                                                         │
│  By Authentication Type                                                 │
│  ├── Fingerprint:  58%  [████████████        ]                          │
│  ├── OTP:          22%  [████                ]                          │
│  ├── Face:         12%  [██                  ]                          │
│  ├── Iris:         5%   [█                   ]                          │
│  └── Demographic:  3%   [                    ]                          │
│                                                                         │
│  Top AUAs (by volume)                                                   │
│  ├── SBI Bank:     12.3M  [████████            ]                        │
│  ├── NPCI/UPI:     9.8M   [██████              ]                        │
│  ├── Jio Telecom:  7.2M   [█████               ]                        │
│  ├── HDFC Bank:    5.6M   [████                ]                        │
│  └── Others:       32.3M                                                │
│                                                                         │
│  Infrastructure Health                                                  │
│  ├── CIDR Bengaluru:  ✓ Healthy (CPU: 45%)                              │
│  ├── CIDR Manesar:    ✓ Healthy (CPU: 42%)                              │
│  ├── ABIS 1:          ✓ Healthy (Queue: 234)                            │
│  ├── ABIS 2:          ✓ Healthy (Queue: 189)                            │
│  └── ABIS 3:          ✓ Healthy (Queue: 267)                            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
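Two of the dashboard's headline numbers are simple to derive. The sketch below shows how the average TPS follows from the daily volume and how a p99 latency can be computed with the nearest-rank method; the sample latencies are made up for illustration.

```python
import math
from typing import Sequence


def average_tps(daily_total: int) -> float:
    """Average transactions per second over a 24-hour day."""
    return daily_total / 86_400


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


print(round(average_tps(90_000_000)))      # 1042
latencies_ms = [120] * 98 + [185, 400]     # made-up sample
print(percentile(latencies_ms, 99))        # 185
```

Ninety million authentications a day averages out to roughly the 1,042 TPS shown above; real traffic peaks during business hours, so capacity planning has to size for the peak rather than this average.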
Interview Conclusion
Interviewer: "Excellent walkthrough. A few quick questions:"
Interviewer: "What's the hardest part of building a system like Aadhaar?"
You: "Three things:
- Biometric deduplication at scale: Proving uniqueness against 1.4 billion records requires clever algorithms, demographic blocking, and multi-ABIS consensus. Brute-force comparison of every record against every other is computationally impractical at this scale.
- Inclusive design: The system must work for manual laborers with worn fingerprints, elderly people with cataracts, amputees, and people in villages with no electricity. You can't just optimize for the happy path.
- Security of the crown jewels: If the CIDR is breached, the biometrics of 1.4 billion people are exposed. There's no 'password reset' for fingerprints. The security architecture must be bulletproof."
Interviewer: "What lessons from Aadhaar apply to other large-scale systems?"
You: "Several key lessons:
- API-first thinking: Aadhaar was designed as a platform from day one. The authentication API has enabled hundreds of services. UPI, DigiLocker, and ABDM are all built on Aadhaar's identity layer.
- Vendor neutrality: Three ABIS vendors, open APIs, no lock-in. This enabled continuous improvement and prevented any single vendor from becoming critical.
- Offline-first design: For systems that must work in challenging environments, assume no connectivity and design for sync.
- Minimal data principle: Aadhaar authentication returns only Yes/No. This minimizes privacy exposure and attack surface.
- Invest in inclusion: Adding face authentication made the system accessible to millions who struggled with fingerprints. Inclusion isn't just ethics; it's good engineering."
Summary: Concepts Applied from 10-Week Course
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  CONCEPTS FROM 10-WEEK COURSE IN AADHAAR DESIGN                         │
│                                                                         │
│  WEEK 1: DATA AT SCALE                                                  │
│  ├── Partitioning: Demographic blocking reduces search space            │
│  ├── Sharding: CIDR distributed across data centers                     │
│  └── Replication: Active-active for disaster recovery                   │
│                                                                         │
│  WEEK 2: FAILURE-FIRST DESIGN                                           │
│  ├── Offline-first: Enrollment works without connectivity               │
│  ├── Graceful degradation: Fall back to OTP if biometric fails          │
│  ├── Timeouts: 200ms latency budget for authentication                  │
│  └── Retry: Multi-finger, multi-modality attempts                       │
│                                                                         │
│  WEEK 3: MESSAGING & ASYNC                                              │
│  ├── Queue-based: Enrollment packets queued for processing              │
│  ├── Async deduplication: Days/weeks for complex cases                  │
│  └── Audit trail: All transactions logged asynchronously                │
│                                                                         │
│  WEEK 4: CACHING                                                        │
│  ├── Template caching: Hot Aadhaars cached for fast auth                │
│  ├── Session caching: Reduce database lookups                           │
│  └── Pre-computed: Demographic indices for blocking                     │
│                                                                         │
│  WEEK 5: CONSISTENCY                                                    │
│  ├── Strong consistency: Deduplication must be accurate                 │
│  ├── Consensus: 2-of-3 ABIS agreement for decisions                     │
│  └── Uniqueness guarantee: Core requirement of the system               │
│                                                                         │
│  WEEK 9: SECURITY                                                       │
│  ├── 2048-bit PKI encryption throughout                                 │
│  ├── HSM for key management                                             │
│  ├── Zero-trust architecture                                            │
│  ├── Virtual ID for privacy                                             │
│  └── Tokenization to prevent tracking                                   │
│                                                                         │
│  WEEK 10: OPERATIONS                                                    │
│  ├── Multi-vendor: Three ABIS for resilience                            │
│  ├── SLOs: <200ms auth, 99.9% availability                              │
│  ├── Audit: Complete traceability of all operations                     │
│  └── Continuous innovation: Face auth, VID, e-KYC                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
Why Aadhaar Matters
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  WHY AADHAAR IS AN ENGINEERING MARVEL                                   │
│                                                                         │
│  SCALE                                                                  │
│  ─────                                                                  │
│  • Largest biometric database in history (1.4 billion)                  │
│  • 150+ billion authentications (and counting)                          │
│  • 90+ million authentications per day                                  │
│  • Deduplication: Never done at billion scale before                    │
│                                                                         │
│  INCLUSION                                                              │
│  ─────────                                                              │
│  • Works in 640,000 villages                                            │
│  • Handles worn fingerprints, cataracts, amputees                       │
│  • Multiple modalities (finger, iris, face, OTP)                        │
│  • Low-cost enrollment (donkeys carrying equipment!)                    │
│                                                                         │
│  IMPACT                                                                 │
│  ──────                                                                 │
│  • ₹3.5+ lakh crore in DBT savings                                      │
│  • Millions of ghost beneficiaries eliminated                           │
│  • Foundation for UPI, DigiLocker, ABDM, CoWIN                          │
│  • "India Stack" model studied globally                                 │
│                                                                         │
│  INNOVATION                                                             │
│  ──────────                                                             │
│  • First multi-ABIS system in the world                                 │
│  • Virtual ID for privacy (before it was trendy)                        │
│  • AI-powered face authentication at scale                              │
│  • Consent-based data sharing (e-KYC)                                   │
│                                                                         │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                         │
│  "Aadhaar proved that identity infrastructure can be built as           │
│   a public good, at billion scale, with inclusion at its core."         │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
Self-Assessment Checklist
After studying this case study, you should be able to:
Architecture:
- Design a biometric enrollment system with offline capability
- Explain deduplication challenges at billion scale
- Design authentication with sub-200ms latency
Distributed Systems:
- Apply demographic blocking to reduce search space
- Design multi-vendor consensus systems
- Handle offline-first data sync
Security:
- Implement end-to-end encryption with HSM
- Design privacy-preserving authentication (VID, tokenization)
- Apply zero-trust principles to sensitive data
Inclusion:
- Design for biometric edge cases
- Provide multiple authentication fallbacks
- Build systems for challenging infrastructure environments
Sources
Official UIDAI Documentation:
- UIDAI Official Website: https://uidai.gov.in/
- Aadhaar Authentication API Specification: https://uidai.gov.in/images/resource/Aadhaar_Authentication_API-2.5_Revision-1_of_January_2022.pdf
- Aadhaar e-KYC API Specification: https://uidai.gov.in/images/resource/aadhaar_ekyc_api_2_5.pdf
- UIDAI Operation Model: https://uidai.gov.in/en/ecosystem/authentication-ecosystem/operation-model.html
- Network and Server Security: https://uidai.gov.in/en/16-english-uk/aapka-aadhaar/31-network-and-server-security.html
- Enrollment Data Security: https://uidai.gov.in/en/16-english-uk/aapka-aadhaar/33-enrolment-data-security.html
- About Aadhaar Paperless Offline e-KYC: https://uidai.gov.in/en/ecosystem/authentication-devices-documents/about-aadhaar-paperless-offline-e-kyc.html
Statistics and Press Releases:
- UIDAI January 2025 Statistics: https://www.pib.gov.in/PressReleasePage.aspx?PRID=2100685
- Aadhaar Authentication Surpasses 150 Billion: https://www.pib.gov.in/PressReleasePage.aspx?PRID=2129121
- UIDAI August 2025 Statistics: https://www.pib.gov.in/PressReleasePage.aspx?PRID=2163733
- IBEF Aadhaar Coverage: https://ibef.org/news/aadhaar-authentication-crosses-150-billion-transactions-powering-india-s-digital-economy-and-welfare-services
- MobileIDWorld Coverage: https://mobileidworld.com/indias-aadhaar-digital-id-system-surpasses-150-billion-authentication-transactions/
- OpenGov Asia Analysis: https://opengovasia.com/2025/02/10/aadhaars-expanding-role-in-indias-digital-economy/
Architecture and Technical Deep Dives:
- HPE Developer Portal - Aadhaar Experience: https://developer.hpe.com/blog/architecting-the-worlds-largest-biometric-identity-system-the-aadhaar-ex/
- MapR - Aadhaar Architecture: https://mapr.com/blog/architecting-worlds-largest-biometric-identity-system-aadhaar-experience/
- Teknonauts - Aadhaar Architecture: https://teknonauts.com/aadhaar-card-architecture/
- Medium - Aadhaar System Design: https://medium.com/career-drill/aadhar-system-design-39b1425a0983
- IACR ePrint - Aadhaar Security Analysis: https://eprint.iacr.org/2022/481.pdf
Biometric Technology:
- Neurotechnology and TCS ABIS: https://www.biometricupdate.com/202103/neurotechnology-to-provide-biometric-de-duplication-software-for-indias-aadhaar-program
- Neurotechnology Press Release: https://www.neurotechnology.com/press_release_india_uidai_aadhaar_id.html
- MegaMatcher ABIS: https://www.neurotechnology.com/megamatcher-abis.html
- UIDAI Role of Biometric Technology: https://www.dematerialisedid.com/PDFs/role_of_biometric_technology_in_aadhaar_jan21_2012.pdf
Security and Privacy:
- Privacy International Analysis: https://privacyinternational.org/case-study/4698/id-systems-analysed-aadhaar
- Protean Tech Security Overview: https://www.proteantech.in/articles/aadhaar-authentication-data-security-27-06-2025/
- AWS Aadhaar Data Vault: https://aws.amazon.com/blogs/publicsector/build-aadhaar-data-vault-aws/
Academic and Research:
- Taylor & Francis - Aadhaar Governing with Biometrics: https://www.tandfonline.com/doi/full/10.1080/00856401.2019.1595343
- ScienceDirect - Decoding Indian Data Governance: https://www.sciencedirect.com/science/article/pii/S2590291125001354
Further Reading
Official Documentation:
- UIDAI Developer Portal: https://uidai.gov.in/en/ecosystem/authentication-devices-documents.html
- Authentication Regulations: https://uidai.gov.in/legal-framework/aadhaar-regulations.html
- Aadhaar Act 2016: Full text with amendments
Engineering Talks:
- Dr. Pramod Varma - "Architecting World's Largest Biometric Identity System" (Strata+Hadoop World 2014)
- Nandan Nilekani - Various talks on India Stack and Aadhaar design philosophy
Engineering Blogs:
- ByteByteGo: System design breakdowns
- High Scalability: Case studies on large-scale systems
- Biometric Update: Aadhaar and biometric technology coverage
Books:
- "Designing Data-Intensive Applications" by Martin Kleppmann: Distributed systems fundamentals
- "Aadhaar: A Biometric History of India's 12-Digit Revolution" by N.S. Ramnath and Charles Assisi
Related Systems to Study:
- Estonia e-Residency: European digital identity model
- Singapore SingPass: National digital identity
- UK Verify: Federated identity approach (contrast to centralized)
- MOSIP: Open-source identity platform (inspired by Aadhaar)
Research Papers:
- "India's Aadhaar: Structure, Security, and Vulnerabilities", IACR ePrint 2022/481
- IEEE/ACM papers on biometric deduplication at scale
End of Bonus Problem 3: Aadhaar (UIDAI)
"1.4 billion unique identities. 150 billion authentications. The foundation of India's digital public infrastructure."