Himanshu Kukreja

Week 9 — Day 3: Data Residency and GDPR

System Design Mastery Series — Multi-Tenancy, Security, and Compliance Week


Preface

Your startup just closed a deal with a German enterprise customer. Celebration! Then their legal team sends this:

THE DATA RESIDENCY REALITY

From: legal@deutschebank.example.de
Subject: Data Processing Requirements

Dear Vendor,

Before we can proceed, please confirm:

1. All personal data of our employees will be stored exclusively
   in the European Union

2. No personal data will be transferred to the United States
   or any other third country without appropriate safeguards

3. You will sign our Data Processing Agreement (DPA) based on
   GDPR Article 28

4. You can demonstrate compliance with:
   - GDPR (EU)
   - BDSG (German Federal Data Protection Act)
   - Schrems II ruling implications

5. You can provide data residency certification and audit logs

Please provide your technical architecture showing how you ensure
EU data stays in the EU.

Regards,
Deutsche Bank Legal

---

You look at your architecture:

Current state:
├── Single AWS region: us-east-1
├── All data in one PostgreSQL cluster
├── Backups replicated to us-west-2
├── Analytics processed in BigQuery (US)
├── Customer support via Zendesk (US)
└── Error tracking via Sentry (US)

Your response options:
❌ "Sorry, we can't do this" (lose the deal)
❌ "Sure, we comply!" (lie and risk massive fines)
✓ "We need to re-architect" (this week's lesson)

Today, we'll learn how to build systems that respect data residency requirements while running a global platform.


Part I: Foundations

Chapter 1: Understanding Data Residency

1.1 What Is Data Residency?

Data residency refers to the physical or geographic location where data is stored and processed. Data residency requirements are laws or policies that mandate data must remain within specific geographic boundaries.

DATA RESIDENCY CONCEPTS

DATA RESIDENCY:
├── Where data is physically stored
├── Geographic location of servers
├── Can be chosen by organization
└── Example: "Our EU data is in Frankfurt"

DATA SOVEREIGNTY:
├── Legal jurisdiction over data
├── Which country's laws apply
├── Based on storage location + company location
└── Example: "EU data subject to GDPR"

DATA LOCALIZATION:
├── Legal requirement to keep data in-country
├── Government-mandated residency
├── Often for national security or privacy
└── Example: "Russian personal data must stay in Russia"

DATA TRANSFER:
├── Moving data across borders
├── May require legal basis
├── Subject to adequacy decisions
└── Example: "EU to US transfer requires SCCs"

1.2 Why Data Residency Matters

REGULATORY LANDSCAPE

EUROPEAN UNION (GDPR):
├── Applies to: Personal data of individuals in the EU (regardless of citizenship)
├── Key rules:
│   ├── Transfers outside EU need legal basis
│   ├── Adequacy decisions (few countries qualify)
│   ├── Standard Contractual Clauses (SCCs)
│   └── Binding Corporate Rules (BCRs)
├── Fines: Up to €20M or 4% of global annual revenue, whichever is higher
└── Notable: Schrems II invalidated Privacy Shield

RUSSIA (Federal Law 242-FZ):
├── Applies to: Russian citizens' personal data
├── Key rules:
│   ├── Initial collection must be in Russia
│   ├── Primary database must be in Russia
│   └── Cross-border transfer allowed after local storage
├── Enforcement: Website blocking, fines
└── Notable: LinkedIn blocked in Russia

CHINA (PIPL + DSL + Cybersecurity Law):
├── Applies to: Data collected in China
├── Key rules:
│   ├── Critical data must stay in China
│   ├── Security assessment for cross-border
│   └── Government access requirements
├── Enforcement: Business license revocation
└── Notable: Extremely broad scope

INDIA (DPDP Act 2023):
├── Applies to: Indian residents' data
├── Key rules:
│   ├── Certain data cannot leave India
│   ├── Government notification for transfers
│   └── Still evolving
├── Enforcement: Fines up to ₹250 crore
└── Notable: Localization for payment data (RBI)

BRAZIL (LGPD):
├── Applies to: Brazilian residents' data
├── Key rules:
│   ├── Similar to GDPR
│   ├── Adequate protection required for transfers
│   └── Consent or legitimate interest basis
├── Enforcement: Fines up to 2% revenue
└── Notable: Closely modeled on GDPR

1.3 GDPR Deep Dive

GDPR KEY CONCEPTS FOR ENGINEERS

PERSONAL DATA (Article 4):
├── Any information relating to identified/identifiable person
├── Examples:
│   ├── Name, email, phone number
│   ├── IP address, cookie IDs
│   ├── Location data
│   ├── Behavioral data
│   └── Device identifiers
└── Note: Pseudonymized data is still personal data

SPECIAL CATEGORIES (Article 9):
├── Extra protection required for:
│   ├── Racial or ethnic origin
│   ├── Political opinions
│   ├── Religious beliefs
│   ├── Trade union membership
│   ├── Genetic/biometric data
│   ├── Health data
│   └── Sexual orientation
└── Generally prohibited without explicit consent

DATA SUBJECT RIGHTS:
├── Right to access (Article 15)
├── Right to rectification (Article 16)
├── Right to erasure (Article 17) ← Tomorrow's topic
├── Right to data portability (Article 20)
├── Right to object (Article 21)
└── Rights related to automated decisions (Article 22)

LAWFUL BASIS FOR PROCESSING (Article 6):
├── Consent (explicit, informed, withdrawable)
├── Contract performance
├── Legal obligation
├── Vital interests
├── Public interest
└── Legitimate interests (balance test)

CROSS-BORDER TRANSFERS (Chapter 5):
├── Adequacy decision (safest)
│   └── Countries: Andorra, Argentina, Canada (commercial),
│       Faroe Islands, Guernsey, Israel, Isle of Man, Japan,
│       Jersey, New Zealand, South Korea, Switzerland,
│       UK, Uruguay, and the US (Data Privacy Framework, certified companies only)
├── Standard Contractual Clauses (SCCs)
├── Binding Corporate Rules (BCRs)
├── Derogations (limited cases)
└── Note: Must assess destination country's laws
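The transfer hierarchy above can be sketched as a selection helper. Everything here is illustrative: the country sets are abridged, the names are hypothetical, and the EU Commission's current adequacy list should always be checked before relying on any hard-coded table.

```python
from enum import Enum

class TransferMechanism(Enum):
    """Hypothetical labels for GDPR Chapter 5 transfer mechanisms."""
    NONE_NEEDED = "intra_eea"                # No transfer outside the EEA
    ADEQUACY = "adequacy_decision"           # Article 45
    BCR = "binding_corporate_rules"          # Article 47
    SCC = "standard_contractual_clauses"     # Article 46(2)(c)

# Abridged, illustrative sets -- not a source of truth.
EEA = {"DE", "FR", "IE", "NL", "ES", "IT"}
ADEQUATE = {"JP", "KR", "CH", "GB", "NZ", "CA", "US"}  # US: DPF-certified recipients only

def pick_transfer_mechanism(dest_country: str, has_bcrs: bool = False) -> TransferMechanism:
    """Walk the hierarchy: no mechanism needed inside the EEA,
    adequacy first, then BCRs, then SCCs as the fallback."""
    if dest_country in EEA:
        return TransferMechanism.NONE_NEEDED
    if dest_country in ADEQUATE:
        return TransferMechanism.ADEQUACY
    if has_bcrs:
        return TransferMechanism.BCR
    # SCCs still require a transfer impact assessment of the
    # destination country's surveillance laws (Schrems II).
    return TransferMechanism.SCC
```

Note the ordering: adequacy is preferred because it needs no extra paperwork per transfer; SCCs are the fallback, and even they carry a Schrems II assessment obligation.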

Chapter 2: Data Residency Architecture Patterns

2.1 Architecture Options

DATA RESIDENCY PATTERNS

PATTERN 1: SINGLE REGION (Limited Compliance)
─────────────────────────────────────────────
┌───────────────────────────────────────────────────────────────────────┐
│                                                                       │
│                         US-EAST-1 (Virginia)                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                    │
│  │   App       │  │   Database  │  │   Storage   │                    │
│  │   Servers   │  │   Cluster   │  │   (S3)      │                    │
│  └─────────────┘  └─────────────┘  └─────────────┘                    │
│                                                                       │
│  ALL data from ALL regions stored here                                │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘

Pros: Simple, cheap
Cons: Can't comply with data residency laws
Use: Internal tools, non-regulated data


PATTERN 2: REGIONAL DEPLOYMENTS (Full Isolation)
────────────────────────────────────────────────
┌───────────────────────┐  ┌───────────────────────┐  ┌───────────────────────┐
│    US Deployment      │  │    EU Deployment      │  │   APAC Deployment     │
│    (us-east-1)        │  │    (eu-central-1)     │  │   (ap-southeast-1)    │
│ ┌───────────────────┐ │  │ ┌───────────────────┐ │  │ ┌───────────────────┐ │
│ │ App + DB + Storage│ │  │ │ App + DB + Storage│ │  │ │ App + DB + Storage│ │
│ │ US customers only │ │  │ │ EU customers only │ │  │ │ APAC customers    │ │
│ └───────────────────┘ │  │ └───────────────────┘ │  │ └───────────────────┘ │
└───────────────────────┘  └───────────────────────┘  └───────────────────────┘
        │                          │                         │
        └──────────────────────────┴─────────────────────────┘
                    NO DATA SHARING BETWEEN REGIONS

Pros: Strong compliance, data stays in region
Cons: Complex, expensive, no cross-region features
Use: Highly regulated industries, strict localization


PATTERN 3: REGIONAL DATA, GLOBAL CONTROL PLANE
──────────────────────────────────────────────
                    ┌───────────────────────┐
                    │    Global Control     │
                    │    (Metadata only)    │
                    │    - Tenant registry  │
                    │    - Configuration    │
                    │    - Routing rules    │
                    └───────────┬───────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        │                       │                       │
        ▼                       ▼                       ▼
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│  US Region    │      │  EU Region    │      │ APAC Region   │
│  ┌─────────┐  │      │  ┌─────────┐  │      │  ┌─────────┐  │
│  │ US Data │  │      │  │ EU Data │  │      │  │APAC Data│  │
│  └─────────┘  │      │  └─────────┘  │      │  └─────────┘  │
└───────────────┘      └───────────────┘      └───────────────┘

Pros: Compliance + some global features
Cons: Complex routing; requires care about what counts as "metadata"
Use: Most SaaS companies with compliance needs


PATTERN 4: DATA RESIDENCY BY TENANT
───────────────────────────────────
┌────────────────────────────────────────────────────────────────────────┐
│                          Global Application                            │
│                                                                        │
│  Request → Tenant Lookup → Route to Tenant's Region → Process          │
│                                                                        │
│  Tenant A (US)    → us-east-1                                          │
│  Tenant B (EU)    → eu-central-1                                       │
│  Tenant C (EU)    → eu-central-1                                       │
│  Tenant D (APAC)  → ap-southeast-1                                     │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

Pros: Flexible, per-tenant compliance
Cons: Routing complexity, cross-tenant features limited
Use: B2B SaaS with enterprise customers

2.2 What Data Goes Where?

DATA CLASSIFICATION FOR RESIDENCY

PERSONAL DATA (Must respect residency):
├── User profiles (name, email, phone)
├── Employee data
├── Customer communications
├── Support tickets with PII
├── Files uploaded by users
├── Activity logs with user IDs
├── IP addresses and location data
└── Any data linked to an individual

OPERATIONAL DATA (Usually can be global):
├── Anonymized/aggregated analytics
├── System metrics and monitoring
├── Error logs (if PII stripped)
├── Configuration data
├── Feature flags
├── Infrastructure metadata
└── Audit logs of system events (not user actions)
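The "if PII stripped" caveat on error logs is doing a lot of work. As a minimal sketch, a regex-based scrubber might redact the most obvious identifiers before a log line leaves the region; the two patterns below are illustrative only, nowhere near a complete PII detector.

```python
import re

# Illustrative patterns only -- a production scrubber needs a much
# broader ruleset (names, session tokens, free-text addresses, etc.).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def scrub_log_line(line: str) -> str:
    """Redact emails and IPv4 addresses before shipping a log line
    to a third-party (possibly out-of-region) error tracker."""
    line = EMAIL_RE.sub("[EMAIL]", line)
    line = IPV4_RE.sub("[IP]", line)
    return line
```

Usage: `scrub_log_line("login failed for alice@example.de from 192.168.1.10")` yields `"login failed for [EMAIL] from [IP]"`.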

GRAY AREAS (Careful analysis needed):
├── Pseudonymized data
│   └── Still personal data under GDPR!
├── Behavioral analytics
│   └── May be personal if linkable
├── Machine learning training data
│   └── Depends on source
├── Backups
│   └── Same rules as primary data
└── Cached data
    └── Same rules as primary data

DECISION FRAMEWORK:
┌────────────────────────────────────────────────────────────────────────┐
│                                                                        │
│  Question 1: Can this data identify a person?                          │
│  ├── Yes → Personal data → Residency rules apply                       │
│  └── No → Continue to Question 2                                       │
│                                                                        │
│  Question 2: Can this data be combined with other data to identify?    │
│  ├── Yes → Personal data → Residency rules apply                       │
│  └── No → Operational data → Usually global OK                         │
│                                                                        │
│  Question 3: Is this derived from personal data?                       │
│  ├── Yes → Analyze if truly anonymous                                  │
│  └── No → Operational data                                             │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
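The three-question framework above translates directly into code. This is a sketch with hypothetical names; the interesting part is that question 3 does not resolve to a yes/no answer but to "review required", mirroring the "analyze if truly anonymous" branch.

```python
def classify_for_residency(
    identifies_person: bool,
    linkable_to_person: bool,
    derived_from_personal: bool,
) -> str:
    """Mirror the three-question decision framework.

    Returns 'personal' (residency rules apply), 'operational'
    (usually fine to process globally), or 'review_required'
    (derived data that needs an anonymization review).
    """
    # Q1 and Q2: direct or indirect identifiability → personal data
    if identifies_person or linkable_to_person:
        return "personal"
    # Q3: derived from personal data → treat as personal until
    # an anonymization review says otherwise
    if derived_from_personal:
        return "review_required"
    return "operational"
```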

Chapter 3: GDPR Lawful Basis

3.1 Lawful Basis Decision Tree

CHOOSING LAWFUL BASIS FOR PROCESSING

                    ┌────────────────────────┐
                    │ Why are you processing │
                    │ this personal data?    │
                    └───────────┬────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        │                       │                       │
        ▼                       ▼                       ▼
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│ User requested│      │ Required by   │      │ Business      │
│ a service     │      │ law           │      │ benefit       │
└───────┬───────┘      └───────┬───────┘      └───────┬───────┘
        │                      │                      │
        ▼                      ▼                      ▼
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│  CONTRACT     │      │    LEGAL      │      │  LEGITIMATE   │
│  PERFORMANCE  │      │  OBLIGATION   │      │   INTEREST    │
│               │      │               │      │  (needs test) │
│ Examples:     │      │ Examples:     │      │               │
│ - Deliver     │      │ - Tax records │      │ Examples:     │
│   product     │      │ - Employment  │      │ - Fraud       │
│ - Process     │      │   law         │      │   prevention  │
│   payment     │      │ - Court order │      │ - Analytics   │
│ - Send order  │      │               │      │ - Marketing   │
│   updates     │      │               │      │   (maybe)     │
└───────────────┘      └───────────────┘      └───────────────┘
                                                      │
                                                      ▼
                                              ┌───────────────┐
                                              │ Need CONSENT? │
                                              │               │
                                              │ - Marketing   │
                                              │ - Cookies     │
                                              │ - Third-party │
                                              │   sharing     │
                                              │ - Special     │
                                              │   categories  │
                                              └───────────────┘
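The decision tree above can be captured as a simple lookup, sketched here with hypothetical purpose names. One design point worth encoding: an unmapped purpose should fail loudly rather than silently default to some basis, because choosing a lawful basis is a legal decision, not an engineering default.

```python
from enum import Enum

class LawfulBasis(Enum):
    """GDPR Article 6(1) lawful bases (subset relevant to the tree)."""
    CONTRACT = "contract_performance"            # Art. 6(1)(b)
    LEGAL_OBLIGATION = "legal_obligation"        # Art. 6(1)(c)
    LEGITIMATE_INTEREST = "legitimate_interest"  # Art. 6(1)(f)
    CONSENT = "consent"                          # Art. 6(1)(a)

# Hypothetical mapping for a few common purposes, following the tree above.
PURPOSE_BASIS = {
    "order_fulfillment": LawfulBasis.CONTRACT,
    "tax_records": LawfulBasis.LEGAL_OBLIGATION,
    "fraud_prevention": LawfulBasis.LEGITIMATE_INTEREST,
    # Legitimate interest rarely survives the balance test for marketing.
    "marketing_email": LawfulBasis.CONSENT,
}

def basis_for(purpose: str) -> LawfulBasis:
    """Look up the lawful basis for a purpose.

    Unknown purposes must go through legal review, so raise
    instead of guessing a default.
    """
    try:
        return PURPOSE_BASIS[purpose]
    except KeyError:
        raise ValueError(f"No lawful basis decided for purpose: {purpose}")
```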

Part II: Implementation

Chapter 4: Regional Data Architecture

4.1 Tenant Region Assignment

# data_residency/tenant_region.py

"""
Tenant region assignment and routing.

Each tenant is assigned to a region based on their data residency requirements.
"""

from dataclasses import dataclass
from typing import Optional, Dict
from enum import Enum
import logging

logger = logging.getLogger(__name__)


class DataRegion(Enum):
    """Supported data regions."""
    US = "us"
    EU = "eu"
    UK = "uk"
    APAC = "apac"
    BRAZIL = "brazil"
    INDIA = "india"


@dataclass
class RegionConfig:
    """Configuration for a data region."""
    region_id: DataRegion
    display_name: str
    aws_region: str
    database_endpoint: str
    storage_bucket: str
    search_endpoint: str
    cache_endpoint: str
    
    # Compliance attributes
    gdpr_compliant: bool = False
    data_localization: bool = False
    adequacy_decision: bool = False


# Region configurations
REGION_CONFIGS = {
    DataRegion.US: RegionConfig(
        region_id=DataRegion.US,
        display_name="United States",
        aws_region="us-east-1",
        database_endpoint="db-us.example.com",
        storage_bucket="data-us-example",
        search_endpoint="search-us.example.com",
        cache_endpoint="cache-us.example.com",
        gdpr_compliant=False,
        adequacy_decision=True  # US-EU Data Privacy Framework
    ),
    DataRegion.EU: RegionConfig(
        region_id=DataRegion.EU,
        display_name="European Union",
        aws_region="eu-central-1",
        database_endpoint="db-eu.example.com",
        storage_bucket="data-eu-example",
        search_endpoint="search-eu.example.com",
        cache_endpoint="cache-eu.example.com",
        gdpr_compliant=True,
        adequacy_decision=True
    ),
    DataRegion.UK: RegionConfig(
        region_id=DataRegion.UK,
        display_name="United Kingdom",
        aws_region="eu-west-2",
        database_endpoint="db-uk.example.com",
        storage_bucket="data-uk-example",
        search_endpoint="search-uk.example.com",
        cache_endpoint="cache-uk.example.com",
        gdpr_compliant=True,  # UK GDPR
        adequacy_decision=True
    ),
    DataRegion.APAC: RegionConfig(
        region_id=DataRegion.APAC,
        display_name="Asia Pacific",
        aws_region="ap-southeast-1",
        database_endpoint="db-apac.example.com",
        storage_bucket="data-apac-example",
        search_endpoint="search-apac.example.com",
        cache_endpoint="cache-apac.example.com",
        gdpr_compliant=False,
        adequacy_decision=False
    ),
}


@dataclass
class TenantRegionAssignment:
    """A tenant's region assignment."""
    tenant_id: str
    data_region: DataRegion
    reason: str  # Why this region was assigned
    assigned_at: str
    can_change: bool = True  # Some contracts lock the region


class TenantRegionService:
    """
    Service for managing tenant region assignments.
    """
    
    def __init__(self, db, cache):
        self.db = db
        self.cache = cache
    
    async def get_tenant_region(self, tenant_id: str) -> RegionConfig:
        """
        Get the region configuration for a tenant.
        """
        # Check cache first
        cache_key = f"tenant_region:{tenant_id}"
        cached = await self.cache.get(cache_key)
        
        if cached:
            return REGION_CONFIGS[DataRegion(cached)]
        
        # Load from database
        result = await self.db.fetchone(
            "SELECT data_region FROM tenants WHERE id = $1",
            tenant_id
        )
        
        if not result:
            raise ValueError(f"Tenant not found: {tenant_id}")
        
        region = DataRegion(result["data_region"])
        
        # Cache for 1 hour
        await self.cache.set(cache_key, region.value, ttl=3600)
        
        return REGION_CONFIGS[region]
    
    async def assign_region(
        self,
        tenant_id: str,
        region: DataRegion,
        reason: str
    ) -> TenantRegionAssignment:
        """
        Assign a tenant to a data region.
        
        This should be done during tenant onboarding.
        Changing region later requires data migration.
        """
        from datetime import datetime
        
        # Validate region is supported
        if region not in REGION_CONFIGS:
            raise ValueError(f"Unsupported region: {region}")
        
        # Update tenant
        await self.db.execute(
            """
            UPDATE tenants 
            SET data_region = $2, region_assigned_at = $3, region_reason = $4
            WHERE id = $1
            """,
            tenant_id, region.value, datetime.utcnow(), reason
        )
        
        # Invalidate cache
        await self.cache.delete(f"tenant_region:{tenant_id}")
        
        logger.info(
            f"Assigned tenant {tenant_id} to region {region.value}",
            extra={"tenant_id": tenant_id, "region": region.value, "reason": reason}
        )
        
        return TenantRegionAssignment(
            tenant_id=tenant_id,
            data_region=region,
            reason=reason,
            assigned_at=datetime.utcnow().isoformat()
        )
    
    async def suggest_region(
        self,
        country_code: str,
        compliance_requirements: list
    ) -> DataRegion:
        """
        Suggest a region based on country and compliance needs.
        """
        # EU countries → EU region
        eu_countries = [
            "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
            "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
            "PL", "PT", "RO", "SK", "SI", "ES", "SE"
        ]
        
        if country_code in eu_countries:
            return DataRegion.EU
        
        if country_code == "GB":
            return DataRegion.UK
        
        if country_code == "US":
            return DataRegion.US
        
        if country_code == "BR":
            return DataRegion.BRAZIL
        
        if country_code == "IN":
            return DataRegion.INDIA
        
        # APAC countries
        apac_countries = ["AU", "NZ", "SG", "JP", "KR", "HK", "TW"]
        if country_code in apac_countries:
            return DataRegion.APAC
        
        # Default to US for others (with SCCs if needed)
        return DataRegion.US

4.2 Regional Database Routing

# data_residency/database_router.py

"""
Database routing based on tenant region.

Routes database queries to the correct regional database.
"""

from typing import Dict, Any
import asyncpg
import logging

from data_residency.tenant_region import (
    DataRegion,
    RegionConfig,
    TenantRegionService,
)

logger = logging.getLogger(__name__)


class RegionalDatabaseRouter:
    """
    Routes database connections to regional databases.
    
    Each tenant's data is stored in their assigned region's database.
    """
    
    def __init__(
        self,
        region_configs: Dict[DataRegion, RegionConfig],
        tenant_region_service: TenantRegionService
    ):
        self.region_configs = region_configs
        self.tenant_region_service = tenant_region_service
        self._pools: Dict[DataRegion, asyncpg.Pool] = {}
    
    async def initialize(self):
        """Initialize connection pools for all regions."""
        for region, config in self.region_configs.items():
            pool = await asyncpg.create_pool(
                host=config.database_endpoint,
                database="app_db",
                min_size=5,
                max_size=20
            )
            self._pools[region] = pool
            logger.info(f"Initialized database pool for region: {region.value}")
    
    async def get_connection(self, tenant_id: str):
        """
        Get a database connection context for a tenant.
        
        Routes to the correct regional database. Usage:
        
            async with await router.get_connection(tenant_id) as conn:
                rows = await conn.fetch(...)
        """
        # Get tenant's region
        region_config = await self.tenant_region_service.get_tenant_region(tenant_id)
        region = region_config.region_id
        
        if region not in self._pools:
            raise ValueError(f"No database pool for region: {region}")
        
        return self._pools[region].acquire()
    
    async def execute_in_region(
        self,
        region: DataRegion,
        query: str,
        *args
    ) -> Any:
        """
        Execute a query in a specific region.
        
        Used for admin operations that target a specific region.
        """
        if region not in self._pools:
            raise ValueError(f"No database pool for region: {region}")
        
        async with self._pools[region].acquire() as conn:
            return await conn.fetch(query, *args)


class RegionalStorageRouter:
    """
    Routes file storage to regional S3 buckets.
    """
    
    def __init__(
        self,
        region_configs: Dict[DataRegion, RegionConfig],
        tenant_region_service
    ):
        # tenant_region_service: a TenantRegionService instance, used to
        # resolve each tenant's assigned region.
        self.region_configs = region_configs
        self.tenant_region_service = tenant_region_service
        self._clients: Dict[DataRegion, Any] = {}
    
    async def initialize(self):
        """Initialize S3 clients for all regions."""
        import aioboto3
        
        for region, config in self.region_configs.items():
            session = aioboto3.Session()
            client = await session.client(
                's3',
                region_name=config.aws_region
            ).__aenter__()
            self._clients[region] = (client, config.storage_bucket)
    
    async def upload_file(
        self,
        tenant_id: str,
        file_key: str,
        file_data: bytes,
        content_type: str
    ) -> str:
        """
        Upload a file to the tenant's regional storage.
        """
        region_config = await self.tenant_region_service.get_tenant_region(tenant_id)
        region = region_config.region_id
        
        client, bucket = self._clients[region]
        
        # Include tenant_id in key for organization
        full_key = f"tenants/{tenant_id}/{file_key}"
        
        await client.put_object(
            Bucket=bucket,
            Key=full_key,
            Body=file_data,
            ContentType=content_type,
            Metadata={
                "tenant_id": tenant_id,
                "region": region.value
            }
        )
        
        logger.info(
            "Uploaded file to regional storage",
            extra={
                "tenant_id": tenant_id,
                "region": region.value,
                "bucket": bucket,
                "key": full_key
            }
        )
        
        return f"s3://{bucket}/{full_key}"
    
    async def get_file(
        self,
        tenant_id: str,
        file_key: str
    ) -> bytes:
        """
        Get a file from the tenant's regional storage.
        """
        region_config = await self.tenant_region_service.get_tenant_region(tenant_id)
        region = region_config.region_id
        
        client, bucket = self._clients[region]
        full_key = f"tenants/{tenant_id}/{file_key}"
        
        response = await client.get_object(Bucket=bucket, Key=full_key)
        return await response['Body'].read()

4.3 Consent Management

# data_residency/consent.py

"""
Consent management for GDPR compliance.

Tracks user consent for different processing purposes.
"""

from dataclasses import dataclass
from typing import List, Optional, Dict
from datetime import datetime
from enum import Enum
import uuid
import logging

logger = logging.getLogger(__name__)


class ConsentPurpose(Enum):
    """Purposes for which consent can be given."""
    SERVICE_DELIVERY = "service_delivery"  # Usually contract basis, not consent
    MARKETING_EMAIL = "marketing_email"
    MARKETING_PHONE = "marketing_phone"
    ANALYTICS = "analytics"
    PERSONALIZATION = "personalization"
    THIRD_PARTY_SHARING = "third_party_sharing"
    PROFILING = "profiling"
    COOKIES_ESSENTIAL = "cookies_essential"
    COOKIES_ANALYTICS = "cookies_analytics"
    COOKIES_MARKETING = "cookies_marketing"


class ConsentStatus(Enum):
    """Status of consent."""
    GRANTED = "granted"
    DENIED = "denied"
    WITHDRAWN = "withdrawn"
    NOT_ASKED = "not_asked"


@dataclass
class ConsentRecord:
    """Record of a consent decision."""
    id: str
    user_id: str
    tenant_id: str
    purpose: ConsentPurpose
    status: ConsentStatus
    granted_at: Optional[datetime]
    withdrawn_at: Optional[datetime]
    ip_address: str
    user_agent: str
    consent_text: str  # Exact text shown to user
    consent_version: str  # Version of consent form


@dataclass
class ConsentPreferences:
    """A user's current consent preferences."""
    user_id: str
    consents: Dict[ConsentPurpose, ConsentStatus]
    last_updated: datetime


class ConsentService:
    """
    Service for managing user consent.
    
    Key principles:
    - Consent must be freely given, specific, informed, unambiguous
    - Must be as easy to withdraw as to give
    - Must keep records of when/how consent was given
    - Consent is per-purpose, not blanket
    """
    
    def __init__(self, db, event_publisher):
        self.db = db
        self.events = event_publisher
    
    async def record_consent(
        self,
        user_id: str,
        tenant_id: str,
        purpose: ConsentPurpose,
        granted: bool,
        ip_address: str,
        user_agent: str,
        consent_text: str,
        consent_version: str
    ) -> ConsentRecord:
        """
        Record a consent decision.
        
        This creates an immutable audit record.
        """
        record = ConsentRecord(
            id=str(uuid.uuid4()),
            user_id=user_id,
            tenant_id=tenant_id,
            purpose=purpose,
            status=ConsentStatus.GRANTED if granted else ConsentStatus.DENIED,
            granted_at=datetime.utcnow() if granted else None,
            withdrawn_at=None,
            ip_address=ip_address,
            user_agent=user_agent,
            consent_text=consent_text,
            consent_version=consent_version
        )
        
        # Store in database (immutable log)
        await self.db.execute(
            """
            INSERT INTO consent_records 
            (id, user_id, tenant_id, purpose, status, granted_at, 
             ip_address, user_agent, consent_text, consent_version, created_at)
            VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
            """,
            record.id, record.user_id, record.tenant_id,
            record.purpose.value, record.status.value, record.granted_at,
            record.ip_address, record.user_agent, record.consent_text,
            record.consent_version, datetime.utcnow()
        )
        
        # Update current preferences
        await self._update_current_preferences(
            user_id, tenant_id, purpose,
            ConsentStatus.GRANTED if granted else ConsentStatus.DENIED
        )
        
        # Publish event for downstream systems
        await self.events.publish(
            "consent",
            {
                "type": "consent.recorded",
                "user_id": user_id,
                "tenant_id": tenant_id,
                "purpose": purpose.value,
                "granted": granted
            }
        )
        
        logger.info(
            "Consent recorded",
            extra={
                "user_id": user_id,
                "purpose": purpose.value,
                "granted": granted
            }
        )
        
        return record
    
    async def withdraw_consent(
        self,
        user_id: str,
        tenant_id: str,
        purpose: ConsentPurpose,
        ip_address: str,
        user_agent: str
    ) -> ConsentRecord:
        """
        Withdraw previously given consent.
        
        Must be as easy as giving consent.
        """
        record = ConsentRecord(
            id=str(uuid.uuid4()),
            user_id=user_id,
            tenant_id=tenant_id,
            purpose=purpose,
            status=ConsentStatus.WITHDRAWN,
            granted_at=None,
            withdrawn_at=datetime.utcnow(),
            ip_address=ip_address,
            user_agent=user_agent,
            consent_text="Consent withdrawn by user",
            consent_version="withdrawal"
        )
        
        await self.db.execute(
            """
            INSERT INTO consent_records 
            (id, user_id, tenant_id, purpose, status, withdrawn_at,
             ip_address, user_agent, consent_text, consent_version, created_at)
            VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
            """,
            record.id, record.user_id, record.tenant_id,
            record.purpose.value, record.status.value, record.withdrawn_at,
            record.ip_address, record.user_agent, record.consent_text,
            record.consent_version, datetime.utcnow()
        )
        
        await self._update_current_preferences(
            user_id, tenant_id, purpose, ConsentStatus.WITHDRAWN
        )
        
        # Publish event - systems must stop processing
        await self.events.publish(
            "consent",
            {
                "type": "consent.withdrawn",
                "user_id": user_id,
                "tenant_id": tenant_id,
                "purpose": purpose.value
            }
        )
        
        logger.info(
            "Consent withdrawn",
            extra={"user_id": user_id, "purpose": purpose.value}
        )
        
        return record
    
    async def get_current_consent(
        self,
        user_id: str,
        tenant_id: str,
        purpose: ConsentPurpose
    ) -> ConsentStatus:
        """
        Get current consent status for a purpose.
        """
        result = await self.db.fetchone(
            """
            SELECT status FROM user_consent_preferences
            WHERE user_id = $1 AND tenant_id = $2 AND purpose = $3
            """,
            user_id, tenant_id, purpose.value
        )
        
        if not result:
            return ConsentStatus.NOT_ASKED
        
        return ConsentStatus(result["status"])
    
    async def has_consent(
        self,
        user_id: str,
        tenant_id: str,
        purpose: ConsentPurpose
    ) -> bool:
        """
        Check if user has given consent for a purpose.
        """
        status = await self.get_current_consent(user_id, tenant_id, purpose)
        return status == ConsentStatus.GRANTED
    
    async def get_all_preferences(
        self,
        user_id: str,
        tenant_id: str
    ) -> ConsentPreferences:
        """
        Get all consent preferences for a user.
        """
        results = await self.db.fetch(
            """
            SELECT purpose, status, updated_at
            FROM user_consent_preferences
            WHERE user_id = $1 AND tenant_id = $2
            """,
            user_id, tenant_id
        )
        
        consents = {}
        last_updated = datetime.min
        
        for row in results:
            consents[ConsentPurpose(row["purpose"])] = ConsentStatus(row["status"])
            if row["updated_at"] > last_updated:
                last_updated = row["updated_at"]
        
        return ConsentPreferences(
            user_id=user_id,
            consents=consents,
            last_updated=last_updated
        )
    
    async def get_consent_history(
        self,
        user_id: str,
        tenant_id: str
    ) -> List[ConsentRecord]:
        """
        Get full consent history for a user.
        
        Required for data subject access requests.
        """
        results = await self.db.fetch(
            """
            SELECT * FROM consent_records
            WHERE user_id = $1 AND tenant_id = $2
            ORDER BY created_at DESC
            """,
            user_id, tenant_id
        )
        
        return [
            ConsentRecord(
                id=row["id"],
                user_id=row["user_id"],
                tenant_id=row["tenant_id"],
                purpose=ConsentPurpose(row["purpose"]),
                status=ConsentStatus(row["status"]),
                granted_at=row["granted_at"],
                withdrawn_at=row["withdrawn_at"],
                ip_address=row["ip_address"],
                user_agent=row["user_agent"],
                consent_text=row["consent_text"],
                consent_version=row["consent_version"]
            )
            for row in results
        ]
    
    async def _update_current_preferences(
        self,
        user_id: str,
        tenant_id: str,
        purpose: ConsentPurpose,
        status: ConsentStatus
    ):
        """Update current preferences table."""
        await self.db.execute(
            """
            INSERT INTO user_consent_preferences (user_id, tenant_id, purpose, status, updated_at)
            VALUES ($1, $2, $3, $4, $5)
            ON CONFLICT (user_id, tenant_id, purpose) 
            DO UPDATE SET status = $4, updated_at = $5
            """,
            user_id, tenant_id, purpose.value, status.value, datetime.utcnow()
        )
# data_residency/consent_middleware.py

"""
Middleware and decorators for consent-aware processing.
"""

from functools import wraps
from fastapi import HTTPException


def requires_consent(purpose: ConsentPurpose):
    """
    Decorator that ensures user has consented to a purpose.
    
    Usage:
        @requires_consent(ConsentPurpose.MARKETING_EMAIL)
        async def send_marketing_email(user_id: str):
            ...
    """
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Get user_id from kwargs or context
            user_id = kwargs.get("user_id")
            tenant_id = get_current_tenant_id()
            
            if not user_id:
                raise ValueError("user_id required for consent check")
            
            # Check consent
            has_consent = await consent_service.has_consent(
                user_id, tenant_id, purpose
            )
            
            if not has_consent:
                raise ConsentRequiredError(
                    f"User has not consented to {purpose.value}"
                )
            
            return await func(*args, **kwargs)
        return wrapper
    return decorator


class ConsentRequiredError(Exception):
    """Raised when required consent is not present."""
    pass


# Example usage in a service
class MarketingService:
    """Service that requires consent for operations."""
    
    def __init__(self, consent_service: ConsentService, email_client):
        self.consent = consent_service
        self.email = email_client
    
    async def send_newsletter(self, user_id: str, content: str):
        """
        Send newsletter to user.
        
        Requires marketing email consent.
        """
        tenant_id = get_current_tenant_id()
        
        # Check consent before sending
        if not await self.consent.has_consent(
            user_id, tenant_id, ConsentPurpose.MARKETING_EMAIL
        ):
            logger.info(
                "Skipping newsletter for user without consent",
                extra={"user_id": user_id}
            )
            return False
        
        await self.email.send(
            to=user_id,
            subject="Newsletter",
            content=content
        )
        
        return True
    
    async def send_bulk_newsletter(self, user_ids: List[str], content: str):
        """
        Send newsletter to multiple users.
        
        Filters to only users with consent.
        """
        tenant_id = get_current_tenant_id()
        
        # Batch check consent
        consented_users = []
        
        for user_id in user_ids:
            if await self.consent.has_consent(
                user_id, tenant_id, ConsentPurpose.MARKETING_EMAIL
            ):
                consented_users.append(user_id)
        
        logger.info(
            f"Sending newsletter to {len(consented_users)}/{len(user_ids)} users with consent"
        )
        
        for user_id in consented_users:
            await self.email.send(
                to=user_id,
                subject="Newsletter",
                content=content
            )
        
        return len(consented_users)
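The per-recipient loop above issues one consent lookup per user, which gets slow for large lists. A single batched query is preferable. Here is a hedged sketch against the same `user_consent_preferences` table; the `FakeDB` stand-in for an asyncpg-style pool is an illustrative assumption, not a real driver:

```python
import asyncio


class FakeDB:
    """In-memory stand-in for the asyncpg-style pool used above (test double)."""

    def __init__(self, granted):
        self.granted = granted  # user_ids with status = 'granted'

    async def fetch(self, query, tenant_id, purpose, user_ids):
        # A real pool would execute the SQL; here we emulate its result rows.
        return [{"user_id": u} for u in user_ids if u in self.granted]


async def filter_consented(db, user_ids, tenant_id, purpose):
    """One round-trip instead of len(user_ids) separate consent checks."""
    rows = await db.fetch(
        """
        SELECT user_id FROM user_consent_preferences
        WHERE tenant_id = $1 AND purpose = $2
          AND status = 'granted' AND user_id = ANY($3)
        """,
        tenant_id, purpose, user_ids,
    )
    return [r["user_id"] for r in rows]


db = FakeDB(granted={"u1", "u3"})
consented = asyncio.run(
    filter_consented(db, ["u1", "u2", "u3"], "tenant-1", "marketing_email")
)
print(consented)  # ['u1', 'u3']
```

The SQL itself is what matters: `user_id = ANY($3)` turns N lookups into one query, which also keeps the consent check atomic against a moving preferences table.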

Chapter 6: Cross-Border Data Transfers

6.1 Transfer Impact Assessment

# data_residency/transfer_assessment.py

"""
Cross-border data transfer assessment and documentation.
"""

from dataclasses import dataclass
from typing import List, Optional
from datetime import datetime
from enum import Enum


class TransferMechanism(Enum):
    """Legal mechanisms for cross-border transfers."""
    ADEQUACY_DECISION = "adequacy_decision"
    STANDARD_CONTRACTUAL_CLAUSES = "sccs"
    BINDING_CORPORATE_RULES = "bcrs"
    DEROGATION = "derogation"
    CONSENT = "consent"


class RiskLevel(Enum):
    """Risk level for data transfers."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    PROHIBITED = "prohibited"


@dataclass
class TransferAssessment:
    """Assessment of a cross-border data transfer."""
    id: str
    source_region: str
    destination_region: str
    data_categories: List[str]
    data_subjects: str  # Description of affected individuals
    transfer_mechanism: TransferMechanism
    risk_level: RiskLevel
    supplementary_measures: List[str]
    assessment_date: datetime
    next_review_date: datetime
    approved_by: str
    notes: str


class TransferImpactAssessment:
    """
    Performs Transfer Impact Assessments (TIAs) as required by Schrems II.
    """
    
    # Countries with adequacy decisions (simplified)
    ADEQUATE_COUNTRIES = {
        "EU", "EEA", "GB", "CH", "JP", "KR", "CA", "NZ", "IL", "UY", "AR"
    }
    
    # Countries with high surveillance risk (simplified assessment)
    HIGH_RISK_COUNTRIES = {
        "CN", "RU"  # This is a simplification - real assessment is more nuanced
    }
    
    def assess_transfer(
        self,
        source_country: str,
        destination_country: str,
        data_categories: List[str],
        special_categories: bool = False
    ) -> TransferAssessment:
        """
        Assess a proposed data transfer.
        """
        # Same country = no cross-border transfer
        if source_country == destination_country:
            return self._create_assessment(
                source_country, destination_country, data_categories,
                TransferMechanism.ADEQUACY_DECISION,  # Not really a transfer
                RiskLevel.LOW,
                []
            )
        
        # Check adequacy
        if destination_country in self.ADEQUATE_COUNTRIES:
            return self._create_assessment(
                source_country, destination_country, data_categories,
                TransferMechanism.ADEQUACY_DECISION,
                RiskLevel.LOW,
                []
            )
        
        # US-specific handling (Data Privacy Framework)
        if destination_country == "US":
            return self._create_assessment(
                source_country, destination_country, data_categories,
                TransferMechanism.ADEQUACY_DECISION,  # DPF
                RiskLevel.MEDIUM,  # Some risk remains
                ["Verify recipient is DPF certified",
                 "Review specific data categories"]
            )
        
        # High-risk countries
        if destination_country in self.HIGH_RISK_COUNTRIES:
            if special_categories:
                return self._create_assessment(
                    source_country, destination_country, data_categories,
                    TransferMechanism.DEROGATION,
                    RiskLevel.PROHIBITED,
                    ["Transfer of special categories to this jurisdiction is not recommended"]
                )
            
            return self._create_assessment(
                source_country, destination_country, data_categories,
                TransferMechanism.STANDARD_CONTRACTUAL_CLAUSES,
                RiskLevel.HIGH,
                ["Implement encryption in transit and at rest",
                 "Minimize data transferred",
                 "Regular review of legal situation",
                 "Consider pseudonymization"]
            )
        
        # Default: SCCs with supplementary measures
        return self._create_assessment(
            source_country, destination_country, data_categories,
            TransferMechanism.STANDARD_CONTRACTUAL_CLAUSES,
            RiskLevel.MEDIUM,
            ["Sign SCCs with recipient",
             "Document supplementary measures",
             "Review annually"]
        )
    
    def _create_assessment(
        self,
        source: str,
        dest: str,
        data_categories: List[str],
        mechanism: TransferMechanism,
        risk: RiskLevel,
        measures: List[str]
    ) -> TransferAssessment:
        """Create assessment record."""
        import uuid
        from datetime import timedelta
        
        return TransferAssessment(
            id=str(uuid.uuid4()),
            source_region=source,
            destination_region=dest,
            data_categories=data_categories,
            data_subjects="Users and employees of tenant",
            transfer_mechanism=mechanism,
            risk_level=risk,
            supplementary_measures=measures,
            assessment_date=datetime.utcnow(),
            next_review_date=datetime.utcnow() + timedelta(days=365),
            approved_by="",
            notes=""
        )
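One way the assessment output gets used downstream: gate export jobs on the recorded risk level. A minimal self-contained sketch — `RiskLevel` is redeclared here so the snippet runs standalone, and the `export_allowed` policy is an assumption of this article, not a rule from any regulation:

```python
from enum import Enum


class RiskLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    PROHIBITED = "prohibited"


def export_allowed(risk: RiskLevel, measures_on_file: bool) -> bool:
    """Block exports whose TIA came back prohibited; require documented
    supplementary measures for anything above low risk."""
    if risk == RiskLevel.PROHIBITED:
        return False
    if risk in (RiskLevel.MEDIUM, RiskLevel.HIGH):
        return measures_on_file
    return True


print(export_allowed(RiskLevel.LOW, False))        # True
print(export_allowed(RiskLevel.HIGH, False))       # False
print(export_allowed(RiskLevel.HIGH, True))        # True
print(export_allowed(RiskLevel.PROHIBITED, True))  # False
```

Wiring the TIA into the pipeline like this makes the assessment an enforced control rather than a document that goes stale in a drawer.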

6.2 Data Processing Agreements

# data_residency/dpa.py

"""
Data Processing Agreement (DPA) management.

GDPR Article 28 requires written contracts with data processors.
"""

from dataclasses import dataclass
from typing import List, Optional
from datetime import datetime
from enum import Enum


class DPAStatus(Enum):
    """Status of a DPA."""
    DRAFT = "draft"
    PENDING_SIGNATURE = "pending_signature"
    ACTIVE = "active"
    EXPIRED = "expired"
    TERMINATED = "terminated"


@dataclass
class DataProcessingAgreement:
    """A Data Processing Agreement with a tenant or vendor."""
    id: str
    tenant_id: str
    counterparty_name: str
    counterparty_type: str  # "customer" or "vendor"
    
    # Processing details
    processing_purposes: List[str]
    data_categories: List[str]
    data_subject_categories: List[str]
    retention_period: str
    
    # Transfer details
    processing_locations: List[str]
    subprocessors: List[str]
    transfer_mechanism: Optional[str]
    
    # Agreement details
    status: DPAStatus
    signed_date: Optional[datetime]
    effective_date: Optional[datetime]
    expiration_date: Optional[datetime]
    document_url: str
    
    # Audit
    created_at: datetime
    updated_at: datetime


class DPAService:
    """
    Service for managing Data Processing Agreements.
    """
    
    def __init__(self, db, document_storage):
        self.db = db
        self.storage = document_storage
    
    async def create_dpa(
        self,
        tenant_id: str,
        counterparty_name: str,
        counterparty_type: str,
        processing_purposes: List[str],
        data_categories: List[str],
        processing_locations: List[str]
    ) -> DataProcessingAgreement:
        """
        Create a new DPA.
        """
        import uuid
        
        dpa_id = str(uuid.uuid4())
        
        dpa = DataProcessingAgreement(
            id=dpa_id,
            tenant_id=tenant_id,
            counterparty_name=counterparty_name,
            counterparty_type=counterparty_type,
            processing_purposes=processing_purposes,
            data_categories=data_categories,
            data_subject_categories=["Employees", "End users"],
            retention_period="As specified in main agreement",
            processing_locations=processing_locations,
            subprocessors=[],
            transfer_mechanism=None,
            status=DPAStatus.DRAFT,
            signed_date=None,
            effective_date=None,
            expiration_date=None,
            document_url="",
            created_at=datetime.utcnow(),
            updated_at=datetime.utcnow()
        )
        
        # Store in database
        await self._save_dpa(dpa)
        
        return dpa
    
    async def get_active_dpas(self, tenant_id: str) -> List[DataProcessingAgreement]:
        """Get all active DPAs for a tenant."""
        results = await self.db.fetch(
            """
            SELECT * FROM data_processing_agreements
            WHERE tenant_id = $1 AND status = 'active'
            """,
            tenant_id
        )
        
        return [self._row_to_dpa(row) for row in results]
    
    async def get_subprocessors(self, tenant_id: str) -> List[dict]:
        """
        Get list of subprocessors for a tenant.
        
        Required for GDPR transparency.
        """
        # Our subprocessors (third-party services we use)
        our_subprocessors = [
            {
                "name": "Amazon Web Services",
                "purpose": "Cloud infrastructure",
                "location": "EU (Frankfurt)",
                "dpa_url": "https://aws.amazon.com/compliance/gdpr-center/"
            },
            {
                "name": "Stripe",
                "purpose": "Payment processing",
                "location": "US (with DPF certification)",
                "dpa_url": "https://stripe.com/legal/dpa"
            },
            {
                "name": "SendGrid",
                "purpose": "Email delivery",
                "location": "US (with SCCs)",
                "dpa_url": "https://sendgrid.com/policies/dpa/"
            }
        ]
        
        return our_subprocessors
    
    async def _save_dpa(self, dpa: DataProcessingAgreement):
        """Save DPA to database."""
        await self.db.execute(
            """
            INSERT INTO data_processing_agreements 
            (id, tenant_id, counterparty_name, counterparty_type,
             processing_purposes, data_categories, processing_locations,
             status, created_at, updated_at)
            VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
            """,
            dpa.id, dpa.tenant_id, dpa.counterparty_name, dpa.counterparty_type,
            dpa.processing_purposes, dpa.data_categories, dpa.processing_locations,
            dpa.status.value, dpa.created_at, dpa.updated_at
        )
    
    def _row_to_dpa(self, row) -> DataProcessingAgreement:
        """Convert database row to DPA object."""
        return DataProcessingAgreement(
            id=row["id"],
            tenant_id=row["tenant_id"],
            counterparty_name=row["counterparty_name"],
            counterparty_type=row["counterparty_type"],
            processing_purposes=row["processing_purposes"],
            data_categories=row["data_categories"],
            data_subject_categories=row.get("data_subject_categories", []),
            retention_period=row.get("retention_period", ""),
            processing_locations=row["processing_locations"],
            subprocessors=row.get("subprocessors", []),
            transfer_mechanism=row.get("transfer_mechanism"),
            status=DPAStatus(row["status"]),
            signed_date=row.get("signed_date"),
            effective_date=row.get("effective_date"),
            expiration_date=row.get("expiration_date"),
            document_url=row.get("document_url", ""),
            created_at=row["created_at"],
            updated_at=row["updated_at"]
        )

Part III: Real-World Application

Chapter 7: Case Studies

7.1 Slack's Data Residency

SLACK DATA RESIDENCY ARCHITECTURE

Challenge:
├── Global customer base
├── Enterprise customers need EU data residency
├── Real-time messaging requires low latency
└── Collaboration features need cross-region access

Solution: DATA RESIDENCY FOR ENTERPRISE GRID

┌────────────────────────────────────────────────────────────────────────┐
│                                                                        │
│  SLACK ENTERPRISE GRID ARCHITECTURE                                    │
│                                                                        │
│  Global Services (Metadata):                                           │
│  ├── Authentication/SSO                                                │
│  ├── Workspace directory                                               │
│  ├── Routing information                                               │
│  └── Feature configuration                                             │
│                                                                        │
│  Regional Data Stores:                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐         │
│  │   US Region     │  │   EU Region     │  │  GovCloud       │         │
│  │   ───────────   │  │   ───────────   │  │  ───────────    │         │
│  │   Messages      │  │   Messages      │  │  Messages       │         │
│  │   Files         │  │   Files         │  │  Files          │         │
│  │   User profiles │  │   User profiles │  │  User profiles  │         │
│  │   Search index  │  │   Search index  │  │  Search index   │         │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘         │
│                                                                        │
│  Per-Organization Choice:                                              │
│  ├── Organization assigned to one region                               │
│  ├── All data for that org stays in region                             │
│  └── Slack Connect (cross-org) respects both orgs' residency           │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

Key Decisions:
├── ORGANIZATION = RESIDENCY BOUNDARY
│   └── Not user-level, org-level
│
├── METADATA CAN BE GLOBAL
│   ├── Workspace IDs, routing info
│   └── Not personal data
│
├── ENCRYPTION AT REST
│   ├── Customer-managed keys (Enterprise Key Management)
│   └── Per-organization keys
│
└── SLACK CONNECT HANDLING
    ├── Messages between orgs stay in most restrictive region
    └── Both parties must allow the connection

Lessons:
├── Organization-level residency is manageable
├── Distinguish metadata from content
├── Encryption adds extra protection
└── Cross-org features need careful design
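The "most restrictive region wins" rule for cross-org features can be expressed as a simple ordering. A sketch of the pattern — the strictness ranking below is hypothetical, not Slack's actual policy:

```python
# Higher number = more restrictive (hypothetical ranking)
STRICTNESS = {"global": 0, "us": 1, "eu": 2, "govcloud": 3}


def shared_data_region(org_a_region: str, org_b_region: str) -> str:
    """Cross-org content lands in the more restrictive org's region."""
    return max(org_a_region, org_b_region, key=STRICTNESS.__getitem__)


print(shared_data_region("us", "eu"))        # eu
print(shared_data_region("eu", "govcloud"))  # govcloud
```

The point is that the resolution rule is deterministic and total: any pair of orgs maps to exactly one storage region, so the routing layer never has to guess.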

7.2 AWS Regional Architecture

AWS APPROACH TO DATA RESIDENCY

AWS provides building blocks for customers to implement residency:

REGIONAL SERVICES:
├── Data stays in chosen region by default
├── Customer controls replication
├── Some services (IAM, Route53) are global
└── S3 buckets are regional; cross-region replication is opt-in

TOOLS FOR COMPLIANCE:
├── AWS Config Rules
│   └── Detect resources outside approved regions
│
├── Service Control Policies (SCPs)
│   └── Prevent creating resources in wrong regions
│
├── AWS Artifact
│   └── Compliance reports and DPAs
│
└── Data residency guardrails
    └── AWS Control Tower for multi-account

EXAMPLE SCP FOR EU-ONLY (real policies typically carve out global services such as IAM via NotAction):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "eu-central-1",
            "eu-west-1",
            "eu-west-2",
            "eu-west-3",
            "eu-north-1"
          ]
        }
      }
    }
  ]
}

Lessons:
├── Cloud providers offer tools, not solutions
├── You must architect for residency
├── Policy enforcement prevents accidents
└── Global services need special handling
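The detection side of these guardrails boils down to comparing an inventory against an allow-list. A cloud-agnostic sketch — the inventory format is an assumption; in practice an AWS Config rule or a scheduled resource-listing export would feed it:

```python
APPROVED_REGIONS = {"eu-central-1", "eu-west-1", "eu-west-2",
                    "eu-west-3", "eu-north-1"}


def find_violations(inventory):
    """Return (resource_id, region) pairs outside the approved regions.

    `inventory` is an iterable of (resource_id, region) tuples,
    e.g. parsed from a cloud inventory export.
    """
    return [(rid, region) for rid, region in inventory
            if region not in APPROVED_REGIONS]


inventory = [
    ("rds/tenant-db", "eu-central-1"),
    ("s3/backups", "us-east-1"),      # residency violation
]
print(find_violations(inventory))  # [('s3/backups', 'us-east-1')]
```

SCPs prevent new violations; a scan like this catches resources that predate the policy, which is exactly the backups-in-the-wrong-region mistake covered in Chapter 8.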

Chapter 8: Common Mistakes

8.1 Data Residency Anti-Patterns

DATA RESIDENCY MISTAKES

❌ MISTAKE 1: Forgetting About Backups

Wrong:
  # Data in EU region
  database = "eu-central-1-db.example.com"
  
  # But backups go to US!
  backup_bucket = "s3://backups-us-east-1/"

Problem:
  Backups are still personal data
  US backups violate EU residency

Right:
  # Data and backups in same region
  database = "eu-central-1-db.example.com"
  backup_bucket = "s3://backups-eu-central-1/"


❌ MISTAKE 2: Logging PII to Global Services

Wrong:
  # Global Datadog/Splunk instance
  logger.info(f"User {user.email} from {user.country} logged in")
  # PII now in US logging infrastructure

Problem:
  User email is personal data
  Now stored in global logging service

Right:
  # Log without PII, or use regional logging
  logger.info(
      "User logged in",
      extra={"user_id": user.id, "region": user.region}
  )


❌ MISTAKE 3: Analytics Without Consent

Wrong:
  # Track everything, figure out consent later
  analytics.track("page_view", {
      "user_id": user.id,
      "page": request.path,
      "ip": request.client.ip
  })

Problem:
  Analytics tracking may require consent
  IP addresses are personal data

Right:
  if await consent_service.has_consent(user.id, ConsentPurpose.ANALYTICS):
      analytics.track("page_view", {
          "user_id": user.id,
          "page": request.path
          # No IP - minimize data
      })


❌ MISTAKE 4: Third-Party Services Without DPAs

Wrong:
  # Just use Mixpanel, they're big so probably fine
  mixpanel.track(user.email, "signup")

Problem:
  No DPA with Mixpanel
  Data transferred to US without safeguards
  You're liable as data controller

Right:
  # Verify DPA exists, use SCCs, minimize data
  if mixpanel_dpa_signed:
      mixpanel.track(
          anonymize(user.id),  # Not email
          "signup",
          {"region": user.region}
      )


❌ MISTAKE 5: Assuming Adequacy Decisions Are Permanent

Wrong:
  # US has Privacy Shield, we're fine forever!
  transfer_data_to_us(eu_user_data)

Problem:
  Privacy Shield was invalidated (Schrems II)
  Adequacy decisions can be revoked
  
Right:
  # Monitor regulatory changes
  # Have fallback mechanisms
  # Document your transfer assessment
  if us_adequacy_valid():
      transfer_data_to_us(eu_user_data)
  else:
      use_sccs_with_supplementary_measures(eu_user_data)

Part IV: Interview Preparation

Chapter 9: Interview Tips

9.1 Data Residency Discussion Framework

DISCUSSING DATA RESIDENCY IN INTERVIEWS

When the topic comes up:

1. CLARIFY REQUIREMENTS
   "What are the data residency requirements? Are we dealing with
    GDPR (EU), specific country laws, or enterprise customer demands?"

2. IDENTIFY DATA CATEGORIES
   "Let me categorize the data:
    - Personal data that needs residency: user profiles, content
    - Metadata that might be global: routing, configuration
    - Truly anonymous data: aggregated analytics"

3. PROPOSE ARCHITECTURE
   "I'd implement regional deployments with a global control plane.
    Each tenant is assigned to a region during onboarding. Personal
    data stays in that region. Metadata and routing information
    can be global since it's not personal data."

4. ADDRESS CROSS-REGION
   "For features that span regions, like messaging between users
    in different regions, data stays in the more restrictive region.
    Or we block cross-region features for strict compliance tenants."

5. MENTION ENFORCEMENT
   "I'd use infrastructure-as-code with policy enforcement to prevent
    accidental data leakage. AWS SCPs or GCP Organization Policies
    can block resource creation in wrong regions."

9.2 Key Phrases

DATA RESIDENCY KEY PHRASES

On Regional Architecture:
"I'd deploy regional data stores with a global control plane. The
control plane handles routing and metadata - things that aren't
personal data. All personal data stays in the tenant's assigned
region, including backups and logs."

On GDPR Transfers:
"For cross-border transfers, we need a legal basis. If the destination
has an adequacy decision, we're good. Otherwise, we need Standard
Contractual Clauses with supplementary measures. Post-Schrems II,
we also need a Transfer Impact Assessment."

On Consent:
"Consent must be freely given, specific, informed, and unambiguous.
I'd implement a consent management system that records the exact
text shown, timestamp, IP, and allows easy withdrawal. Different
purposes need separate consent - no bundling."

On Third Parties:
"Every third-party processor needs a Data Processing Agreement.
We need to track subprocessors and their locations. If they process
EU data in the US, they need appropriate safeguards like DPF
certification or SCCs."

Chapter 10: Practice Problems

Problem 1: Multi-Region SaaS

Scenario: Your B2B SaaS has customers in US, EU, and Asia. EU customers require GDPR compliance including data residency. You currently have one region (us-east-1).

Questions:

  1. How do you migrate to support EU data residency?
  2. What happens to features that need cross-region data?
  3. How do you handle a user who moves from EU to US?
Hints:

  • Add EU region with separate database
  • Tenant-level region assignment
  • Cross-region features: either block or store in most restrictive
  • User moving: they might need to be re-assigned to new region
  • Consider data migration tools and procedures

Problem 2: Analytics Pipeline Compliance

Scenario: You run analytics on user behavior using BigQuery (US). EU customers are complaining about GDPR compliance.

Questions:

  1. Can you continue using BigQuery for EU user data?
  2. What changes would make this compliant?
  3. How do you handle historical data that's already in BigQuery?
Hints:

  • BigQuery has EU regions - use them for EU data
  • Anonymize/aggregate before cross-border transfer
  • Historical data: delete or anonymize
  • Consider consent basis for analytics
  • Document Transfer Impact Assessment
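The "aggregate before transfer" hint can be made concrete: collapse raw events into counts that carry no per-user fields, and drop small buckets so individuals cannot be singled out. A sketch — the event shape and the `k_min` threshold are assumptions for illustration:

```python
from collections import Counter


def aggregate_page_views(events, k_min=10):
    """Collapse raw events into (country, page) -> count.

    The output carries no user_id or IP, and buckets smaller than
    k_min are dropped as a crude re-identification guard, making the
    result a candidate for leaving the region as non-personal data.
    """
    counts = Counter((e["country"], e["page"]) for e in events)
    return {key: n for key, n in counts.items() if n >= k_min}


events = (
    [{"country": "DE", "page": "/pricing", "user_id": f"u{i}"} for i in range(12)]
    + [{"country": "FR", "page": "/pricing", "user_id": "u99"}]
)
print(aggregate_page_views(events))  # {('DE', '/pricing'): 12}
```

Whether aggregated output truly falls outside the GDPR is a legal judgment (small buckets can still identify people), which is why the threshold and the overall approach belong in the documented Transfer Impact Assessment.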

Chapter 11: Sample Interview Dialogue

Interviewer: "We need to serve EU customers. How do you handle GDPR compliance?"

You: "GDPR compliance has several aspects. Let me break it down by the main requirements.

For data residency, I'd deploy EU infrastructure - database, storage, and search in eu-central-1 or eu-west-1. Each tenant is assigned a region during onboarding based on their location. All personal data stays in that region.

For lawful basis, we'd use contract performance for core functionality - storing their data to provide the service. For marketing or analytics, we need consent. I'd implement a consent management system..."

CONSENT FLOW

User signs up
     │
     ▼
Show consent form:
├── Essential cookies: Required (strictly necessary - no consent needed)
├── Analytics: Optional, unchecked by default
└── Marketing: Optional, unchecked by default
     │
     ▼
Record consent with timestamp, IP, exact text shown
     │
     ▼
If they later withdraw, immediately stop processing

Interviewer: "What about our analytics that currently runs in the US?"

You: "A few options:

  1. Regional analytics: Run BigQuery in EU multi-region for EU data. More expensive but cleanest.

  2. Anonymize before transfer: Aggregate data to the point it's no longer personal data before sending to US. For example, 'Users in Germany viewed page X 1000 times' is not personal data.

  3. Transfer with safeguards: Use BigQuery in US but with SCCs and supplementary measures. Requires Transfer Impact Assessment and ongoing monitoring of US surveillance laws.

I'd recommend option 1 for personal data and option 2 for aggregated metrics. We'd need to document this in our Records of Processing Activities."

Interviewer: "How do you prove compliance to customers?"

You: "Several mechanisms:

  • DPA signing: Automated DPA generation and signing during enterprise onboarding
  • Subprocessor list: Published list of all third parties that process data
  • Data residency documentation: Architecture diagrams showing data flows
  • Audit logs: Records of all data access, exportable for audits
  • Certifications: SOC 2 Type II, ISO 27001 for security controls

For enterprise customers, we could offer a compliance portal showing their data location, consent records, and processing activities."


Summary

DAY 3 KEY TAKEAWAYS

DATA RESIDENCY BASICS:
├── Residency = where data is stored
├── Sovereignty = which laws apply
├── Localization = legal requirement to keep data in-country
└── Transfer = moving data across borders

KEY REGULATIONS:
├── GDPR (EU): Most influential, extraterritorial
├── LGPD (Brazil): GDPR-like
├── PIPL (China): Strict localization
├── DPDP (India): Emerging requirements
└── Various country-specific laws

ARCHITECTURE PATTERNS:
├── Single region: Simple but limited compliance
├── Regional deployments: Full isolation
├── Global control + regional data: Balance
└── Per-tenant region: Maximum flexibility

IMPLEMENTATION:
├── Tenant region assignment at onboarding
├── Regional database routing
├── Regional storage routing
├── Consent management system
└── DPA tracking

GDPR TRANSFERS:
├── Adequacy decision: Easiest
├── SCCs: Most common
├── BCRs: For corporate groups
└── Supplementary measures: Post-Schrems II

CONSENT REQUIREMENTS:
├── Freely given
├── Specific (per purpose)
├── Informed (clear language)
├── Unambiguous (affirmative action)
└── Withdrawable (as easy as giving)

COMMON MISTAKES:
├── Forgetting backups
├── PII in global logs
├── Analytics without consent
├── Missing DPAs
└── Assuming adequacy is permanent

DEFAULT APPROACH:
├── Regional data stores for personal data
├── Global control plane for metadata
├── Consent management from day one
├── DPAs with all processors
└── Document everything

Further Reading

Compliance Tools:

  • OneTrust (consent management)
  • TrustArc (privacy management)
  • BigID (data discovery)


End of Day 3: Data Residency and GDPR

Tomorrow: Day 4 — Right to Deletion. We'll learn how to actually delete user data when it's spread across dozens of systems - the hardest GDPR requirement to implement.