Week 9 — Day 3: Data Residency and GDPR
System Design Mastery Series — Multi-Tenancy, Security, and Compliance Week
Preface
Your startup just closed a deal with a German enterprise customer. Celebration! Then their legal team sends this:
THE DATA RESIDENCY REALITY
From: legal@deutschebank.example.de
Subject: Data Processing Requirements
Dear Vendor,
Before we can proceed, please confirm:
1. All personal data of our employees will be stored exclusively
in the European Union
2. No personal data will be transferred to the United States
or any other third country without appropriate safeguards
3. You will sign our Data Processing Agreement (DPA) based on
GDPR Article 28
4. You can demonstrate compliance with:
- GDPR (EU)
- BDSG (German Federal Data Protection Act)
- Schrems II ruling implications
5. You can provide data residency certification and audit logs
Please provide your technical architecture showing how you ensure
EU data stays in the EU.
Regards,
Deutsche Bank Legal
---
You look at your architecture:
Current state:
├── Single AWS region: us-east-1
├── All data in one PostgreSQL cluster
├── Backups replicated to us-west-2
├── Analytics processed in BigQuery (US)
├── Customer support via Zendesk (US)
└── Error tracking via Sentry (US)
Your response options:
❌ "Sorry, we can't do this" (lose the deal)
❌ "Sure, we comply!" (lie and risk massive fines)
✓ "We need to re-architect" (this week's lesson)
Today, we'll learn how to build systems that respect data residency requirements while running a global platform.
Part I: Foundations
Chapter 1: Understanding Data Residency
1.1 What Is Data Residency?
Data residency refers to the physical or geographic location where data is stored and processed. Data residency requirements are laws or policies that mandate data must remain within specific geographic boundaries.
DATA RESIDENCY CONCEPTS
DATA RESIDENCY:
├── Where data is physically stored
├── Geographic location of servers
├── Can be chosen by organization
└── Example: "Our EU data is in Frankfurt"
DATA SOVEREIGNTY:
├── Legal jurisdiction over data
├── Which country's laws apply
├── Based on storage location + company location
└── Example: "EU data subject to GDPR"
DATA LOCALIZATION:
├── Legal requirement to keep data in-country
├── Government-mandated residency
├── Often for national security or privacy
└── Example: "Russian personal data must stay in Russia"
DATA TRANSFER:
├── Moving data across borders
├── May require legal basis
├── Subject to adequacy decisions
└── Example: "EU to US transfer requires SCCs"
1.2 Why Data Residency Matters
REGULATORY LANDSCAPE
EUROPEAN UNION (GDPR):
├── Applies to: EU residents' personal data
├── Key rules:
│ ├── Transfers outside EU need legal basis
│ ├── Adequacy decisions (few countries qualify)
│ ├── Standard Contractual Clauses (SCCs)
│ └── Binding Corporate Rules (BCRs)
├── Fines: Up to €20M or 4% of global annual revenue, whichever is higher
└── Notable: Schrems II invalidated Privacy Shield
RUSSIA (Federal Law 242-FZ):
├── Applies to: Russian citizens' personal data
├── Key rules:
│ ├── Initial collection must be in Russia
│ ├── Primary database must be in Russia
│ └── Cross-border transfer allowed after local storage
├── Enforcement: Website blocking, fines
└── Notable: LinkedIn blocked in Russia
CHINA (PIPL + DSL + Cybersecurity Law):
├── Applies to: Data collected in China
├── Key rules:
│ ├── Critical data must stay in China
│ ├── Security assessment for cross-border
│ └── Government access requirements
├── Enforcement: Business license revocation
└── Notable: Extremely broad scope
INDIA (DPDP Act 2023):
├── Applies to: Indian residents' data
├── Key rules:
│ ├── Certain data cannot leave India
│ ├── Government notification for transfers
│ └── Still evolving
├── Enforcement: Fines up to ₹250 crore
└── Notable: Localization for payment data (RBI)
BRAZIL (LGPD):
├── Applies to: Brazilian residents' data
├── Key rules:
│ ├── Similar to GDPR
│ ├── Adequate protection required for transfers
│ └── Consent or legitimate interest basis
├── Enforcement: Fines up to 2% of Brazilian revenue, capped at R$50M per violation
└── Notable: Closely modeled on GDPR
1.3 GDPR Deep Dive
GDPR KEY CONCEPTS FOR ENGINEERS
PERSONAL DATA (Article 4):
├── Any information relating to identified/identifiable person
├── Examples:
│ ├── Name, email, phone number
│ ├── IP address, cookie IDs
│ ├── Location data
│ ├── Behavioral data
│ └── Device identifiers
└── Note: Pseudonymized data is still personal data
SPECIAL CATEGORIES (Article 9):
├── Extra protection required for:
│ ├── Racial or ethnic origin
│ ├── Political opinions
│ ├── Religious beliefs
│ ├── Trade union membership
│ ├── Genetic/biometric data
│ ├── Health data
│ └── Sexual orientation
└── Generally prohibited without explicit consent
DATA SUBJECT RIGHTS:
├── Right to access (Article 15)
├── Right to rectification (Article 16)
├── Right to erasure (Article 17) ← Tomorrow's topic
├── Right to data portability (Article 20)
├── Right to object (Article 21)
└── Rights related to automated decisions (Article 22)
LAWFUL BASIS FOR PROCESSING (Article 6):
├── Consent (explicit, informed, withdrawable)
├── Contract performance
├── Legal obligation
├── Vital interests
├── Public interest
└── Legitimate interests (balance test)
CROSS-BORDER TRANSFERS (Chapter 5):
├── Adequacy decision (safest)
│ └── Countries: Andorra, Argentina, Canada (commercial),
│ Faroe Islands, Guernsey, Israel, Isle of Man, Japan,
│ Jersey, New Zealand, South Korea, Switzerland,
│ UK, Uruguay, and US (new Data Privacy Framework)
├── Standard Contractual Clauses (SCCs)
├── Binding Corporate Rules (BCRs)
├── Derogations (limited cases)
└── Note: Must assess destination country's laws
Chapter 2: Data Residency Architecture Patterns
2.1 Architecture Options
DATA RESIDENCY PATTERNS
PATTERN 1: SINGLE REGION (Limited Compliance)
─────────────────────────────────────────────
┌───────────────────────────────────────────────────────────────────────┐
│ │
│ US-EAST-1 (Virginia) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ App │ │ Database │ │ Storage │ │
│ │ Servers │ │ Cluster │ │ (S3) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ ALL data from ALL regions stored here │
│ │
└───────────────────────────────────────────────────────────────────────┘
Pros: Simple, cheap
Cons: Can't comply with data residency laws
Use: Internal tools, non-regulated data
PATTERN 2: REGIONAL DEPLOYMENTS (Full Isolation)
────────────────────────────────────────────────
┌───────────────────────┐ ┌───────────────────────┐ ┌───────────────────────┐
│ US Deployment │ │ EU Deployment │ │ APAC Deployment │
│ (us-east-1) │ │ (eu-central-1) │ │ (ap-southeast-1) │
│ ┌───────────────────┐ │ │ ┌───────────────────┐ │ │ ┌───────────────────┐ │
│ │ App + DB + Storage│ │ │ │ App + DB + Storage│ │ │ │ App + DB + Storage│ │
│ │ US customers only │ │ │ │ EU customers only │ │ │ │ APAC customers │ │
│ └───────────────────┘ │ │ └───────────────────┘ │ │ └───────────────────┘ │
└───────────────────────┘ └───────────────────────┘ └───────────────────────┘
│ │ │
└──────────────────────────┴─────────────────────────┘
NO DATA SHARING BETWEEN REGIONS
Pros: Strong compliance, data stays in region
Cons: Complex, expensive, no cross-region features
Use: Highly regulated industries, strict localization
PATTERN 3: REGIONAL DATA, GLOBAL CONTROL PLANE
──────────────────────────────────────────────
┌───────────────────────┐
│ Global Control │
│ (Metadata only) │
│ - Tenant registry │
│ - Configuration │
│ - Routing rules │
└───────────┬───────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ US Region │ │ EU Region │ │ APAC Region │
│ ┌─────────┐ │ │ ┌─────────┐ │ │ ┌─────────┐ │
│ │ US Data │ │ │ │ EU Data │ │ │ │APAC Data│ │
│ └─────────┘ │ │ └─────────┘ │ │ └─────────┘ │
└───────────────┘ └───────────────┘ └───────────────┘
Pros: Compliance + some global features
Cons: Complex routing; must be careful about what counts as "metadata"
Use: Most SaaS companies with compliance needs
PATTERN 4: DATA RESIDENCY BY TENANT
───────────────────────────────────
┌────────────────────────────────────────────────────────────────────────┐
│ Global Application │
│ │
│ Request → Tenant Lookup → Route to Tenant's Region → Process │
│ │
│ Tenant A (US) → us-east-1 │
│ Tenant B (EU) → eu-central-1 │
│ Tenant C (EU) → eu-central-1 │
│ Tenant D (APAC) → ap-southeast-1 │
│ │
└────────────────────────────────────────────────────────────────────────┘
Pros: Flexible, per-tenant compliance
Cons: Routing complexity, cross-tenant features limited
Use: B2B SaaS with enterprise customers
2.2 What Data Goes Where?
DATA CLASSIFICATION FOR RESIDENCY
PERSONAL DATA (Must respect residency):
├── User profiles (name, email, phone)
├── Employee data
├── Customer communications
├── Support tickets with PII
├── Files uploaded by users
├── Activity logs with user IDs
├── IP addresses and location data
└── Any data linked to an individual
OPERATIONAL DATA (Usually can be global):
├── Anonymized/aggregated analytics
├── System metrics and monitoring
├── Error logs (if PII stripped)
├── Configuration data
├── Feature flags
├── Infrastructure metadata
└── Audit logs of system events (not user actions)
GRAY AREAS (Careful analysis needed):
├── Pseudonymized data
│ └── Still personal data under GDPR!
├── Behavioral analytics
│ └── May be personal if linkable
├── Machine learning training data
│ └── Depends on source
├── Backups
│ └── Same rules as primary data
└── Cached data
└── Same rules as primary data
DECISION FRAMEWORK:
┌────────────────────────────────────────────────────────────────────────┐
│ │
│ Question 1: Can this data identify a person? │
│ ├── Yes → Personal data → Residency rules apply │
│ └── No → Continue to Question 2 │
│ │
│ Question 2: Can this data be combined with other data to identify? │
│ ├── Yes → Personal data → Residency rules apply │
│ └── No → Operational data → Usually global OK │
│ │
│ Question 3: Is this derived from personal data? │
│ ├── Yes → Analyze if truly anonymous │
│ └── No → Operational data │
│ │
└────────────────────────────────────────────────────────────────────────┘
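To keep this framework from living only in a wiki, you can encode it as a small helper that data owners call when registering a new dataset. A minimal sketch, assuming illustrative names (DataCategory and classify_for_residency are not from any library):

# data_residency/classification.py
"""
Sketch of the residency decision framework above.
The category names and question flags are illustrative assumptions.
"""
from enum import Enum

class DataCategory(Enum):
    PERSONAL = "personal"        # residency rules apply
    OPERATIONAL = "operational"  # usually OK to keep global

def classify_for_residency(
    identifies_person: bool,
    linkable_to_person: bool,
    derived_from_personal: bool,
    verified_anonymous: bool = False
) -> DataCategory:
    """Apply Questions 1-3 from the decision framework."""
    # Q1: Can this data identify a person on its own?
    if identifies_person:
        return DataCategory.PERSONAL
    # Q2: Can it be combined with other data to identify someone?
    if linkable_to_person:
        return DataCategory.PERSONAL
    # Q3: Derived from personal data? Only operational if verified anonymous.
    if derived_from_personal and not verified_anonymous:
        return DataCategory.PERSONAL
    return DataCategory.OPERATIONAL

# Example: pseudonymized analytics events are still linkable → personal
assert classify_for_residency(False, True, True) is DataCategory.PERSONAL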
Chapter 3: GDPR Consent and Legal Basis
3.1 Lawful Basis Decision Tree
CHOOSING LAWFUL BASIS FOR PROCESSING
┌───────────────────────┐
│ Why are you processing │
│ this personal data? │
└───────────┬───────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ User requested│ │ Required by │ │ Business │
│ a service │ │ law │ │ benefit │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ CONTRACT │ │ LEGAL │ │ LEGITIMATE │
│ PERFORMANCE │ │ OBLIGATION │ │ INTEREST │
│ │ │ │ │ (needs test) │
│ Examples: │ │ Examples: │ │ │
│ - Deliver │ │ - Tax records │ │ Examples: │
│ product │ │ - Employment │ │ - Fraud │
│ - Process │ │ law │ │ prevention │
│ payment │ │ - Court order │ │ - Analytics │
│ - Send order │ │ │ │ - Marketing │
│ updates │ │ │ │ (maybe) │
└───────────────┘ └───────────────┘ └───────────────┘
│
▼
┌───────────────┐
│ Need CONSENT? │
│ │
│ - Marketing │
│ - Cookies │
│ - Third-party │
│ sharing │
│ - Special │
│ categories │
└───────────────┘
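In code, this tree often ends up as a reviewed lookup table that processing jobs consult before touching personal data. A hedged sketch, assuming illustrative purpose names (they mirror the ConsentPurpose enum defined later in this lesson); the basis assignments are examples to review with counsel, not legal advice:

# Hedged sketch: map processing purposes to a default lawful basis.
LAWFUL_BASIS_BY_PURPOSE = {
    "service_delivery": "contract",             # user requested the service
    "payment_processing": "contract",
    "tax_records": "legal_obligation",
    "fraud_prevention": "legitimate_interest",  # needs a balancing test
    "analytics": "consent",
    "marketing_email": "consent",
    "third_party_sharing": "consent",
}

def lawful_basis_for(purpose: str) -> str:
    basis = LAWFUL_BASIS_BY_PURPOSE.get(purpose)
    if basis is None:
        raise ValueError(f"No documented lawful basis for: {purpose}")
    return basis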
Part II: Implementation
Chapter 4: Regional Data Architecture
4.1 Tenant Region Assignment
# data_residency/tenant_region.py
"""
Tenant region assignment and routing.
Each tenant is assigned to a region based on their data residency requirements.
"""
from dataclasses import dataclass
from typing import Optional, Dict
from enum import Enum
import logging
logger = logging.getLogger(__name__)
class DataRegion(Enum):
"""Supported data regions."""
US = "us"
EU = "eu"
UK = "uk"
APAC = "apac"
BRAZIL = "brazil"
INDIA = "india"
@dataclass
class RegionConfig:
"""Configuration for a data region."""
region_id: DataRegion
display_name: str
aws_region: str
database_endpoint: str
storage_bucket: str
search_endpoint: str
cache_endpoint: str
# Compliance attributes
gdpr_compliant: bool = False
data_localization: bool = False
adequacy_decision: bool = False
# Region configurations
REGION_CONFIGS = {
DataRegion.US: RegionConfig(
region_id=DataRegion.US,
display_name="United States",
aws_region="us-east-1",
database_endpoint="db-us.example.com",
storage_bucket="data-us-example",
search_endpoint="search-us.example.com",
cache_endpoint="cache-us.example.com",
gdpr_compliant=False,
adequacy_decision=True # US-EU Data Privacy Framework
),
DataRegion.EU: RegionConfig(
region_id=DataRegion.EU,
display_name="European Union",
aws_region="eu-central-1",
database_endpoint="db-eu.example.com",
storage_bucket="data-eu-example",
search_endpoint="search-eu.example.com",
cache_endpoint="cache-eu.example.com",
gdpr_compliant=True,
adequacy_decision=True
),
DataRegion.UK: RegionConfig(
region_id=DataRegion.UK,
display_name="United Kingdom",
aws_region="eu-west-2",
database_endpoint="db-uk.example.com",
storage_bucket="data-uk-example",
search_endpoint="search-uk.example.com",
cache_endpoint="cache-uk.example.com",
gdpr_compliant=True, # UK GDPR
adequacy_decision=True
),
DataRegion.APAC: RegionConfig(
region_id=DataRegion.APAC,
display_name="Asia Pacific",
aws_region="ap-southeast-1",
database_endpoint="db-apac.example.com",
storage_bucket="data-apac-example",
search_endpoint="search-apac.example.com",
cache_endpoint="cache-apac.example.com",
gdpr_compliant=False,
adequacy_decision=False
),
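    # Configs for DataRegion.BRAZIL and DataRegion.INDIA are omitted for
    # brevity; add them before assigning tenants there, since
    # assign_region rejects any region missing from REGION_CONFIGS.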
}
@dataclass
class TenantRegionAssignment:
"""A tenant's region assignment."""
tenant_id: str
data_region: DataRegion
reason: str # Why this region was assigned
assigned_at: str
can_change: bool = True # Some contracts lock the region
class TenantRegionService:
"""
Service for managing tenant region assignments.
"""
def __init__(self, db, cache):
self.db = db
self.cache = cache
async def get_tenant_region(self, tenant_id: str) -> RegionConfig:
"""
Get the region configuration for a tenant.
"""
# Check cache first
cache_key = f"tenant_region:{tenant_id}"
cached = await self.cache.get(cache_key)
if cached:
return REGION_CONFIGS[DataRegion(cached)]
# Load from database
result = await self.db.fetchone(
"SELECT data_region FROM tenants WHERE id = $1",
tenant_id
)
if not result:
raise ValueError(f"Tenant not found: {tenant_id}")
region = DataRegion(result["data_region"])
# Cache for 1 hour
await self.cache.set(cache_key, region.value, ttl=3600)
return REGION_CONFIGS[region]
async def assign_region(
self,
tenant_id: str,
region: DataRegion,
reason: str
) -> TenantRegionAssignment:
"""
Assign a tenant to a data region.
This should be done during tenant onboarding.
Changing region later requires data migration.
"""
from datetime import datetime
# Validate region is supported
if region not in REGION_CONFIGS:
raise ValueError(f"Unsupported region: {region}")
# Update tenant
await self.db.execute(
"""
UPDATE tenants
SET data_region = $2, region_assigned_at = $3, region_reason = $4
WHERE id = $1
""",
tenant_id, region.value, datetime.utcnow(), reason
)
# Invalidate cache
await self.cache.delete(f"tenant_region:{tenant_id}")
logger.info(
f"Assigned tenant {tenant_id} to region {region.value}",
extra={"tenant_id": tenant_id, "region": region.value, "reason": reason}
)
return TenantRegionAssignment(
tenant_id=tenant_id,
data_region=region,
reason=reason,
assigned_at=datetime.utcnow().isoformat()
)
async def suggest_region(
self,
country_code: str,
compliance_requirements: list
) -> DataRegion:
"""
Suggest a region based on country and compliance needs.
"""
# EU countries → EU region
eu_countries = [
"AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
"DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
"PL", "PT", "RO", "SK", "SI", "ES", "SE"
]
if country_code in eu_countries:
return DataRegion.EU
if country_code == "GB":
return DataRegion.UK
if country_code == "US":
return DataRegion.US
if country_code == "BR":
return DataRegion.BRAZIL
if country_code == "IN":
return DataRegion.INDIA
# APAC countries
apac_countries = ["AU", "NZ", "SG", "JP", "KR", "HK", "TW"]
if country_code in apac_countries:
return DataRegion.APAC
# Default to US for others (with SCCs if needed)
return DataRegion.US
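Putting the service together, onboarding becomes: suggest a region from the signup country, confirm it with the customer, record the assignment. A usage sketch, assuming db and cache are whatever your app already wires in:

# Hypothetical onboarding flow (inside an async handler).
service = TenantRegionService(db=db, cache=cache)

suggested = await service.suggest_region("DE", compliance_requirements=["gdpr"])
# suggested == DataRegion.EU

await service.assign_region(
    tenant_id="tenant-123",
    region=suggested,
    reason="Customer HQ in Germany; GDPR residency required by contract"
)

config = await service.get_tenant_region("tenant-123")
assert config.aws_region == "eu-central-1"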
4.2 Regional Database Routing
# data_residency/database_router.py
"""
Database routing based on tenant region.
Routes database queries to the correct regional database.
"""
from typing import Any, Dict
import asyncpg
import logging

from data_residency.tenant_region import DataRegion, RegionConfig, TenantRegionService
logger = logging.getLogger(__name__)
class RegionalDatabaseRouter:
"""
Routes database connections to regional databases.
Each tenant's data is stored in their assigned region's database.
"""
    def __init__(
        self,
        region_configs: Dict[DataRegion, RegionConfig],
        tenant_region_service: TenantRegionService
    ):
        self.region_configs = region_configs
        self.tenant_region_service = tenant_region_service  # for tenant → region lookups
        self._pools: Dict[DataRegion, asyncpg.Pool] = {}
async def initialize(self):
"""Initialize connection pools for all regions."""
for region, config in self.region_configs.items():
pool = await asyncpg.create_pool(
host=config.database_endpoint,
database="app_db",
min_size=5,
max_size=20
)
self._pools[region] = pool
logger.info(f"Initialized database pool for region: {region.value}")
async def get_connection(self, tenant_id: str):
"""
Get database connection for a tenant.
Routes to the correct regional database.
"""
# Get tenant's region
region_config = await self.tenant_region_service.get_tenant_region(tenant_id)
region = region_config.region_id
if region not in self._pools:
raise ValueError(f"No database pool for region: {region}")
return self._pools[region].acquire()
async def execute_in_region(
self,
region: DataRegion,
query: str,
*args
) -> Any:
"""
Execute a query in a specific region.
Used for admin operations that target a specific region.
"""
if region not in self._pools:
raise ValueError(f"No database pool for region: {region}")
async with self._pools[region].acquire() as conn:
return await conn.fetch(query, *args)
class RegionalStorageRouter:
"""
Routes file storage to regional S3 buckets.
"""
    def __init__(
        self,
        region_configs: Dict[DataRegion, RegionConfig],
        tenant_region_service: TenantRegionService
    ):
        self.region_configs = region_configs
        self.tenant_region_service = tenant_region_service  # for tenant → region lookups
        self._clients: Dict[DataRegion, Any] = {}
async def initialize(self):
"""Initialize S3 clients for all regions."""
import aioboto3
for region, config in self.region_configs.items():
session = aioboto3.Session()
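            # aioboto3 clients are async context managers; we enter each
            # one here and hold it for the app's lifetime. Pair this with
            # matching __aexit__ calls in a shutdown hook.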
client = await session.client(
's3',
region_name=config.aws_region
).__aenter__()
self._clients[region] = (client, config.storage_bucket)
async def upload_file(
self,
tenant_id: str,
file_key: str,
file_data: bytes,
content_type: str
) -> str:
"""
Upload a file to the tenant's regional storage.
"""
region_config = await self.tenant_region_service.get_tenant_region(tenant_id)
region = region_config.region_id
client, bucket = self._clients[region]
# Include tenant_id in key for organization
full_key = f"tenants/{tenant_id}/{file_key}"
await client.put_object(
Bucket=bucket,
Key=full_key,
Body=file_data,
ContentType=content_type,
Metadata={
"tenant_id": tenant_id,
"region": region.value
}
)
logger.info(
f"Uploaded file to regional storage",
extra={
"tenant_id": tenant_id,
"region": region.value,
"bucket": bucket,
"key": full_key
}
)
return f"s3://{bucket}/{full_key}"
async def get_file(
self,
tenant_id: str,
file_key: str
) -> bytes:
"""
Get a file from the tenant's regional storage.
"""
region_config = await self.tenant_region_service.get_tenant_region(tenant_id)
region = region_config.region_id
client, bucket = self._clients[region]
full_key = f"tenants/{tenant_id}/{file_key}"
response = await client.get_object(Bucket=bucket, Key=full_key)
return await response['Body'].read()
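Wired together, request handlers never name a region; they pass only the tenant ID and the routers resolve the rest. A hedged usage sketch (the wiring names and the documents table are assumptions):

# Hypothetical wiring of the regional routers (inside async startup code).
from data_residency.tenant_region import REGION_CONFIGS

db_router = RegionalDatabaseRouter(REGION_CONFIGS, tenant_region_service)
storage_router = RegionalStorageRouter(REGION_CONFIGS, tenant_region_service)
await db_router.initialize()
await storage_router.initialize()

async def save_document(tenant_id: str, doc_id: str, body: bytes):
    # The file lands in the tenant's regional bucket...
    url = await storage_router.upload_file(
        tenant_id, f"docs/{doc_id}", body, "application/pdf"
    )
    # ...and the row lands in the tenant's regional database.
    async with await db_router.get_connection(tenant_id) as conn:
        await conn.execute(
            "INSERT INTO documents (tenant_id, doc_id, url) VALUES ($1, $2, $3)",
            tenant_id, doc_id, url
        )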
Chapter 5: Consent Management
5.1 Consent Service Implementation
# data_residency/consent.py
"""
Consent management for GDPR compliance.
Tracks user consent for different processing purposes.
"""
from dataclasses import dataclass
from typing import List, Optional, Dict
from datetime import datetime
from enum import Enum
import uuid
import logging
logger = logging.getLogger(__name__)
class ConsentPurpose(Enum):
"""Purposes for which consent can be given."""
SERVICE_DELIVERY = "service_delivery" # Usually contract basis, not consent
MARKETING_EMAIL = "marketing_email"
MARKETING_PHONE = "marketing_phone"
ANALYTICS = "analytics"
PERSONALIZATION = "personalization"
THIRD_PARTY_SHARING = "third_party_sharing"
PROFILING = "profiling"
COOKIES_ESSENTIAL = "cookies_essential"
COOKIES_ANALYTICS = "cookies_analytics"
COOKIES_MARKETING = "cookies_marketing"
class ConsentStatus(Enum):
"""Status of consent."""
GRANTED = "granted"
DENIED = "denied"
WITHDRAWN = "withdrawn"
NOT_ASKED = "not_asked"
@dataclass
class ConsentRecord:
"""Record of a consent decision."""
id: str
user_id: str
tenant_id: str
purpose: ConsentPurpose
status: ConsentStatus
granted_at: Optional[datetime]
withdrawn_at: Optional[datetime]
ip_address: str
user_agent: str
consent_text: str # Exact text shown to user
consent_version: str # Version of consent form
@dataclass
class ConsentPreferences:
"""A user's current consent preferences."""
user_id: str
consents: Dict[ConsentPurpose, ConsentStatus]
last_updated: datetime
class ConsentService:
"""
Service for managing user consent.
Key principles:
- Consent must be freely given, specific, informed, unambiguous
- Must be as easy to withdraw as to give
- Must keep records of when/how consent was given
- Consent is per-purpose, not blanket
"""
def __init__(self, db, event_publisher):
self.db = db
self.events = event_publisher
async def record_consent(
self,
user_id: str,
tenant_id: str,
purpose: ConsentPurpose,
granted: bool,
ip_address: str,
user_agent: str,
consent_text: str,
consent_version: str
) -> ConsentRecord:
"""
Record a consent decision.
This creates an immutable audit record.
"""
record = ConsentRecord(
id=str(uuid.uuid4()),
user_id=user_id,
tenant_id=tenant_id,
purpose=purpose,
status=ConsentStatus.GRANTED if granted else ConsentStatus.DENIED,
granted_at=datetime.utcnow() if granted else None,
withdrawn_at=None,
ip_address=ip_address,
user_agent=user_agent,
consent_text=consent_text,
consent_version=consent_version
)
# Store in database (immutable log)
await self.db.execute(
"""
INSERT INTO consent_records
(id, user_id, tenant_id, purpose, status, granted_at,
ip_address, user_agent, consent_text, consent_version, created_at)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
""",
record.id, record.user_id, record.tenant_id,
record.purpose.value, record.status.value, record.granted_at,
record.ip_address, record.user_agent, record.consent_text,
record.consent_version, datetime.utcnow()
)
# Update current preferences
await self._update_current_preferences(
user_id, tenant_id, purpose,
ConsentStatus.GRANTED if granted else ConsentStatus.DENIED
)
# Publish event for downstream systems
await self.events.publish(
"consent",
{
"type": "consent.recorded",
"user_id": user_id,
"tenant_id": tenant_id,
"purpose": purpose.value,
"granted": granted
}
)
logger.info(
f"Consent recorded",
extra={
"user_id": user_id,
"purpose": purpose.value,
"granted": granted
}
)
return record
async def withdraw_consent(
self,
user_id: str,
tenant_id: str,
purpose: ConsentPurpose,
ip_address: str,
user_agent: str
) -> ConsentRecord:
"""
Withdraw previously given consent.
Must be as easy as giving consent.
"""
record = ConsentRecord(
id=str(uuid.uuid4()),
user_id=user_id,
tenant_id=tenant_id,
purpose=purpose,
status=ConsentStatus.WITHDRAWN,
granted_at=None,
withdrawn_at=datetime.utcnow(),
ip_address=ip_address,
user_agent=user_agent,
consent_text="Consent withdrawn by user",
consent_version="withdrawal"
)
await self.db.execute(
"""
INSERT INTO consent_records
(id, user_id, tenant_id, purpose, status, withdrawn_at,
ip_address, user_agent, consent_text, consent_version, created_at)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
""",
record.id, record.user_id, record.tenant_id,
record.purpose.value, record.status.value, record.withdrawn_at,
record.ip_address, record.user_agent, record.consent_text,
record.consent_version, datetime.utcnow()
)
await self._update_current_preferences(
user_id, tenant_id, purpose, ConsentStatus.WITHDRAWN
)
# Publish event - systems must stop processing
await self.events.publish(
"consent",
{
"type": "consent.withdrawn",
"user_id": user_id,
"tenant_id": tenant_id,
"purpose": purpose.value
}
)
logger.info(
f"Consent withdrawn",
extra={"user_id": user_id, "purpose": purpose.value}
)
return record
async def get_current_consent(
self,
user_id: str,
tenant_id: str,
purpose: ConsentPurpose
) -> ConsentStatus:
"""
Get current consent status for a purpose.
"""
result = await self.db.fetchone(
"""
SELECT status FROM user_consent_preferences
WHERE user_id = $1 AND tenant_id = $2 AND purpose = $3
""",
user_id, tenant_id, purpose.value
)
if not result:
return ConsentStatus.NOT_ASKED
return ConsentStatus(result["status"])
async def has_consent(
self,
user_id: str,
tenant_id: str,
purpose: ConsentPurpose
) -> bool:
"""
Check if user has given consent for a purpose.
"""
status = await self.get_current_consent(user_id, tenant_id, purpose)
return status == ConsentStatus.GRANTED
async def get_all_preferences(
self,
user_id: str,
tenant_id: str
) -> ConsentPreferences:
"""
Get all consent preferences for a user.
"""
results = await self.db.fetch(
"""
SELECT purpose, status, updated_at
FROM user_consent_preferences
WHERE user_id = $1 AND tenant_id = $2
""",
user_id, tenant_id
)
consents = {}
last_updated = datetime.min
for row in results:
consents[ConsentPurpose(row["purpose"])] = ConsentStatus(row["status"])
if row["updated_at"] > last_updated:
last_updated = row["updated_at"]
return ConsentPreferences(
user_id=user_id,
consents=consents,
last_updated=last_updated
)
async def get_consent_history(
self,
user_id: str,
tenant_id: str
) -> List[ConsentRecord]:
"""
Get full consent history for a user.
Required for data subject access requests.
"""
results = await self.db.fetch(
"""
SELECT * FROM consent_records
WHERE user_id = $1 AND tenant_id = $2
ORDER BY created_at DESC
""",
user_id, tenant_id
)
return [
ConsentRecord(
id=row["id"],
user_id=row["user_id"],
tenant_id=row["tenant_id"],
purpose=ConsentPurpose(row["purpose"]),
status=ConsentStatus(row["status"]),
granted_at=row["granted_at"],
withdrawn_at=row["withdrawn_at"],
ip_address=row["ip_address"],
user_agent=row["user_agent"],
consent_text=row["consent_text"],
consent_version=row["consent_version"]
)
for row in results
]
async def _update_current_preferences(
self,
user_id: str,
tenant_id: str,
purpose: ConsentPurpose,
status: ConsentStatus
):
"""Update current preferences table."""
await self.db.execute(
"""
INSERT INTO user_consent_preferences (user_id, tenant_id, purpose, status, updated_at)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (user_id, tenant_id, purpose)
DO UPDATE SET status = $4, updated_at = $5
""",
user_id, tenant_id, purpose.value, status.value, datetime.utcnow()
)
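ConsentService assumes two tables: an append-only consent_records log and a user_consent_preferences table keyed by (user, tenant, purpose). A schema sketch inferred from the queries above; the column types are assumptions:

# Hypothetical DDL matching the queries in ConsentService.
# Naive UTC timestamps (TIMESTAMP) to match datetime.utcnow() above.
CONSENT_SCHEMA = """
CREATE TABLE consent_records (
    id              UUID PRIMARY KEY,
    user_id         TEXT NOT NULL,
    tenant_id       TEXT NOT NULL,
    purpose         TEXT NOT NULL,
    status          TEXT NOT NULL,
    granted_at      TIMESTAMP,
    withdrawn_at    TIMESTAMP,
    ip_address      TEXT,
    user_agent      TEXT,
    consent_text    TEXT NOT NULL,   -- exact text shown to the user
    consent_version TEXT NOT NULL,
    created_at      TIMESTAMP NOT NULL
);
-- Treat consent_records as append-only: grant INSERT/SELECT only.

CREATE TABLE user_consent_preferences (
    user_id    TEXT NOT NULL,
    tenant_id  TEXT NOT NULL,
    purpose    TEXT NOT NULL,
    status     TEXT NOT NULL,
    updated_at TIMESTAMP NOT NULL,
    PRIMARY KEY (user_id, tenant_id, purpose)  -- ON CONFLICT target
);
"""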
5.2 Consent-Aware Processing
# data_residency/consent_middleware.py
"""
Middleware and decorators for consent-aware processing.

Assumes a module-level `consent_service` (a ConsentService instance)
and a `get_current_tenant_id()` helper from the tenant-context
middleware are wired in elsewhere.
"""
from functools import wraps
from typing import List
import logging

from data_residency.consent import ConsentPurpose, ConsentService

logger = logging.getLogger(__name__)
def requires_consent(purpose: ConsentPurpose):
"""
Decorator that ensures user has consented to a purpose.
Usage:
@requires_consent(ConsentPurpose.MARKETING_EMAIL)
async def send_marketing_email(user_id: str):
...
"""
def decorator(func):
@wraps(func)
async def wrapper(*args, **kwargs):
# Get user_id from kwargs or context
user_id = kwargs.get("user_id")
tenant_id = get_current_tenant_id()
if not user_id:
raise ValueError("user_id required for consent check")
# Check consent
has_consent = await consent_service.has_consent(
user_id, tenant_id, purpose
)
if not has_consent:
raise ConsentRequiredError(
f"User has not consented to {purpose.value}"
)
return await func(*args, **kwargs)
return wrapper
return decorator
class ConsentRequiredError(Exception):
"""Raised when required consent is not present."""
pass
# Example usage in a service
class MarketingService:
"""Service that requires consent for operations."""
def __init__(self, consent_service: ConsentService, email_client):
self.consent = consent_service
self.email = email_client
async def send_newsletter(self, user_id: str, content: str):
"""
Send newsletter to user.
Requires marketing email consent.
"""
tenant_id = get_current_tenant_id()
# Check consent before sending
if not await self.consent.has_consent(
user_id, tenant_id, ConsentPurpose.MARKETING_EMAIL
):
logger.info(
f"Skipping newsletter for user without consent",
extra={"user_id": user_id}
)
return False
await self.email.send(
to=user_id,
subject="Newsletter",
content=content
)
return True
async def send_bulk_newsletter(self, user_ids: List[str], content: str):
"""
Send newsletter to multiple users.
Filters to only users with consent.
"""
tenant_id = get_current_tenant_id()
# Batch check consent
consented_users = []
for user_id in user_ids:
if await self.consent.has_consent(
user_id, tenant_id, ConsentPurpose.MARKETING_EMAIL
):
consented_users.append(user_id)
logger.info(
f"Sending newsletter to {len(consented_users)}/{len(user_ids)} users with consent"
)
for user_id in consented_users:
await self.email.send(
to=user_id,
subject="Newsletter",
content=content
)
return len(consented_users)
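One design note: send_bulk_newsletter issues one consent query per recipient. At scale, fetch consents for the whole batch in a single query against the preferences table. A hedged sketch:

# Hypothetical batch consent lookup to replace the per-user loop.
async def users_with_consent(
    db, tenant_id: str, user_ids: List[str], purpose: ConsentPurpose
) -> List[str]:
    rows = await db.fetch(
        """
        SELECT user_id FROM user_consent_preferences
        WHERE tenant_id = $1
          AND user_id = ANY($2::text[])
          AND purpose = $3
          AND status = 'granted'
        """,
        tenant_id, user_ids, purpose.value
    )
    return [row["user_id"] for row in rows]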
Chapter 6: Cross-Border Data Transfers
6.1 Transfer Impact Assessment
# data_residency/transfer_assessment.py
"""
Cross-border data transfer assessment and documentation.
"""
from dataclasses import dataclass
from typing import List, Optional
from datetime import datetime
from enum import Enum
class TransferMechanism(Enum):
"""Legal mechanisms for cross-border transfers."""
ADEQUACY_DECISION = "adequacy_decision"
STANDARD_CONTRACTUAL_CLAUSES = "sccs"
BINDING_CORPORATE_RULES = "bcrs"
DEROGATION = "derogation"
CONSENT = "consent"
class RiskLevel(Enum):
"""Risk level for data transfers."""
LOW = "low"
MEDIUM = "medium"
HIGH = "high"
PROHIBITED = "prohibited"
@dataclass
class TransferAssessment:
"""Assessment of a cross-border data transfer."""
id: str
source_region: str
destination_region: str
data_categories: List[str]
data_subjects: str # Description of affected individuals
transfer_mechanism: TransferMechanism
risk_level: RiskLevel
supplementary_measures: List[str]
assessment_date: datetime
next_review_date: datetime
approved_by: str
notes: str
class TransferImpactAssessment:
"""
Performs Transfer Impact Assessments (TIAs) as required by Schrems II.
"""
    # Countries/areas with adequacy decisions (simplified: Canada's
    # decision covers commercial organizations only; the US is handled
    # separately below via the Data Privacy Framework)
    ADEQUATE_COUNTRIES = {
        "EU", "EEA", "GB", "CH", "JP", "KR", "CA", "NZ", "IL", "UY", "AR"
    }
# Countries with high surveillance risk (simplified assessment)
HIGH_RISK_COUNTRIES = {
"CN", "RU" # This is a simplification - real assessment is more nuanced
}
def assess_transfer(
self,
source_country: str,
destination_country: str,
data_categories: List[str],
special_categories: bool = False
) -> TransferAssessment:
"""
Assess a proposed data transfer.
"""
# Same region = no transfer
if source_country == destination_country:
return self._create_assessment(
source_country, destination_country, data_categories,
TransferMechanism.ADEQUACY_DECISION, # Not really a transfer
RiskLevel.LOW,
[]
)
# Check adequacy
if destination_country in self.ADEQUATE_COUNTRIES:
return self._create_assessment(
source_country, destination_country, data_categories,
TransferMechanism.ADEQUACY_DECISION,
RiskLevel.LOW,
[]
)
# US-specific handling (Data Privacy Framework)
if destination_country == "US":
return self._create_assessment(
source_country, destination_country, data_categories,
TransferMechanism.ADEQUACY_DECISION, # DPF
RiskLevel.MEDIUM, # Some risk remains
["Verify recipient is DPF certified",
"Review specific data categories"]
)
# High-risk countries
if destination_country in self.HIGH_RISK_COUNTRIES:
if special_categories:
return self._create_assessment(
source_country, destination_country, data_categories,
TransferMechanism.DEROGATION,
RiskLevel.PROHIBITED,
["Transfer of special categories to this jurisdiction is not recommended"]
)
return self._create_assessment(
source_country, destination_country, data_categories,
TransferMechanism.STANDARD_CONTRACTUAL_CLAUSES,
RiskLevel.HIGH,
["Implement encryption in transit and at rest",
"Minimize data transferred",
"Regular review of legal situation",
"Consider pseudonymization"]
)
# Default: SCCs with supplementary measures
return self._create_assessment(
source_country, destination_country, data_categories,
TransferMechanism.STANDARD_CONTRACTUAL_CLAUSES,
RiskLevel.MEDIUM,
["Sign SCCs with recipient",
"Document supplementary measures",
"Review annually"]
)
def _create_assessment(
self,
source: str,
dest: str,
data_categories: List[str],
mechanism: TransferMechanism,
risk: RiskLevel,
measures: List[str]
) -> TransferAssessment:
"""Create assessment record."""
import uuid
from datetime import timedelta
return TransferAssessment(
id=str(uuid.uuid4()),
source_region=source,
destination_region=dest,
data_categories=data_categories,
data_subjects="Users and employees of tenant",
transfer_mechanism=mechanism,
risk_level=risk,
supplementary_measures=measures,
assessment_date=datetime.utcnow(),
next_review_date=datetime.utcnow() + timedelta(days=365),
approved_by="",
notes=""
)
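For example, assessing an EU-to-US transfer of support data walks the branches above like this (the values in the comments follow directly from the coded rules):

# Hypothetical usage of the assessment class.
tia = TransferImpactAssessment()

assessment = tia.assess_transfer(
    source_country="DE",
    destination_country="US",
    data_categories=["support_tickets", "user_emails"]
)
# transfer_mechanism == TransferMechanism.ADEQUACY_DECISION (via DPF)
# risk_level == RiskLevel.MEDIUM
# supplementary_measures include "Verify recipient is DPF certified"
# next_review_date is one year out; TIAs need periodic re-review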
6.2 Data Processing Agreements
# data_residency/dpa.py
"""
Data Processing Agreement (DPA) management.
GDPR Article 28 requires written contracts with data processors.
"""
from dataclasses import dataclass
from typing import List, Optional
from datetime import datetime
from enum import Enum
class DPAStatus(Enum):
"""Status of a DPA."""
DRAFT = "draft"
PENDING_SIGNATURE = "pending_signature"
ACTIVE = "active"
EXPIRED = "expired"
TERMINATED = "terminated"
@dataclass
class DataProcessingAgreement:
"""A Data Processing Agreement with a tenant or vendor."""
id: str
tenant_id: str
counterparty_name: str
counterparty_type: str # "customer" or "vendor"
# Processing details
processing_purposes: List[str]
data_categories: List[str]
data_subject_categories: List[str]
retention_period: str
# Transfer details
processing_locations: List[str]
subprocessors: List[str]
transfer_mechanism: Optional[str]
# Agreement details
status: DPAStatus
signed_date: Optional[datetime]
effective_date: Optional[datetime]
expiration_date: Optional[datetime]
document_url: str
# Audit
created_at: datetime
updated_at: datetime
class DPAService:
"""
Service for managing Data Processing Agreements.
"""
def __init__(self, db, document_storage):
self.db = db
self.storage = document_storage
async def create_dpa(
self,
tenant_id: str,
counterparty_name: str,
counterparty_type: str,
processing_purposes: List[str],
data_categories: List[str],
processing_locations: List[str]
) -> DataProcessingAgreement:
"""
Create a new DPA.
"""
import uuid
dpa_id = str(uuid.uuid4())
dpa = DataProcessingAgreement(
id=dpa_id,
tenant_id=tenant_id,
counterparty_name=counterparty_name,
counterparty_type=counterparty_type,
processing_purposes=processing_purposes,
data_categories=data_categories,
data_subject_categories=["Employees", "End users"],
retention_period="As specified in main agreement",
processing_locations=processing_locations,
subprocessors=[],
transfer_mechanism=None,
status=DPAStatus.DRAFT,
signed_date=None,
effective_date=None,
expiration_date=None,
document_url="",
created_at=datetime.utcnow(),
updated_at=datetime.utcnow()
)
# Store in database
await self._save_dpa(dpa)
return dpa
async def get_active_dpas(self, tenant_id: str) -> List[DataProcessingAgreement]:
"""Get all active DPAs for a tenant."""
results = await self.db.fetch(
"""
SELECT * FROM data_processing_agreements
WHERE tenant_id = $1 AND status = 'active'
""",
tenant_id
)
return [self._row_to_dpa(row) for row in results]
async def get_subprocessors(self, tenant_id: str) -> List[dict]:
"""
Get list of subprocessors for a tenant.
Required for GDPR transparency.
"""
# Our subprocessors (third-party services we use)
our_subprocessors = [
{
"name": "Amazon Web Services",
"purpose": "Cloud infrastructure",
"location": "EU (Frankfurt)",
"dpa_url": "https://aws.amazon.com/compliance/gdpr-center/"
},
{
"name": "Stripe",
"purpose": "Payment processing",
"location": "US (with DPF certification)",
"dpa_url": "https://stripe.com/legal/dpa"
},
{
"name": "SendGrid",
"purpose": "Email delivery",
"location": "US (with SCCs)",
"dpa_url": "https://sendgrid.com/policies/dpa/"
}
]
return our_subprocessors
async def _save_dpa(self, dpa: DataProcessingAgreement):
"""Save DPA to database."""
await self.db.execute(
"""
INSERT INTO data_processing_agreements
(id, tenant_id, counterparty_name, counterparty_type,
processing_purposes, data_categories, processing_locations,
status, created_at, updated_at)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
""",
dpa.id, dpa.tenant_id, dpa.counterparty_name, dpa.counterparty_type,
dpa.processing_purposes, dpa.data_categories, dpa.processing_locations,
dpa.status.value, dpa.created_at, dpa.updated_at
)
def _row_to_dpa(self, row) -> DataProcessingAgreement:
"""Convert database row to DPA object."""
return DataProcessingAgreement(
id=row["id"],
tenant_id=row["tenant_id"],
counterparty_name=row["counterparty_name"],
counterparty_type=row["counterparty_type"],
processing_purposes=row["processing_purposes"],
data_categories=row["data_categories"],
data_subject_categories=row.get("data_subject_categories", []),
retention_period=row.get("retention_period", ""),
processing_locations=row["processing_locations"],
subprocessors=row.get("subprocessors", []),
transfer_mechanism=row.get("transfer_mechanism"),
status=DPAStatus(row["status"]),
signed_date=row.get("signed_date"),
effective_date=row.get("effective_date"),
expiration_date=row.get("expiration_date"),
document_url=row.get("document_url", ""),
created_at=row["created_at"],
updated_at=row["updated_at"]
)
Part III: Real-World Application
Chapter 7: Case Studies
7.1 Slack's Data Residency
SLACK DATA RESIDENCY ARCHITECTURE
Challenge:
├── Global customer base
├── Enterprise customers need EU data residency
├── Real-time messaging requires low latency
└── Collaboration features need cross-region access
Solution: DATA RESIDENCY FOR ENTERPRISE GRID
┌────────────────────────────────────────────────────────────────────────┐
│ │
│ SLACK ENTERPRISE GRID ARCHITECTURE │
│ │
│ Global Services (Metadata): │
│ ├── Authentication/SSO │
│ ├── Workspace directory │
│ ├── Routing information │
│ └── Feature configuration │
│ │
│ Regional Data Stores: │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ US Region │ │ EU Region │ │ GovCloud │ │
│ │ ─────────── │ │ ─────────── │ │ ─────────── │ │
│ │ Messages │ │ Messages │ │ Messages │ │
│ │ Files │ │ Files │ │ Files │ │
│ │ User profiles │ │ User profiles │ │ User profiles │ │
│ │ Search index │ │ Search index │ │ Search index │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ Per-Organization Choice: │
│ ├── Organization assigned to one region │
│ ├── All data for that org stays in region │
│ └── Slack Connect (cross-org) respects both orgs' residency │
│ │
└────────────────────────────────────────────────────────────────────────┘
Key Decisions:
├── ORGANIZATION = RESIDENCY BOUNDARY
│ └── Not user-level, org-level
│
├── METADATA CAN BE GLOBAL
│ └── Workspace IDs, routing info
│ └── Not personal data
│
├── ENCRYPTION AT REST
│ └── Customer-managed keys (Enterprise Key Management)
│ └── Per-organization keys
│
└── SLACK CONNECT HANDLING
└── Messages between orgs stay in most restrictive region
└── Both parties must allow the connection
Lessons:
├── Organization-level residency is manageable
├── Distinguish metadata from content
├── Encryption adds extra protection
└── Cross-org features need careful design
7.2 AWS Regional Architecture
AWS APPROACH TO DATA RESIDENCY
AWS provides building blocks for customers to implement residency:
REGIONAL SERVICES:
├── Data stays in chosen region by default
├── Customer controls replication
├── Some services (IAM, Route53) are global
└── S3 can be configured for single-region
TOOLS FOR COMPLIANCE:
├── AWS Config Rules
│ └── Detect resources outside approved regions
│
├── Service Control Policies (SCPs)
│ └── Prevent creating resources in wrong regions
│
├── AWS Artifact
│ └── Compliance reports and DPAs
│
└── Data residency guardrails
└── AWS Control Tower for multi-account
EXAMPLE SCP FOR EU-ONLY:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Action": "*",
"Resource": "*",
"Condition": {
"StringNotEquals": {
"aws:RequestedRegion": [
"eu-central-1",
"eu-west-1",
"eu-west-2",
"eu-west-3",
"eu-north-1"
]
}
}
}
]
}
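SCPs prevent new violations; you still want to sweep resources that already exist. A hedged boto3 sketch that flags S3 buckets outside an approved-region list (the approved set is an assumption):

# Hypothetical residency sweep: flag S3 buckets outside approved regions.
import boto3

APPROVED_REGIONS = {"eu-central-1", "eu-west-1", "eu-west-2",
                    "eu-west-3", "eu-north-1"}

def find_noncompliant_buckets():
    s3 = boto3.client("s3")
    violations = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        # LocationConstraint is None for us-east-1 (an AWS quirk)
        loc = s3.get_bucket_location(Bucket=name)["LocationConstraint"]
        region = loc or "us-east-1"
        if region not in APPROVED_REGIONS:
            violations.append((name, region))
    return violations

for name, region in find_noncompliant_buckets():
    print(f"Bucket {name} is in {region}, outside approved EU regions")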
Lessons:
├── Cloud providers offer tools, not solutions
├── You must architect for residency
├── Policy enforcement prevents accidents
└── Global services need special handling
Chapter 8: Common Mistakes
8.1 Data Residency Anti-Patterns
DATA RESIDENCY MISTAKES
❌ MISTAKE 1: Forgetting About Backups
Wrong:
# Data in EU region
database = "eu-central-1-db.example.com"
# But backups go to US!
backup_bucket = "s3://backups-us-east-1/"
Problem:
Backups are still personal data
US backups violate EU residency
Right:
# Data and backups in same region
database = "eu-central-1-db.example.com"
backup_bucket = "s3://backups-eu-central-1/"
❌ MISTAKE 2: Logging PII to Global Services
Wrong:
# Global Datadog/Splunk instance
logger.info(f"User {user.email} from {user.country} logged in")
# PII now in US logging infrastructure
Problem:
User email is personal data
Now stored in global logging service
Right:
# Log without PII, or use regional logging
logger.info(
"User logged in",
extra={"user_id": user.id, "region": user.region}
)
❌ MISTAKE 3: Analytics Without Consent
Wrong:
# Track everything, figure out consent later
analytics.track("page_view", {
"user_id": user.id,
"page": request.path,
"ip": request.client.ip
})
Problem:
Analytics tracking may require consent
IP addresses are personal data
Right:
if await consent_service.has_consent(user.id, ConsentPurpose.ANALYTICS):
analytics.track("page_view", {
"user_id": user.id,
"page": request.path
# No IP - minimize data
})
❌ MISTAKE 4: Third-Party Services Without DPAs
Wrong:
# Just use Mixpanel, they're big so probably fine
mixpanel.track(user.email, "signup")
Problem:
No DPA with Mixpanel
Data transferred to US without safeguards
You're liable as data controller
Right:
# Verify DPA exists, use SCCs, minimize data
if mixpanel_dpa_signed:
mixpanel.track(
anonymize(user.id), # Not email
"signup",
{"region": user.region}
)
❌ MISTAKE 5: Assuming Adequacy Decisions Are Permanent
Wrong:
# US has Privacy Shield, we're fine forever!
transfer_data_to_us(eu_user_data)
Problem:
Privacy Shield was invalidated (Schrems II)
Adequacy decisions can be revoked
Right:
# Monitor regulatory changes
# Have fallback mechanisms
# Document your transfer assessment
if us_adequacy_valid():
transfer_data_to_us(eu_user_data)
else:
use_sccs_with_supplementary_measures(eu_user_data)
Part IV: Interview Preparation
Chapter 9: Interview Tips
9.1 Data Residency Discussion Framework
DISCUSSING DATA RESIDENCY IN INTERVIEWS
When the topic comes up:
1. CLARIFY REQUIREMENTS
"What are the data residency requirements? Are we dealing with
GDPR (EU), specific country laws, or enterprise customer demands?"
2. IDENTIFY DATA CATEGORIES
"Let me categorize the data:
- Personal data that needs residency: user profiles, content
- Metadata that might be global: routing, configuration
- Truly anonymous data: aggregated analytics"
3. PROPOSE ARCHITECTURE
"I'd implement regional deployments with a global control plane.
Each tenant is assigned to a region during onboarding. Personal
data stays in that region. Metadata and routing information
can be global since it's not personal data."
4. ADDRESS CROSS-REGION
"For features that span regions, like messaging between users
in different regions, data stays in the more restrictive region.
Or we block cross-region features for strict compliance tenants."
5. MENTION ENFORCEMENT
"I'd use infrastructure-as-code with policy enforcement to prevent
accidental data leakage. AWS SCPs or GCP Organization Policies
can block resource creation in wrong regions."
9.2 Key Phrases
DATA RESIDENCY KEY PHRASES
On Regional Architecture:
"I'd deploy regional data stores with a global control plane. The
control plane handles routing and metadata - things that aren't
personal data. All personal data stays in the tenant's assigned
region, including backups and logs."
On GDPR Transfers:
"For cross-border transfers, we need a legal basis. If the destination
has an adequacy decision, we're good. Otherwise, we need Standard
Contractual Clauses with supplementary measures. Post-Schrems II,
we also need a Transfer Impact Assessment."
On Consent:
"Consent must be freely given, specific, informed, and unambiguous.
I'd implement a consent management system that records the exact
text shown, timestamp, IP, and allows easy withdrawal. Different
purposes need separate consent - no bundling."
On Third Parties:
"Every third-party processor needs a Data Processing Agreement.
We need to track subprocessors and their locations. If they process
EU data in the US, they need appropriate safeguards like DPF
certification or SCCs."
Chapter 10: Practice Problems
Problem 1: Multi-Region SaaS
Scenario: Your B2B SaaS has customers in US, EU, and Asia. EU customers require GDPR compliance including data residency. You currently have one region (us-east-1).
Questions:
- How do you migrate to support EU data residency?
- What happens to features that need cross-region data?
- How do you handle a user who moves from EU to US?
Hints:
- Add EU region with separate database
- Tenant-level region assignment
- Cross-region features: either block or store in most restrictive
- User moving: they might need to be re-assigned to new region
- Consider data migration tools and procedures
Problem 2: Analytics Pipeline Compliance
Scenario: You run analytics on user behavior using BigQuery (US). EU customers are complaining about GDPR compliance.
Questions:
- Can you continue using BigQuery for EU user data?
- What changes would make this compliant?
- How do you handle historical data that's already in BigQuery?
Hints:
- BigQuery has EU regions - use them for EU data
- Anonymize/aggregate before cross-border transfer
- Historical data: delete or anonymize
- Consider consent basis for analytics
- Document Transfer Impact Assessment
Chapter 11: Sample Interview Dialogue
Interviewer: "We need to serve EU customers. How do you handle GDPR compliance?"
You: "GDPR compliance has several aspects. Let me break it down by the main requirements.
For data residency, I'd deploy EU infrastructure - database, storage, and search in eu-central-1 or eu-west-1. Each tenant is assigned a region during onboarding based on their location. All personal data stays in that region.
For lawful basis, we'd use contract performance for core functionality - storing their data to provide the service. For marketing or analytics, we need consent. I'd implement a consent management system..."
CONSENT FLOW
User signs up
│
▼
Show consent form:
├── Essential cookies: Required (legitimate interest)
├── Analytics: Optional, unchecked by default
├── Marketing: Optional, unchecked by default
│
▼
Record consent with timestamp, IP, exact text shown
│
▼
If they later withdraw, immediately stop processing
Interviewer: "What about our analytics that currently runs in the US?"
You: "A few options:
-
Regional analytics: Run BigQuery in EU multi-region for EU data. More expensive but cleanest.
-
Anonymize before transfer: Aggregate data to the point it's no longer personal data before sending to US. For example, 'Users in Germany viewed page X 1000 times' is not personal data.
-
Transfer with safeguards: Use BigQuery in US but with SCCs and supplementary measures. Requires Transfer Impact Assessment and ongoing monitoring of US surveillance laws.
I'd recommend option 1 for personal data and option 2 for aggregated metrics. We'd need to document this in our Records of Processing Activities."
Interviewer: "How do you prove compliance to customers?"
You: "Several mechanisms:
- DPA signing: Automated DPA generation and signing during enterprise onboarding
- Subprocessor list: Published list of all third parties that process data
- Data residency documentation: Architecture diagrams showing data flows
- Audit logs: Records of all data access, exportable for audits
- Certifications: SOC 2 Type II, ISO 27001 for security controls
For enterprise customers, we could offer a compliance portal showing their data location, consent records, and processing activities."
Summary
DAY 3 KEY TAKEAWAYS
DATA RESIDENCY BASICS:
├── Residency = where data is stored
├── Sovereignty = which laws apply
├── Localization = legal requirement to keep data in-country
└── Transfer = moving data across borders
KEY REGULATIONS:
├── GDPR (EU): Most influential, extraterritorial
├── LGPD (Brazil): GDPR-like
├── PIPL (China): Strict localization
├── DPDP (India): Emerging requirements
└── Various country-specific laws
ARCHITECTURE PATTERNS:
├── Single region: Simple but limited compliance
├── Regional deployments: Full isolation
├── Global control + regional data: Balance
└── Per-tenant region: Maximum flexibility
IMPLEMENTATION:
├── Tenant region assignment at onboarding
├── Regional database routing
├── Regional storage routing
├── Consent management system
└── DPA tracking
GDPR TRANSFERS:
├── Adequacy decision: Easiest
├── SCCs: Most common
├── BCRs: For corporate groups
└── Supplementary measures: Post-Schrems II
CONSENT REQUIREMENTS:
├── Freely given
├── Specific (per purpose)
├── Informed (clear language)
├── Unambiguous (affirmative action)
└── Withdrawable (as easy as giving)
COMMON MISTAKES:
├── Forgetting backups
├── PII in global logs
├── Analytics without consent
├── Missing DPAs
└── Assuming adequacy is permanent
DEFAULT APPROACH:
├── Regional data stores for personal data
├── Global control plane for metadata
├── Consent management from day one
├── DPAs with all processors
└── Document everything
Further Reading
Official Resources:
- GDPR Full Text: https://gdpr.eu/
- EDPB Guidelines: https://edpb.europa.eu/
- UK ICO Guidance: https://ico.org.uk/
Compliance Tools:
- OneTrust (consent management)
- TrustArc (privacy management)
- BigID (data discovery)
Cloud Provider Resources:
- AWS GDPR Center: https://aws.amazon.com/compliance/gdpr-center/
- GCP Data Residency: https://cloud.google.com/security/compliance/data-residency
- Azure Compliance: https://docs.microsoft.com/compliance/
End of Day 3: Data Residency and GDPR
Tomorrow: Day 4 — Right to Deletion. We'll learn how to actually delete user data when it's spread across dozens of systems - the hardest GDPR requirement to implement.