Week 9 — Day 4: Right to Deletion
System Design Mastery Series — Multi-Tenancy, Security, and Compliance Week
Preface
A support ticket lands in your queue:
THE DELETION REQUEST
From: angry.user@example.com
Subject: GDPR - Delete ALL my data immediately
I am exercising my right to erasure under GDPR Article 17.
Delete ALL my personal data from ALL your systems within 30 days.
I want written confirmation when complete.
If you fail to comply, I will file a complaint with my national
data protection authority.
---
You check where this user's data lives:
┌────────────────────────────────────────────────────────────────────────┐
│ │
│ USER DATA LOCATIONS │
│ │
│ Primary Systems: │
│ ├── PostgreSQL (users table, orders, preferences) │
│ ├── MongoDB (activity logs, user-generated content) │
│ ├── Elasticsearch (user profile in search index) │
│ ├── Redis (session cache, user preferences cache) │
│ └── S3 (profile photos, uploaded documents) │
│ │
│ Analytics & Data Warehouse: │
│ ├── BigQuery (event data, behavioral analytics) │
│ ├── Mixpanel (product analytics) │
│ └── Amplitude (user journeys) │
│ │
│ Third-Party Services: │
│ ├── Stripe (payment history) │
│ ├── Zendesk (support tickets) │
│ ├── Intercom (chat history) │
│ ├── Mailchimp (email list) │
│ └── HubSpot (CRM records) │
│ │
│ Backups: │
│ ├── Daily PostgreSQL backups (90 day retention) │
│ ├── MongoDB snapshots (30 day retention) │
│ └── S3 versioning (infinite retention... oops) │
│ │
│ Logs: │
│ ├── Application logs (Datadog - 15 day retention) │
│ ├── Access logs (CloudWatch - 90 day retention) │
│ └── Audit logs (S3 - 7 year retention for compliance) │
│ │
│ Derived Data: │
│ ├── ML training datasets │
│ ├── Aggregated reports │
│ └── Anonymized analytics │
│ │
└────────────────────────────────────────────────────────────────────────┘
Question: Can you prove you deleted everything?
Answer: Not without a system designed for this.
Today, we'll build a deletion system that can answer "yes" with confidence.
Part I: Foundations
Chapter 1: Understanding the Right to Erasure
1.1 What GDPR Article 17 Requires
GDPR ARTICLE 17: RIGHT TO ERASURE ("RIGHT TO BE FORGOTTEN")
When it applies - User can request deletion when:
├── Data no longer necessary for original purpose
├── User withdraws consent (and no other legal basis exists)
├── User objects to processing (and no overriding legitimate grounds)
├── Data was unlawfully processed
├── Legal obligation to erase
└── Data collected from children for online services
When you can refuse:
├── Freedom of expression and information
├── Legal obligation requiring processing
├── Public health purposes
├── Archiving in public interest, research, statistics
└── Establishment, exercise, or defense of legal claims
Timeline:
├── Response required: Within 1 month
├── Extension possible: Up to 2 additional months for complex cases
├── Must inform user of extension within first month
└── Must explain reasons if refusing
What to delete:
├── All personal data about the individual
├── All copies (including backups, eventually)
├── Inform other controllers you've shared data with
└── Make reasonable efforts to inform processors
1.2 The Deletion Complexity Problem
WHY DELETION IS HARD
PROBLEM 1: DATA FRAGMENTATION
├── Data spread across dozens of systems
├── Each system has different deletion API
├── Some systems don't have deletion APIs
└── No central registry of where data lives
PROBLEM 2: DATA RELATIONSHIPS
├── User has orders → orders have items → items have reviews
├── Deleting user breaks foreign key constraints
├── What happens to their reviews? Comments? Shared content?
└── Cascade vs soft delete vs anonymize?
PROBLEM 3: BACKUPS
├── User data in daily backup from 3 months ago
├── Can't surgically remove one user from backup
├── Options: Keep backup (delay deletion) or destroy backup
└── Need clear retention policy that accounts for this
PROBLEM 4: THIRD PARTIES
├── Stripe has payment history
├── Zendesk has support tickets
├── You're the controller, they're processors
└── You must ensure they delete too
PROBLEM 5: DERIVED DATA
├── ML model trained on user's data
├── Can't "untrain" a model
├── Aggregated statistics include user
└── Where's the line between personal and anonymous?
PROBLEM 6: PROVING DELETION
├── Auditor asks: "Prove user X was deleted"
├── If you deleted everything, how do you prove it?
├── Need audit trail of deletion itself
└── Paradox: Must keep record of deletion
1.3 What "Deletion" Actually Means
DELETION STRATEGIES
HARD DELETE:
├── Data physically removed from storage
├── Cannot be recovered
├── Appropriate for: Most personal data
└── Challenge: May break referential integrity
SOFT DELETE:
├── Data marked as deleted but still exists
├── Excluded from queries
├── NOT compliant with GDPR by itself
└── Use as: Intermediate state before hard delete
ANONYMIZATION:
├── Remove all identifying information
├── Remaining data cannot identify anyone
├── Can be retained indefinitely
└── Challenge: True anonymization is hard
PSEUDONYMIZATION:
├── Replace identifiers with pseudonyms
├── Still personal data under GDPR!
├── NOT sufficient for deletion request
└── Can be used for: Active data protection
AGGREGATION:
├── Combine with other data to remove individual
├── "User X had 5 orders" → "Users made 10,000 orders"
├── If truly anonymous, can retain
└── Challenge: Ensuring k-anonymity
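A quick way to sanity-check the aggregation strategy is a k-anonymity test before retaining an aggregate. The sketch below is a minimal illustration, assuming in-memory rows and a hypothetical set of quasi-identifier columns; a real pipeline would run the equivalent check inside the warehouse.
# deletion/k_anonymity.py (illustrative sketch)
from collections import Counter
from typing import Dict, Iterable, List

def is_k_anonymous(
    rows: Iterable[Dict],
    quasi_identifiers: List[str],
    k: int = 10,
) -> bool:
    """True if every combination of quasi-identifiers appears at least
    k times, i.e. no group is small enough to single out an individual."""
    groups = Counter(
        tuple(row.get(col) for col in quasi_identifiers) for row in rows
    )
    return bool(groups) and min(groups.values()) >= k

# Example: only retain the aggregate if it is at least 10-anonymous
rows = [{"country": "DE", "age_band": "30-39", "orders": 3}] * 12
assert is_k_anonymous(rows, ["country", "age_band"], k=10)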
Chapter 2: Data Inventory and Mapping
2.1 Building a Data Map
DATA MAPPING REQUIREMENTS
For each personal data element, document:
┌────────────────────────────────────────────────────────────────────────┐
│ DATA INVENTORY RECORD │
│ │
│ Data Element: user_email │
│ ───────────────────────────────────────────────────────────────── │
│ │
│ LOCATIONS: │
│ ├── Primary: PostgreSQL.users.email │
│ ├── Search: Elasticsearch.users.email │
│ ├── Cache: Redis.user:{id}:profile.email │
│ ├── Analytics: BigQuery.events.user_email │
│ └── Third-party: Stripe.customers.email │
│ │
│ RETENTION: │
│ ├── Primary: Until account deletion │
│ ├── Backups: 90 days after primary deletion │
│ └── Logs: 15 days (then auto-purged) │
│ │
│ DELETION METHOD: │
│ ├── PostgreSQL: DELETE FROM users WHERE id = ? │
│ ├── Elasticsearch: DELETE /users/_doc/{id} │
│ ├── Redis: DEL user:{id}:* │
│ ├── BigQuery: DELETE FROM events WHERE user_id = ? │
│ └── Stripe: stripe.Customer.delete(customer_id) │
│ │
│ DEPENDENCIES: │
│ ├── Orders table references user_id │
│ ├── Comments table references user_id │
│ └── Notifications table references user_id │
│ │
│ LEGAL BASIS: Contract performance │
│ DATA CONTROLLER: Our Company │
│ DATA PROCESSORS: Stripe, SendGrid │
│ │
└────────────────────────────────────────────────────────────────────────┘
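To make the inventory actionable, each record can be encoded as data the deletion service consults at runtime. A minimal sketch follows; the field names simply mirror the record above and are not tied to any particular tool.
# deletion/inventory.py (illustrative sketch)
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DataElement:
    """One personal-data element and everywhere it lives."""
    name: str
    locations: Dict[str, str]        # system -> path/field
    retention: Dict[str, str]        # system -> retention rule
    deletion_method: Dict[str, str]  # system -> how to delete
    dependencies: List[str] = field(default_factory=list)
    legal_basis: str = "contract"
    processors: List[str] = field(default_factory=list)

INVENTORY = [
    DataElement(
        name="user_email",
        locations={
            "postgresql": "users.email",
            "elasticsearch": "users.email",
            "redis": "user:{id}:profile.email",
            "stripe": "customers.email",
        },
        retention={"postgresql": "until account deletion", "backups": "90 days"},
        deletion_method={"postgresql": "DELETE FROM users WHERE id = $1"},
        dependencies=["orders.user_id", "comments.user_id"],
        processors=["Stripe", "SendGrid"],
    ),
]

def systems_holding(element_name: str) -> List[str]:
    """Which systems the deletion orchestrator must touch for an element."""
    return [
        system
        for e in INVENTORY if e.name == element_name
        for system in e.locations
    ]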
2.2 Personal Data Categories
PERSONAL DATA CLASSIFICATION
DIRECTLY IDENTIFYING:
├── Name, email, phone
├── Government IDs (SSN, passport)
├── Financial account numbers
├── Biometric data
└── Deletion: Must delete or anonymize
INDIRECTLY IDENTIFYING:
├── IP addresses
├── Device identifiers
├── Cookie IDs
├── Location data (precise)
├── Behavioral patterns
└── Deletion: Must delete or anonymize
SENSITIVE (SPECIAL CATEGORIES):
├── Health information
├── Religious beliefs
├── Political opinions
├── Sexual orientation
├── Biometric data for identification
├── Genetic data
└── Deletion: Priority deletion, extra care
DERIVED DATA:
├── Predictions and scores
├── Segments and categories
├── Behavioral models
├── Recommendations
└── Deletion: Usually delete with source data
AGGREGATED DATA:
├── Statistics where individual not identifiable
├── Counts, averages, distributions
├── k-anonymous datasets (k>10 typically)
└── Deletion: May retain if truly anonymous
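Tagging columns with these categories lets the deletion planner choose delete vs anonymize vs retain per field. A minimal sketch with hypothetical column names:
# deletion/classification.py (illustrative sketch)
from enum import Enum

class PIICategory(Enum):
    DIRECT = "directly_identifying"
    INDIRECT = "indirectly_identifying"
    SENSITIVE = "special_category"
    DERIVED = "derived"
    AGGREGATED = "aggregated"

# Default handling per category (mirrors the classification above)
DEFAULT_ACTION = {
    PIICategory.DIRECT: "delete_or_anonymize",
    PIICategory.INDIRECT: "delete_or_anonymize",
    PIICategory.SENSITIVE: "priority_delete",
    PIICategory.DERIVED: "delete_with_source",
    PIICategory.AGGREGATED: "retain_if_anonymous",
}

# Hypothetical column tags consumed by the deletion planner
FIELD_CATEGORIES = {
    "users.email": PIICategory.DIRECT,
    "events.ip_address": PIICategory.INDIRECT,
    "profiles.health_notes": PIICategory.SENSITIVE,
    "scores.churn_risk": PIICategory.DERIVED,
}

def action_for(column: str) -> str:
    return DEFAULT_ACTION[FIELD_CATEGORIES[column]]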
Chapter 3: Deletion Strategies by System Type
3.1 Strategy Matrix
DELETION STRATEGY BY SYSTEM
┌────────────────────────────────────────────────────────────────────────┐
│ SYSTEM DELETION STRATEGIES │
│ │
│ System Type │ Strategy │ Timeline │ Verification │
│ ─────────────────┼─────────────────┼─────────────┼────────────────── │
│ Primary DB │ Hard delete │ Immediate │ Query returns null │
│ Search Index │ Delete document │ Immediate │ Search returns 0 │
│ Cache │ Delete keys │ Immediate │ Key not found │
│ File Storage │ Delete objects │ Immediate │ 404 on access │
│ Analytics │ Delete or anon │ 24-48 hours │ Query returns 0 │
│ Third-party │ API call │ Varies │ Confirmation email │
│ Backups │ Let expire │ Per policy │ Retention tracking │
│ Logs │ Let expire │ Per policy │ Retention tracking │
│ ML Models │ Retrain without │ Next cycle │ Model version │
│ │
└────────────────────────────────────────────────────────────────────────┘
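Encoded as a lookup table, the matrix can drive both scheduling and verification in the orchestrator. A minimal sketch; the strings are labels taken from the table above, not executable checks:
# deletion/strategy_matrix.py (illustrative sketch)
from typing import NamedTuple

class Strategy(NamedTuple):
    action: str
    timeline: str
    verification: str

STRATEGY_BY_SYSTEM_TYPE = {
    "database":    Strategy("hard_delete",     "immediate",  "query_returns_null"),
    "search":      Strategy("delete_document", "immediate",  "search_returns_zero"),
    "cache":       Strategy("delete_keys",     "immediate",  "key_not_found"),
    "storage":     Strategy("delete_objects",  "immediate",  "404_on_access"),
    "analytics":   Strategy("delete_or_anon",  "24-48h",     "query_returns_zero"),
    "third_party": Strategy("api_call",        "varies",     "confirmation"),
    "backup":      Strategy("expire",          "per_policy", "retention_tracking"),
    "logs":        Strategy("expire",          "per_policy", "retention_tracking"),
    "ml_model":    Strategy("retrain_without", "next_cycle", "model_version"),
}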
Part II: Implementation
Chapter 4: Deletion Service Architecture
4.1 Core Deletion Service
# deletion/service.py
"""
User data deletion service.
Orchestrates deletion across all systems where user data exists.
"""
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Any
from datetime import datetime, timedelta
from enum import Enum
import uuid
import logging
import asyncio
logger = logging.getLogger(__name__)
class DeletionStatus(Enum):
"""Status of a deletion request."""
PENDING = "pending"
IN_PROGRESS = "in_progress"
AWAITING_VERIFICATION = "awaiting_verification"
COMPLETED = "completed"
PARTIALLY_COMPLETED = "partially_completed"
FAILED = "failed"
CANCELLED = "cancelled"
class SystemDeletionStatus(Enum):
"""Status of deletion in a specific system."""
PENDING = "pending"
IN_PROGRESS = "in_progress"
COMPLETED = "completed"
FAILED = "failed"
SKIPPED = "skipped"
NOT_APPLICABLE = "not_applicable"
@dataclass
class DeletionTarget:
"""A system that needs deletion."""
system_name: str
system_type: str # database, cache, storage, third_party, etc.
deletion_method: str # How to delete in this system
priority: int # Order of deletion (lower = first)
status: SystemDeletionStatus = SystemDeletionStatus.PENDING
started_at: Optional[datetime] = None
completed_at: Optional[datetime] = None
error_message: Optional[str] = None
records_deleted: int = 0
@dataclass
class DeletionRequest:
"""A request to delete a user's data."""
id: str
user_id: str
tenant_id: str
requested_at: datetime
requested_by: str # user, admin, automated
reason: str
status: DeletionStatus
targets: List[DeletionTarget]
deadline: datetime # GDPR 30 days
completed_at: Optional[datetime] = None
verified_at: Optional[datetime] = None
verification_report: Optional[Dict] = None
class DeletionService:
"""
Orchestrates user data deletion across all systems.
Key responsibilities:
- Accept deletion requests
- Coordinate deletion across systems
- Track progress and handle failures
- Verify deletion completion
- Maintain audit trail
"""
# Systems in deletion order (dependencies first)
DELETION_TARGETS = [
# First: Caches (quick, no dependencies)
DeletionTarget("redis_cache", "cache", "delete_keys", priority=1),
DeletionTarget("cdn_cache", "cache", "purge_urls", priority=1),
# Second: Search indexes
DeletionTarget("elasticsearch", "search", "delete_document", priority=2),
# Third: File storage
DeletionTarget("s3_files", "storage", "delete_objects", priority=3),
DeletionTarget("s3_uploads", "storage", "delete_objects", priority=3),
# Fourth: Analytics (before primary DB to capture user_id mapping)
DeletionTarget("bigquery", "analytics", "delete_rows", priority=4),
DeletionTarget("mixpanel", "analytics", "delete_user", priority=4),
# Fifth: Third-party services
DeletionTarget("stripe", "third_party", "delete_customer", priority=5),
DeletionTarget("zendesk", "third_party", "delete_user", priority=5),
DeletionTarget("intercom", "third_party", "delete_user", priority=5),
DeletionTarget("mailchimp", "third_party", "unsubscribe_delete", priority=5),
# Last: Primary database (has foreign keys)
DeletionTarget("postgresql", "database", "delete_cascade", priority=10),
]
def __init__(
self,
db,
deletion_executors: Dict[str, 'DeletionExecutor'],
event_publisher,
notification_service
):
self.db = db
self.executors = deletion_executors
self.events = event_publisher
self.notifications = notification_service
async def create_deletion_request(
self,
user_id: str,
tenant_id: str,
requested_by: str,
reason: str
) -> DeletionRequest:
"""
Create a new deletion request.
This starts the deletion process.
"""
request_id = str(uuid.uuid4())
now = datetime.utcnow()
# Determine which targets apply to this user
targets = await self._determine_targets(user_id, tenant_id)
request = DeletionRequest(
id=request_id,
user_id=user_id,
tenant_id=tenant_id,
requested_at=now,
requested_by=requested_by,
reason=reason,
status=DeletionStatus.PENDING,
targets=targets,
deadline=now + timedelta(days=30) # GDPR deadline
)
# Store request
await self._save_request(request)
# Publish event
await self.events.publish("deletion", {
"type": "deletion.requested",
"request_id": request_id,
"user_id": user_id,
"tenant_id": tenant_id
})
logger.info(
f"Deletion request created",
extra={
"request_id": request_id,
"user_id": user_id,
"targets": len(targets)
}
)
return request
async def execute_deletion(self, request_id: str) -> DeletionRequest:
"""
Execute a deletion request.
Coordinates deletion across all target systems.
"""
request = await self.get_request(request_id)
if request.status not in [DeletionStatus.PENDING, DeletionStatus.FAILED]:
raise ValueError(f"Cannot execute request in status: {request.status}")
request.status = DeletionStatus.IN_PROGRESS
await self._save_request(request)
# Group targets by priority
priority_groups = {}
for target in request.targets:
if target.priority not in priority_groups:
priority_groups[target.priority] = []
priority_groups[target.priority].append(target)
# Execute in priority order
all_succeeded = True
for priority in sorted(priority_groups.keys()):
targets = priority_groups[priority]
# Execute targets at same priority level in parallel
results = await asyncio.gather(
*[self._execute_target(request, target) for target in targets],
return_exceptions=True
)
# Check for failures
for target, result in zip(targets, results):
if isinstance(result, Exception):
target.status = SystemDeletionStatus.FAILED
target.error_message = str(result)
all_succeeded = False
logger.error(
f"Deletion failed for {target.system_name}: {result}",
extra={"request_id": request_id}
)
# Update final status
if all_succeeded:
request.status = DeletionStatus.AWAITING_VERIFICATION
else:
request.status = DeletionStatus.PARTIALLY_COMPLETED
await self._save_request(request)
# Publish completion event
await self.events.publish("deletion", {
"type": "deletion.executed",
"request_id": request_id,
"status": request.status.value,
"all_succeeded": all_succeeded
})
return request
async def _execute_target(
self,
request: DeletionRequest,
target: DeletionTarget
):
"""Execute deletion for a single target system."""
target.status = SystemDeletionStatus.IN_PROGRESS
target.started_at = datetime.utcnow()
executor = self.executors.get(target.system_name)
if not executor:
logger.warning(f"No executor for system: {target.system_name}")
target.status = SystemDeletionStatus.SKIPPED
return
try:
result = await executor.delete_user_data(
user_id=request.user_id,
tenant_id=request.tenant_id
)
target.status = SystemDeletionStatus.COMPLETED
target.completed_at = datetime.utcnow()
target.records_deleted = result.get("records_deleted", 0)
logger.info(
f"Deletion completed for {target.system_name}",
extra={
"request_id": request.id,
"records_deleted": target.records_deleted
}
)
except Exception as e:
target.status = SystemDeletionStatus.FAILED
target.error_message = str(e)
target.completed_at = datetime.utcnow()
raise
async def verify_deletion(self, request_id: str) -> Dict[str, Any]:
"""
Verify that deletion was successful.
Checks each system to confirm data is gone.
"""
request = await self.get_request(request_id)
verification_results = {}
all_verified = True
for target in request.targets:
if target.status == SystemDeletionStatus.SKIPPED:
verification_results[target.system_name] = {
"status": "skipped",
"verified": True
}
continue
executor = self.executors.get(target.system_name)
if not executor:
verification_results[target.system_name] = {
"status": "no_executor",
"verified": False
}
all_verified = False
continue
try:
exists = await executor.check_user_exists(
user_id=request.user_id,
tenant_id=request.tenant_id
)
verification_results[target.system_name] = {
"status": "verified" if not exists else "data_found",
"verified": not exists,
"checked_at": datetime.utcnow().isoformat()
}
if exists:
all_verified = False
logger.warning(
f"Data still exists in {target.system_name}",
extra={"request_id": request_id}
)
except Exception as e:
verification_results[target.system_name] = {
"status": "error",
"verified": False,
"error": str(e)
}
all_verified = False
# Update request
request.verification_report = verification_results
request.verified_at = datetime.utcnow()
if all_verified:
request.status = DeletionStatus.COMPLETED
request.completed_at = datetime.utcnow()
await self._save_request(request)
# Notify user if complete
if all_verified:
await self._notify_completion(request)
return {
"all_verified": all_verified,
"results": verification_results
}
async def _determine_targets(
self,
user_id: str,
tenant_id: str
) -> List[DeletionTarget]:
"""Determine which systems have data for this user."""
targets = []
for target_template in self.DELETION_TARGETS:
# Check if system has data for this user
executor = self.executors.get(target_template.system_name)
if executor:
has_data = await executor.check_user_exists(user_id, tenant_id)
if has_data:
targets.append(DeletionTarget(
system_name=target_template.system_name,
system_type=target_template.system_type,
deletion_method=target_template.deletion_method,
priority=target_template.priority
))
return targets
async def _notify_completion(self, request: DeletionRequest):
"""Notify user that deletion is complete."""
await self.notifications.send(
user_id=request.user_id,
template="deletion_complete",
context={
"request_id": request.id,
"completed_at": request.completed_at.isoformat(),
"systems_deleted": len([
t for t in request.targets
if t.status == SystemDeletionStatus.COMPLETED
])
}
)
async def get_request(self, request_id: str) -> DeletionRequest:
"""Get a deletion request by ID."""
row = await self.db.fetchone(
"SELECT * FROM deletion_requests WHERE id = $1",
request_id
)
if not row:
raise ValueError(f"Deletion request not found: {request_id}")
return self._row_to_request(row)
async def _save_request(self, request: DeletionRequest):
"""Save deletion request to database."""
await self.db.execute(
"""
INSERT INTO deletion_requests
(id, user_id, tenant_id, requested_at, requested_by, reason,
status, targets, deadline, completed_at, verified_at, verification_report)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
ON CONFLICT (id) DO UPDATE SET
status = $7, targets = $8, completed_at = $10,
verified_at = $11, verification_report = $12
""",
request.id, request.user_id, request.tenant_id,
request.requested_at, request.requested_by, request.reason,
request.status.value,
[t.__dict__ for t in request.targets],
request.deadline, request.completed_at, request.verified_at,
request.verification_report
)
def _row_to_request(self, row) -> DeletionRequest:
"""Convert database row to DeletionRequest."""
return DeletionRequest(
id=row["id"],
user_id=row["user_id"],
tenant_id=row["tenant_id"],
requested_at=row["requested_at"],
requested_by=row["requested_by"],
reason=row["reason"],
status=DeletionStatus(row["status"]),
targets=[DeletionTarget(**t) for t in row["targets"]],
deadline=row["deadline"],
completed_at=row.get("completed_at"),
verified_at=row.get("verified_at"),
verification_report=row.get("verification_report")
)
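Below is a minimal usage sketch of the service, assuming an asyncpg-style db, the executors defined in the next section, and stubbed event/notification services; the wiring is illustrative, not prescriptive.
# deletion/example_usage.py (illustrative sketch)
from deletion.service import DeletionService  # the service defined above

async def handle_gdpr_request(db, executors, events, notifications,
                              user_id: str, tenant_id: str) -> str:
    service = DeletionService(
        db=db,
        deletion_executors=executors,  # e.g. {"postgresql": ..., "redis_cache": ...}
        event_publisher=events,
        notification_service=notifications,
    )
    # 1. Record the request and start the 30-day clock
    request = await service.create_deletion_request(
        user_id=user_id,
        tenant_id=tenant_id,
        requested_by="user",
        reason="gdpr_article_17",
    )
    # 2. Delete across systems in priority order
    await service.execute_deletion(request.id)
    # 3. Verify: only a fully verified request is marked COMPLETED
    report = await service.verify_deletion(request.id)
    if not report["all_verified"]:
        # Flag for manual follow-up rather than silently closing the request
        raise RuntimeError(f"Deletion not fully verified: {report['results']}")
    return request.id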
4.2 System-Specific Deletion Executors
# deletion/executors.py
"""
System-specific deletion executors.
Each executor knows how to delete user data from a specific system.
"""
from abc import ABC, abstractmethod
from typing import Dict, Any
import logging
from google.cloud import bigquery  # needed by BigQueryDeletionExecutor below
logger = logging.getLogger(__name__)
class DeletionExecutor(ABC):
"""Base class for deletion executors."""
@abstractmethod
async def delete_user_data(
self,
user_id: str,
tenant_id: str
) -> Dict[str, Any]:
"""
Delete all user data from this system.
Returns dict with deletion details.
"""
pass
@abstractmethod
async def check_user_exists(
self,
user_id: str,
tenant_id: str
) -> bool:
"""Check if user data exists in this system."""
pass
class PostgreSQLDeletionExecutor(DeletionExecutor):
"""
Deletes user data from PostgreSQL.
Handles cascading deletions across related tables.
"""
def __init__(self, db_pool):
self.db = db_pool
async def delete_user_data(
self,
user_id: str,
tenant_id: str
) -> Dict[str, Any]:
"""
Delete user and all related data.
Order matters due to foreign keys:
1. Delete from leaf tables first
2. Work up to parent tables
3. Finally delete user record
"""
records_deleted = 0
async with self.db.acquire() as conn:
async with conn.transaction():
# Delete from leaf tables first (no foreign key dependencies)
# Notifications
result = await conn.execute(
"DELETE FROM notifications WHERE user_id = $1 AND tenant_id = $2",
user_id, tenant_id
)
records_deleted += int(result.split()[-1])
# Activity logs
result = await conn.execute(
"DELETE FROM activity_logs WHERE user_id = $1 AND tenant_id = $2",
user_id, tenant_id
)
records_deleted += int(result.split()[-1])
# Sessions
result = await conn.execute(
"DELETE FROM sessions WHERE user_id = $1 AND tenant_id = $2",
user_id, tenant_id
)
records_deleted += int(result.split()[-1])
# User preferences
result = await conn.execute(
"DELETE FROM user_preferences WHERE user_id = $1 AND tenant_id = $2",
user_id, tenant_id
)
records_deleted += int(result.split()[-1])
# Consent records (keep anonymized version for audit)
result = await conn.execute(
"""
UPDATE consent_records
SET user_id = 'DELETED', ip_address = 'DELETED'
WHERE user_id = $1 AND tenant_id = $2
""",
user_id, tenant_id
)
records_deleted += int(result.split()[-1])
# Orders - anonymize rather than delete (financial records)
result = await conn.execute(
"""
UPDATE orders
SET user_id = NULL,
shipping_address = 'DELETED',
billing_address = 'DELETED',
customer_email = 'DELETED',
customer_phone = 'DELETED'
WHERE user_id = $1 AND tenant_id = $2
""",
user_id, tenant_id
)
records_deleted += int(result.split()[-1])
# Comments - anonymize to preserve content integrity
result = await conn.execute(
"""
UPDATE comments
SET user_id = NULL, author_name = 'Deleted User'
WHERE user_id = $1 AND tenant_id = $2
""",
user_id, tenant_id
)
records_deleted += int(result.split()[-1])
# Finally, delete the user record
result = await conn.execute(
"DELETE FROM users WHERE id = $1 AND tenant_id = $2",
user_id, tenant_id
)
records_deleted += int(result.split()[-1])
logger.info(
f"PostgreSQL deletion complete",
extra={
"user_id": user_id,
"records_deleted": records_deleted
}
)
return {"records_deleted": records_deleted}
async def check_user_exists(
self,
user_id: str,
tenant_id: str
) -> bool:
"""Check if user exists in PostgreSQL."""
result = await self.db.fetchone(
"SELECT 1 FROM users WHERE id = $1 AND tenant_id = $2",
user_id, tenant_id
)
return result is not None
class ElasticsearchDeletionExecutor(DeletionExecutor):
"""Deletes user data from Elasticsearch."""
def __init__(self, es_client):
self.es = es_client
async def delete_user_data(
self,
user_id: str,
tenant_id: str
) -> Dict[str, Any]:
"""Delete user documents from all indices."""
indices = ["users", "documents", "activities"]
total_deleted = 0
for index in indices:
try:
result = await self.es.delete_by_query(
index=f"{tenant_id}_{index}",
body={
"query": {
"term": {"user_id": user_id}
}
}
)
total_deleted += result.get("deleted", 0)
except Exception as e:
logger.warning(f"ES deletion from {index} failed: {e}")
return {"records_deleted": total_deleted}
async def check_user_exists(
self,
user_id: str,
tenant_id: str
) -> bool:
"""Check if user exists in any index."""
indices = ["users", "documents", "activities"]
for index in indices:
try:
result = await self.es.count(
index=f"{tenant_id}_{index}",
body={
"query": {
"term": {"user_id": user_id}
}
}
)
if result.get("count", 0) > 0:
return True
except Exception:
pass
return False
class RedisDeletionExecutor(DeletionExecutor):
"""Deletes user data from Redis cache."""
def __init__(self, redis_client):
self.redis = redis_client
async def delete_user_data(
self,
user_id: str,
tenant_id: str
) -> Dict[str, Any]:
"""Delete all user cache keys."""
patterns = [
f"tenant:{tenant_id}:user:{user_id}:*",
f"tenant:{tenant_id}:session:{user_id}:*",
f"tenant:{tenant_id}:cache:user:{user_id}",
]
total_deleted = 0
for pattern in patterns:
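            # NOTE: KEYS is O(N) and blocks Redis; prefer SCAN in production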
keys = await self.redis.keys(pattern)
if keys:
deleted = await self.redis.delete(*keys)
total_deleted += deleted
return {"records_deleted": total_deleted}
async def check_user_exists(
self,
user_id: str,
tenant_id: str
) -> bool:
"""Check if user data exists in cache."""
patterns = [
f"tenant:{tenant_id}:user:{user_id}:*",
f"tenant:{tenant_id}:session:{user_id}:*",
]
for pattern in patterns:
keys = await self.redis.keys(pattern)
if keys:
return True
return False
class S3DeletionExecutor(DeletionExecutor):
"""Deletes user files from S3."""
def __init__(self, s3_client, bucket: str):
self.s3 = s3_client
self.bucket = bucket
async def delete_user_data(
self,
user_id: str,
tenant_id: str
) -> Dict[str, Any]:
"""Delete all user files from S3."""
prefix = f"tenants/{tenant_id}/users/{user_id}/"
# List all objects with prefix
objects_to_delete = []
paginator = self.s3.get_paginator('list_objects_v2')
async for page in paginator.paginate(Bucket=self.bucket, Prefix=prefix):
for obj in page.get('Contents', []):
objects_to_delete.append({'Key': obj['Key']})
if not objects_to_delete:
return {"records_deleted": 0}
# Delete in batches of 1000 (S3 limit)
total_deleted = 0
for i in range(0, len(objects_to_delete), 1000):
batch = objects_to_delete[i:i+1000]
await self.s3.delete_objects(
Bucket=self.bucket,
Delete={'Objects': batch}
)
total_deleted += len(batch)
return {"records_deleted": total_deleted}
async def check_user_exists(
self,
user_id: str,
tenant_id: str
) -> bool:
"""Check if user has files in S3."""
prefix = f"tenants/{tenant_id}/users/{user_id}/"
result = await self.s3.list_objects_v2(
Bucket=self.bucket,
Prefix=prefix,
MaxKeys=1
)
return result.get('KeyCount', 0) > 0
class StripeDeletionExecutor(DeletionExecutor):
"""Deletes user data from Stripe."""
def __init__(self, stripe_client):
self.stripe = stripe_client
async def delete_user_data(
self,
user_id: str,
tenant_id: str
) -> Dict[str, Any]:
"""Delete customer from Stripe."""
        # Find the Stripe customer via the Search API
        # (Customer.list does not support filtering by metadata)
        customers = await self.stripe.Customer.search(
            query=(
                f"metadata['user_id']:'{user_id}' "
                f"AND metadata['tenant_id']:'{tenant_id}'"
            ),
            limit=1
        )
if not customers.data:
return {"records_deleted": 0}
customer = customers.data[0]
# Delete the customer (Stripe handles cascading)
await self.stripe.Customer.delete(customer.id)
return {"records_deleted": 1, "stripe_customer_id": customer.id}
async def check_user_exists(
self,
user_id: str,
tenant_id: str
) -> bool:
"""Check if customer exists in Stripe."""
        # Search API again (list() cannot filter by metadata)
        customers = await self.stripe.Customer.search(
            query=(
                f"metadata['user_id']:'{user_id}' "
                f"AND metadata['tenant_id']:'{tenant_id}'"
            ),
            limit=1
        )
return len(customers.data) > 0
class BigQueryDeletionExecutor(DeletionExecutor):
"""Deletes user data from BigQuery analytics."""
def __init__(self, bq_client, dataset: str):
self.bq = bq_client
self.dataset = dataset
async def delete_user_data(
self,
user_id: str,
tenant_id: str
) -> Dict[str, Any]:
"""
Delete user data from BigQuery tables.
Note: BigQuery DELETE can be slow and expensive.
Consider partitioning strategy for better deletion.
"""
tables = ["events", "page_views", "user_properties"]
total_deleted = 0
for table in tables:
query = f"""
DELETE FROM `{self.dataset}.{table}`
WHERE user_id = @user_id AND tenant_id = @tenant_id
"""
job_config = bigquery.QueryJobConfig(
query_parameters=[
bigquery.ScalarQueryParameter("user_id", "STRING", user_id),
bigquery.ScalarQueryParameter("tenant_id", "STRING", tenant_id),
]
)
result = await self.bq.query(query, job_config=job_config)
total_deleted += result.num_dml_affected_rows
return {"records_deleted": total_deleted}
async def check_user_exists(
self,
user_id: str,
tenant_id: str
) -> bool:
"""Check if user exists in BigQuery."""
query = f"""
SELECT 1 FROM `{self.dataset}.events`
WHERE user_id = @user_id AND tenant_id = @tenant_id
LIMIT 1
"""
job_config = bigquery.QueryJobConfig(
query_parameters=[
bigquery.ScalarQueryParameter("user_id", "STRING", user_id),
bigquery.ScalarQueryParameter("tenant_id", "STRING", tenant_id),
]
)
result = await self.bq.query(query, job_config=job_config)
return result.total_rows > 0
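The DELETION_TARGETS list also names Zendesk, Intercom, and Mailchimp, which follow the same pattern: call the vendor's deletion endpoint, then confirm the record is gone. Vendor endpoints differ, so the sketch below uses a hypothetical REST API via aiohttp rather than any real vendor's paths.
# deletion/executors_thirdparty.py (illustrative sketch, hypothetical endpoints)
from typing import Dict, Any
import aiohttp
from deletion.executors import DeletionExecutor  # base class defined above

class HTTPThirdPartyDeletionExecutor(DeletionExecutor):
    """Generic executor for a vendor that exposes a REST deletion endpoint."""
    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url.rstrip("/")
        self.headers = {"Authorization": f"Bearer {api_key}"}

    async def delete_user_data(self, user_id: str, tenant_id: str) -> Dict[str, Any]:
        # Hypothetical endpoint; replace with the vendor's documented deletion API
        url = f"{self.base_url}/users/{tenant_id}:{user_id}"
        async with aiohttp.ClientSession(headers=self.headers) as session:
            async with session.delete(url) as resp:
                if resp.status not in (200, 202, 204, 404):
                    raise RuntimeError(f"Vendor deletion failed: HTTP {resp.status}")
                return {"records_deleted": 0 if resp.status == 404 else 1}

    async def check_user_exists(self, user_id: str, tenant_id: str) -> bool:
        url = f"{self.base_url}/users/{tenant_id}:{user_id}"
        async with aiohttp.ClientSession(headers=self.headers) as session:
            async with session.get(url) as resp:
                return resp.status == 200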
Chapter 5: Handling Special Cases
5.1 Backup Data Handling
# deletion/backup_handler.py
"""
Handling user data in backups.
Backups are the hardest deletion challenge because you can't
surgically remove one user from a backup.
"""
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List
import logging
logger = logging.getLogger(__name__)
@dataclass
class BackupRetentionPolicy:
"""Backup retention configuration."""
backup_type: str
retention_days: int
deletion_strategy: str # "expire" or "exclude_restore"
class BackupDeletionHandler:
"""
Handles deletion requests for data in backups.
Strategy:
1. Track users pending deletion
2. When backup expires, deletion is complete for that backup
3. If backup is restored, apply pending deletions
"""
RETENTION_POLICIES = [
BackupRetentionPolicy("postgresql_daily", 30, "expire"),
BackupRetentionPolicy("postgresql_weekly", 90, "expire"),
BackupRetentionPolicy("mongodb_daily", 30, "expire"),
BackupRetentionPolicy("s3_versioning", 90, "expire"),
]
def __init__(self, db):
self.db = db
async def register_pending_deletion(
self,
deletion_request_id: str,
user_id: str,
tenant_id: str
):
"""
Register a user for deletion from backups.
This tracks that when backups expire or are restored,
this user's data must be deleted.
"""
# Calculate when backup retention expires for each type
expiry_dates = {}
for policy in self.RETENTION_POLICIES:
expiry_date = datetime.utcnow() + timedelta(days=policy.retention_days)
expiry_dates[policy.backup_type] = expiry_date
await self.db.execute(
"""
INSERT INTO backup_pending_deletions
(deletion_request_id, user_id, tenant_id, created_at, backup_expiry_dates)
VALUES ($1, $2, $3, $4, $5)
""",
deletion_request_id, user_id, tenant_id,
datetime.utcnow(), expiry_dates
)
logger.info(
f"Registered pending backup deletion",
extra={
"deletion_request_id": deletion_request_id,
"user_id": user_id,
"latest_expiry": max(expiry_dates.values())
}
)
async def get_backup_deletion_status(
self,
deletion_request_id: str
) -> dict:
"""
Get status of backup deletion for a request.
"""
record = await self.db.fetchone(
"""
SELECT * FROM backup_pending_deletions
WHERE deletion_request_id = $1
""",
deletion_request_id
)
if not record:
return {"status": "not_tracked"}
expiry_dates = record["backup_expiry_dates"]
now = datetime.utcnow()
status = {}
all_expired = True
for backup_type, expiry_date in expiry_dates.items():
if expiry_date <= now:
status[backup_type] = "expired"
else:
status[backup_type] = f"expires_{expiry_date.isoformat()}"
all_expired = False
return {
"status": "complete" if all_expired else "pending",
"backup_status": status,
"fully_deleted_at": max(expiry_dates.values()) if not all_expired else None
}
async def on_backup_restore(
self,
backup_type: str,
backup_date: datetime,
restore_database: str
):
"""
Called when a backup is restored.
Must apply all pending deletions to the restored data.
"""
# Get all pending deletions that were requested before the backup
pending = await self.db.fetch(
"""
SELECT * FROM backup_pending_deletions
WHERE created_at <= $1
""",
backup_date
)
logger.warning(
f"Applying {len(pending)} pending deletions to restored backup",
extra={"backup_type": backup_type, "backup_date": backup_date}
)
for record in pending:
# Apply deletion to restored database
await self._apply_deletion_to_restore(
record["user_id"],
record["tenant_id"],
restore_database
)
async def _apply_deletion_to_restore(
self,
user_id: str,
tenant_id: str,
database: str
):
"""Apply a pending deletion to a restored database."""
# This would use the same logic as the main deletion
# but target the restored database
logger.info(
f"Applying deletion to restored database",
extra={
"user_id": user_id,
"database": database
}
)
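To close the loop, a scheduled job can mark backup deletions complete once every retention window has passed. A minimal sketch assuming the same backup_pending_deletions table plus a nullable completed_at column (an assumption, not shown in the insert above):
# deletion/backup_sweeper.py (illustrative sketch)
from datetime import datetime
import logging

logger = logging.getLogger(__name__)

async def sweep_expired_backup_deletions(db) -> int:
    """Daily job: mark backup deletions complete once every backup
    retention window has passed for that user."""
    # Assumes a nullable completed_at column on backup_pending_deletions
    rows = await db.fetch(
        "SELECT * FROM backup_pending_deletions WHERE completed_at IS NULL"
    )
    now = datetime.utcnow()
    completed = 0
    for row in rows:
        expiry_dates = row["backup_expiry_dates"]
        if all(expiry <= now for expiry in expiry_dates.values()):
            await db.execute(
                """
                UPDATE backup_pending_deletions
                SET completed_at = $1
                WHERE deletion_request_id = $2
                """,
                now, row["deletion_request_id"],
            )
            completed += 1
            logger.info(
                "Backup retention expired; deletion fully complete",
                extra={"deletion_request_id": row["deletion_request_id"]},
            )
    return completed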
5.2 Anonymization Service
# deletion/anonymization.py
"""
Data anonymization for cases where deletion isn't possible.
Some data can't be deleted (legal retention) but can be anonymized.
"""
from typing import Dict, Any
import hashlib
import uuid
class AnonymizationService:
"""
Anonymizes data instead of deleting it.
Used for:
- Financial records (legal retention requirements)
- Aggregated statistics
- Audit logs
"""
def anonymize_user_record(self, user: Dict[str, Any]) -> Dict[str, Any]:
"""
Anonymize a user record.
Removes all PII while preserving structure.
"""
return {
"id": self._generate_anonymous_id(user["id"]),
"email": "deleted@anonymized.local",
"name": "Deleted User",
"phone": None,
"address": None,
"created_at": user["created_at"], # Keep for analytics
"tenant_id": user["tenant_id"], # Keep for tenant analytics
}
def anonymize_order(self, order: Dict[str, Any]) -> Dict[str, Any]:
"""
Anonymize an order record.
Keeps financial data but removes PII.
"""
return {
"id": order["id"],
"user_id": None, # Remove link to user
"tenant_id": order["tenant_id"],
"amount": order["amount"], # Keep for financial records
"currency": order["currency"],
"created_at": order["created_at"],
"shipping_address": "REDACTED",
"billing_address": "REDACTED",
"customer_email": "deleted@anonymized.local",
"customer_phone": None,
"items": order["items"], # Keep order details
}
def anonymize_event(self, event: Dict[str, Any]) -> Dict[str, Any]:
"""
Anonymize an analytics event.
"""
return {
"event_type": event["event_type"],
"tenant_id": event["tenant_id"],
"timestamp": event["timestamp"],
"user_id": None, # Remove user link
"session_id": self._hash_value(event.get("session_id", "")),
"properties": self._anonymize_properties(event.get("properties", {})),
}
def _generate_anonymous_id(self, original_id: str) -> str:
"""Generate a consistent anonymous ID."""
# One-way hash so original can't be recovered
return hashlib.sha256(f"anon:{original_id}".encode()).hexdigest()[:16]
def _hash_value(self, value: str) -> str:
"""Hash a value for anonymization."""
if not value:
return ""
return hashlib.sha256(value.encode()).hexdigest()[:12]
def _anonymize_properties(self, props: Dict) -> Dict:
"""Remove PII from event properties."""
pii_keys = ["email", "name", "phone", "address", "ip_address", "user_agent"]
return {
k: "REDACTED" if k in pii_keys else v
for k, v in props.items()
}
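A short usage sketch: anonymizing a retained order in place so the financial record survives without PII. The db calls are asyncpg-style and illustrative only.
# Illustrative usage: anonymize a retained order in place
anonymizer = AnonymizationService()

async def anonymize_order_in_place(db, order_id: str):
    order = await db.fetchone("SELECT * FROM orders WHERE id = $1", order_id)
    anon = anonymizer.anonymize_order(dict(order))
    await db.execute(
        """
        UPDATE orders
        SET user_id = $2, shipping_address = $3, billing_address = $4,
            customer_email = $5, customer_phone = $6
        WHERE id = $1
        """,
        order_id, anon["user_id"], anon["shipping_address"],
        anon["billing_address"], anon["customer_email"], anon["customer_phone"],
    )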
Chapter 6: Audit Trail and Compliance Reporting
6.1 Deletion Audit Log
# deletion/audit.py
"""
Audit logging for deletion requests.
Critical for proving compliance with GDPR.
"""
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, List
from enum import Enum
import uuid
class AuditEventType(Enum):
"""Types of deletion audit events."""
REQUEST_CREATED = "request_created"
REQUEST_APPROVED = "request_approved"
DELETION_STARTED = "deletion_started"
SYSTEM_DELETION_STARTED = "system_deletion_started"
SYSTEM_DELETION_COMPLETED = "system_deletion_completed"
SYSTEM_DELETION_FAILED = "system_deletion_failed"
VERIFICATION_STARTED = "verification_started"
VERIFICATION_COMPLETED = "verification_completed"
REQUEST_COMPLETED = "request_completed"
REQUEST_FAILED = "request_failed"
@dataclass
class DeletionAuditEvent:
"""An audit event for a deletion request."""
id: str
deletion_request_id: str
event_type: AuditEventType
timestamp: datetime
actor: str # Who/what triggered this event
system_name: Optional[str]
details: dict
class DeletionAuditService:
"""
Records audit trail for all deletion activities.
This audit log is retained even after deletion completes
to prove compliance.
"""
def __init__(self, db):
self.db = db
async def log_event(
self,
deletion_request_id: str,
event_type: AuditEventType,
actor: str,
system_name: Optional[str] = None,
details: dict = None
) -> DeletionAuditEvent:
"""
Log an audit event.
"""
event = DeletionAuditEvent(
id=str(uuid.uuid4()),
deletion_request_id=deletion_request_id,
event_type=event_type,
timestamp=datetime.utcnow(),
actor=actor,
system_name=system_name,
details=details or {}
)
await self.db.execute(
"""
INSERT INTO deletion_audit_log
(id, deletion_request_id, event_type, timestamp, actor, system_name, details)
VALUES ($1, $2, $3, $4, $5, $6, $7)
""",
event.id, event.deletion_request_id, event.event_type.value,
event.timestamp, event.actor, event.system_name, event.details
)
return event
async def get_audit_trail(
self,
deletion_request_id: str
) -> List[DeletionAuditEvent]:
"""
Get complete audit trail for a deletion request.
"""
rows = await self.db.fetch(
"""
SELECT * FROM deletion_audit_log
WHERE deletion_request_id = $1
ORDER BY timestamp ASC
""",
deletion_request_id
)
return [
DeletionAuditEvent(
id=row["id"],
deletion_request_id=row["deletion_request_id"],
event_type=AuditEventType(row["event_type"]),
timestamp=row["timestamp"],
actor=row["actor"],
system_name=row["system_name"],
details=row["details"]
)
for row in rows
]
async def generate_compliance_report(
self,
deletion_request_id: str
) -> dict:
"""
Generate a compliance report for a deletion request.
This can be provided to auditors or data protection authorities.
"""
# Get the request
request = await self.db.fetchone(
"SELECT * FROM deletion_requests WHERE id = $1",
deletion_request_id
)
# Get audit trail
audit_trail = await self.get_audit_trail(deletion_request_id)
# Build report
report = {
"report_generated_at": datetime.utcnow().isoformat(),
"deletion_request": {
"id": request["id"],
"user_id": "REDACTED", # Don't include actual user ID
"requested_at": request["requested_at"].isoformat(),
"requested_by": request["requested_by"],
"reason": request["reason"],
"deadline": request["deadline"].isoformat(),
"status": request["status"],
"completed_at": request["completed_at"].isoformat() if request["completed_at"] else None,
},
"systems_affected": [
{
"system": t["system_name"],
"status": t["status"],
"records_deleted": t.get("records_deleted", 0),
"completed_at": t.get("completed_at")
}
for t in request["targets"]
],
"audit_trail": [
{
"timestamp": e.timestamp.isoformat(),
"event": e.event_type.value,
"system": e.system_name,
"actor": e.actor
}
for e in audit_trail
],
"verification": request.get("verification_report", {}),
"compliance_statement": self._generate_compliance_statement(request)
}
return report
def _generate_compliance_statement(self, request: dict) -> str:
"""Generate a compliance statement for the report."""
if request["status"] == "completed":
return (
f"This deletion request was completed on {request['completed_at']}. "
f"All personal data for the data subject has been deleted from primary systems. "
f"Data in backups will be fully purged according to our retention policy."
)
elif request["status"] == "partially_completed":
return (
"This deletion request has been partially completed. "
"Some systems reported errors during deletion. "
"Manual intervention may be required."
)
else:
return f"This deletion request is currently in status: {request['status']}."
Part III: Real-World Application
Chapter 7: Case Studies
7.1 How Stripe Handles Deletion
STRIPE'S DATA DELETION APPROACH
Challenge:
├── Massive data spread across many systems
├── Financial data has legal retention requirements
├── Payment data subject to PCI DSS
└── Customers in multiple jurisdictions
Solution:
1. DATA CATEGORIZATION
├── Personal data: Name, email, address
│ └── Delete or anonymize on request
├── Financial records: Transaction history
│ └── Anonymize, retain for legal requirements
├── Payment instruments: Card numbers
│ └── Already tokenized, delete tokens
└── Audit logs: Access records
└── Retain with anonymized references
2. DELETION WORKFLOW
├── Self-serve deletion via Dashboard API
├── Immediate deletion from primary stores
├── Async deletion from analytics/logs
├── Verification job confirms deletion
└── Compliance report generated
3. RETENTION POLICY
├── Personal data: Delete on request
├── Transaction records: 7 years (tax/legal)
│ └── Anonymized after customer deletion
├── Logs: 90 days standard
└── Backups: 30 day retention
Lessons:
├── Distinguish delete vs anonymize vs retain
├── Self-serve deletion reduces support burden
├── Clear retention policies simplify compliance
└── Verification step catches edge cases
7.2 How Slack Handles Deletion
SLACK'S DATA DELETION APPROACH
Challenge:
├── Messages involve multiple users
├── Files shared across conversations
├── Search indexes contain message content
└── Enterprise customers need audit retention
Solution:
1. USER DELETION
├── Deactivate account immediately
├── Messages: Keep but show "Deleted User"
├── Files: Delete if sole owner
│ └── Keep if shared, reassign ownership
├── DMs: Delete both sides' view
└── Profile: Fully deleted
2. MESSAGE DELETION
├── User can delete own messages
├── Admins can delete any message
├── Files in messages: Separate deletion
└── Search index updated async
3. WORKSPACE DELETION (Data Export + Delete)
├── Export all data first (GDPR portability)
├── 7-day grace period
├── Then hard delete everything
└── Cannot be recovered
4. ENTERPRISE COMPLIANCE MODE
├── Org can require message retention
├── Users cannot delete in retention period
├── After retention: Normal deletion rules
└── Legal hold can prevent all deletion
Lessons:
├── Collaborative content needs special handling
├── Show "Deleted User" preserves context
├── Enterprise compliance may override user rights
└── Grace period prevents accidental deletion
Chapter 8: Common Mistakes
8.1 Deletion Anti-Patterns
DELETION MISTAKES
❌ MISTAKE 1: Soft Delete Only
Wrong:
async def delete_user(user_id):
await db.execute(
"UPDATE users SET deleted = true WHERE id = $1",
user_id
)
# Done! User is "deleted"
Problem:
Data still exists, not GDPR compliant
Can be queried by mistake
Backups still contain data
Right:
async def delete_user(user_id):
    # Soft delete first (grace period starts now)
    await db.execute(
        "UPDATE users SET deleted = true, deleted_at = NOW() WHERE id = $1",
        user_id
    )
    # Schedule the hard delete for when the grace period ends
    await schedule_hard_delete(user_id, delay_days=30)

async def hard_delete_user(user_id):
    # Runs after the grace period; actually removes the data
    await db.execute("DELETE FROM users WHERE id = $1", user_id)
❌ MISTAKE 2: Forgetting Foreign Keys
Wrong:
async def delete_user(user_id):
await db.execute("DELETE FROM users WHERE id = $1", user_id)
# Fails with foreign key violation!
Problem:
Orders, comments, etc. reference user_id
Deletion fails or orphans records
Right:
async def delete_user(user_id):
async with db.transaction():
# Delete leaf records first
await db.execute("DELETE FROM notifications WHERE user_id = $1", user_id)
await db.execute("DELETE FROM sessions WHERE user_id = $1", user_id)
# Anonymize records we must keep
await db.execute(
"UPDATE orders SET user_id = NULL WHERE user_id = $1",
user_id
)
# Finally delete user
await db.execute("DELETE FROM users WHERE id = $1", user_id)
❌ MISTAKE 3: Not Deleting from Analytics
Wrong:
async def delete_user(user_id):
await db.execute("DELETE FROM users WHERE id = $1", user_id)
# Forgot BigQuery, Mixpanel, Amplitude...
Problem:
User data still in analytics systems
Can be queried and linked back
Violates deletion request
Right:
async def delete_user(user_id):
# Primary DB
await db.execute("DELETE FROM users WHERE id = $1", user_id)
# Analytics - all of them
await bigquery.delete_user(user_id)
await mixpanel.delete_user(user_id)
await amplitude.delete_user(user_id)
# Third parties
await stripe.delete_customer(user_id)
await intercom.delete_user(user_id)
❌ MISTAKE 4: No Verification
Wrong:
async def delete_user(user_id):
for system in systems:
await system.delete(user_id)
return {"status": "deleted"} # Trust it worked!
Problem:
Deletion might have failed silently
No proof of deletion for auditors
Can't answer "was user X deleted?"
Right:
async def delete_user(user_id):
for system in systems:
await system.delete(user_id)
# Verify deletion
for system in systems:
exists = await system.check_exists(user_id)
if exists:
raise DeletionVerificationError(f"Data still in {system}")
# Log completion
await audit_log.record_deletion_complete(user_id)
return {"status": "verified_deleted"}
❌ MISTAKE 5: Deleting Audit Logs
Wrong:
async def delete_user(user_id):
# Delete everything including audit trail
await db.execute("DELETE FROM audit_logs WHERE user_id = $1", user_id)
await db.execute("DELETE FROM users WHERE id = $1", user_id)
Problem:
Can't prove what happened to the data
Compliance audit will fail
Suspicious - looks like cover-up
Right:
async def delete_user(user_id):
# Anonymize audit logs, don't delete
await db.execute(
"UPDATE audit_logs SET user_id = 'DELETED' WHERE user_id = $1",
user_id
)
# Add deletion record to audit log
await db.execute(
"INSERT INTO audit_logs (action, user_id, timestamp) VALUES ($1, $2, $3)",
"USER_DELETED", "DELETED", datetime.now()
)
await db.execute("DELETE FROM users WHERE id = $1", user_id)
Part IV: Interview Preparation
Chapter 9: Interview Tips
9.1 Deletion Discussion Framework
DISCUSSING DELETION IN INTERVIEWS
When the topic comes up:
1. ACKNOWLEDGE THE COMPLEXITY
"Deletion sounds simple but is actually one of the hardest
compliance challenges. Data is spread across many systems,
and you need to prove it's actually gone."
2. LIST THE CHALLENGES
"The main challenges are:
- Data fragmentation across systems
- Foreign key relationships
- Third-party processors
- Backup retention
- Proving deletion happened"
3. PROPOSE A SYSTEMATIC APPROACH
"I'd implement:
- A data inventory mapping all PII locations
- A deletion orchestration service
- System-specific executors for each data store
- Verification that confirms deletion
- Audit trail that survives the deletion"
4. ADDRESS SPECIAL CASES
"Some data can't be deleted:
- Financial records: Anonymize instead
- Audit logs: Keep with anonymized references
- Backups: Let expire per retention policy
- Shared content: Show 'Deleted User'"
5. MENTION COMPLIANCE
"GDPR gives 30 days to respond. I'd implement:
- Dashboard for users to request deletion
- Automated workflow with manual review option
- Progress tracking visible to user
- Compliance report generation for auditors"
9.2 Key Phrases
DELETION KEY PHRASES
On Architecture:
"I'd build a deletion orchestration service that coordinates
deletion across all systems. Each system has an executor that
knows how to delete data and verify it's gone. The orchestrator
tracks progress and handles failures."
On Data Mapping:
"The first step is a data inventory - mapping every place
personal data lives. Without knowing where data is, you can't
delete it. This includes primary databases, caches, search
indexes, analytics, third-party services, and backups."
On Verification:
"Deletion without verification is incomplete. After deletion,
the system queries each data store to confirm the user's data
is actually gone. Only then is the deletion marked complete."
On Backups:
"Backups are the tricky part. You can't surgically remove one
user from a backup. The solution is clear retention policies -
backups expire after 30-90 days. Until then, if a backup is
restored, we apply pending deletions to the restored data."
On Audit Trail:
"The paradox of deletion is you need to prove it happened.
I'd keep an anonymized audit trail - recording that 'user
DELETED-12345 was deleted on date X' without the actual PII.
This satisfies auditors while respecting the deletion."
Chapter 10: Practice Problems
Problem 1: Social Media Platform
Scenario: Your social platform has users with posts, comments, likes, and followers. A user requests deletion under GDPR.
Questions:
- What happens to their posts that others have commented on?
- What about posts they're mentioned in?
- How do you handle their follower relationships?
Approach:
- Posts they authored: Delete or show "Deleted User"
- Comments on others' posts: Anonymize ("Deleted User said...")
- Mentions: Replace @username with @deleted
- Followers: Delete both sides of relationship
- Consider what happens to replies to their posts
Problem 2: E-commerce With Legal Retention
Scenario: Your e-commerce platform must keep transaction records for 7 years for tax purposes. A user requests deletion after making purchases.
Questions:
- How do you handle the conflict between deletion and retention?
- What data can you delete, and what must you keep?
- How do you explain this to the user?
Approach:
- Account data: Delete (name, email, preferences)
- Transaction records: Anonymize but retain
- Shipping addresses: Redact unless invoicing law requires them on retained records
- Explain: "Account deleted. Financial records anonymized per legal requirements."
- Document the legal basis for retention
Chapter 11: Sample Interview Dialogue
Interviewer: "A user requests deletion of all their data. Walk me through how you'd handle it."
You: "This is a GDPR Article 17 request. Let me walk through the systematic approach.
First, I need to know where data lives. I'd maintain a data inventory mapping every system with personal data. For a typical SaaS, that's:
- Primary database (user profile, content)
- Search index (user in search results)
- Cache (session, preferences)
- File storage (uploads)
- Analytics (behavioral data)
- Third-party services (payment, support)
The deletion flow would be:"
User clicks "Delete Account"
│
▼
┌─────────────────────────────┐
│ Create Deletion Request │
│ - Record request details │
│ - Start 30-day timer │
│ - Notify user of timeline │
└──────────────┬──────────────┘
│
▼
┌─────────────────────────────┐
│ Execute Deletion │
│ - Delete from cache first │
│ - Then search index │
│ - Then file storage │
│ - Then analytics │
│ - Then third parties │
│ - Finally primary DB │
└──────────────┬──────────────┘
│
▼
┌─────────────────────────────┐
│ Verify Deletion │
│ - Query each system │
│ - Confirm data is gone │
│ - Flag failures for review │
└──────────────┬──────────────┘
│
▼
┌─────────────────────────────┐
│ Complete and Notify │
│ - Update audit log │
│ - Send confirmation email │
│ - Generate compliance cert │
└─────────────────────────────┘
Interviewer: "What about their order history? We need that for financial records."
You: "Good point - there's a conflict between deletion right and legal retention. Here's how I'd handle it:
The user record gets deleted - name, email, preferences. But the order records get anonymized rather than deleted. The order stays with user_id = NULL and addresses redacted as 'DELETED'.
The legal basis for retention is Article 17(3)(b) - 'compliance with a legal obligation.' We'd document this and explain to the user: 'Your account has been deleted. Transaction records are retained in anonymized form as required by tax law.'
This way we satisfy both the deletion request and the 7-year tax retention requirement."
Interviewer: "How do you handle backups?"
You: "Backups are the hardest part because you can't surgically remove one user.
My approach is a clear retention policy. Say 30-day backup retention. We track pending deletions, and the compliance report notes: 'Primary systems deleted on Day 0. Backup data will be fully purged by Day 30.'
If we need to restore from backup before Day 30, we apply all pending deletions to the restored data before it goes live. This is why tracking pending deletions is critical."
Summary
DAY 4 KEY TAKEAWAYS
DELETION REQUIREMENTS (GDPR):
├── Respond within 30 days
├── Delete from all systems
├── Inform third parties
├── Exceptions: Legal obligations, public interest
└── Must be able to prove deletion
DELETION STRATEGIES:
├── Hard delete: Physically remove
├── Soft delete: Mark as deleted (not sufficient alone)
├── Anonymize: Remove identifying info, keep record
├── Aggregate: Combine to remove individual
└── Expire: Let retention policy handle
DATA INVENTORY:
├── Map all PII locations
├── Document deletion method per system
├── Track dependencies (foreign keys)
├── Include third parties and backups
└── Update when adding new systems
DELETION ARCHITECTURE:
├── Orchestration service coordinates
├── System executors handle specifics
├── Order matters (dependencies)
├── Verification confirms success
└── Audit trail proves compliance
SPECIAL CASES:
├── Backups: Let expire, track pending
├── Financial records: Anonymize, retain
├── Shared content: Show "Deleted User"
├── Audit logs: Anonymize, don't delete
└── Third parties: API calls or manual
VERIFICATION:
├── Query each system after deletion
├── Confirm no data returned
├── Flag failures for manual review
├── Generate compliance report
└── Keep proof of deletion
DEFAULT APPROACH:
├── Build data inventory first
├── Implement executor per system
├── Orchestrate with dependency order
├── Always verify
└── Keep anonymized audit trail
Further Reading
Official Resources:
- GDPR Article 17: https://gdpr.eu/article-17-right-to-be-forgotten/
- ICO Guidance on Erasure: https://ico.org.uk/right-of-erasure/
Engineering Blogs:
- Slack: "How Slack Handles Data Deletion"
- Airbnb: "GDPR Compliance at Airbnb"
Tools:
- OneTrust (privacy management)
- BigID (data discovery)
- Transcend (data deletion automation)
End of Day 4: Right to Deletion
Tomorrow: Day 5 — Security Architecture. We'll bring together all the concepts with defense in depth, zero trust, secrets management, and security-first design.