Himanshu Kukreja

Week 9 — Day 4: Right to Deletion

System Design Mastery Series — Multi-Tenancy, Security, and Compliance Week


Preface

A support ticket lands in your queue:

THE DELETION REQUEST

From: angry.user@example.com
Subject: GDPR - Delete ALL my data immediately

I am exercising my right to erasure under GDPR Article 17.

Delete ALL my personal data from ALL your systems within 30 days.

I want written confirmation when complete.

If you fail to comply, I will file a complaint with my national
data protection authority.

---

You check where this user's data lives:

┌────────────────────────────────────────────────────────────────────────┐
│                                                                        │
│  USER DATA LOCATIONS                                                   │
│                                                                        │
│  Primary Systems:                                                      │
│  ├── PostgreSQL (users table, orders, preferences)                     │
│  ├── MongoDB (activity logs, user-generated content)                   │
│  ├── Elasticsearch (user profile in search index)                      │
│  ├── Redis (session cache, user preferences cache)                     │
│  └── S3 (profile photos, uploaded documents)                           │
│                                                                        │
│  Analytics & Data Warehouse:                                           │
│  ├── BigQuery (event data, behavioral analytics)                       │
│  ├── Mixpanel (product analytics)                                      │
│  └── Amplitude (user journeys)                                         │
│                                                                        │
│  Third-Party Services:                                                 │
│  ├── Stripe (payment history)                                          │
│  ├── Zendesk (support tickets)                                         │
│  ├── Intercom (chat history)                                           │
│  ├── Mailchimp (email list)                                            │
│  └── HubSpot (CRM records)                                             │
│                                                                        │
│  Backups:                                                              │
│  ├── Daily PostgreSQL backups (90 day retention)                       │
│  ├── MongoDB snapshots (30 day retention)                              │
│  └── S3 versioning (infinite retention... oops)                        │
│                                                                        │
│  Logs:                                                                 │
│  ├── Application logs (Datadog - 15 day retention)                     │
│  ├── Access logs (CloudWatch - 90 day retention)                       │
│  └── Audit logs (S3 - 7 year retention for compliance)                 │
│                                                                        │
│  Derived Data:                                                         │
│  ├── ML training datasets                                              │
│  ├── Aggregated reports                                                │
│  └── Anonymized analytics                                              │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

Question: Can you prove you deleted everything?
Answer: Not without a system designed for this.

Today, we'll build a deletion system that can answer "yes" with confidence.


Part I: Foundations

Chapter 1: Understanding the Right to Erasure

1.1 What GDPR Article 17 Requires

GDPR ARTICLE 17: RIGHT TO ERASURE ("RIGHT TO BE FORGOTTEN")

When it applies - User can request deletion when:
├── Data no longer necessary for original purpose
├── User withdraws consent (and no other legal basis exists)
├── User objects to processing (and no overriding legitimate grounds)
├── Data was unlawfully processed
├── Legal obligation to erase
└── Data collected from children for online services

When you can refuse:
├── Freedom of expression and information
├── Legal obligation requiring processing
├── Public health purposes
├── Archiving in public interest, research, statistics
└── Establishment, exercise, or defense of legal claims

Timeline:
├── Response required: Within 1 month
├── Extension possible: Up to 2 additional months for complex cases
├── Must inform user of extension within first month
└── Must explain reasons if refusing

What to delete:
├── All personal data about the individual
├── All copies (including backups, eventually)
├── Inform other controllers you've shared data with
└── Make reasonable efforts to inform processors
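
The timeline above can be turned into concrete dates. The following is a minimal sketch, assuming the clock starts on the day the request is received and that "one month" means the same calendar day in the following month (clamped to the month's last day). Both are assumptions about how the period is counted, and the helper names `add_months` and `response_deadlines` are illustrative.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to the month's end."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def response_deadlines(received: date) -> dict:
    """Compute the key dates from the timeline for one request."""
    return {
        # Response required within 1 month
        "respond_by": add_months(received, 1),
        # The user must be told about any extension within the first month
        "extension_notice_by": add_months(received, 1),
        # 1 month plus up to 2 additional months for complex cases
        "extended_respond_by": add_months(received, 3),
    }


deadlines = response_deadlines(date(2024, 1, 31))
print(deadlines["respond_by"])           # 2024-02-29
print(deadlines["extended_respond_by"])  # 2024-04-30
```

Using calendar months rather than a fixed number of days avoids drifting past the deadline when a request arrives near the end of a long month.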

1.2 The Deletion Complexity Problem

WHY DELETION IS HARD

PROBLEM 1: DATA FRAGMENTATION
├── Data spread across dozens of systems
├── Each system has different deletion API
├── Some systems don't have deletion APIs
└── No central registry of where data lives

PROBLEM 2: DATA RELATIONSHIPS
├── User has orders → orders have items → items have reviews
├── Deleting user breaks foreign key constraints
├── What happens to their reviews? Comments? Shared content?
└── Cascade vs soft delete vs anonymize?

PROBLEM 3: BACKUPS
├── User data in daily backup from 3 months ago
├── Can't surgically remove one user from backup
├── Options: Keep backup (delay deletion) or destroy backup
└── Need clear retention policy that accounts for this

PROBLEM 4: THIRD PARTIES
├── Stripe has payment history
├── Zendesk has support tickets
├── You're the controller, they're processors
└── You must ensure they delete too

PROBLEM 5: DERIVED DATA
├── ML model trained on user's data
├── Can't "untrain" a model
├── Aggregated statistics include user
└── Where's the line between personal and anonymous?

PROBLEM 6: PROVING DELETION
├── Auditor asks: "Prove user X was deleted"
├── If you deleted everything, how do you prove it?
├── Need audit trail of deletion itself
└── Paradox: Must keep record of deletion
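
One way to approach the paradox in Problem 6 is to keep a deletion receipt that records what was done without storing the raw identifier. The sketch below is an assumption-laden illustration: it uses a keyed hash (HMAC-SHA256) of the user ID, and the names `subject_token` and `build_deletion_receipt` are hypothetical. A keyed hash is still pseudonymous data for as long as the key exists, so whether this is acceptable is a question for legal review.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Iterable


def subject_token(user_id: str, secret_key: bytes) -> str:
    """Keyed hash of the user ID; cannot be recomputed without the key."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def build_deletion_receipt(
    user_id: str,
    request_id: str,
    systems: Iterable[str],
    secret_key: bytes,
) -> dict:
    """Build an audit record of a completed deletion without the raw user ID."""
    return {
        "request_id": request_id,
        "subject_token": subject_token(user_id, secret_key),
        "systems": sorted(systems),
        "completed_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical values for illustration only
receipt = build_deletion_receipt(
    user_id="user-123",
    request_id="req-001",
    systems=["stripe", "postgresql", "redis_cache"],
    secret_key=b"example-key-kept-in-a-secrets-manager",
)
print(json.dumps(receipt, indent=2))
```

If an auditor later presents a user ID, the same function and key reproduce the token, which lets you find the receipt without having kept the ID itself.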

1.3 What "Deletion" Actually Means

DELETION STRATEGIES

HARD DELETE:
├── Data physically removed from storage
├── Cannot be recovered
├── Appropriate for: Most personal data
└── Challenge: May break referential integrity

SOFT DELETE:
├── Data marked as deleted but still exists
├── Excluded from queries
├── NOT compliant with GDPR by itself
└── Use as: Intermediate state before hard delete

ANONYMIZATION:
├── Remove all identifying information
├── Remaining data cannot identify anyone
├── Can be retained indefinitely
└── Challenge: True anonymization is hard

PSEUDONYMIZATION:
├── Replace identifiers with pseudonyms
├── Still personal data under GDPR!
├── NOT sufficient for deletion request
└── Can be used for: Active data protection

AGGREGATION:
├── Combine with other data to remove individual
├── "User X had 5 orders" → "Users made 10,000 orders"
├── If truly anonymous, can retain
└── Challenge: Ensuring k-anonymity
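
The k-anonymity challenge mentioned under AGGREGATION can be made concrete with a small check. This is a minimal sketch, assuming rows are plain dictionaries and that you have already decided which columns count as quasi-identifiers; the column names and the function names are illustrative. Passing this check is necessary for k-anonymity but does not by itself prove a dataset is anonymous.

```python
from collections import Counter
from typing import Any, Mapping, Sequence


def smallest_group_size(
    rows: Sequence[Mapping[str, Any]],
    quasi_identifiers: Sequence[str],
) -> int:
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def is_k_anonymous(
    rows: Sequence[Mapping[str, Any]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> bool:
    """True if every combination of quasi-identifier values covers at least k rows."""
    return smallest_group_size(rows, quasi_identifiers) >= k


# Hypothetical aggregated rows
rows = [
    {"age_band": "30-39", "region": "EU-West", "orders": 5},
    {"age_band": "30-39", "region": "EU-West", "orders": 2},
    {"age_band": "40-49", "region": "EU-West", "orders": 7},
]

# The single 40-49 row forms a group of one, so k=2 fails
print(is_k_anonymous(rows, ["age_band", "region"], k=2))  # False
```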

Chapter 2: Data Inventory and Mapping

2.1 Building a Data Map

DATA MAPPING REQUIREMENTS

For each personal data element, document:

┌────────────────────────────────────────────────────────────────────────┐
│                         DATA INVENTORY RECORD                          │
│                                                                        │
│  Data Element: user_email                                              │
│  ─────────────────────────────────────────────────────────────────     │
│                                                                        │
│  LOCATIONS:                                                            │
│  ├── Primary: PostgreSQL.users.email                                   │
│  ├── Search: Elasticsearch.users.email                                 │
│  ├── Cache: Redis.user:{id}:profile.email                              │
│  ├── Analytics: BigQuery.events.user_email                             │
│  └── Third-party: Stripe.customers.email                               │
│                                                                        │
│  RETENTION:                                                            │
│  ├── Primary: Until account deletion                                   │
│  ├── Backups: 90 days after primary deletion                           │
│  └── Logs: 15 days (then auto-purged)                                  │
│                                                                        │
│  DELETION METHOD:                                                      │
│  ├── PostgreSQL: DELETE FROM users WHERE id = ?                        │
│  ├── Elasticsearch: DELETE /users/_doc/{id}                            │
│  ├── Redis: SCAN MATCH user:{id}:* + DEL each key                      │
│  ├── BigQuery: DELETE FROM events WHERE user_id = ?                    │
│  └── Stripe: stripe.Customer.delete(customer_id)                       │
│                                                                        │
│  DEPENDENCIES:                                                         │
│  ├── Orders table references user_id                                   │
│  ├── Comments table references user_id                                 │
│  └── Notifications table references user_id                            │
│                                                                        │
│  LEGAL BASIS: Contract performance                                     │
│  DATA CONTROLLER: Our Company                                          │
│  DATA PROCESSORS: Stripe, SendGrid                                     │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
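
An inventory record like the one above is more useful when it is machine-readable, because gaps can then be detected automatically. The following is a rough sketch, assuming both locations and deletion methods are keyed by system name; the class `DataInventoryRecord` and the function `undocumented_systems` are illustrative names, not part of the service shown later.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DataInventoryRecord:
    """One personal data element and where it lives."""
    data_element: str
    locations: Dict[str, str]          # system name -> path within that system
    deletion_methods: Dict[str, str]   # system name -> how to delete there
    retention: Dict[str, str] = field(default_factory=dict)
    dependencies: List[str] = field(default_factory=list)
    legal_basis: str = ""
    processors: List[str] = field(default_factory=list)


def undocumented_systems(record: DataInventoryRecord) -> List[str]:
    """Systems that hold the element but have no documented deletion method."""
    return sorted(set(record.locations) - set(record.deletion_methods))


user_email = DataInventoryRecord(
    data_element="user_email",
    locations={
        "PostgreSQL": "users.email",
        "Elasticsearch": "users.email",
        "Redis": "user:{id}:profile.email",
        "BigQuery": "events.user_email",
        "Stripe": "customers.email",
    },
    deletion_methods={
        "PostgreSQL": "DELETE FROM users WHERE id = ?",
        "Elasticsearch": "DELETE /users/_doc/{id}",
        "Redis": "SCAN MATCH user:{id}:* + DEL each key",
        "BigQuery": "DELETE FROM events WHERE user_id = ?",
        "Stripe": "stripe.Customer.delete(customer_id)",
    },
    legal_basis="Contract performance",
    processors=["Stripe", "SendGrid"],
)

print(undocumented_systems(user_email))  # []
```

Running a check like this in CI is one way to catch a new data location that was added without a matching deletion method.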

2.2 Personal Data Categories

PERSONAL DATA CLASSIFICATION

DIRECTLY IDENTIFYING:
├── Name, email, phone
├── Government IDs (SSN, passport)
├── Financial account numbers
├── Biometric data
└── Deletion: Must delete or anonymize

INDIRECTLY IDENTIFYING:
├── IP addresses
├── Device identifiers
├── Cookie IDs
├── Location data (precise)
├── Behavioral patterns
└── Deletion: Must delete or anonymize

SENSITIVE (SPECIAL CATEGORIES):
├── Health information
├── Religious beliefs
├── Political opinions
├── Sexual orientation
├── Biometric data for identification
├── Genetic data
└── Deletion: Priority deletion, extra care

DERIVED DATA:
├── Predictions and scores
├── Segments and categories
├── Behavioral models
├── Recommendations
└── Deletion: Usually delete with source data

AGGREGATED DATA:
├── Statistics where individual not identifiable
├── Counts, averages, distributions
├── k-anonymous datasets (k>10 typically)
└── Deletion: May retain if truly anonymous
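
The classification above maps naturally onto a lookup from category to deletion action. This is a small sketch under stated assumptions: the field names in `FIELD_CATEGORIES` are hypothetical, the action strings are labels invented for illustration, and treating unknown fields as directly identifying is a conservative default rather than a rule from the text.

```python
from enum import Enum
from typing import Dict, Iterable


class DataCategory(Enum):
    DIRECT = "directly_identifying"
    INDIRECT = "indirectly_identifying"
    SENSITIVE = "sensitive"
    DERIVED = "derived"
    AGGREGATED = "aggregated"


# Action per category, following the "Deletion:" line of each list above
DELETION_ACTION: Dict[DataCategory, str] = {
    DataCategory.DIRECT: "delete_or_anonymize",
    DataCategory.INDIRECT: "delete_or_anonymize",
    DataCategory.SENSITIVE: "priority_delete",
    DataCategory.DERIVED: "delete_with_source",
    DataCategory.AGGREGATED: "retain_if_anonymous",
}

# Hypothetical field names, for illustration only
FIELD_CATEGORIES: Dict[str, DataCategory] = {
    "email": DataCategory.DIRECT,
    "ip_address": DataCategory.INDIRECT,
    "health_notes": DataCategory.SENSITIVE,
    "churn_score": DataCategory.DERIVED,
    "daily_order_count": DataCategory.AGGREGATED,
}


def plan_field_actions(fields: Iterable[str]) -> Dict[str, str]:
    """Map each field to its deletion action; unknown fields get the strictest default."""
    return {
        name: DELETION_ACTION[FIELD_CATEGORIES.get(name, DataCategory.DIRECT)]
        for name in fields
    }


print(plan_field_actions(["email", "churn_score", "unclassified_field"]))
```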

Chapter 3: Deletion Strategies by System Type

3.1 Strategy Matrix

DELETION STRATEGY BY SYSTEM

┌────────────────────────────────────────────────────────────────────────┐
│                    SYSTEM DELETION STRATEGIES                          │
│                                                                        │
│  System Type      │ Strategy        │ Timeline    │ Verification       │
│  ─────────────────┼─────────────────┼─────────────┼──────────────────  │
│  Primary DB       │ Hard delete     │ Immediate   │ Query returns null │
│  Search Index     │ Delete document │ Immediate   │ Search returns 0   │
│  Cache            │ Delete keys     │ Immediate   │ Key not found      │
│  File Storage     │ Delete objects  │ Immediate   │ 404 on access      │
│  Analytics        │ Delete or anon  │ 24-48 hours │ Query returns 0    │
│  Third-party      │ API call        │ Varies      │ Confirmation email │
│  Backups          │ Let expire      │ Per policy  │ Retention tracking │
│  Logs             │ Let expire      │ Per policy  │ Retention tracking │
│  ML Models        │ Retrain without │ Next cycle  │ Model version      │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
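
For the "Let expire" rows in the matrix, the "Retention tracking" verification comes down to date arithmetic: a copy taken on or before the day of primary deletion ages out one retention window later. The sketch below assumes the retention figures listed in the preface and assumes no new backups capture the user after primary deletion. S3 versioning and the 7-year audit logs are left out because they do not expire on a short, fixed schedule.

```python
from datetime import date, timedelta
from typing import Dict

# Retention windows in days, taken from the preface's inventory
RETENTION_DAYS: Dict[str, int] = {
    "postgresql_backups": 90,
    "mongodb_snapshots": 30,
    "datadog_app_logs": 15,
    "cloudwatch_access_logs": 90,
}


def expiry_schedule(primary_deleted_on: date, retention_days: Dict[str, int]) -> Dict[str, date]:
    """Date on which the last copy containing the user ages out of each store."""
    return {
        name: primary_deleted_on + timedelta(days=days)
        for name, days in retention_days.items()
    }


def fully_erased_on(primary_deleted_on: date, retention_days: Dict[str, int]) -> date:
    """Earliest date by which every expiring copy is gone."""
    schedule = expiry_schedule(primary_deleted_on, retention_days)
    return max(schedule.values(), default=primary_deleted_on)


schedule = expiry_schedule(date(2024, 3, 1), RETENTION_DAYS)
print(schedule["mongodb_snapshots"])                     # 2024-03-31
print(fully_erased_on(date(2024, 3, 1), RETENTION_DAYS))  # 2024-05-30
```

Storing this date on the deletion request gives you something concrete to report when asked when the data will be gone from backups.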

Part II: Implementation

Chapter 4: Deletion Service Architecture

4.1 Core Deletion Service

# deletion/service.py

"""
User data deletion service.

Orchestrates deletion across all systems where user data exists.
"""

from dataclasses import dataclass, field
from typing import List, Dict, Optional, Any
from datetime import datetime, timedelta
from enum import Enum
import uuid
import logging
import asyncio

logger = logging.getLogger(__name__)


class DeletionStatus(Enum):
    """Status of a deletion request."""
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    AWAITING_VERIFICATION = "awaiting_verification"
    COMPLETED = "completed"
    PARTIALLY_COMPLETED = "partially_completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


class SystemDeletionStatus(Enum):
    """Status of deletion in a specific system."""
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"
    NOT_APPLICABLE = "not_applicable"


@dataclass
class DeletionTarget:
    """A system that needs deletion."""
    system_name: str
    system_type: str  # database, cache, storage, third_party, etc.
    deletion_method: str  # How to delete in this system
    priority: int  # Order of deletion (lower = first)
    status: SystemDeletionStatus = SystemDeletionStatus.PENDING
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    error_message: Optional[str] = None
    records_deleted: int = 0


@dataclass
class DeletionRequest:
    """A request to delete a user's data."""
    id: str
    user_id: str
    tenant_id: str
    requested_at: datetime
    requested_by: str  # user, admin, automated
    reason: str
    status: DeletionStatus
    targets: List[DeletionTarget]
    deadline: datetime  # GDPR 30 days
    completed_at: Optional[datetime] = None
    verified_at: Optional[datetime] = None
    verification_report: Optional[Dict] = None


class DeletionService:
    """
    Orchestrates user data deletion across all systems.
    
    Key responsibilities:
    - Accept deletion requests
    - Coordinate deletion across systems
    - Track progress and handle failures
    - Verify deletion completion
    - Maintain audit trail
    """
    
    # Systems in deletion order (dependencies first)
    DELETION_TARGETS = [
        # First: Caches (quick, no dependencies)
        DeletionTarget("redis_cache", "cache", "delete_keys", priority=1),
        DeletionTarget("cdn_cache", "cache", "purge_urls", priority=1),
        
        # Second: Search indexes
        DeletionTarget("elasticsearch", "search", "delete_document", priority=2),
        
        # Third: File storage
        DeletionTarget("s3_files", "storage", "delete_objects", priority=3),
        DeletionTarget("s3_uploads", "storage", "delete_objects", priority=3),
        
        # Fourth: Analytics (before primary DB to capture user_id mapping)
        DeletionTarget("bigquery", "analytics", "delete_rows", priority=4),
        DeletionTarget("mixpanel", "analytics", "delete_user", priority=4),
        
        # Fifth: Third-party services
        DeletionTarget("stripe", "third_party", "delete_customer", priority=5),
        DeletionTarget("zendesk", "third_party", "delete_user", priority=5),
        DeletionTarget("intercom", "third_party", "delete_user", priority=5),
        DeletionTarget("mailchimp", "third_party", "unsubscribe_delete", priority=5),
        
        # Last: Primary database (has foreign keys)
        DeletionTarget("postgresql", "database", "delete_cascade", priority=10),
    ]
    
    def __init__(
        self,
        db,
        deletion_executors: Dict[str, 'DeletionExecutor'],
        event_publisher,
        notification_service
    ):
        self.db = db
        self.executors = deletion_executors
        self.events = event_publisher
        self.notifications = notification_service
    
    async def create_deletion_request(
        self,
        user_id: str,
        tenant_id: str,
        requested_by: str,
        reason: str
    ) -> DeletionRequest:
        """
        Create a new deletion request.
        
        This starts the deletion process.
        """
        request_id = str(uuid.uuid4())
        now = datetime.utcnow()
        
        # Determine which targets apply to this user
        targets = await self._determine_targets(user_id, tenant_id)
        
        request = DeletionRequest(
            id=request_id,
            user_id=user_id,
            tenant_id=tenant_id,
            requested_at=now,
            requested_by=requested_by,
            reason=reason,
            status=DeletionStatus.PENDING,
            targets=targets,
            deadline=now + timedelta(days=30)  # GDPR allows one month; 30 days is a conservative approximation
        )
        
        # Store request
        await self._save_request(request)
        
        # Publish event
        await self.events.publish("deletion", {
            "type": "deletion.requested",
            "request_id": request_id,
            "user_id": user_id,
            "tenant_id": tenant_id
        })
        
        logger.info(
            "Deletion request created",
            extra={
                "request_id": request_id,
                "user_id": user_id,
                "targets": len(targets)
            }
        )
        
        return request
    
    async def execute_deletion(self, request_id: str) -> DeletionRequest:
        """
        Execute a deletion request.
        
        Coordinates deletion across all target systems.
        """
        request = await self.get_request(request_id)
        
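        # PARTIALLY_COMPLETED is the status a run with failures ends in,
        # so it must be retryable alongside PENDING and FAILED.
        retryable = [
            DeletionStatus.PENDING,
            DeletionStatus.FAILED,
            DeletionStatus.PARTIALLY_COMPLETED,
        ]
        if request.status not in retryable:
            raise ValueError(f"Cannot execute request in status: {request.status}")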
        
        request.status = DeletionStatus.IN_PROGRESS
        await self._save_request(request)
        
        # Group targets by priority
        priority_groups = {}
        for target in request.targets:
            if target.priority not in priority_groups:
                priority_groups[target.priority] = []
            priority_groups[target.priority].append(target)
        
        # Execute in priority order
        all_succeeded = True
        
        for priority in sorted(priority_groups.keys()):
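            # Skip targets that already completed (relevant when a request is re-run)
            targets = [
                t for t in priority_groups[priority]
                if t.status != SystemDeletionStatus.COMPLETED
            ]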
            
            # Execute targets at same priority level in parallel
            results = await asyncio.gather(
                *[self._execute_target(request, target) for target in targets],
                return_exceptions=True
            )
            
            # Check for failures
            for target, result in zip(targets, results):
                if isinstance(result, Exception):
                    target.status = SystemDeletionStatus.FAILED
                    target.error_message = str(result)
                    all_succeeded = False
                    logger.error(
                        f"Deletion failed for {target.system_name}: {result}",
                        extra={"request_id": request_id}
                    )
        
        # Update final status
        if all_succeeded:
            request.status = DeletionStatus.AWAITING_VERIFICATION
        else:
            request.status = DeletionStatus.PARTIALLY_COMPLETED
        
        await self._save_request(request)
        
        # Publish completion event
        await self.events.publish("deletion", {
            "type": "deletion.executed",
            "request_id": request_id,
            "status": request.status.value,
            "all_succeeded": all_succeeded
        })
        
        return request
    
    async def _execute_target(
        self,
        request: DeletionRequest,
        target: DeletionTarget
    ):
        """Execute deletion for a single target system."""
        target.status = SystemDeletionStatus.IN_PROGRESS
        target.started_at = datetime.utcnow()
        
        executor = self.executors.get(target.system_name)
        
        if not executor:
            logger.warning(f"No executor for system: {target.system_name}")
            target.status = SystemDeletionStatus.SKIPPED
            return
        
        try:
            result = await executor.delete_user_data(
                user_id=request.user_id,
                tenant_id=request.tenant_id
            )
            
            target.status = SystemDeletionStatus.COMPLETED
            target.completed_at = datetime.utcnow()
            target.records_deleted = result.get("records_deleted", 0)
            
            logger.info(
                f"Deletion completed for {target.system_name}",
                extra={
                    "request_id": request.id,
                    "records_deleted": target.records_deleted
                }
            )
            
        except Exception as e:
            target.status = SystemDeletionStatus.FAILED
            target.error_message = str(e)
            target.completed_at = datetime.utcnow()
            raise
    
    async def verify_deletion(self, request_id: str) -> Dict[str, Any]:
        """
        Verify that deletion was successful.
        
        Checks each system to confirm data is gone.
        """
        request = await self.get_request(request_id)
        
        verification_results = {}
        all_verified = True
        
        for target in request.targets:
            if target.status == SystemDeletionStatus.SKIPPED:
                verification_results[target.system_name] = {
                    "status": "skipped",
                    "verified": True
                }
                continue
            
            executor = self.executors.get(target.system_name)
            
            if not executor:
                verification_results[target.system_name] = {
                    "status": "no_executor",
                    "verified": False
                }
                all_verified = False
                continue
            
            try:
                exists = await executor.check_user_exists(
                    user_id=request.user_id,
                    tenant_id=request.tenant_id
                )
                
                verification_results[target.system_name] = {
                    "status": "verified" if not exists else "data_found",
                    "verified": not exists,
                    "checked_at": datetime.utcnow().isoformat()
                }
                
                if exists:
                    all_verified = False
                    logger.warning(
                        f"Data still exists in {target.system_name}",
                        extra={"request_id": request_id}
                    )
                    
            except Exception as e:
                verification_results[target.system_name] = {
                    "status": "error",
                    "verified": False,
                    "error": str(e)
                }
                all_verified = False
        
        # Update request
        request.verification_report = verification_results
        request.verified_at = datetime.utcnow()
        
        if all_verified:
            request.status = DeletionStatus.COMPLETED
            request.completed_at = datetime.utcnow()
        
        await self._save_request(request)
        
        # Notify user if complete
        if all_verified:
            await self._notify_completion(request)
        
        return {
            "all_verified": all_verified,
            "results": verification_results
        }
    
    async def _determine_targets(
        self,
        user_id: str,
        tenant_id: str
    ) -> List[DeletionTarget]:
        """Determine which systems have data for this user."""
        targets = []
        
        for target_template in self.DELETION_TARGETS:
            # Check if system has data for this user
            executor = self.executors.get(target_template.system_name)
            
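            if executor:
                has_data = await executor.check_user_exists(user_id, tenant_id)
                
                if has_data:
                    targets.append(DeletionTarget(
                        system_name=target_template.system_name,
                        system_type=target_template.system_type,
                        deletion_method=target_template.deletion_method,
                        priority=target_template.priority
                    ))
            else:
                # Without an executor the system can be neither checked nor
                # deleted from; log it so the gap is visible.
                logger.warning(
                    f"No executor registered for {target_template.system_name}; "
                    "system excluded from deletion targets"
                )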
        
        return targets
    
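    async def _notify_completion(self, request: DeletionRequest):
        """
        Notify user that deletion is complete.
        
        Note: by this point the user's record has been removed from the
        primary database, so the notification service must rely on a
        contact address captured before deletion started.
        """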
        await self.notifications.send(
            user_id=request.user_id,
            template="deletion_complete",
            context={
                "request_id": request.id,
                "completed_at": request.completed_at.isoformat(),
                "systems_deleted": len([
                    t for t in request.targets 
                    if t.status == SystemDeletionStatus.COMPLETED
                ])
            }
        )
    
    async def get_request(self, request_id: str) -> DeletionRequest:
        """Get a deletion request by ID."""
        # Assumes an asyncpg-style pool, which exposes fetchrow() for single rows
        row = await self.db.fetchrow(
            "SELECT * FROM deletion_requests WHERE id = $1",
            request_id
        )
        
        if not row:
            raise ValueError(f"Deletion request not found: {request_id}")
        
        return self._row_to_request(row)
    
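    async def _save_request(self, request: DeletionRequest):
        """Save deletion request to database."""
        # Enum and datetime values are not JSON-serializable, so convert
        # them before storing targets (assumes a JSON/JSONB column).
        targets_payload = [
            {
                **t.__dict__,
                "status": t.status.value,
                "started_at": t.started_at.isoformat() if t.started_at else None,
                "completed_at": t.completed_at.isoformat() if t.completed_at else None,
            }
            for t in request.targets
        ]
        
        await self.db.execute(
            """
            INSERT INTO deletion_requests 
            (id, user_id, tenant_id, requested_at, requested_by, reason,
             status, targets, deadline, completed_at, verified_at, verification_report)
            VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
            ON CONFLICT (id) DO UPDATE SET
                status = $7, targets = $8, completed_at = $10,
                verified_at = $11, verification_report = $12
            """,
            request.id, request.user_id, request.tenant_id,
            request.requested_at, request.requested_by, request.reason,
            request.status.value, 
            targets_payload,
            request.deadline, request.completed_at, request.verified_at,
            request.verification_report
        )
    
    def _row_to_request(self, row) -> DeletionRequest:
        """Convert database row to DeletionRequest."""
        def to_target(data: Dict[str, Any]) -> DeletionTarget:
            # Reverse the conversions applied in _save_request
            fields = dict(data)
            fields["status"] = SystemDeletionStatus(fields["status"])
            for key in ("started_at", "completed_at"):
                if fields.get(key):
                    fields[key] = datetime.fromisoformat(fields[key])
            return DeletionTarget(**fields)
        
        return DeletionRequest(
            id=row["id"],
            user_id=row["user_id"],
            tenant_id=row["tenant_id"],
            requested_at=row["requested_at"],
            requested_by=row["requested_by"],
            reason=row["reason"],
            status=DeletionStatus(row["status"]),
            targets=[to_target(t) for t in row["targets"]],
            deadline=row["deadline"],
            completed_at=row.get("completed_at"),
            verified_at=row.get("verified_at"),
            verification_report=row.get("verification_report")
        )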
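
The ordering logic in execute_deletion (sequential across priority levels, parallel within one level, failures recorded without stopping the run) can be seen in isolation in the sketch below. It is a simplified stand-in, not the service itself: `run_by_priority` and `make_step` are illustrative names, and each step is a dummy coroutine rather than a real executor.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List, Tuple

Step = Callable[[], Awaitable[None]]


async def run_by_priority(tasks: List[Tuple[str, int, Step]]) -> Dict[str, str]:
    """Run steps level by level; steps sharing a priority run concurrently."""
    groups: Dict[int, List[Tuple[str, Step]]] = defaultdict(list)
    for name, priority, step in tasks:
        groups[priority].append((name, step))

    outcome: Dict[str, str] = {}
    for priority in sorted(groups):
        batch = groups[priority]
        # return_exceptions=True keeps one failure from cancelling its siblings
        results = await asyncio.gather(
            *(step() for _, step in batch), return_exceptions=True
        )
        for (name, _), result in zip(batch, results):
            outcome[name] = "failed" if isinstance(result, Exception) else "completed"
    return outcome


order: List[str] = []


def make_step(name: str, fail: bool = False) -> Step:
    """Build a dummy step that records its name or raises."""
    async def step() -> None:
        await asyncio.sleep(0)
        if fail:
            raise RuntimeError(f"{name} unavailable")
        order.append(name)
    return step


tasks = [
    ("postgresql", 10, make_step("postgresql")),
    ("redis_cache", 1, make_step("redis_cache")),
    ("stripe", 5, make_step("stripe", fail=True)),
]

outcome = asyncio.run(run_by_priority(tasks))
print(order)    # ['redis_cache', 'postgresql']
print(outcome)
```

Note that the failed Stripe step does not prevent the PostgreSQL step from running. Whether a later level should proceed after an earlier one fails is a design decision; the service above proceeds and reports PARTIALLY_COMPLETED.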

4.2 System-Specific Deletion Executors

# deletion/executors.py

"""
System-specific deletion executors.

Each executor knows how to delete user data from a specific system.
"""

from abc import ABC, abstractmethod
from typing import Dict, Any
import logging

logger = logging.getLogger(__name__)


class DeletionExecutor(ABC):
    """Base class for deletion executors."""
    
    @abstractmethod
    async def delete_user_data(
        self,
        user_id: str,
        tenant_id: str
    ) -> Dict[str, Any]:
        """
        Delete all user data from this system.
        
        Returns dict with deletion details.
        """
        pass
    
    @abstractmethod
    async def check_user_exists(
        self,
        user_id: str,
        tenant_id: str
    ) -> bool:
        """Check if user data exists in this system."""
        pass


class PostgreSQLDeletionExecutor(DeletionExecutor):
    """
    Deletes user data from PostgreSQL.
    
    Handles cascading deletions across related tables.
    """
    
    def __init__(self, db_pool):
        self.db = db_pool
    
    async def delete_user_data(
        self,
        user_id: str,
        tenant_id: str
    ) -> Dict[str, Any]:
        """
        Delete user and all related data.
        
        Order matters due to foreign keys:
        1. Delete from leaf tables first
        2. Work up to parent tables
        3. Finally delete user record
        """
        records_deleted = 0
        
        async with self.db.acquire() as conn:
            async with conn.transaction():
                # Delete from leaf tables first (no foreign key dependencies)
                
                # Notifications
                result = await conn.execute(
                    "DELETE FROM notifications WHERE user_id = $1 AND tenant_id = $2",
                    user_id, tenant_id
                )
                records_deleted += int(result.split()[-1])
                
                # Activity logs
                result = await conn.execute(
                    "DELETE FROM activity_logs WHERE user_id = $1 AND tenant_id = $2",
                    user_id, tenant_id
                )
                records_deleted += int(result.split()[-1])
                
                # Sessions
                result = await conn.execute(
                    "DELETE FROM sessions WHERE user_id = $1 AND tenant_id = $2",
                    user_id, tenant_id
                )
                records_deleted += int(result.split()[-1])
                
                # User preferences
                result = await conn.execute(
                    "DELETE FROM user_preferences WHERE user_id = $1 AND tenant_id = $2",
                    user_id, tenant_id
                )
                records_deleted += int(result.split()[-1])
                
                # Consent records (keep anonymized version for audit)
                result = await conn.execute(
                    """
                    UPDATE consent_records 
                    SET user_id = 'DELETED', ip_address = 'DELETED'
                    WHERE user_id = $1 AND tenant_id = $2
                    """,
                    user_id, tenant_id
                )
                records_deleted += int(result.split()[-1])
                
                # Orders - anonymize rather than delete (financial records)
                result = await conn.execute(
                    """
                    UPDATE orders 
                    SET user_id = NULL, 
                        shipping_address = 'DELETED',
                        billing_address = 'DELETED',
                        customer_email = 'DELETED',
                        customer_phone = 'DELETED'
                    WHERE user_id = $1 AND tenant_id = $2
                    """,
                    user_id, tenant_id
                )
                records_deleted += int(result.split()[-1])
                
                # Comments - anonymize to preserve content integrity
                result = await conn.execute(
                    """
                    UPDATE comments 
                    SET user_id = NULL, author_name = 'Deleted User'
                    WHERE user_id = $1 AND tenant_id = $2
                    """,
                    user_id, tenant_id
                )
                records_deleted += int(result.split()[-1])
                
                # Finally, delete the user record
                result = await conn.execute(
                    "DELETE FROM users WHERE id = $1 AND tenant_id = $2",
                    user_id, tenant_id
                )
                records_deleted += int(result.split()[-1])
        
        logger.info(
            "PostgreSQL deletion complete",
            extra={
                "user_id": user_id,
                "records_deleted": records_deleted
            }
        )
        
        return {"records_deleted": records_deleted}
    
    async def check_user_exists(
        self,
        user_id: str,
        tenant_id: str
    ) -> bool:
        """Check if user exists in PostgreSQL."""
        result = await self.db.fetchone(
            "SELECT 1 FROM users WHERE id = $1 AND tenant_id = $2",
            user_id, tenant_id
        )
        return result is not None


class ElasticsearchDeletionExecutor(DeletionExecutor):
    """Deletes user data from Elasticsearch."""
    
    def __init__(self, es_client):
        self.es = es_client
    
    async def delete_user_data(
        self,
        user_id: str,
        tenant_id: str
    ) -> Dict[str, Any]:
        """Delete user documents from all indices."""
        indices = ["users", "documents", "activities"]
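        total_deleted = 0
        failed_indices = []
        
        for index in indices:
            try:
                result = await self.es.delete_by_query(
                    index=f"{tenant_id}_{index}",
                    body={
                        "query": {
                            "term": {"user_id": user_id}
                        }
                    }
                )
                total_deleted += result.get("deleted", 0)
            except Exception as e:
                # Keep going so the other indices are still cleaned up,
                # but remember the failure: it must not be reported as success
                logger.warning(f"ES deletion from {index} failed: {e}")
                failed_indices.append(index)
        
        if failed_indices:
            raise RuntimeError(
                f"Elasticsearch deletion failed for indices: {failed_indices}"
            )
        
        return {"records_deleted": total_deleted}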
    
    async def check_user_exists(
        self,
        user_id: str,
        tenant_id: str
    ) -> bool:
        """Check if user exists in any index."""
        indices = ["users", "documents", "activities"]
        
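        for index in indices:
            # ignore_unavailable: a tenant without this index has nothing to find.
            # Any other error propagates on purpose, so an unreachable cluster
            # is never mistaken for "the data is gone".
            result = await self.es.count(
                index=f"{tenant_id}_{index}",
                body={
                    "query": {
                        "term": {"user_id": user_id}
                    }
                },
                ignore_unavailable=True
            )
            if result.get("count", 0) > 0:
                return True
        
        return False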


class RedisDeletionExecutor(DeletionExecutor):
    """Deletes user data from Redis cache."""
    
    def __init__(self, redis_client):
        self.redis = redis_client
    
    async def delete_user_data(
        self,
        user_id: str,
        tenant_id: str
    ) -> Dict[str, Any]:
        """Delete all user cache keys."""
        patterns = [
            f"tenant:{tenant_id}:user:{user_id}:*",
            f"tenant:{tenant_id}:session:{user_id}:*",
            f"tenant:{tenant_id}:cache:user:{user_id}",
        ]
        
        total_deleted = 0
        
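        for pattern in patterns:
            # SCAN walks the keyspace incrementally; KEYS would block Redis
            # while it scans everything (assumes a redis-py asyncio client)
            keys = [key async for key in self.redis.scan_iter(match=pattern)]
            if keys:
                deleted = await self.redis.delete(*keys)
                total_deleted += deleted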
        
        return {"records_deleted": total_deleted}
    
    async def check_user_exists(
        self,
        user_id: str,
        tenant_id: str
    ) -> bool:
        """Check if user data exists in cache."""
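        # Same patterns as delete_user_data, so verification covers
        # everything that deletion is supposed to remove
        patterns = [
            f"tenant:{tenant_id}:user:{user_id}:*",
            f"tenant:{tenant_id}:session:{user_id}:*",
            f"tenant:{tenant_id}:cache:user:{user_id}",
        ]
        
        for pattern in patterns:
            # Stop at the first match; SCAN avoids the blocking KEYS command
            async for _ in self.redis.scan_iter(match=pattern):
                return True
        
        return False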


class S3DeletionExecutor(DeletionExecutor):
    """Deletes user files from S3."""
    
    def __init__(self, s3_client, bucket: str):
        self.s3 = s3_client
        self.bucket = bucket
    
    async def delete_user_data(
        self,
        user_id: str,
        tenant_id: str
    ) -> Dict[str, Any]:
        """Delete all user files from S3."""
        prefix = f"tenants/{tenant_id}/users/{user_id}/"
        
        # List all objects with prefix
        objects_to_delete = []
        paginator = self.s3.get_paginator('list_objects_v2')
        
        async for page in paginator.paginate(Bucket=self.bucket, Prefix=prefix):
            for obj in page.get('Contents', []):
                objects_to_delete.append({'Key': obj['Key']})
        
        if not objects_to_delete:
            return {"records_deleted": 0}
        
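        # Delete in batches of 1000 (the DeleteObjects limit).
        # On a versioned bucket this only creates delete markers; older
        # versions stay until they are removed or expire.
        total_deleted = 0
        
        for i in range(0, len(objects_to_delete), 1000):
            batch = objects_to_delete[i:i+1000]
            response = await self.s3.delete_objects(
                Bucket=self.bucket,
                Delete={'Objects': batch}
            )
            # The call can succeed overall while individual keys fail
            errors = response.get('Errors', [])
            if errors:
                raise RuntimeError(
                    f"S3 failed to delete {len(errors)} objects under {prefix}"
                )
            total_deleted += len(batch)
        
        return {"records_deleted": total_deleted}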
    
    async def check_user_exists(
        self,
        user_id: str,
        tenant_id: str
    ) -> bool:
        """Check if user has files in S3."""
        prefix = f"tenants/{tenant_id}/users/{user_id}/"
        
        result = await self.s3.list_objects_v2(
            Bucket=self.bucket,
            Prefix=prefix,
            MaxKeys=1
        )
        
        return result.get('KeyCount', 0) > 0


class StripeDeletionExecutor(DeletionExecutor):
    """Deletes user data from Stripe."""
    
    def __init__(self, stripe_client):
        self.stripe = stripe_client
    
    async def delete_user_data(
        self,
        user_id: str,
        tenant_id: str
    ) -> Dict[str, Any]:
        """Delete customer from Stripe."""
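        # Find the Stripe customer by metadata. The List Customers endpoint
        # cannot filter on metadata, so this uses the Search API instead.
        # Storing the Stripe customer ID on the user record avoids the lookup.
        customers = await self.stripe.Customer.search(
            query=(
                f"metadata['user_id']:'{user_id}' "
                f"AND metadata['tenant_id']:'{tenant_id}'"
            ),
            limit=1
        )
        
        if not customers.data:
            return {"records_deleted": 0}
        
        customer = customers.data[0]
        
        # Delete the customer (this also cancels any active subscriptions)
        await self.stripe.Customer.delete(customer.id)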
        
        return {"records_deleted": 1, "stripe_customer_id": customer.id}
    
    async def check_user_exists(
        self,
        user_id: str,
        tenant_id: str
    ) -> bool:
        """Check if customer exists in Stripe."""
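        # Search results are eventually consistent, so a customer deleted
        # moments ago may still appear here for a short time
        customers = await self.stripe.Customer.search(
            query=(
                f"metadata['user_id']:'{user_id}' "
                f"AND metadata['tenant_id']:'{tenant_id}'"
            ),
            limit=1
        )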
        
        return len(customers.data) > 0


class BigQueryDeletionExecutor(DeletionExecutor):
    """Deletes user data from BigQuery analytics."""
    
    def __init__(self, bq_client, dataset: str):
        self.bq = bq_client
        self.dataset = dataset
    
    async def delete_user_data(
        self,
        user_id: str,
        tenant_id: str
    ) -> Dict[str, Any]:
        """
        Delete user data from BigQuery tables.
        
        Note: BigQuery DELETE can be slow and expensive.
        Consider partitioning strategy for better deletion.
        """
        tables = ["events", "page_views", "user_properties"]
        total_deleted = 0
        
        for table in tables:
            query = f"""
            DELETE FROM `{self.dataset}.{table}`
            WHERE user_id = @user_id AND tenant_id = @tenant_id
            """
            
            job_config = bigquery.QueryJobConfig(
                query_parameters=[
                    bigquery.ScalarQueryParameter("user_id", "STRING", user_id),
                    bigquery.ScalarQueryParameter("tenant_id", "STRING", tenant_id),
                ]
            )
            
            result = await self.bq.query(query, job_config=job_config)
            total_deleted += result.num_dml_affected_rows
        
        return {"records_deleted": total_deleted}
    
    async def check_user_exists(
        self,
        user_id: str,
        tenant_id: str
    ) -> bool:
        """Check if user exists in any BigQuery table."""
        # Check every table that delete_user_data clears, not only events
        tables = ["events", "page_views", "user_properties"]
        
        job_config = bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("user_id", "STRING", user_id),
                bigquery.ScalarQueryParameter("tenant_id", "STRING", tenant_id),
            ]
        )
        
        for table in tables:
            query = f"""
            SELECT 1 FROM `{self.dataset}.{table}`
            WHERE user_id = @user_id AND tenant_id = @tenant_id
            LIMIT 1
            """
            
            result = await self.bq.query(query, job_config=job_config)
            if result.total_rows > 0:
                return True
        
        return False
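
To see how executors like these fit together, here is a minimal sketch that runs several of them concurrently and then verifies each system. `InMemoryExecutor` and `delete_everywhere` are hypothetical names used only for illustration; the sketch assumes nothing beyond the two-method contract (`delete_user_data`, `check_user_exists`) that the executors above share.

```python
import asyncio
from typing import Any, Dict, Set, Tuple


class InMemoryExecutor:
    """Hypothetical stand-in for a real executor (illustration only)."""

    def __init__(self, name: str, rows: Set[Tuple[str, str]], fail: bool = False):
        self.name = name
        self.rows = rows  # (tenant_id, user_id) pairs "stored" in this system
        self.fail = fail  # simulate an outage in this system

    async def delete_user_data(self, user_id: str, tenant_id: str) -> Dict[str, Any]:
        if self.fail:
            raise ConnectionError(f"{self.name} unavailable")
        existed = (tenant_id, user_id) in self.rows
        self.rows.discard((tenant_id, user_id))
        return {"records_deleted": int(existed)}

    async def check_user_exists(self, user_id: str, tenant_id: str) -> bool:
        return (tenant_id, user_id) in self.rows


async def delete_everywhere(executors, user_id: str, tenant_id: str) -> Dict[str, str]:
    """Run every executor, isolate failures, then verify each system."""
    results = await asyncio.gather(
        *(e.delete_user_data(user_id, tenant_id) for e in executors),
        return_exceptions=True,  # one failing system must not abort the rest
    )
    status = {}
    for executor, result in zip(executors, results):
        if isinstance(result, Exception):
            status[executor.name] = "failed"
        elif await executor.check_user_exists(user_id, tenant_id):
            status[executor.name] = "verification_failed"
        else:
            status[executor.name] = "verified_deleted"
    return status


executors = [
    InMemoryExecutor("postgresql", {("t1", "u1"), ("t1", "u2")}),
    InMemoryExecutor("redis", {("t1", "u1")}),
    InMemoryExecutor("stripe", {("t1", "u1")}, fail=True),
]
status = asyncio.run(delete_everywhere(executors, "u1", "t1"))
print(status)
```

Collecting a per-system status, rather than raising on the first error, lets the request be retried only for the systems that failed.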

Chapter 5: Handling Special Cases

5.1 Backup Data Handling

# deletion/backup_handler.py

"""
Handling user data in backups.

Backups are the hardest deletion challenge because you can't
surgically remove one user from a backup.
"""

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List
import logging

logger = logging.getLogger(__name__)


@dataclass
class BackupRetentionPolicy:
    """Backup retention configuration."""
    backup_type: str
    retention_days: int
    deletion_strategy: str  # "expire" or "exclude_restore"


class BackupDeletionHandler:
    """
    Handles deletion requests for data in backups.
    
    Strategy:
    1. Track users pending deletion
    2. When backup expires, deletion is complete for that backup
    3. If backup is restored, apply pending deletions
    """
    
    RETENTION_POLICIES = [
        BackupRetentionPolicy("postgresql_daily", 30, "expire"),
        BackupRetentionPolicy("postgresql_weekly", 90, "expire"),
        BackupRetentionPolicy("mongodb_daily", 30, "expire"),
        BackupRetentionPolicy("s3_versioning", 90, "expire"),
    ]
    
    def __init__(self, db):
        self.db = db
    
    async def register_pending_deletion(
        self,
        deletion_request_id: str,
        user_id: str,
        tenant_id: str
    ):
        """
        Register a user for deletion from backups.
        
        This tracks that when backups expire or are restored,
        this user's data must be deleted.
        """
        # Calculate when backup retention expires for each type
        expiry_dates = {}
        
        for policy in self.RETENTION_POLICIES:
            expiry_date = datetime.utcnow() + timedelta(days=policy.retention_days)
            expiry_dates[policy.backup_type] = expiry_date
        
        await self.db.execute(
            """
            INSERT INTO backup_pending_deletions
            (deletion_request_id, user_id, tenant_id, created_at, backup_expiry_dates)
            VALUES ($1, $2, $3, $4, $5)
            """,
            deletion_request_id, user_id, tenant_id,
            datetime.utcnow(), expiry_dates
        )
        
        logger.info(
            "Registered pending backup deletion",
            extra={
                "deletion_request_id": deletion_request_id,
                "user_id": user_id,
                "latest_expiry": max(expiry_dates.values())
            }
        )
    
    async def get_backup_deletion_status(
        self,
        deletion_request_id: str
    ) -> dict:
        """
        Get status of backup deletion for a request.
        """
        record = await self.db.fetchone(
            """
            SELECT * FROM backup_pending_deletions
            WHERE deletion_request_id = $1
            """,
            deletion_request_id
        )
        
        if not record:
            return {"status": "not_tracked"}
        
        expiry_dates = record["backup_expiry_dates"]
        now = datetime.utcnow()
        
        status = {}
        all_expired = True
        
        for backup_type, expiry_date in expiry_dates.items():
            if expiry_date <= now:
                status[backup_type] = "expired"
            else:
                status[backup_type] = f"expires_{expiry_date.isoformat()}"
                all_expired = False
        
        return {
            "status": "complete" if all_expired else "pending",
            "backup_status": status,
            "fully_deleted_at": max(expiry_dates.values()) if not all_expired else None
        }
    
    async def on_backup_restore(
        self,
        backup_type: str,
        backup_date: datetime,
        restore_database: str
    ):
        """
        Called when a backup is restored.
        
        Must apply all pending deletions to the restored data.
        """
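        # A backup taken before a deletion request still contains that
        # user's data, so replay every deletion requested at or after the
        # backup time. Deletion is idempotent, so when in doubt, replay.
        pending = await self.db.fetch(
            """
            SELECT * FROM backup_pending_deletions
            WHERE created_at >= $1
            """,
            backup_date
        )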
        
        logger.warning(
            f"Applying {len(pending)} pending deletions to restored backup",
            extra={"backup_type": backup_type, "backup_date": backup_date}
        )
        
        for record in pending:
            # Apply deletion to restored database
            await self._apply_deletion_to_restore(
                record["user_id"],
                record["tenant_id"],
                restore_database
            )
    
    async def _apply_deletion_to_restore(
        self,
        user_id: str,
        tenant_id: str,
        database: str
    ):
        """Apply a pending deletion to a restored database."""
        # Placeholder: a real implementation would reuse the main deletion
        # executors, pointed at the restored database
        logger.info(
            "Applying deletion to restored database",
            extra={
                "user_id": user_id,
                "database": database
            }
        )
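
The expiry tracking above boils down to date arithmetic, which is easy to check in isolation. The following is a minimal sketch, not the handler itself: `RETENTION_DAYS`, `backup_expiry_dates` and `backup_status` are hypothetical names, and the retention windows simply mirror the policies listed in `RETENTION_POLICIES`.

```python
from datetime import datetime, timedelta
from typing import Dict

# Retention windows in days, mirroring RETENTION_POLICIES above
RETENTION_DAYS = {
    "postgresql_daily": 30,
    "postgresql_weekly": 90,
    "mongodb_daily": 30,
    "s3_versioning": 90,
}


def backup_expiry_dates(requested_at: datetime) -> Dict[str, datetime]:
    """Per backup type: when the last backup that may hold the user ages out."""
    return {
        backup_type: requested_at + timedelta(days=days)
        for backup_type, days in RETENTION_DAYS.items()
    }


def backup_status(expiry: Dict[str, datetime], now: datetime) -> dict:
    """Overall status plus the backup types that still hold the data."""
    pending = {t: d for t, d in expiry.items() if d > now}
    return {
        "status": "pending" if pending else "complete",
        "pending_backups": sorted(pending),
        "fully_deleted_at": max(expiry.values()),
    }


expiry = backup_expiry_dates(datetime(2024, 1, 1))
mid_february = backup_status(expiry, now=datetime(2024, 2, 15))
april = backup_status(expiry, now=datetime(2024, 4, 1))

print(mid_february["status"], mid_february["pending_backups"])
print(april["status"], april["fully_deleted_at"].date())
```

The latest expiry date is the one worth quoting to the user, because that is when the data is gone from every backup.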

5.2 Anonymization Service

# deletion/anonymization.py

"""
Data anonymization for cases where deletion isn't possible.

Some data can't be deleted (legal retention) but can be anonymized.
"""

from typing import Dict, Any
import hashlib
import uuid


class AnonymizationService:
    """
    Anonymizes data instead of deleting it.
    
    Used for:
    - Financial records (legal retention requirements)
    - Aggregated statistics
    - Audit logs
    """
    
    def anonymize_user_record(self, user: Dict[str, Any]) -> Dict[str, Any]:
        """
        Anonymize a user record.
        
        Removes all PII while preserving structure.
        """
        return {
            "id": self._generate_anonymous_id(user["id"]),
            "email": "deleted@anonymized.local",
            "name": "Deleted User",
            "phone": None,
            "address": None,
            "created_at": user["created_at"],  # Keep for analytics
            "tenant_id": user["tenant_id"],    # Keep for tenant analytics
        }
    
    def anonymize_order(self, order: Dict[str, Any]) -> Dict[str, Any]:
        """
        Anonymize an order record.
        
        Keeps financial data but removes PII.
        """
        return {
            "id": order["id"],
            "user_id": None,  # Remove link to user
            "tenant_id": order["tenant_id"],
            "amount": order["amount"],  # Keep for financial records
            "currency": order["currency"],
            "created_at": order["created_at"],
            "shipping_address": "REDACTED",
            "billing_address": "REDACTED",
            "customer_email": "deleted@anonymized.local",
            "customer_phone": None,
            "items": order["items"],  # Keep order details
        }
    
    def anonymize_event(self, event: Dict[str, Any]) -> Dict[str, Any]:
        """
        Anonymize an analytics event.
        """
        return {
            "event_type": event["event_type"],
            "tenant_id": event["tenant_id"],
            "timestamp": event["timestamp"],
            "user_id": None,  # Remove user link
            "session_id": self._hash_value(event.get("session_id", "")),
            "properties": self._anonymize_properties(event.get("properties", {})),
        }
    
    def _generate_anonymous_id(self, original_id: str) -> str:
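        """Generate a consistent pseudonymous ID."""
        # Deterministic, so the same user always maps to the same value.
        # Note: an unkeyed hash can be recomputed by anyone who knows the
        # original ID, so this is pseudonymization rather than anonymization.
        # Use a keyed hash or a random ID where re-linking must be impossible.
        return hashlib.sha256(f"anon:{original_id}".encode()).hexdigest()[:16]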
    
    def _hash_value(self, value: str) -> str:
        """Hash a value for anonymization."""
        if not value:
            return ""
        return hashlib.sha256(value.encode()).hexdigest()[:12]
    
    def _anonymize_properties(self, props: Dict) -> Dict:
        """Remove PII from event properties."""
        pii_keys = ["email", "name", "phone", "address", "ip_address", "user_agent"]
        
        return {
            k: "REDACTED" if k in pii_keys else v
            for k, v in props.items()
        }
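
One caveat with plain hashing: an unkeyed hash of a known identifier can be recomputed by anyone who has that identifier. A keyed hash is one way to hedge against this. The sketch below swaps in HMAC-SHA256 from the standard library; `pseudonymize` and the key handling are assumptions for illustration, not part of the service above.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, key: bytes, length: int = 16) -> str:
    """Keyed, deterministic pseudonym: same value and key give the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Hypothetical key handling: in practice the key would live in a secrets manager
key = secrets.token_bytes(32)

a = pseudonymize("user-123", key)
b = pseudonymize("user-123", key)
c = pseudonymize("user-123", secrets.token_bytes(32))  # different key

print(a == b)  # stable under one key, so anonymized records stay joinable
print(a == c)  # a different key yields an unrelated pseudonym
```

Destroying or rotating the key removes the ability to map an original ID to its pseudonym, which a fixed prefix such as `anon:` cannot offer.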

Chapter 6: Audit Trail and Compliance Reporting

6.1 Deletion Audit Log

# deletion/audit.py

"""
Audit logging for deletion requests.

Critical for proving compliance with GDPR.
"""

from dataclasses import dataclass
from datetime import datetime
from typing import Optional, List
from enum import Enum
import uuid


class AuditEventType(Enum):
    """Types of deletion audit events."""
    REQUEST_CREATED = "request_created"
    REQUEST_APPROVED = "request_approved"
    DELETION_STARTED = "deletion_started"
    SYSTEM_DELETION_STARTED = "system_deletion_started"
    SYSTEM_DELETION_COMPLETED = "system_deletion_completed"
    SYSTEM_DELETION_FAILED = "system_deletion_failed"
    VERIFICATION_STARTED = "verification_started"
    VERIFICATION_COMPLETED = "verification_completed"
    REQUEST_COMPLETED = "request_completed"
    REQUEST_FAILED = "request_failed"


@dataclass
class DeletionAuditEvent:
    """An audit event for a deletion request."""
    id: str
    deletion_request_id: str
    event_type: AuditEventType
    timestamp: datetime
    actor: str  # Who/what triggered this event
    system_name: Optional[str]
    details: dict
    

class DeletionAuditService:
    """
    Records audit trail for all deletion activities.
    
    This audit log is retained even after deletion completes
    to prove compliance.
    """
    
    def __init__(self, db):
        self.db = db
    
    async def log_event(
        self,
        deletion_request_id: str,
        event_type: AuditEventType,
        actor: str,
        system_name: Optional[str] = None,
        details: Optional[dict] = None
    ) -> DeletionAuditEvent:
        """
        Log an audit event.
        """
        event = DeletionAuditEvent(
            id=str(uuid.uuid4()),
            deletion_request_id=deletion_request_id,
            event_type=event_type,
            timestamp=datetime.utcnow(),
            actor=actor,
            system_name=system_name,
            details=details or {}
        )
        
        await self.db.execute(
            """
            INSERT INTO deletion_audit_log
            (id, deletion_request_id, event_type, timestamp, actor, system_name, details)
            VALUES ($1, $2, $3, $4, $5, $6, $7)
            """,
            event.id, event.deletion_request_id, event.event_type.value,
            event.timestamp, event.actor, event.system_name, event.details
        )
        
        return event
    
    async def get_audit_trail(
        self,
        deletion_request_id: str
    ) -> List[DeletionAuditEvent]:
        """
        Get complete audit trail for a deletion request.
        """
        rows = await self.db.fetch(
            """
            SELECT * FROM deletion_audit_log
            WHERE deletion_request_id = $1
            ORDER BY timestamp ASC
            """,
            deletion_request_id
        )
        
        return [
            DeletionAuditEvent(
                id=row["id"],
                deletion_request_id=row["deletion_request_id"],
                event_type=AuditEventType(row["event_type"]),
                timestamp=row["timestamp"],
                actor=row["actor"],
                system_name=row["system_name"],
                details=row["details"]
            )
            for row in rows
        ]
    
    async def generate_compliance_report(
        self,
        deletion_request_id: str
    ) -> dict:
        """
        Generate a compliance report for a deletion request.
        
        This can be provided to auditors or data protection authorities.
        """
        # Get the request
        request = await self.db.fetchone(
            "SELECT * FROM deletion_requests WHERE id = $1",
            deletion_request_id
        )
        
        if request is None:
            raise ValueError(f"Deletion request {deletion_request_id} not found")
        
        # Get audit trail
        audit_trail = await self.get_audit_trail(deletion_request_id)
        
        # Build report
        report = {
            "report_generated_at": datetime.utcnow().isoformat(),
            "deletion_request": {
                "id": request["id"],
                "user_id": "REDACTED",  # Don't include actual user ID
                "requested_at": request["requested_at"].isoformat(),
                "requested_by": request["requested_by"],
                "reason": request["reason"],
                "deadline": request["deadline"].isoformat(),
                "status": request["status"],
                "completed_at": request["completed_at"].isoformat() if request["completed_at"] else None,
            },
            "systems_affected": [
                {
                    "system": t["system_name"],
                    "status": t["status"],
                    "records_deleted": t.get("records_deleted", 0),
                    "completed_at": t.get("completed_at")
                }
                for t in request["targets"]
            ],
            "audit_trail": [
                {
                    "timestamp": e.timestamp.isoformat(),
                    "event": e.event_type.value,
                    "system": e.system_name,
                    "actor": e.actor
                }
                for e in audit_trail
            ],
            "verification": request.get("verification_report", {}),
            "compliance_statement": self._generate_compliance_statement(request)
        }
        
        return report
    
    def _generate_compliance_statement(self, request: dict) -> str:
        """Generate a compliance statement for the report."""
        if request["status"] == "completed":
            return (
                f"This deletion request was completed on {request['completed_at']}. "
                f"All personal data for the data subject has been deleted from primary systems. "
                f"Data in backups will be fully purged according to our retention policy."
            )
        elif request["status"] == "partially_completed":
            return (
                "This deletion request has been partially completed. "
                "Some systems reported errors during deletion. "
                "Manual intervention may be required."
            )
        else:
            return f"This deletion request is currently in status: {request['status']}."
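
A report like this usually has to answer two questions quickly: did the request finish before its deadline, and is any system still failing? The following is a minimal sketch of that summary over plain dictionaries. `summarize_trail` is a hypothetical helper; the event names match the `AuditEventType` values defined above.

```python
from datetime import datetime
from typing import List, Optional


def summarize_trail(events: List[dict], deadline: datetime) -> dict:
    """Summarize audit events (dicts with 'event', 'timestamp', 'system')."""
    completed_at: Optional[datetime] = None
    failed_systems = set()
    for e in sorted(events, key=lambda e: e["timestamp"]):
        if e["event"] == "system_deletion_failed":
            failed_systems.add(e["system"])
        elif e["event"] == "system_deletion_completed":
            failed_systems.discard(e["system"])  # a later retry succeeded
        elif e["event"] == "request_completed":
            completed_at = e["timestamp"]
    return {
        "completed_at": completed_at,
        "met_deadline": completed_at is not None and completed_at <= deadline,
        "unresolved_failures": sorted(failed_systems),
    }


events = [
    {"event": "request_created", "timestamp": datetime(2024, 1, 1), "system": None},
    {"event": "system_deletion_failed", "timestamp": datetime(2024, 1, 2), "system": "stripe"},
    {"event": "system_deletion_completed", "timestamp": datetime(2024, 1, 3), "system": "stripe"},
    {"event": "request_completed", "timestamp": datetime(2024, 1, 5), "system": None},
]
summary = summarize_trail(events, deadline=datetime(2024, 1, 31))
print(summary["met_deadline"], summary["unresolved_failures"])
```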

Part III: Real-World Application

Chapter 7: Case Studies

7.1 How Stripe Handles Deletion

STRIPE'S DATA DELETION APPROACH

Challenge:
├── Massive data spread across many systems
├── Financial data has legal retention requirements
├── Payment data subject to PCI DSS
└── Customers in multiple jurisdictions

Solution:

1. DATA CATEGORIZATION
   ├── Personal data: Name, email, address
   │   └── Delete or anonymize on request
   ├── Financial records: Transaction history
   │   └── Anonymize, retain for legal requirements
   ├── Payment instruments: Card numbers
   │   └── Already tokenized, delete tokens
   └── Audit logs: Access records
       └── Retain with anonymized references

2. DELETION WORKFLOW
   ├── Self-serve deletion via Dashboard API
   ├── Immediate deletion from primary stores
   ├── Async deletion from analytics/logs
   ├── Verification job confirms deletion
   └── Compliance report generated

3. RETENTION POLICY
   ├── Personal data: Delete on request
   ├── Transaction records: 7 years (tax/legal)
   │   └── Anonymized after customer deletion
   ├── Logs: 90 days standard
   └── Backups: 30 day retention

Lessons:
├── Distinguish delete vs anonymize vs retain
├── Self-serve deletion reduces support burden
├── Clear retention policies simplify compliance
└── Verification step catches edge cases
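
The delete / anonymize / retain distinction can be written down as a small policy table that the deletion workflow consults. This is a sketch only: the category names, `CategoryPolicy` and `plan_deletion` are illustrative assumptions drawn from the outline above, not Stripe's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Action(Enum):
    DELETE = "delete"
    ANONYMIZE = "anonymize"


@dataclass(frozen=True)
class CategoryPolicy:
    action: Action
    retention_days: Optional[int]  # None = no fixed retention period


# Categories and periods taken from the outline above
POLICIES: Dict[str, CategoryPolicy] = {
    "personal_data": CategoryPolicy(Action.DELETE, None),
    "transaction_records": CategoryPolicy(Action.ANONYMIZE, 7 * 365),
    "payment_tokens": CategoryPolicy(Action.DELETE, None),
    "audit_logs": CategoryPolicy(Action.ANONYMIZE, None),
}


def plan_deletion(categories: List[str]) -> Dict[str, str]:
    """Map each category to its action; unknown categories raise KeyError."""
    return {c: POLICIES[c].action.value for c in categories}


plan = plan_deletion(["personal_data", "transaction_records", "payment_tokens"])
print(plan)
```

Failing on an unknown category is deliberate: data that nobody has classified should block the request rather than be skipped silently.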

7.2 How Slack Handles Deletion

SLACK'S DATA DELETION APPROACH

Challenge:
├── Messages involve multiple users
├── Files shared across conversations
├── Search indexes contain message content
└── Enterprise customers need audit retention

Solution:

1. USER DELETION
   ├── Deactivate account immediately
   ├── Messages: Keep but show "Deleted User"
   ├── Files: Delete if sole owner
   │   └── Keep if shared, reassign ownership
   ├── DMs: Delete both sides' view
   └── Profile: Fully deleted

2. MESSAGE DELETION
   ├── User can delete own messages
   ├── Admins can delete any message
   ├── Files in messages: Separate deletion
   └── Search index updated async

3. WORKSPACE DELETION (Data Export + Delete)
   ├── Export all data first (GDPR portability)
   ├── 7-day grace period
   ├── Then hard delete everything
   └── Cannot be recovered

4. ENTERPRISE COMPLIANCE MODE
   ├── Org can require message retention
   ├── Users cannot delete in retention period
   ├── After retention: Normal deletion rules
   └── Legal hold can prevent all deletion

Lessons:
├── Collaborative content needs special handling
├── Showing "Deleted User" preserves context
├── Enterprise compliance may override user rights
└── Grace period prevents accidental deletion
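
The enterprise compliance rules above reduce to a short precedence check: legal hold first, then the retention window, then normal deletion rules. The sketch below shows that ordering under stated assumptions; `Workspace` and `can_delete_message` are hypothetical names, not Slack's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Workspace:
    retention_days: Optional[int] = None  # None = no mandatory retention
    legal_hold: bool = False


def can_delete_message(ws: Workspace, sent_at: datetime, now: datetime) -> bool:
    """Legal hold blocks everything; otherwise wait out the retention window."""
    if ws.legal_hold:
        return False
    if ws.retention_days is None:
        return True
    return now >= sent_at + timedelta(days=ws.retention_days)


sent_at = datetime(2024, 1, 1)
print(can_delete_message(Workspace(), sent_at, datetime(2024, 1, 2)))
print(can_delete_message(Workspace(retention_days=90), sent_at, datetime(2024, 2, 1)))
print(can_delete_message(Workspace(legal_hold=True), sent_at, datetime(2025, 1, 1)))
```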

Chapter 8: Common Mistakes

8.1 Deletion Anti-Patterns

DELETION MISTAKES

❌ MISTAKE 1: Soft Delete Only

Wrong:
  async def delete_user(user_id):
      await db.execute(
          "UPDATE users SET deleted = true WHERE id = $1",
          user_id
      )
      # Done! User is "deleted"

Problem:
  Data still exists, not GDPR compliant
  Can be queried by mistake
  Backups still contain data

Right:
  async def delete_user(user_id):
      # Soft delete first (for grace period)
      await db.execute(
          "UPDATE users SET deleted = true, deleted_at = NOW() WHERE id = $1",
          user_id
      )
      
      # Schedule hard delete after grace period
      # (short enough to finish inside the GDPR deadline)
      await schedule_hard_delete(user_id, delay_days=7)
  
  async def hard_delete_user(user_id):
      # Run by the scheduler once the grace period has passed
      await db.execute("DELETE FROM users WHERE id = $1", user_id)


❌ MISTAKE 2: Forgetting Foreign Keys

Wrong:
  async def delete_user(user_id):
      await db.execute("DELETE FROM users WHERE id = $1", user_id)
      # Fails with foreign key violation!

Problem:
  Orders, comments, etc. reference user_id
  Deletion fails or orphans records

Right:
  async def delete_user(user_id):
      async with db.transaction():
          # Delete leaf records first
          await db.execute("DELETE FROM notifications WHERE user_id = $1", user_id)
          await db.execute("DELETE FROM sessions WHERE user_id = $1", user_id)
          
          # Anonymize records we must keep
          await db.execute(
              "UPDATE orders SET user_id = NULL WHERE user_id = $1",
              user_id
          )
          
          # Finally delete user
          await db.execute("DELETE FROM users WHERE id = $1", user_id)


❌ MISTAKE 3: Not Deleting from Analytics

Wrong:
  async def delete_user(user_id):
      await db.execute("DELETE FROM users WHERE id = $1", user_id)
      # Forgot BigQuery, Mixpanel, Amplitude...

Problem:
  User data still in analytics systems
  Can be queried and linked back
  Violates deletion request

Right:
  async def delete_user(user_id):
      # Primary DB
      await db.execute("DELETE FROM users WHERE id = $1", user_id)
      
      # Analytics - all of them
      await bigquery.delete_user(user_id)
      await mixpanel.delete_user(user_id)
      await amplitude.delete_user(user_id)
      
      # Third parties
      await stripe.delete_customer(user_id)
      await intercom.delete_user(user_id)


❌ MISTAKE 4: No Verification

Wrong:
  async def delete_user(user_id):
      for system in systems:
          await system.delete(user_id)
      
      return {"status": "deleted"}  # Trust it worked!

Problem:
  Deletion might have failed silently
  No proof of deletion for auditors
  Can't answer "was user X deleted?"

Right:
  async def delete_user(user_id):
      for system in systems:
          await system.delete(user_id)
      
      # Verify deletion
      for system in systems:
          exists = await system.check_exists(user_id)
          if exists:
              raise DeletionVerificationError(f"Data still in {system}")
      
      # Log completion
      await audit_log.record_deletion_complete(user_id)
      
      return {"status": "verified_deleted"}


❌ MISTAKE 5: Deleting Audit Logs

Wrong:
  async def delete_user(user_id):
      # Delete everything including audit trail
      await db.execute("DELETE FROM audit_logs WHERE user_id = $1", user_id)
      await db.execute("DELETE FROM users WHERE id = $1", user_id)

Problem:
  Can't prove what happened to the data
  Compliance audit will fail
  Suspicious - looks like cover-up

Right:
  async def delete_user(user_id):
      # Anonymize audit logs, don't delete
      await db.execute(
          "UPDATE audit_logs SET user_id = 'DELETED' WHERE user_id = $1",
          user_id
      )
      
      # Add deletion record to audit log
      await db.execute(
          "INSERT INTO audit_logs (action, user_id, timestamp) VALUES ($1, $2, $3)",
          "USER_DELETED", "DELETED", datetime.now()
      )
      
      await db.execute("DELETE FROM users WHERE id = $1", user_id)

Part IV: Interview Preparation

Chapter 9: Interview Tips

9.1 Deletion Discussion Framework

DISCUSSING DELETION IN INTERVIEWS

When the topic comes up:

1. ACKNOWLEDGE THE COMPLEXITY
   "Deletion sounds simple but is actually one of the hardest
    compliance challenges. Data is spread across many systems,
    and you need to prove it's actually gone."

2. LIST THE CHALLENGES
   "The main challenges are:
    - Data fragmentation across systems
    - Foreign key relationships
    - Third-party processors
    - Backup retention
    - Proving deletion happened"

3. PROPOSE A SYSTEMATIC APPROACH
   "I'd implement:
    - A data inventory mapping all PII locations
    - A deletion orchestration service
    - System-specific executors for each data store
    - Verification that confirms deletion
    - Audit trail that survives the deletion"

4. ADDRESS SPECIAL CASES
   "Some data can't be deleted:
    - Financial records: Anonymize instead
    - Audit logs: Keep with anonymized references
    - Backups: Let expire per retention policy
    - Shared content: Show 'Deleted User'"

5. MENTION COMPLIANCE
   "GDPR requires a response without undue delay and within one
    month of the request. I'd implement:
    - Dashboard for users to request deletion
    - Automated workflow with manual review option
    - Progress tracking visible to user
    - Compliance report generation for auditors"

9.2 Key Phrases

DELETION KEY PHRASES

On Architecture:
"I'd build a deletion orchestration service that coordinates
deletion across all systems. Each system has an executor that
knows how to delete data and verify it's gone. The orchestrator
tracks progress and handles failures."

On Data Mapping:
"The first step is a data inventory - mapping every place
personal data lives. Without knowing where data is, you can't
delete it. This includes primary databases, caches, search
indexes, analytics, third-party services, and backups."

On Verification:
"Deletion without verification is incomplete. After deletion,
the system queries each data store to confirm the user's data
is actually gone. Only then is the deletion marked complete."

On Backups:
"Backups are the tricky part. You can't surgically remove one
user from a backup. The solution is clear retention policies -
backups expire after 30-90 days. Until then, if a backup is
restored, we apply pending deletions to the restored data."

On Audit Trail:
"The paradox of deletion is you need to prove it happened.
I'd keep an anonymized audit trail - recording that 'user
DELETED-12345 was deleted on date X' without the actual PII.
This satisfies auditors while respecting the deletion."
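
As an illustration of the audit-trail phrase above, here is a hedged sketch of a deletion record that holds no personal data. The function name `build_audit_record` and the field names are assumptions made for this example. The reference is generated randomly instead of being derived from the user ID or email, so it cannot be recomputed from known identifiers.

```python
import json
import secrets
from datetime import datetime, timezone


def build_audit_record(systems_verified, legal_basis="GDPR Art. 17"):
    """Create a deletion record that contains no personal data."""
    return {
        # Random token: not reversible and not linked to any stored identifier.
        "subject_ref": f"DELETED-{secrets.token_hex(8)}",
        "deleted_at": datetime.now(timezone.utc).isoformat(),
        "systems_verified": sorted(systems_verified),
        "legal_basis": legal_basis,
    }


record = build_audit_record({"primary_db", "search_index", "redis_cache"})
print(json.dumps(record, indent=2))
```

One option is to include the reference in the confirmation sent to the user, so a later inquiry can be matched to the record without the company storing who it belonged to.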

Chapter 10: Practice Problems

Problem 1: Social Media Platform

Scenario: Your social platform has users with posts, comments, likes, and followers. A user requests deletion under GDPR.

Questions:

  1. What happens to their posts that others have commented on?
  2. What about posts they're mentioned in?
  3. How do you handle their follower relationships?
Hints:

  • Posts they authored: Delete or show "Deleted User"
  • Comments on others' posts: Anonymize ("Deleted User said...")
  • Mentions: Replace @username with @deleted
  • Followers: Delete both sides of relationship
  • Consider what happens to replies to their posts
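
The comment and mention handling above can be sketched as follows. This is an illustrative example only: `scrub_mentions`, `anonymize_comment`, and the comment fields are hypothetical, and the pattern assumes handles consist of word characters (letters, digits, underscore).

```python
import re


def scrub_mentions(text, username):
    """Replace @username with @deleted without touching longer handles."""
    # Lookarounds stop matches inside emails (bob@alice) or longer names (@alice_b).
    pattern = re.compile(rf"(?<!\w)@{re.escape(username)}(?!\w)")
    return pattern.sub("@deleted", text)


def anonymize_comment(comment, deleted_user_id):
    """Keep the comment so the thread stays readable, but drop the author."""
    if comment["author_id"] != deleted_user_id:
        return comment
    return {**comment, "author_id": None, "author_name": "Deleted User"}


scrubbed = scrub_mentions("Thanks @alice, also cc @alice_b", "alice")
comment = anonymize_comment(
    {"id": 7, "author_id": 42, "author_name": "Alice", "body": "Nice post"},
    deleted_user_id=42,
)
```

Note that the comment body itself may still contain personal data the user typed, which a real system would need a policy for.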

Problem 2: E-Commerce Platform

Scenario: Your e-commerce platform must keep transaction records for 7 years for tax purposes. A user requests deletion after making purchases.

Questions:

  1. How do you handle the conflict between deletion and retention?
  2. What data can you delete, and what must you keep?
  3. How do you explain this to the user?
Hints:

  • Account data: Delete (name, email, preferences)
  • Transaction records: Anonymize but retain
  • Shipping addresses: Redact unless tax law requires them on the retained invoice
  • Explain: "Account deleted. Financial records anonymized per legal requirements."
  • Document the legal basis for retention

Chapter 11: Sample Interview Dialogue

Interviewer: "A user requests deletion of all their data. Walk me through how you'd handle it."

You: "This is a GDPR Article 17 request. Let me walk through the systematic approach.

First, I need to know where data lives. I'd maintain a data inventory mapping every system with personal data. For a typical SaaS, that's:

  • Primary database (user profile, content)
  • Search index (user in search results)
  • Cache (session, preferences)
  • File storage (uploads)
  • Analytics (behavioral data)
  • Third-party services (payment, support)

The deletion flow would be:"

User clicks "Delete Account"
         │
         ▼
┌─────────────────────────────┐
│  Create Deletion Request    │
│  - Record request details   │
│  - Start 30-day timer       │
│  - Notify user of timeline  │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  Execute Deletion           │
│  - Delete from cache first  │
│  - Then search index        │
│  - Then file storage        │
│  - Then analytics           │
│  - Then third parties       │
│  - Finally primary DB       │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  Verify Deletion            │
│  - Query each system        │
│  - Confirm data is gone     │
│  - Flag failures for review │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  Complete and Notify        │
│  - Update audit log         │
│  - Send confirmation email  │
│  - Generate compliance cert │
└─────────────────────────────┘
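
The execute-then-verify flow in the diagram can be sketched in a few lines. This is a minimal illustration with in-memory stand-ins, not a production design: `InMemoryExecutor` and `run_deletion` are hypothetical names, and a real executor would call the cache, search index, or database it wraps.

```python
class InMemoryExecutor:
    """Stand-in for a real system executor (cache, search index, database)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records  # maps user_id -> stored data

    def delete(self, user_id):
        self.records.pop(user_id, None)

    def verify(self, user_id):
        # Verification re-queries the store instead of trusting delete().
        return user_id not in self.records


def run_deletion(user_id, executors):
    """Delete in the given order, then verify every system."""
    failures = {}
    for executor in executors:
        try:
            executor.delete(user_id)
        except Exception as exc:  # record the failure and flag it for review
            failures[executor.name] = f"delete failed: {exc}"
    for executor in executors:
        if executor.name not in failures and not executor.verify(user_id):
            failures[executor.name] = "data still present"
    status = "completed" if not failures else "needs_review"
    return {"status": status, "failures": failures}


executors = [
    InMemoryExecutor("cache", {"u1": "session"}),
    InMemoryExecutor("search_index", {"u1": "profile"}),
    InMemoryExecutor("primary_db", {"u1": "row", "u2": "row"}),
]
result = run_deletion("u1", executors)
```

The sketch continues past a failed system and marks the request for review; stopping at the first failure is an equally defensible choice when later systems depend on earlier ones.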

Interviewer: "What about their order history? We need that for financial records."

You: "Good point - there's a conflict between the right to deletion and legal retention requirements. Here's how I'd handle it:

The user record gets deleted - name, email, preferences. But the order records get anonymized rather than deleted. Each order stays, with user_id set to NULL and addresses redacted to 'DELETED'.

The legal basis for retention is Article 17(3)(b) - 'compliance with a legal obligation.' We'd document this and explain to the user: 'Your account has been deleted. Transaction records are retained in anonymized form as required by tax law.'

This way we satisfy both the deletion request and the 7-year tax retention requirement."
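
The anonymize-then-delete step described in this answer can be sketched with an in-memory SQLite database. The table and column names are assumptions made for illustration, and the sketch assumes `orders.user_id` is nullable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,            -- nullable so orders can outlive the user
        total_cents INTEGER,
        shipping_address TEXT
    );
    INSERT INTO users VALUES (1, 'Ada', 'ada@example.com');
    INSERT INTO orders VALUES (100, 1, 4999, '1 Main St');
""")


def delete_user_keep_orders(conn, user_id):
    """Anonymize the user's orders and delete the user in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE orders SET user_id = NULL, shipping_address = 'DELETED' "
            "WHERE user_id = ?",
            (user_id,),
        )
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


delete_user_keep_orders(conn, 1)
remaining_users = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
order_row = conn.execute(
    "SELECT user_id, total_cents, shipping_address FROM orders WHERE id = 100"
).fetchone()
```

Running both statements in one transaction avoids a state where the user is gone but the orders still carry the address.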

Interviewer: "How do you handle backups?"

You: "Backups are the hardest part because you can't surgically remove one user.

My approach is a clear retention policy. Say backups are retained for 30 days. We track pending deletions, and the compliance report notes: 'Primary systems deleted on Day 0. Backup data will be fully purged by Day 30.'

If we need to restore from backup before Day 30, we apply all pending deletions to the restored data before it goes live. This is why tracking pending deletions is critical."
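
The pending-deletion tracking described here can be sketched as below. The `PendingDeletions` class and its methods are hypothetical, and the sketch assumes the tracker stores only an opaque internal user ID, which is itself dropped once the last backup containing that user has expired.

```python
from datetime import date, timedelta

BACKUP_RETENTION_DAYS = 30  # the retention window used in the answer above


class PendingDeletions:
    """Tracks deleted user IDs until every backup containing them expires."""

    def __init__(self):
        self._deleted_on = {}  # user_id -> date the primary deletion ran

    def record(self, user_id, deleted_on):
        self._deleted_on[user_id] = deleted_on

    def purge_date(self, user_id):
        """Date by which no backup still contains this user."""
        return self._deleted_on[user_id] + timedelta(days=BACKUP_RETENTION_DAYS)

    def expire(self, today):
        """Forget entries whose backups have all aged out."""
        self._deleted_on = {
            uid: d for uid, d in self._deleted_on.items()
            if d + timedelta(days=BACKUP_RETENTION_DAYS) > today
        }

    def apply_to_restore(self, restored_rows):
        """Drop deleted users from restored data before it goes live."""
        return {uid: row for uid, row in restored_rows.items()
                if uid not in self._deleted_on}


pending = PendingDeletions()
pending.record("u1", date(2024, 1, 1))
restored = pending.apply_to_restore({"u1": "old row", "u2": "row"})
```

Calling `expire` on a schedule keeps the tracker from becoming a permanent list of deleted users.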


Summary

DAY 4 KEY TAKEAWAYS

DELETION REQUIREMENTS (GDPR):
├── Respond within 30 days
├── Delete from all systems
├── Inform third parties
├── Exceptions: Legal obligations, public interest
└── Must be able to prove deletion

DELETION STRATEGIES:
├── Hard delete: Physically remove
├── Soft delete: Mark as deleted (not sufficient alone)
├── Anonymize: Remove identifying info, keep record
├── Aggregate: Combine to remove individual
└── Expire: Let retention policy handle

DATA INVENTORY:
├── Map all PII locations
├── Document deletion method per system
├── Track dependencies (foreign keys)
├── Include third parties and backups
└── Update when adding new systems

DELETION ARCHITECTURE:
├── Orchestration service coordinates
├── System executors handle specifics
├── Order matters (dependencies)
├── Verification confirms success
└── Audit trail proves compliance

SPECIAL CASES:
├── Backups: Let expire, track pending
├── Financial records: Anonymize, retain
├── Shared content: Show "Deleted User"
├── Audit logs: Anonymize, don't delete
└── Third parties: API calls or manual

VERIFICATION:
├── Query each system after deletion
├── Confirm no data returned
├── Flag failures for manual review
├── Generate compliance report
└── Keep proof of deletion

DEFAULT APPROACH:
├── Build data inventory first
├── Implement executor per system
├── Orchestrate with dependency order
├── Always verify
└── Keep anonymized audit trail

Further Reading

Engineering Blogs:

  • Slack: "How Slack Handles Data Deletion"
  • Airbnb: "GDPR Compliance at Airbnb"

Tools:

  • OneTrust (privacy management)
  • BigID (data discovery)
  • Transcend (data deletion automation)

End of Day 4: Right to Deletion

Tomorrow: Day 5 — Security Architecture. We'll bring together all the concepts with defense in depth, zero trust, secrets management, and security-first design.