Week 6 Preview: Designing a Notification Platform
🎯 One System, Five Days, Complete Mastery
Week 6 Philosophy
Unlike Weeks 1-5, where each day covered a new concept, Weeks 6-8 are immersive, practical weeks. We take ONE complex real-world system and spend the entire week designing it end-to-end.
┌──────────────────────────────────────────────────────────────────────┐
│                          WEEKS 6-8 APPROACH                          │
│                                                                      │
│  WEEKS 1-5: Learn concepts                                           │
│  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐                             │
│  │Day 1│ │Day 2│ │Day 3│ │Day 4│ │Day 5│                             │
│  │Topic│ │Topic│ │Topic│ │Topic│ │Topic│                             │
│  │  A  │ │  B  │ │  C  │ │  D  │ │  E  │                             │
│  └─────┘ └─────┘ └─────┘ └─────┘ └─────┘                             │
│                                                                      │
│  WEEKS 6-8: Apply concepts to ONE real system                        │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │                     NOTIFICATION PLATFORM                      │  │
│  │  Day 1       Day 2       Day 3       Day 4       Day 5         │  │
│  │  Problem     Core        Advanced    Scale &     Operations    │  │
│  │  & Design    Flows       Features    Edge Cases  & Interview   │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
Why Notification Platform?
A notification platform makes a good teaching system because it exercises every concept area from Weeks 1-5:
CONCEPTS FROM WEEKS 1-5 APPLIED:
Week 1 (Data at Scale):
├── Partitioning notification queues
├── Replication for delivery guarantees
├── Rate limiting per user/channel
├── Hot key handling (celebrity notifications)
└── Session management for push tokens
Week 2 (Failure-First Design):
├── Timeout management with external providers
├── Idempotency for duplicate prevention
├── Circuit breakers for failing providers
├── Retry strategies per channel
└── Dead letter handling
Week 3 (Messaging & Async):
├── Queue vs stream for notification pipeline
├── Transactional outbox for reliable publishing
├── Backpressure from external providers
├── Dead letter queues for failed notifications
└── Audit logging for compliance
Week 4 (Caching):
├── User preference caching
├── Template caching
├── Device token caching
├── Rate limit counter caching
└── Provider health caching
Week 5 (Consistency & Coordination):
├── Consistency for preference updates
├── Saga for multi-channel notifications
├── Workflow orchestration for complex flows
├── Conflict resolution for preference sync
└── Leader election for batch processors
The Problem Statement
┌──────────────────────────────────────────────────────────────────────┐
│                                                                      │
│  Design a Multi-Channel Notification Platform                        │
│                                                                      │
│  You're building the notification infrastructure for a fintech       │
│  super-app (like Revolut, Cash App, or PayTM). Users receive         │
│  notifications about:                                                │
│                                                                      │
│  • Transactions (payments, transfers, refunds)                       │
│  • Security alerts (login, password change, suspicious activity)     │
│  • Marketing campaigns (promotions, new features)                    │
│  • Reminders (bills due, low balance, scheduled payments)            │
│  • Social (friend requests, splits, payment requests)                │
│                                                                      │
│  Channels: Push, Email, SMS, In-App, WhatsApp                        │
│                                                                      │
│  Scale:                                                              │
│  • 50M users                                                         │
│  • 500M notifications/day                                            │
│  • 10M notifications/hour during campaigns                           │
│  • 99.9% delivery SLA for critical notifications                     │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
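The scale figures in the problem statement can be turned into rough per-second rates, which is the starting point for Day 1's back-of-envelope estimation. The sketch below rests on two assumptions that are not in the problem statement: traffic is spread evenly across the day, and campaign traffic arrives on top of the daily baseline. Real traffic will be burstier, so treat the results as lower bounds for sizing.

```python
# Back-of-envelope rates derived from the scale figures in the problem statement.
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
SECONDS_PER_HOUR = 60 * 60        # 3,600

USERS = 50_000_000
NOTIFICATIONS_PER_DAY = 500_000_000
CAMPAIGN_NOTIFICATIONS_PER_HOUR = 10_000_000
CRITICAL_DELIVERY_SLA = 0.999

# Assumption: traffic is spread evenly over the day (real traffic has peaks).
baseline_per_second = NOTIFICATIONS_PER_DAY / SECONDS_PER_DAY

# Assumption: campaign volume arrives on top of the daily baseline.
campaign_per_second = CAMPAIGN_NOTIFICATIONS_PER_HOUR / SECONDS_PER_HOUR
peak_per_second = baseline_per_second + campaign_per_second

notifications_per_user_per_day = NOTIFICATIONS_PER_DAY / USERS

# A 99.9% SLA leaves a failure budget of 0.1% of critical notifications.
failure_budget_per_million = round(1_000_000 * (1 - CRITICAL_DELIVERY_SLA))

print(f"Baseline:          {baseline_per_second:,.0f} notifications/s")
print(f"Campaign extra:    {campaign_per_second:,.0f} notifications/s")
print(f"Assumed peak:      {peak_per_second:,.0f} notifications/s")
print(f"Per user per day:  {notifications_per_user_per_day:.0f}")
print(f"Failure budget:    {failure_budget_per_million:,} per million critical")
```

Under these assumptions the baseline is roughly 5,800 notifications per second and a campaign adds roughly 2,800 per second on top, with about 10 notifications per user per day.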
Week 6 Daily Breakdown
Day 1: Problem Understanding & High-Level Design
Theme: "Before you solve it, understand it deeply"
WHAT WE'LL COVER:
1. INTERVIEW APPROACH
├── How to clarify requirements (with sample dialogue)
├── Functional vs non-functional requirements
├── Asking the RIGHT questions
└── Establishing constraints and priorities
2. DOMAIN DEEP DIVE
├── Notification types and their characteristics
├── Channel comparison (push vs email vs SMS vs...)
├── User preferences and consent management
└── Regulatory requirements (GDPR, CAN-SPAM, etc.)
3. BACK OF ENVELOPE ESTIMATION
├── Traffic patterns (steady state vs campaigns)
├── Storage requirements
├── Provider costs and limits
└── Infrastructure sizing
4. HIGH-LEVEL ARCHITECTURE
├── Component identification
├── Data flow design
├── Technology choices with justification
└── API design (ingestion, preferences, status)
5. SCHEMA DESIGN
├── Notification schema
├── User preferences schema
├── Delivery status schema
└── Audit log schema
WEEK 1-5 CONCEPTS APPLIED:
• Partitioning strategy (Week 1)
• Queue vs stream decision (Week 3)
• Consistency model for preferences (Week 5)
Day 2: Core Notification Flow
Theme: "The happy path must be bulletproof"
WHAT WE'LL COVER:
1. NOTIFICATION INGESTION
├── API design for sending notifications
├── Validation and enrichment
├── Priority classification
└── Idempotency handling
2. ROUTING AND CHANNEL SELECTION
├── User preference lookup
├── Channel eligibility rules
├── Fallback channel logic
└── Time-zone aware delivery
3. QUEUE ARCHITECTURE
├── Topic/partition design
├── Priority queues
├── Per-channel queues
└── Ordering guarantees
4. PROVIDER INTEGRATION
├── Provider abstraction layer
├── Push: FCM, APNs integration
├── Email: SendGrid, SES integration
├── SMS: Twilio, SNS integration
└── WhatsApp: Business API integration
5. DELIVERY TRACKING
├── Status state machine
├── Delivery receipts and callbacks
├── Bounce handling
└── Read receipts (where available)
6. IMPLEMENTATION
├── Complete code for notification service
├── Provider integration code
├── Database operations
└── Unit and integration tests
WEEK 1-5 CONCEPTS APPLIED:
• Transactional outbox (Week 3)
• Idempotency patterns (Week 2)
• Rate limiting (Week 1)
• Timeout management (Week 2)
Day 3: Advanced Features
Theme: "The features that separate good from great"
WHAT WE'LL COVER:
1. TEMPLATE SYSTEM
├── Template storage and versioning
├── Personalization and variables
├── Localization (i18n)
├── A/B testing support
└── Template preview and validation
2. BATCHING AND DIGESTS
├── When to batch (trading notifications)
├── Digest generation (daily summary)
├── Smart batching algorithms
└── User-configurable digest preferences
3. SCHEDULING AND DELAYED DELIVERY
├── Scheduled notifications
├── Time-zone aware scheduling
├── "Best time to send" algorithms
├── Reminder workflows
└── Cancellation handling
4. MULTI-CHANNEL ORCHESTRATION
├── Saga pattern for multi-channel
├── Fallback chains (push → SMS → email)
├── Escalation workflows
└── Channel coordination
5. USER PREFERENCE MANAGEMENT
├── Preference hierarchy (global → category → channel)
├── Quiet hours / Do Not Disturb
├── Frequency capping
└── Unsubscribe handling
6. REAL-TIME FEATURES
├── In-app notification center
├── WebSocket delivery
├── Read/unread state sync
└── Notification grouping
WEEK 1-5 CONCEPTS APPLIED:
• Saga pattern (Week 5)
• Workflow orchestration (Week 5)
• Caching strategies (Week 4)
• Conflict resolution for preferences (Week 5)
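The fallback chain listed under multi-channel orchestration (push, then SMS, then email) can be sketched as trying channels in order until one accepts the message. This is a minimal illustration, not the Day 3 implementation: the `Sender` signature, `send_with_fallback`, and the three stub senders are hypothetical names introduced here, and a production version would persist progress between steps (for example as a saga) rather than loop in memory.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical sender signature: takes a message, returns True if the provider accepted it.
Sender = Callable[[str], bool]


def send_with_fallback(
    message: str, chain: List[str], senders: Dict[str, Sender]
) -> Optional[str]:
    """Try each channel in order; return the first channel that succeeds, else None."""
    for channel in chain:
        sender = senders.get(channel)
        if sender is None:
            continue  # channel not available for this user
        try:
            if sender(message):
                return channel
        except Exception:
            # Treat a provider error as a failed attempt and move to the next channel.
            continue
    return None


calls: List[str] = []


def failing_push(message: str) -> bool:
    calls.append("push")
    raise TimeoutError("push provider timed out")


def working_sms(message: str) -> bool:
    calls.append("sms")
    return True


def working_email(message: str) -> bool:
    calls.append("email")
    return True


delivered_via = send_with_fallback(
    "Your transfer has completed",
    chain=["push", "sms", "email"],
    senders={"push": failing_push, "sms": working_sms, "email": working_email},
)
print(delivered_via, calls)
```

In this run the push stub raises, SMS succeeds, and email is never attempted, which is the cost-saving property a fallback chain is meant to provide.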
Day 4: Scale, Reliability & Edge Cases
Theme: "What breaks at scale? Everything."
WHAT WE'LL COVER:
1. SCALING CHALLENGES
├── Campaign mode (10M notifications in 1 hour)
├── Hot user problem (celebrity with 1M followers)
├── Provider rate limits
└── Database bottlenecks
2. RELIABILITY PATTERNS
├── Circuit breakers per provider
├── Retry strategies with backoff
├── Dead letter queue processing
├── Provider failover
└── Graceful degradation
3. EDGE CASES (THE HARD STUFF)
├── User uninstalls app mid-notification
├── Phone number/email changes
├── Device token rotation
├── Duplicate device registrations
├── Time zone edge cases (DST)
├── Provider outages
├── Partial delivery (multi-channel)
└── Race conditions in preferences
4. FAILURE SCENARIOS
├── Database failure
├── Queue failure
├── Provider failure (all channels)
├── Network partition
└── Cascading failures
5. DATA CONSISTENCY
├── Exactly-once delivery (is it possible?)
├── At-least-once with deduplication
├── Preference consistency across devices
└── Status consistency
6. COST OPTIMIZATION
├── Provider cost comparison
├── Batching for cost reduction
├── Smart channel selection
└── Reducing unnecessary notifications
WEEK 1-5 CONCEPTS APPLIED:
• Circuit breakers (Week 2)
• Backpressure handling (Week 3)
• Dead letter queues (Week 3)
• Hot key mitigation (Week 1)
• Leader election for processors (Week 5)
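Two of the reliability patterns above, retries with backoff and dead letter handling, fit together naturally: retry transient failures with growing, jittered delays, and park the notification once the retry budget is spent. The sketch below is one possible shape, assuming a hypothetical `TransientProviderError`, an in-memory list standing in for a dead letter queue, and illustrative default delays. The `sleep` and `rng` parameters are injectable so the logic can be exercised without waiting.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class TransientProviderError(Exception):
    """Hypothetical error for failures worth retrying (timeouts, 5xx, rate limits)."""


def send_with_retries(
    operation: Callable[[], T],
    dead_letters: List[str],
    label: str,
    max_retries: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Optional[T]:
    """Retry with full-jitter exponential backoff; dead-letter after the last attempt."""
    rng = rng or random.Random()
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except TransientProviderError:
            if attempt == max_retries:
                dead_letters.append(label)  # retries exhausted: park for later inspection
                return None
            # Delay is uniform in [0, min(max_delay, base_delay * 2**attempt)].
            sleep(rng.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    return None


attempts = {"count": 0}


def flaky_send() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientProviderError("provider returned 503")
    return "delivered"


def always_failing() -> str:
    raise TransientProviderError("provider down")


sleeps: List[float] = []
dead_letters: List[str] = []
result = send_with_retries(flaky_send, dead_letters, "notif-1", sleep=sleeps.append)
failed = send_with_retries(always_failing, dead_letters, "notif-2", sleep=sleeps.append)
print(result, failed, dead_letters)
```

Jitter matters at this scale: without it, every worker that failed at the same moment retries at the same moment, which can re-trigger the provider's rate limit.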
Day 5: Operations, Monitoring & Interview Mastery
Theme: "Ship it, run it, ace the interview"
WHAT WE'LL COVER:
1. OBSERVABILITY
├── Key metrics (delivery rate, latency, cost)
├── Distributed tracing setup
├── Log aggregation
└── Custom dashboards
2. ALERTING STRATEGY
├── SLOs and SLIs definition
├── Alert hierarchy (critical/warning/info)
├── On-call runbooks
└── Incident response
3. OPERATIONAL TOOLING
├── Admin dashboard features
├── Notification search and debugging
├── Manual retry interface
├── Provider health dashboard
└── Cost monitoring
4. DEPLOYMENT & ROLLOUT
├── Canary deployment
├── Feature flags for new channels
├── Database migrations
└── Rollback procedures
5. INTERVIEW WALKTHROUGH
├── Complete 45-minute interview simulation
├── Common interviewer questions
├── How to handle curveballs
├── Trade-off discussions
└── What NOT to say
6. REAL-WORLD CASE STUDIES
├── How Uber built their notification platform
├── How Slack handles notifications
├── How WhatsApp scaled messaging
└── Lessons from production incidents
7. COMPLETE SYSTEM SUMMARY
├── Architecture diagram (final)
├── Component interaction matrix
├── Decision log with rationale
└── Future improvements roadmap
WEEK 1-5 CONCEPTS APPLIED:
• All concepts integrated
• Full system thinking
• Production readiness
What Makes This Week Different
1. Interview-Focused Approach
Each day includes dialogue showing HOW to present in an interview:
EXAMPLE FROM DAY 2:
**Interviewer**: "Walk me through how a notification gets sent."
**You**: "Let me trace a transaction notification end-to-end.
First, when a payment completes, the payment service publishes a
PaymentCompleted event. I'd use the transactional outbox pattern
here: we write the event to an outbox table in the same transaction
as the payment, then a separate process publishes to Kafka.
The notification service consumes this event and..."
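The outbox step in that answer can be made concrete with a small sketch. It uses an in-memory SQLite database in place of the payment service's real database and a plain callback in place of a Kafka producer; the table layout and the `complete_payment` and `relay_outbox` names are assumptions made for illustration, not the course's Day 2 code.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE payments (id TEXT PRIMARY KEY, user_id TEXT, amount_cents INTEGER);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
    """
)


def complete_payment(payment_id: str, user_id: str, amount_cents: int) -> None:
    """Write the payment and its event in ONE transaction."""
    payload = {"payment_id": payment_id, "user_id": user_id, "amount_cents": amount_cents}
    with conn:  # commits on success, rolls back if either insert raises
        conn.execute(
            "INSERT INTO payments VALUES (?, ?, ?)", (payment_id, user_id, amount_cents)
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("PaymentCompleted", json.dumps(payload)),
        )


def relay_outbox(publish) -> int:
    """Separate process: publish pending rows in order, then mark them published."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # stand-in for a Kafka producer send
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


published = []
complete_payment("pay_1", "user_42", 1999)
first_pass = relay_outbox(lambda event_type, payload: published.append((event_type, payload)))
second_pass = relay_outbox(lambda event_type, payload: published.append((event_type, payload)))
print(first_pass, second_pass, published)
```

If the relay crashes after publishing but before marking the row, the event is published again on the next pass. The pattern therefore gives at-least-once publishing, which is why the notification consumer still needs idempotency.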
2. Production Reality
We don't just design; we discuss what ACTUALLY breaks:
EXAMPLE FROM DAY 4:
EDGE CASE: Device Token Rotation
Problem:
  iOS can issue a new device token (for example, after an app reinstall or a restore from backup)
  Old token is still stored in our database
  Push notification fails with "InvalidToken"
If we just retry:
  Same failure, wasted resources
  User never gets the notification
Production solution:
  1. Detect the InvalidToken response
  2. Mark the token as invalid in the DB
  3. Route to a fallback channel (email/SMS)
  4. When the app next opens, register the new token
  5. Run a background job to clean up stale tokens
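The five steps above can be sketched in a few lines. Everything named here is hypothetical: `InvalidTokenError` stands in for whatever error the push client raises, `TokenStore` is an in-memory substitute for the device-token table, and `deliver` takes the push and fallback senders as plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class InvalidTokenError(Exception):
    """Hypothetical error a push client raises when the provider rejects a token."""


@dataclass
class TokenStore:
    """In-memory stand-in for the device-token table (token -> is_valid)."""

    tokens: Dict[str, bool] = field(default_factory=dict)

    def register(self, token: str) -> None:
        self.tokens[token] = True        # step 4: app registers a fresh token

    def mark_invalid(self, token: str) -> None:
        self.tokens[token] = False       # step 2: remember the rejection

    def is_valid(self, token: str) -> bool:
        return self.tokens.get(token, False)

    def purge_stale(self) -> int:
        """Step 5: background cleanup; returns how many tokens were removed."""
        stale = [token for token, ok in self.tokens.items() if not ok]
        for token in stale:
            del self.tokens[token]
        return len(stale)


def deliver(
    user_id: str,
    token: str,
    message: str,
    push: Callable[[str, str], None],
    fallback: Callable[[str, str], None],
    store: TokenStore,
) -> str:
    """Push if the token is believed valid; otherwise, or on rejection, fall back."""
    if store.is_valid(token):
        try:
            push(token, message)
            return "push"
        except InvalidTokenError:
            store.mark_invalid(token)    # steps 1-2: detect and record, do not retry
    fallback(user_id, message)           # step 3: route to email/SMS
    return "fallback"


store = TokenStore()
store.register("old-token")
push_calls: List[str] = []
fallback_calls: List[str] = []


def rejecting_push(token: str, message: str) -> None:
    push_calls.append(token)
    raise InvalidTokenError(token)


def email_fallback(user_id: str, message: str) -> None:
    fallback_calls.append(user_id)


first = deliver("user-42", "old-token", "Payment received", rejecting_push, email_fallback, store)
second = deliver("user-42", "old-token", "Refund issued", rejecting_push, email_fallback, store)
store.register("new-token")
removed = store.purge_stale()
print(first, second, push_calls, removed)
```

The second call never reaches the push provider because the token was already marked invalid, which is the "wasted resources" problem the production solution avoids.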
3. Complete Implementation
Not pseudo-code, but production-ready patterns:
# Example from Day 2: Provider abstraction
# Notification, DeliveryResult, HealthStatus and RateLimit are defined elsewhere;
# postponed annotation evaluation lets this excerpt import without them.
from __future__ import annotations

from abc import ABC, abstractmethod


class NotificationProvider(ABC):
    """Base class for notification providers."""

    @abstractmethod
    async def send(self, notification: Notification) -> DeliveryResult:
        """Send notification through this provider."""

    @abstractmethod
    async def check_health(self) -> HealthStatus:
        """Check provider health."""

    @abstractmethod
    def get_rate_limit(self) -> RateLimit:
        """Get current rate limit info."""


class FCMProvider(NotificationProvider):
    """Firebase Cloud Messaging provider."""

    async def send(self, notification: Notification) -> DeliveryResult:
        # Full implementation with error handling,
        # retries, token validation, etc.
        ...
4. Edge Case Exhaustiveness
Every edge case you might face in production OR interviews:
EDGE CASES WE'LL COVER:
Notification Creation:
├── Duplicate notification requests
├── Invalid user ID
├── User doesn't exist
├── Missing required fields
├── Template not found
└── Invalid channel specified
Delivery:
├── Invalid device token
├── Expired device token
├── User unsubscribed
├── Provider timeout
├── Provider rate limited
├── Provider returns unknown error
├── Partial multi-channel delivery
└── Network failure mid-delivery
User State:
├── User deletes account mid-notification
├── User changes email during send
├── User in multiple time zones
├── User preferences change during send
├── User blocks sender
└── User marks as spam
System:
├── Database failover during write
├── Kafka partition rebalance
├── Worker crashes mid-processing
├── Clock skew between services
├── Memory pressure
└── Provider certificate expiry
Concepts Mapping
Here's how Weeks 1-5 concepts map to Week 6:
┌──────────────────────────────────────────────────────────────────────┐
│                       CONCEPT APPLICATION MAP                        │
│                                                                      │
│  CONCEPT                  │ WHERE APPLIED IN NOTIFICATION PLATFORM   │
│ ──────────────────────────┼───────────────────────────────────────── │
│                           │                                          │
│  WEEK 1: DATA AT SCALE    │                                          │
│    Partitioning           │ Kafka topics by priority/channel         │
│    Replication            │ PostgreSQL replicas for reads            │
│    Rate Limiting          │ Per-user, per-channel, per-provider      │
│    Hot Keys               │ Celebrity notifications fan-out          │
│    Session Store          │ Device token management                  │
│                           │                                          │
│  WEEK 2: FAILURE-FIRST    │                                          │
│    Timeouts               │ Provider API timeouts                    │
│    Idempotency            │ Deduplication keys                       │
│    Circuit Breakers       │ Per-provider circuit breakers            │
│    Retries                │ Exponential backoff for failures         │
│                           │                                          │
│  WEEK 3: MESSAGING        │                                          │
│    Queue vs Stream        │ Kafka for notifications pipeline         │
│    Transactional Outbox   │ Reliable event publishing                │
│    Backpressure           │ Provider rate limit handling             │
│    Dead Letter Queue      │ Failed notification handling             │
│    Audit Log              │ Notification audit trail                 │
│                           │                                          │
│  WEEK 4: CACHING          │                                          │
│    Cache Patterns         │ User preferences cache                   │
│    Invalidation           │ Preference change propagation            │
│    Thundering Herd        │ Template cache warming                   │
│    Multi-Tier             │ Local + Redis + DB                       │
│                           │                                          │
│  WEEK 5: COORDINATION     │                                          │
│    Consistency            │ Preference read-your-writes              │
│    Saga                   │ Multi-channel delivery                   │
│    Workflow               │ Complex notification flows               │
│    Conflict Resolution    │ Preference sync across devices           │
│    Leader Election        │ Batch processor coordination             │
│                           │                                          │
└──────────────────────────────────────────────────────────────────────┘
Expected Outcomes
By the end of Week 6, you will be able to:
✓ Design a notification platform from scratch in 45 minutes
✓ Handle the edge cases an interviewer is likely to throw at you
✓ Explain trade-offs with confidence
✓ Write production-quality code for key components
✓ Debug notification delivery issues
✓ Understand real-world notification systems (Uber, Slack, etc.)
✓ Size infrastructure correctly
✓ Design monitoring and alerting
✓ Handle multi-provider failover
✓ Implement preference management correctly
Ready to Start?
Day 1 begins with understanding the problem deeply and creating the high-level design.
We'll approach it exactly as you would in an interview: clarifying requirements, estimating scale, and making initial architecture decisions.
Let's build a world-class notification platform!
Week 6 of the System Design Mastery Series