Week 1 Data at Scale - MCQ Assessment
150 Questions
Part 1: Partitioning (Sharding) Deep Dive
Questions 1-20
Question 1
What is the primary reason for partitioning (sharding) a database?
Question 2
In hash partitioning using partition = hash(key) % N, what happens when you add a new partition (increase N)?
Question 3
Which partitioning strategy is BEST for range queries like "find all orders between timestamps X and Y"?
Question 4
What is the main weakness of range partitioning with auto-incrementing IDs?
Question 5
What does consistent hashing solve?
Question 6
In consistent hashing with virtual nodes, what is the purpose of having multiple virtual nodes per physical server?
Question 7
What is directory-based (lookup) partitioning?
Question 8
What is the main disadvantage of directory-based partitioning?
Question 9
In a partitioned URL shortener storing 100M URLs, if you use hash partitioning with 10 partitions, approximately how many URLs per partition?
Question 10
What type of query becomes expensive with hash partitioning?
Question 11
Which partitioning strategy gives you the most flexibility to handle data skew?
Question 12
What is a "cross-shard query"?
Question 13
When using consistent hashing with 3 replicas, what happens to data placement when one node fails?
Question 14
What is the "thundering herd" problem in partitioned systems?
Question 15
Which statement about hash partitioning is TRUE?
Question 16
What is the main advantage of range partitioning over hash partitioning?
Question 17
In a time-series database partitioned by timestamp ranges, what problem commonly occurs?
Question 18
What is "partition skew"?
Question 19
How does secondary index partitioning differ from primary data partitioning?
Question 20
What is "partition tolerance" in the context of the CAP theorem?
Part 2: Replication Trade-offs
Questions 21-40
Question 21
What are the three main reasons to replicate data?
Question 22
In synchronous replication, what happens before the leader acknowledges a write to the client?
Question 23
What is the main advantage of asynchronous replication over synchronous?
Question 24
What is replication lag?
Question 25
What happens during a "split-brain" scenario in leader-follower replication?
Question 26
In semi-synchronous replication, what is the typical strategy?
Question 27
What is "read-your-writes" consistency?
Question 28
In multi-leader replication, what is the biggest challenge?
Question 29
What is a common conflict resolution strategy in multi-leader replication?
Question 30
What is "eventual consistency"?
Question 31
In leaderless replication (like Dynamo/Cassandra), what are quorum reads and writes?
Question 32
With 5 replicas, if you write with W=3 and read with R=3, what consistency guarantee do you get?
Question 33
What is a "hinted handoff" in leaderless replication?
Question 34
What is the Read Repair mechanism?
Question 35
In leader-follower replication, if the leader crashes, what must happen?
Question 36
What is the main risk of automatic failover?
Question 37
Which consistency model is STRONGEST?
Question 38
Why is multi-leader replication commonly used for multi-datacenter deployment?
Question 39
What is the write-ahead log (WAL) used for in replication?
Question 40
In a system with async replication, what is the maximum data loss if the leader fails?
Part 3: Rate Limiting at Scale
Questions 41-60
Question 41
What is the primary purpose of rate limiting?
Question 42
In the Fixed Window Counter algorithm, what is the main weakness?
Question 43
In a Fixed Window Counter with limit of 100 requests per minute, if a user makes 100 requests at 00:59 and 100 at 01:01, what happens?
Question 44
What does the Sliding Window Log algorithm store?
Question 45
What is the main advantage of Sliding Window Log over Fixed Window Counter?
Question 46
What is the main disadvantage of Sliding Window Log?
Question 47
In the Token Bucket algorithm, tokens are added at what rate?
Question 48
What does the Token Bucket algorithm allow that Fixed Window doesn't?
Question 49
In a Token Bucket with rate = 10 tokens/sec and capacity = 50 tokens, if a user hasn't made requests for 10 seconds, how many requests can they make immediately?
Question 50
What is the Sliding Window Counter (Hybrid) algorithm?
Question 51
In distributed rate limiting, what is the main challenge?
Question 52
What is a common solution for distributed rate limiting?
Question 53
What happens if your rate limit store (e.g., Redis) becomes unavailable?
Question 54
What does "fail open" mean in rate limiting?
Question 55
Which HTTP status code should you return when rate limiting rejects a request?
Question 56
What headers should you include in rate limit responses?
Question 57
What is "adaptive rate limiting"?
Question 58
In a multi-tier rate limiting system, where should the first layer typically be?
Question 59
What is the Leaky Bucket algorithm?
Question 60
How does Leaky Bucket differ from Token Bucket?
Part 4: Hot Keys and Skew
Questions 61-80
Question 61
What is a "hot key"?
Question 62
What distribution law describes traffic patterns in most systems?
Question 63
According to Zipf's Law, approximately what percentage of traffic do the top 1% of keys receive?
Question 64
What is "partition skew"?
Question 65
Which scenario represents a predictable hot key?
Question 66
What is the most common first-line defense against hot keys?
Question 67
In the "local cache" pattern for hot keys, where is the cache located?
Question 68
What is the risk of using local caches for hot key mitigation?
Question 69
What is "key splitting" or "key cloning" as a hot key mitigation?
Question 70
With key splitting, if you clone a hot key 10 times, what is the effect on each partition?
Question 71
What is the main downside of key splitting?
Question 72
What is "probabilistic early expiration" for cache entries?
Question 73
How can you detect hot keys in real-time?
Question 74
What is the "thundering herd" problem with hot keys?
Question 75
What is "request coalescing" for hot keys?
Question 76
In the context of hot keys, what is "write amplification"?
Question 77
For a URL shortener, if bit.ly/popular gets 1M requests/sec, what architecture is most appropriate?
Question 78
What is "shard consolidation" for hot partition mitigation?
Question 79
What monitoring metric is MOST important for detecting hot keys?
Question 80
Why can't you simply "scale the database" to handle hot keys?
Part 5: Session Store Design
Questions 81-120
Question 81
What is a session in web applications?
Question 82
What is typically stored in a session?
Question 83
What is the typical latency requirement for session store reads?
Question 84
For 10 million concurrent users with 2KB sessions each, what is the total storage requirement?
Question 85
What are "sticky sessions" or "session affinity"?
Question 86
What is the main disadvantage of sticky sessions?
Question 87
In a distributed session store architecture, where are sessions stored?
Question 88
What consistency level is required for session stores?
Question 89
What is the typical approach to session expiration?
Question 90
What happens if a session store becomes unavailable?
Question 91
What is the typical read-to-write ratio for session stores?
Question 92
Why is Redis commonly chosen for session stores?
Question 93
What is "session stealing" or "session hijacking"?
Question 94
How can you protect against session hijacking?
Question 95
What is "session fixation"?
Question 96
What should you do after a user logs in to prevent session fixation?
Question 97
In a Redis cluster for sessions, what partitioning strategy is commonly used?
Question 98
What is the advantage of client-side sessions (JWT)?
Question 99
What is the main disadvantage of JWT-based sessions?
Question 100
For a globally distributed application, what session store strategy is best?
Question 101
How is "session stickiness" different from "session affinity"?
Question 102
When using Redis for sessions, what persistence strategy is recommended?
Question 103
What is a typical session ID format?
Question 104
How should session IDs be transmitted to clients?
Question 105
What is "session timeout" and what are typical values?
Question 106
What happens to sessions during a rolling deployment with distributed session store?
Question 107
For a chat application with 10M concurrent users and real-time presence, what additional session challenge exists?
Question 108
What is "session replication" in the context of Redis?
Question 109
In Redis Sentinel for session HA, what is the typical failover time?
Question 110
What metrics should you monitor for session store health?
Question 111
What is "session data compression"?
Question 112
When should you use a database (PostgreSQL/MySQL) as a session store?
Question 113
What is "lazy session loading"?
Question 114
For a multi-tenant SaaS with millions of tenants, how should sessions be partitioned?
Question 115
What is the "session stickiness coefficient" in load balancers?
Question 116
Why do real-time applications (WebSocket) especially benefit from sticky sessions?
Question 117
What is the trade-off of increasing session TTL from 30 minutes to 24 hours?
Question 118
In Redis cluster mode, how many master nodes are typically used for session storage at scale?
Question 119
What is a "session flooding" attack?
Question 120
How can you mitigate session flooding attacks?
Part 6: Integration and Advanced Concepts
Questions 121-150
Question 121
When designing a URL shortener at scale, which combination is most critical?
Question 122
For an analytics system processing clickstream data, which partitioning strategy is best?
Question 123
What is the correct order of operations when handling a cache miss on a hot key with multiple concurrent requests?
Question 124
In a globally distributed system with multi-leader replication, what conflict resolution is most pragmatic?
Question 125
For a real-time leaderboard (gaming) with 100M players, what storage architecture is most appropriate?
Question 126
What is the relationship between consistent hashing and hot keys?
Question 127
When using both partitioning and replication, what is the typical architecture?
Question 128
For a social media feed with 1B users, what is the biggest challenge?
Question 129
In a rate-limited system with hot keys, what happens?
Question 130
What is "write skew" in distributed databases?
Question 131
For a payment system requiring ACID guarantees, what storage is most appropriate?
Question 132
What is "read-after-write consistency" and when is it needed?
Question 133
In a partitioned system with hot keys, what is the best mitigation strategy?
Question 134
What is a "partition key" versus a "clustering key" in systems like Cassandra?
Question 135
For a video streaming platform serving billions of views, what caching strategy is optimal?
Question 136
What is a "gossip protocol" in distributed systems?
Question 137
When should you use synchronous replication?
Question 138
What is "quorum" in distributed consensus?
Question 139
For a ride-sharing app matching drivers and riders, what is the critical challenge?
Question 140
What is "cache stampede"?
Question 141
How does cache stampede relate to hot keys?
Question 142
What is "eventual consistency window"?
Question 143
For a stock trading platform, what consistency requirement is needed?
Question 144
What is "denormalization" and when is it useful in partitioned systems?
Question 145
What is "backpressure" in data streaming?
Question 146
In a microservices architecture, where should rate limiting be implemented?
Question 147
What is a "sharding key" and how should you choose it?
Question 148
For a messaging system (WhatsApp, Telegram), what partitioning strategy should be used for messages?
Question 149
What happens when you combine async replication, network partition, and failover?
Question 150
What is the most important takeaway from Week 1: Data at Scale?