Himanshu Kukreja

Week 1 Data at Scale - MCQ Assessment

150 Questions

Part 1: Partitioning (Sharding) Deep Dive

Questions 1-20

Question 1

What is the primary reason for partitioning (sharding) a database?

Question 2

In hash partitioning using partition = hash(key) % N, what happens when you add a new partition (increase N)?
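
One way to explore the scenario in Question 2 is to measure it directly. The sketch below is illustrative only: the helper names `partition` and `moved_fraction`, the MD5-based hash, and the synthetic `key-<i>` keys are assumptions, not part of the assessment. It counts how many keys land on a different partition when N grows from 10 to 11.

```python
import hashlib


def partition(key: str, n: int) -> int:
    """Map a key to a partition with hash(key) % n, using a stable hash.

    MD5 is used only because it is deterministic across runs (Python's
    built-in hash() is salted per process); this is an assumption of the sketch.
    """
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


def moved_fraction(num_keys: int, old_n: int, new_n: int) -> float:
    """Fraction of synthetic keys whose partition changes when n goes old_n -> new_n."""
    moved = sum(
        1
        for i in range(num_keys)
        if partition(f"key-{i}", old_n) != partition(f"key-{i}", new_n)
    )
    return moved / num_keys


if __name__ == "__main__":
    # With a uniform hash, a key stays put only when h % 10 == h % 11,
    # so most keys are expected to move.
    print(f"{moved_fraction(10_000, 10, 11):.1%} of keys changed partition")
```

Comparing this figure with what a consistent-hashing ring would move is a useful follow-up to Question 5.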

Question 3

Which partitioning strategy is BEST for range queries like "find all orders between timestamps X and Y"?

Question 4

What is the main weakness of range partitioning with auto-incrementing IDs?

Question 5

What does consistent hashing solve?

Question 6

In consistent hashing with virtual nodes, what is the purpose of having multiple virtual nodes per physical server?

Question 7

What is directory-based (lookup) partitioning?

Question 8

What is the main disadvantage of directory-based partitioning?

Question 9

In a partitioned URL shortener storing 100M URLs, if you use hash partitioning with 10 partitions, approximately how many URLs per partition?

Question 10

What type of query becomes expensive with hash partitioning?

Question 11

Which partitioning strategy gives you the most flexibility to handle data skew?

Question 12

What is a "cross-shard query"?

Question 13

When using consistent hashing with 3 replicas, what happens to data placement when one node fails?

Question 14

What is the "thundering herd" problem in partitioned systems?

Question 15

Which statement about hash partitioning is TRUE?

Question 16

What is the main advantage of range partitioning over hash partitioning?

Question 17

In a time-series database partitioned by timestamp ranges, what problem commonly occurs?

Question 18

What is "partition skew"?

Question 19

How does secondary index partitioning differ from primary data partitioning?

Question 20

What is "partition tolerance" in the context of the CAP theorem?

Part 2: Replication Trade-offs

Questions 21-40

Question 21

What are the three main reasons to replicate data?

Question 22

In synchronous replication, what happens before the leader acknowledges a write to the client?

Question 23

What is the main advantage of asynchronous replication over synchronous?

Question 24

What is replication lag?

Question 25

What happens during a "split-brain" scenario in leader-follower replication?

Question 26

In semi-synchronous replication, what is the typical strategy?

Question 27

What is "read-your-writes" consistency?

Question 28

In multi-leader replication, what is the biggest challenge?

Question 29

What is a common conflict resolution strategy in multi-leader replication?

Question 30

What is "eventual consistency"?

Question 31

In leaderless replication (like Dynamo/Cassandra), what are quorum reads and writes?

Question 32

With 5 replicas, if you write with W=3 and read with R=3, what consistency guarantee do you get?
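
The numbers in Question 32 can be checked with a small sketch. It assumes the usual quorum condition W + R > N and verifies it by brute force over every possible write set and read set. The function names are hypothetical, chosen for this illustration.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Quorum condition: any write set and any read set share a replica iff w + r > n."""
    return w + r > n


def always_intersect(n: int, w: int, r: int) -> bool:
    """Brute-force check that every w-sized write set meets every r-sized read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


if __name__ == "__main__":
    for w, r in [(3, 3), (2, 3), (1, 1)]:
        print(f"N=5 W={w} R={r}: formula={quorums_overlap(5, w, r)}, "
              f"brute force={always_intersect(5, w, r)}")
```

Overlap only guarantees that a read contacts at least one replica holding the latest acknowledged write. Picking the newest value among the responses (for example by version number) is a separate step not modeled here.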

Question 33

What is a "hinted handoff" in leaderless replication?

Question 34

What is the Read Repair mechanism?

Question 35

In leader-follower replication, if the leader crashes, what must happen?

Question 36

What is the main risk of automatic failover?

Question 37

Which consistency model is STRONGEST?

Question 38

Why is multi-leader replication commonly used for multi-datacenter deployment?

Question 39

What is the "write-ahead log" (WAL) used for in replication?

Question 40

In a system with async replication, what is the maximum data loss if the leader fails?

Part 3: Rate Limiting at Scale

Questions 41-60

Question 41

What is the primary purpose of rate limiting?

Question 42

In the Fixed Window Counter algorithm, what is the main weakness?

Question 43

In a Fixed Window Counter with limit of 100 requests per minute, if a user makes 100 requests at 00:59 and 100 at 01:01, what happens?
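
The scenario in Question 43 can be replayed with a minimal fixed window counter. This is a sketch under stated assumptions: the class name `FixedWindowCounter` is hypothetical, state is kept in process memory, and the timestamps 00:59 and 01:01 are read as minutes:seconds, so they become second 59 and second 61.

```python
from collections import defaultdict


class FixedWindowCounter:
    """In-memory fixed window rate limiter (illustrative, single process)."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (user, window index) -> request count

    def allow(self, user: str, now: float) -> bool:
        """Return True if the request at time `now` (seconds) fits in its window."""
        key = (user, int(now // self.window_seconds))
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


if __name__ == "__main__":
    limiter = FixedWindowCounter(limit=100, window_seconds=60)
    at_0059 = sum(limiter.allow("user-1", now=59) for _ in range(100))
    at_0101 = sum(limiter.allow("user-1", now=61) for _ in range(100))
    print(f"allowed at 00:59: {at_0059}, allowed at 01:01: {at_0101}")
```

The two bursts fall into different windows (index 0 and index 1), which is the boundary behavior Question 42 asks about.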

Question 44

What does the Sliding Window Log algorithm store?

Question 45

What is the main advantage of Sliding Window Log over Fixed Window Counter?

Question 46

What is the main disadvantage of Sliding Window Log?

Question 47

In the Token Bucket algorithm, tokens are added at what rate?

Question 48

What does the Token Bucket algorithm allow that Fixed Window doesn't?

Question 49

In a Token Bucket with rate = 10 tokens/sec and capacity = 50 tokens, if a user hasn't made requests for 10 seconds, how many requests can they make immediately?

Question 50

What is the Sliding Window Counter (Hybrid) algorithm?

Question 51

In distributed rate limiting, what is the main challenge?

Question 52

What is a common solution for distributed rate limiting?

Question 53

What happens if your rate limit store (e.g., Redis) becomes unavailable?

Question 54

What does "fail open" mean in rate limiting?

Question 55

Which HTTP status code should you return when rate limiting rejects a request?

Question 56

What headers should you include in rate limit responses?

Question 57

What is "adaptive rate limiting"?

Question 58

In a multi-tier rate limiting system, where should the first layer typically be?

Question 59

What is the Leaky Bucket algorithm?

Question 60

How does Leaky Bucket differ from Token Bucket?

Part 4: Hot Keys and Skew

Questions 61-80

Question 61

What is a "hot key"?

Question 62

What distribution law describes traffic patterns in most systems?

Question 63

According to Zipf's Law, approximately what percentage of traffic do the top 1% of keys receive?

Question 64

What is "partition skew"?

Question 65

Which scenario represents a predictable hot key?

Question 66

What is the most common first-line defense against hot keys?

Question 67

In the "local cache" pattern for hot keys, where is the cache located?

Question 68

What is the risk of using local caches for hot key mitigation?

Question 69

What is "key splitting" or "key cloning" as a hot key mitigation?

Question 70

With key splitting, if you clone a hot key 10 times, what is the effect on each partition?

Question 71

What is the main downside of key splitting?

Question 72

What is "probabilistic early expiration" for cache entries?

Question 73

How can you detect hot keys in real-time?

Question 74

What is the "thundering herd" problem with hot keys?

Question 75

What is "request coalescing" for hot keys?

Question 76

In the context of hot keys, what is "write amplification"?

Question 77

For a URL shortener, if bit.ly/popular gets 1M requests/sec, what architecture is most appropriate?

Question 78

What is "shard consolidation" for hot partition mitigation?

Question 79

What monitoring metric is MOST important for detecting hot keys?

Question 80

Why can't you simply "scale the database" to handle hot keys?

Part 5: Session Store Design

Questions 81-120

Question 81

What is a session in web applications?

Question 82

What is typically stored in a session?

Question 83

What is the typical latency requirement for session store reads?

Question 84

For 10 million concurrent users with 2KB sessions each, what is the total storage requirement?

Question 85

What are "sticky sessions" (or "session affinity")?

Question 86

What is the main disadvantage of sticky sessions?

Question 87

In a distributed session store architecture, where are sessions stored?

Question 88

What consistency level is required for session stores?

Question 89

What is the typical approach to session expiration?

Question 90

What happens if a session store becomes unavailable?

Question 91

What is the typical read-to-write ratio for session stores?

Question 92

Why is Redis commonly chosen for session stores?

Question 93

What is "session stealing" or "session hijacking"?

Question 94

How can you protect against session hijacking?

Question 95

What is "session fixation"?

Question 96

What should you do after a user logs in to prevent session fixation?

Question 97

In a Redis cluster for sessions, what partitioning strategy is commonly used?

Question 98

What is the advantage of client-side sessions (JWT)?

Question 99

What is the main disadvantage of JWT-based sessions?

Question 100

For a globally distributed application, what session store strategy is best?

Question 101

How does "session stickiness" differ from "session affinity"?

Question 102

When using Redis for sessions, what persistence strategy is recommended?

Question 103

What is a typical session ID format?

Question 104

How should session IDs be transmitted to clients?

Question 105

What is "session timeout" and what are typical values?

Question 106

What happens to sessions during a rolling deployment with distributed session store?

Question 107

For a chat application with 10M concurrent users and real-time presence, what additional session challenge exists?

Question 108

What is "session replication" in the context of Redis?

Question 109

In Redis Sentinel for session HA, what is the typical failover time?

Question 110

What metrics should you monitor for session store health?

Question 111

What is "session data compression"?

Question 112

When should you use a database (PostgreSQL/MySQL) as a session store?

Question 113

What is "lazy session loading"?

Question 114

For a multi-tenant SaaS with millions of tenants, how should sessions be partitioned?

Question 115

What is the "session stickiness coefficient" in load balancers?

Question 116

Why do real-time applications (WebSocket) especially benefit from sticky sessions?

Question 117

What is the trade-off of increasing session TTL from 30 minutes to 24 hours?

Question 118

In Redis cluster mode, how many master nodes are typically used for session storage at scale?

Question 119

What is a "session flooding" attack?

Question 120

How can you mitigate session flooding attacks?

Part 6: Integration and Advanced Concepts

Questions 121-150

Question 121

When designing a URL shortener at scale, which combination is most critical?

Question 122

For an analytics system processing clickstream data, which partitioning strategy is best?

Question 123

What is the correct order of operations when handling a cache miss on a hot key with multiple concurrent requests?

Question 124

In a globally distributed system with multi-leader replication, what conflict resolution is most pragmatic?

Question 125

For a real-time leaderboard (gaming) with 100M players, what storage architecture is most appropriate?

Question 126

What is the relationship between consistent hashing and hot keys?

Question 127

When using both partitioning and replication, what is the typical architecture?

Question 128

For a social media feed with 1B users, what is the biggest challenge?

Question 129

In a rate-limited system with hot keys, what happens?

Question 130

What is "write skew" in distributed databases?

Question 131

For a payment system requiring ACID guarantees, what storage is most appropriate?

Question 132

What is "read-after-write consistency" and when is it needed?

Question 133

In a partitioned system with hot keys, what is the best mitigation strategy?

Question 134

What is the difference between a "partition key" and a "clustering key" in systems like Cassandra?

Question 135

For a video streaming platform serving billions of views, what caching strategy is optimal?

Question 136

What is "gossip protocol" in distributed systems?

Question 137

When should you use synchronous replication?

Question 138

What is "quorum" in distributed consensus?

Question 139

For a ride-sharing app matching drivers and riders, what is the critical challenge?

Question 140

What is "cache stampede"?

Question 141

How does cache stampede relate to hot keys?

Question 142

What is "eventual consistency window"?

Question 143

For a stock trading platform, what consistency requirement is needed?

Question 144

What is "denormalization" and when is it useful in partitioned systems?

Question 145

What is "backpressure" in data streaming?

Question 146

In a microservices architecture, where should rate limiting be implemented?

Question 147

What is a "sharding key" and how should you choose it?

Question 148

For a messaging system (WhatsApp, Telegram), what partitioning strategy should be used for messages?

Question 149

What happens when you combine async replication, network partition, and failover?

Question 150

What is the most important takeaway from Week 1: Data at Scale?
