Week 2 Failure-First Design - MCQ Assessment
150 Questions
Part 1: Timeout Hell
Questions 1-25
Question 1
What is the primary danger of NOT setting timeouts on downstream service calls?
Question 2
Why is a slow service often worse than a down service?
Question 3
When measuring latency for timeout decisions, which metric is most appropriate?
Question 4
Why should you NOT use average latency to set timeouts?
Question 5
Your service calls 3 downstream services sequentially with P99 latencies of 100ms, 200ms, and 300ms. What should your total timeout budget be?
Question 6
What is "timeout propagation" in a distributed system?
Question 7
What happens if a parent service has a 1-second timeout but calls 3 children each with 1-second timeouts?
Question 8
What is the "timeout budget" pattern?
Question 9
In a long request chain (A → B → C → D), how should timeout budgets be allocated?
Question 10
What is connection timeout vs request timeout?
Question 11
Why should connection timeouts typically be shorter than request timeouts?
Question 12
What is the recommended approach when a request times out?
Question 13
Your database query has P99 latency of 50ms but P99.9 of 5 seconds (due to GC pauses). What timeout should you set?
Question 14
What is "defensive timeout"?
Question 15
In a payment processing system, which operation should have the LONGEST timeout?
Question 16
What is the "thundering herd" problem related to timeouts?
Question 17
How can you mitigate thundering herd after timeouts?
Question 18
What is exponential backoff with jitter?
Question 19
Your service's timeout is 3 seconds, but the downstream service's timeout is 5 seconds. What happens?
Question 20
What is the risk of setting timeouts too aggressively (too short)?
Question 21
How should timeout values be determined?
Question 22
What is "timeout testing"?
Question 23
In a microservices architecture with deep call chains, timeout budgets should:
Question 24
What is the relationship between timeouts and circuit breakers?
Question 25
When calling a slow batch processing API, should you use standard timeouts?
Part 2: Idempotency in Practice
Questions 26-50
Question 26
What does it mean for an operation to be idempotent?
Question 27
Which HTTP method is naturally idempotent?
Question 28
Why is POST typically NOT idempotent?
Question 29
What is an idempotency key?
Question 30
Where should idempotency keys be generated?
Question 31
What should be included when generating an idempotency key?
Question 32
How long should idempotency keys be stored?
Question 33
A user submits a payment with key "pay_abc123". The server starts processing but crashes. The user retries with the same key. What should happen?
Question 34
What is the "two-phase idempotency" pattern?
Question 35
What HTTP status code should you return when a request is rejected because it reuses an idempotency key with a different payload?
Question 36
What if two requests arrive with the same idempotency key but different amounts?
Question 37
Should idempotency keys be reusable after the original operation completes?
Question 38
What is the "idempotency window"?
Question 39
In a distributed system, where should idempotency state be stored?
Question 40
What is "request fingerprinting" for idempotency?
Question 41
Is request fingerprinting (auto-generating keys from request body) safe?
Question 42
A user accidentally clicks the "Pay" button twice. How does idempotency prevent a double charge?
Question 43
What should be stored along with the idempotency key?
Question 44
Can you make a DELETE operation idempotent?
Question 45
What is the challenge with idempotent operations that have side effects?
Question 46
How do you make email notifications idempotent?
Question 47
A payment request times out, and the server doesn't know whether the bank processed it. What should the client do?
Question 48
What is the relationship between idempotency and retries?
Question 49
Should GET requests use idempotency keys?
Question 50
What happens if idempotency key storage (Redis) fails?
Part 3: Circuit Breakers
Questions 51-75
Question 51
What is the primary purpose of a circuit breaker?
Question 52
What are the three states of a circuit breaker?
Question 53
In the CLOSED state, what does the circuit breaker do?
Question 54
What triggers transition from CLOSED to OPEN state?
Question 55
In the OPEN state, what happens to incoming requests?
Question 56
Why is failing fast (OPEN state) better than waiting for timeouts?
Question 57
What is the purpose of the HALF-OPEN state?
Question 58
In HALF-OPEN state, if test requests succeed, what happens?
Question 59
In HALF-OPEN state, if test requests fail, what happens?
Question 60
How long should a circuit breaker stay in OPEN state before trying HALF-OPEN?
Question 61
What should count as a "failure" for circuit breaker thresholds?
Question 62
Why shouldn't 4xx errors trigger circuit breakers?
Question 63
What is the recommended failure threshold before opening the circuit?
Question 64
What is the purpose of the sliding window in circuit breakers?
Question 65
Should circuit breakers be per-instance or shared across all instances?
Question 66
What should happen when the circuit opens during a user request?
Question 67
What is a good fallback strategy when the circuit is OPEN?
Question 68
Your payment service circuit opens. What should you show users?
Question 69
How do circuit breakers relate to rate limiting?
Question 70
What is "circuit breaker state sharing"?
Question 71
Should you have separate circuit breakers for each downstream dependency?
Question 72
What metrics should you monitor for circuit breakers?
Question 73
Can circuit breakers prevent all cascade failures?
Question 74
What happens if you set the failure threshold too low (e.g., 1%)?
Question 75
What is the "retry storm" problem after circuit closes?
Part 4: Webhook Delivery System
Questions 76-100
Question 76
What is a webhook?
Question 77
What is the difference between webhooks (push) and polling (pull)?
Question 78
What are the three delivery guarantees in distributed systems?
Question 79
Which delivery guarantee is most common for webhooks?
Question 80
Why is exactly-once delivery impossible in distributed systems?
Question 81
How do you achieve "effectively exactly-once" delivery?
Question 82
What HTTP status codes indicate successful webhook delivery?
Question 83
Should you retry on 4xx responses?
Question 84
Should you retry on 5xx responses?
Question 85
What is exponential backoff in webhook retries?
Question 86
Why add jitter to retry delays?
Question 87
What is a "dead letter queue" (DLQ) in webhook systems?
Question 88
After how many retries should webhooks move to DLQ?
Question 89
What is webhook signature verification?
Question 90
How do you generate webhook signatures?
Question 91
Should webhook delivery block the user request?
Question 92
What is webhook ordering?
Question 93
Can you guarantee webhook ordering?
Question 94
What information should webhook payloads include?
Question 95
Should receivers trust webhook data?
Question 96
What is webhook "fan-out"?
Question 97
How should webhook systems handle rate limiting?
Question 98
What is webhook retry strategy "exponential backoff with cap"?
Question 99
Should webhook delivery timeouts be longer than normal API timeouts?
Question 100
What monitoring is critical for webhook systems?
Part 5: Distributed Cron & Job Scheduling
Questions 101-125
Question 101
What is the main challenge of distributed cron?
Question 102
Why can't you simply run cron on multiple servers?
Question 103
What is leader election in distributed systems?
Question 104
What is the purpose of leader election in distributed cron?
Question 105
What happens if the leader crashes?
Question 106
What is a "fencing token"?
Question 107
How do fencing tokens prevent duplicate job execution?
Question 108
What is "split-brain" in distributed cron?
Question 109
How do fencing tokens prevent split-brain problems?
Question 110
What is ZooKeeper commonly used for in distributed systems?
Question 111
What is the CAP theorem trade-off for leader election?
Question 112
Should you prefer CP or AP for leader election?
Question 113
What is a "lease" in distributed locking?
Question 114
What is the typical lease duration for leader election?
Question 115
What happens if the leader's lease expires?
Question 116
Should job execution time be shorter than lease duration?
Question 117
What is "job idempotency" in distributed cron?
Question 118
How do you make a "send report email" job idempotent?
Question 119
What is "job locking" in distributed cron?
Question 120
What should happen if a job crashes mid-execution?
Question 121
What is "heartbeat" in leader election?
Question 122
What is the advantage of using existing systems (Kubernetes CronJob, AWS EventBridge) vs building your own?
Question 123
What monitoring is essential for distributed cron?
Question 124
Should cron jobs be long-running or short-lived?
Question 125
What is "job sharding" in distributed cron?
Part 6: Integration and Advanced Concepts
Questions 126-150
Question 126
In a payment system with a 3-second timeout, the bank API times out and the user retries. What patterns prevent a double charge?
Question 127
Your downstream service has 20% error rate. What should happen?
Question 128
A webhook receiver is down for 2 hours. What ensures delivery?
Question 129
Two servers each think they are the leader, and both execute the monthly billing job. How do you prevent this?
Question 130
What is the correct timeout chain for: API Gateway → Service A → Service B → Service C?
Question 131
A user clicks "Delete Account" twice. How do you ensure the account is deleted exactly once?
Question 132
A payment is processed successfully, but the response is lost and the user retries. What happens?
Question 133
What should webhook payload include for idempotent processing?
Question 134
Your service depends on 5 downstream services. One is flaky. What pattern helps?
Question 135
A job runs every hour. At 10:00, the leader crashes after starting the job. What prevents a duplicate run at 10:01?
Question 136
The bank API takes 10 seconds on Black Friday (usually 200ms). What patterns help?
Question 137
A webhook receiver returns 500. Should you retry?
Question 138
What is the purpose of webhook signatures?
Question 139
A scheduled job must run at exactly 9:00 AM. The leader fails at 8:59 AM. When does the job run?
Question 140
Your API has a timeout of 30s but calls 10 services, each needing 5s. What's wrong?
Question 141
How do you test circuit breaker logic?
Question 142
Idempotency keys are stored in Redis, and Redis fails. What should the system do?
Question 143
A job scheduler needs exactly-once execution, but network partitions occur. What is achievable?
Question 144
A webhook delivers events out of order. How should the receiver handle this?
Question 145
What's the relationship between timeouts and retries?
Question 146
The circuit breaker is OPEN, but the business needs an urgent payment processed. What should happen?
Question 147
How do you prevent webhook retry storms when receiver recovers?
Question 148
A distributed cron job creates invoices. The job runs twice due to split-brain. What is the impact?
Question 149
What's the best way to set timeout values?
Question 150
What is the most important lesson from Week 2?