MCQ Assessment

Week 2 Failure-First Design - MCQ Assessment

150 Questions

Part 1: Timeout Hell

Questions 1-25

Question 1

What is the primary danger of NOT setting timeouts on downstream service calls?

Question 2

Why is a slow service often worse than a down service?

Question 3

When measuring latency for timeout decisions, which metric is most appropriate?

Question 4

Why should you NOT use average latency to set timeouts?

Question 5

Your service calls 3 downstream services sequentially with P99 latencies of 100ms, 200ms, and 300ms. What should your total timeout budget be?

Question 6

What is "timeout propagation" in a distributed system?

Question 7

What happens if a parent service has a 1-second timeout but calls 3 children each with 1-second timeouts?

Question 8

What is the "timeout budget" pattern?

Question 9

In a long request chain (A → B → C → D), how should timeout budgets be allocated?

Question 10

What is the difference between a connection timeout and a request timeout?

Question 11

Why should connection timeouts typically be shorter than request timeouts?

Question 12

What is the recommended approach when a request times out?

Question 13

Your database query has P99 latency of 50ms but P99.9 of 5 seconds (due to GC pauses). What timeout should you set?

Question 14

What is a "defensive timeout"?

Question 15

In a payment processing system, which operation should have the LONGEST timeout?

Question 16

What is the "thundering herd" problem related to timeouts?

Question 17

How can you mitigate thundering herd after timeouts?

Question 18

What is exponential backoff with jitter?

Question 19

Your service's timeout is 3 seconds, but the downstream service times out after 5 seconds. What happens?

Question 20

What is the risk of setting timeouts too aggressively (too short)?

Question 21

How should timeout values be determined?

Question 22

What is "timeout testing"?

Question 23

In a microservices architecture with deep call chains, timeout budgets should:

Question 24

What is the relationship between timeouts and circuit breakers?

Question 25

When calling a slow batch processing API, should you use standard timeouts?

Part 2: Idempotency in Practice

Questions 26-50

Question 26

What does it mean for an operation to be idempotent?

Question 27

Which HTTP method is naturally idempotent?

Question 28

Why is POST typically NOT idempotent?

Question 29

What is an idempotency key?

Question 30

Where should idempotency keys be generated?

Question 31

What should be included when generating an idempotency key?

Question 32

How long should idempotency keys be stored?

Question 33

A user submits a payment with key "pay_abc123". The server starts processing but crashes. The user retries with the same key. What should happen?

Question 34

What is the "two-phase idempotency" pattern?

Question 35

What HTTP status code should you return when a request is rejected because it reuses an idempotency key with a different payload?

Question 36

What should happen if two requests arrive with the same idempotency key but different amounts?

Question 37

Should idempotency keys be reusable after the original operation completes?

Question 38

What is the "idempotency window"?

Question 39

In a distributed system, where should idempotency state be stored?

Question 40

What is "request fingerprinting" for idempotency?

Question 41

Is request fingerprinting (auto-generating keys from request body) safe?

Question 42

A user accidentally clicks the "Pay" button twice. How does idempotency prevent a double charge?

Question 43

What should be stored along with the idempotency key?

Question 44

Can you make a DELETE operation idempotent?

Question 45

What is the challenge with idempotent operations that have side effects?

Question 46

How do you make email notifications idempotent?

Question 47

A payment request times out, and the server doesn't know whether the bank processed it. What should the client do?

Question 48

What is the relationship between idempotency and retries?

Question 49

Should GET requests use idempotency keys?

Question 50

What happens if idempotency key storage (Redis) fails?

Part 3: Circuit Breakers

Questions 51-75

Question 51

What is the primary purpose of a circuit breaker?

Question 52

What are the three states of a circuit breaker?

Question 53

In the CLOSED state, what does the circuit breaker do?

Question 54

What triggers transition from CLOSED to OPEN state?

Question 55

In the OPEN state, what happens to incoming requests?

Question 56

Why is failing fast (OPEN state) better than waiting for timeouts?

Question 57

What is the purpose of the HALF-OPEN state?

Question 58

In HALF-OPEN state, if test requests succeed, what happens?

Question 59

In HALF-OPEN state, if test requests fail, what happens?

Question 60

How long should a circuit breaker stay in OPEN state before trying HALF-OPEN?

Question 61

What should count as a "failure" for circuit breaker thresholds?

Question 62

Why shouldn't 4xx errors trigger circuit breakers?

Question 63

What is the recommended failure threshold before opening the circuit?

Question 64

What is the purpose of the sliding window in circuit breakers?

Question 65

Should circuit breakers be per-instance or shared across all instances?

Question 66

What should happen when the circuit opens during a user request?

Question 67

What is a good fallback strategy when the circuit is OPEN?

Question 68

Your payment service circuit opens. What should you show users?

Question 69

How do circuit breakers relate to rate limiting?

Question 70

What is "circuit breaker state sharing"?

Question 71

Should you have separate circuit breakers for each downstream dependency?

Question 72

What metrics should you monitor for circuit breakers?

Question 73

Can circuit breakers prevent all cascade failures?

Question 74

What happens if you set failure threshold too low (e.g., 1%)?

Question 75

What is the "retry storm" problem after circuit closes?

Part 4: Webhook Delivery System

Questions 76-100

Question 76

What is a webhook?

Question 77

What is the difference between webhooks (push) and polling (pull)?

Question 78

What are the three delivery guarantees in distributed systems?

Question 79

Which delivery guarantee is most common for webhooks?

Question 80

Why is exactly-once delivery impossible in distributed systems?

Question 81

How do you achieve "effectively exactly-once" delivery?

Question 82

What HTTP status codes indicate successful webhook delivery?

Question 83

Should you retry on 4xx responses?

Question 84

Should you retry on 5xx responses?

Question 85

What is exponential backoff in webhook retries?

Question 86

Why add jitter to retry delays?

Question 87

What is a "dead letter queue" (DLQ) in webhook systems?

Question 88

After how many retries should webhooks move to DLQ?

Question 89

What is webhook signature verification?

Question 90

How do you generate webhook signatures?

Question 91

Should webhook delivery block the user request?

Question 92

What is webhook ordering?

Question 93

Can you guarantee webhook ordering?

Question 94

What information should webhook payloads include?

Question 95

Should receivers trust webhook data?

Question 96

What is webhook "fan-out"?

Question 97

How should webhook systems handle rate limiting?

Question 98

What is the "exponential backoff with cap" webhook retry strategy?

Question 99

Should webhook delivery timeouts be longer than normal API timeouts?

Question 100

What monitoring is critical for webhook systems?

Part 5: Distributed Cron & Job Scheduling

Questions 101-125

Question 101

What is the main challenge of distributed cron?

Question 102

Why can't you simply run cron on multiple servers?

Question 103

What is leader election in distributed systems?

Question 104

What is the purpose of leader election in distributed cron?

Question 105

What happens if the leader crashes?

Question 106

What is a "fencing token"?

Question 107

How do fencing tokens prevent duplicate job execution?

Question 108

What is "split-brain" in distributed cron?

Question 109

How do fencing tokens prevent split-brain problems?

Question 110

What is ZooKeeper commonly used for in distributed systems?

Question 111

What is the CAP theorem trade-off for leader election?

Question 112

Should you prefer CP or AP for leader election?

Question 113

What is a "lease" in distributed locking?

Question 114

What is the typical lease duration for leader election?

Question 115

What happens if the leader's lease expires?

Question 116

Should job execution time be shorter than lease duration?

Question 117

What is "job idempotency" in distributed cron?

Question 118

How do you make a "send report email" job idempotent?

Question 119

What is "job locking" in distributed cron?

Question 120

What should happen if a job crashes mid-execution?

Question 121

What is "heartbeat" in leader election?

Question 122

What is the advantage of using existing systems (Kubernetes CronJob, AWS EventBridge) vs building your own?

Question 123

What monitoring is essential for distributed cron?

Question 124

Should cron jobs be long-running or short-lived?

Question 125

What is "job sharding" in distributed cron?

Part 6: Integration and Advanced Concepts

Questions 126-150

Question 126

A payment system has a 3-second timeout, the bank API times out, and users retry. What patterns prevent a double charge?

Question 127

Your downstream service has a 20% error rate. What should happen?

Question 128

Webhook receiver is down for 2 hours. What ensures delivery?

Question 129

Two servers think they're leader and both execute monthly billing job. How to prevent?

Question 130

What is the correct timeout chain for: API Gateway → Service A → Service B → Service C?

Question 131

A user clicks "Delete Account" twice. How do you ensure the account is deleted exactly once?

Question 132

Payment processed successfully but response lost. User retries. What happens?

Question 133

What should webhook payload include for idempotent processing?

Question 134

Your service depends on 5 downstream services. One is flaky. What pattern helps?

Question 135

Job runs every hour. At 10:00, leader crashes after starting job. What prevents duplicate at 10:01?

Question 136

Bank API takes 10 seconds on Black Friday (usually 200ms). What patterns help?

Question 137

Webhook receiver returns 500. Should you retry?

Question 138

What is the purpose of webhook signatures?

Question 139

Scheduled job must run at exactly 9:00 AM. Leader fails at 8:59 AM. When does job run?

Question 140

Your API has a timeout of 30s but calls 10 services, each needing 5s. What's wrong?

Question 141

How do you test circuit breaker logic?

Question 142

The idempotency key is stored in Redis, and Redis fails. What should the system do?

Question 143

Job scheduler needs exactly-once execution but network partitions occur. What's possible?

Question 144

A webhook delivers events out of order. How should the receiver handle this?

Question 145

What's the relationship between timeouts and retries?

Question 146

Circuit breaker is OPEN but business needs urgent payment processed. What should happen?

Question 147

How do you prevent webhook retry storms when receiver recovers?

Question 148

Distributed cron job creates invoices. Job runs twice due to split-brain. Impact?

Question 149

What's the best way to set timeout values?

Question 150

What is the most important lesson from Week 2?
