
CAP Theorem Deep Dive

The fundamental trade-off of distributed systems

The CAP Theorem: The Unavoidable Trade-off


The CAP theorem is one of the most fundamental concepts in distributed systems. It states a simple but profound truth:

In a distributed system, you can only guarantee two out of three:

  • Consistency
  • Availability
  • Partition tolerance

Before diving into the trade-offs, let’s understand what each property means:

Consistency means all nodes see the same data at the same time. After a write completes, all reads (from any node) return the same value.


Key Point: Consistency requires coordination between nodes. All nodes must agree before a write is considered complete.


Availability means the system responds to every request, even if some nodes are down or unreachable.


Key Point: Availability means no single point of failure. Even if some nodes fail, others can handle requests.


Partition tolerance means the system continues operating even when network communication between nodes fails.


Key Point: Network partitions are inevitable in distributed systems. The question is: how does your system handle them?


The Impossible Choice: What Happens During a Partition?


When a network partition occurs, you face an impossible choice: keep answering requests with data that may be stale, or refuse requests until the nodes can coordinate again.

This is the heart of CAP: During a partition, you must choose between Consistency and Availability. You cannot have both.


CA: Consistency + Availability (Not Distributed)


CA systems guarantee consistency and availability, but only work in non-distributed environments (single node).


Examples: Single-node databases, in-memory caches on one server.

Why CA Doesn’t Work When Distributed: As soon as you have multiple nodes, network partitions become possible. When a partition occurs, you must choose CP or AP.


CP: Consistency + Partition Tolerance

CP systems prioritize consistency and partition tolerance. During partitions, they reject requests rather than serve potentially inconsistent data.


Characteristics:

  • ✅ Strong consistency guaranteed
  • ✅ Handles partitions gracefully
  • ❌ Sacrifices availability during partitions
  • ❌ Writes may fail if quorum not available

Examples: Traditional relational databases (PostgreSQL, MySQL), MongoDB (with strong consistency), HBase.

Use Cases: Financial systems, inventory management, critical state where accuracy > availability.


AP: Availability + Partition Tolerance

AP systems prioritize availability and partition tolerance. During partitions, they continue serving requests even if data might be stale or inconsistent.


Characteristics:

  • ✅ Always available (responds to requests)
  • ✅ Handles partitions gracefully
  • ❌ Sacrifices consistency during partitions
  • ⚠️ May serve stale data temporarily

Examples: DNS, DynamoDB, Cassandra, CouchDB, most NoSQL databases.

Use Cases: Social media, content delivery, systems where availability > perfect consistency.


Why CP? For a financial system, incorrect balances are worse than temporary unavailability. Better to reject the transaction than create inconsistency.


Why AP? For a social media feed, users prefer seeing slightly outdated content over seeing nothing. Availability trumps perfect consistency.


Example: Many systems mix the two, using AP for reads (fast, available) and CP for writes (consistent, accurate).


How CAP choices affect your class design:

CP systems need coordination to maintain consistency:

CP System: Strong Synchronization

from threading import Lock


class CPDataStore:
    def __init__(self):
        self._data = {}
        self._lock = Lock()      # Ensures consistency
        self._quorum_size = 2    # Need a majority for writes

    def write(self, key, value, replicas: list) -> bool:
        with self._lock:
            # Check if we have quorum
            if len(replicas) < self._quorum_size:
                return False  # Reject if we can't guarantee consistency
            # Write to all replicas synchronously
            for replica in replicas:
                replica.set(key, value)
            self._data[key] = value
            return True
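
A quick usage sketch of CPDataStore; the in-memory Replica stub below is an assumption added for illustration, not part of the example above:

class Replica:
    """Minimal stand-in for a remote replica."""
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value


store = CPDataStore()
print(store.write("balance", 100, replicas=[Replica(), Replica()]))  # True: quorum of 2 reached
print(store.write("balance", 200, replicas=[Replica()]))             # False: no quorum, write rejected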

AP systems need conflict resolution for eventual consistency:

AP System: Conflict Resolution

from typing import Optional


class APDataStore:
    def __init__(self):
        self._data = {}
        self._versions = {}  # Track versions for conflict detection

    def write(self, key, value, version: int) -> bool:
        # Always accept writes (availability)
        current_version = self._versions.get(key, 0)
        if version > current_version:
            self._data[key] = value
            self._versions[key] = version
            return True
        elif version == current_version:
            # Conflict - use last-write-wins (or merge)
            self._data[key] = value  # Last write wins
            return True
        else:
            # Stale version - acknowledge it (availability) but keep the newer value
            return True

    def read(self, key) -> Optional[object]:
        # Always return something (availability)
        return self._data.get(key)
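
A quick usage sketch showing how APDataStore resolves conflicting versions (key and values are illustrative):

store = APDataStore()
store.write("likes", 10, version=1)
store.write("likes", 12, version=2)   # Newer version replaces the old value
store.write("likes", 11, version=1)   # Stale write is acknowledged but does not overwrite
print(store.read("likes"))            # 12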

Misconception 1: “I can have all three if I design it right”


Truth: CAP is a mathematical theorem, not a design challenge. You cannot have all three during a partition.


Misconception 2: “I choose CAP once and stick with it”


Truth: Most systems are tunable. You can:

  • Use CP for critical operations, AP for others
  • Adjust consistency levels per operation
  • Switch modes based on conditions

Misconception 3: “Partitions are rare, so CAP doesn’t matter”


Truth: Partitions happen more often than you think:

  • Network hiccups
  • Data center issues
  • Load balancer failures
  • DNS problems

If you’re distributed, partitions will happen. Design for them.


While CAP is foundational, it has been criticized and refined over the years:

Criticism 1: “Partitions are Rare”

  • Reality: Network partitions happen more often than you think
  • Production data: AWS EC2 instances experience ~0.1% network partition rate
  • Impact: Even 0.1% means partitions occur daily in large systems
  • Conclusion: You must design for partitions, they’re not theoretical

Criticism 2: “CAP is Too Simplistic”

  • Reality: CAP doesn’t account for latency (addressed by PACELC)
  • Reality: Consistency isn’t binary (strong vs eventual spectrum)
  • Reality: Availability isn’t binary (partial availability exists)
  • Conclusion: CAP is a starting point, not the complete picture

Criticism 3: “You Can Have All Three”

  • Claim: Some argue you can optimize to have all three
  • Reality: During an actual partition, you must choose
  • Nuance: You can optimize for normal operation, but partitions force a choice
  • Conclusion: CAP applies during partitions, not during normal operation

Production Considerations: Real-World CAP Trade-offs


Challenges when choosing CP:

  • Quorum Requirements: Writes need a majority of nodes to be reachable
  • Example: A 3-node cluster needs 2 nodes (a majority) to accept writes
  • Impact: Losing two of three nodes makes the system unavailable for writes; a single failure leaves no headroom
  • Mitigation: Use a 5-node cluster (quorum of 3, tolerates 2 failures), as the sketch below shows
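
A minimal sketch of the quorum arithmetic behind these numbers, in plain Python with no external dependencies:

def quorum_size(nodes: int) -> int:
    # A majority quorum is more than half the nodes
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    # How many nodes can fail while a quorum is still reachable
    return nodes - quorum_size(nodes)


for n in (3, 5, 7):
    print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
# 3 nodes: quorum 2, tolerates 1 failure(s)
# 5 nodes: quorum 3, tolerates 2 failure(s)
# 7 nodes: quorum 4, tolerates 3 failure(s)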

Performance Implications:

  • Write Latency: CP writes are slower (must wait for quorum)
  • Benchmark: CP write latency ~50-200ms vs AP write latency ~5-20ms
  • Throughput: CP systems typically handle 1K-10K writes/sec vs AP systems 10K-100K writes/sec
  • Trade-off: Consistency costs roughly 10x in latency and throughput

Real-World Example: MongoDB

  • Default: CP mode (strong consistency)
  • Write Concern: {w: "majority"} ensures CP behavior (see the sketch after this list)
  • Performance: ~5K writes/sec per shard
  • Alternative: {w: 1} provides AP behavior, ~50K writes/sec
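
A minimal PyMongo sketch of the two write concerns; the connection string, database, and collection names are assumptions for illustration:

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

# CP-style: wait until a majority of replica set members acknowledge the write
cp_orders = orders.with_options(write_concern=WriteConcern(w="majority"))
cp_orders.insert_one({"sku": "abc-123", "qty": 1})

# AP-leaning: acknowledge once the primary alone accepts the write (faster, weaker)
ap_orders = orders.with_options(write_concern=WriteConcern(w=1))
ap_orders.insert_one({"sku": "abc-123", "qty": 1})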

Challenges when choosing AP:

  • Stale Reads: Users may see outdated data
  • Example: Social media like count might be 1-2 seconds behind
  • Impact: User confusion, potential race conditions
  • Mitigation: Read-after-write consistency, version vectors

Performance Implications:

  • Write Latency: AP writes are fast (fire-and-forget)
  • Benchmark: AP write latency ~5-20ms vs CP write latency ~50-200ms
  • Throughput: AP systems handle 10K-100K writes/sec vs CP systems 1K-10K writes/sec
  • Trade-off: Availability buys roughly 10x better latency and throughput, at the cost of consistency

Real-World Example: DynamoDB

  • Default: AP mode (eventual consistency)
  • Consistent Reads: Optional ConsistentRead=true provides CP-style reads (see the sketch after this list)
  • Performance: Eventually consistent reads ~1ms, consistent reads ~5ms
  • Cost: Consistent reads cost 2x more (higher resource usage)
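
A boto3 sketch of the two read modes; the table name, key schema, and region are assumptions for illustration:

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Default: eventually consistent read (AP-leaning, cheaper, may be slightly stale)
eventual = dynamodb.get_item(
    TableName="Profiles",
    Key={"user_id": {"S": "42"}},
)

# Strongly consistent read (CP-leaning, consumes more read capacity)
strong = dynamodb.get_item(
    TableName="Profiles",
    Key={"user_id": {"S": "42"}},
    ConsistentRead=True,
)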

Problem: Not all nodes are partitioned—some can communicate, others can’t.


CP System Behavior:

  • Group 1 (3 nodes) has majority → can accept writes
  • Group 2 (2 nodes) doesn’t have majority → rejects writes
  • Result: Partial availability (some nodes work, others don’t)

AP System Behavior:

  • Both groups accept writes independently
  • Result: Full availability, but data diverges
  • Challenge: Must merge divergent data when the partition heals (a minimal merge sketch follows below)
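
One possible merge step after the partition heals, sketched under the assumption that each group stores key -> (value, timestamp) pairs and resolves conflicts with last-write-wins:

def merge_stores(group_1: dict, group_2: dict) -> dict:
    # Merge two divergent key -> (value, timestamp) maps using last-write-wins
    merged = dict(group_1)
    for key, (value, ts) in group_2.items():
        # Keep whichever write carries the newer timestamp
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged


a = {"likes": (10, 100.0)}
b = {"likes": (12, 105.0), "shares": (3, 101.0)}
print(merge_stores(a, b))   # {'likes': (12, 105.0), 'shares': (3, 101.0)}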

Problem: Network repeatedly connects and disconnects (flapping).

Impact:

  • CP Systems: Constantly switching between available/unavailable
  • AP Systems: Constantly merging divergent data
  • Performance: Both suffer from constant state changes

Mitigation:

  • Hysteresis: Don’t switch immediately, wait for stable state
  • Example: Require 3 consecutive failures before marking a node unavailable, as in the sketch below
  • Benefit: Reduces flapping, improves stability
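
A sketch of a failure detector with hysteresis, following the "3 consecutive failures" rule above; the class and method names are illustrative:

class HysteresisFailureDetector:
    def __init__(self, failure_threshold: int = 3):
        self._threshold = failure_threshold
        self._consecutive_failures = 0
        self.available = True

    def record_heartbeat(self, success: bool) -> None:
        if success:
            # A single success resets the count and restores the node
            self._consecutive_failures = 0
            self.available = True
        else:
            # Only mark the node unavailable after several consecutive failures
            self._consecutive_failures += 1
            if self._consecutive_failures >= self._threshold:
                self.available = False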

Problem: Nodes have different system clocks, affecting CAP choices.

Example:

  • Node A thinks it’s 10:00:00
  • Node B thinks it’s 10:00:05 (5 seconds ahead)
  • Impact: Last-write-wins decisions may be wrong

CP System Impact:

  • Uses logical clocks (vector clocks) instead of wall clocks
  • Solution: Vector clocks preserve causality regardless of clock skew (see the sketch below)
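
A minimal sketch of vector clock comparison, showing how causality can be detected without wall clocks; the node names are illustrative:

def dominates(vc_a: dict, vc_b: dict) -> bool:
    # True if vc_a has seen every event vc_b has (vc_b happened before, or equals, vc_a)
    nodes = set(vc_a) | set(vc_b)
    return all(vc_a.get(n, 0) >= vc_b.get(n, 0) for n in nodes)


a = {"node_a": 2, "node_b": 1}
b = {"node_a": 1, "node_b": 1}
c = {"node_a": 1, "node_b": 2}

print(dominates(a, b))                   # True: a causally follows b
print(dominates(a, c), dominates(c, a))  # False False: concurrent updates, a real conflict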

AP System Impact:

  • May serve stale data if clocks are skewed
  • Solution: Use logical timestamps or synchronized clocks (NTP)

Practice 1: Gradual Consistency Degradation


Pattern: Start with strong consistency, degrade gracefully.


Implementation: Use consistency levels (strong → quorum → eventual)
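
A sketch of graceful degradation across those levels; the three read functions are assumed helpers that raise ConnectionError when their level cannot currently be satisfied:

def read_with_degradation(key, read_strong, read_quorum, read_eventual):
    # Try the strongest consistency level first, degrade only when it is unavailable
    for level, read_fn in (("strong", read_strong),
                           ("quorum", read_quorum),
                           ("eventual", read_eventual)):
        try:
            return level, read_fn(key)
        except ConnectionError:
            continue  # This level is unreachable right now; fall back to the next one
    raise RuntimeError("no consistency level could serve the read")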


Practice 2: Read-Your-Writes Consistency

Pattern: For AP systems, ensure users see their own writes immediately.

Solution:

  • Route user’s reads to primary for T seconds after their write
  • T: Typically 1-5 seconds (replication lag window)
  • Benefit: Users see their own changes, others see eventual consistency

Code Pattern:

import time


class APDataStore:
    def __init__(self, primary, replica):
        self._primary = primary
        self._replica = replica
        self._recent_writes = {}  # user_id -> end of their read-from-primary window

    def write(self, user_id, key, value):
        # Write to primary
        self._primary.write(key, value)
        # Track the user's recent write (5-second window)
        self._recent_writes[user_id] = time.time() + 5

    def read(self, user_id, key):
        # If the user wrote recently, read from primary so they see their own write
        if self._recent_writes.get(user_id, 0) > time.time():
            return self._primary.read(key)
        # Otherwise, read from a (possibly lagging) replica
        return self._replica.read(key)

Practice 3: Tunable Consistency Levels

Pattern: Allow applications to choose the consistency level per operation (a minimal dispatcher is sketched after the DynamoDB example below).

Levels:

  • Strong: Read from primary, wait for all replicas
  • Quorum: Read from majority of replicas
  • Eventual: Read from any replica

Example: DynamoDB Consistency Levels

  • ConsistentRead=true: Strong consistency (CP)
  • ConsistentRead=false: Eventual consistency (AP)
  • Cost: Strong reads cost 2x more (more resources)
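
A generic, library-free sketch of per-operation consistency levels; the Versioned record and the primary/replica dictionaries are assumptions for illustration:

from dataclasses import dataclass
from random import choice


@dataclass
class Versioned:
    value: object
    version: int


def read(key, primary, replicas, level="eventual"):
    if level == "strong":
        # Strong: read from the primary, which always holds the latest value
        return primary[key]
    if level == "quorum":
        # Quorum: ask a majority of replicas and keep the newest version seen
        majority = replicas[: len(replicas) // 2 + 1]
        return max((r[key] for r in majority), key=lambda v: v.version)
    # Eventual: read from any single replica, which may be stale
    return choice(replicas)[key]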

System Type      | Write Latency | Write Throughput | Read Latency | Read Throughput
CP (PostgreSQL)  | 50-200ms      | 1K-10K ops/sec   | 5-20ms       | 10K-50K ops/sec
AP (Cassandra)   | 5-20ms        | 10K-100K ops/sec | 1-10ms       | 50K-200K ops/sec
Hybrid (MongoDB) | 10-100ms      | 5K-50K ops/sec   | 2-15ms       | 20K-100K ops/sec

Key Insights:

  • CP systems: 10x slower writes, but consistent
  • AP systems: 10x faster writes, but eventual consistency
  • Hybrid systems: Middle ground, tunable per operation


CAP theorem explains the fundamental trade-off, but there’s more to the story. Let’s explore how latency factors into these decisions:

Next up: PACELC Theorem — Beyond CAP: understanding latency considerations in consistency vs availability trade-offs.