
NoSQL Databases

Beyond relational: flexible data models for modern applications

NoSQL (Not Only SQL) refers to non-relational databases that use flexible data models. They’re designed for scalability, performance, and handling unstructured/semi-structured data.


Document databases store data as documents (JSON, BSON, XML). Documents are self-contained and can have nested structures.


Key Characteristics:

  • Flexible schema: Each document can have different fields
  • Nested data: Store related data together
  • No JOINs: Related data in same document
  • JSON-like: Easy to work with in applications

Examples: MongoDB, CouchDB, Amazon DocumentDB


User Document in MongoDB:

{
  "_id": 123,
  "name": "Alice",
  "email": "[email protected]",
  "address": {
    "street": "123 Main St",
    "city": "San Francisco",
    "zip": "94102"
  },
  "orders": [
    {
      "order_id": 1,
      "date": "2024-01-15",
      "items": [
        {"product": "Laptop", "price": 1000},
        {"product": "Mouse", "price": 20}
      ],
      "total": 1020
    }
  ]
}

Benefits:

  • ✅ All user data in one document
  • ✅ No JOINs needed
  • ✅ Easy to read/write
  • ✅ Flexible (can add fields easily)

Key-value stores are the simplest NoSQL databases. They store data as key-value pairs.


Key Characteristics:

  • Simple: Just key-value pairs
  • Fast: O(1) lookups by key
  • Limited queries: Can only query by key
  • Great for caching: Fast access patterns

Examples: Redis, DynamoDB, Memcached



Common Use Cases:

  • Caching: Store frequently accessed data
  • Session storage: User sessions
  • Configuration: App settings
  • Feature flags: Toggle features
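
The caching and session-storage patterns above can be sketched without a real database. Below is a minimal in-memory key-value store with TTL-based expiry (the `InMemoryKV` class is illustrative, standing in for Redis, not a real client):

```python
import time
from typing import Optional

class InMemoryKV:
    """Minimal in-memory key-value store with optional TTL expiry."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key: str, value: str, ttl: Optional[float] = None):
        expires_at = time.monotonic() + ttl if ttl else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiry on read
            return None
        return value

kv = InMemoryKV()
kv.set("session:abc", "user:123", ttl=3600)
print(kv.get("session:abc"))  # user:123
print(kv.get("missing"))      # None
```

Real stores like Redis expire keys server-side; the lazy expiry-on-read here is just the simplest way to show the same contract.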

Column-family stores organize data by columns instead of rows. Data is stored in column families, optimized for reading specific columns.


Key Characteristics:

  • Column-oriented: Data stored by columns
  • Wide tables: Can have many columns
  • Efficient reads: Read only needed columns
  • Time-series: Great for time-series data

Examples: Cassandra, HBase, Amazon Keyspaces


Time-Series Data in Cassandra:

Row Key   Timestamp          Temperature  Humidity  Pressure
sensor:1  2024-01-01 10:00   25°C         60%       1013
sensor:1  2024-01-01 11:00   26°C         58%       1014
sensor:1  2024-01-01 12:00   27°C         55%       1015

Benefits:

  • ✅ Efficient to read all temperatures
  • ✅ Can add new columns easily
  • ✅ Optimized for time-series queries
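
The "read only the columns you need" idea can be illustrated with a toy in-memory layout. The `rows` dict and `read_column` helper below are a sketch of the access pattern, not a real Cassandra API:

```python
# Toy column-family layout: each row key maps to a wide set of columns.
rows = {
    "sensor:1|2024-01-01 10:00": {"temperature": 25, "humidity": 60, "pressure": 1013},
    "sensor:1|2024-01-01 11:00": {"temperature": 26, "humidity": 58, "pressure": 1014},
    "sensor:1|2024-01-01 12:00": {"temperature": 27, "humidity": 55, "pressure": 1015},
}

def read_column(rows, column):
    """Read a single column across all rows, ignoring all other columns."""
    return {key: cols[column] for key, cols in rows.items() if column in cols}

temps = read_column(rows, "temperature")
# Only temperature values are touched; humidity and pressure are never read.
```

A real column-family store gets the same effect physically: columns are laid out together on disk, so a query touching one column never loads the others.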

Graph databases store data as nodes (entities) and edges (relationships). Optimized for relationship queries.


Key Characteristics:

  • Nodes: Entities (users, products, etc.)
  • Edges: Relationships (friends, purchases, etc.)
  • Traversals: Follow relationships efficiently
  • Relationship queries: “Find friends of friends”

Examples: Neo4j, Amazon Neptune, ArangoDB


Social Network Graph:

Nodes:
- User(id: 1, name: "Alice")
- User(id: 2, name: "Bob")
- User(id: 3, name: "Charlie")
- Product(id: 10, name: "Laptop")
Edges:
- (Alice) -[FRIENDS]-> (Bob)
- (Bob) -[FRIENDS]-> (Charlie)
- (Alice) -[PURCHASED]-> (Laptop)
- (Bob) -[LIKES]-> (Laptop)

Query: “Find products liked by friends of Alice”

  • Start at Alice
  • Traverse FRIENDS edges → Bob
  • Traverse LIKES edges → Laptop
  • Result: Laptop
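
The traversal above can be sketched with a plain adjacency map. The node names and edge labels mirror the example; this is an illustration of the traversal logic, not a Neo4j client:

```python
# Edges stored as: node -> list of (edge_label, target) pairs.
graph = {
    "Alice":   [("FRIENDS", "Bob"), ("PURCHASED", "Laptop")],
    "Bob":     [("FRIENDS", "Charlie"), ("LIKES", "Laptop")],
    "Charlie": [],
    "Laptop":  [],
}

def neighbors(node, label):
    """Follow all edges with the given label out of a node."""
    return [target for edge, target in graph.get(node, []) if edge == label]

def products_liked_by_friends(user):
    """Find products liked by the direct friends of `user`."""
    liked = set()
    for friend in neighbors(user, "FRIENDS"):
        liked.update(neighbors(friend, "LIKES"))
    return liked

print(products_liked_by_friends("Alice"))  # {'Laptop'}
```

A graph database performs the same hops, but with index-free adjacency each hop is a pointer dereference rather than a lookup.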

Aspect        SQL              NoSQL
Schema        Fixed, rigid     Flexible, dynamic
Queries       Complex JOINs    Simple lookups
Scale         Vertical         Horizontal
Transactions  ACID             Eventually consistent
Use Case      Financial, ERP   Social media, IoT

How NoSQL databases affect your class design:

Document Database Model
from dataclasses import dataclass
from typing import List, Optional, Dict
from datetime import datetime

@dataclass
class Address:
    street: str
    city: str
    zip_code: str

@dataclass
class OrderItem:
    product: str
    price: float
    quantity: int

@dataclass
class Order:
    order_id: int
    date: datetime
    items: List[OrderItem]
    total: float

@dataclass
class User:
    """Document model - all data in one structure"""
    _id: int
    name: str
    email: str
    address: Address      # nested object
    orders: List[Order]   # nested array

    def to_document(self) -> Dict:
        """Convert to a MongoDB document"""
        return {
            "_id": self._id,
            "name": self.name,
            "email": self.email,
            "address": {
                "street": self.address.street,
                "city": self.address.city,
                "zip": self.address.zip_code
            },
            "orders": [
                {
                    "order_id": o.order_id,
                    "date": o.date.isoformat(),
                    "items": [
                        {"product": item.product, "price": item.price, "quantity": item.quantity}
                        for item in o.items
                    ],
                    "total": o.total
                }
                for o in self.orders
            ]
        }
Key-Value Store
import json  # used in the caching example below

class KeyValueStore:
    def __init__(self, redis_client):
        self.redis = redis_client

    def get(self, key: str) -> Optional[str]:
        """Get value by key"""
        return self.redis.get(key)

    def set(self, key: str, value: str, ttl: Optional[int] = None):
        """Set key-value pair, optionally expiring after ttl seconds"""
        if ttl:
            self.redis.setex(key, ttl, value)
        else:
            self.redis.set(key, value)

    def delete(self, key: str):
        """Delete key"""
        self.redis.delete(key)

# Usage for caching (assumes a connected redis_client)
cache = KeyValueStore(redis_client)
cache.set("user:123", json.dumps({"name": "Alice"}), ttl=3600)
user_data = json.loads(cache.get("user:123"))

Deep Dive: Production Patterns and Advanced Considerations


Document Databases: Schema Evolution in Production


Reality: Document databases are schema-flexible, not schema-less.

Production Challenge: Schema changes still require migration planning.

Example: Adding Required Field

Before:

{
  "_id": 123,
  "name": "Alice",
  "email": "[email protected]"
}

After (New Required Field):

{
  "_id": 123,
  "name": "Alice",
  "email": "[email protected]",
  "phone": "123-456-7890"  // NEW REQUIRED FIELD
}

Migration Strategy:

class UserMigration:
    def migrate_user(self, user_doc):
        # Only migrate documents that still lack the field
        if 'phone' not in user_doc:
            # Backfill the missing field (the legacy lookup and
            # self.collection are application-specific)
            user_doc['phone'] = self.fetch_phone_from_legacy_system(user_doc['_id'])
            self.collection.update_one(
                {'_id': user_doc['_id']},
                {'$set': {'phone': user_doc['phone']}}
            )
        return user_doc

Production Pattern:

  1. Add field as optional (backward compatible)
  2. Backfill existing documents (background job)
  3. Make field required in application logic
  4. Eventually enforce at database level

Problem: Documents have size limits.

Limits:

  • MongoDB: 16MB per document
  • CouchDB: No hard limit, but performance degrades >1MB
  • DynamoDB: 400KB per item

Production Impact:

  • Large documents: Slow to transfer, memory intensive
  • Sharding: Large documents harder to shard efficiently

Solution: Reference Pattern

Instead of:

{
  "_id": 123,
  "name": "Alice",
  "orders": [
    { /* 1000 orders embedded */ }
  ]
}

Use References:

{
  "_id": 123,
  "name": "Alice",
  "order_ids": [1, 2, 3, ...]  // References
}

Benefit: Smaller documents, better sharding, faster queries
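
The reference pattern can be sketched as a split step. The hypothetical `split_user_document` helper below moves embedded orders into their own documents and leaves only IDs behind:

```python
def split_user_document(user_doc):
    """Apply the reference pattern: extract embedded orders, keep only IDs."""
    orders = user_doc.pop("orders", [])
    user_doc["order_ids"] = [o["order_id"] for o in orders]
    # The caller stores `orders` in a separate collection.
    return user_doc, orders

user = {
    "_id": 123,
    "name": "Alice",
    "orders": [{"order_id": 1, "total": 1020}, {"order_id": 2, "total": 20}],
}
slim_user, order_docs = split_user_document(user)
print(slim_user["order_ids"])  # [1, 2]
```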


Challenge: Atomic increments across distributed systems.

Solution: Redis INCR

class DistributedCounter:
    def __init__(self, redis_client):
        self.redis = redis_client

    def increment(self, key, amount=1):
        # INCRBY is atomic on the Redis server
        return self.redis.incrby(key, amount)

    def decrement(self, key, amount=1):
        return self.redis.decrby(key, amount)

    def get(self, key):
        return int(self.redis.get(key) or 0)

Production Use Cases:

  • Page views: Track views across servers
  • Rate limiting: Count requests per user
  • Voting: Count votes in real-time
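
The rate-limiting use case can be sketched as a fixed-window counter. Here a plain dict stands in for Redis (where the `+= 1` would be an atomic INCR on a per-window key); the class name and key layout are assumptions for illustration:

```python
import time
from collections import defaultdict

class FixedWindowRateLimiter:
    """Allow at most `limit` requests per user per `window` seconds."""

    def __init__(self, limit: int, window: int = 60):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)  # (user, window_start) -> count

    def allow(self, user: str, now=None) -> bool:
        now = time.time() if now is None else now
        window_start = int(now // self.window)   # bucket time into windows
        key = (user, window_start)
        self.counts[key] += 1                    # atomic INCR in Redis
        return self.counts[key] <= self.limit

limiter = FixedWindowRateLimiter(limit=3, window=60)
results = [limiter.allow("alice", now=100.0) for _ in range(5)]
print(results)  # [True, True, True, False, False]
```

In Redis the same scheme would SET the window key with a TTL so old windows expire automatically instead of accumulating.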

Challenge: Coordinate across distributed systems.

Solution: Redis SETNX with TTL

from contextlib import contextmanager

class LockAcquisitionError(Exception):
    pass

class DistributedLock:
    def __init__(self, redis_client):
        self.redis = redis_client

    def acquire(self, lock_key, ttl_seconds=10):
        # Try to acquire the lock atomically
        acquired = self.redis.set(
            lock_key,
            "locked",
            nx=True,        # only set if not exists
            ex=ttl_seconds  # expire after TTL
        )
        return acquired is not None

    def release(self, lock_key):
        self.redis.delete(lock_key)

    @contextmanager
    def lock(self, lock_key, ttl_seconds=10):
        if self.acquire(lock_key, ttl_seconds):
            try:
                yield
            finally:
                self.release(lock_key)
        else:
            raise LockAcquisitionError("Could not acquire lock")

Production Considerations:

  • TTL: Prevents deadlocks (lock expires)
  • Renewal: Extend TTL for long operations
  • Fencing tokens: Prevent stale locks
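
The fencing-token idea can be sketched with a monotonically increasing counter handed out at each acquisition: the protected resource rejects any write carrying a token older than the newest one it has seen. All class names here are illustrative, and real fencing tokens would come from the lock service itself:

```python
class TokenLock:
    """Hands out an increasing fencing token on each acquisition."""

    def __init__(self):
        self._next = 0

    def acquire(self) -> int:
        token = self._next
        self._next += 1
        return token

class FencedResource:
    """Rejects writes whose fencing token is older than the newest seen."""

    def __init__(self):
        self.highest_token = -1
        self.value = None

    def write(self, token: int, value):
        if token < self.highest_token:
            return False  # stale lock holder: reject the write
        self.highest_token = token
        self.value = value
        return True

lock = TokenLock()
resource = FencedResource()
t1 = lock.acquire()               # first holder, then e.g. paused by GC
t2 = lock.acquire()               # lock expired, second holder takes over
print(resource.write(t2, "new"))  # True
print(resource.write(t1, "old"))  # False -- stale token rejected
```

Without the token, the paused first holder would silently overwrite the second holder's write after waking up.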

Challenge: Notify multiple services of events.

Solution: Redis Pub/Sub

import json

class EventPublisher:
    def __init__(self, redis_client):
        self.redis = redis_client

    def publish(self, channel, message):
        self.redis.publish(channel, json.dumps(message))

class EventSubscriber:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.pubsub = redis_client.pubsub()

    def subscribe(self, channel, handler):
        self.pubsub.subscribe(channel)
        # listen() blocks, dispatching each incoming message to the handler
        for message in self.pubsub.listen():
            if message['type'] == 'message':
                data = json.loads(message['data'])
                handler(data)

Production Use Cases:

  • Cache invalidation: Notify all servers to clear cache
  • Event distribution: Distribute events to multiple consumers
  • Real-time updates: Push updates to connected clients

Column-Family Stores: Production Considerations


Challenge: Wide rows (many columns) can become very large.

Example: Time-Series Data

Row Structure:

Row Key: sensor:1
Columns:
  timestamp:2024-01-01-10:00 → temperature:25
  timestamp:2024-01-01-10:01 → temperature:26
  timestamp:2024-01-01-10:02 → temperature:27
  ... (millions of columns)

Problem: Row becomes too large, slow to read.

Solution: Row Partitioning

Partition by Time Window:

Row Key: sensor:1:2024-01-01
Columns: Only columns for that day
Row Key: sensor:1:2024-01-02
Columns: Only columns for next day

Benefit: Smaller rows, faster reads, better distribution
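
The partitioning scheme can be sketched as a row-key builder that buckets readings by day. The key format mirrors the example above; this is a sketch of the scheme, not a driver API:

```python
from datetime import datetime

def row_key(sensor_id: int, ts: datetime) -> str:
    """Bucket a reading into a per-day row: sensor:<id>:<YYYY-MM-DD>."""
    return f"sensor:{sensor_id}:{ts.strftime('%Y-%m-%d')}"

# Readings from different days land in different rows automatically.
print(row_key(1, datetime(2024, 1, 1, 10, 0)))   # sensor:1:2024-01-01
print(row_key(1, datetime(2024, 1, 2, 23, 59)))  # sensor:1:2024-01-02
```

The choice of window (day, hour, week) is a tuning decision: smaller windows mean smaller rows but more rows to scan for range queries.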


Challenge: Column-family stores accumulate many versions (tombstones, updates).

Solution: Compaction

Types:

  • Size-tiered compaction: Merge small files into larger ones
  • Leveled compaction: Organize into levels, merge within levels
  • Time-window compaction: Compact by time windows

Production Impact:

  • Write amplification: Compaction rewrites data (2-10x)
  • Disk I/O: High during compaction
  • Performance: Compaction can slow down reads/writes

Best Practice: Schedule compaction during low-traffic periods
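
The core of compaction — merging overlapping runs so only the newest version of each key survives, and tombstoned keys disappear — can be sketched in a few lines. The `TOMBSTONE` sentinel and the dict-per-run format are assumptions for illustration:

```python
TOMBSTONE = object()  # sentinel marking a deleted key

def compact(runs):
    """Merge runs (ordered oldest first): newest write wins, tombstones drop keys."""
    merged = {}
    for run in runs:       # later runs overwrite earlier ones
        merged.update(run)
    # Drop tombstoned keys entirely; their space is reclaimed.
    return {k: v for k, v in merged.items() if v is not TOMBSTONE}

old_run = {"a": 1, "b": 2, "c": 3}
new_run = {"b": 20, "c": TOMBSTONE}
print(compact([old_run, new_run]))  # {'a': 1, 'b': 20}
```

This is also where the write amplification mentioned above comes from: every surviving key is physically rewritten into the merged output.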


Pattern 1: Relationship Traversal Optimization


Challenge: Deep traversals can be slow.

Example: “Friends of Friends” Query

Naive Approach:

MATCH (user:User {id: 123})-[:FRIENDS]->(friend)-[:FRIENDS]->(fof)
RETURN fof

Problem: May traverse millions of relationships.

Optimized Approach:

MATCH (user:User {id: 123})-[:FRIENDS*2..2]->(fof)
WHERE fof.id <> 123 // Exclude self
RETURN DISTINCT fof
LIMIT 100 // Limit results

Production Techniques:

  • Limit depth: Don’t traverse too deep
  • Limit results: Use LIMIT clause
  • Index relationships: Index on relationship properties
  • Caching: Cache common traversals

Challenge: Large graphs don’t fit on single machine.

Solution: Graph Partitioning

Strategies:

  • Vertex-cut: Split vertices across machines
  • Edge-cut: Split edges across machines
  • Hybrid: Combination of both

Production Example: Neo4j Fabric

  • Sharding: Distributes graph across multiple databases
  • Query routing: Routes queries to appropriate shards
  • Cross-shard queries: Merges results from multiple shards

Trade-off: Cross-shard queries are slower (network overhead)


NoSQL Performance Benchmarks: Real-World Numbers

Database Type              Read Latency  Write Latency  Throughput        Use Case
Document (MongoDB)         1-5ms         5-20ms         10K-50K ops/sec   General purpose
Key-Value (Redis)          0.1-1ms       0.1-1ms        100K-1M ops/sec   Caching, sessions
Column-Family (Cassandra)  1-10ms        5-50ms         50K-200K ops/sec  Time-series, wide tables
Graph (Neo4j)              5-50ms        10-100ms       1K-10K ops/sec    Relationship queries

Key Insights:

  • Key-Value: Fastest (in-memory)
  • Document: Good balance (flexible + performant)
  • Column-Family: Best for writes (LSM trees)
  • Graph: Optimized for traversals (not raw speed)

Anti-Pattern 1: SQL-Style JOINs in Document Databases

Problem: Trying to do complex JOINs in document databases.

Bad:

// Emulating a JOIN in MongoDB with $lookup (works, but is costly)
db.users.aggregate([
  { $lookup: { from: "orders", ... } },   // Expensive!
  { $lookup: { from: "payments", ... } }  // Very expensive!
])

Good:

// Denormalize data into documents
{
  "_id": 123,
  "name": "Alice",
  "recent_orders": [ /* embedded */ ],
  "payment_info": { /* embedded */ }
}

Lesson: Design for NoSQL’s strengths, not SQL patterns


Anti-Pattern 2: Ignoring Consistency Guarantees


Problem: Assuming eventual consistency means “eventually correct”.

Reality: Eventual consistency can lead to permanent inconsistencies if not handled.

Example:

  • User updates their profile on Node A
  • User reads the profile from Node B, which is still stale
  • User makes a decision based on the stale data
  • Result: a wrong decision, even after the replicas converge

Solution: Use read-after-write consistency, version vectors
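
Version vectors, mentioned above, can be sketched as per-node event counters: one vector dominates another if it is at least as new for every node; if neither dominates, the writes are concurrent and need reconciliation. The dict representation is an assumption for illustration:

```python
def dominates(a: dict, b: dict) -> bool:
    """True if vector `a` has seen every event that `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    if dominates(a, b) and dominates(b, a):
        return "equal"
    if dominates(a, b):
        return "a newer"
    if dominates(b, a):
        return "b newer"
    return "concurrent"  # conflicting writes: application must reconcile

print(compare({"A": 2, "B": 1}, {"A": 1, "B": 1}))  # a newer
print(compare({"A": 2, "B": 0}, {"A": 1, "B": 1}))  # concurrent
```

The "concurrent" case is exactly the scenario in the example above: neither replica's state strictly supersedes the other, so a timestamp alone cannot pick a winner safely.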


Anti-Pattern 3: Over-Normalizing in Document DBs


Problem: Normalizing like SQL (separate collections for everything).

Bad:

// Over-normalized (like SQL)
Users collection
Orders collection
OrderItems collection
Products collection
// Need multiple queries to assemble one order!

Good:

// Denormalized (NoSQL style)
{
  "_id": "order:123",
  "user": { "id": 456, "name": "Alice" },   // Embedded
  "items": [
    { "product": "Laptop", "price": 1000 }  // Embedded
  ]
}
// Single query gets everything!

Lesson: Denormalize for read performance



Now that you understand different database types, let’s learn how to choose the right database for your use case:

Next up: Choosing the Right Database — Decision framework for database selection and mapping domain models to storage.