Remember our LEGO castle from the monolith lesson? A microservices architecture is like building your castle from many small, independent pieces that work together but can be changed, moved, or replaced individually.
Each piece (microservice) has one specific job:
The drawbridge service only handles opening and closing the gate
The watchtower service only handles lookout duties
The kitchen service only handles food preparation
If the drawbridge breaks, the kitchen still works!
The Philosophy
Microservices are independently deployable services organized around business capabilities. Each service runs in its own process, communicates via lightweight mechanisms (usually HTTP APIs), and can be deployed independently.
Companies like Netflix, Uber, Amazon, and Spotify run on microservices architectures with hundreds or thousands of services. But they didn’t start that way - they evolved to microservices when their monoliths couldn’t scale anymore.
Each microservice should do one thing well.
Bad Example:
OrdersAndPaymentsAndShippingService
Good Example:
OrderService → Manages orders only
PaymentService → Handles payments only
ShippingService → Manages shipping only
NotificationService → Sends notifications only
Deploy one service without touching others.
```bash
# Deploy only the payment service
kubectl apply -f payment-service-v2.yaml

# Order service keeps running with no downtime
# User service keeps running with no downtime
```
Each service has its own database. No shared databases!
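A toy sketch of the idea — the service names and connection strings below are illustrative placeholders, not from any real deployment:

```python
# Database-per-service: each service owns its connection and schema.
# These connection strings are illustrative placeholders.

SERVICE_DATABASES = {
    "order-service": "postgresql://orders-db:5432/orders",
    "payment-service": "mysql://payments-db:3306/payments",
    "notification-service": "mongodb://notifications-db:27017/notifications",
}

def database_for(service: str) -> str:
    # A service can only look up its OWN database - there is no shared handle
    return SERVICE_DATABASES[service]
```

Notice that the three services don't even share a database *engine*, let alone a schema — a schema change in one service can never break another.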
Each team owns their service end-to-end:
Choose their own technology stack
Choose their own database
Make their own architectural decisions
Set their own deployment schedule
Example:
Order Service: Python + PostgreSQL
Payment Service: Java + MySQL
Notification Service: Node.js + MongoDB
Search Service: Go + Elasticsearch
Scale only what needs scaling!
Use the right tool for the job!
Real-World Example: Netflix
Java Spring Boot : Core business services (catalog, user management)
Node.js : API gateway (fast I/O, great for proxying)
Python : Machine learning recommendations
Go : High-performance streaming services
Each team chooses what works best for their problem domain.
One service failure doesn’t bring down the entire system.
With graceful degradation:
Orders still work (view, create, update)
Payments queued for later processing
Notifications still sent
System remains partially functional
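The degradation story above can be sketched in a few lines — `process_payment` and `retry_queue` are hypothetical stand-ins for a real payment client and message queue:

```python
# Graceful degradation sketch: if the payment dependency fails,
# queue the work for later instead of failing the whole checkout.
# `process_payment` and `retry_queue` are hypothetical stand-ins.

retry_queue = []

def process_payment(order_id: str) -> str:
    # Stand-in for a network call; pretend the payment service is down.
    raise ConnectionError("payment-service unreachable")

def checkout(order_id: str) -> dict:
    try:
        payment_id = process_payment(order_id)
        return {"order_id": order_id, "payment": payment_id}
    except ConnectionError:
        # Degrade gracefully: accept the order, defer the payment.
        retry_queue.append(order_id)
        return {"order_id": order_id, "payment": "queued"}

result = checkout("order-42")
```

The order still goes through; only the payment is deferred.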
Teams can work independently without stepping on each other’s toes.
| Aspect | Monolith | Microservices |
|---|---|---|
| Deployment | Coordinate with everyone | Deploy independently |
| Technology | Everyone uses same stack | Choose your own stack |
| Database | Shared, coordinate schema changes | Own your schema |
| Testing | Wait for full integration tests | Test your service in isolation |
| Ownership | Blurred boundaries | Clear ownership |
Instead of understanding a 1M line monolith, understand a 10K line service.
Cognitive Load:
Monolith: 1,000,000 lines, 200 classes, 50 modules
Microservice: 10,000 lines, 20 classes, 5 modules
New engineers can become productive faster on individual services.
Warning: Complexity Explosion
Microservices trade code complexity for operational complexity. You're not reducing complexity - you're moving it to a different place!
```python
# In a monolith - this ALWAYS works or throws an exception
payment_result = payment_service.process(order)

# In microservices - this can fail in many ways:
response = http.post('http://payment-service/process', json=order.to_dict())
# - Payment service is down?
# - Response is corrupted?
# - Payment processed but response lost?
# Did the payment process or not? 🤷
# This is the "Two Generals Problem"
```
The network is reliable ❌ (it’s not)
Latency is zero ❌ (10-50ms per service hop)
Bandwidth is infinite ❌ (serialization overhead)
The network is secure ❌ (need authentication everywhere)
Topology doesn’t change ❌ (services come and go)
There is one administrator ❌ (multiple teams)
Transport cost is zero ❌ (serialization, monitoring)
The network is homogeneous ❌ (different tech stacks)
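Because the first two fallacies bite hardest, remote calls in microservices typically get timeouts and bounded retries with backoff. A minimal sketch — `flaky_call` stands in for a real network call that fails twice before succeeding:

```python
import time

# Defensive remote-call sketch: bounded retries with exponential backoff,
# because the network is NOT reliable and latency is NOT zero.
# `flaky_call` simulates a dependency that fails twice, then succeeds.

attempts = {"n": 0}

def flaky_call() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("no response")
    return "ok"

def call_with_retries(func, max_attempts: int = 5, base_delay: float = 0.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff

result = call_with_retries(flaky_call)
```

Note that retries are only safe when the remote operation is idempotent — retrying a non-idempotent payment call is exactly how double charges happen.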
No more ACID transactions across services!
```python
# One database transaction - ACID guaranteed
def create_order(user_id, items, payment_info):
    with db.transaction():
        # All succeed or all fail atomically
        order = order_repo.save(Order(user_id, items))
        payment = payment_repo.save(Payment(order.id, payment_info))
        inventory_repo.reserve(items)
        return order
```
```python
# Three service calls - no atomic transaction!
def create_order(user_id, items, payment_info):
    # Step 1: Create the order
    order = order_service.create(user_id, items)

    try:
        # Step 2: Process payment (different service/database)
        payment = payment_service.process(order.id, payment_info)
    except PaymentFailed:
        # Need to compensate - cancel the order
        order_service.cancel(order.id)
        raise

    # What if the service crashes here? 😱
    # Order created, payment processed, but inventory not reserved!

    try:
        # Step 3: Reserve inventory (different service/database)
        inventory_service.reserve(order.id, items)
    except InsufficientStock:
        # Need to compensate BOTH previous operations!
        payment_service.refund(payment.id)
        order_service.cancel(order.id)
        raise
```
Solution: Saga Pattern, eventual consistency (covered in async patterns section)
Challenges:
Need to spin up multiple services for integration tests
End-to-end tests require entire ecosystem
Mocking service dependencies is complex
Test data management across services
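One common mitigation is to test each service in isolation against a mocked dependency. A sketch using Python's `unittest.mock.AsyncMock` — the `OrderService` here is a deliberately tiny stand-in, not the full example elsewhere in this lesson:

```python
import asyncio
from unittest.mock import AsyncMock

# Testing a service in isolation by mocking its service dependency.
# This OrderService is a minimal stand-in for illustration.

class OrderService:
    def __init__(self, payment_client):
        self._payment_client = payment_client

    async def create_order(self, order_id: str) -> str:
        result = await self._payment_client.process_payment(order_id)
        return "paid" if result == "success" else "pending"

async def run_test():
    # No real payment service needed - the mock plays its part.
    fake_payments = AsyncMock()
    fake_payments.process_payment.return_value = "success"

    service = OrderService(fake_payments)
    status = await service.create_order("order-1")

    # Verify the contract: exactly one call, with the right argument.
    fake_payments.process_payment.assert_awaited_once_with("order-1")
    return status

status = asyncio.run(run_test())
```

This keeps unit tests fast, but remember: mocks only verify *your* assumptions about the contract — you still need contract or end-to-end tests to catch a dependency that changed underneath you.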
| Task | Monolith | 10 Microservices | 100 Microservices |
|---|---|---|---|
| Deployment pipelines | 1 | 10 | 100 |
| Monitoring dashboards | 1 | 10 | 100 |
| Log aggregation | 1 source | 10 sources | 100 sources |
| Databases to manage | 1 | 10 | 100 |
| Security patches | 1 app | 10 apps | 100 apps |
| Incident response | 1 service down | Which of 10? | Which of 100? |
Required Infrastructure:
Service discovery (Consul, Eureka)
API Gateway (Kong, NGINX)
Message broker (Kafka, RabbitMQ)
Distributed tracing (Jaeger, Zipkin)
Centralized logging (ELK, Splunk)
Service mesh (Istio, Linkerd)
Container orchestration (Kubernetes)
Scenario: User reports “checkout is slow”
In a Monolith:
Check logs → Find slow database query → Fix
In Microservices:
Check which service is slow → API Gateway logs
→ Order Service logs → Payment Service logs
→ Inventory Service logs → User Service logs
→ Find slow service → Check its dependencies
→ Distributed trace shows: Payment Service calling external API slowly
Time: 2 hours (and a lot of frustration)
Large, Mature Organization
50+ engineers
Multiple teams working independently
Clear organizational boundaries
Proven Product with Clear Boundaries
You understand your domain well
Stable business capabilities
Clear service boundaries identified
Different Scaling Requirements
Search: 1000 requests/sec → 100 instances
Admin: 10 requests/sec → 2 instances
Significant cost savings from independent scaling
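The capacity math behind numbers like these can be sketched as follows, assuming (illustratively) that one instance handles about 10 requests/sec and you keep at least two instances for redundancy:

```python
import math

# Rough capacity math behind independent scaling.
# PER_INSTANCE_RPS and the minimum of 2 instances are illustrative assumptions.

PER_INSTANCE_RPS = 10

def instances_needed(peak_rps: int, min_instances: int = 2) -> int:
    # Keep at least two instances so one failure doesn't take the service down.
    return max(min_instances, math.ceil(peak_rps / PER_INSTANCE_RPS))

search_instances = instances_needed(1000)  # search is hot
admin_instances = instances_needed(10)     # admin barely gets traffic
```

In a monolith, the admin code would be forced to ride along on all 100 search-sized instances; here it costs you two.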
Team Autonomy is Critical
Teams can’t wait for other teams
Need independent deployment schedules
Want to experiment with new technologies
You Have the Infrastructure and Expertise
DevOps team in place
Monitoring and observability ready
Experience with distributed systems
Starting a New Product
Don’t know what will succeed
Boundaries will change frequently
Premature optimization
Small Team (< 20 engineers)
Overhead outweighs benefits
Everyone can understand the monolith
Deployment coordination is easy
Strong Consistency Requirements
Financial transactions requiring ACID
Can’t tolerate eventual consistency
Complex cross-entity workflows
Limited DevOps Capabilities
No Kubernetes/Docker experience
No monitoring infrastructure
Small ops team
Group by business capabilities , not technical layers.
Bad (Technical Boundaries):
UIService → All presentation logic
BusinessLogicService → All business rules
DataAccessService → All database access
Good (Business Boundaries):
OrderManagement → Everything about orders
InventoryManagement → Everything about inventory
CustomerManagement → Everything about customers
PaymentProcessing → Everything about payments
Each service represents a bounded context with its own:
Domain model
Ubiquitous language
Business rules
Key Insight: The same entity can have different representations in different contexts!
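A small sketch of that insight — the same customer modeled differently in two bounded contexts (the class shapes here are illustrative, not from any particular codebase):

```python
from dataclasses import dataclass

# Same real-world entity, different models per bounded context.

@dataclass
class SalesCustomer:
    """Sales context cares about contact details and segment."""
    customer_id: str
    email: str
    segment: str

@dataclass
class ShippingCustomer:
    """Shipping context cares only about where to deliver."""
    customer_id: str
    street: str
    city: str

sales_view = SalesCustomer("c-1", "ada@example.com", "enterprise")
shipping_view = ShippingCustomer("c-1", "12 Main St", "London")
```

Both views share an identity (`customer_id`) but nothing else — the shipping service never needs to know the customer's segment, and never has to change when sales adds a field.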
Micro in Microservices
“Micro” doesn’t mean small in code size. It means:
Focused responsibility (does one thing)
Independently deployable
Bounded context
A microservice can be 100 lines or 10,000 lines. Size is not the point!
```python
from typing import Optional

import httpx


class OrderService:
    def __init__(self, payment_client: 'PaymentClient', order_repo: 'OrderRepository'):
        self._payment_client = payment_client
        self._order_repo = order_repo

    async def create_order(self, user_id: str, items: list) -> Order:
        order = Order(user_id=user_id, items=items)
        await self._order_repo.save(order)
        try:
            # Synchronous call to payment service
            payment_result = await self._payment_client.process_payment(
                order_id=order.id,
                amount=order.total,
                timeout=5.0,  # Always set timeouts!
            )
            if payment_result.status == 'success':
                order.mark_as_paid()
        except httpx.TimeoutException:
            # Handle timeout - what should we do?
            order.mark_as_pending_payment()
            # Maybe retry later via background job
        return order


class PaymentClient:
    """Client for calling Payment Service"""

    def __init__(self, base_url: str):
        self._base_url = base_url
        self._client = httpx.AsyncClient()

    async def process_payment(self, order_id: str, amount: float,
                              timeout: Optional[float] = 5.0) -> 'PaymentResult':
        response = await self._client.post(
            f"{self._base_url}/payments",
            json={"order_id": order_id, "amount": amount},
            timeout=timeout,
        )
        response.raise_for_status()
        return PaymentResult(**response.json())
```
```java
public class OrderService {
    private final PaymentClient paymentClient;
    private final OrderRepository orderRepo;

    public Order createOrder(String userId, List<Item> items) {
        Order order = new Order(userId, items);
        orderRepo.save(order);
        try {
            // Synchronous call to payment service
            PaymentResult result = paymentClient.processPayment(
                order.getId(),
                order.getTotal(),
                Duration.ofSeconds(5)  // Always set timeouts!
            );
            if (result.getStatus() == PaymentStatus.SUCCESS) {
                order.markAsPaid();
            }
        } catch (TimeoutException e) {
            // Handle timeout - what should we do?
            order.markAsPendingPayment();
            // Maybe retry later via background job
        }
        return order;
    }
}

public interface PaymentClient {
    PaymentResult processPayment(String orderId, BigDecimal amount, Duration timeout)
        throws TimeoutException;
}

public class HttpPaymentClient implements PaymentClient {
    private final HttpClient httpClient;
    private final String baseUrl;

    @Override
    public PaymentResult processPayment(String orderId, BigDecimal amount, Duration timeout)
            throws TimeoutException {
        // (IOException / InterruptedException handling elided for brevity)
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/payments"))
            .timeout(timeout)
            .POST(BodyPublishers.ofString(
                String.format("{\"order_id\":\"%s\",\"amount\":%s}", orderId, amount)))
            .build();
        HttpResponse<String> response = httpClient.send(
            request,
            HttpResponse.BodyHandlers.ofString());
        return parsePaymentResult(response.body());
    }
}
```
Pros:
Simple to understand
Immediate response
Easy to debug
Cons:
Tight coupling (caller waits for response)
Cascading failures (if payment service is down, order service fails)
Timeout management complexity
```python
from dataclasses import dataclass


@dataclass
class OrderCreatedEvent:
    order_id: str
    user_id: str
    amount: float


class OrderService:
    def __init__(self, message_broker: 'MessageBroker', order_repo: 'OrderRepository'):
        self._broker = message_broker
        self._order_repo = order_repo

    async def create_order(self, user_id: str, items: list) -> Order:
        order = Order(user_id=user_id, items=items)
        await self._order_repo.save(order)
        # Publish event asynchronously
        await self._broker.publish(
            topic='order.created',
            event=OrderCreatedEvent(order.id, user_id, order.total),
        )
        # Don't wait for payment processing!
        # Payment service will consume event and process asynchronously
        return order


class PaymentService:
    """Separate service that consumes events"""

    def __init__(self, message_broker: 'MessageBroker'):
        self._broker = message_broker
        # Subscribe to order events
        self._broker.subscribe('order.created', self.handle_order_created)

    async def handle_order_created(self, event: OrderCreatedEvent):
        """Process payment asynchronously"""
        try:
            result = await self._process_payment(event.order_id, event.amount)
            await self._broker.publish(
                topic='payment.processed',
                event=PaymentProcessedEvent(event.order_id, result.payment_id),
            )
        except PaymentError:
            # Handle error - maybe retry or dead-letter
            await self._broker.publish(
                topic='payment.failed',
                event=PaymentFailedEvent(event.order_id),
            )
```
```java
public record OrderCreatedEvent(String orderId, String userId, BigDecimal amount) {}

public class OrderService {
    private final MessageBroker broker;
    private final OrderRepository orderRepo;

    public Order createOrder(String userId, List<Item> items) {
        Order order = new Order(userId, items);
        orderRepo.save(order);
        // Publish event asynchronously
        broker.publish("order.created",
            new OrderCreatedEvent(order.getId(), userId, order.getTotal()));
        // Don't wait for payment processing!
        // Payment service will consume event and process asynchronously
        return order;
    }
}

public class PaymentService {
    private final MessageBroker broker;

    public PaymentService(MessageBroker broker) {
        this.broker = broker;
        // Subscribe to order events
        broker.subscribe("order.created", this::handleOrderCreated);
    }

    public void handleOrderCreated(OrderCreatedEvent event) {
        try {
            PaymentResult result = processPayment(event.orderId(), event.amount());
            broker.publish("payment.processed",
                new PaymentProcessedEvent(event.orderId(), result.getId()));
        } catch (PaymentException e) {
            // Handle error - maybe retry or dead-letter
            broker.publish("payment.failed", new PaymentFailedEvent(event.orderId()));
        }
    }
}
```
Pros:
Loose coupling (services don’t wait for each other)
Fault tolerance (messages stored until consumed)
Natural retry mechanism
Better scalability
Cons:
Eventual consistency
Harder to debug
Message ordering challenges
Complexity in error handling
Scale:
200+ million subscribers
1000+ microservices
100,000+ AWS instances
Evolution:
Started as a monolith (DVD rental business)
Migrated to microservices (2008-2009)
Reason: Database scalability issues
Key Practices:
Chaos Engineering (intentionally breaking services)
API-first design
Automated deployment pipelines
Strong observability culture
Architecture:
2,000+ microservices
50,000+ production servers
Polyglot: Go, Java, Python, Node.js
Challenges They Faced:
Service discovery at scale
Solution: Built internal service mesh
Cascading failures
Solution: Circuit breakers everywhere
Debugging distributed traces
Solution: Built Jaeger (now open source)
The Mandate (2002):
“All teams will henceforth expose their data and functionality through service interfaces. Teams must communicate with each other through these interfaces. There will be no other form of interprocess communication allowed. Anyone who doesn’t do this will be fired.” - Jeff Bezos
Result:
Forced SOA (Service-Oriented Architecture)
Led to AWS (Amazon Web Services)
Enabled massive scale and innovation
```python
from pydantic import BaseModel

# V1 API - original response shape
class OrderV1(BaseModel):
    order_id: str
    total: float

# V2 API - added items field
class OrderV2(BaseModel):
    order_id: str
    total: float
    items: list
    tax: float  # Breaking change!

@app.get("/api/v1/orders/{order_id}")
async def get_order_v1(order_id: str) -> OrderV1:
    order = await order_repo.get(order_id)
    return OrderV1(order_id=order.id, total=order.total)

@app.get("/api/v2/orders/{order_id}")
async def get_order_v2(order_id: str) -> OrderV2:
    order = await order_repo.get(order_id)
    return OrderV2(order_id=order.id, total=order.total,
                   items=order.items, tax=order.tax)
```
```java
// V2 API - added items field
public record OrderV2(String orderId, BigDecimal total, List<String> items,
                      BigDecimal tax /* Breaking change! */) {}

@RestController
public class OrderController {

    @GetMapping("/api/v1/orders/{orderId}")
    public OrderV1 getOrderV1(@PathVariable String orderId) {
        Order order = orderRepo.findById(orderId);
        return OrderV1.from(order);
    }

    @GetMapping("/api/v2/orders/{orderId}")
    public OrderV2 getOrderV2(@PathVariable String orderId) {
        Order order = orderRepo.findById(orderId);
        return OrderV2.from(order);
    }
}
```
Every service should expose health endpoints:
""" Is the service running? """
@app.get ( " /health/ready " )
""" Is the service ready to accept traffic? """
# Check database connection
if not await database. is_connected ():
raise HTTPException ( status_code = 503 , detail = " Database not ready " )
# Check dependent services
if not await payment_service. is_available ():
raise HTTPException ( status_code = 503 , detail = " Payment service unavailable " )
return { " status " : " ready " }
Structured Logging:
```python
import structlog

logger = structlog.get_logger()

async def create_order(order_id: str, user_id: str):
    ...
    logger.info(
        "order.create.completed",
        order_id=order_id,
        user_id=user_id,
    )
```
Distributed Tracing:
```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

async def create_order(order_id: str):
    with tracer.start_as_current_span("order.create") as span:
        span.set_attribute("order.id", order_id)
        # Call payment service - trace continues!
        with tracer.start_as_current_span("payment.process"):
            await payment_client.process(order_id)
```
Prevent cascading failures:
```python
import time
from enum import Enum


class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing, reject requests
    HALF_OPEN = "half_open"  # Testing if recovered


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, timeout: int = 60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    async def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            # Check if timeout expired
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise CircuitOpenError("Circuit breaker is OPEN")
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.failure_threshold:
                self.state = CircuitState.OPEN
            raise
        self.failure_count = 0
        self.state = CircuitState.CLOSED
        return result


# Usage
payment_breaker = CircuitBreaker()

async def call_payment_service(order_id):
    try:
        return await payment_breaker.call(payment_client.process, order_id)
    except CircuitOpenError:
        # Fallback: queue for later processing
        await queue.enqueue(order_id)
        return {"status": "queued"}
```
```java
public enum CircuitState {
    CLOSED,    // Normal operation
    OPEN,      // Failing, reject requests
    HALF_OPEN  // Testing if recovered
}

public class CircuitBreaker {
    private final int failureThreshold;
    private final long timeout;
    private int failureCount = 0;
    private long lastFailureTime = 0;
    private CircuitState state = CircuitState.CLOSED;

    public CircuitBreaker(int failureThreshold, long timeout) {
        this.failureThreshold = failureThreshold;
        this.timeout = timeout;
    }

    public <T> T call(Supplier<T> supplier) throws CircuitOpenException {
        if (state == CircuitState.OPEN) {
            if (System.currentTimeMillis() - lastFailureTime > timeout) {
                state = CircuitState.HALF_OPEN;
            } else {
                throw new CircuitOpenException("Circuit breaker is OPEN");
            }
        }
        try {
            T result = supplier.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private void onSuccess() {
        failureCount = 0;
        state = CircuitState.CLOSED;
    }

    private void onFailure() {
        failureCount++;
        lastFailureTime = System.currentTimeMillis();
        if (failureCount >= failureThreshold) {
            state = CircuitState.OPEN;
        }
    }
}

// Usage
CircuitBreaker paymentBreaker = new CircuitBreaker(5, 60000);

public PaymentResult callPaymentService(String orderId) {
    try {
        return paymentBreaker.call(() -> paymentClient.process(orderId));
    } catch (CircuitOpenException e) {
        // Fallback: queue for later processing
        return new PaymentResult("queued");
    }
}
```
Start with Monolith
Don’t start with microservices. Prove your product first, then evolve architecture based on real constraints.
Trade-offs Matter
Microservices trade code complexity for operational complexity. Be prepared for distributed system challenges.
Team Size Matters
Microservices work best with large teams (50+). Small teams benefit more from well-designed monoliths.
Boundaries are Hard
Finding the right service boundaries is hard. Use Domain-Driven Design principles to guide decomposition.
“Building Microservices” by Sam Newman
“Microservices Patterns” by Chris Richardson
“Production-Ready Microservices” by Susan J. Fowler
Netflix Tech Blog - Real-world microservices at scale
Martin Fowler’s Microservices Guide - Comprehensive resource