
Service Mesh & Sidecar Pattern

Infrastructure concerns solved at the platform level

Imagine you have a city with many buildings (microservices) that need to talk to each other. Instead of every building having its own phone system, security guards, and mail delivery, you build a city-wide infrastructure that handles all of this automatically.

A service mesh is like this city infrastructure - it handles all the communication, security, and monitoring between your microservices without changing your application code.

Without Service Mesh:

Every service needs to implement:

  • Circuit breakers
  • Retries and timeouts
  • Load balancing
  • Service discovery
  • Metrics collection
  • Distributed tracing
  • Mutual TLS encryption
  • Rate limiting

Result: 30-40% of your code is infrastructure concerns, not business logic!
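To make that concrete, here is a sketch of what a single endpoint can look like when the application carries its own retries, timeouts, and metrics (the payment-service endpoint, retry counts, and timeouts below are illustrative, not from a real system):

# Hypothetical order handler WITHOUT a service mesh: retries, timeouts,
# and metrics are hand-rolled around a single line of business logic.
import asyncio

import httpx
from fastapi import FastAPI

app = FastAPI()
request_count = 0
error_count = 0  # crude hand-rolled metrics


@app.post("/orders")
async def create_order(order: dict):
    global request_count, error_count
    request_count += 1
    response = None
    for attempt in range(3):  # infrastructure concern: retry loop
        try:
            async with httpx.AsyncClient(timeout=2.0) as client:  # per-try timeout
                response = await client.post(
                    "http://payment-service:8080/process", json=order
                )
            if response.status_code < 500:
                break  # success or client error: stop retrying
        except httpx.TransportError:
            response = None  # connect failure: try again
        await asyncio.sleep(0.1 * (attempt + 1))  # crude backoff
    if response is None or response.status_code >= 500:
        error_count += 1
        return {"status": "payment_unavailable"}
    return {"status": "created", "payment": response.json()}  # the business logic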


With Service Mesh:

Your service focuses on business logic. The infrastructure is handled by the mesh!
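The same hypothetical handler with a mesh in place shrinks to its business logic, because the sidecar applies the retry, timeout, metrics, and encryption policy for it:

# The same hypothetical handler WITH a service mesh: the sidecar proxy
# handles retries, timeouts, metrics, and mTLS, so only business logic remains.
import httpx
from fastapi import FastAPI

app = FastAPI()


@app.post("/orders")
async def create_order(order: dict):
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://payment-service:8080/process", json=order
        )
    return {"status": "created", "payment": response.json()}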


The Sidecar Pattern

A sidecar is a companion container that runs alongside your main application container, providing auxiliary functionality.


Key Components:

  1. Control Plane: Manages configuration, policies, and telemetry
  2. Data Plane: Network of sidecar proxies handling actual traffic
  3. Sidecar Proxy: Usually Envoy - intercepts all inbound/outbound traffic
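As a rough mental model, a sidecar is just a proxy listening next to your application that forwards every request and wraps policy around it. Here is a toy sketch of that idea (nothing like Envoy's actual implementation; the ports are illustrative):

# Toy "sidecar": a localhost proxy that forwards requests to the app
# it sits next to, adding a policy layer (here, timing and logging).
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://127.0.0.1:8080"  # the real application container


class SidecarHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        start = time.time()
        # Forward the intercepted request to the application
        with urllib.request.urlopen(UPSTREAM + self.path, timeout=2) as resp:
            body = resp.read()
            status = resp.status
        # Telemetry the application never has to write itself
        print(f"{self.path} -> {status} in {(time.time() - start) * 1000:.1f}ms")
        self.send_response(status)
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 15001), SidecarHandler).serve_forever()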

Istio

Characteristics:

  • Uses Envoy as sidecar proxy
  • Comprehensive feature set
  • Kubernetes-native
  • Complex but powerful

Components:

  • Pilot: Traffic management
  • Citadel: Security and certificate management
  • Galley: Configuration management
  • Envoy: Sidecar proxy (data plane)

Linkerd

Characteristics:

  • Lightweight and simple
  • Written in Rust (fast and secure)
  • Easier to adopt than Istio
  • Kubernetes-only

Consul Connect

Characteristics:

  • Works beyond Kubernetes
  • Multi-platform (VMs, containers, serverless)
  • Integrated with Consul service discovery
| Feature | Istio | Linkerd | Consul Connect |
| --- | --- | --- | --- |
| Complexity | High | Low | Medium |
| Performance Overhead | 10-15ms | 5-10ms | 10-15ms |
| Memory Usage | High | Low | Medium |
| Platform | Kubernetes | Kubernetes | Multi-platform |
| Learning Curve | Steep | Gentle | Medium |
| Maturity | Very Mature | Mature | Mature |
| Community | Largest | Growing | Strong |

Traffic Splitting

# Istio VirtualService - 80% to v1, 20% to v2
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: payment-service
spec:
  hosts:
  - payment-service
  http:
  - route:
    - destination:
        host: payment-service
        subset: v1
      weight: 80
    - destination:
        host: payment-service
        subset: v2
      weight: 20

What this gives you:

  • Canary deployments (route 5% traffic to new version)
  • A/B testing
  • Blue-green deployments
  • Gradual rollouts
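Conceptually, weighted routing is just a weighted random choice that the proxy makes for every request. A minimal sketch of the idea (not Envoy's actual load-balancing algorithm):

# Conceptual weighted routing: each request independently picks a subset
# according to the configured weights.
import random
from collections import Counter


def pick_subset(weights: dict[str, int]) -> str:
    subsets = list(weights)
    return random.choices(subsets, weights=[weights[s] for s in subsets])[0]


routing = {"v1": 80, "v2": 20}
sample = Counter(pick_subset(routing) for _ in range(10_000))
print(sample)  # roughly Counter({'v1': 8000, 'v2': 2000})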
Retries and Timeouts

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
  - order-service
  http:
  - route:
    - destination:
        host: order-service
    timeout: 5s
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure

No code changes needed! The sidecar proxy handles all retries automatically.
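For intuition, the policy above is roughly what you would otherwise hand-write in every caller. A sketch under that assumption (Envoy's real retry budgets and backoff differ, and the /orders path is illustrative):

# Roughly what 'timeout: 5s, attempts: 3, perTryTimeout: 2s,
# retryOn: 5xx,reset,connect-failure' would mean as application code.
import asyncio

import httpx


async def call_order_service(payload: dict) -> httpx.Response:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + 5.0  # overall timeout: 5s
    async with httpx.AsyncClient(timeout=2.0) as client:  # perTryTimeout: 2s
        for attempt in range(3):  # attempts: 3
            if loop.time() >= deadline:
                break  # overall timeout exhausted
            try:
                response = await client.post(
                    "http://order-service:8080/orders", json=payload
                )
                if response.status_code < 500:  # only retry on 5xx
                    return response
            except httpx.TransportError:
                pass  # reset / connect-failure: retry
    raise TimeoutError("order-service unavailable after retries")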

Circuit Breaking

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: payment-service-circuit-breaker
spec:
  host: payment-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s

Translation:

  • If a service instance returns 5 consecutive 5xx errors
  • Eject it from the load balancer pool for 30 seconds
  • All without modifying your application code!
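Conceptually, outlier detection is a small per-endpoint state machine inside the proxy. A sketch of the semantics configured above (not Envoy's implementation):

# Conceptual outlier detection: after 5 consecutive 5xx responses,
# eject the endpoint from the pool for baseEjectionTime (30s).
import time


class OutlierDetector:
    def __init__(self, threshold: int = 5, ejection_seconds: float = 30.0):
        self.threshold = threshold
        self.ejection_seconds = ejection_seconds
        self.consecutive_5xx: dict[str, int] = {}
        self.ejected_until: dict[str, float] = {}

    def healthy_endpoints(self, endpoints: list[str]) -> list[str]:
        now = time.monotonic()
        return [e for e in endpoints if self.ejected_until.get(e, 0) <= now]

    def record(self, endpoint: str, status_code: int) -> None:
        if status_code >= 500:
            self.consecutive_5xx[endpoint] = self.consecutive_5xx.get(endpoint, 0) + 1
            if self.consecutive_5xx[endpoint] >= self.threshold:
                self.ejected_until[endpoint] = time.monotonic() + self.ejection_seconds
                self.consecutive_5xx[endpoint] = 0
        else:
            self.consecutive_5xx[endpoint] = 0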

Mutual TLS (mTLS)

Without Service Mesh:

# You need to implement mTLS in every service
import ssl

import httpx

ssl_context = ssl.create_default_context()
ssl_context.load_cert_chain(
    certfile="client-cert.pem",
    keyfile="client-key.pem",
)
ssl_context.load_verify_locations("ca-cert.pem")


async def call_payment_service():
    async with httpx.AsyncClient(verify=ssl_context) as client:
        response = await client.post(
            "https://payment-service:8443/process",
            json={"order_id": "123"},
        )
        return response

With Service Mesh:

# Just make normal HTTP calls
# Sidecar handles mTLS automatically!
import httpx


async def call_payment_service():
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://payment-service:8080/process",  # HTTP, not HTTPS!
            json={"order_id": "123"},
        )
        return response
# Traffic from sidecar to sidecar is encrypted automatically

Service mesh handles:

  • Certificate generation
  • Certificate distribution
  • Automatic rotation (every 24 hours)
  • No expired certificates!
Authorization Policies

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payment-service-policy
spec:
  selector:
    matchLabels:
      app: payment-service
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/order-service"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/process"]

Translation: Only the Order Service can call the Payment Service’s /process endpoint.
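Conceptually, the receiving sidecar performs an allow-list check on every inbound request, using the caller's verified mTLS identity. A sketch of the semantics (not Istio's actual policy engine):

# Conceptual evaluation of the AuthorizationPolicy above: check the
# caller's identity, the HTTP method, and the path against allow rules.
ALLOW_RULES = [
    {
        "principals": {"cluster.local/ns/default/sa/order-service"},
        "methods": {"POST"},
        "paths": {"/process"},
    }
]


def is_allowed(peer_principal: str, method: str, path: str) -> bool:
    return any(
        peer_principal in rule["principals"]
        and method in rule["methods"]
        and path in rule["paths"]
        for rule in ALLOW_RULES
    )


assert is_allowed("cluster.local/ns/default/sa/order-service", "POST", "/process")
assert not is_allowed("cluster.local/ns/default/sa/cart-service", "POST", "/process")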

Automatic Metrics

Without any code changes, service mesh collects:

  • Request rate: Requests per second
  • Error rate: Percentage of failed requests
  • Latency: P50, P95, P99 percentiles
  • Request volume: Total requests over time
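To make P50/P95/P99 concrete, here is a small sketch that computes those percentiles from raw latency samples (the samples are made up); this is the kind of aggregation the mesh performs for you on every service:

# Computing P50/P95/P99 from raw latency samples.
import random
import statistics

latencies_ms = [random.lognormvariate(2.0, 0.5) for _ in range(10_000)]
q = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
p50, p95, p99 = q[49], q[94], q[98]
print(f"P50={p50:.1f}ms  P95={p95:.1f}ms  P99={p99:.1f}ms")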

Distributed Tracing

Without Service Mesh:

  • Manually instrument every service
  • Add trace IDs to headers
  • Send spans to collector

With Service Mesh:

  • Automatic trace propagation
  • Automatic span creation
  • Just forward the trace headers!
# Minimal code required - just forward headers
import httpx
from fastapi import FastAPI, Request

app = FastAPI()


@app.post("/orders")
async def create_order(request: Request):
    # Extract trace headers so the mesh can stitch spans together
    trace_headers = {
        k: v for k, v in request.headers.items()
        if k.lower() in [
            "x-request-id",
            "x-b3-traceid",
            "x-b3-spanid",
            "x-b3-sampled",
        ]
    }
    # Forward to payment service
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://payment-service/process",
            headers=trace_headers,  # forward trace context
        )
    # Sidecar automatically creates spans and sends to Jaeger!
    return {"status": "created"}

Result: Full distributed traces without heavy instrumentation!


Service Discovery

No need for hard-coded service URLs!

# Instead of:
PAYMENT_SERVICE_URL = "http://payment-service-prod-123.us-west-2.elb.amazonaws.com:8080"
# Just use service name:
response = await client.post("http://payment-service/process")
# Service mesh resolves the actual endpoint automatically

LLD Connection: Decorator Pattern at Infrastructure Level


Service mesh is the Decorator Pattern applied to infrastructure!

import time
from dataclasses import dataclass


@dataclass
class PaymentResult:
    status: str


# Base component
class PaymentService:
    def process(self, order_id: str) -> PaymentResult:
        return self._charge_card(order_id)

    def _charge_card(self, order_id: str) -> PaymentResult:
        return PaymentResult(status="charged")


# Decorators add functionality
class LoggingDecorator(PaymentService):
    def __init__(self, wrapped: PaymentService):
        self._wrapped = wrapped

    def process(self, order_id: str) -> PaymentResult:
        print(f"Processing payment for {order_id}")
        result = self._wrapped.process(order_id)
        print(f"Payment result: {result.status}")
        return result


class RetryDecorator(PaymentService):
    def __init__(self, wrapped: PaymentService):
        self._wrapped = wrapped

    def process(self, order_id: str) -> PaymentResult:
        for attempt in range(3):
            try:
                return self._wrapped.process(order_id)
            except Exception:
                if attempt == 2:
                    raise
                time.sleep(1)


# Stack decorators: retries wrap logging, which wraps the real service
service = RetryDecorator(LoggingDecorator(PaymentService()))
print(service.process("order-123").status)

Service mesh applies the same pattern at the network level!

When to Use a Service Mesh

  1. Many Microservices (10+)

    • Managing infrastructure concerns manually becomes impossible
    • Need consistent policies across all services
  2. Zero-Trust Security Requirements

    • Need mutual TLS everywhere
    • Fine-grained authorization policies
    • Audit trail of all service-to-service communication
  3. Complex Routing Requirements

    • Canary deployments
    • A/B testing
    • Traffic mirroring
    • Gradual rollouts
  4. Observability is Critical

    • Need distributed tracing
    • Uniform metrics collection
    • Service dependency graphs
  5. Polyglot Architecture

    • Services in different languages
    • Can’t reimplement infrastructure in each language
When NOT to Use a Service Mesh

  1. Small Number of Services (< 5)

    • Overhead not justified
    • Libraries like Resilience4j (Java) or Tenacity (Python) are sufficient (see the sketch after this list)
  2. Simple Architecture

    • Direct service-to-service calls work fine
    • No complex routing needs
  3. Limited Kubernetes Experience

    • Service mesh adds operational complexity
    • Need solid Kubernetes foundation first
  4. Performance is Critical

    • Service mesh adds 5-15ms latency per hop
    • For ultra-low-latency systems, this matters
  5. Small Team

    • Learning curve is steep
    • Operational burden high
    • Focus on business features instead
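As referenced above, at small scale a retry library inside the application is usually enough. A minimal sketch using Tenacity (the endpoint and retry settings are illustrative):

# Library-level resilience with Tenacity instead of a mesh:
# the retry policy lives in the application, which is fine at small scale.
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential


@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=0.5, max=2))
def charge_payment(order_id: str) -> dict:
    response = httpx.post(
        "http://payment-service:8080/process",
        json={"order_id": order_id},
        timeout=2.0,
    )
    response.raise_for_status()  # raise on 4xx/5xx so Tenacity retries
    return response.json()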

Case Study: Lyft

Problem:

  • 100+ microservices
  • Multiple languages (Python, Go, Java)
  • Reimplementing infrastructure in each service

Solution:

  • Built the Envoy proxy (2016)
  • Open-sourced it (now part of CNCF)
  • Foundation for Istio, Consul Connect

Results:

  • Consistent observability
  • Simplified operations
  • Faster feature development

Case Study: Airbnb

Before Service Mesh:

  • Manual circuit breakers in each service
  • Inconsistent retry policies
  • Difficult to debug cascading failures

After Adopting Istio:

  • Uniform traffic policies
  • Better visibility into failures
  • Reduced incident response time by 40%

Key Learning:

“We spent 6 months migrating to service mesh. The operational simplicity we gained was worth every minute.” - Airbnb Engineering


Step 1: Install Istio
# Download Istio
curl -L https://istio.io/downloadIstio | sh -
# Install Istio on Kubernetes
istioctl install --set profile=demo -y
# Enable sidecar injection for namespace
kubectl label namespace default istio-injection=enabled

Step 2: Deploy Your Service (No Code Changes!)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
        version: v1
    spec:
      containers:
      - name: payment-service
        image: payment-service:v1
        ports:
        - containerPort: 8080

When deployed to a namespace with Istio injection enabled:

  • Istio automatically injects Envoy sidecar
  • No changes to your container!
Step 3: Shift Traffic Gradually

# Canary deployment: 90% to v1, 10% to v2
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: payment-service
spec:
  hosts:
  - payment-service
  http:
  - route:
    - destination:
        host: payment-service
        subset: v1
      weight: 90
    - destination:
        host: payment-service
        subset: v2
      weight: 10
Step 4: Enable mTLS

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: STRICT  # All traffic must be mTLS

Done! All service-to-service traffic is now encrypted.


Latency Overhead

| Scenario | Without Mesh | With Mesh | Overhead |
| --- | --- | --- | --- |
| Simple request | 5ms | 10ms | +5ms |
| With retries | 15ms | 20ms | +5ms |
| With circuit breaker | 5ms | 10ms | +5ms |
| mTLS handshake | - | 15ms | +15ms (once) |

Typical: 5-15ms added latency per service hop

Resource Overhead

| Resource | Per Sidecar |
| --- | --- |
| Memory | 50-100 MB |
| CPU | 0.1-0.5 cores |

For 100 services with 3 replicas each:

  • 300 sidecars × 75 MB = 22.5 GB memory
  • 300 sidecars × 0.3 cores = 90 CPU cores
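A quick back-of-envelope helper for this sizing, using the midpoints of the per-sidecar figures above:

# Back-of-envelope sidecar sizing from the table's midpoint figures.
def sidecar_overhead(services: int, replicas: int,
                     mb_each: float = 75, cores_each: float = 0.3):
    sidecars = services * replicas
    return sidecars, sidecars * mb_each / 1000, sidecars * cores_each


print(sidecar_overhead(100, 3))  # 300 sidecars, 22.5 GB, 90.0 cores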

Key Takeaways

Infrastructure as Code

Service mesh moves infrastructure concerns from code to configuration. Focus on business logic, not retries and timeouts.

Decorator at Scale

Service mesh is the Decorator pattern applied to network infrastructure. Add functionality without modifying services.

Not a Silver Bullet

Service mesh adds complexity and overhead. Only adopt when you have enough services (10+) to justify the cost.

Observability for Free

Automatic metrics, tracing, and logging across all services without instrumentation. This alone can justify adoption.



Further Reading

  • “Istio: Up and Running” by Lee Calcote
  • “Service Mesh Patterns” by Alex Soto Bueno
  • Envoy Proxy Documentation - Deep technical details
  • Istio Documentation - Official guides and tutorials
  • “The Service Mesh Era” by William Morgan (Linkerd creator)