
Latency and Throughput

Measuring what matters in system performance

When measuring system performance, two metrics matter most: latency and throughput.


Latency is the time between sending a request and receiving a response.

Every request goes through multiple stages:

Type                Description                           Typical Values
Network Latency     Time for data to travel over network  1-100ms
Processing Latency  Time for server to process request    1-50ms
Database Latency    Time for database queries             1-10ms
Queue Latency       Time spent waiting in queues          0-1000ms+
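
To see where a request's time actually goes, you can time each stage as it runs. Here's a minimal sketch in Python (the stage name and the 5ms sleep are illustrative stand-ins for real work):

import time
from contextlib import contextmanager

@contextmanager
def timed(stage: str):
    """Print how long the wrapped block took, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{stage}: {elapsed_ms:.1f}ms")

# Wrap each stage of a request to see its contribution
with timed("database query"):
    time.sleep(0.005)  # stand-in for a ~5ms query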

Average latency can be misleading because a few slow outliers skew the mean. Percentiles (P50, P95, P99) give a better picture of what users actually experience:

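Given a list of measured latencies, percentiles can be read straight off the sorted samples. A minimal nearest-rank sketch (the sample values are made up to show one outlier distorting the mean):

def percentile(latencies_ms, p):
    """Nearest-rank percentile: the value below which ~p% of samples fall."""
    ordered = sorted(latencies_ms)
    index = min(len(ordered) - 1, int(len(ordered) * p / 100))
    return ordered[index]

samples = [12, 15, 14, 13, 200, 16, 15, 14, 13, 12]  # one 200ms outlier
print(sum(samples) / len(samples))  # mean: 32.4ms - neither typical nor worst case
print(percentile(samples, 50))      # P50: 14ms - the typical experience
print(percentile(samples, 99))      # P99: 200ms - the outlier users actually feel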

Throughput is the amount of work done per unit of time.

Metric     Description                  Example
RPS        Requests per second          10,000 RPS
TPS        Transactions per second      5,000 TPS
QPS        Queries per second           50,000 QPS
Bandwidth  Data transferred per second  1 Gbps

Throughput is calculated as:

Throughput = Total Requests / Time Period

Example: If your server handles 10,000 requests in 10 seconds, your throughput is 1,000 RPS.

Key considerations:

  • Measure over time - instantaneous measurements fluctuate
  • Track success rate - failed requests count against effective throughput
  • Monitor under load - throughput often drops when the system is stressed
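
Tying the formula and these considerations together, here's a minimal sketch of an effective-throughput counter (all names are illustrative); it measures over a window and counts only successful requests:

import time

class ThroughputMeter:
    """Tracks effective throughput: successful requests per second."""

    def __init__(self):
        self.start = time.monotonic()
        self.successes = 0
        self.failures = 0

    def record(self, ok: bool):
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    def rps(self):
        # Throughput = Total (successful) Requests / Time Period
        elapsed = time.monotonic() - self.start
        return self.successes / elapsed if elapsed > 0 else 0.0

Recording 10,000 successes over a 10-second window yields rps() ≈ 1,000, matching the example above.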

Latency and throughput are related but independent: low latency doesn't guarantee high throughput, and a high-throughput system can still be slow per request (batching is a classic example).


A fundamental relationship, known as Little's Law:

Average Concurrent Requests = Throughput × Average Latency

Example:

  • Throughput: 1,000 RPS
  • Average Latency: 100ms = 0.1 seconds
  • Concurrent Requests: 1,000 × 0.1 = 100 requests in flight
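
The same arithmetic in code, transcribing the formula above directly:

def concurrent_requests(throughput_rps, avg_latency_s):
    """Little's Law: requests in flight = arrival rate × time in system."""
    return throughput_rps * avg_latency_s

print(concurrent_requests(1_000, 0.100))  # 100.0 requests in flight

This is handy for capacity planning: at 1,000 RPS and 100ms latency, your thread pools and connection pools need room for roughly 100 in-flight requests.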

Your code decisions directly affect latency and throughput:

sequential_bad.py
class OrderService:
    """❌ Sequential calls - high latency"""

    def get_order_details(self, order_id: str) -> OrderDetails:
        # Each call waits for the previous one to finish
        order = self.db.get_order(order_id)            # 10ms
        user = self.user_service.get(order.user_id)    # 20ms
        items = self.inventory.get_items(order.items)  # 15ms
        shipping = self.shipping.get_status(order_id)  # 25ms
        # Total: 10 + 20 + 15 + 25 = 70ms
        return OrderDetails(order, user, items, shipping)
parallel_good.py
import asyncio

class OrderService:
    """✅ Parallel calls - lower latency"""

    async def get_order_details(self, order_id: str) -> OrderDetails:
        # First, get the order (we need it for user_id)
        order = await self.db.get_order(order_id)  # 10ms
        # Then fetch everything else in parallel
        user, items, shipping = await asyncio.gather(
            self.user_service.get(order.user_id),   # 20ms
            self.inventory.get_items(order.items),  # 15ms
            self.shipping.get_status(order_id),     # 25ms
        )
        # Total: 10 + max(20, 15, 25) = 35ms - a 50% reduction
        return OrderDetails(order, user, items, shipping)
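
To verify the arithmetic, here's a self-contained sketch that simulates the four calls with asyncio.sleep (the delays and names are stand-ins, not real services):

import asyncio
import time

async def fetch(delay_ms, value):
    await asyncio.sleep(delay_ms / 1000)  # stand-in for network I/O
    return value

async def main():
    start = time.perf_counter()
    order = await fetch(10, "order")  # must run first: later calls need it
    user, items, shipping = await asyncio.gather(
        fetch(20, "user"),
        fetch(15, "items"),
        fetch(25, "shipping"),
    )
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"fetched in {elapsed_ms:.0f}ms")  # ~35ms, not 70ms

asyncio.run(main())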

Caching is the most effective way to reduce latency. The idea is simple: store frequently accessed data closer to where it’s needed.


Consider a multi-level setup: a small in-process cache (L1) backed by a shared Redis cache (L2), with the database as the final fallback:

Level       Latency  Hit Rate
L1 (Local)  0.01ms   90%
L2 (Redis)  2ms      9%
Database    30ms     1%

Average latency = (0.90 × 0.01ms) + (0.09 × 2ms) + (0.01 × 30ms) ≈ 0.49ms

That’s a 60x improvement over hitting the database every time!
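
A minimal sketch of that lookup order, assuming the redis-py client; the load_from_db callable and the TTL are illustrative (real code would also handle serialization and local-cache eviction):

import redis

class TwoLevelCache:
    def __init__(self, redis_client):
        self.local = {}            # L1: in-process dict, ~0.01ms
        self.redis = redis_client  # L2: shared cache, ~2ms

    def get(self, key, load_from_db):
        if key in self.local:          # L1 hit (~90% of lookups)
            return self.local[key]
        value = self.redis.get(key)    # L2 hit (~9%)
        if value is None:
            value = load_from_db(key)  # database, ~30ms (~1%)
            self.redis.set(key, value, ex=300)  # backfill L2 with a 5-minute TTL
        self.local[key] = value        # backfill L1
        return value

cache = TwoLevelCache(redis.Redis())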


Metric      What It Tells You        Action Threshold
P50         Typical user experience  Baseline for "normal"
P95         Most users' worst case   Watch for drift
P99         Outlier experience       Investigate if > 3x P50
Error Rate  System health            Alert if > 1%
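
Collecting those percentiles starts with instrumentation. A minimal sketch using the prometheus_client Python library (the metric name and bucket boundaries are illustrative):

from prometheus_client import Histogram, start_http_server

REQUEST_LATENCY = Histogram(
    "request_latency_seconds",
    "End-to-end request latency",
    buckets=(0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 1.0),
)

@REQUEST_LATENCY.time()  # records each call's duration into the histogram
def handle_request():
    ...  # your handler logic

start_http_server(8000)  # exposes /metrics; P50/P95/P99 come from histogram_quantile() in Prometheus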

In Production:

  • APM Tools: Datadog, New Relic, Dynatrace
  • Metrics: Prometheus + Grafana
  • Distributed Tracing: Jaeger, Zipkin

In Development:

  • Profilers: cProfile (Python), JProfiler (Java)
  • Benchmarking: pytest-benchmark, JMH
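
And a quick cProfile example for development (the profiled function is a made-up hot spot):

import cProfile
import pstats

def hot_spot():
    return sum(i * i for i in range(1_000_000))

profiler = cProfile.Profile()
profiler.enable()
hot_spot()
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time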


Now that you understand performance metrics, let’s learn how to identify and fix performance issues:

Next up: Understanding Bottlenecks - Learn to find and eliminate performance bottlenecks.