
Common Patterns

A field guide to the recurring building blocks of system design — the handful of problems that show up in almost every large system, and the standard ways to solve each one. Each pattern is framed as a problem, the main approaches, and when to reach for which.

Most system-design problems are combinations of a small number of recurring sub-problems. You rarely invent a new way to push live updates or scale reads — you recognize the shape of the problem and apply a known pattern, then reason about its trade-offs for your specific constraints. This page collects the most common of those patterns. Knowing them turns a blank-page design question into a matter of selection: name the sub-problem, pick the fitting pattern, and justify the choice.

Contents

  1. Pushing Real-Time Updates
  2. Managing Long-Running Tasks
  3. Dealing with Contention
  4. Scaling Reads
  5. Scaling Writes
  6. Handling Large Blobs
  7. Multi-Step Processes
  8. Proximity-Based Services
  9. Rate Limiting & Throttling
  10. Idempotency & Safe Retries
  11. Summary & Cheat Sheet

1. Pushing Real-Time Updates

Problem: the server has new information (a chat message, a price tick, a notification) and the client needs to see it promptly, without the user refreshing. HTTP is request-response, so the challenge is getting data to a client that did not just ask for it.

Figure: Real-time update approaches. Three approaches on a spectrum from simplest to most capable: short polling, server-sent events, and WebSockets.

| Approach | How it works | Use when |
| --- | --- | --- |
| Short polling | Client requests on a fixed timer; server answers with new data or nothing. | Updates are infrequent and a few seconds of lag is fine. Simplest to build. |
| Long polling | Client request is held open until there is data (or a timeout), then immediately re-issued. | You want near-instant delivery but must stay on plain HTTP / simple infra. |
| Server-sent events (SSE) | One long-lived HTTP stream the server pushes events down. One-way only. | Server-to-client streams: feeds, notifications, live dashboards. |
| WebSockets | A persistent, full-duplex TCP connection both sides can send on. | Truly interactive, low-latency, bidirectional traffic: chat, games, collaboration. |

At scale, the connection and the fan-out are separate concerns. A stateful gateway holds the many open connections, while a pub/sub layer (e.g. Redis, Kafka) decouples the publisher from the gateways so an event can reach every interested connection across the fleet.
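
The split between holding connections and fanning out events can be sketched in memory. This is a minimal illustration under stated assumptions, not a production design: `Broker` stands in for a pub/sub layer such as Redis or Kafka, `Gateway` for a stateful connection server, and each `send` callable for one open client connection. All of these names are hypothetical.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Stand-in for the pub/sub layer: routes a topic's events to subscribed gateways."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str, str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str, str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, event: str) -> None:
        for callback in self._subscribers[topic]:
            callback(topic, event)


class Gateway:
    """Stand-in for a stateful gateway: holds open connections grouped by topic."""

    def __init__(self, broker: Broker) -> None:
        self._broker = broker
        self._connections: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def connect(self, topic: str, send: Callable[[str], None]) -> None:
        if topic not in self._connections:
            # First local listener for this topic: subscribe this gateway once.
            self._broker.subscribe(topic, self._deliver)
        self._connections[topic].append(send)

    def _deliver(self, topic: str, event: str) -> None:
        for send in self._connections[topic]:
            send(event)


broker = Broker()
gw_a, gw_b = Gateway(broker), Gateway(broker)
alice_inbox, bob_inbox, carol_inbox = [], [], []
gw_a.connect("room:42", alice_inbox.append)
gw_a.connect("room:42", carol_inbox.append)   # second connection on the same gateway
gw_b.connect("room:42", bob_inbox.append)     # connection on a different gateway
broker.publish("room:42", "hello")            # the publisher never sees the gateways
```

The publisher only knows the topic name, so gateways can be added or removed without touching the code that produces events.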

2. Managing Long-Running Tasks

Problem: some work takes too long to finish inside a request — video transcoding, report generation, sending a million emails. Doing it synchronously ties up a server, risks timeouts, and gives the user a spinner for minutes.

Figure: Long-running task pattern. Accept the work, enqueue it, return immediately, and process it asynchronously on a worker pool while the client tracks status.

The pattern is accept-and-defer: the API validates the request, writes a job record, puts it on a queue, and returns 202 Accepted with a job id. A pool of workers pulls from the queue and processes jobs in the background, writing progress and results to a status store. The client learns the outcome by polling the job status or being notified (webhook / push) when it completes.
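
The accept-and-defer flow can be sketched with the standard library. This is a single-process illustration only: the in-memory `jobs` dict stands in for a durable status store, `queue.Queue` for a real message queue, and the upper-casing step is placeholder work. The names `submit`, `worker`, and `jobs` are hypothetical.

```python
import queue
import threading
import uuid

jobs: dict[str, dict] = {}                 # status store (a database table in practice)
work_queue: queue.Queue = queue.Queue()    # holds (job_id, payload) tuples


def submit(payload: str) -> tuple[int, str]:
    """API handler: record the job, enqueue it, and return 202 with the job id."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result": None}
    work_queue.put((job_id, payload))
    return 202, job_id


def worker() -> None:
    """Background worker: pull jobs and write status and results to the store."""
    while True:
        job_id, payload = work_queue.get()
        jobs[job_id]["status"] = "running"
        try:
            result = payload.upper()       # placeholder for the slow work
            jobs[job_id] = {"status": "done", "result": result}
        except Exception as exc:
            jobs[job_id] = {"status": "failed", "result": str(exc)}
        finally:
            work_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

status_code, job_id = submit("transcode video.mp4")
work_queue.join()                          # demo only: a real client would poll the status
```

In a real deployment the client would call a status endpoint with `job_id` (or receive a webhook) rather than block on the queue.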

3. Dealing with Contention

Problem: multiple clients try to read-modify-write the same resource at the same time — the last seat on a flight, a counter, an account balance. Without coordination they overwrite each other and you get lost updates or oversells.

Figure: Contention strategies. Pessimistic locking serializes access up front; optimistic concurrency lets everyone proceed and detects conflicts at write time.

| Strategy | How it works | Use when |
| --- | --- | --- |
| Pessimistic locking | Acquire a lock (row lock, SELECT ... FOR UPDATE) before reading; others wait. | Conflicts are frequent and retrying is expensive; short critical sections. |
| Optimistic concurrency (OCC) | Read a version, do the work, write only if the version is unchanged (compare-and-set). Loser retries. | Conflicts are rare; you want maximum concurrency and no held locks. |
| Atomic operations | Push the whole change into one atomic primitive (INCR, conditional update) the datastore serializes for you. | The update is a simple, expressible mutation (counters, sets). |
| Distributed lock | A lock held in an external store (Redis, ZooKeeper) coordinates across processes/machines. | The resource spans services and no single database can arbitrate. |

Distributed locks are deceptively hard: you need a lease/TTL so a crashed holder cannot block forever, and a fencing token so a slow holder cannot act after its lease expired. When you can, prefer a single atomic operation or OCC over a lock you have to manage.
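
Optimistic concurrency from the table above can be sketched as a versioned compare-and-set with a retry loop. This is a toy in-memory model: `VersionedStore` is a hypothetical class, and its internal lock only models the atomicity a real datastore would provide for a conditional write.

```python
import threading


class VersionedStore:
    """Toy key-value store with a per-key version; the lock models datastore atomicity."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[int, int]] = {}   # key -> (version, value)
        self._lock = threading.Lock()

    def read(self, key: str) -> tuple[int, int]:
        with self._lock:
            return self._data.get(key, (0, 0))

    def compare_and_set(self, key: str, expected_version: int, value: int) -> bool:
        with self._lock:
            version, _ = self._data.get(key, (0, 0))
            if version != expected_version:
                return False                          # someone else wrote first
            self._data[key] = (version + 1, value)
            return True


def increment(store: VersionedStore, key: str) -> None:
    while True:                                       # the loser re-reads and retries
        version, value = store.read(key)
        if store.compare_and_set(key, version, value + 1):
            return


store = VersionedStore()


def book_many() -> None:
    for _ in range(100):
        increment(store, "seats")


threads = [threading.Thread(target=book_many) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No writer ever holds a lock across its read-modify-write, yet none of the 800 increments is lost, because a stale write is rejected and retried.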

4. Scaling Reads

Problem: read traffic vastly outweighs writes (often 100:1 or more) and a single database can't serve it. The goal is to add read capacity and cut latency without compromising the write path.

Figure: Scaling reads. Layered defense for reads: a CDN at the edge, a cache tier in front of the database, and read replicas behind it.

| Technique | How it works | Cost / caveat |
| --- | --- | --- |
| Caching | Keep hot results in memory (cache-aside, with TTLs) so most reads never hit the database. | Invalidation and staleness; cold-cache and thundering-herd risk. |
| Read replicas | Replicate the primary to read-only copies and route reads to them. | Replication lag means replicas can serve slightly stale data. |
| CDN / edge | Serve cacheable responses from points of presence close to users. | Best for static or slowly-changing content; needs cache-control discipline. |
| Denormalization / materialized views | Precompute the shape the read needs so a query is a single lookup. | More write-time work and storage; views must be kept in sync. |

Decide your tolerance for staleness first — it dictates everything. If reads must be strictly current, replicas and caches don't help; if seconds of lag are fine, they let you scale almost arbitrarily.
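
Cache-aside with a TTL makes the staleness budget explicit: the TTL is the maximum age a reader may see. The sketch below is a single-process illustration; `CacheAside`, `load_from_db`, and the injectable `clock` are hypothetical names, and the dict stands in for a cache tier such as Redis.

```python
import time
from typing import Callable


class CacheAside:
    """Read-through helper: serve from memory until the entry's TTL expires."""

    def __init__(self, load: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}   # key -> (expires_at, value)

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]                                # hit: database untouched
        value = self._load(key)                            # miss or expired: read the database
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read reloads fresh data."""
        self._entries.pop(key, None)


db_reads: list[str] = []


def load_from_db(key: str) -> str:
    db_reads.append(key)                                   # count trips to the database
    return f"row:{key}"


now = [0.0]                                                # fake clock for the demo
cache = CacheAside(load_from_db, ttl_seconds=30, clock=lambda: now[0])
cache.get("user:1")
cache.get("user:1")                                        # served from the cache
now[0] = 31.0
cache.get("user:1")                                        # TTL expired: reloads
```

This sketch does nothing about thundering herds: if many requests miss at once, each one loads from the database.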

5. Scaling Writes

Problem: a single node can't absorb the write throughput — too many inserts per second, or a working set too large for one machine. Writes are harder to scale than reads because you can't just add read-only copies.

Figure: Scaling writes. The core move is partitioning: route each write to a shard by a key, so total write capacity grows with the number of shards.

| Technique | How it works | Use when |
| --- | --- | --- |
| Partitioning / sharding | Split data across shards by a key (hash or range); each shard takes a fraction of writes. | The dominant lever. Needs a shard key that spreads load evenly. |
| Batching & buffering | Accept writes into a queue/log and apply them in batches downstream. | Smooths spikes and turns many small writes into fewer large ones. |
| LSM-tree storage | Engines (Cassandra, RocksDB) that turn random writes into sequential appends. | Write-heavy workloads where sequential I/O is far cheaper. |
| Sharded / async counters | Split a hot counter into many sub-counters, or aggregate asynchronously. | A single hot row (likes, views) that would otherwise serialize all writes. |

The whole game is the shard key. A poor key creates hot shards — one partition taking most of the traffic (the "celebrity" problem) — which puts you right back to a single-node bottleneck. Choose a key with high cardinality and even access.
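
Two of the techniques above are small enough to sketch: hash-based shard routing and a sharded counter. This is an illustration under assumptions: `NUM_SHARDS`, `shard_for`, and `ShardedCounter` are hypothetical names, and the list of slots stands in for separate rows or partitions.

```python
import hashlib
import random

NUM_SHARDS = 8   # assumed fixed shard count for this sketch


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a key with a stable hash so every node picks the same shard.

    Python's built-in hash() is salted per process for strings, so it is
    not suitable for routing across machines.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedCounter:
    """Spread increments over several slots so no single row takes every write."""

    def __init__(self, num_slots: int = 16) -> None:
        self._slots = [0] * num_slots

    def increment(self) -> None:
        self._slots[random.randrange(len(self._slots))] += 1

    def value(self) -> int:
        return sum(self._slots)        # reads pay the cost of summing the slots


likes = ShardedCounter()
for _ in range(1000):
    likes.increment()

shards_used = {shard_for(f"user:{i}") for i in range(1000)}
```

Plain modulo routing moves most keys when the shard count changes; consistent hashing is a common way to reduce that movement.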

6. Handling Large Blobs

Problem: users upload and download large binary files — images, video, documents. Streaming gigabytes through your application servers and storing them in your primary database wrecks both.

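Figure: Handling large blobs. Keep the bytes out of your app: the client uploads directly to object storage with a presigned URL; the database holds only metadata.

The pattern separates bytes from metadata: the application issues a short-lived presigned URL, the client transfers the bytes directly to or from object storage (with a CDN in front for downloads), and the database stores only the metadata that points at the stored object.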

For very large uploads use multipart upload: the file is split into chunks uploaded in parallel and reassembled, which also makes failed uploads resumable instead of restarting from zero.
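
The idea behind a presigned URL is that the application signs a narrow permission (one method, one object, one expiry) and the storage service verifies the signature without calling back to the application. The sketch below uses a simplified HMAC scheme invented for illustration; it is not the signing algorithm of S3 or any real provider, and `SECRET`, `presign`, `verify`, and the `storage.example.com` host are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse

SECRET = b"demo-signing-key"   # assumption: shared by the app and the storage service only


def _signature(method: str, object_key: str, expires_at: int) -> str:
    message = f"{method}\n{object_key}\n{expires_at}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, object_key: str, ttl_seconds: int,
            now: Optional[float] = None) -> str:
    """App side: grant one method on one object until the expiry time."""
    issued_at = int(now if now is not None else time.time())
    expires_at = issued_at + ttl_seconds
    sig = _signature(method, object_key, expires_at)
    return f"https://storage.example.com/{object_key}?expires={expires_at}&sig={sig}"


def verify(method: str, object_key: str, expires_at: int, sig: str,
           now: Optional[float] = None) -> bool:
    """Storage side: accept only unexpired requests with a matching signature."""
    current = now if now is not None else time.time()
    if current > expires_at:
        return False
    return hmac.compare_digest(sig, _signature(method, object_key, expires_at))


url = presign("PUT", "uploads/cat.jpg", ttl_seconds=300, now=1_000)
query = parse_qs(urlparse(url).query)
expires_at, sig = int(query["expires"][0]), query["sig"][0]
```

Because the method and object key are part of the signed message, a URL issued for an upload cannot be reused to read or overwrite a different object.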

7. Multi-Step Processes

Problem: a business operation spans several services and can't be wrapped in one database transaction — placing an order needs to reserve inventory, charge payment, and create a shipment, each owned by a different service with its own database. If a later step fails, the earlier ones must be undone.

The obvious instinct is a distributed transaction — a two-phase commit (2PC) across all the databases so they commit or roll back together. In practice 2PC is avoided at scale: it holds locks across services for the duration of the transaction, the central coordinator is a single point of failure, and most modern datastores and message brokers don't support it. So instead of one big atomic transaction, we break the work into a chain of small local ones and accept eventual consistency. That chain is a saga.

What is a saga?

A saga is a sequence of local transactions, one per service. Each step commits independently in its own database, and each has a paired compensating transaction that semantically undoes it — not a literal rollback (the data is already committed), but a new action that reverses the effect: refund the charge, release the reservation, cancel the shipment. The saga runs the steps forward; if step N fails, it runs the compensations for steps N−1 … 1 in reverse, leaving the system in a consistent end state without ever holding a cross-service lock.

Figure: Saga happy path with compensations. Each forward step has a compensating undo. The happy path runs left to right; a failure triggers the compensations in reverse.

The term comes from a 1987 paper by Garcia-Molina and Salem describing how to handle "long-lived transactions" without holding locks for their entire duration — exactly the microservices problem decades early. There are two ways to coordinate the steps: orchestration and choreography.

Orchestration — a central coordinator

An orchestrator is a dedicated component that owns the workflow. It knows the full sequence, calls each service in turn (usually by sending a command and awaiting a reply), records progress in a durable saga log, and — when a step fails — issues the compensating commands in reverse. The services themselves stay dumb: they just expose "do this step" and "undo this step" operations.

Figure: Saga orchestration. The orchestrator drives every step, persists state to a saga log so it can resume after a crash, and triggers compensations on failure.

Implemented as a state machine, the orchestrator persists its position after every step so that, if it crashes, it resumes from exactly where it left off rather than restarting or double-charging:

class StepFailed(Exception):
    """Raised by a service call when its step cannot complete."""

def run_order_saga(order, saga_log, inventory, payment, shipping):
    # The log and the three service clients are passed in; their interfaces are assumed.
    saga_log.start(order.id)                      # persisted before any step
    try:
        res = inventory.reserve(order)            # step 1 (idempotent, keyed by order.id)
        saga_log.record("reserved", res)          # checkpoint after each step

        pay = payment.charge(order)               # step 2
        saga_log.record("charged", pay)

        ship = shipping.create(order)             # step 3
        saga_log.record("shipped", ship)

        return saga_log.complete()
    except StepFailed as failure:
        # run compensations in reverse for whatever already succeeded
        if saga_log.has("charged"):
            payment.refund(order)
        if saga_log.has("reserved"):
            inventory.release(order)
        return saga_log.fail(failure)

You rarely hand-roll this. A workflow engine provides the durable state, retries, timeouts, and resume-after-crash for you.

Choreography — services react to events

Choreography has no coordinator. Each service publishes an event when it finishes its local transaction; other services subscribe to the events relevant to them, do their step, and publish the next event. The end-to-end flow is an emergent chain of events rather than a script anyone owns.

Figure: Saga choreography. Services communicate only through an event bus: each consumes the previous event and emits the next. Compensation is itself just a failure event others react to.

# Inventory Service — reacts to the order event, emits the next event
on event OrderCreated(order):
  if reserve(order):                         # local transaction
    publish InventoryReserved(order)
  else:
    publish InventoryFailed(order)           # triggers no further steps

# Payment Service — reacts to the inventory event
on event InventoryReserved(order):
  if charge(order):
    publish PaymentCharged(order)
  else:
    publish PaymentFailed(order)             # compensation trigger ...

# Inventory compensates when it hears a downstream failure
on event PaymentFailed(order):
  release(order)                             # undo its own earlier step
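
The same event chain can be run end to end with a tiny in-memory bus. This is a sketch only: `EventBus` dispatches synchronously in one process, whereas a real broker is asynchronous and may redeliver; `CARD_DECLINED` is a hypothetical stand-in for a payment failure, and orders are reduced to an id string.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Synchronous in-memory stand-in for a broker; keeps a log of published events."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[str], None]]] = defaultdict(list)
        self.log: list[tuple[str, str]] = []

    def subscribe(self, event_type: str, handler: Callable[[str], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, order_id: str) -> None:
        self.log.append((event_type, order_id))
        for handler in list(self._handlers[event_type]):
            handler(order_id)


bus = EventBus()
reserved: set[str] = set()           # inventory service's local state
CARD_DECLINED = {"order-2"}          # hypothetical: orders whose charge will fail


def on_order_created(order_id: str) -> None:       # inventory service
    reserved.add(order_id)                         # local transaction
    bus.publish("InventoryReserved", order_id)


def on_inventory_reserved(order_id: str) -> None:  # payment service
    if order_id in CARD_DECLINED:
        bus.publish("PaymentFailed", order_id)     # compensation trigger
    else:
        bus.publish("PaymentCharged", order_id)


def on_payment_failed(order_id: str) -> None:      # inventory service again
    reserved.discard(order_id)                     # compensation; safe to repeat


bus.subscribe("OrderCreated", on_order_created)
bus.subscribe("InventoryReserved", on_inventory_reserved)
bus.subscribe("PaymentFailed", on_payment_failed)

bus.publish("OrderCreated", "order-1")
bus.publish("OrderCreated", "order-2")
```

The compensation uses `discard`, which is a no-op on repeat, so a redelivered `PaymentFailed` event does no harm.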

The event bus is the backbone. Apache Kafka is a common choice: a durable, replayable log of topics that services subscribe to. On AWS the equivalent is SNS + SQS (a fan-out topic feeding per-consumer queues) or EventBridge; RabbitMQ and NATS fill the same role elsewhere. Whatever the broker, consumers must handle redelivery, since these systems typically deliver at least once, which means every handler has to be idempotent.

Choosing between them

| Dimension | Orchestration | Choreography |
| --- | --- | --- |
| Where the logic lives | Centralized in the orchestrator: one place to read the whole flow. | Spread across services as event handlers: no single source of truth. |
| Coupling | Services coupled to the orchestrator, not to each other. | Loosely coupled; services only know events, not each other. |
| Observability | Easy: the saga log is the audit trail. | Harder: you reconstruct the flow by tracing events across services. |
| Failure handling | Compensation logic is explicit and ordered in one place. | Compensation is distributed; risk of event cycles and missed cases. |
| Best for | Complex flows, many steps, where visibility and control matter. | Simple, linear flows and high decoupling between teams. |

What every saga implementation must get right

A saga buys you eventual consistency, not atomicity. There are real windows where payment is charged but the shipment isn't created yet — so the UI and downstream readers must tolerate "in-progress" states. If you genuinely need all-or-nothing isolation across services, you need a different design (or to redraw the service boundaries so the transaction fits in one).

8. Proximity-Based Services

Problem: "find things near me" — nearby drivers, restaurants, friends. Latitude/longitude are two independent dimensions, so a plain B-tree index can filter one but not both efficiently, and scanning every point is hopeless at scale.

Figure: Proximity search. A spatial index buckets the world into cells so a "nearby" query reads only the cells around you, not the whole dataset.

The fix is a spatial index that maps two-dimensional location to a one-dimensional, locality-preserving key, so nearby points share key prefixes and land in the same or adjacent buckets. Common choices are geohashes, quadtrees, and S2 cells.

A query becomes: compute your cell, gather it and its neighbors, then filter those candidates by exact distance. You read a handful of buckets instead of the entire map. Many databases ship this built in (PostGIS, Elasticsearch geo_point, Redis geo commands).
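
The compute-cell, gather-neighbors, filter-by-distance query can be sketched with a fixed grid. This is a simplified stand-in for geohash or S2, under stated assumptions: `CELL_DEG` is an assumed cell size, the 3×3 neighborhood is only complete when the search radius is smaller than one cell, and wraparound at the poles and the antimeridian is ignored. `GridIndex` and its methods are hypothetical names.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01   # assumed cell size in degrees (roughly 1.1 km of latitude)


def cell_of(lat: float, lon: float) -> tuple[int, int]:
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a sphere with Earth's mean radius."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


class GridIndex:
    def __init__(self) -> None:
        self._cells: dict[tuple[int, int], list[tuple[str, float, float]]] = defaultdict(list)

    def insert(self, name: str, lat: float, lon: float) -> None:
        self._cells[cell_of(lat, lon)].append((name, lat, lon))

    def nearby(self, lat: float, lon: float, radius_km: float) -> list[str]:
        cx, cy = cell_of(lat, lon)
        hits = []
        for dx in (-1, 0, 1):                      # own cell plus its eight neighbors
            for dy in (-1, 0, 1):
                for name, plat, plon in self._cells.get((cx + dx, cy + dy), []):
                    if haversine_km(lat, lon, plat, plon) <= radius_km:
                        hits.append(name)          # exact-distance filter
        return sorted(hits)


index = GridIndex()
index.insert("cafe", 37.7752, -122.4190)           # a few dozen meters away
index.insert("edge", 37.7800, -122.4194)           # neighboring cell, just outside 0.5 km
index.insert("far", 37.8049, -122.4194)            # several cells away, never scanned
result = index.nearby(37.7749, -122.4194, radius_km=0.5)
```

The neighbor scan matters because a point just across a cell boundary can be closer than one inside your own cell.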

9. Rate Limiting & Throttling

Problem: you must cap how often a client can call you — to protect against abuse and runaway loops, to enforce API quotas, and to keep one noisy tenant from starving everyone else. (Not in the original list, but it pairs with nearly every other pattern here.)

Figure: Rate limiting token bucket. The token bucket: requests spend tokens that refill at a steady rate; an empty bucket means reject.

| Algorithm | How it works | Character |
| --- | --- | --- |
| Token bucket | Tokens refill at rate r up to a capacity; each request spends one. | Smooth average rate, allows bursts up to capacity. The common default. |
| Leaky bucket | Requests queue and drain at a fixed rate. | Enforces a strictly steady output rate; smooths bursts out. |
| Fixed window | Count requests per calendar window (e.g. per minute). | Simple, but allows 2× bursts straddling a window boundary. |
| Sliding window | A rolling count over the trailing interval. | Accurate, avoids the boundary spike; a bit more state to track. |

Reject with 429 Too Many Requests and a Retry-After header so well-behaved clients back off. In a distributed fleet the limiter state must be shared (e.g. counters in Redis) or each node enforces only its slice of the limit.
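
A token bucket fits in a few lines when tokens are refilled lazily on each call instead of by a background timer. This is a single-process sketch; `TokenBucket` and the injectable `clock` are hypothetical names, and a distributed limiter would keep the token count and timestamp in a shared store instead.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity            # start full so an initial burst is allowed
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Lazy refill: credit the tokens earned since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False                       # caller should answer 429 with Retry-After


now = [0.0]                                # fake clock for the demo
bucket = TokenBucket(rate=1.0, capacity=3, clock=lambda: now[0])
burst = [bucket.allow() for _ in range(4)]     # three tokens available, fourth call rejected
now[0] = 2.0                                   # two seconds later: two tokens refilled
later = [bucket.allow() for _ in range(3)]
```

Injecting the clock keeps the limiter deterministic under test; in production the default monotonic clock avoids jumps from wall-clock adjustments.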

10. Idempotency & Safe Retries

Problem: networks fail mid-request, so clients and queues retry — but retrying "charge the card" or "place the order" must not do it twice. You need an operation that has the same effect whether it runs once or many times. (Also an addition — it's the safety net under long-running tasks, queues, and sagas.)

Figure: Idempotency with keys. The client attaches an idempotency key; the server records the result per key, so a retry returns the stored result instead of repeating the work.

The standard mechanism is the idempotency key: the client generates a unique key per logical operation and sends it on every attempt. On first receipt the server does the work and records the outcome under that key; on any retry with the same key it returns the stored result instead of acting again.

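PENDING = object()                      # marker: first attempt still in flight
class RetryLater(Exception): pass       # raised while that attempt is running

def handle(request, key, store, do_work):
    # put_if_absent is an assumed atomic insert-if-absent: it claims the key
    # before any work, so concurrent retries cannot both run do_work.
    if not store.put_if_absent(key, PENDING):
        saved = store.get(key)
        if saved is PENDING:
            raise RetryLater(key)
        return saved                    # retry: replay the saved outcome
    result = do_work(request)           # first time only
    store.put(key, result)              # replace the marker with the outcome
    return result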

11. Summary & Cheat Sheet

When a design question appears, name the sub-problem and reach for its pattern:

| When you need to… | Reach for |
| --- | --- |
| Push data to clients live | Polling → SSE → WebSockets, with pub/sub fan-out behind a connection gateway. |
| Do slow work off the request path | Queue + worker pool + status store; return 202 and track the job. |
| Stop concurrent writers clobbering each other | Atomic op or optimistic concurrency first; pessimistic / distributed lock when conflicts are common. |
| Serve far more reads than one DB can | Cache → read replicas → CDN → denormalized views, sized to your staleness budget. |
| Absorb more writes than one node can | Partition by a well-chosen shard key; batch, use LSM storage, split hot counters. |
| Move big files around | Object storage + presigned direct transfer + metadata-only DB + CDN; multipart for huge uploads. |
| Coordinate steps across services | Saga with compensating actions; orchestrate or choreograph; expect eventual consistency. |
| Answer "what's near me" | Spatial index (geohash / quadtree / S2); query nearby cells, then filter by exact distance. |
| Cap request rates | Token bucket (or leaky / sliding window); reject with 429 + Retry-After. |
| Make retries safe | Idempotency keys + a result store; design for naturally idempotent operations. |

The meta-pattern: almost every one of these trades a little consistency or freshness for a lot of scale or availability. In an interview, the value isn't naming the pattern — it's stating that trade-off out loud and tying it to the requirements you were given.