
NGINX — Internal Architecture

A developer's guide to how NGINX actually works under the hood: the process model, the event-driven core, how a request flows through the processing phases, and how it proxies, load-balances, caches, and reloads without dropping a single connection.

NGINX is a high-performance web server, reverse proxy, load balancer, and HTTP cache. It was written to solve one specific problem: serving tens of thousands of simultaneous connections on a single machine without the memory and context-switch overhead of the thread-per-connection model that dominated when it was created. Almost every design decision flows from that goal. Instead of one operating-system thread per client, NGINX uses a small fixed number of worker processes, each running a single-threaded event loop that multiplexes thousands of non-blocking connections. Understanding NGINX means understanding that event-driven core and the way work is organized around it.

Contents

  1. Design Goals and the C10k Problem
  2. Process Architecture
  3. The Event-Driven Model
  4. Request Lifecycle
  5. Reverse Proxy and Upstreams
  6. Load Balancing
  7. Caching and Shared Memory
  8. Graceful Reload and Binary Upgrade
  9. Summary

1. Design Goals and the C10k Problem

NGINX was created to answer the C10k problem: how does a single server handle ten thousand concurrent connections? The traditional answer, one thread (or process) per connection, does not scale. Each thread reserves a stack of one to several megabytes, so ten thousand threads can tie up gigabytes of memory before doing any work. Worse, the kernel must constantly context-switch between them, and most of those threads are idle, blocked waiting on a slow client or a slow network. The cost grows with the number of connections, not the amount of useful work.

Threads vs event loop
Thread-per-connection memory grows with the number of connections; NGINX's event loop holds a few KB of state per connection, so memory stays low and predictable.

NGINX inverts this. A single worker thread asks the kernel "which of my thousands of sockets is ready to read or write?" and then services exactly those, never blocking on the rest. The result is the property the whole project is built around: throughput scales with hardware, while memory grows by only a few kilobytes per connection and stays predictable as connections climb.
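The memory argument can be made concrete with a little arithmetic. The sketch below assumes a 1 MiB stack per thread (the low end of the "one to several megabytes" above) and 4 KiB of state per connection (one reading of "a few KB"); both figures are illustrative assumptions, not measurements of any particular system.

```python
MIB = 1024 * 1024
KIB = 1024


def thread_per_connection_bytes(connections, stack_bytes=1 * MIB):
    """Memory reserved for thread stacks alone (assumed 1 MiB each)."""
    return connections * stack_bytes


def event_loop_bytes(connections, per_connection_bytes=4 * KIB):
    """Memory for per-connection state in an event loop (assumed 4 KiB each)."""
    return connections * per_connection_bytes


if __name__ == "__main__":
    n = 10_000
    print(f"threads:    {thread_per_connection_bytes(n) / MIB:,.0f} MiB")
    print(f"event loop: {event_loop_bytes(n) / MIB:,.1f} MiB")
```

Under these assumptions, ten thousand connections cost about 9.8 GiB of stack in the threaded model and about 39 MiB of connection state in the event-loop model, a factor of 256.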

Goal | How NGINX achieves it
High concurrency | One worker multiplexes thousands of connections via a non-blocking event loop, not one thread per connection.
Low, predictable memory | Per-connection state is a small struct (a few KB), not a thread stack. Memory tracks active work, not connection count.
Full use of multiple cores | A handful of worker processes, typically one per CPU core, each run their own event loop in parallel.
Operational stability | Config reloads and even binary upgrades happen with zero dropped connections.
Efficient as a proxy | Buffering, upstream keepalive, and on-disk caching shield slow clients from fast backends and vice versa.

2. Process Architecture

An NGINX instance is not one process but a small family. A single master process, typically started as root so it can bind privileged ports, reads and validates the configuration and binds the listening sockets (port 80, 443, and so on). It does not serve any client traffic itself. Instead it forks a configured number of worker processes; each worker inherits the already-bound listening sockets and switches to an unprivileged user. The workers are where all request handling happens.

Master and worker processes
The master parses config and owns the listening sockets; it forks N workers (typically one per CPU core) that share those sockets and run independent event loops.

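The division of labor is deliberate: the master does the privileged and supervisory work (parsing configuration, binding sockets, starting and restarting workers, handling signals), while the workers do nothing but serve client traffic.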

Why workers ≈ CPU cores

Because each worker is single-threaded and never blocks, a single worker can keep one CPU core fully busy. Running one worker per core (worker_processes auto;) means all cores do useful work with no two workers fighting over the same core. Adding more workers than cores just adds scheduling overhead without adding throughput.

Sharing the listening sockets

All workers inherit the same listening sockets from the master, so any worker can accept any new connection. By default the kernel may wake several waiting workers when a connection arrives (the "thundering herd"), and NGINX historically used an accept_mutex so that only one worker accepts at a time. The modern alternative is SO_REUSEPORT, enabled with the reuseport parameter of the listen directive: each worker gets its own listening socket on the same port and the kernel distributes incoming connections across them, eliminating the herd and spreading accepts across workers.
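The SO_REUSEPORT behavior is easy to observe from user space. The sketch below is a minimal Python illustration, not NGINX code: the helper name open_reuseport_listener is hypothetical, and it assumes a platform that exposes socket.SO_REUSEPORT (Linux and the BSDs do; some platforms do not).

```python
import socket


def open_reuseport_listener(host, port, backlog=128):
    """Open a listening TCP socket that other sockets may share the port with."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set on every socket, before bind(), for the sharing to be allowed.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


if __name__ == "__main__":
    first = open_reuseport_listener("127.0.0.1", 0)    # port 0: OS picks a free port
    port = first.getsockname()[1]
    second = open_reuseport_listener("127.0.0.1", port)  # same port, no "address in use"
    print("two listeners share port", port)
    first.close()
    second.close()
```

Without the option, the second bind() would fail with "address already in use". With it, the kernel keeps a separate accept queue per socket, which is what lets each worker accept without contending with its siblings.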

master:
  parse(nginx.conf)
  for addr in listen_directives:
    sock = bind(addr); listen(sock)     # sockets owned by master
  for i in range(worker_processes):     # typically = CPU cores
    pid = fork()
    if pid == 0:
      drop_privileges()
      run_event_loop(inherited_sockets) # worker never returns
  supervise(children)                    # restart on crash, handle signals
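The pseudocode above can be turned into a small runnable demonstration of the prefork pattern. This is a sketch for Unix-like systems only (it relies on os.fork), and the names worker_loop and prefork_demo are hypothetical. Each worker replies with its own PID so the caller can see which process served each connection.

```python
import os
import signal
import socket


def worker_loop(listener):
    """Child process: accept connections forever and reply with our PID."""
    while True:
        conn, _addr = listener.accept()
        with conn:
            conn.sendall(str(os.getpid()).encode())


def prefork_demo(num_workers=2, num_requests=6):
    # Master: bind and listen once, before forking, so children inherit the socket.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    listener.listen(128)
    address = listener.getsockname()

    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:                       # child: becomes a worker
            try:
                worker_loop(listener)
            finally:
                os._exit(0)                # never fall back into master code
        worker_pids.append(pid)

    # Act as a client to show that the workers, not the master, answer.
    served_by = []
    for _ in range(num_requests):
        with socket.create_connection(address, timeout=5) as client:
            served_by.append(int(client.recv(64).decode()))

    # Supervise: stop and reap the workers. SIGKILL keeps the demo simple;
    # it is not how NGINX retires workers.
    for pid in worker_pids:
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
    listener.close()
    return worker_pids, served_by


if __name__ == "__main__":
    pids, served = prefork_demo()
    print("workers:", pids, "served by:", served)
```

The master never calls accept(); it only owns the socket and supervises. Which worker answers a given connection is up to the kernel, so the distribution in served_by will vary from run to run.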

3. The Event-Driven Model

Inside each worker is the heart of NGINX: a single-threaded event loop built on the operating system's scalable I/O readiness API — epoll on Linux, kqueue on BSD/macOS, and others elsewhere. Every socket is set to non-blocking mode. Rather than calling read() and waiting, the worker registers all its sockets with epoll and makes a single call — epoll_wait() — that blocks until any of those thousands of file descriptors is ready. The kernel returns just the ready ones, and the worker handles each, then loops.

The event loop
One worker thread monitors thousands of sockets; epoll returns only the ready descriptors, and handlers must never block — a single blocking call would freeze every connection on that worker.

This is the opposite of the thread-per-connection model. There, each connection has a dedicated thread that calls a blocking read(); the kernel parks that thread until data arrives. With ten thousand connections you need ten thousand parked threads. With the event loop, one thread asks "who is ready?" and services only those — so the number of threads is tied to the number of cores, not the number of connections.

The cost of this model is a strict discipline: nothing in a handler may block. A blocking disk read, a synchronous DNS lookup, or a slow database call would freeze the entire worker and every connection it is serving, not just the one that issued it. NGINX therefore uses non-blocking I/O everywhere it can, and for the unavoidable blocking operations (notably reading large files from disk) it can be configured to offload them to a small thread pool (aio threads) so the event loop stays responsive.

run_event_loop(sockets):
  epoll = epoll_create()
  for s in sockets: epoll.add(s, READABLE)
  while True:
    events = epoll.wait(timeout = next_timer())   # blocks once, on all fds
    for ev in events:
      conn = ev.connection
      handler = conn.current_handler              # read / write / proxy ...
      handler(conn)        # MUST be non-blocking; returns quickly
    run_expired_timers()   # keepalive, proxy timeouts, etc.

A connection is just a small state machine plus a buffer. When it has nothing to do (waiting on a slow client or backend) it costs almost nothing: it sits in epoll until the kernel says it is ready again. That is why idle connections are cheap and why memory tracks active work, not connection count.
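The same loop can be written as runnable Python with the standard selectors module, which wraps epoll on Linux and kqueue on BSD/macOS. This is a minimal sketch, not NGINX's implementation: the handler names are hypothetical, the "protocol" just upper-cases whatever it receives, and a real server would also buffer partial writes and manage timers.

```python
import selectors
import socket


def make_listener():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    sock.listen(128)
    sock.setblocking(False)              # accept() must never block the loop
    return sock


def on_accept(sel, listener):
    conn, _addr = listener.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, on_readable)


def on_readable(sel, conn):
    data = conn.recv(4096)               # socket is ready, so this returns at once
    if data:
        conn.send(data.upper())          # tiny reply; assumes it fits the send buffer
    else:                                # empty read: peer closed the connection
        sel.unregister(conn)
        conn.close()


def run_event_loop(sel, max_iterations, timeout=0.1):
    """Wait on all registered sockets at once and dispatch the ready ones."""
    for _ in range(max_iterations):
        for key, _mask in sel.select(timeout):
            handler = key.data           # the callback stored at register() time
            handler(sel, key.fileobj)


if __name__ == "__main__":
    sel = selectors.DefaultSelector()
    listener = make_listener()
    sel.register(listener, selectors.EVENT_READ, on_accept)
    print("listening on", listener.getsockname())
    run_event_loop(sel, max_iterations=100)
```

The only place the loop waits is sel.select(). Each handler does a bounded amount of work on a socket that is already known to be ready, which is the discipline the section describes.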

4. Request Lifecycle

When a connection is accepted, the worker reads and parses the request line and headers, then runs the request through an ordered series of processing phases. NGINX's modular architecture hangs handler modules off these phases; the phase order is what makes configuration directives composable and predictable. A request always flows through the phases in the same sequence.

Request phases
After accept and header parsing, the request passes through ordered phases: server selection, location matching, rewrite, access/auth, content generation, and logging.

The phases, in order, do roughly the following:

Phase | What happens
Accept & read | Accept the connection, read the request line and headers, parse them into a request structure.
Server selection | Match the Host header against server_name to pick the virtual server (server { } block).
Find location | Match the request URI against the location blocks to select the configuration that applies.
Rewrite | Apply rewrite / return rules; URIs may be modified and re-matched.
Access / auth | Enforce allow/deny, authentication (auth_basic, auth_request), and rate limits.
Content | Generate the response: serve a static file, run a handler, or proxy to an upstream backend.
Log | Write the access log line after the response is sent.

For static content the content phase reads a file from disk (or the page cache); for dynamic content it proxies the request to a backend, which is the subject of the next section. The response then travels back out through any configured output filters (gzip, headers, chunked encoding) before being written to the client.

handle_request(conn):
  req = parse_request_line_and_headers(conn)
  server   = match_server_name(req.host)          # server { }
  location = match_location(server, req.uri)       # location { }
  apply_rewrites(req, location)                    # may re-loop
  if not access_allowed(req, location):
    return respond(req, 403)
  resp = run_content_phase(req, location)          # file or proxy_pass
  resp = run_output_filters(resp)                  # gzip, headers ...
  send(conn, resp)
  write_access_log(req, resp)
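The phase idea, an ordered list of handlers where any handler may finalize the request, can be sketched in a few lines of Python. Everything here is a hypothetical simplification for illustration: the Request class, the DENIED and STATIC_FILES tables, and the three handlers stand in for NGINX's much richer phase engine.

```python
from dataclasses import dataclass, field

DENIED = {"203.0.113.9"}                 # hypothetical deny list
STATIC_FILES = {"/new", "/index.html"}   # hypothetical set of files on disk


@dataclass
class Request:
    host: str
    uri: str
    client_ip: str
    trace: list = field(default_factory=list)   # phases this request went through


def rewrite_phase(req):
    if req.uri == "/old":
        req.uri = "/new"                 # rewrite the URI, then keep going
    return None


def access_phase(req):
    return 403 if req.client_ip in DENIED else None


def content_phase(req):
    return 200 if req.uri in STATIC_FILES else 404


PHASES = [("rewrite", rewrite_phase), ("access", access_phase), ("content", content_phase)]


def handle(req):
    status = None
    for name, handler in PHASES:         # always the same order
        req.trace.append(name)
        status = handler(req)
        if status is not None:           # a phase finalized the request
            break
    req.trace.append("log")              # the log phase runs for every request
    return status
```

For example, handle(Request("example.com", "/old", "198.51.100.7")) is rewritten to /new and answered by the content phase, while a request from a denied address stops at the access phase and never reaches content. Both are still logged.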

5. Reverse Proxy and Upstreams

NGINX's most common production role is as a reverse proxy sitting in front of application servers. The proxy_pass directive forwards a request to an upstream — a named group of backend servers. NGINX opens (or reuses) a connection to a backend, forwards the request, reads the response, and relays it to the client.

Reverse proxy and upstreams
NGINX proxies to a pool of backends, keeps a keepalive connection pool to them, and buffers responses so a slow client never ties up an app server.

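Two mechanisms make this efficient. Upstream keepalive holds a pool of idle connections to the backends, so most requests reuse an open connection instead of paying for a new TCP handshake. Response buffering lets NGINX read the backend's response as fast as the backend can send it and then feed it to the client at the client's own pace, so a slow client never ties up an app server.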

upstream backend {
  server 10.0.0.1:8080;
  server 10.0.0.2:8080;
  keepalive 32;                 # keep up to 32 idle conns per worker
}

server {
  location / {
    proxy_pass http://backend;            # forward to the pool
    proxy_http_version 1.1;                # upstream keepalive needs HTTP/1.1
    proxy_set_header Connection "";        # and no "Connection: close" header
    proxy_buffering on;                    # shield backend from slow clients
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $remote_addr;
  }
}
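What the keepalive directive buys can be shown with a toy connection pool. This is a sketch of the general idle-pool technique, not NGINX's code: KeepalivePool and FakeConnection are hypothetical names, and the eviction rule (close the least recently used idle connection when the cap is exceeded) follows the documented behavior of keepalive only at a high level.

```python
from collections import deque


class FakeConnection:
    """Stand-in for a TCP connection to a backend (hypothetical, for the demo)."""

    def __init__(self, ident):
        self.ident = ident
        self.closed = False

    def close(self):
        self.closed = True


class KeepalivePool:
    def __init__(self, connect, max_idle):
        self._connect = connect      # factory that opens a new backend connection
        self._max_idle = max_idle    # plays the role of `keepalive 32;`
        self._idle = deque()         # most recently released connection on the right

    def acquire(self):
        if self._idle:
            return self._idle.pop()  # reuse: no new TCP handshake
        return self._connect()

    def release(self, conn):
        self._idle.append(conn)
        if len(self._idle) > self._max_idle:
            self._idle.popleft().close()   # evict the least recently used idle conn
```

With max_idle=1, releasing two connections closes the older one and keeps the newer one for the next request. The cap bounds idle connections only, so it does not limit how many connections can be open at once under load.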

6. Load Balancing

When an upstream block lists several servers, NGINX load-balances requests across them. The balancing method is chosen per upstream, and NGINX also tracks backend health so it can route around failures.

Load balancing methods
NGINX distributes requests across healthy backends by the configured method, and a passive health check marks a failing peer down after repeated errors.

Method | How it picks a backend | Use when
Round robin (default) | Cycles through servers in order, optionally weighted. | Backends are interchangeable and stateless.
least_conn | Sends each request to the server with the fewest active connections. | Request durations vary, so even rotation would imbalance load.
ip_hash | Hashes the client IP (for IPv4, the first three octets) so a client always maps to the same backend. | You need session stickiness without shared session storage.
hash $key | Hashes an arbitrary key (URI, header, cookie) to a backend, optionally consistent. | Stickiness by something other than IP, e.g. cache affinity by URL.
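For the weighted variant of round robin, one well-known technique is "smooth" weighted round robin, which interleaves heavy and light servers instead of sending a burst to the heaviest one. The sketch below implements that technique. The function name and the example weights are illustrative assumptions, and it is not a transcription of NGINX's source.

```python
def smooth_weighted_round_robin(weights, count):
    """Return `count` picks from `weights` (name -> integer weight)."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(count):
        # Every server earns credit in proportion to its weight...
        for name, weight in weights.items():
            current[name] += weight
        # ...the server with the most credit is chosen (first one wins ties)...
        best = max(current, key=current.get)
        # ...and pays back the total, which pushes it down the order.
        current[best] -= total
        picks.append(best)
    return picks


if __name__ == "__main__":
    print(smooth_weighted_round_robin({"a": 5, "b": 1, "c": 1}, 7))
```

With weights 5, 1, 1 the seven picks are a, a, b, a, c, a, a: server a still gets five of every seven requests, but b and c are spread through the cycle instead of waiting behind five consecutive requests to a.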

Passive health checks come built in: with max_fails and fail_timeout, if a backend produces too many failed responses within the window, NGINX marks it down, stops sending it traffic, and retries it only after the timeout. (NGINX Plus adds active health checks that probe a dedicated health endpoint on a schedule rather than waiting for real requests to fail.)

pick_backend(upstream, req):
  candidates = [s for s in upstream.servers if s.is_up()]
  if upstream.method == "least_conn":
    return min(candidates, key = lambda s: s.active_conns)
  if upstream.method == "ip_hash":
    return candidates[hash(req.client_ip) % len(candidates)]
  return round_robin_next(candidates)        # default, weighted

# passive health: on repeated errors, take the peer out of rotation
on_response_error(server):
  server.fails += 1
  if server.fails >= max_fails:
    server.mark_down(for = fail_timeout)
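A runnable version of the passive health check needs one detail the pseudocode glosses over: failures only count if they fall within the fail_timeout window. The sketch below takes the current time as a parameter so the behavior is deterministic. The Peer class and pick_least_conn are hypothetical names, and the defaults mirror the documented max_fails=1 and fail_timeout=10s.

```python
class Peer:
    def __init__(self, name, max_fails=1, fail_timeout=10.0):
        self.name = name
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.active_conns = 0
        self.fails = 0
        self.window_start = 0.0      # when the current failure window began
        self.down_until = 0.0        # peer is skipped until this time

    def is_up(self, now):
        return now >= self.down_until

    def on_failure(self, now):
        if now - self.window_start > self.fail_timeout:
            self.fails = 0           # old failures have aged out of the window
            self.window_start = now
        self.fails += 1
        if self.fails >= self.max_fails:
            self.down_until = now + self.fail_timeout   # out of rotation
            self.fails = 0


def pick_least_conn(peers, now):
    """least_conn among the peers that are currently considered up."""
    candidates = [p for p in peers if p.is_up(now)]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.active_conns)
```

Because the check is passive, a peer is only discovered to be bad by real requests failing, and it is tried again automatically once fail_timeout has elapsed.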

7. Caching and Shared Memory

NGINX can cache upstream responses on local disk so repeat requests are served without touching the backend at all. A proxy_cache_path declares a directory on disk plus a shared memory zone (keys_zone) that holds the cache index — the set of keys and their metadata — so every worker can check for a hit without scanning the filesystem.

Caching and shared memory
The cache key is hashed to a file path; a hit is served from disk, a miss is fetched from the backend and stored. Shared memory zones let all workers see one cache index and other cross-worker state.

On each request NGINX computes a cache key (by default $scheme$proxy_host$request_uri), hashes it to locate the cached file, and either serves the stored response (a HIT) or fetches it from the backend, stores it, and serves it (a MISS). Cache freshness is governed by proxy_cache_valid and the backend's Cache-Control headers.
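The key-to-path step can be sketched as follows. The file name is the MD5 hex digest of the cache key; when proxy_cache_path is given a levels parameter (assumed here to be levels=1:2, which the configuration in this section does not set), subdirectory names are taken from the end of the digest. The function names and the example key are illustrative assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def level_dirs(digest, levels=(1, 2)):
    """Directory names taken from the END of the digest, one per level."""
    dirs = []
    end = len(digest)
    for width in levels:
        dirs.append(digest[end - width:end])
        end -= width
    return dirs


def cache_file_path(cache_dir, key, levels=(1, 2)):
    # usedforsecurity=False (Python 3.9+): MD5 is used as a plain hash, not for security.
    digest = hashlib.md5(key.encode(), usedforsecurity=False).hexdigest()
    return str(PurePosixPath(cache_dir, *level_dirs(digest, levels), digest))


if __name__ == "__main__":
    # Hypothetical key, i.e. $scheme$proxy_host$request_uri after variable expansion.
    print(cache_file_path("/var/cache/nginx", "httpbackend/index.html"))
```

For a digest of b7f54b2df7773722d382f4809d65029c, level_dirs returns c and then 29, so the file lands at c/29/b7f54b2df7773722d382f4809d65029c under the cache directory. Splitting into levels keeps any single directory from accumulating a very large number of files.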

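The deeper idea here is the shared memory zone. Because workers are separate processes, anything they must agree on cannot live in one worker's private heap. NGINX places such state in mmap-backed shared memory that all workers map and update under a lock. This is used for far more than the cache index: rate-limit and connection-limit counters (limit_req_zone, limit_conn_zone), upstream health state (when the upstream block declares a zone), and the shared TLS session cache (ssl_session_cache) all live in shared memory zones.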

proxy_cache_path /var/cache/nginx
    keys_zone=mycache:10m          # 10 MB shared zone for the index
    max_size=10g inactive=60m;     # disk cap + idle eviction

location / {
  proxy_cache mycache;
  proxy_cache_key $scheme$host$request_uri;     # -> hashed to a file path
  proxy_cache_valid 200 10m;                     # cache 200s for 10 min
  proxy_pass http://backend;
}
# the cache index lives in shared memory so every worker sees the same hits
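The "shared memory plus a lock" idea can be demonstrated with separate processes updating one counter, much as workers would update a shared rate-limit or hit counter. This is a Unix-only sketch (it uses os.fork), the function name is hypothetical, and it relies on multiprocessing.Value, which allocates the integer in shared memory and pairs it with a cross-process lock.

```python
import multiprocessing
import os


def count_across_workers(num_workers, hits_per_worker):
    # A 64-bit integer in shared memory, with an associated lock.
    counter = multiprocessing.Value("q", 0)

    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:                         # child: a "worker"
            try:
                for _ in range(hits_per_worker):
                    with counter.get_lock():     # read-modify-write must be atomic
                        counter.value += 1
            finally:
                os._exit(0)                  # never return into the parent's code
        pids.append(pid)

    for pid in pids:                         # parent: wait for every worker
        os.waitpid(pid, 0)
    return counter.value


if __name__ == "__main__":
    print(count_across_workers(num_workers=4, hits_per_worker=1000))
```

An ordinary Python integer would be copied into each child at fork time and the parent would still see 0. Putting the value in shared memory is what makes the workers' updates visible to each other, and the lock is what keeps concurrent increments from being lost.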

8. Graceful Reload and Binary Upgrade

NGINX is controlled at runtime through Unix signals sent to the master process, and its signature operational feature is changing configuration — or even the running binary — without dropping a single connection. This is possible precisely because the master, not the workers, owns the listening sockets.

Graceful reload and binary upgrade
On reload (HUP) the master forks new workers with the new config and tells old workers to drain and exit; binary upgrade (USR2) starts a new master sharing the same sockets.

Graceful reload (SIGHUP)

When the master receives SIGHUP, it re-reads and validates the configuration. If the new config is valid, it starts a fresh set of worker processes using it, then signals the old workers to gracefully shut down (SIGQUIT). Old workers immediately stop accepting new connections but keep serving their in-flight requests to completion; once their last request finishes, they exit. New connections go to the new workers and in-flight requests on old workers complete, so no request in progress is cut off. (Idle keepalive connections on old workers are simply closed, and clients reconnect to the new workers.) If the new config is invalid, the master logs the error and keeps the old workers running unchanged.

Binary upgrade (SIGUSR2)

Upgrading the NGINX executable itself works the same way, one level up. On SIGUSR2 the running master forks and execs the new binary as a second master, passing it the inherited listening socket descriptors. Both masters and both sets of workers now run side by side, sharing the same listening sockets and serving traffic together. You verify the new version is healthy, then send the old master SIGQUIT to retire it gracefully — or, if something is wrong, send SIGTERM/SIGQUIT to the new one and roll back to the old, all without interrupting service.

Signal | Sent to | Effect
HUP | master | Reload config: validate, fork new workers, gracefully retire old ones.
USR2 | master | Binary upgrade: start a new master/binary sharing the listening sockets.
QUIT | master or worker | Graceful shutdown: stop accepting, finish in-flight requests, then exit.
TERM / INT | master | Fast shutdown: terminate workers immediately.
WINCH | master | Gracefully shut down workers but keep the master (used during upgrades).

on SIGHUP (graceful reload):
  new_cfg = parse(nginx.conf)
  if not valid(new_cfg):
    log_error(); keep_old_workers(); return       # no disruption
  start_workers(new_cfg)                           # new conns -> new workers
  for w in old_workers:
    signal(w, QUIT)                                # stop accepting
    # w drains in-flight requests, then exits — zero drops

on SIGUSR2 (binary upgrade):
  new_master = fork(); exec(new_binary, inherited_sockets)
  # old + new run together; both share the listening sockets
  if healthy(new_master): signal(old_master, QUIT) # finish upgrade
  else:                   signal(new_master, QUIT) # roll back
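The reload sequence is essentially a small state machine, which the following in-memory simulation makes explicit. Nothing here touches real processes or signals: Worker, Master, and the is_valid callback are hypothetical stand-ins used only to show the ordering (validate, start new workers, then drain the old ones).

```python
class Worker:
    def __init__(self, generation):
        self.generation = generation
        self.accepting = True
        self.in_flight = 0
        self.exited = False

    def accept(self):
        if not self.accepting:
            raise RuntimeError("draining worker must not accept")
        self.in_flight += 1

    def finish_request(self):
        self.in_flight -= 1
        self._exit_if_drained()

    def quit(self):                      # what QUIT means for a worker
        self.accepting = False
        self._exit_if_drained()

    def _exit_if_drained(self):
        if not self.accepting and self.in_flight == 0:
            self.exited = True


class Master:
    def __init__(self, config, num_workers):
        self.config = config
        self.generation = 0
        self.workers = [Worker(0) for _ in range(num_workers)]

    def reload(self, new_config, is_valid):
        if not is_valid(new_config):
            return False                 # keep old config and old workers untouched
        old_workers = self.workers
        self.generation += 1
        self.config = new_config
        # Start the new generation first, then retire the old one.
        self.workers = [Worker(self.generation) for _ in old_workers]
        for worker in old_workers:
            worker.quit()
        return True
```

An old worker with a request in flight survives the reload in a draining state and exits only when that request finishes, while a rejected configuration leaves everything as it was.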

The master keeps the listening sockets open the entire time, so there is never a moment when the port is unbound. That single fact (sockets owned by a stable supervisor, served by replaceable workers) is what makes both zero-downtime reloads and live binary upgrades possible.

9. Summary

NGINX is a small set of ideas applied consistently:

Concern | Mechanism
How does it handle so many connections? | A non-blocking event loop (epoll/kqueue) multiplexing thousands of sockets per worker.
Why is memory low and predictable? | Per-connection state is a small struct, not a thread stack; memory tracks active work.
How does it use all the cores? | One master plus N single-threaded workers, typically one per CPU core, sharing the listening sockets.
How is a request processed? | Accept and parse, then ordered phases: server selection, rewrite, access/auth, content, log.
How does it proxy efficiently? | Response buffering plus upstream keepalive decouple slow clients from fast backends.
How does it balance and stay healthy? | Round robin / least_conn / ip_hash / hash, with passive health checks taking bad peers out of rotation.
How does state cross workers? | Shared memory zones: cache index, rate limits, upstream health, TLS sessions.
How does it update with no downtime? | Signals (HUP, USR2, QUIT) reload config or swap the binary while the master holds the sockets.

The recurring theme: a stable master owns the sockets, single-threaded non-blocking workers do the work, and anything shared lives in explicit shared memory. Nothing blocks, nothing is duplicated needlessly, and nothing has to stop to reconfigure.