Designing an Ad Auction (RTB)

A system design interview guide to building a real-time bidding system that, in the few dozen milliseconds it takes a page to load, asks many advertisers what they will pay for one ad slot, runs an auction, and serves the winning ad.

Real-time bidding is one of the most latency-sensitive systems you will ever be asked to design. Every time a user opens a page or app with an ad slot, an auction has to happen before that slot can be filled, and it has to finish fast enough that the user never notices. The system fans a single request out to many independent bidders, waits only as long as a strict budget allows, picks a winner, charges the right price, and serves the ad — typically inside a window of about a hundred milliseconds. The interesting engineering is not "show an ad," it is doing a distributed, money-moving auction under a hard deadline, at enormous request volume, while spending each advertiser's budget smoothly and never showing the same person the same ad too many times. This guide builds that design up from the moment a slot opens.

The Request Flow
Auction Mechanics
The Latency Budget
Fan-Out and Gather
Budget Pacing
Frequency Capping
Tracking for Billing
Scale and Parallelism
Summary

1. The Request Flow

The whole system is organized around a single short-lived event: an ad slot becoming available. A user loads a page, a slot on that page needs an ad, and the publisher's side sends a bid request to your ad exchange. From that instant a clock starts ticking, and everything that follows has to complete before it runs out.

Real-time bidding request flow — An ad slot opens; the exchange builds a bid request and fans it out to many demand-side bidders in parallel. It collects whatever bids arrive inside the latency budget, runs the auction to pick a winner and a clearing price, and serves the winning ad, logging the impression for billing.

Reading the flow left to right:

Ad slot opens. A page or app impression triggers the request. The slot carries context — its size, placement, the page topic, and whatever non-identifying signals about the user are available.
Bid request. The exchange assembles a request describing the opportunity and prepares to ask the demand side what it is worth.
Fan-out to bidders. The request goes to many demand-side bidders (DSPs) at once, in parallel. Each one decides independently whether it wants this impression and, if so, how much to bid.
Collect bids. The exchange gathers the responses, but only those that arrive before the deadline. Anything slower is simply not part of the auction.
Run auction. Among the bids that made it back, the exchange selects a winner and computes the price the winner pays.
Serve winning ad. The winning creative is returned and rendered in the slot, and the impression is recorded so the advertiser can be billed.

function handle_bid_request(slot):
  request = build_bid_request(slot)          # context, size, signals
  bids    = fan_out_to_bidders(request)      # parallel, deadline-bounded
  winner  = run_auction(bids, slot.reserve_price)   # reserve: the publisher's floor (assumed slot field)
  if winner is None:
    return NO_AD                             # slot stays empty or fallback
  log.impression(winner, slot, price=winner.clearing_price)
  return winner.creative

A useful reframing for an interview: RTB is a distributed auction on a deadline. Almost every design choice — the timeout, the parallel fan-out, the pacing, the tracking — exists to run a fair, money-moving auction inside a window so small the user can never perceive it.

2. Auction Mechanics

Once the bids are in, the exchange has to decide who wins and what they pay. Those are two separate questions, and the rule that connects them is the auction's pricing mechanism. The two classic choices are first-price and second-price.

First-price auction. The highest bidder wins and pays exactly what they bid. It is simple and transparent, but it pushes every bidder to constantly guess how low they can go without losing — a behavior called bid shading — because overbidding directly wastes money.
Second-price auction (Vickrey). The highest bidder still wins, but pays only just above the second-highest bid. The appealing property is that a bidder's best strategy is simply to bid their true value: the price is set by the runner-up, so bidding less cannot lower what they pay and only risks losing an impression they wanted, while bidding more only risks winning at a price above what the impression is worth to them.

For years second-price was the norm in ad exchanges precisely because it made bidding strategy easy. The industry has largely shifted toward first-price for transparency reasons, but both show up in interviews, and the key point is understanding why the pricing rule changes how rational bidders behave. Either way, the exchange usually enforces a reserve price (a floor the publisher will accept) and discards any bid below it.

function run_auction(bids, reserve):
  eligible = [b for b in bids if b.price >= reserve]
  if not eligible:
    return None
  eligible.sort(by=price, descending=True)
  winner = eligible[0]
  # first-price: pay your bid; second-price: pay just above runner-up
  if FIRST_PRICE:
    winner.clearing_price = winner.price
  else:
    runner_up = eligible[1].price if len(eligible) > 1 else reserve
    # cap at the winner's own bid so a near-tie never charges more than was bid
    winner.clearing_price = min(winner.price, runner_up + 0.01)
  return winner

Rule	Winner pays	Effect on bidders
First-price	Their own bid	Must shade bids down; strategy is hard, but pricing is transparent.
Second-price	Just over the runner-up	Best to bid true value; strategy is simple, pricing is opaque.
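
The two pricing rules can be compared side by side in a small runnable sketch. It is illustrative only: the Bid shape, integer cent prices, and the one-cent increment are assumptions chosen to avoid floating-point rounding, not something the guide prescribes. The second-price branch also caps the charge at the winner's own bid, so a near-tie never bills more than was offered.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bid:
    bidder: str
    price_cents: int  # assumed unit: whole cents, to keep arithmetic exact


def run_auction(bids: List[Bid], reserve_cents: int, first_price: bool,
                increment_cents: int = 1) -> Optional[Tuple[Bid, int]]:
    """Return (winner, clearing_price_cents), or None if no bid meets the reserve."""
    eligible = sorted((b for b in bids if b.price_cents >= reserve_cents),
                      key=lambda b: b.price_cents, reverse=True)
    if not eligible:
        return None
    winner = eligible[0]
    if first_price:
        return winner, winner.price_cents
    # Second-price: the floor is the runner-up's bid, or the reserve if unopposed.
    floor = eligible[1].price_cents if len(eligible) > 1 else reserve_cents
    # Never charge more than the winner actually bid.
    return winner, min(winner.price_cents, floor + increment_cents)


bids = [Bid("dsp_a", 250), Bid("dsp_b", 180), Bid("dsp_c", 40)]
first = run_auction(bids, reserve_cents=50, first_price=True)
second = run_auction(bids, reserve_cents=50, first_price=False)
print(first[0].bidder, first[1])
print(second[0].bidder, second[1])
```

The same bidder wins under both rules; only the price changes. Under first-price dsp_a pays its full 250-cent bid, while under second-price it pays one cent over dsp_b's 180.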

3. The Latency Budget

The defining constraint of the whole system is the deadline. The auction is part of the user's page load, so the exchange gives the demand side a fixed, unforgiving window — commonly on the order of 100 milliseconds — to respond. A bidder that answers inside the window is in the auction. A bidder that does not is dropped, full stop.

The 100ms latency budget with a timeout cutoff — Each bidder races against a fixed timeout. Bids that arrive before the cutoff are counted in the auction; a bidder that is even slightly late is simply dropped rather than waited on. The exchange runs the auction on whatever came back in time.

This "drop the slow ones" behavior is not a bug or a degradation — it is the correct design. The exchange cannot hold up the user's page waiting for a straggler, and it cannot let one slow bidder set the pace for everyone. A bidder that is consistently late is, from the auction's point of view, a bidder that does not exist. The practical consequences are worth stating plainly:

The timeout is firm. When the clock hits the deadline, the exchange stops collecting and runs the auction with whatever it has, even if that means fewer bidders competed.
Slowness punishes itself, but only up to a point. A slow DSP simply wins fewer auctions, so the incentive to be fast is built into the market. Even so, the exchange must not depend on bidders being well-behaved, which is why it enforces the cutoff itself.
The budget is sub-divided. The 100ms is shared across network round-trips, the bidder's own computation, and the exchange's auction step, so each bidder's real compute window is even tighter than the headline number.

In RTB, latency is not a quality metric you tune later — it is the hard boundary the entire architecture is built around. "Be fast or be excluded" is the rule, and the exchange enforces it by never waiting past the cutoff.
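
To make the sub-division concrete, here is a back-of-the-envelope sketch. The 40 ms network round-trip and 15 ms exchange overhead are invented, illustrative figures, not measurements; real values depend on where the bidders sit and how the exchange is built.

```python
def bidder_compute_window_ms(total_budget_ms: float,
                             network_rtt_ms: float,
                             exchange_overhead_ms: float) -> float:
    """Time left for a bidder's own computation once fixed costs are paid."""
    return max(0.0, total_budget_ms - network_rtt_ms - exchange_overhead_ms)


# Assumed, illustrative split of a 100 ms budget.
window = bidder_compute_window_ms(total_budget_ms=100.0,
                                  network_rtt_ms=40.0,
                                  exchange_overhead_ms=15.0)
print(window)
```

Under these assumptions, less than half of the headline budget is left for the bidder to decide what the impression is worth.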

4. Fan-Out and Gather

Because the exchange must consult many bidders but cannot afford to ask them one after another, the requests go out in parallel and the responses are gathered with a single shared deadline. Calling bidders sequentially would stack their latencies and blow the budget after just two or three; calling them concurrently means the total wait is the time of the slowest bidder you are still willing to wait for, and that is never longer than the timeout.

The pattern is a scatter-gather: dispatch the bid request to every eligible bidder at once, then collect responses until the deadline arrives, then stop. The crucial detail is that the gather step is bounded by the clock, not by the number of bidders. You do not wait for all of them; you wait until time runs out and take whatever has arrived.

function fan_out_to_bidders(request):
  deadline = now() + 100ms
  futures  = [send_async(bidder, request) for bidder in eligible_bidders]
  # Wait on the whole set at once. Waiting on each future in list order would
  # let one hung bidder at the front eat the budget and discard later bids
  # that had already arrived.
  wait_until(futures, deadline)              # returns when all finish or the deadline hits
  bids = []
  for f in futures:
    if f.done() and f.succeeded():
      bids.append(f.result())                # finished in time: counts, wherever it sits in the list
    else:
      f.cancel()                             # late or failed: treated as "did not bid"
  return bids                                # whatever made it back in time

This is also where isolation matters: a single hung bidder must never be able to stall the gather. Per-call timeouts, dropping rather than retrying within an auction, and treating a missing response as simply "did not bid" keep one misbehaving DSP from poisoning the whole auction. There is no time for retries inside a single auction — the deadline is the retry policy.
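
A minimal runnable sketch of the clock-bounded gather, using Python's asyncio. The call_bidder function is a hypothetical stand-in that sleeps instead of making a network call, and the bidder names, latencies, and prices are made up for illustration.

```python
import asyncio
from typing import Dict, List, Optional, Tuple


async def call_bidder(name: str, latency_s: float,
                      price_cents: Optional[int]) -> Optional[Dict]:
    """Hypothetical stand-in for one DSP call: sleeps, then bids or declines."""
    await asyncio.sleep(latency_s)
    if price_cents is None:
        return None  # the bidder chose not to bid
    return {"bidder": name, "price_cents": price_cents}


async def fan_out(bidders: List[Tuple[str, float, Optional[int]]],
                  timeout_s: float) -> List[Dict]:
    tasks = [asyncio.create_task(call_bidder(*b)) for b in bidders]
    # One shared deadline for the whole gather, not a per-bidder wait in turn.
    done, pending = await asyncio.wait(tasks, timeout=timeout_s)
    for task in pending:
        task.cancel()  # late bidders are dropped, never retried
    bids = []
    for task in done:
        # A failed call or an explicit no-bid both count as "did not bid".
        if task.exception() is None and task.result() is not None:
            bids.append(task.result())
    return bids


bidders = [("hung", 5.0, 900), ("fast", 0.01, 120), ("no_bid", 0.01, None)]
bids = asyncio.run(fan_out(bidders, timeout_s=0.1))
print(bids)
```

The hung bidder is listed first on purpose: it would have offered the highest price, but it does not delay the gather and it does not win, because it never answered inside the window.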

5. Budget Pacing

An advertiser sets a daily budget, and a naive system would let them win every auction they qualify for until that money runs out — which often means the entire budget is gone within the first hour of the day. Budget pacing exists to spread an advertiser's spend smoothly across the time period so their ads reach users throughout the day rather than in a single early burst.

Budget pacing spreading spend over the day — Without pacing an advertiser burns through the daily budget early and goes dark for the rest of the day. With pacing the system throttles how aggressively it bids so cumulative spend tracks a smooth line and the budget lasts the whole day.

The mechanism is a control loop. The system continuously compares actual spend against where spend should be by this point in the day, and adjusts how aggressively the advertiser participates — by lowering bids, or by entering only a fraction of eligible auctions. If spend is running ahead of plan, it throttles down; if it is behind, it leans in. The goals this serves are concrete:

Even reach. An audience is online all day, not just in the morning. Smoothing spend exposes the ad to a broader, more representative set of users.
Budget protection. Pacing keeps the advertiser from accidentally exhausting their budget against a single cheap, low-value window.
Stable prices. Many advertisers all bidding flat-out at once distorts auction prices; pacing dampens that and keeps the market sane.

function pacing_multiplier(advertiser, t):
  target = advertiser.daily_budget * fraction_of_day_elapsed(t)
  actual = spend_so_far(advertiser, t)
  if actual >= advertiser.daily_budget:
    return 0.0                               # budget exhausted: stop bidding
  if actual > target:
    return throttle_down(target / actual)    # ahead of plan: ease off (ratio is in [0, 1), safe when target is 0)
  return 1.0                                  # on or behind plan: bid normally
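
One possible shape for the throttle, sketched as runnable Python. The proportional rule here (enter auctions with probability target divided by actual spend) is an assumption chosen for simplicity; production pacers typically use smoother controllers and traffic forecasts. All names are illustrative.

```python
import random


def pacing_probability(daily_budget: float, spend_so_far: float,
                       fraction_of_day_elapsed: float) -> float:
    """Probability of entering the next eligible auction (1.0 = bid normally)."""
    if spend_so_far >= daily_budget:
        return 0.0  # budget exhausted: stop bidding
    target = daily_budget * fraction_of_day_elapsed
    if spend_so_far <= target:
        return 1.0  # on or behind plan: bid normally
    # Ahead of plan: participate in proportion to how far ahead we are.
    # spend_so_far > target >= 0 here, so the division is always safe.
    return target / spend_so_far


def should_bid(probability: float, rng: random.Random) -> bool:
    """Randomly skip auctions so only the given fraction is entered."""
    return rng.random() < probability


# A quarter of the day gone, but 400 of a 1000 budget already spent.
p = pacing_probability(daily_budget=1000.0, spend_so_far=400.0,
                       fraction_of_day_elapsed=0.25)
print(p)
```

With a target of 250 and actual spend of 400, the advertiser enters roughly five in eight eligible auctions until spend falls back toward the plan.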

6. Frequency Capping

Even an advertiser with budget to spend should not show the same person the same ad twenty times a day. Frequency capping limits how often a given user sees a given ad (or campaign) within a window. Beyond a certain point repeated exposure stops persuading and starts annoying, so the cap protects both the user experience and the advertiser's money.

Implementing this means keeping a per-user, per-campaign count of recent impressions and consulting it before bidding. If the user has already hit the cap for that campaign, the bidder either skips the auction or the exchange filters the bid out. Because this check happens inside the latency budget, the counter has to be fast to read — typically a low-latency key-value store keyed by user and campaign, with the counts expiring after the capping window.

function under_frequency_cap(user_id, campaign):
  key   = (user_id, campaign.id)
  seen  = counter.get(key) or 0              # fast KV lookup, in-budget; missing key = never seen
  if seen >= campaign.frequency_cap:
    return False                             # already saw it enough; skip
  return True

# on a served impression, bump the windowed counter
function record_exposure(user_id, campaign):
  counter.incr((user_id, campaign.id), ttl=campaign.cap_window)
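
The counter semantics can be sketched with an in-memory stand-in for the key-value store. WindowedCounter is hypothetical and models a fixed window: the expiry is set when the first exposure in a window is recorded and is not extended by later ones. A real deployment would use a shared low-latency store rather than a process-local dict.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (user_id, campaign_id)


class WindowedCounter:
    """In-memory stand-in for a KV store with per-key expiry (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[Key, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def get(self, key: Key) -> int:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            return 0  # missing or expired keys count as zero exposures
        return entry[0]

    def incr(self, key: Key, ttl: float) -> int:
        now = self._clock()
        count, expires_at = self._data.get(key, (0, 0.0))
        if expires_at <= now:
            count, expires_at = 0, now + ttl  # start a new window
        self._data[key] = (count + 1, expires_at)
        return count + 1


def under_frequency_cap(counter: WindowedCounter, user_id: str,
                        campaign_id: str, cap: int) -> bool:
    return counter.get((user_id, campaign_id)) < cap


# Drive the counter with a fake clock so the window can be fast-forwarded.
fake_now = [0.0]
counter = WindowedCounter(clock=lambda: fake_now[0])
for _ in range(3):
    counter.incr(("user-1", "camp-9"), ttl=3600.0)
capped = not under_frequency_cap(counter, "user-1", "camp-9", cap=3)
fake_now[0] = 3601.0  # the capping window has passed
reset = under_frequency_cap(counter, "user-1", "camp-9", cap=3)
print(capped, reset)
```

Treating a missing or expired key as zero is what lets a first-time user pass the check without a separate existence test.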

7. Tracking for Billing

An ad auction moves real money, so the system must record what actually happened with enough fidelity to bill advertisers correctly. The two events that matter most are impressions (the ad was shown) and clicks (the user engaged), and depending on the pricing model the advertiser is charged on one or the other.

Tracking is harder than it sounds because the events fire on the user's device, far from your servers, and arrive as a high-volume firehose. The design has to be durable and resistant to double-counting: a winning bid is logged at serve time, an impression event confirms the ad was actually rendered, and a later click event ties back to the same impression. These flow into a pipeline that aggregates spend per advertiser and reconciles it against budgets.

Impression tracking. When the creative renders, a tracking pixel or callback reports it. This confirms the slot was filled and is the billable event under cost-per-impression (CPM) pricing.
Click tracking. A click is attributed to the impression that produced it, and is the billable event under cost-per-click (CPC) pricing.
Reconciliation. Aggregated impression and click counts feed back into the pacing and budgeting systems so spend is accurate and an advertiser is never overcharged.

Billing demands the same discipline as any money system: durable logs, idempotent event handling keyed by a request or impression id, and reconciliation. An auction you cannot account for afterwards is an auction you cannot charge for.
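
Idempotent handling can be sketched as a ledger that applies each event at most once, keyed by event type and impression id. This is a toy, in-memory illustration under assumed names (BillingLedger, record); a real pipeline would persist the deduplication keys durably and bound how long they are retained.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class BillingLedger:
    """Toy aggregator that applies each tracking event at most once."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()  # (event_type, impression_id)
        self.spend_cents: Dict[str, int] = defaultdict(int)

    def record(self, event_type: str, impression_id: str,
               advertiser: str, charge_cents: int) -> bool:
        """Apply the event; return False if it was a duplicate delivery."""
        key = (event_type, impression_id)
        if key in self._seen:
            return False  # already billed: ignore the retry
        self._seen.add(key)
        self.spend_cents[advertiser] += charge_cents
        return True


ledger = BillingLedger()
first_delivery = ledger.record("impression", "imp-123", "acme", 181)
retry_delivery = ledger.record("impression", "imp-123", "acme", 181)  # pixel fired twice
click = ledger.record("click", "imp-123", "acme", 0)  # CPM campaign: click is free
print(ledger.spend_cents["acme"])
```

Because the key includes the event type, a click on the same impression is still recorded, while a duplicate impression pixel is not billed twice.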

8. Scale and Parallelism

The last thing to reckon with is volume. A large exchange handles enormous numbers of bid requests per second, and each one fans out to many bidders — so the real load is requests multiplied by bidders, all under the same hard deadline. That product is what forces nearly every scaling decision in the system.

Parallelize the fan-out. Every auction's bidder calls run concurrently, never sequentially, so the per-auction latency is bounded by the timeout rather than by the number of bidders.
Horizontal scale-out. The exchange is stateless per request and runs as a large pool of identical servers behind a load balancer, so throughput scales by adding machines.
Put hot data in memory. Pacing state, frequency counters, and budget balances are read on the critical path, so they live in fast in-memory stores rather than a disk-bound database that would blow the budget.
Move billing off the hot path. Impression and click events are written to a durable queue and aggregated asynchronously, keeping the latency-critical auction path lean.

The unifying idea is that anything on the auction's critical path must be fast and parallel, and anything that can tolerate a small delay — billing aggregation, reporting, reconciliation — is pushed off that path into asynchronous pipelines. The deadline is sacred; everything else bends around it.
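
The requests-times-bidders product is easy to underestimate, so a rough capacity calculation helps. Every number below is an invented, illustrative assumption, not a figure from any real exchange.

```python
import math


def outbound_calls_per_second(requests_per_s: int, bidders_per_request: int) -> int:
    """Each inbound bid request fans out to every eligible bidder."""
    return requests_per_s * bidders_per_request


def servers_needed(total_calls_per_s: int, calls_per_server_per_s: int) -> int:
    """Stateless servers scale horizontally, so divide and round up."""
    return math.ceil(total_calls_per_s / calls_per_server_per_s)


# Assumed load: 100k auctions per second, 20 bidders each,
# and 50k outbound calls per second sustained per server.
calls = outbound_calls_per_second(100_000, 20)
servers = servers_needed(calls, 50_000)
print(calls, servers)
```

Under these assumptions a modest-sounding 100,000 auctions per second becomes two million outbound calls per second, which is why the fan-out has to be parallel and the servers stateless.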

9. Summary

An ad auction is a distributed, money-moving auction that has to finish inside a window the user can never perceive. Its design is a set of decisions that all serve that deadline:

Concern	Mechanism
What kicks off an auction?	An ad slot opening triggers a bid request describing the opportunity.
How do we get prices from the market?	Fan the request out to many demand-side bidders in parallel.
Who wins and what do they pay?	An auction: first-price (pay your bid) or second-price (pay just over the runner-up), above a reserve.
How do we stay within the deadline?	A hard latency budget (~100ms); bidders that respond late are dropped, never waited on.
How do we ask everyone at once?	Scatter-gather fan-out bounded by the clock, not by the bidder count.
How do we keep budgets from burning out early?	Budget pacing: a control loop that smooths spend across the day.
How do we avoid annoying users?	Frequency capping via fast per-user, per-campaign counters.
How do we charge advertisers?	Durable, idempotent impression and click tracking feeding a billing pipeline.
How do we handle the volume?	Parallel fan-out, stateless horizontal scale-out, in-memory hot data, async billing.

The recurring theme: the latency budget governs everything. Bidders are called in parallel and dropped when late, hot state lives in memory, billing is pushed off the critical path, and pacing and capping keep the market healthy — all so a fair auction can run and an ad can be priced and served before the page finishes loading.