Designing an Ad Auction (RTB)

A system design interview guide to building a real-time bidding system that, in the few dozen milliseconds it takes a page to load, asks many advertisers what they will pay for one ad slot, runs an auction, and serves the winning ad.

Real-time bidding is one of the most latency-sensitive systems you will ever be asked to design. Every time a user opens a page or app with an ad slot, an auction has to happen before that slot can be filled, and it has to finish fast enough that the user never notices. The system fans a single request out to many independent bidders, waits only as long as a strict budget allows, picks a winner, charges the right price, and serves the ad — typically inside a window of about a hundred milliseconds. The interesting engineering is not "show an ad," it is doing a distributed, money-moving auction under a hard deadline, at enormous request volume, while spending each advertiser's budget smoothly and never showing the same person the same ad too many times. This guide builds that design up from the moment a slot opens.

Contents

  1. The Request Flow
  2. Auction Mechanics
  3. The Latency Budget
  4. Fan-Out and Gather
  5. Budget Pacing
  6. Frequency Capping
  7. Tracking for Billing
  8. Scale and Parallelism
  9. Summary

1. The Request Flow

The whole system is organized around a single short-lived event: an ad slot becoming available. A user loads a page, a slot on that page needs an ad, and the publisher's side sends a bid request to your ad exchange. From that instant a clock starts ticking, and everything that follows has to complete before it runs out.

Real-time bidding request flow
An ad slot opens; the exchange builds a bid request and fans it out to many demand-side bidders in parallel. It collects whatever bids arrive inside the latency budget, runs the auction to pick a winner and a clearing price, and serves the winning ad, logging the impression for billing.

The flow, left to right, in pseudocode:

function handle_bid_request(slot):
  request = build_bid_request(slot)          # context, size, signals
  bids    = fan_out_to_bidders(request)      # parallel, deadline-bounded
  winner  = run_auction(bids, slot.reserve)  # reserve: the publisher's floor price
  if winner is None:
    return NO_AD                             # slot stays empty or fallback
  log.impression(winner, slot, price=winner.clearing_price)
  return winner.creative

A useful reframing for an interview: RTB is a distributed auction on a deadline. Almost every design choice (the timeout, the parallel fan-out, the pacing, the tracking) exists to run a fair, money-moving auction inside a window so small the user can never perceive it.

2. Auction Mechanics

Once the bids are in, the exchange has to decide who wins and what they pay. Those are two separate questions, and the rule that connects them is the auction's pricing mechanism. The two classic choices are first-price, where the winner pays exactly what they bid, and second-price, where the winner pays just over the runner-up's bid.

For years second-price was the norm in ad exchanges precisely because it made bidding strategy easy. The industry has largely shifted toward first-price for transparency reasons, but both show up in interviews, and the key point is understanding why the pricing rule changes how rational bidders behave. Either way, the exchange usually enforces a reserve price (a floor the publisher will accept) and discards any bid below it.

function run_auction(bids, reserve):
  eligible = [b for b in bids if b.price >= reserve]
  if not eligible:
    return None
  eligible.sort(by=price, descending=True)
  winner = eligible[0]
  # first-price: pay your bid; second-price: pay just above runner-up
  if FIRST_PRICE:
    winner.clearing_price = winner.price
  else:
    # with a single eligible bid, the reserve stands in for the runner-up
    runner_up = eligible[1].price if len(eligible) > 1 else reserve
    # one increment over the runner-up, but never more than the winner's own bid
    winner.clearing_price = min(winner.price, runner_up + 0.01)
  return winner

| Rule | Winner pays | Effect on bidders |
| --- | --- | --- |
| First-price | Their own bid | Must shade bids down; strategy is hard, but pricing is transparent. |
| Second-price | Just over the runner-up | Best to bid true value; strategy is simple, pricing is opaque. |
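
To make the difference between the two rules concrete, here is a minimal runnable sketch that clears the same set of bids under each rule. The `Bid` type, the bidder names, and the use of integer cents with a one-cent increment are assumptions made for illustration, not something a real exchange is required to do.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bid:
    bidder: str        # hypothetical bidder name
    price_cents: int   # integer cents avoid floating-point rounding


def run_auction(bids: List[Bid], reserve_cents: int,
                first_price: bool) -> Optional[Tuple[Bid, int]]:
    """Return (winner, clearing price in cents), or None if no bid meets the reserve."""
    eligible = sorted((b for b in bids if b.price_cents >= reserve_cents),
                      key=lambda b: b.price_cents, reverse=True)
    if not eligible:
        return None
    winner = eligible[0]
    if first_price:
        return winner, winner.price_cents
    # Second-price: with a single eligible bid, the reserve acts as the runner-up.
    runner_up = eligible[1].price_cents if len(eligible) > 1 else reserve_cents
    # One cent over the runner-up, capped at the winner's own bid.
    return winner, min(winner.price_cents, runner_up + 1)


bids = [Bid("dsp_a", 250), Bid("dsp_b", 180), Bid("dsp_c", 90)]
first = run_auction(bids, reserve_cents=100, first_price=True)
second = run_auction(bids, reserve_cents=100, first_price=False)
print(first[0].bidder, first[1])    # same winner, pays its own bid
print(second[0].bidder, second[1])  # same winner, pays just over the runner-up
```

The winner is the same under both rules; only the price changes. That gap between the bid and the second-price clearing price is what a first-price bidder tries to recover by shading.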

3. The Latency Budget

The defining constraint of the whole system is the deadline. The auction is part of the user's page load, so the exchange gives the demand side a fixed, unforgiving window — commonly on the order of 100 milliseconds — to respond. A bidder that answers inside the window is in the auction. A bidder that does not is dropped, full stop.

The 100ms latency budget with a timeout cutoff
Each bidder races against a fixed timeout. Bids that arrive before the cutoff are counted in the auction; a bidder that is even slightly late is simply dropped rather than waited on. The exchange runs the auction on whatever came back in time.

This "drop the slow ones" behavior is not a bug or a degradation; it is the correct design. The exchange cannot hold up the user's page waiting for a straggler, and it cannot let one slow bidder set the pace for everyone. A bidder that is consistently late is, from the auction's point of view, a bidder that does not exist. The practical consequence is worth stating plainly:

In RTB, latency is not a quality metric you tune later — it is the hard boundary the entire architecture is built around. "Be fast or be excluded" is the rule, and the exchange enforces it by never waiting past the cutoff.

4. Fan-Out and Gather

Because the exchange must consult many bidders but cannot afford to ask them one after another, the requests go out in parallel and the responses are gathered with a single shared deadline. Calling bidders sequentially would stack their latencies and blow the budget after just two or three; calling them concurrently means the total wait is the time of the slowest bidder you are still willing to wait for — which is exactly the timeout.

The pattern is a scatter-gather: dispatch the bid request to every eligible bidder at once, then collect responses until the deadline arrives, then stop. The crucial detail is that the gather step is bounded by the clock, not by the number of bidders. You do not wait for all of them; you wait until time runs out and take whatever has arrived.

function fan_out_to_bidders(request):
  deadline = now() + 100ms
  futures  = [send_async(bidder, request) for bidder in eligible_bidders]
  bids = []
  for f in futures:
    # Once the deadline has passed, remaining is 0: result() then only picks up
    # bids that have already arrived instead of discarding them unread.
    remaining = max(deadline - now(), 0)
    try:
      bid = f.result(timeout=remaining)      # never wait past the deadline
      bids.append(bid)
    except Timeout:
      continue                               # drop the slow bidder
  return bids                                # whatever made it back in time

This is also where isolation matters: a single hung bidder must never be able to stall the gather. Per-call timeouts, dropping rather than retrying within an auction, and treating a missing response as simply "did not bid" keep one misbehaving DSP from poisoning the whole auction. There is no time for retries inside a single auction — the deadline is the retry policy.
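
A clock-bounded gather can be sketched in runnable form with Python's `asyncio`. This is an illustration under stated assumptions, not a production exchange: `call_bidder` is a hypothetical stand-in that simulates network latency with a sleep, and the bidder names and delays are invented.

```python
import asyncio
from typing import Dict, List


async def call_bidder(name: str, delay_s: float) -> dict:
    """Hypothetical stand-in for one bidder's network call."""
    await asyncio.sleep(delay_s)
    return {"bidder": name, "price_cents": 100}


async def fan_out(bidders: Dict[str, float], budget_s: float) -> List[dict]:
    # Scatter: start every call at once.
    tasks = [asyncio.create_task(call_bidder(n, d)) for n, d in bidders.items()]
    # Gather: one shared deadline for the whole batch, however many bidders there are.
    done, pending = await asyncio.wait(tasks, timeout=budget_s)
    for task in pending:
        task.cancel()                      # late bidders are dropped, never retried
    # A bidder whose call raised is treated the same as one that did not bid.
    return [t.result() for t in done if t.exception() is None]


bids = asyncio.run(fan_out({"fast": 0.005, "medium": 0.02, "hung": 5.0}, budget_s=0.1))
arrived = sorted(b["bidder"] for b in bids)
print(arrived)
```

The total wait is capped by `budget_s` regardless of the hung bidder, and `asyncio.wait` returns every task that finished in time, in whatever order they completed.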

5. Budget Pacing

An advertiser sets a daily budget, and a naive system would let them win every auction they qualify for until that money runs out — which often means the entire budget is gone within the first hour of the day. Budget pacing exists to spread an advertiser's spend smoothly across the time period so their ads reach users throughout the day rather than in a single early burst.

Budget pacing spreading spend over the day
Without pacing an advertiser burns through the daily budget early and goes dark for the rest of the day. With pacing the system throttles how aggressively it bids so cumulative spend tracks a smooth line and the budget lasts the whole day.

The mechanism is a control loop. The system continuously compares actual spend against where spend should be by this point in the day, and adjusts how aggressively the advertiser participates, either by lowering bids or by entering only a fraction of eligible auctions. If spend is running ahead of plan, it throttles down; if it is behind, it leans in. The goals are concrete: keep the budget from running out early, and keep the advertiser's ads in front of users throughout the day.

function pacing_multiplier(advertiser, t):
  target = advertiser.daily_budget * fraction_of_day_elapsed(t)
  actual = spend_so_far(advertiser, t)
  if actual >= advertiser.daily_budget:
    return 0.0                               # budget exhausted: stop bidding
  if actual > target:
    return throttle_down(actual / target)    # ahead of plan: ease off
  return 1.0                                  # on or behind plan: bid normally
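
The effect of pacing is easiest to see in a toy simulation. The sketch below uses the simplest possible throttle, an all-or-nothing rule that enters an auction only while spend is behind the straight-line target. That rule, the evenly spaced auctions, the fixed price, and the assumption that the advertiser wins every auction it enters are all simplifications chosen for illustration; real pacing controllers are smoother.

```python
from typing import Optional, Tuple


def should_enter_auction(daily_budget: float, spent: float, day_fraction: float) -> bool:
    """All-or-nothing pacing: participate only while spend is behind the linear target."""
    target = daily_budget * day_fraction
    return spent < daily_budget and spent < target


def simulate_day(daily_budget: float, price: float, auctions: int,
                 paced: bool) -> Tuple[float, Optional[float]]:
    """Return (total spend, fraction of the day at which the budget ran out, or None)."""
    spent = 0.0
    exhausted_at = None
    for i in range(auctions):
        day_fraction = (i + 1) / auctions
        if paced:
            enter = should_enter_auction(daily_budget, spent, day_fraction)
        else:
            enter = spent < daily_budget       # naive: bid until the money is gone
        if enter:
            spent += price                     # assumed: every entered auction is won
        if exhausted_at is None and spent >= daily_budget:
            exhausted_at = day_fraction
    return spent, exhausted_at


unpaced = simulate_day(100.0, 1.0, 1000, paced=False)
paced = simulate_day(100.0, 1.0, 1000, paced=True)
print("unpaced:", unpaced)   # budget gone early in the day
print("paced:  ", paced)     # budget lasts until near the end of the day
```

Both runs spend the same budget; the difference is when it runs out.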

6. Frequency Capping

Even an advertiser with budget to spend should not show the same person the same ad twenty times a day. Frequency capping limits how often a given user sees a given ad (or campaign) within a window. Beyond a certain point repeated exposure stops persuading and starts annoying, so the cap protects both the user experience and the advertiser's money.

Implementing this means keeping a per-user, per-campaign count of recent impressions and consulting it before bidding. If the user has already hit the cap for that campaign, the bidder either skips the auction or the exchange filters the bid out. Because this check happens inside the latency budget, the counter has to be fast to read — typically a low-latency key-value store keyed by user and campaign, with the counts expiring after the capping window.

function under_frequency_cap(user_id, campaign):
  key   = (user_id, campaign.id)
  seen  = counter.get(key, default=0)        # fast KV lookup, in-budget; no entry means never seen
  return seen < campaign.frequency_cap       # at the cap: already saw it enough; skip

# on a served impression, bump the windowed counter
function record_exposure(user_id, campaign):
  # the TTL should start at the first exposure in a window; refreshing it on
  # every increment would keep pushing the expiry out
  counter.incr((user_id, campaign.id), ttl=campaign.cap_window)
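
For a runnable picture of the windowed counter, here is a small in-memory sketch. The dictionary stands in for the low-latency key-value store, and the `FrequencyCapper` class, its method names, and the injectable clock are assumptions made for illustration. The window is a fixed one that starts at the first exposure, which is one reasonable reading of "counts expiring after the capping window".

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (user_id, campaign_id)


class FrequencyCapper:
    def __init__(self, cap: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.cap = cap
        self.window_s = window_s
        self.clock = clock
        self._counts: Dict[Key, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def _live_count(self, key: Key) -> int:
        entry = self._counts.get(key)
        if entry is None:
            return 0
        count, expires_at = entry
        if self.clock() >= expires_at:       # window over: forget the old count
            del self._counts[key]
            return 0
        return count

    def under_cap(self, user_id: str, campaign_id: str) -> bool:
        return self._live_count((user_id, campaign_id)) < self.cap

    def record_exposure(self, user_id: str, campaign_id: str) -> None:
        key = (user_id, campaign_id)
        count = self._live_count(key)
        if count == 0:
            # First exposure in this window sets the expiry once.
            self._counts[key] = (1, self.clock() + self.window_s)
        else:
            _, expires_at = self._counts[key]
            self._counts[key] = (count + 1, expires_at)


now = [0.0]                                   # fake clock so the example is deterministic
capper = FrequencyCapper(cap=2, window_s=3600, clock=lambda: now[0])
capper.record_exposure("u1", "c1")
capper.record_exposure("u1", "c1")
blocked = not capper.under_cap("u1", "c1")    # cap of 2 reached
now[0] = 3600.0                               # one window later
allowed_again = capper.under_cap("u1", "c1")
print(blocked, allowed_again)
```

Injecting the clock keeps the expiry logic testable without sleeping; a shared store would replace the dictionary in any multi-server deployment.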

7. Tracking for Billing

An ad auction moves real money, so the system must record what actually happened with enough fidelity to bill advertisers correctly. The two events that matter most are impressions (the ad was shown) and clicks (the user engaged), and depending on the pricing model the advertiser is charged on one or the other.

Tracking is harder than it sounds because the events fire on the user's device, far from your servers, and arrive as a high-volume firehose. The design has to be durable and resistant to double-counting: a winning bid is logged at serve time, an impression event confirms the ad was actually rendered, and a later click event ties back to the same impression. These flow into a pipeline that aggregates spend per advertiser and reconciles it against budgets.

Billing demands the same discipline as any money system: durable logs, idempotent event handling keyed by a request or impression id, and reconciliation. An auction you cannot account for afterwards is an auction you cannot charge for.
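
Idempotent handling keyed by an impression id can be sketched in a few lines. The `BillingLedger` class and its in-memory set are hypothetical stand-ins; a real pipeline would keep both the seen-id set and the totals in durable storage.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class BillingLedger:
    def __init__(self) -> None:
        self._seen: Set[str] = set()                       # event ids already charged
        self.spend_cents: DefaultDict[str, int] = defaultdict(int)

    def record(self, event_id: str, advertiser: str, amount_cents: int) -> bool:
        """Charge once per event id; return False for a duplicate delivery."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        self.spend_cents[advertiser] += amount_cents
        return True


ledger = BillingLedger()
first_delivery = ledger.record("imp-123", "acme", 181)
retry_delivery = ledger.record("imp-123", "acme", 181)     # same event, sent again
print(first_delivery, retry_delivery, ledger.spend_cents["acme"])
```

Because the charge is keyed by the event id, a retried or duplicated event from the user's device changes nothing the second time it arrives.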

8. Scale and Parallelism

The last thing to reckon with is volume. A large exchange handles enormous numbers of bid requests per second, and each one fans out to many bidders — so the real load is requests multiplied by bidders, all under the same hard deadline. That product is what forces nearly every scaling decision in the system.
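
A quick back-of-the-envelope calculation shows why that product dominates. The numbers below are illustrative assumptions, not measurements from any real exchange. The in-flight estimate is an upper bound from Little's law (concurrency equals arrival rate times time in system), taking the timeout as the longest any call can stay open.

```python
from typing import Tuple


def fan_out_load(requests_per_s: int, bidders_per_request: int,
                 timeout_ms: int) -> Tuple[int, int]:
    """Return (outbound bidder calls per second, upper bound on calls in flight)."""
    calls_per_s = requests_per_s * bidders_per_request
    # Little's law: in-flight = rate * time; no call lives longer than the timeout.
    max_in_flight = calls_per_s * timeout_ms // 1000
    return calls_per_s, max_in_flight


# Assumed for illustration: 100,000 bid requests/s, 20 bidders each, 100 ms timeout.
calls_per_s, max_in_flight = fan_out_load(100_000, 20, 100)
print(calls_per_s, max_in_flight)
```

Under these assumed inputs the exchange makes two million outbound calls per second and may hold up to two hundred thousand open at once, which is why connection handling and parallelism sit on the critical path.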

The unifying idea is that anything on the auction's critical path must be fast and parallel, and anything that can tolerate a small delay — billing aggregation, reporting, reconciliation — is pushed off that path into asynchronous pipelines. The deadline is sacred; everything else bends around it.

9. Summary

An ad auction is a distributed, money-moving auction that has to finish inside a window the user can never perceive. Its design is a set of decisions that all serve that deadline:

| Concern | Mechanism |
| --- | --- |
| What kicks off an auction? | An ad slot opening triggers a bid request describing the opportunity. |
| How do we get prices from the market? | Fan the request out to many demand-side bidders in parallel. |
| Who wins and what do they pay? | An auction: first-price (pay your bid) or second-price (pay just over the runner-up), above a reserve. |
| How do we stay within the deadline? | A hard latency budget (~100ms); bidders that respond late are dropped, never waited on. |
| How do we ask everyone at once? | Scatter-gather fan-out bounded by the clock, not by the bidder count. |
| How do we keep budgets from burning out early? | Budget pacing: a control loop that smooths spend across the day. |
| How do we avoid annoying users? | Frequency capping via fast per-user, per-campaign counters. |
| How do we charge advertisers? | Durable, idempotent impression and click tracking feeding a billing pipeline. |
| How do we handle the volume? | Parallel fan-out, stateless horizontal scale-out, in-memory hot data, async billing. |

The recurring theme: the latency budget governs everything. Bidders are called in parallel and dropped when late, hot state lives in memory, billing is pushed off the critical path, and pacing and capping keep the market healthy, all so a fair auction can run and an ad can be priced and served before the page finishes loading.