Best Practices for RTB Applications

This guide explains best practices to consider when developing applications according to the RTB Protocol.

Manage connections

Keep connections alive

Establishing a new connection increases latencies and takes far more resources on both ends than reusing an existing one. By closing fewer connections, you can reduce the number of connections that must be opened again.

First, every new connection requires an extra network round-trip to establish. Because we establish connections on demand, the first request on a connection has a shorter effective deadline and is more likely to time out than subsequent requests. Any extra timeouts increase the error rate, which can lead to your bidder being throttled.

Second, many web servers spawn a dedicated worker thread for each connection established. This means that to close and recreate the connection, the server must shut down and discard a thread, allocate a new one, make it runnable, and build the connection state, before finally processing the request. That's a lot of unnecessary overhead.

Avoid closing connections

Begin by tuning connection behavior. Most server defaults are tailored for environments with large numbers of clients, each making a small number of requests. For RTB, by contrast, a small pool of machines sends requests on behalf of a large number of browsers, relatively speaking. Under these conditions, it makes sense to re-use connections as many times as possible. We recommend that you set:

  • Idle timeout to 2.5 minutes.
  • Maximum number of requests on a connection to the highest possible value.
  • Maximum number of connections to the highest value your RAM can accommodate, while taking care to verify that the number of connections does not approach that value too closely.

In Apache, for example, this would entail setting KeepAliveTimeout to 150, MaxKeepAliveRequests to zero, and MaxClients to a value that depends on server type.
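If your front end is a Go net/http server rather than Apache, a minimal sketch of equivalent tuning, under that assumption, might look like the following: the 150-second IdleTimeout mirrors the KeepAliveTimeout value above, net/http keeps connections alive by default and imposes no per-connection request limit (so there is nothing to set for the MaxKeepAliveRequests analog), and the connection cap plays the role of MaxClients. The 20,000 cap is purely illustrative, not a recommendation.

package main

import (
	"log"
	"net"
	"net/http"
	"time"

	"golang.org/x/net/netutil"
)

func handleBidRequest(w http.ResponseWriter, r *http.Request) {
	// Placeholder; the later sketches show fuller front-end handlers.
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	srv := &http.Server{
		Addr:    ":8080",
		Handler: http.HandlerFunc(handleBidRequest),
		// Keep idle connections open for 2.5 minutes, matching the
		// KeepAliveTimeout 150 recommendation above.
		IdleTimeout: 150 * time.Second,
	}
	ln, err := net.Listen("tcp", srv.Addr)
	if err != nil {
		log.Fatal(err)
	}
	// Cap concurrent connections (the MaxClients analog); size this to what
	// your RAM can accommodate, with comfortable headroom.
	const maxConns = 20000 // illustrative value only
	log.Fatal(srv.Serve(netutil.LimitListener(ln, maxConns)))
}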

Once your connection behavior is tuned, you should also ensure that your bidder code does not close connections needlessly. For example, if you have front-end code that returns a default "no bid" response in the event of backend errors or timeouts, make sure the code returns its response without closing the connection. That way you avoid the situation where your bidder gets overloaded, connections start to close, timeouts increase, and your bidder ends up throttled.
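As an illustration, here is a minimal Go sketch of such a front end, assuming a net/http server; callBackend and noBidBody are hypothetical stand-ins for your backend call and canned response. The key point is that on error or timeout it simply writes the fallback response and returns, without setting "Connection: close" or otherwise tearing the connection down.

package frontend

import (
	"context"
	"net/http"
	"time"
)

// noBidBody stands in for whatever canned no-bid payload your protocol
// variant expects (for example, an empty bid response).
var noBidBody = []byte(`{}`)

// callBackend is a hypothetical stand-in for the call to your bidding backend.
func callBackend(ctx context.Context, r *http.Request) ([]byte, error) {
	// Real code would forward the request and return the bid response bytes.
	return nil, context.DeadlineExceeded
}

func handleBidRequest(w http.ResponseWriter, r *http.Request) {
	// Give the backend a budget somewhat below the callout deadline.
	ctx, cancel := context.WithTimeout(r.Context(), 80*time.Millisecond)
	defer cancel()

	body, err := callBackend(ctx, r)
	if err != nil {
		// Backend error or timeout: fall back to the canned no-bid and return
		// normally. Do NOT set "Connection: close" or hijack the connection;
		// net/http will keep it alive for the next request.
		body = noBidBody
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(body)
}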

Keep connections balanced

If Authorized Buyers connects to your bidder's servers through a proxy server, the connections may become unbalanced over time because, knowing only the proxy server's IP address, Authorized Buyers cannot determine which bidder server is receiving each callout. Over time, as Authorized Buyers establishes and closes connections and the bidder's servers restart, the number of connections mapped to each can become highly variable.

When some connections are heavily utilized, other open connections may remain mostly idle because they are not needed at the time. As Authorized Buyers traffic changes, idle connections can become active and active connections can go idle. These shifts can cause uneven loads on your bidder servers if the connections are clustered poorly. Google attempts to prevent this by closing all connections after 10,000 requests, to automatically rebalance hot connections over time. If you still find traffic becoming unbalanced in your environment, there are further steps you can take:

  1. Select the backend per request rather than once per connection if you are using frontend proxies.
  2. Specify a maximum number of requests per connection if you are proxying connections through a hardware load balancer or firewall and the mapping is fixed once the connections are established. Note that Google already specifies an upper limit of 10,000 requests per connection, so you should only need to provide a stricter value if you still find hot connections becoming clustered in your environment. In Apache, for example, set MaxKeepAliveRequests to 5,000.
  3. Configure the bidder's servers to monitor their request rates and close some of their own connections if they are consistently handling too many requests compared to their peers; one way to do this in your own server code is sketched after this list.
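For points 2 and 3, if you cap requests per connection in your own server code rather than in Apache or a load balancer, a minimal sketch assuming a Go net/http front end is to count requests per connection and ask the server to close the connection once a threshold is reached. The 5,000 threshold simply mirrors the MaxKeepAliveRequests example above, and the same per-connection hook could also be used to track request rates for point 3.

package frontend

import (
	"context"
	"net"
	"net/http"
	"sync/atomic"
)

type connCounterKey struct{}

// NewServer wraps a handler so that each accepted connection carries its own
// request counter and is closed once maxRequestsPerConn requests have been
// served on it.
func NewServer(addr string, handler http.Handler, maxRequestsPerConn int64) *http.Server {
	return &http.Server{
		Addr: addr,
		// Attach a fresh counter to every accepted connection.
		ConnContext: func(ctx context.Context, c net.Conn) context.Context {
			return context.WithValue(ctx, connCounterKey{}, new(int64))
		},
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if counter, ok := r.Context().Value(connCounterKey{}).(*int64); ok {
				if atomic.AddInt64(counter, 1) >= maxRequestsPerConn {
					// Tell net/http to close this connection after the
					// response; the exchange then opens a fresh connection,
					// which helps spread out hot connections.
					w.Header().Set("Connection", "close")
				}
			}
			handler.ServeHTTP(w, r)
		}),
	}
}

A bidder would then serve with, for example, NewServer(":8080", bidHandler, 5000).ListenAndServe(), where bidHandler is its existing handler.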

Handle overload gracefully

Ideally, quotas would be set high enough so your bidder can receive all the requests it can handle, but no more than that. In practice, keeping quotas at optimal levels is a difficult task, and overloads do happen, for a variety of reasons: a backend going down during peak times, a traffic mix changing so that more processing is required for each request, or a quota value just being set too high. Consequently, it pays to consider how your bidder will behave with too much traffic coming in.

To accommodate temporary traffic shifts between regions, which can last up to a week (especially between Asia and US West, or between US East and US West), we recommend a 15% cushion between the 7-day peak and the QPS per Trading Location. For example, if the 7-day peak for a Trading Location is 60,000 QPS, the QPS for that Trading Location should be set to roughly 69,000.

In terms of behavior under heavy loads, bidders fall into three broad categories:

The "respond to everything" bidder

While straightforward to implement, this bidder fares the worst when overloaded. It simply tries to respond to every bid request that comes in, no matter what, queueing up any that cannot be served immediately. The scenario that ensues is often something like this:

  • As the request rate climbs, so do the request latencies, until all requests start timing out
  • Latencies rise precipitously as callout rates approach peak
  • Throttling kicks in, sharply reducing the number of allowed callouts
  • Latencies start to recover, causing throttling to be reduced
  • The cycle begins again.

The graph of latency for this bidder resembles a very steep saw-tooth pattern. Alternatively, queued-up requests cause the server to start paging memory or do something else that causes a long-term slowdown, and latencies do not recover at all until peak times are over, leading to depressed callout rates during the entire peak period. In either case, fewer callouts are made or responded to than if the quota had simply been set to a lower value.

The "error on overload" bidder

This bidder accepts callouts up to a certain rate, then starts returning errors for some callouts. This may be done through internal timeouts, disabling connection queuing (controlled by ListenBacklog in Apache), implementing a probabilistic drop mode when utilization or latencies get too high, or some other mechanism. If Google observes an error rate above 15%, we'll start throttling. Unlike the "respond to everything" bidder, this bidder "cuts its losses," which allows it to recover immediately when request rates go down.

The graph of latency for this bidder resembles a shallow saw-tooth pattern during overloads, localized around the maximum acceptable rate.
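As one illustration of the probabilistic drop mode mentioned above, a minimal Go sketch might track in-flight requests and reject a growing fraction of callouts with an error once utilization passes a soft limit. The limits here are placeholders you would tune to your own capacity; they are not values from the protocol.

package frontend

import (
	"math/rand"
	"net/http"
	"sync/atomic"
)

// LoadShedder rejects a growing fraction of requests once the number of
// in-flight requests passes SoftLimit, and rejects everything at HardLimit.
type LoadShedder struct {
	inFlight  int64
	SoftLimit int64 // start shedding here (placeholder)
	HardLimit int64 // shed everything here (placeholder)
	Next      http.Handler
}

func (s *LoadShedder) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	n := atomic.AddInt64(&s.inFlight, 1)
	defer atomic.AddInt64(&s.inFlight, -1)

	if n > s.SoftLimit {
		// Drop probability rises linearly from 0 at SoftLimit to 1 at HardLimit.
		dropP := float64(n-s.SoftLimit) / float64(s.HardLimit-s.SoftLimit)
		if rand.Float64() < dropP {
			// "Error on overload": cut our losses quickly instead of queueing.
			http.Error(w, "overloaded", http.StatusServiceUnavailable)
			return
		}
	}
	s.Next.ServeHTTP(w, r)
}

Wrapping the bid handler as &LoadShedder{SoftLimit: 8000, HardLimit: 12000, Next: bidHandler} turns a "respond to everything" front end into one that cuts its losses early.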

The "no-bid on overload" bidder

This bidder accepts callouts up to a certain rate, then starts returning "no-bid" responses for any overload. Similar to the "error on overload" bidder, this can be implemented in a number of ways. What's different here is that no signal is returned to Google, so we never throttle back on callouts. The overload is absorbed by the front-end machines, which only allow the traffic that they can handle to continue through to the backends.

The graph of latency for this bidder shows a plateau that (artificially) stops paralleling the request rate at peak times, and a corresponding drop in the fraction of responses that contain a bid.

We recommend combining the "error on overload" and "no-bid on overload" approaches, in the following way:

  • Over-provision the front-ends and set them to error on overload, to help maximize the number of connections they can respond to in some form.
  • When erroring on overload, the front-end machines can use a canned "no-bid" response, and do not need to parse the request at all.
  • Implement health-checking of the backends, such that if none have sufficient capacity available, they return a "no-bid" response.

This allows some overload to be absorbed and gives the backends a chance to respond to exactly as many requests as they can handle. You can think of this as "no-bid on overload" with front-end machines falling back to "error on overload" when request counts are significantly higher than expected.
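A minimal Go sketch of this combination, under the same assumptions as the earlier sketches: backendHasCapacity stands in for whatever backend health-checking you implement, the thresholds are placeholders, and the canned no-bid payload should be adapted to your protocol variant.

package frontend

import (
	"net/http"
	"sync/atomic"
)

var (
	inFlight   int64
	noBidLimit int64 = 8000  // above this, stop forwarding to backends (placeholder)
	errorLimit int64 = 12000 // far above expectations: error instead (placeholder)

	// Canned no-bid payload; adapt to your protocol variant. It is written
	// without parsing the request at all.
	noBidResponse = []byte(`{}`)
)

// backendHasCapacity is a hypothetical health check over your backend pool.
func backendHasCapacity() bool { return true }

func handleCallout(w http.ResponseWriter, r *http.Request, forward http.Handler) {
	n := atomic.AddInt64(&inFlight, 1)
	defer atomic.AddInt64(&inFlight, -1)

	switch {
	case n > errorLimit:
		// Request counts significantly higher than expected: fall back to
		// "error on overload" so throttling can relieve the pressure.
		http.Error(w, "overloaded", http.StatusServiceUnavailable)
	case n > noBidLimit || !backendHasCapacity():
		// Moderate overload or backends out of capacity: absorb it at the
		// front end with the canned no-bid.
		w.Write(noBidResponse)
	default:
		forward.ServeHTTP(w, r)
	}
}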

If you have a "respond to everything" bidder, consider transforming it into an "error on overload" bidder by tuning connection behavior so it in effect refuses to be overloaded. While this causes more errors to be returned, it reduces timeouts and prevents the server from getting into a state where it cannot respond to any requests.

Respond to pings

Making sure your bidder can respond to ping requests, while not connection management per se, is surprisingly important for debugging. Google uses ping requests for sanity-checking and debugging of bidder status, connection close behavior, latency, and more. Ping requests take the following form:

Google

id: "\3503cg\3023P\364\230\240\270\020\255\002\231\314\010\362\347\351\246\357("
is_test: true
is_ping: true

OpenRTB JSON

"id": "4YB27BCXimH5g7wB39nd3t"

OpenRTB Protobuf

id: "7xd2P2M7K32d7F7Y50p631"

Keep in mind that, contrary to what you might expect, the ping request does not contain any adslots. And, as detailed above, you should not close the connection after responding to a ping request.
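As a hedged sketch of ping handling for the OpenRTB JSON variant in a Go front end: decode just enough of the request to see whether it carries any impressions, and if it does not, treat it as a ping, answer with a minimal no-bid (the exact no-bid form depends on your protocol configuration), and return normally so the connection stays open. The field names follow standard OpenRTB; with the Google protobuf form you would check is_ping instead.

package frontend

import (
	"bytes"
	"encoding/json"
	"io"
	"net/http"
	"strconv"
)

// pingProbe decodes only the fields needed to recognize a ping: the request
// id and the (possibly empty) list of impressions.
type pingProbe struct {
	ID  string            `json:"id"`
	Imp []json.RawMessage `json:"imp"`
}

func handleOpenRTBCallout(w http.ResponseWriter, r *http.Request, bid http.Handler) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	var probe pingProbe
	if json.Unmarshal(body, &probe) == nil && len(probe.Imp) == 0 {
		// No adslots: treat the request as a ping. Respond and return
		// normally; do not close the connection.
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"id":` + strconv.Quote(probe.ID) + `}`))
		return
	}
	// A real bid request: hand the body back to the bidding handler.
	r.Body = io.NopCloser(bytes.NewReader(body))
	bid.ServeHTTP(w, r)
}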

Consider peering

Another way to reduce network latency or variability is to peer with Google. Peering helps optimize the path traffic takes to get to your bidder. The connection endpoints stay the same, but the intermediate links change. See the Peering guide for details. The reason to think of peering as a best practice can be summarized as follows:

  • On the internet, transit links are chosen primarily through "hot-potato routing," which finds the closest link outside of our network that can get a packet to its destination, and routes the packet through that link. When traffic traverses a section of backbone owned by a provider with whom we have many peering connections, the chosen link is likely to be close to where the packet starts. Beyond that point we have no control of the route the packet follows to the bidder, so it may be bounced to other autonomous systems (networks) along the way.

  • In contrast, when a direct peering agreement is in place, packets are always sent along a peering link. No matter where the packet originates, it traverses links that Google owns or leases until it reaches the shared peering point, which should be close to the bidder location. The reverse trip begins with a short hop to the Google network and remains on the Google network the rest of the way. Keeping most of the trip on Google-managed infrastructure ensures that the packet takes a low-latency route, and avoids much potential variability.

Submit static DNS

We recommend buyers always submit a single static DNS result to Google and rely on Google to handle the traffic delivery.

Here are two practices commonly seen in bidders' DNS servers when trying to load balance or manage availability:

  1. The DNS server hands out one address, or a subset of addresses, in response to a query, and then cycles through this response in some fashion.
  2. The DNS server always responds with the same set of addresses, but cycles the order of the addresses in the response.

The first technique is poor at load balancing because there is caching at multiple levels of the stack, and attempts to bypass caching are unlikely to get the preferred results either, because Google charges DNS resolution time to the bidder.

The second technique doesn't achieve load balancing at all, because Google randomly selects an IP address from the DNS response list, so the order in the response doesn't matter.

If a bidder makes a DNS change, Google will honor the TTL (time-to-live) that was set in its DNS records, but the refresh interval remains uncertain.