Caching

This document applies to the following methods: threatMatches.find (Lookup API) and fullHashes.find (Update API).

About caching

To reduce client bandwidth usage and to protect Google from traffic spikes, clients of both the Lookup API and the Update API are required to create and maintain a local cache of threat data. For the Lookup API, the cache is used to reduce the number of threatMatches requests that clients send to Google. For the Update API, the cache is used to reduce the number of fullHashes requests that clients send to Google. The caching protocol for each API is outlined below.

Lookup API

Clients of the Lookup API should cache each returned ThreatMatch item for the duration defined by its cacheDuration field. Clients then need to consult the cache before making a subsequent threatMatches request to the server. If the cache duration for a previously returned ThreatMatch has not yet expired, the client should assume the item is still unsafe. Caching ThreatMatch items may reduce the number of API requests made by the client.

Example: threatMatches.find


URL check (threatMatches request):
"threatEntries": [
 {"url": "http://www.urltocheck.org/"}
]

URL match (threatMatches response):
"matches": [{
 "threat": {"url": "http://www.urltocheck.org/"},
 "cacheDuration": "300.000s"
}]

Caching behavior:
Match. The client must wait 5 minutes before sending a new threatMatches request that includes the URL http://www.urltocheck.org/.
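
The caching behavior in this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the API: the LookupCache class, its method names, and the parse_duration helper are hypothetical, and the sketch assumes cacheDuration always arrives as a decimal number of seconds followed by "s", as in the response above.

```python
import time


def parse_duration(value: str) -> float:
    """Convert a duration string such as "300.000s" to seconds."""
    if not value.endswith("s"):
        raise ValueError(f"unexpected duration format: {value!r}")
    return float(value[:-1])


class LookupCache:
    """Hypothetical in-memory cache of ThreatMatch items, keyed by URL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry = {}  # URL -> time at which the cached match expires

    def store(self, response: dict) -> None:
        """Cache every match found in a decoded threatMatches response."""
        now = self._clock()
        for match in response.get("matches", []):
            url = match["threat"]["url"]
            self._expiry[url] = now + parse_duration(match["cacheDuration"])

    def is_cached_unsafe(self, url: str) -> bool:
        """Return True while a previously returned match is unexpired."""
        expiry = self._expiry.get(url)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expiry[url]  # expired: the URL must be looked up again
            return False
        return True
```

A client would call is_cached_unsafe for each URL before building a threatMatches request and leave out any URL for which it returns True. The injectable clock is only there to make the expiry logic easy to test.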

Update API

To reduce the overall number of fullHashes requests sent to Google using the Update API, clients are required to maintain a local cache. The API establishes two types of caching, positive and negative.

Positive caching

To prevent clients from repeatedly asking about the state of a particular unsafe full hash, each returned ThreatMatch contains a positive cache duration (defined by the cacheDuration field), which indicates how long the full hash is to be considered unsafe.

Negative caching

To prevent clients from repeatedly asking about the state of a particular safe full hash, each fullHashes response defines a negative cache duration for the requested prefix (defined by the negativeCacheDuration field). This duration indicates how long all full hashes with the requested prefix are to be considered safe for the requested lists, except for those returned by the server as unsafe. This caching is particularly important as it prevents traffic overload that could be caused by a hash prefix collision with a safe URL that receives a lot of traffic.

Consulting the cache

When the client wants to check the state of a URL, it first computes its full hash. If the full hash’s prefix is present in the local database, the client should then consult its cache before making a fullHashes request to the server.

First, clients should check for a positive cache hit. If there exists an unexpired positive cache entry for the full hash of interest, the full hash should be considered unsafe. If the positive cache entry has expired, the client must send a fullHashes request for the associated local prefix. Per the protocol, if the server returns the full hash, it is considered unsafe; otherwise, it’s considered safe.

If there are no positive cache entries for the full hash, the client should check for a negative cache hit. If there exists an unexpired negative cache entry for the associated local prefix, the full hash is considered safe. If the negative cache entry has expired, or it doesn’t exist, the client must send a fullHashes request for the associated local prefix and interpret the response as normal.
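
The lookup order described above can be sketched as a small decision function. This is an illustration only: the Verdict enum, the consult_cache function, and the dictionary-based caches are hypothetical names, full hashes are assumed to be raw bytes, and local prefixes are assumed to be 4 bytes long.

```python
from enum import Enum


class Verdict(Enum):
    UNSAFE = "unsafe"
    SAFE = "safe"
    SEND_REQUEST = "send fullHashes request"


def consult_cache(full_hash: bytes, local_prefixes: set, positive: dict,
                  negative: dict, now: float, prefix_len: int = 4) -> Verdict:
    """Decide how to treat one full hash.

    positive maps full hash -> expiry time.
    negative maps hash prefix -> expiry time.
    """
    prefix = full_hash[:prefix_len]
    if prefix not in local_prefixes:
        # Assumption: a prefix absent from the local database needs no
        # fullHashes request, so the hash is treated as safe.
        return Verdict.SAFE

    # 1. Positive cache: an unexpired entry means the hash is unsafe.
    pos_expiry = positive.get(full_hash)
    if pos_expiry is not None:
        if now < pos_expiry:
            return Verdict.UNSAFE
        return Verdict.SEND_REQUEST  # expired positive entry: ask the server

    # 2. Negative cache: an unexpired entry for the prefix means safe.
    neg_expiry = negative.get(prefix)
    if neg_expiry is not None and now < neg_expiry:
        return Verdict.SAFE

    # 3. No usable cache entry: the server must be consulted.
    return Verdict.SEND_REQUEST
```

Checking the positive cache first matters: a hash with an expired positive entry triggers a request even when the negative entry for its prefix is still valid.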

Updating the cache

The client cache should be updated whenever a fullHashes response is received. A positive cache entry should be created or updated for each returned full hash per its cacheDuration field. A negative cache entry for each requested hash prefix should also be created or updated per the response’s negativeCacheDuration field.

If a subsequent fullHashes request does not return a full hash that is currently positively cached, the client is not required to remove the positive cache entry. This is not cause for concern in practice, since positive cache durations are typically short (a few minutes) to allow for quick correction of false positives.
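
The update step might look like the sketch below. It assumes the response has already been decoded into a dictionary shaped like the examples in this document, treats hashes as opaque keys without decoding them, and expects the caller to pass the prefixes it requested. The update_cache and parse_duration names are hypothetical.

```python
def parse_duration(value: str) -> float:
    """Convert a duration string such as "600.000s" to seconds."""
    return float(value.rstrip("s"))


def update_cache(response: dict, requested_prefixes: list, positive: dict,
                 negative: dict, now: float) -> None:
    """Apply one fullHashes response to the positive and negative caches."""
    # Positive entries: one per returned full hash, using its own cacheDuration.
    for match in response.get("matches", []):
        full_hash = match["threat"]["hash"]
        positive[full_hash] = now + parse_duration(match["cacheDuration"])

    # Negative entries: one per requested prefix, from negativeCacheDuration.
    neg = response.get("negativeCacheDuration")
    if neg is not None:
        for prefix in requested_prefixes:
            negative[prefix] = now + parse_duration(neg)

    # Positive entries missing from this response are left to expire on
    # their own, since removing them is optional.
```

Storing absolute expiry times rather than durations keeps later cache checks to a single comparison against the current time.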

Example scenario

In the following example, assume h(url) is the hash prefix of the URL and H(url) is the full-length hash of the URL. That is, H(url) = SHA256(url) and h(url) = SHA256(url).substr(0, 4), the first 4 bytes of the digest.
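
For concreteness, the two functions can be written with Python's standard hashlib module, reading the hash prefix as the leading 4 bytes of the digest. The function names are illustrative, and the sketch assumes the input string is already in the exact form that should be hashed (for example "example.com/").

```python
import hashlib


def full_hash(expression: str) -> bytes:
    """H(url): the full 32-byte SHA-256 digest."""
    return hashlib.sha256(expression.encode("utf-8")).digest()


def hash_prefix(expression: str, length: int = 4) -> bytes:
    """h(url): the first `length` bytes of the full hash."""
    return full_hash(expression)[:length]
```

By construction, full_hash("example.com/") always starts with hash_prefix("example.com/"), which is what lets a client match a locally stored prefix against the full hashes the server returns.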

Now, assume a client (with an empty cache) visits example.com/ and sees that h(example.com/) is in the local database. The client requests the full-length hashes for hash prefix h(example.com/) and receives back the full-length hash H(example.com/) together with a positive cache duration of 5 minutes and a negative cache duration of 1 hour.

The positive cache duration of 5 minutes tells the client how long the full-length hash H(example.com/) must be considered unsafe without sending another fullHashes request. After 5 minutes the client must issue another fullHashes request for that prefix h(example.com/) if the client visits example.com/ again. The client should reset the hash prefix’s negative cache duration per the new response.

The negative cache duration of 1 hour tells the client how long all the other full-length hashes besides H(example.com/) that share the same prefix of h(example.com/) must be considered safe. For the duration of 1 hour, every URL such that h(URL) = h(example.com/) must be considered safe, and therefore not result in a fullHashes request (assuming that H(URL) != H(example.com/)).

If the fullHashes response contains zero matches and a negative cache duration is set, then the client must not issue any fullHashes requests for any of the requested prefixes for the given negative cache duration.

If the fullHashes response contains one or more matches, a negative cache duration is still set for the entire response. The cache duration of a single full hash then indicates how long that particular full-length hash must be assumed unsafe by the client. After the ThreatMatch cache duration elapses, the client must refresh the full-length hash by issuing a fullHashes request for that hash prefix if the requested URL matches the existing full-length hash in the cache; the negative cache duration does not apply to that hash. The response’s negative cache duration only applies to full-length hashes that were not present in the fullHashes response. For those hashes, the client must refrain from issuing any fullHashes requests until the negative cache duration has elapsed.
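
The timing in this scenario (positive cache duration of 5 minutes, negative cache duration of 1 hour) can be traced with a short script. It is a simplified sketch: the action function is hypothetical, and it ignores the fact that a new fullHashes response would reset both durations.

```python
POSITIVE_TTL = 5 * 60    # seconds, for H(example.com/)
NEGATIVE_TTL = 60 * 60   # seconds, for prefix h(example.com/)


def action(is_returned_hash: bool, elapsed: float) -> str:
    """Action for a full hash sharing the prefix, `elapsed` seconds after
    the fullHashes response was received."""
    if is_returned_hash:
        # The positive cache governs the hash the server returned.
        return "unsafe" if elapsed < POSITIVE_TTL else "request"
    # The negative cache governs every other hash with the same prefix.
    return "safe" if elapsed < NEGATIVE_TTL else "request"


for minutes in (1, 6, 59, 61):
    t = minutes * 60
    print(f"{minutes:>2} min  H(example.com/): {action(True, t):8} "
          f"other hashes: {action(False, t)}")
```

The script prints, for several points in time, how the client treats H(example.com/) and how it treats any other full hash with the same prefix.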

Example: fullHashes.find


Hash prefixes (fullHashes request):
"threatEntries": [
  {"hash": "0xaaaaaaaa"}
]

Full-length hash matches (fullHashes response):
"matches": [],
"negativeCacheDuration": "3600.000s"

Caching behavior:
No match. The client must not send any fullHashes requests for hash prefix 0xaaaaaaaa for at least one hour. Any hash with prefix 0xaaaaaaaa is considered safe for one hour.

Hash prefixes (fullHashes request):
"threatEntries": [
  {"hash": "0xbbbbbbbb"}
]

Full-length hash matches (fullHashes response):
"matches": [{
 "threat": {"hash": "0xbbbbbbbb0000..."},
 "cacheDuration": "600.000s"
}],
"negativeCacheDuration": "300.000s"

Caching behavior:
Possible matches. The client should consider the URL with the full hash 0xbbbbbbbb0000… unsafe for 10 minutes. The client should consider all other URLs with hash prefix 0xbbbbbbbb safe for 5 minutes. After 5 minutes, the hash prefix’s negative cache entry expires. Since the positive cache entry for 0xbbbbbbbb0000… has not yet expired, the client should send fullHashes requests for all hashes with that prefix except that one.

Hash prefixes (fullHashes request):
"threatEntries": [
  {"hash": "0xcccccccc"}
]

Full-length hash matches (fullHashes response):
"matches": [{
 "threat": {"hash": "0xccccccccdddd..."},
 "cacheDuration": "600.000s"
}],
"negativeCacheDuration": "3600.000s"

Caching behavior:
Possible matches. The client must not send any fullHashes request for hash prefix 0xcccccccc for at least 1 hour and must assume that prefix to be safe, unless the full hash of the URL matches the cached full hash 0xccccccccdddd.... In that case the client should consider that URL unsafe for 10 minutes. After 10 minutes the positive cache entry for the full-length hash expires, and any subsequent lookup for that full hash should trigger a new fullHashes request.