Overview

Note: This documentation is currently still under development. Expect improvements in the near future.

Google Safe Browsing v5 is an evolution of Google Safe Browsing v4. The two key changes in v5 are data freshness and IP privacy. In addition, the API surface has been improved to increase flexibility and efficiency and to reduce bloat. Furthermore, Google Safe Browsing v5 is designed to make migration from v4 easy.

Currently, Google offers both v4 and v5, and both are considered production-ready. You may use either v4 or v5. We have not announced a date for sunsetting v4; if we do, we will give at least one year of notice. This page describes v5 and provides a migration guide from v4 to v5; the complete v4 documentation remains available.

Data Freshness

In v5, we introduce a mode of operation known as real-time protection. It circumvents the data staleness problem of v4, where protection depends on how recently the client has updated its local database. In v4, clients are expected to download and maintain a local database, check URLs against the locally downloaded threat lists, and request the full hashes only when there is a partial prefix match. In v5, clients should still download and maintain a local database of threat lists, but they are now also expected to download a list of likely-benign sites (called the Global Cache). Clients check each URL locally against both the Global Cache and the threat lists. They request the full hashes when there is either a partial prefix match in the threat lists or no match in the Global Cache. (For details on the local processing required by the client, see the procedures below.)

This represents a shift from allow-by-default to check-by-default, which can improve protection given how quickly threats now propagate on the web. In other words, this protocol is designed to provide near-real-time protection: we aim to have clients benefit from fresher Google Safe Browsing data.

IP Privacy

Google Safe Browsing (v4 or v5) does not process anything associated with a user’s identity in the course of serving requests. Cookies, if sent, are ignored. The originating IP addresses of the requests are known to Google, but Google uses them only for essential networking needs (i.e., for sending responses) and for anti-DoS purposes.

Concurrently with v5, we introduce a companion API known as the Safe Browsing Oblivious HTTP Gateway API. It uses Oblivious HTTP to hide end users' IP addresses from Google. It works by having a non-colluding third party handle an encrypted version of the user request and forward it to Google. As a result, the third party has access only to the IP addresses, and Google has access only to the content of the request. The third party operates an Oblivious HTTP Relay (such as the one offered by Fastly), and Google operates the Oblivious HTTP Gateway. This companion API is optional. When it is used in conjunction with Google Safe Browsing, end users' IP addresses are no longer sent to Google.

The Modes of Operation

Google Safe Browsing v5 allows clients to choose from three modes of operation.

Real-Time Mode

When clients choose to use Google Safe Browsing v5 in real-time mode, they maintain two things in their local database:

(i) a Global Cache of likely-benign sites, formatted as SHA256 hashes of host-suffix/path-prefix URL expressions;
(ii) a set of threat lists, formatted as SHA256 hash prefixes of host-suffix/path-prefix URL expressions.

The high-level idea is that whenever the client wishes to check a particular URL, it first performs a local check against the Global Cache. If that check passes (that is, one of the URL's expression hashes is found in the Global Cache), a local threat list check is performed. Otherwise, the client continues with the real-time hash check detailed below.
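Every procedure on this page begins by turning a URL into host-suffix/path-prefix expressions and hashing them with SHA256. This page does not define those expressions. The sketch below assumes they follow the v4 suffix/prefix rules: at most five host suffixes and at most six path prefixes. It also assumes the URL has already been canonicalized; canonicalization is not shown. All function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import ipaddress


def host_suffixes(host: str) -> list[str]:
    """Exact host plus up to four suffixes taken from its last five components."""
    try:
        ipaddress.ip_address(host)
        return [host]  # IP addresses are used as-is, with no suffixes.
    except ValueError:
        pass
    parts = host.split(".")
    suffixes = [host]
    # Drop leading components one at a time, never reaching the bare TLD.
    for i in range(max(len(parts) - 5, 1), len(parts) - 1):
        suffixes.append(".".join(parts[i:]))
    return suffixes


def path_prefixes(path: str, query: str | None = None) -> list[str]:
    """Exact path (with and without query) plus up to four directory prefixes."""
    prefixes = [f"{path}?{query}"] if query else []
    prefixes.append(path)
    current, directories = "/", ["/"]
    # Root first, then successively longer directory paths ending in "/".
    for component in path.split("/")[1:-1][:3]:
        current += component + "/"
        directories.append(current)
    prefixes += [d for d in directories if d not in prefixes]
    return prefixes


def url_expressions(host: str, path: str, query: str | None = None) -> list[str]:
    return [h + p for h in host_suffixes(host) for p in path_prefixes(path, query)]


def expression_hashes(expressions: list[str]) -> list[bytes]:
    # Full 32-byte SHA256 digests; the procedures use the first 4 bytes as prefixes.
    return [hashlib.sha256(e.encode("utf-8")).digest() for e in expressions]


exprs = url_expressions("a.b.c", "/1/2.html", "param=1")
# ['a.b.c/1/2.html?param=1', 'a.b.c/1/2.html', 'a.b.c/', 'a.b.c/1/',
#  'b.c/1/2.html?param=1', 'b.c/1/2.html', 'b.c/', 'b.c/1/']
prefixes = [h[:4] for h in expression_hashes(exprs)]
```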

Besides the local database, the client will maintain a local cache. Such a local cache need not be in persistent storage and may be cleared in case of memory pressure.
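One way to model the local cache is an in-memory map keyed by 4-byte hash prefix. This minimal sketch assumes that each entry stores the full hashes the server returned for that prefix, together with the expiration time from the response. `LocalCache` and `check_local_cache` are hypothetical names. `check_local_cache` follows the cache-lookup loop that appears in each check procedure on this page.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class LocalCache:
    """Non-persistent cache: 4-byte prefix -> (full hashes, expiration time)."""
    entries: dict[bytes, tuple[set[bytes], float]] = field(default_factory=dict)

    def insert(self, full_hash: bytes, expiration: float) -> None:
        """Store a full hash under its prefix with the response's expiration."""
        prefix = full_hash[:4]
        hashes, _ = self.entries.get(prefix, (set(), expiration))
        hashes.add(full_hash)
        self.entries[prefix] = (hashes, expiration)

    def clear(self) -> None:
        """Safe to call under memory pressure; nothing here is persistent."""
        self.entries.clear()


def check_local_cache(
    cache: LocalCache, expression_hashes: list[bytes], now: float | None = None
) -> tuple[bool, list[bytes]]:
    """Return (True, []) on a cached match (UNSAFE), else (False, prefixes left)."""
    now = time.time() if now is None else now
    remaining: list[bytes] = []
    for full_hash in expression_hashes:
        prefix = full_hash[:4]
        entry = cache.entries.get(prefix)
        if entry is not None and now > entry[1]:
            del cache.entries[prefix]  # Expired: evict and treat as a miss.
            entry = None
        if entry is None:
            if prefix not in remaining:
                remaining.append(prefix)
        elif full_hash in entry[0]:
            return True, []
        # A fresh entry without a match settles this prefix without a request.
    return False, remaining
```

A cache hit on a fresh entry removes the prefix from further checking whether or not the full hash matches. This is what allows repeated checks of the same URL to avoid a network round trip.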

A detailed specification of the procedure is available below.

Local List Mode

When clients choose to use Google Safe Browsing v5 in this mode, the client behavior is similar to that of the v4 Update API, except that it uses the improved API surface of v5. Clients maintain in their local database a set of threat lists formatted as SHA256 hash prefixes of host-suffix/path-prefix URL expressions. Whenever the client wishes to check a particular URL, it performs a check against the local threat lists. If and only if there is a match, the client connects to the server to continue the check.

As in real-time mode, the client also maintains a local cache that need not be in persistent storage.

No-Storage Real-Time Mode

When clients choose to use Google Safe Browsing v5 in the no-storage real-time mode, the client need not maintain any persistent local database. However, the client is still expected to maintain a local cache. Such a local cache need not be in persistent storage and may be cleared in case of memory pressure.

Whenever the client wishes to check a particular URL, the client always connects to the server to perform a check. This mode is similar to what clients of the v4 Lookup API may implement.

Compared to the Real-Time Mode, this mode may use more network bandwidth but may be more suitable if it is inconvenient for the client to maintain persistent local state.

The Real-Time URL Check Procedure

This procedure is used when the client chooses the real-time mode of operation.

This procedure takes a single URL u and returns SAFE, UNSAFE, or UNSURE. If it returns SAFE, the URL is deemed safe by Google Safe Browsing. If it returns UNSAFE, the URL is deemed potentially unsafe by Google Safe Browsing, and appropriate action should be taken. Examples include showing a warning to the end user, moving a received message to the spam folder, or requiring extra confirmation from the user before proceeding. If it returns UNSURE, the Local Threat List URL Check Procedure below should be used afterwards.

  1. Let expressions be a list of suffix/prefix expressions generated by the URL u.
  2. Let expressionHashes be a list, where the elements are SHA256 hashes of each expression in expressions.
  3. For each hash of expressionHashes:
    1. If hash can be found in the global cache, return UNSURE.
  4. Let expressionHashPrefixes be a list, where the elements are the first 4 bytes of each hash in expressionHashes.
  5. For each expressionHashPrefix of expressionHashPrefixes:
    1. Look up expressionHashPrefix in the local cache.
    2. If the cached entry is found:
      1. Determine whether the current time is greater than its expiration time.
      2. If it is greater:
        1. Remove the found cached entry from the local cache.
        2. Continue with the loop.
      3. If it is not greater:
        1. Remove this particular expressionHashPrefix from expressionHashPrefixes.
        2. Check whether the corresponding full hash within expressionHashes is found in the cached entry.
        3. If found, return UNSAFE.
        4. If not found, continue with the loop.
    3. If the cached entry is not found, continue with the loop.
  6. Send expressionHashPrefixes to the Google Safe Browsing v5 server using RPC SearchHashes or the REST method hashes.search. If an error occurred (including network errors, HTTP errors, etc.), return UNSURE. Otherwise, let response be the response received from the Safe Browsing server. This is a list of full hashes, together with some auxiliary information identifying the nature of the threat (social engineering, malware, etc.) and the cache expiration time expiration.
  7. For each fullHash of response:
    1. Insert fullHash into the local cache, together with expiration.
  8. For each fullHash of response:
    1. Let isFound be the result of finding fullHash in expressionHashes.
    2. If isFound is False, continue with the loop.
    3. If isFound is True, return UNSAFE.
  9. Return SAFE.
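The procedure above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation. It makes these assumptions:

- The expression hashes have already been computed (steps 1 and 2).
- The Global Cache is held as an in-memory set of full hashes.
- The local cache is a dict keyed by 4-byte prefix.
- A hypothetical `search` callback stands in for a real hashes.search client.

`Verdict` and `real_time_check` are illustrative names.

```python
from __future__ import annotations

import time
from enum import Enum
from typing import Callable


class Verdict(Enum):
    SAFE = "SAFE"
    UNSAFE = "UNSAFE"
    UNSURE = "UNSURE"


# Hypothetical stand-in for hashes.search: takes 4-byte prefixes and returns
# (full hashes, cache expiration as a Unix timestamp); raises on any error.
SearchFn = Callable[[list[bytes]], tuple[list[bytes], float]]


def real_time_check(
    expression_hashes: list[bytes],
    global_cache: set[bytes],
    local_cache: dict[bytes, tuple[set[bytes], float]],
    search: SearchFn,
    now: float | None = None,
) -> Verdict:
    now = time.time() if now is None else now
    # Step 3: a Global Cache hit defers to the local threat list procedure.
    if any(h in global_cache for h in expression_hashes):
        return Verdict.UNSURE
    # Step 5: consult the local cache, evicting expired entries.
    prefixes: list[bytes] = []
    for h in expression_hashes:
        entry = local_cache.get(h[:4])
        if entry is not None and now > entry[1]:
            del local_cache[h[:4]]
            entry = None
        if entry is None:
            if h[:4] not in prefixes:
                prefixes.append(h[:4])
        elif h in entry[0]:
            return Verdict.UNSAFE
    # Step 6: query the server; any error means UNSURE.
    try:
        full_hashes, expiration = search(prefixes)
    except Exception:
        return Verdict.UNSURE
    # Step 7: cache every returned full hash under its prefix.
    for fh in full_hashes:
        cached, _ = local_cache.get(fh[:4], (set(), expiration))
        cached.add(fh)
        local_cache[fh[:4]] = (cached, expiration)
    # Steps 8-9: UNSAFE only if a returned full hash belongs to this URL.
    if any(fh in expression_hashes for fh in full_hashes):
        return Verdict.UNSAFE
    return Verdict.SAFE
```

A caller that receives `Verdict.UNSURE` would then run the local threat list check for the same URL.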

While this protocol specifies when the client sends expressionHashPrefixes to the server, this protocol purposefully does not specify exactly how to send them. For example, it is acceptable for the client to send all the expressionHashPrefixes in a single request, and it is also acceptable for the client to send each individual prefix in expressionHashPrefixes to the server in separate requests (perhaps proceeding in parallel). It is also acceptable for the client to send unrelated or randomly generated hash prefixes together with the hash prefixes in expressionHashPrefixes, as long as the number of hash prefixes sent in a single request does not exceed 30.
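As one illustration of this freedom, the sketch below groups prefixes into requests that never exceed 30 prefixes. It can optionally pad each request with random decoy prefixes. `build_batches` and `decoys_per_batch` are hypothetical names, and how many decoys to add (if any) is a client choice this page does not prescribe.

```python
from __future__ import annotations

import os
import random

MAX_PREFIXES_PER_REQUEST = 30  # Upper bound stated in the paragraph above.

_rng = random.SystemRandom()  # OS-backed randomness for shuffling.


def build_batches(
    prefixes: list[bytes],
    decoys_per_batch: int = 0,
    batch_size: int = MAX_PREFIXES_PER_REQUEST,
) -> list[list[bytes]]:
    """Group 4-byte prefixes into requests of at most batch_size prefixes.

    Each batch is optionally padded with random 4-byte decoys and shuffled,
    so real prefixes are not distinguishable from decoys by position.
    """
    if not 0 <= decoys_per_batch < batch_size:
        raise ValueError("decoys must leave room for at least one real prefix")
    real_per_batch = batch_size - decoys_per_batch
    batches = []
    for start in range(0, len(prefixes), real_per_batch):
        batch = prefixes[start:start + real_per_batch]  # Slice copies the input.
        batch += [os.urandom(4) for _ in range(decoys_per_batch)]
        _rng.shuffle(batch)
        batches.append(batch)
    return batches
```

Each returned batch would then be sent as one hashes.search request.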

The Local Threat List URL Check Procedure

This procedure is used when the client opts for the local list mode of operation. It is also used when the Real-Time URL Check Procedure above returns UNSURE.

This procedure takes a single URL u and returns SAFE or UNSAFE.

  1. Let expressions be a list of suffix/prefix expressions generated by the URL u.
  2. Let expressionHashes be a list, where the elements are SHA256 hashes of each expression in expressions.
  3. Let expressionHashPrefixes be a list, where the elements are the first 4 bytes of each hash in expressionHashes.
  4. For each expressionHashPrefix of expressionHashPrefixes:
    1. Look up expressionHashPrefix in the local cache.
    2. If the cached entry is found:
      1. Determine whether the current time is greater than its expiration time.
      2. If it is greater:
        1. Remove the found cached entry from the local cache.
        2. Continue with the loop.
      3. If it is not greater:
        1. Remove this particular expressionHashPrefix from expressionHashPrefixes.
        2. Check whether the corresponding full hash within expressionHashes is found in the cached entry.
        3. If found, return UNSAFE.
        4. If not found, continue with the loop.
    3. If the cached entry is not found, continue with the loop.
  5. For each expressionHashPrefix of expressionHashPrefixes:
    1. Look up expressionHashPrefix in the local threat list database.
    2. If the expressionHashPrefix cannot be found in the local threat list database, remove it from expressionHashPrefixes.
  6. If expressionHashPrefixes is empty, return SAFE. Otherwise, send expressionHashPrefixes to the Google Safe Browsing v5 server using RPC SearchHashes or the REST method hashes.search. If an error occurred (including network errors, HTTP errors, etc.), return SAFE. Otherwise, let response be the response received from the Safe Browsing server. This is a list of full hashes, together with some auxiliary information identifying the nature of the threat (social engineering, malware, etc.) and the cache expiration time expiration.
  7. For each fullHash of response:
    1. Insert fullHash into the local cache, together with expiration.
  8. For each fullHash of response:
    1. Let isFound be the result of finding fullHash in expressionHashes.
    2. If isFound is False, continue with the loop.
    3. If isFound is True, return UNSAFE.
  9. Return SAFE.
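A minimal Python sketch of this procedure follows. It assumes the expression hashes are already computed and the local threat list database is held as an in-memory set of 4-byte prefixes. It also uses a hypothetical `search` callback in place of a real hashes.search client. The early return on an empty prefix list reflects the local list mode description, in which the client contacts the server only on a local match. `local_list_check` is an illustrative name.

```python
from __future__ import annotations

import time
from typing import Callable


def local_list_check(
    expression_hashes: list[bytes],
    threat_list_prefixes: set[bytes],
    local_cache: dict[bytes, tuple[set[bytes], float]],
    search: Callable[[list[bytes]], tuple[list[bytes], float]],
    now: float | None = None,
) -> str:
    now = time.time() if now is None else now
    # Step 4: local cache lookup, evicting expired entries.
    prefixes: list[bytes] = []
    for h in expression_hashes:
        entry = local_cache.get(h[:4])
        if entry is not None and now > entry[1]:
            del local_cache[h[:4]]
            entry = None
        if entry is None:
            if h[:4] not in prefixes:
                prefixes.append(h[:4])
        elif h in entry[0]:
            return "UNSAFE"
    # Step 5: keep only prefixes present in the local threat list database.
    prefixes = [p for p in prefixes if p in threat_list_prefixes]
    # No local match: local list mode does not contact the server.
    if not prefixes:
        return "SAFE"
    # Step 6: in this procedure, errors fail open (SAFE).
    try:
        full_hashes, expiration = search(prefixes)
    except Exception:
        return "SAFE"
    # Step 7: cache the returned full hashes under their prefixes.
    for fh in full_hashes:
        cached, _ = local_cache.get(fh[:4], (set(), expiration))
        cached.add(fh)
        local_cache[fh[:4]] = (cached, expiration)
    # Steps 8-9: UNSAFE only if a returned full hash belongs to this URL.
    return "UNSAFE" if any(fh in expression_hashes for fh in full_hashes) else "SAFE"
```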

The Real-Time URL Check Procedure Without a Local Database

This procedure is used when the client chooses the no-storage real-time mode of operation.

This procedure takes a single URL u and returns SAFE or UNSAFE.

  1. Let expressions be a list of suffix/prefix expressions generated by the URL u.
  2. Let expressionHashes be a list, where the elements are SHA256 hashes of each expression in expressions.
  3. Let expressionHashPrefixes be a list, where the elements are the first 4 bytes of each hash in expressionHashes.
  4. For each expressionHashPrefix of expressionHashPrefixes:
    1. Look up expressionHashPrefix in the local cache.
    2. If the cached entry is found:
      1. Determine whether the current time is greater than its expiration time.
      2. If it is greater:
        1. Remove the found cached entry from the local cache.
        2. Continue with the loop.
      3. If it is not greater:
        1. Remove this particular expressionHashPrefix from expressionHashPrefixes.
        2. Check whether the corresponding full hash within expressionHashes is found in the cached entry.
        3. If found, return UNSAFE.
        4. If not found, continue with the loop.
    3. If the cached entry is not found, continue with the loop.
  5. Send expressionHashPrefixes to the Google Safe Browsing v5 server using RPC SearchHashes or the REST method hashes.search. If an error occurred (including network errors, HTTP errors, etc.), return SAFE. Otherwise, let response be the response received from the Safe Browsing server. This is a list of full hashes, together with some auxiliary information identifying the nature of the threat (social engineering, malware, etc.) and the cache expiration time expiration.
  6. For each fullHash of response:
    1. Insert fullHash into the local cache, together with expiration.
  7. For each fullHash of response:
    1. Let isFound be the result of finding fullHash in expressionHashes.
    2. If isFound is False, continue with the loop.
    3. If isFound is True, return UNSAFE.
  8. Return SAFE.

Just like the Real-Time URL Check Procedure, this procedure does not specify exactly how to send the hash prefixes to the server. For example, it is acceptable for the client to send all the expressionHashPrefixes in a single request, and it is also acceptable for the client to send each individual prefix in expressionHashPrefixes to the server in separate requests (perhaps proceeding in parallel). It is also acceptable for the client to send unrelated or randomly generated hash prefixes together with the hash prefixes in expressionHashPrefixes, as long as the number of hash prefixes sent in a single request does not exceed 30.
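The sketch below shows the "separate requests in parallel" option. `search_each_prefix` and the `search` callback are hypothetical: the callback stands in for a hashes.search client that takes a list of prefixes and returns (full hashes, expiration). The function returns (full hash, expiration) pairs so that each full hash can be cached with the expiration from the response that carried it. A failing request re-raises its exception, so the caller can apply the procedure's error rule.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

SearchFn = Callable[[list[bytes]], tuple[list[bytes], float]]


def search_each_prefix(
    prefixes: list[bytes], search: SearchFn, max_workers: int = 8
) -> list[tuple[bytes, float]]:
    """Send every prefix in its own request, in parallel, and merge results."""
    if not prefixes:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps input order and re-raises a worker's exception
        # when its result is consumed by list().
        results = list(pool.map(lambda p: search([p]), prefixes))
    return [(fh, exp) for full_hashes, exp in results for fh in full_hashes]
```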

Example Requests

This section documents some examples of directly using the HTTP API to access Google Safe Browsing. It is generally recommended to use a generated language binding because it will automatically handle encoding and decoding in a convenient way. Please refer to the documentation for that binding.

Here is an example HTTP request using the hashes.search method:

GET https://safebrowsing.googleapis.com/v5/hashes:search?key=INSERT_YOUR_API_KEY_HERE&hashPrefixes=WwuJdQ

The response body is a protocol-buffer formatted payload that you may then decode.
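The prefix `WwuJdQ` in the request above is the base64 form of a 4-byte hash prefix (the bytes `5b 0b 89 75`). The sketch below builds such a URL from raw prefixes, using only the Python standard library. It encodes with URL-safe base64 and strips the `=` padding, mirroring the example. It also assumes `hashPrefixes` may be repeated once per prefix, in the same way `names` is repeated in the batchGet example. The function names are hypothetical, and decoding the protocol-buffer response is not shown.

```python
import base64
import hashlib
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://safebrowsing.googleapis.com/v5/hashes:search"


def encode_prefix(prefix: bytes) -> str:
    # URL-safe base64 without "=" padding, the same form as "WwuJdQ" above.
    return base64.urlsafe_b64encode(prefix).decode("ascii").rstrip("=")


def hashes_search_url(api_key: str, hash_prefixes: list) -> str:
    # One hashPrefixes parameter per prefix (assumed repeatable).
    params = [("key", api_key)]
    params += [("hashPrefixes", encode_prefix(p)) for p in hash_prefixes]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# A prefix is the first 4 bytes of the SHA256 hash of a URL expression.
prefix = hashlib.sha256(b"a.b.c/").digest()[:4]
print(hashes_search_url("INSERT_YOUR_API_KEY_HERE", [prefix]))
```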

Here is an example HTTP request using the hashLists.batchGet method:

GET https://safebrowsing.googleapis.com/v5alpha1/hashLists:batchGet?key=INSERT_YOUR_API_KEY_HERE&names=se&names=mw-4b

The response body is, once again, a protocol-buffer formatted payload that you may then decode.
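The batchGet request repeats the `names` parameter once per list. The standard-library sketch below reproduces the example URL; `batch_get_url` and `fetch_raw` are hypothetical helpers. The fetch is shown but not executed, because it needs a real API key and network access. It returns the raw protocol-buffer bytes, which a generated binding would then decode.

```python
import urllib.request
from urllib.parse import urlencode

BATCH_GET_ENDPOINT = "https://safebrowsing.googleapis.com/v5alpha1/hashLists:batchGet"


def batch_get_url(api_key: str, list_names: list) -> str:
    # doseq=True expands the list into repeated names=... parameters.
    query = urlencode({"key": api_key, "names": list_names}, doseq=True)
    return f"{BATCH_GET_ENDPOINT}?{query}"


def fetch_raw(url: str, timeout: float = 10.0) -> bytes:
    # Returns the undecoded protocol-buffer response body.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


url = batch_get_url("INSERT_YOUR_API_KEY_HERE", ["se", "mw-4b"])
# payload = fetch_raw(url)  # Requires a real API key and network access.
```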