Note: This documentation is currently still under development. Expect improvements in the near future.
Google Safe Browsing v5 is an evolution of Google Safe Browsing v4. The two key changes made in v5 are data freshness and IP privacy. In addition, the API surface has been improved to increase flexibility and efficiency and to reduce bloat. Furthermore, Google Safe Browsing v5 is designed to make migration from v4 easy.
Currently, Google offers both v4 and v5, and both are considered production ready. You may use either v4 or v5. We have not announced a date for sunsetting v4; if we do, we will give a minimum notice of one year. This page describes v5 and provides a migration guide from v4 to v5; the complete v4 documentation remains available.
Data Freshness
In v5, we introduce a mode of operation known as real-time protection, which addresses the staleness of locally stored threat data. In v4, clients are expected to download and maintain a local database, perform checks against the locally downloaded threat lists, and then, when there is a partial prefix match, perform a request to download the full hash. In v5, although clients should continue to download and maintain a local database of threat lists, clients are now also expected to download a list of likely-benign sites (called the Global Cache), perform both a local check against this Global Cache and a local threat list check, and finally, when there is either a partial prefix match for threat lists or no match in the Global Cache, perform a request to download the full hashes. (For details on the local processing required by the client, please see the procedures below.) This represents a shift from allow-by-default to check-by-default, which can improve protection in light of faster propagation of threats on the web. In other words, this protocol is designed to provide near-real-time protection: we aim to have clients benefit from fresher Google Safe Browsing data.
IP Privacy
Google Safe Browsing (v4 or v5) does not process anything associated with a user’s identity in the course of serving requests. Cookies, if sent, are ignored. The originating IP addresses of the requests are known to Google, but Google only uses the IP addresses for essential networking needs (i.e. for sending responses) and for anti-DoS purposes.
Concurrently with v5, we introduce a companion API known as the Safe Browsing Oblivious HTTP Gateway API. This uses Oblivious HTTP to hide end users' IP addresses from Google. It works by having a non-colluding third party handle an encrypted version of the user request and then forward it to Google. As a result, the third party only has access to the IP addresses, and Google only has access to the content of the request. The third party operates an Oblivious HTTP Relay (such as the service operated by Fastly), and Google operates the Oblivious HTTP Gateway. This companion API is optional. When it is used in conjunction with Google Safe Browsing, end users' IP addresses are no longer sent to Google.
The Modes of Operation
Google Safe Browsing v5 allows clients to choose from three modes of operation.
Real-Time Mode
When clients choose to use Google Safe Browsing v5 in real-time mode, clients will maintain in their local database: (i) a Global Cache of likely-benign sites, formatted as SHA256 hashes of host-suffix/path-prefix URL expressions, and (ii) a set of threat lists, formatted as SHA256 hash prefixes of host-suffix/path-prefix URL expressions. The high-level idea is that whenever the client wishes to check a particular URL, a local check is performed using the Global Cache. If the URL is found in the Global Cache, a local threat list check is performed. Otherwise, the client continues with the real-time hash check as detailed below.
Besides the local database, the client will maintain a local cache. Such a local cache need not be in persistent storage and may be cleared in case of memory pressure.
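As a rough illustration of such a local cache, the sketch below keeps everything in an in-memory dictionary keyed by the first 4 bytes of each full hash. The names `LocalCache` and `CacheEntry`, and the choice to store an absolute expiration timestamp, are assumptions made for this example and are not part of the API.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    """Full hashes known for one hash prefix, plus the time the entry expires."""
    expiration: float  # absolute time in seconds since the epoch (assumption)
    full_hashes: set[bytes] = field(default_factory=set)


class LocalCache:
    """Hypothetical in-memory cache keyed by the first 4 bytes of a SHA256 hash."""

    def __init__(self) -> None:
        self._entries: dict[bytes, CacheEntry] = {}

    def insert(self, full_hash: bytes, expiration: float) -> None:
        """Store a full hash under its 4-byte prefix with the given expiration."""
        prefix = full_hash[:4]
        entry = self._entries.get(prefix)
        if entry is None:
            self._entries[prefix] = CacheEntry(expiration, {full_hash})
        else:
            # Simplification: the newest expiration replaces the previous one.
            entry.full_hashes.add(full_hash)
            entry.expiration = expiration

    def lookup(self, prefix: bytes, now: float | None = None) -> CacheEntry | None:
        """Return the unexpired entry for a prefix, evicting it if it has expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(prefix)
        if entry is None:
            return None
        if now > entry.expiration:
            del self._entries[prefix]
            return None
        return entry

    def clear(self) -> None:
        """Drop all entries, for example under memory pressure."""
        self._entries.clear()
```

Because nothing here is persisted, losing the cache only costs extra requests to the server; it does not affect correctness.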
A detailed specification of the procedure is available below.
Local List Mode
When clients choose to use Google Safe Browsing v5 in this mode, the client behavior is similar to the v4 Update API, except that it uses the improved API surface of v5. Clients will maintain in their local database a set of threat lists formatted as SHA256 hash prefixes of host-suffix/path-prefix URL expressions. Whenever the client wishes to check a particular URL, a check is performed using the local threat list. If and only if there is a match, the client connects to the server to continue the check.
As with the above, the client will also maintain a local cache that need not be in persistent storage.
No-Storage Real-Time Mode
When clients choose to use Google Safe Browsing v5 in the no-storage real-time mode, the client need not maintain any persistent local database. However, the client is still expected to maintain a local cache. Such a local cache need not be in persistent storage and may be cleared in case of memory pressure.
Whenever the client wishes to check a particular URL, the client always connects to the server to perform a check. This mode is similar to what clients of the v4 Lookup API may implement.
Compared to the Real-Time Mode, this mode may use more network bandwidth but may be more suitable if it is inconvenient for the client to maintain persistent local state.
The Real-Time URL Check Procedure
This procedure is used when the client chooses the real-time mode of operation.
This procedure takes a single URL u and returns SAFE, UNSAFE or UNSURE. If it returns SAFE, the URL is deemed safe by Google Safe Browsing. If it returns UNSAFE, the URL is deemed potentially unsafe by Google Safe Browsing and appropriate action should be taken, such as showing a warning to the end user, moving a received message to the spam folder, or requiring extra confirmation by the user before proceeding. If it returns UNSURE, the following local-check procedure should be used afterwards.
- Let expressions be a list of suffix/prefix expressions generated by the URL u.
- Let expressionHashes be a list, where the elements are SHA256 hashes of each expression in expressions.
- For each hash of expressionHashes:
  - If hash can be found in the global cache, return UNSURE.
- Let expressionHashPrefixes be a list, where the elements are the first 4 bytes of each hash in expressionHashes.
- For each expressionHashPrefix of expressionHashPrefixes:
  - Look up expressionHashPrefix in the local cache.
  - If the cached entry is found:
    - Determine whether the current time is greater than its expiration time.
    - If it is greater:
      - Remove the found cached entry from the local cache.
      - Continue with the loop.
    - If it is not greater:
      - Remove this particular expressionHashPrefix from expressionHashPrefixes.
      - Check whether the corresponding full hash within expressionHashes is found in the cached entry.
      - If found, return UNSAFE.
      - If not found, continue with the loop.
  - If the cached entry is not found, continue with the loop.
- Send expressionHashPrefixes to the Google Safe Browsing v5 server using RPC SearchHashes or the REST method hashes.search. If an error occurred (including network errors, HTTP errors, etc), return UNSURE. Otherwise, let response be the response received from the SB server, which is a list of full hashes together with some auxiliary information identifying the nature of the threat (social engineering, malware, etc), as well as the cache expiration time expiration.
- For each fullHash of response:
  - Insert fullHash into the local cache, together with expiration.
- For each fullHash of response:
  - Let isFound be the result of finding fullHash in expressionHashes.
  - If isFound is False, continue with the loop.
  - If isFound is True, return UNSAFE.
- Return SAFE.
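The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the suffix/prefix expressions are taken as already computed, the Global Cache is a plain set of full hashes, the local cache is a dictionary of tuples, and `search_hashes` is a hypothetical callable standing in for the SearchHashes / hashes.search request that returns full hashes together with an absolute expiration time.

```python
import hashlib
import time

SAFE, UNSAFE, UNSURE = "SAFE", "UNSAFE", "UNSURE"


def real_time_check(expressions, global_cache, local_cache, search_hashes, now=None):
    """Sketch of the real-time URL check.

    Hypothetical shapes, chosen only for this example:
      expressions   -- suffix/prefix expression strings already derived from the URL
      global_cache  -- set of full SHA256 hashes of likely-benign expressions
      local_cache   -- dict: 4-byte prefix -> (set of full hashes, expiration time)
      search_hashes -- callable(prefixes) -> (full hashes, expiration time);
                       raises an exception on any network or HTTP error
    """
    now = time.time() if now is None else now
    expression_hashes = [hashlib.sha256(e.encode("utf-8")).digest() for e in expressions]

    # A Global Cache hit means "likely benign": defer to the local threat list check.
    if any(h in global_cache for h in expression_hashes):
        return UNSURE

    prefixes_to_send = []
    for full_hash in expression_hashes:
        prefix = full_hash[:4]
        entry = local_cache.get(prefix)
        if entry is not None:
            cached_hashes, expiration = entry
            if now <= expiration:
                if full_hash in cached_hashes:
                    return UNSAFE
                continue  # fresh entry without a match: no need to ask the server
            del local_cache[prefix]  # expired entry: drop it and ask the server again
        if prefix not in prefixes_to_send:
            prefixes_to_send.append(prefix)

    # Assumption: when nothing is left to send, the request is skipped entirely.
    if prefixes_to_send:
        try:
            full_hashes, expiration = search_hashes(prefixes_to_send)
        except Exception:
            return UNSURE
        for full_hash in full_hashes:
            cached_hashes, _ = local_cache.get(full_hash[:4], (set(), expiration))
            local_cache[full_hash[:4]] = (cached_hashes | {full_hash}, expiration)
        if any(full_hash in expression_hashes for full_hash in full_hashes):
            return UNSAFE
    return SAFE
```

Passing the server call in as a function keeps the sketch independent of any particular transport or generated client binding.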
While this protocol specifies when the client sends expressionHashPrefixes to the server, this protocol purposefully does not specify exactly how to send them. For example, it is acceptable for the client to send all the expressionHashPrefixes in a single request, and it is also acceptable for the client to send each individual prefix in expressionHashPrefixes to the server in separate requests (perhaps proceeding in parallel). It is also acceptable for the client to send unrelated or randomly generated hash prefixes together with the hash prefixes in expressionHashPrefixes, as long as the number of hash prefixes sent in a single request does not exceed 30.
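One possible way to respect the limit of 30 hash prefixes per request is sketched below. The function name `batch_prefixes` and the `decoys_per_request` parameter are hypothetical; the sketch simply splits the real prefixes into request-sized groups and optionally pads each group with randomly generated 4-byte prefixes.

```python
import secrets

MAX_PREFIXES_PER_REQUEST = 30  # limit stated in the text above


def batch_prefixes(prefixes, decoys_per_request=0):
    """Split prefixes into lists of at most 30, optionally adding random decoys."""
    if not 0 <= decoys_per_request < MAX_PREFIXES_PER_REQUEST:
        raise ValueError("decoys_per_request must be between 0 and 29")
    real_per_request = MAX_PREFIXES_PER_REQUEST - decoys_per_request
    shuffler = secrets.SystemRandom()
    batches = []
    for start in range(0, len(prefixes), real_per_request):
        batch = list(prefixes[start:start + real_per_request])
        # Randomly generated prefixes are sent alongside the real ones.
        batch += [secrets.token_bytes(4) for _ in range(decoys_per_request)]
        shuffler.shuffle(batch)  # avoid revealing which entries are decoys by position
        batches.append(batch)
    return batches
```

Each returned list can then be sent as its own request, sequentially or in parallel. The client still compares the returned full hashes only against its own expressionHashes, so responses for decoy prefixes are ignored.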
The Local Threat List URL Check Procedure
This procedure is used when the client opts for the local list mode of operation. It is also used when the Real-Time URL Check Procedure above returns UNSURE.
This procedure takes a single URL u and returns SAFE or UNSAFE.
- Let expressions be a list of suffix/prefix expressions generated by the URL u.
- Let expressionHashes be a list, where the elements are SHA256 hashes of each expression in expressions.
- Let expressionHashPrefixes be a list, where the elements are the first 4 bytes of each hash in expressionHashes.
- For each expressionHashPrefix of expressionHashPrefixes:
  - Look up expressionHashPrefix in the local cache.
  - If the cached entry is found:
    - Determine whether the current time is greater than its expiration time.
    - If it is greater:
      - Remove the found cached entry from the local cache.
      - Continue with the loop.
    - If it is not greater:
      - Remove this particular expressionHashPrefix from expressionHashPrefixes.
      - Check whether the corresponding full hash within expressionHashes is found in the cached entry.
      - If found, return UNSAFE.
      - If not found, continue with the loop.
  - If the cached entry is not found, continue with the loop.
- For each expressionHashPrefix of expressionHashPrefixes:
  - Look up expressionHashPrefix in the local threat list database.
  - If the expressionHashPrefix cannot be found in the local threat list database, remove it from expressionHashPrefixes.
- Send expressionHashPrefixes to the Google Safe Browsing v5 server using RPC SearchHashes or the REST method hashes.search. If an error occurred (including network errors, HTTP errors, etc), return SAFE. Otherwise, let response be the response received from the SB server, which is a list of full hashes together with some auxiliary information identifying the nature of the threat (social engineering, malware, etc), as well as the cache expiration time expiration.
- For each fullHash of response:
  - Insert fullHash into the local cache, together with expiration.
- For each fullHash of response:
  - Let isFound be the result of finding fullHash in expressionHashes.
  - If isFound is False, continue with the loop.
  - If isFound is True, return UNSAFE.
- Return SAFE.
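Two things distinguish this procedure from the real-time one: prefixes are filtered against the local threat list database before anything is sent, and a failed request yields SAFE rather than UNSURE. The sketch below illustrates only those steps, plus the fallback from an UNSURE real-time result. Local cache handling is omitted for brevity, and all function names, as well as the `search_hashes` callable, are hypothetical.

```python
SAFE, UNSAFE, UNSURE = "SAFE", "UNSAFE", "UNSURE"


def filter_by_threat_list(prefixes, threat_list_prefixes):
    """Keep only the prefixes that appear in the local threat list database."""
    return [p for p in prefixes if p in threat_list_prefixes]


def resolve_with_server(prefixes, expression_hashes, search_hashes):
    """Confirm local prefix matches with the server; any error yields SAFE."""
    if not prefixes:
        # Assumption: with no local match there is nothing to ask the server.
        return SAFE
    try:
        full_hashes, _expiration = search_hashes(prefixes)
    except Exception:
        return SAFE  # this procedure treats network and HTTP errors as SAFE
    if any(full_hash in expression_hashes for full_hash in full_hashes):
        return UNSAFE
    return SAFE


def check_url(url, real_time_check, local_list_check):
    """Run the real-time check and fall back to the local list check on UNSURE."""
    verdict = real_time_check(url)
    return local_list_check(url) if verdict == UNSURE else verdict
```

In local list mode a client would call only the local list check; `check_url` shows how the two procedures chain together in real-time mode.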
The Real-Time URL Check Procedure Without a Local Database
This procedure is used when the client chooses the no-storage real-time mode of operation.
This procedure takes a single URL u and returns SAFE or UNSAFE.
- Let expressions be a list of suffix/prefix expressions generated by the URL u.
- Let expressionHashes be a list, where the elements are SHA256 hashes of each expression in expressions.
- Let expressionHashPrefixes be a list, where the elements are the first 4 bytes of each hash in expressionHashes.
- For each expressionHashPrefix of expressionHashPrefixes:
  - Look up expressionHashPrefix in the local cache.
  - If the cached entry is found:
    - Determine whether the current time is greater than its expiration time.
    - If it is greater:
      - Remove the found cached entry from the local cache.
      - Continue with the loop.
    - If it is not greater:
      - Remove this particular expressionHashPrefix from expressionHashPrefixes.
      - Check whether the corresponding full hash within expressionHashes is found in the cached entry.
      - If found, return UNSAFE.
      - If not found, continue with the loop.
  - If the cached entry is not found, continue with the loop.
- Send expressionHashPrefixes to the Google Safe Browsing v5 server using RPC SearchHashes or the REST method hashes.search. If an error occurred (including network errors, HTTP errors, etc), return SAFE. Otherwise, let response be the response received from the SB server, which is a list of full hashes together with some auxiliary information identifying the nature of the threat (social engineering, malware, etc), as well as the cache expiration time expiration.
- For each fullHash of response:
  - Insert fullHash into the local cache, together with expiration.
- For each fullHash of response:
  - Let isFound be the result of finding fullHash in expressionHashes.
  - If isFound is False, continue with the loop.
  - If isFound is True, return UNSAFE.
- Return SAFE.
Just like the Real-Time URL Check Procedure, this procedure does not specify exactly how to send the hash prefixes to the server. For example, it is acceptable for the client to send all the expressionHashPrefixes in a single request, and it is also acceptable for the client to send each individual prefix in expressionHashPrefixes to the server in separate requests (perhaps proceeding in parallel). It is also acceptable for the client to send unrelated or randomly generated hash prefixes together with the hash prefixes in expressionHashPrefixes, as long as the number of hash prefixes sent in a single request does not exceed 30.
Example Requests
This section documents some examples of directly using the HTTP API to access Google Safe Browsing. It is generally recommended to use a generated language binding because it will automatically handle encoding and decoding in a convenient way. Please refer to the documentation for that binding.
Here is an example HTTP request using the hashes.search method:
GET https://safebrowsing.googleapis.com/v5/hashes:search?key=INSERT_YOUR_API_KEY_HERE&hashPrefixes=WwuJdQ
The response body is a protocol-buffer formatted payload that you may then decode.
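For readers building such a request by hand, the sketch below shows one way to produce the URL in the example. The value WwuJdQ decodes as base64 to the four bytes 5b 0b 89 75, so the sketch assumes that each hash prefix is sent as unpadded URL-safe base64 and that several prefixes are passed by repeating the hashPrefixes parameter. Both points are inferences from the examples on this page rather than confirmed requirements, and the helper names are hypothetical.

```python
import base64
from urllib.parse import urlencode

BASE_URL = "https://safebrowsing.googleapis.com/v5/hashes:search"


def encode_prefix(prefix: bytes) -> str:
    """Encode a hash prefix as URL-safe base64 with the padding removed."""
    return base64.urlsafe_b64encode(prefix).decode("ascii").rstrip("=")


def build_search_url(api_key: str, prefixes: list[bytes]) -> str:
    """Build a hashes.search URL; repeating hashPrefixes is an assumption."""
    params = [("key", api_key)]
    params += [("hashPrefixes", encode_prefix(p)) for p in prefixes]
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("INSERT_YOUR_API_KEY_HERE", [bytes.fromhex("5b0b8975")]))
```

The sketch only builds the URL; sending the request and decoding the protocol-buffer response are left to the HTTP client and generated binding of your choice.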
Here is an example HTTP request using the hashLists.batchGet method:
GET https://safebrowsing.googleapis.com/v5/hashLists:batchGet?key=INSERT_YOUR_API_KEY_HERE&names=se&names=mw-4b
The response body is, once again, a protocol-buffer formatted payload that you may then decode.