Migration From V4

One significant improvement of Google Safe Browsing v5 over v4 (specifically, the v4 Update API) is data freshness and coverage. Because protection depends heavily on the client-maintained local database, the delay and size of local database updates are the main contributors to missed protection. In v4, the typical client takes 20 to 50 minutes to obtain the most up-to-date version of the threat lists. Unfortunately, phishing attacks spread fast: as of 2021, 60% of sites that deliver attacks live for less than 10 minutes. Our analysis shows that around 25-30% of missing phishing protection is due to such data staleness. Further, some devices are not equipped to manage the entirety of the Google Safe Browsing threat lists, which continue to grow over time.

If you are currently using the v4 Update API, there is a seamless migration path from v4 to v5 without having to reset or erase the local database. This section documents how to do that.

Converting List Updates

Unlike v4, where lists are identified by the tuple of threat type, platform type, and threat entry type, v5 lists are identified simply by name. This provides flexibility when multiple v5 lists share the same threat type. Platform types and threat entry types are removed in v5.

In v4, one would use the threatListUpdates.fetch method to download lists. In v5, one would switch to the hashLists.batchGet method.

The following changes should be made to the request:

  1. Remove the v4 ClientInfo object altogether. Instead of supplying a client's identification using a dedicated field, simply use the well-known User-Agent header. While there is no prescribed format for supplying the client identification in this header, we suggest simply including the original client ID and client version separated by a space character or a slash character.
  2. For each v4 ListUpdateRequest object:
    • Look up the corresponding v5 list name in the table above and supply that name in the v5 request.
    • Remove unneeded fields such as threat_entry_type or platform_type.
    • The state field in v4 is directly compatible with the v5 versions field. The same byte string that would be sent to the server using the state field in v4 can simply be sent in v5 using the versions field.
    • For the v4 constraints, v5 uses a simplified version called SizeConstraints. Additional fields such as region should be dropped.
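The request changes above can be sketched in Python as follows. This is only an illustration under stated assumptions: the V4ListUpdateRequest class, the dictionary shape of the converted request, the SizeConstraints field names, and the list names in the mapping are all hypothetical placeholders. Take the real list names from the table referenced above and the real field names from the API reference.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from the v4 identifying tuple to a v5 list name.
# The names on the right are placeholders, not real v5 list names.
V4_TO_V5_LIST_NAME = {
    ("MALWARE", "ANY_PLATFORM", "URL"): "placeholder-malware-list",
    ("SOCIAL_ENGINEERING", "ANY_PLATFORM", "URL"): "placeholder-social-engineering-list",
}


@dataclass
class V4ListUpdateRequest:
    """Illustrative stand-in for the fields of a v4 ListUpdateRequest."""
    threat_type: str
    platform_type: str
    threat_entry_type: str
    state: bytes
    max_update_entries: Optional[int] = None
    max_database_entries: Optional[int] = None
    region: Optional[str] = None


def user_agent(client_id: str, client_version: str) -> str:
    """Replace the v4 ClientInfo object with a User-Agent header value."""
    return f"{client_id}/{client_version}"


def convert_list_update_request(v4: V4ListUpdateRequest) -> dict:
    """Build an illustrative v5 per-list request from a v4 ListUpdateRequest."""
    key = (v4.threat_type, v4.platform_type, v4.threat_entry_type)
    converted = {
        "name": V4_TO_V5_LIST_NAME[key],  # lists are identified by name only
        "version": v4.state,              # the v4 state bytes are reused unchanged
    }
    # Keep only the size-related constraints; region and similar fields are dropped.
    size_constraints = {}
    if v4.max_update_entries is not None:
        size_constraints["max_update_entries"] = v4.max_update_entries
    if v4.max_database_entries is not None:
        size_constraints["max_database_entries"] = v4.max_database_entries
    if size_constraints:
        converted["size_constraints"] = size_constraints
    return converted
```

Note that the state bytes are passed through untouched, which is what allows the existing local database to be kept across the migration.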

The following changes should be made to the response:

  1. The v4 enum ResponseType is simply replaced by a boolean field named partial_update.
  2. The minimum_wait_duration field can now be zero or omitted. If it is, the client is requested to immediately make another request. This only happens when the client specifies in SizeConstraints a smaller constraint on max update size than the max database size.
  3. The Rice decoding algorithm for 32-bit integers will need to be adjusted. The difference is that the data are encoded with a different endianness. In both v4 and v5, 32-bit hash prefixes are sorted lexicographically. But in v4, those prefixes are treated as little endian when sorted, whereas in v5 those prefixes are treated as big endian when sorted. This means that a v5 client does not need to do any sorting after decoding, since lexicographic sorting is identical to numeric sorting with big endian. An example of this sort in the Chromium implementation of v4 can be found here. Such sorting can be removed.
  4. The Rice decoding algorithm will need to be implemented for other hash lengths as well.
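The endianness point in item 3 can be checked with a small Python sketch. It assumes the Rice decoder (not shown here) has already produced 32-bit integers in ascending numeric order; the sample values and the helper name to_prefixes are made up for illustration.

```python
def to_prefixes(decoded_values, byteorder):
    """Serialize decoded 32-bit integers into 4-byte hash prefixes."""
    return [value.to_bytes(4, byteorder) for value in decoded_values]


# Sample decoder output, already in ascending numeric order.
decoded_values = [0x00000001, 0x000000FF, 0x00000100, 0x01000000]

# v5 style: big endian. Numeric order and lexicographic byte order agree,
# so the prefixes come out already sorted.
v5_style = to_prefixes(decoded_values, "big")
v5_needs_sort = v5_style != sorted(v5_style)

# v4 style: little endian. Numeric order and lexicographic byte order differ,
# so an extra sort is required after decoding.
v4_style = to_prefixes(decoded_values, "little")
v4_needs_sort = v4_style != sorted(v4_style)
```

With these sample values, v5_needs_sort is False and v4_needs_sort is True, which is why the post-decoding sort can be dropped when moving to v5.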

Converting Hash Searches

In v4, one would use the fullHashes.find method to get full hashes. The equivalent method in v5 is the hashes.search method.

The following changes should be made to the request:

  1. Structure the code to only send hash prefixes that are exactly 4 bytes in length.
  2. Remove the v4 ClientInfo object altogether. Instead of supplying a client's identification using a dedicated field, simply use the well-known User-Agent header. While there is no prescribed format for supplying the client identification in this header, we suggest simply including the original client ID and client version separated by a space character or a slash character.
  3. Remove the client_states field. It is no longer necessary.
  4. It is no longer necessary to include threat_types and similar fields.
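A minimal Python sketch of the request side, assuming the URL has already been canonicalized and expanded into expressions (that step is not covered here). The function names and the dictionary shape of the request are hypothetical; only the 4-byte truncation and the User-Agent suggestion come from the list above.

```python
import hashlib

PREFIX_LENGTH = 4  # send hash prefixes that are exactly 4 bytes long


def hash_prefix(expression: str) -> bytes:
    """Return the first 4 bytes of the SHA-256 hash of an expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:PREFIX_LENGTH]


def build_search_request(expressions, client_id: str, client_version: str) -> dict:
    """Build an illustrative request: prefixes plus a User-Agent header.

    There is no ClientInfo, no client_states, and no threat_types field.
    """
    prefixes = sorted({hash_prefix(expression) for expression in expressions})
    return {
        "hash_prefixes": prefixes,
        "headers": {"User-Agent": f"{client_id}/{client_version}"},
    }
```

Duplicate expressions collapse into a single prefix because the prefixes are collected in a set before being sorted.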

The following changes should be made to the response:

  1. The minimum_wait_duration field has been removed. The client can always issue a new request on an as-needed basis.
  2. The v4 ThreatMatch object has been simplified into the FullHash object.
  3. Caching has been simplified into a single cache duration. See the above procedures for interacting with the cache.
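To illustrate what a single cache duration means for client code, here is a generic time-bounded cache in Python. It does not reproduce the cache procedures referred to above; the class name, method names, and the choice to key entries by hash prefix are assumptions made for this sketch.

```python
import time


class FullHashCache:
    """Cache of full hashes keyed by hash prefix, with one expiry per entry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so that expiry can be tested
        self._entries = {}   # prefix -> (expiry time, list of full hashes)

    def store(self, prefix: bytes, full_hashes, cache_duration_seconds: float) -> None:
        """Record a response, valid for the single duration the server returned."""
        expiry = self._clock() + cache_duration_seconds
        self._entries[prefix] = (expiry, list(full_hashes))

    def lookup(self, prefix: bytes):
        """Return cached full hashes, or None if absent or expired."""
        entry = self._entries.get(prefix)
        if entry is None:
            return None
        expiry, full_hashes = entry
        if self._clock() >= expiry:
            del self._entries[prefix]  # expired: a new request is needed
            return None
        return full_hashes
```

An empty list is a valid cached value here (a prefix that matched no full hashes), which is why lookup returns None rather than an empty list to signal a miss.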