Google Maps Platform Incident Management

Incident communication channels

The Google Maps Platform Support team offers different incident communication channels.

The Maps API Incidents & Outage Issue Tracker lists all known incidents. You can view ongoing incidents, subscribe to an issue to follow its progress, and add comments to help our teams investigate.

The Google Maps Platform Notifications group is the first place where widespread outages are reported. All customers who have joined the group receive an email notification when an outage is detected, followed by all subsequent updates until the issue is resolved.

The moment an issue is detected and reported in the Issue Tracker, a banner is also displayed on the Google Maps Platform Support page (in the Google Cloud Platform [GCP] Console). The banner identifies the affected product and includes a link to the Issue Tracker.
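
As a rough illustration of how the three channels relate, the sketch below encodes the rules described above in Python. The `Incident` class, its `widespread_outage` flag, and the `channels_for` function are hypothetical names invented for this example; they are not part of any Google API.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical record of a known incident (not a Google API type)."""
    product: str
    widespread_outage: bool = False


def channels_for(incident: Incident) -> list[str]:
    """Return the channels on which, per this page, an incident is announced."""
    # Every known incident is listed in the Issue Tracker, and a banner
    # linking to it is shown on the Support page.
    channels = ["Issue Tracker", "Support page banner"]
    # Widespread outages are additionally emailed to the Notifications group.
    if incident.widespread_outage:
        channels.append("Notifications group")
    return channels
```

Under these assumptions, a limited incident maps to the Issue Tracker and the Support page banner only, while a widespread outage maps to all three channels.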

Lifecycle of an incident

Google Maps Platform adheres to the Google Cloud Platform Incident Management framework.

When an outage or service degradation occurs, the product engineering team and the Google Maps Platform Support team work together to resolve the incident and communicate it to you.

Detection

Google uses internal and black box monitoring to detect incidents. For more information, see Chapter 6 of the Site Reliability Engineering book.

If you detect an incident that has not yet been reported in the Issue Tracker, go to the Google Maps Platform Support page (in the GCP Console) and create a new support case.

Initial Response

When an incident is detected, the Support team leads communication with you. Initial notification of an incident is often sparse, frequently only mentioning the product in question. This is because we prioritize fast notification over detail. Details will be provided in subsequent updates.

To provide the appropriate amount of information, different communication channels are used depending on the scope and severity of an issue.

Investigation

Product engineering teams are responsible for investigating the root cause of incidents. Incident management is often done by Site Reliability Engineers but might be done by software engineers or others, depending on the situation and product. For more information, see Chapter 12 of the Site Reliability Engineering Book.

Mitigation/Fix

An issue is considered fixed only when changes have been made that Google is confident will end the impact indefinitely. For example, the fix could be rolling back a change that triggered an incident.

While an incident is in progress, the Support and Product teams will attempt to mitigate the issue. Mitigation occurs when the impact or scope of an issue can be reduced, for example by temporarily providing additional resources to a service suffering overload.

If no mitigation has been found, the Support team will, when possible, find and communicate workarounds. Workarounds are steps that you can take to meet the underlying need despite the incident. For example, a workaround might be to use different settings for an API call to avoid a problematic code path.
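
A workaround of this kind could be applied in client code roughly as follows. This is a minimal Python sketch, not an official pattern: `call_with_workaround` is a hypothetical helper, and it assumes the incident surfaces as an exception raised by whatever client function you pass in. The actual settings to change would come from the Support team's updates.

```python
from typing import Any, Callable, Mapping


def call_with_workaround(
    api_call: Callable[..., Any],
    preferred: Mapping[str, Any],
    workaround: Mapping[str, Any],
) -> Any:
    """Try the preferred settings first; on failure, retry once with the workaround."""
    try:
        return api_call(**preferred)
    except Exception:
        # Assumption: a failure here is caused by the ongoing incident, so the
        # call is repeated with the settings suggested as a workaround.
        return api_call(**workaround)
```

The helper falls back only after the preferred settings fail, so normal behavior resumes automatically once the incident is fixed.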

Follow Up

While an incident is ongoing, the Support team provides regular updates. Updates typically provide:

  • More information about the incident, such as error messages, which features are affected, and how widespread it is.
  • Progress towards mitigation, including any workarounds.
  • Timelines for communication, tailored to the incident.
  • Changes in status, such as when an incident is fixed.

Postmortem

All incidents result in an internal postmortem (a post-incident analysis) to fully understand the incident and to identify reliability improvements that Google can make. These improvements are then tracked and implemented. For more information on postmortems at Google, see Chapter 15 of the Site Reliability Engineering Book.

Incident Report

When incidents have very wide and serious impact, Google provides incident reports that outline the symptoms, impact, root cause, remediation, and future prevention of incidents. As with postmortems, we pay particular attention to the steps that we take to learn from the issue and improve reliability. Google's goal in writing and releasing postmortems is to be transparent and demonstrate our commitment to building stable services for our customers.

FAQ

I want to get notified when there’s an ongoing outage. What should I do?

Join the Google Maps Platform Notifications group to get notified of ongoing issues and to follow the progress of an incident in real time. This group will also help you stay up to date with product and platform announcements.

Where can I check if an outage has been reported?

The Google Maps Platform team offers several resources to help you stay informed when there’s an ongoing outage. Please choose the one that works best for you.

  • Incidents in the Issue Tracker: A reference list of all known incidents. You can easily view ongoing incidents, follow their progress by subscribing to them, and add comments to help our teams investigate. You can find the link to the public issue tracker in the Google Maps Platform support documentation.
  • Google Maps Platform Notifications Group: A Google group where all widespread outages are reported. All customers who have joined the group will receive an email notification once an outage has been detected and subsequent updates until the issue is resolved.
  • Google Maps Platform Support Page (in the GCP console): The moment an issue is detected and reported in the Issue Tracker, the Support page will display an active banner with a notice about the issue and a link to the Issue Tracker.

What if I am experiencing an issue, but it is not listed in the notification group or the Issue Tracker?

The issue may be isolated to your projects, or it may be impacting a limited number of customers. If no incident has been announced, go to the Google Maps Platform Support page (in the GCP Console) and create a new support case.

What is the difference between an "incident" and an "outage"?

Although these terms are often used interchangeably, our external communications use "incident" to refer to any period of degraded service and "outage" to refer only to the most serious issues, where a product is nonfunctioning to a large extent.