Follow the instructions in each section below to integrate the Google Assistant into your project.
gRPC bindings
The Google Assistant Service is built on top of gRPC, a high performance, open-source RPC framework. This framework is well-suited for bidirectional audio streaming.
Python
If you're using Python, get started using this guide.
C++
Take a look at our C++ sample on GitHub.
Node.js
Take a look at our Node.js sample on GitHub.
Android Things
Interested in embedded devices? Check out the Assistant SDK sample for Android Things.
Other languages
- Clone the googleapis repository to get the protocol buffer interface definitions for the Google Assistant Service API.
- Follow the gRPC documentation to generate gRPC bindings for your language of choice.
- Follow the steps in the sections below.
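For example, if your language of choice is Python, the grpcio-tools package wraps protoc and its gRPC plugin. The commands below are a sketch, assuming the googleapis repository is cloned into the current directory; you may also need to generate bindings for the proto files that embedded_assistant.proto imports.
python -m pip install grpcio grpcio-tools
python -m grpc_tools.protoc -I googleapis --python_out=. --grpc_python_out=. googleapis/google/assistant/embedded/v1alpha2/embedded_assistant.proto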
Authorize and authenticate your Google account to work with the Assistant
The next step is to authorize your device to talk with the Google Assistant using your Google account.
Obtain OAuth tokens with the Assistant SDK scope
The Assistant SDK uses OAuth 2.0 access tokens to authorize your device to connect with the Assistant.
When prototyping, you can use the authorization tool to easily generate OAuth 2.0 credentials from the client_secret_<client-id>.json file generated when registering your device model.
Do the following to generate the credentials:
Use a Python virtual environment to isolate the authorization tool and its dependencies from the system Python packages.
sudo apt-get update
sudo apt-get install python3-dev python3-venv # Use python3.4-venv if the package cannot be found.
python3 -m venv env
env/bin/python -m pip install --upgrade pip setuptools wheel
source env/bin/activate
Install the authorization tool:
python -m pip install --upgrade google-auth-oauthlib[tool]
Run the tool. Remove the --headless flag if you are running this from a terminal on the device (not an SSH session):
google-oauthlib-tool --client-secrets /path/to/client_secret_<client-id>.json --scope https://www.googleapis.com/auth/assistant-sdk-prototype --save --headless
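The tool saves the resulting credentials as a JSON file (on Linux, typically under ~/.config/google-oauthlib-tool/credentials.json). A minimal sketch, assuming that default location, of loading them back with the google-auth library:

```python
import json
import pathlib

import google.oauth2.credentials

# Default save location used by google-oauthlib-tool on Linux; adjust the path
# if your credentials were saved elsewhere.
creds_path = pathlib.Path.home() / '.config' / 'google-oauthlib-tool' / 'credentials.json'

with open(creds_path, 'r') as f:
    credentials = google.oauth2.credentials.Credentials(token=None, **json.load(f))
```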
When you are ready to integrate the authorization as part of the provisioning mechanism of your device, read our guides for Using OAuth 2.0 to Access Google APIs to understand how to obtain, persist and use OAuth access tokens to allow your device to talk with the Assistant API.
Use the following when working through these guides:
- OAuth scope: https://www.googleapis.com/auth/assistant-sdk-prototype
- Supported OAuth flows:
- (Recommended) Installed apps
- Web server applications
Check out the best practices on privacy and security for recommendations on how to secure your device.
Authenticate your gRPC connection with OAuth tokens
Finally, put all the pieces together by reading how to use token-based authentication with Google to authenticate the gRPC connection to the Assistant API.
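A minimal sketch, reusing the credentials object loaded above, of opening an authorized gRPC channel to the service endpoint:

```python
import google.auth.transport.grpc
import google.auth.transport.requests

ASSISTANT_API_ENDPOINT = 'embeddedassistant.googleapis.com'

# Wrap the OAuth 2.0 credentials in a gRPC channel that attaches and refreshes
# access tokens automatically.
http_request = google.auth.transport.requests.Request()
grpc_channel = google.auth.transport.grpc.secure_authorized_channel(
    credentials, http_request, ASSISTANT_API_ENDPOINT)
```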
Register your device
Register your device model and instance either manually or with the registration tool (available in Python).
Implement a basic conversation dialog with the Assistant
- Implement a bidirectional streaming gRPC client for the Google Assistant Service API.
- Wait for the user to trigger a new request (e.g., wait for a GPIO interrupt from a button press).
- Send an AssistRequest message with the config field set (see AssistConfig). Make sure the config field contains the following:
  - The audio_in_config field, which specifies how to process the audio_in data that will be provided in subsequent requests (see AudioInConfig).
  - The audio_out_config field, which specifies the desired format for the server to use when it returns audio_out messages (see AudioOutConfig).
  - The device_config field, which identifies the registered device to the Assistant (see DeviceConfig).
  - The dialog_state_in field, which contains the language_code associated with the request (see DialogStateIn).
- Start recording.
- Send multiple outgoing AssistRequest messages with audio data from the spoken query in the audio_in field.
- Handle incoming AssistResponse messages.
- Extract conversation metadata from the AssistResponse message. For example, from dialog_state_out, get the conversation_state and volume_percentage (see DialogStateOut).
- Stop recording when receiving an AssistResponse with an event_type of END_OF_UTTERANCE.
- Play back audio from the Assistant's answer with the audio data coming from the audio_out field.
- Take the conversation_state you extracted earlier and copy it into the DialogStateIn message in the AssistConfig for the next AssistRequest.
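A minimal Python sketch of this loop, using the generated bindings from the google-assistant-grpc package. Here record_audio_chunks, stop_recording, and play_audio are hypothetical placeholders for your own audio I/O; credentials, device_model_id, and device_id come from the earlier steps.

```python
import google.auth.transport.grpc
import google.auth.transport.requests
from google.assistant.embedded.v1alpha2 import (
    embedded_assistant_pb2,
    embedded_assistant_pb2_grpc,
)

ASSISTANT_API_ENDPOINT = 'embeddedassistant.googleapis.com'
DEADLINE_SECS = 185


def assist(credentials, device_model_id, device_id, conversation_state=None):
    """Run one conversation turn: stream audio up, play the answer back."""
    channel = google.auth.transport.grpc.secure_authorized_channel(
        credentials, google.auth.transport.requests.Request(),
        ASSISTANT_API_ENDPOINT)
    assistant = embedded_assistant_pb2_grpc.EmbeddedAssistantStub(channel)

    def request_stream():
        # The first AssistRequest carries only the config; audio follows.
        config = embedded_assistant_pb2.AssistConfig(
            audio_in_config=embedded_assistant_pb2.AudioInConfig(
                encoding=embedded_assistant_pb2.AudioInConfig.LINEAR16,
                sample_rate_hertz=16000),
            audio_out_config=embedded_assistant_pb2.AudioOutConfig(
                encoding=embedded_assistant_pb2.AudioOutConfig.LINEAR16,
                sample_rate_hertz=16000,
                volume_percentage=100),
            dialog_state_in=embedded_assistant_pb2.DialogStateIn(
                language_code='en-US',
                conversation_state=conversation_state or b''),
            device_config=embedded_assistant_pb2.DeviceConfig(
                device_id=device_id,
                device_model_id=device_model_id))
        yield embedded_assistant_pb2.AssistRequest(config=config)
        for chunk in record_audio_chunks():  # placeholder: microphone capture
            yield embedded_assistant_pb2.AssistRequest(audio_in=chunk)

    next_state = conversation_state
    for resp in assistant.Assist(request_stream(), DEADLINE_SECS):
        if resp.event_type == embedded_assistant_pb2.AssistResponse.END_OF_UTTERANCE:
            stop_recording()  # placeholder: end the microphone capture loop
        if resp.dialog_state_out.conversation_state:
            next_state = resp.dialog_state_out.conversation_state
        if resp.audio_out.audio_data:
            play_audio(resp.audio_out.audio_data)  # placeholder: speaker output
    return next_state  # pass into the next turn's DialogStateIn
```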
With this, you should be ready to make your first requests to the Google Assistant through your device.
Extend a conversation dialog with Device Actions
Extend the basic conversation dialog above to trigger the unique hardware capabilities of your particular device:
- In the incoming AssistResponse messages, extract the device_action field (see DeviceAction).
- Parse the JSON payload of the device_request_json field. Refer to the Device Traits page for the list of supported traits. Each trait schema page shows a sample EXECUTE request with the device command(s) and parameters that are returned in the JSON payload.
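For example, a device model that declares the OnOff trait might handle the EXECUTE payload as sketched below; set_led is a hypothetical placeholder for your hardware control code, and the commands you actually receive depend on the traits your model declares.

```python
import json


def handle_device_action(device_request_json):
    """Dispatch EXECUTE commands from the Assistant to the device hardware."""
    device_request = json.loads(device_request_json)
    for inp in device_request.get('inputs', []):
        if inp.get('intent') != 'action.devices.EXECUTE':
            continue
        for command in inp['payload']['commands']:
            for execution in command['execution']:
                if execution['command'] == 'action.devices.commands.OnOff':
                    set_led(execution['params']['on'])  # placeholder: GPIO control
```

Call this with resp.device_action.device_request_json whenever that field is non-empty in an incoming AssistResponse.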
Get the transcript of the user request
If you have a display attached to the device, you might want to use it to show the user's request. To get this transcript, parse the speech_results field in the AssistResponse messages. When speech recognition completes, this list contains one item with a stability of 1.0.
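Continuing the sketch above, the transcript can be pulled out of the response stream (display is a hypothetical placeholder for whatever drives your screen):

```python
for resp in assistant.Assist(request_stream(), DEADLINE_SECS):
    if resp.speech_results:
        transcript = ' '.join(r.transcript for r in resp.speech_results)
        display(transcript)  # placeholder: render on the attached display
```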
Get the text and/or visual rendering of the Assistant's response
If you have a display attached to the device, you might want to use it to show the Assistant's plain-text response to the user's request. This text is located in the DialogStateOut.supplemental_display_text field.
The Assistant supports visual responses via HTML5 for certain queries ("What is the weather in Mountain View?" or "What time is it?"). To enable this, set the screen_out_config field in AssistConfig. The ScreenOutConfig message has a screen_mode field, which should be set to PLAYING.
The AssistResponse messages will then have the screen_out field set. You can extract the HTML5 data (if present) from its data field.
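Continuing the earlier sketch, a minimal way to request and render the HTML5 data (render_html is a hypothetical placeholder):

```python
config = embedded_assistant_pb2.AssistConfig(
    # ...audio, dialog state, and device fields as in the earlier sketch...
    screen_out_config=embedded_assistant_pb2.ScreenOutConfig(
        screen_mode=embedded_assistant_pb2.ScreenOutConfig.PLAYING))

for resp in assistant.Assist(request_stream(), DEADLINE_SECS):
    if resp.screen_out.data:
        render_html(resp.screen_out.data)  # placeholder: hand off to a browser/webview
```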
Submitting queries via text input
If you have a text interface (for example, a keyboard) attached to the device, set the text_query field in the config field (see AssistConfig). Do not set the audio_in_config field.
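For example (a sketch building on the earlier one; note there is no audio_in_config and only a single request is sent):

```python
config = embedded_assistant_pb2.AssistConfig(
    text_query='what time is it',
    # No audio_in_config: the query comes from the keyboard, not the microphone.
    audio_out_config=embedded_assistant_pb2.AudioOutConfig(
        encoding=embedded_assistant_pb2.AudioOutConfig.LINEAR16,
        sample_rate_hertz=16000,
        volume_percentage=100),
    dialog_state_in=embedded_assistant_pb2.DialogStateIn(language_code='en-US'),
    device_config=embedded_assistant_pb2.DeviceConfig(
        device_id=device_id,
        device_model_id=device_model_id))

# A single AssistRequest carrying the config is enough; no audio_in messages follow.
responses = assistant.Assist(
    iter([embedded_assistant_pb2.AssistRequest(config=config)]), DEADLINE_SECS)
```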
Troubleshooting
See the Troubleshooting page if you run into issues.