Follow the instructions in each section below to integrate the Google Assistant into your project.
gRPC bindings
The Google Assistant Service is built on top of gRPC, a high performance, open-source RPC framework. This framework is well-suited for bidirectional audio streaming.
Python
If you're using Python, get started using this guide.
C++
Take a look at our C++ sample on GitHub.
Node.js
Take a look at our Node.js sample on GitHub.
Android Things
Interested in embedded devices? Check out the Assistant SDK sample for Android Things.
Other languages
- Clone the googleapis repository to get the protocol buffer interface definitions for the Google Assistant Service API.
- Follow the gRPC documentation to generate gRPC bindings for your language of choice.
- Follow the steps in the sections below.
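For example, if your language of choice is Python, the grpcio-tools package wraps protoc and its gRPC plugin. The commands below are a sketch, assuming the googleapis repository is cloned into the current directory; you may also need to generate bindings for the proto files that embedded_assistant.proto imports.
python -m pip install grpcio grpcio-tools
python -m grpc_tools.protoc -I googleapis --python_out=. --grpc_python_out=. googleapis/google/assistant/embedded/v1alpha2/embedded_assistant.proto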
Authorize and authenticate your Google account to work with the Assistant
The next step is to authorize your device to talk with the Google Assistant using your Google account.
Obtain OAuth tokens with the Assistant SDK scope
The Assistant SDK uses OAuth 2.0 access tokens to authorize your device to connect with the Assistant.
When prototyping, you can use the authorization tool to easily generate OAuth 2.0 credentials from the client_secret_<client-id>.json file generated when registering your device model.
Do the following to generate the credentials:
Use a Python virtual environment to isolate the authorization tool and its dependencies from the system Python packages.
sudo apt-get update
sudo apt-get install python3-dev python3-venv # Use python3.4-venv if the package cannot be found.
python3 -m venv env
env/bin/python -m pip install --upgrade pip setuptools wheel
source env/bin/activate
Install the authorization tool:
python -m pip install --upgrade google-auth-oauthlib[tool]
Run the tool. Remove the --headless flag if you are running this from a terminal on the device (not an SSH session):
google-oauthlib-tool --client-secrets /path/to/client_secret_<client-id>.json --scope https://www.googleapis.com/auth/assistant-sdk-prototype --save --headless
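The tool saves the resulting credentials as a JSON file (on Linux, typically under ~/.config/google-oauthlib-tool/credentials.json). A minimal sketch, assuming that default location, of loading them back with the google-auth library:

```python
import json
import pathlib

import google.oauth2.credentials

# Default save location used by google-oauthlib-tool on Linux; adjust the path
# if your credentials were saved elsewhere.
creds_path = pathlib.Path.home() / '.config' / 'google-oauthlib-tool' / 'credentials.json'

with open(creds_path, 'r') as f:
    credentials = google.oauth2.credentials.Credentials(token=None, **json.load(f))
```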
When you are ready to integrate the authorization as part of the provisioning mechanism of your device, read our guides for Using OAuth 2.0 to Access Google APIs to understand how to obtain, persist and use OAuth access tokens to allow your device to talk with the Assistant API.
Use the following when working through these guides:
- OAuth scope: https://www.googleapis.com/auth/assistant-sdk-prototype
- Supported OAuth flows:
- (Recommended) Installed apps
- Web server applications
Check out the best practices on privacy and security for recommendations on how to secure your device.
Authenticate your gRPC connection with OAuth tokens
Finally, put all the pieces together by reading how to use token-based authentication with Google to authenticate the gRPC connection to the Assistant API.
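A minimal sketch, reusing the credentials object loaded above, of opening an authorized gRPC channel to the service endpoint:

```python
import google.auth.transport.grpc
import google.auth.transport.requests

ASSISTANT_API_ENDPOINT = 'embeddedassistant.googleapis.com'

# Wrap the OAuth 2.0 credentials in a gRPC channel that attaches and refreshes
# access tokens automatically.
http_request = google.auth.transport.requests.Request()
grpc_channel = google.auth.transport.grpc.secure_authorized_channel(
    credentials, http_request, ASSISTANT_API_ENDPOINT)
```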
Register your device
Register your device model and instance either manually or with the registration tool (available in Python).
Implement a basic conversation dialog with the Assistant
- Implement a bidirectional streaming gRPC client for the Google Assistant Service API.
- Wait for the user to trigger a new request (e.g., wait for a GPIO interrupt from a button press).
- Send an AssistRequest message with the config field set (see AssistConfig). Make sure the config field contains the following:
  - The audio_in_config field, which specifies how to process the audio_in data that will be provided in subsequent requests (see AudioInConfig).
  - The audio_out_config field, which specifies the desired format for the server to use when it returns audio_out messages (see AudioOutConfig).
  - The device_config field, which identifies the registered device to the Assistant (see DeviceConfig).
  - The dialog_state_in field, which contains the language_code associated with the request (see DialogStateIn).
- Start recording.
- Send multiple outgoing AssistRequest messages with audio data from the spoken query in the audio_in field.
- Handle incoming AssistResponse messages.
- Extract conversation metadata from the AssistResponse message. For example, from dialog_state_out, get the conversation_state and volume_percentage (see DialogStateOut).
- Stop recording when receiving an AssistResponse with an event_type of END_OF_UTTERANCE.
- Play back audio from the Assistant's answer with the audio data coming from the audio_out field.
- Take the conversation_state you extracted earlier and copy it into the DialogStateIn message in the AssistConfig for the next AssistRequest.
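A minimal Python sketch of this loop, using the generated bindings from the google-assistant-grpc package. Here record_audio_chunks, stop_recording, and play_audio are hypothetical placeholders for your own audio I/O; credentials, device_model_id, and device_id come from the earlier steps.

```python
import google.auth.transport.grpc
import google.auth.transport.requests
from google.assistant.embedded.v1alpha2 import (
    embedded_assistant_pb2,
    embedded_assistant_pb2_grpc,
)

ASSISTANT_API_ENDPOINT = 'embeddedassistant.googleapis.com'
DEADLINE_SECS = 185


def assist(credentials, device_model_id, device_id, conversation_state=None):
    """Run one conversation turn: stream audio up, play the answer back."""
    channel = google.auth.transport.grpc.secure_authorized_channel(
        credentials, google.auth.transport.requests.Request(),
        ASSISTANT_API_ENDPOINT)
    assistant = embedded_assistant_pb2_grpc.EmbeddedAssistantStub(channel)

    def request_stream():
        # The first AssistRequest carries only the config; audio follows.
        config = embedded_assistant_pb2.AssistConfig(
            audio_in_config=embedded_assistant_pb2.AudioInConfig(
                encoding=embedded_assistant_pb2.AudioInConfig.LINEAR16,
                sample_rate_hertz=16000),
            audio_out_config=embedded_assistant_pb2.AudioOutConfig(
                encoding=embedded_assistant_pb2.AudioOutConfig.LINEAR16,
                sample_rate_hertz=16000,
                volume_percentage=100),
            dialog_state_in=embedded_assistant_pb2.DialogStateIn(
                language_code='en-US',
                conversation_state=conversation_state or b''),
            device_config=embedded_assistant_pb2.DeviceConfig(
                device_id=device_id,
                device_model_id=device_model_id))
        yield embedded_assistant_pb2.AssistRequest(config=config)
        for chunk in record_audio_chunks():  # placeholder: microphone capture
            yield embedded_assistant_pb2.AssistRequest(audio_in=chunk)

    next_state = conversation_state
    for resp in assistant.Assist(request_stream(), DEADLINE_SECS):
        if resp.event_type == embedded_assistant_pb2.AssistResponse.END_OF_UTTERANCE:
            stop_recording()  # placeholder: end the microphone capture loop
        if resp.dialog_state_out.conversation_state:
            next_state = resp.dialog_state_out.conversation_state
        if resp.audio_out.audio_data:
            play_audio(resp.audio_out.audio_data)  # placeholder: speaker output
    return next_state  # pass into the next turn's DialogStateIn
```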
With this, you should be ready to make your first requests to the Google Assistant through your device.
Extend a conversation dialog with Device Actions
Extend the basic conversation dialog above to trigger the unique hardware capabilities of your particular device:
- In the incoming AssistResponse messages, extract the device_action field (see DeviceAction).
- Parse the JSON payload of the device_request_json field. Refer to the Device Traits page for the list of supported traits. Each trait schema page shows a sample EXECUTE request with the device command(s) and parameters that are returned in the JSON payload.
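For example, a device model that declares the OnOff trait might handle the EXECUTE payload as sketched below; set_led is a hypothetical placeholder for your hardware control code, and the commands you actually receive depend on the traits your model declares.

```python
import json


def handle_device_action(device_request_json):
    """Dispatch EXECUTE commands from the Assistant to the device hardware."""
    device_request = json.loads(device_request_json)
    for inp in device_request.get('inputs', []):
        if inp.get('intent') != 'action.devices.EXECUTE':
            continue
        for command in inp['payload']['commands']:
            for execution in command['execution']:
                if execution['command'] == 'action.devices.commands.OnOff':
                    set_led(execution['params']['on'])  # placeholder: GPIO control
```

Call this with resp.device_action.device_request_json whenever that field is non-empty in an incoming AssistResponse.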
Get the transcript of the user request
If you have a display attached to the device, you might want to use it to show the user's request. To get this transcript, parse the speech_results field in the AssistResponse messages. When speech recognition completes, this list contains one item with a stability of 1.0.
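Continuing the sketch above, the transcript can be pulled out of the response stream (display is a hypothetical placeholder for whatever drives your screen):

```python
for resp in assistant.Assist(request_stream(), DEADLINE_SECS):
    if resp.speech_results:
        transcript = ' '.join(r.transcript for r in resp.speech_results)
        display(transcript)  # placeholder: render on the attached display
```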
Get the text and/or visual rendering of the Assistant's response
If you have a display attached to the device, you might want to use it to show the Assistant's plain-text response to the user's request. This text is located in the DialogStateOut.supplemental_display_text field.
The Assistant supports visual responses via HTML5 for certain queries ("What is the weather in Mountain View?" or "What time is it?"). To enable this, set the screen_out_config field in AssistConfig. The ScreenOutConfig message has a screen_mode field, which should be set to PLAYING.
The AssistResponse messages will then have the screen_out field set. You can extract the HTML5 data (if present) from its data field.
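Continuing the earlier sketch, a minimal way to request and render the HTML5 data (render_html is a hypothetical placeholder):

```python
config = embedded_assistant_pb2.AssistConfig(
    # ...audio, dialog state, and device fields as in the earlier sketch...
    screen_out_config=embedded_assistant_pb2.ScreenOutConfig(
        screen_mode=embedded_assistant_pb2.ScreenOutConfig.PLAYING))

for resp in assistant.Assist(request_stream(), DEADLINE_SECS):
    if resp.screen_out.data:
        render_html(resp.screen_out.data)  # placeholder: hand off to a browser/webview
```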
Submitting queries via text input
If you have a text interface (for example, a keyboard) attached to the device, set the text_query field in the config field (see AssistConfig). Do not set the audio_in_config field.
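For example (a sketch building on the earlier one; note there is no audio_in_config and only a single request is sent):

```python
config = embedded_assistant_pb2.AssistConfig(
    text_query='what time is it',
    # No audio_in_config: the query comes from the keyboard, not the microphone.
    audio_out_config=embedded_assistant_pb2.AudioOutConfig(
        encoding=embedded_assistant_pb2.AudioOutConfig.LINEAR16,
        sample_rate_hertz=16000,
        volume_percentage=100),
    dialog_state_in=embedded_assistant_pb2.DialogStateIn(language_code='en-US'),
    device_config=embedded_assistant_pb2.DeviceConfig(
        device_id=device_id,
        device_model_id=device_model_id))

# A single AssistRequest carrying the config is enough; no audio_in messages follow.
responses = assistant.Assist(
    iter([embedded_assistant_pb2.AssistRequest(config=config)]), DEADLINE_SECS)
```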
Troubleshooting
See the Troubleshooting page if you run into issues.