Conversational Actions were deprecated on June 13, 2023. For more information, see Conversational Actions sunset.

Best Practices for Audio

This page contains recommendations on how to provide speech data to the Google Assistant API. These guidelines are designed for greater efficiency and accuracy as well as reasonable response times from the service.

Audio pre-processing

It's best to provide audio that is as clean as possible by using a good quality and well-positioned microphone. However, applying noise-reduction signal processing to the audio before sending it to the service typically reduces recognition accuracy. The service is designed to handle noisy audio.

For best results:

Position the microphone as close to the user as possible, particularly when background noise is present.
Avoid audio clipping.
Do not use automatic gain control (AGC).
Disable all noise reduction processing.

Ideally:

The audio level should be calibrated so that the input signal does not clip, and peak speech audio levels reach approximately -20 to -10 dBFS.
The device should exhibit approximately "flat" amplitude versus frequency characteristics (±3 dB from 100 Hz to 8000 Hz).
Total harmonic distortion should be less than 1% from 100 Hz to 8000 Hz at 90 dB SPL input level.
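
The level target above can be checked on captured audio. The following is a minimal sketch, assuming the capture is 16-bit signed little-endian PCM (an assumption; this page does not state the encoding). The helper names peak_dbfs, is_clipping, and level_ok are hypothetical and not part of any Google API.

```python
import math
import struct

# Assumption: samples are 16-bit signed little-endian PCM.
FULL_SCALE = 32768  # magnitude of the most negative 16-bit sample


def _unpack(pcm_bytes):
    """Decode raw bytes into a tuple of signed 16-bit samples."""
    count = len(pcm_bytes) // 2
    return struct.unpack("<%dh" % count, pcm_bytes[:count * 2])


def peak_dbfs(pcm_bytes):
    """Return the peak sample level in dBFS (0 dBFS is full scale)."""
    peak = max((abs(s) for s in _unpack(pcm_bytes)), default=0)
    if peak == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(peak / FULL_SCALE)


def is_clipping(pcm_bytes):
    """Return True if any sample sits at the 16-bit limits."""
    return any(s >= 32767 or s <= -32768 for s in _unpack(pcm_bytes))


def level_ok(pcm_bytes, low=-20.0, high=-10.0):
    """Return True if the peak level falls in the recommended range."""
    return low <= peak_dbfs(pcm_bytes) <= high
```

Running level_ok on a few seconds of typical speech during device setup gives a rough indication of whether the input gain needs adjusting. It measures sample peaks only, so it does not replace a proper acoustic calibration.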

Sampling rate

If possible, set the sampling rate of the audio source to 16000 Hz. Otherwise, set sample_rate_hertz to match the native sample rate of the audio source rather than resampling.
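
The rule above can be expressed as a small selection helper. This is only a sketch: choose_sample_rate and its parameters are hypothetical names, and how the supported and native rates are discovered depends on the audio library in use.

```python
PREFERRED_RATE_HZ = 16000


def choose_sample_rate(supported_rates_hz, native_rate_hz):
    """Pick 16000 Hz when the device can capture at it, else the native rate.

    Hypothetical helper: the caller supplies the rates the capture device
    reports. No resampling is performed here.
    """
    if PREFERRED_RATE_HZ in supported_rates_hz:
        return PREFERRED_RATE_HZ
    return native_rate_hz


# Example: a device that only captures at 44100 Hz or 48000 Hz, native 48000 Hz.
sample_rate_hertz = choose_sample_rate([44100, 48000], 48000)
```

The returned value is what would be passed as sample_rate_hertz, so the service is told the rate at which the audio was actually captured.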

Frame size

The Google Assistant recognizes live audio as it is captured from a microphone. The audio stream must be split into frames and sent in consecutive AssistRequest messages. Any frame size is acceptable. Larger frames are more efficient, but add latency. A 100-millisecond frame size is recommended as a good tradeoff between latency and efficiency.
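
As an illustration of the framing described above, the sketch below computes the byte size of a 100-millisecond frame and splits a byte stream into consecutive frames. It assumes 16-bit mono PCM (an assumption not stated on this page), and frame_size_bytes and split_into_frames are hypothetical helper names. Wrapping each chunk in an AssistRequest message is left out.

```python
import io


def frame_size_bytes(sample_rate_hertz, frame_ms=100, sample_width=2, channels=1):
    """Return the number of bytes in one frame of raw PCM audio.

    Assumption: sample_width=2 (16-bit samples) and a single channel.
    """
    samples_per_frame = sample_rate_hertz * frame_ms // 1000
    return samples_per_frame * sample_width * channels


def split_into_frames(stream, frame_bytes):
    """Yield consecutive chunks of at most frame_bytes from a binary stream."""
    while True:
        chunk = stream.read(frame_bytes)
        if not chunk:
            break
        yield chunk


# Example: 8000 bytes of silence stand in for captured audio at 16000 Hz.
frames = list(split_into_frames(io.BytesIO(bytes(8000)), frame_size_bytes(16000)))
```

At 16000 Hz, a 100-millisecond frame of 16-bit mono audio is 3200 bytes, so the example yields two full frames and one shorter final frame. With a live microphone, the stream would be the capture device's read interface rather than an in-memory buffer.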
