Best Practices for Audio
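This page contains recommendations on how to provide speech data to the Google Assistant API. Following these guidelines improves efficiency and recognition accuracy while keeping response times from the service reasonable.
Audio pre-processing
Provide audio that is as clean as possible by using a good-quality, well-positioned microphone. However, do not apply noise-reduction signal processing to the audio before sending it to the service, because doing so typically reduces recognition accuracy. The service is designed to handle noisy audio.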
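For best results:
- Position the microphone as close to the user as possible, particularly when background noise is present.
- Avoid audio clipping.
- Do not use automatic gain control (AGC).
- Disable all noise reduction processing.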
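Ideally:
- The audio level should be calibrated so that the input signal does not clip and peak speech audio levels reach approximately -20 to -10 dBFS.
- The device should exhibit an approximately "flat" amplitude versus frequency response (±3 dB from 100 Hz to 8000 Hz).
- Total harmonic distortion should be less than 1% from 100 Hz to 8000 Hz at an input level of 90 dB SPL.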
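The level targets above can be checked on captured audio before adjusting hardware gain. The following is a minimal sketch, assuming the capture is 16-bit signed little-endian mono PCM. The function names `peak_dbfs`, `is_clipping`, and `level_in_target_range`, and the three-sample clipping heuristic, are illustrative choices and not part of the Google Assistant API.

```python
import math
import struct

INT16_FULL_SCALE = 32768  # magnitude of the most negative 16-bit sample


def _unpack_int16(pcm_bytes: bytes):
    """Decode 16-bit little-endian PCM bytes into a tuple of ints."""
    count = len(pcm_bytes) // 2
    return struct.unpack(f"<{count}h", pcm_bytes[: count * 2])


def peak_dbfs(pcm_bytes: bytes) -> float:
    """Return the peak sample level in dB relative to full scale."""
    peak = max((abs(s) for s in _unpack_int16(pcm_bytes)), default=0)
    if peak == 0:
        return float("-inf")  # digital silence or empty buffer
    return 20.0 * math.log10(peak / INT16_FULL_SCALE)


def is_clipping(pcm_bytes: bytes, run_length: int = 3) -> bool:
    """Heuristic: report clipping when run_length consecutive samples sit at full scale."""
    run = 0
    for s in _unpack_int16(pcm_bytes):
        if s >= 32767 or s <= -32768:
            run += 1
            if run >= run_length:
                return True
        else:
            run = 0
    return False


def level_in_target_range(pcm_bytes: bytes, low: float = -20.0, high: float = -10.0) -> bool:
    """Check the peak level against the recommended -20 to -10 dBFS window."""
    return low <= peak_dbfs(pcm_bytes) <= high
```

For example, a buffer whose loudest sample has magnitude 6554 measures about -14 dBFS, which falls inside the recommended range.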
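Sampling rate
If possible, set the sampling rate of the audio source to 16000 Hz. Otherwise, set sample_rate_hertz to match the native sample rate of the audio source rather than resampling.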
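The rule above can be sketched as a small selection function. This is an illustration only: `choose_sample_rate` and `audio_in_config` are hypothetical helper names, and the plain dictionary stands in for the real request configuration, of which only the `sample_rate_hertz` field is taken from the text.

```python
PREFERRED_RATE_HZ = 16000


def choose_sample_rate(supported_rates_hz, native_rate_hz):
    """Use 16000 Hz when the source can capture at it; otherwise keep the native rate."""
    if PREFERRED_RATE_HZ in supported_rates_hz:
        return PREFERRED_RATE_HZ
    # No resampling: report the rate the source actually produces.
    return native_rate_hz


def audio_in_config(supported_rates_hz, native_rate_hz):
    """Build a dict standing in for the audio input configuration (assumed shape)."""
    return {"sample_rate_hertz": choose_sample_rate(supported_rates_hz, native_rate_hz)}
```

A device that supports 8000, 16000, and 44100 Hz would be configured for 16000 Hz, while a device limited to 44100 and 48000 Hz would report its native 44100 Hz unchanged.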
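Frame size
The Google Assistant recognizes live audio as it is captured from a microphone. The audio stream must be split into frames and sent in consecutive AssistRequest messages. Any frame size is acceptable. Larger frames are more efficient but add latency. A 100-millisecond frame size is recommended as a good tradeoff between latency and efficiency.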
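To make the 100-millisecond recommendation concrete, the sketch below computes the frame size in bytes and splits a stream into frames. It assumes 16-bit mono linear PCM; `frame_size_bytes` and `iter_frames` are hypothetical helper names, and wrapping each frame in an AssistRequest message is left out.

```python
import io


def frame_size_bytes(sample_rate_hz, frame_ms=100, bytes_per_sample=2, channels=1):
    """Number of bytes in one frame of the given duration."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * bytes_per_sample * channels


def iter_frames(stream, frame_bytes):
    """Yield consecutive frames from a binary stream until it is exhausted.

    The last frame may be shorter than frame_bytes.
    """
    while True:
        chunk = stream.read(frame_bytes)
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    size = frame_size_bytes(16000)          # 1600 samples * 2 bytes
    audio = io.BytesIO(bytes(7000))         # stand-in for captured audio
    print(size, [len(f) for f in iter_frames(audio, size)])
```

At 16000 Hz, a 100-millisecond frame holds 1600 samples, or 3200 bytes. Doubling the frame duration halves the number of messages sent but delays each frame by a further 100 milliseconds.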
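Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-07-26 UTC.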