The Google Meet Media API lets your app join a Google Meet
conference and consume real-time
media streams.
Clients use WebRTC to communicate with Meet servers. The provided
reference clients (C++, TypeScript) demonstrate recommended practices,
and you are encouraged to build directly upon them.
However, you may also build fully custom WebRTC clients that adhere to
the Meet Media API's technical
requirements.
This page outlines key WebRTC concepts required for a successful
Meet Media API session.
Offer-answer signaling
WebRTC is a peer-to-peer (P2P) framework, where peers communicate by signaling
each other. To begin a session, the initiating peer sends an SDP
offer to a remote peer. This offer
includes the following important details:
Media descriptions for audio and video
Media descriptions indicate what's communicated during P2P sessions. Three
types of descriptions exist: audio, video, and data.
To indicate n audio streams, the offerer includes n audio media descriptions
in the offer, and likewise for video. However, there is at most one data
media description.
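As a hedged illustration, an offer negotiating two audio streams, one video
stream, and a data channel might contain media description (m=) lines like
these (ports and payload type numbers are illustrative, not fixed by Meet):

m=audio 9 UDP/TLS/RTP/SAVPF 111
m=audio 9 UDP/TLS/RTP/SAVPF 111
m=video 9 UDP/TLS/RTP/SAVPF 96 98 35
m=application 9 UDP/DTLS/SCTP webrtc-datachannel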
Directionality
Each audio or video description describes individual Secure Real-time Transport
Protocol (SRTP) streams, governed by RFC
3711. These are bi-directional,
allowing two peers to send and receive media across the same connection.
Because of this, each media description (in both the offer and answer) contains
one of three attributes describing how the stream should be used:
sendonly: Only sends media from the offering peer. The remote peer
won't send media on this stream.
recvonly: Only receives media from the remote peer. The offering peer
won't send media on this stream.
sendrecv: Both peers may send and receive on this stream.
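In SDP, the direction appears as an attribute line under its media
description. For example, an audio stream the offerer only consumes might
appear as (values illustrative):

m=audio 9 UDP/TLS/RTP/SAVPF 111
a=recvonly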
Codecs
Each media description also specifies the codecs a peer supports. In the case of
the Meet Media API, client offers are rejected unless they support
(at least) the codecs specified in the technical
requirements.
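In SDP, supported codecs are listed as a=rtpmap attributes that map an RTP
payload type to a codec name and clock rate. A sketch for the required codecs
(the payload type numbers are dynamic and only illustrative):

a=rtpmap:111 opus/48000/2
a=rtpmap:96 VP8/90000
a=rtpmap:98 VP9/90000
a=rtpmap:35 AV1/90000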
DTLS handshake
SRTP streams are secured by an initial
Datagram Transport Layer Security ("DTLS", RFC
9147) handshake between the peers.
DTLS is traditionally a client-to-server protocol; during the signaling process,
one peer agrees to act as the server while the other acts as the client.
Because each SRTP stream might have its own dedicated DTLS connection, each
media description specifies one of three attributes to indicate the peer's role
in the DTLS handshake:
a=setup:actpass: The offering peer defers to the choice of the
remote peer.
a=setup:active: This peer acts as the client.
a=setup:passive: This peer acts as the server.
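For example, a Meet Media API client's offer might mark each media description
a=setup:actpass; if Meet's answer then specifies a=setup:passive, the client
takes the client role in the DTLS handshake, as required.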
Application media descriptions
Data channels (RFC 8831) are
an abstraction of the Stream Control Transmission Protocol ("SCTP", RFC
9260).
To open data channels during the initial signaling phase, the offer must contain
an application media description. Unlike audio and video descriptions,
application descriptions don't specify direction or codecs.
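In SDP, an application media description might look like this (port numbers
are illustrative):

m=application 9 UDP/DTLS/SCTP webrtc-datachannel
a=sctp-port:5000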
ICE candidates
A peer's Interactive Connectivity Establishment ("ICE", RFC
8445) candidates are a list of
routes that a remote peer may use to establish a connection.
The Cartesian product of the two peers' lists, known as the candidate pairs,
represents the potential routes between two peers. These pairs are tested to
determine the optimal route.
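Candidates are exchanged as a=candidate lines, either in the SDP itself or
trickled as separate signaling messages. An illustrative host candidate:

a=candidate:0 1 udp 2122260223 192.0.2.10 49152 typ host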
Figure 1. Example offer with an audio media description.
The remote peer responds with an SDP
answer containing the same number
of media description lines. Each line indicates what media, if any, the remote
peer sends back to the offering client across the SRTP streams. The remote
peer might also reject specific streams from the offerer by setting that media
description entry to recvonly.
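For example, if an offered audio description is marked recvonly and the remote
peer agrees to send media on it, the matching description in the answer is
marked sendonly.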
For the Meet Media API, clients always send the SDP offer to initiate
a connection. Meet is never the initiator.
This behavior is managed internally by the reference clients
(C++, TypeScript),
but developers of custom clients can use WebRTC's PeerConnectionInterface to
generate an offer.
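For a fully custom browser-based client, offer generation might look like the
following minimal sketch (the createMeetOffer helper is hypothetical, and the
signaling transport that delivers the SDP to Meet is assumed and not shown):

JavaScript

// Minimal sketch of offer generation for a custom client. The required
// transceivers and data channels (described next) must be added before
// createOffer() so the resulting SDP contains the required media descriptions.
async function createMeetOffer() {
  const pc = new RTCPeerConnection();
  // ...add recvonly transceivers and data channels here...
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  // offer.sdp now holds the SDP text to deliver to Meet for answering.
  return offer;
}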
To connect to Meet, the offer must adhere to specific
requirements:
The client must always act as the client in the DTLS handshake, so every
media description in the offer must specify either a=setup:actpass or
a=setup:active.
Each media description line must support all required
codecs for that media type:
Audio: Opus
Video: VP8, VP9, AV1
To receive audio, the offer must include exactly 3 receive-only audio media
descriptions. You can do this by setting transceivers on the peer connection
object.
C++
// ...
rtc::scoped_refptr<webrtc::PeerConnectionInterface> peer_connection;

for (int i = 0; i < 3; ++i) {
  webrtc::RtpTransceiverInit audio_init;
  audio_init.direction = webrtc::RtpTransceiverDirection::kRecvOnly;
  audio_init.stream_ids = {absl::StrCat("audio_stream_", i)};
  webrtc::RTCErrorOr<rtc::scoped_refptr<webrtc::RtpTransceiverInterface>>
      audio_result = peer_connection->AddTransceiver(
          cricket::MediaType::MEDIA_TYPE_AUDIO, audio_init);
  if (!audio_result.ok()) {
    return absl::InternalError(
        absl::StrCat("Failed to add audio transceiver: ",
                     audio_result.error().message()));
  }
}
JavaScript
pc = new RTCPeerConnection();

// Configure client to receive audio from Meet servers.
pc.addTransceiver('audio', {'direction':'recvonly'});
pc.addTransceiver('audio', {'direction':'recvonly'});
pc.addTransceiver('audio', {'direction':'recvonly'});
To receive video, the offer must include 1–3 receive-only video media
descriptions. You can do this by setting transceivers on the peer connection
object.
C++
// ...
rtc::scoped_refptr<webrtc::PeerConnectionInterface> peer_connection;

for (uint32_t i = 0; i < configurations.receiving_video_stream_count; ++i) {
  webrtc::RtpTransceiverInit video_init;
  video_init.direction = webrtc::RtpTransceiverDirection::kRecvOnly;
  video_init.stream_ids = {absl::StrCat("video_stream_", i)};
  webrtc::RTCErrorOr<rtc::scoped_refptr<webrtc::RtpTransceiverInterface>>
      video_result = peer_connection->AddTransceiver(
          cricket::MediaType::MEDIA_TYPE_VIDEO, video_init);
  if (!video_result.ok()) {
    return absl::InternalError(
        absl::StrCat("Failed to add video transceiver: ",
                     video_result.error().message()));
  }
}
JavaScript
pc = new RTCPeerConnection();

// Configure client to receive video from Meet servers.
pc.addTransceiver('video', {'direction':'recvonly'});
pc.addTransceiver('video', {'direction':'recvonly'});
pc.addTransceiver('video', {'direction':'recvonly'});
The offer must always include data channels. At minimum, the
session-control and media-stats channels should always be open. All data
channels must be ordered.
C++
// ...

// All data channels must be ordered.
constexpr webrtc::DataChannelInit kDataChannelConfig = {.ordered = true};

rtc::scoped_refptr<webrtc::PeerConnectionInterface> peer_connection;

// Signal session-control data channel.
webrtc::RTCErrorOr<rtc::scoped_refptr<webrtc::DataChannelInterface>>
    session_create_result = peer_connection->CreateDataChannelOrError(
        "session-control", &kDataChannelConfig);
if (!session_create_result.ok()) {
  return absl::InternalError(
      absl::StrCat("Failed to create data channel session-control: ",
                   session_create_result.error().message()));
}

// Signal media-stats data channel.
webrtc::RTCErrorOr<rtc::scoped_refptr<webrtc::DataChannelInterface>>
    stats_create_result = peer_connection->CreateDataChannelOrError(
        "media-stats", &kDataChannelConfig);
if (!stats_create_result.ok()) {
  return absl::InternalError(
      absl::StrCat("Failed to create data channel media-stats: ",
                   stats_create_result.error().message()));
}
JavaScript
// ...
pc = new RTCPeerConnection();

// All data channels must be ordered.
const dataChannelConfig = {ordered: true};

// Signal session-control data channel.
sessionControlChannel = pc.createDataChannel('session-control', dataChannelConfig);
sessionControlChannel.onopen = () => console.log("data channel is now open");
sessionControlChannel.onclose = () => console.log("data channel is now closed");
sessionControlChannel.onmessage = async (e) => {
  console.log("data channel message", e.data);
};

// Signal media-stats data channel.
mediaStatsChannel = pc.createDataChannel('media-stats', dataChannelConfig);
mediaStatsChannel.onopen = () => console.log("data channel is now open");
mediaStatsChannel.onclose = () => console.log("data channel is now closed");
mediaStatsChannel.onmessage = async (e) => {
  console.log("data channel message", e.data);
};
Example SDP offer and answer
Consider a valid SDP offer and its matching SDP answer. This offer negotiates
a Meet Media API session with audio and a single video stream.
Observe there are three audio media descriptions, one video media
description, and the required application media description.
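An abbreviated sketch of the offer's media sections, with most attributes
omitted (values illustrative, not a verbatim Meet exchange):

m=audio 9 UDP/TLS/RTP/SAVPF 111
a=recvonly
a=setup:actpass
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=recvonly
a=setup:actpass
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=recvonly
a=setup:actpass
m=video 9 UDP/TLS/RTP/SAVPF 96 98 35
a=recvonly
a=setup:actpass
m=application 9 UDP/DTLS/SCTP webrtc-datachannel
a=setup:actpass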
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-02-24 UTC."],[[["The Google Meet Media API enables applications to join Google Meet conferences and receive real-time media streams, relying on WebRTC for peer-to-peer communication."],["Offer-answer signaling, facilitated by the Meet REST API, is crucial for establishing WebRTC sessions, with the initiating peer sending an SDP offer and receiving an SDP answer from the remote peer."],["Clients connecting to Google Meet must support specific codecs (Opus for audio, VP8, VP9, AV1 for video), act as the DTLS client, include at least three `recvonly` audio descriptions, and always include data channels."],["Media descriptions specify the type of media (audio, video, data), with directionality (sendonly, recvonly, sendrecv) determining stream usage and direction, governed by SRTP."],["SDP media descriptions include the type of media (audio, video, or application/data), which IP and port it uses, the ICE credential, the DTLS fingerprint and the header extensions it supports, like the time offset, the content type, the mid and the rtp-stream-id, among others."]]],[]]