WebRTC Connectivity
Describes how the various WebRTC-related protocols interact with one another in order to create a connection and transfer data and/or media among peers.
Resource: WebRTC connectivity - MDN
Signaling
WebRTC can't create connections without some sort of server in the middle to pass setup information between the peers. We call this the signal channel or signaling service. WebRTC doesn't specify how it works, so any channel that can carry messages between the peers will do.
The information exchanged over it is the Offer and the Answer, each of which contains an SDP session description.
Offer: a description created by the caller, containing all the information about the caller's proposed configuration for the call.
Answer: the recipient's reply, describing its end of the call.
For example:
- Peer A, who will be the initiator of the connection, creates an Offer.
- Peer A sends this Offer to Peer B using the chosen signal channel.
- Peer B receives the Offer from the signal channel and creates an Answer.
- Peer B sends the Answer back to Peer A along the signal channel.
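WebRTC leaves the signaling transport up to the application. As a minimal sketch of the exchange above, assuming an in-memory relay standing in for a real server (in practice usually a WebSocket or HTTP service), the relay only forwards messages and never inspects the SDP. `SignalMessage`, `SignalingRelay`, and the peer IDs are hypothetical names, not part of any WebRTC API.

```typescript
// Hypothetical message format: WebRTC does not define one, so each app picks its own.
type SignalMessage =
  | { kind: "offer"; from: string; to: string; sdp: string }
  | { kind: "answer"; from: string; to: string; sdp: string };

type SignalHandler = (msg: SignalMessage) => void;

// In-memory stand-in for a signaling server: it forwards messages
// between registered peers and never looks inside the SDP it carries.
class SignalingRelay {
  private peers = new Map<string, SignalHandler>();

  register(id: string, onMessage: SignalHandler): void {
    this.peers.set(id, onMessage);
  }

  send(msg: SignalMessage): void {
    const target = this.peers.get(msg.to);
    if (!target) throw new Error(`unknown peer: ${msg.to}`);
    target(msg);
  }
}

// Usage: Peer A sends an Offer, Peer B replies with an Answer.
const relay = new SignalingRelay();
const received: SignalMessage[] = [];

relay.register("A", (msg) => received.push(msg)); // A collects the Answer
relay.register("B", (msg) => {
  received.push(msg);
  if (msg.kind === "offer") {
    // A real peer would call setRemoteDescription() and createAnswer() here.
    relay.send({ kind: "answer", from: "B", to: "A", sdp: "<answer SDP>" });
  }
});

relay.send({ kind: "offer", from: "A", to: "B", sdp: "<offer SDP>" });
```

Keeping the relay ignorant of the payload mirrors the role described above: the signaling service is just a courier for the Offer and Answer.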
Session descriptions
The configuration of an endpoint on a WebRTC connection is called a session description.
The description includes information about the kind of media being sent, its format, the transfer protocol being used, the endpoint's IP address and port, and other information needed to describe a media transfer endpoint. This information is exchanged and stored using the Session Description Protocol (SDP).
When a user starts a WebRTC call to another user, the caller creates an Offer. The recipient then responds with an Answer. In this way, both devices share with one another the information needed to exchange media data.
The network side of this exchange is handled by ICE (Interactive Connectivity Establishment), a protocol that lets the two devices find a working network path, using STUN or TURN servers as intermediaries where needed, even if they are separated by NAT (Network Address Translation). The Offer and Answer themselves still travel over the signaling channel.
Each peer, then, keeps two descriptions on hand: the local description, describing itself, and the remote description, describing the other end of the call.
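SDP itself is line-oriented text. As a minimal sketch, the hypothetical helper `parseMediaSections` below pulls the media kind, port, transport protocol, and formats out of each `m=` line, plus the address from a `c=` line inside that media section, following the SDP line formats `m=<media> <port> <proto> <fmt> ...` and `c=<nettype> <addrtype> <connection-address>`. The sample is an illustrative fragment, not a complete offer.

```typescript
interface MediaSection {
  media: string;     // e.g. "audio" or "video"
  port: number;
  proto: string;     // transport protocol, e.g. "UDP/TLS/RTP/SAVPF"
  formats: string[]; // payload type numbers listed on the m= line
  address?: string;  // from a c= line inside this media section, if present
}

// Hypothetical helper: collects each m= section and the c= line that follows it.
function parseMediaSections(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("m=")) {
      const [media, port, proto, ...formats] = line.slice(2).split(" ");
      sections.push({ media, port: Number(port), proto, formats });
    } else if (line.startsWith("c=") && sections.length > 0) {
      // c=<nettype> <addrtype> <connection-address>
      const parts = line.slice(2).split(" ");
      sections[sections.length - 1].address = parts[2];
    }
  }
  return sections;
}

// Illustrative fragment only; real offers contain many more lines.
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0",
  "c=IN IP4 0.0.0.0",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 97",
  "c=IN IP4 0.0.0.0",
].join("\r\n");

const parsed = parseMediaSections(sampleSdp);
```

In WebRTC offers the port and address on these lines are often placeholders (port 9 and 0.0.0.0), since the usable addresses are delivered separately as ICE candidates.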
These are the basic steps which must occur to exchange the Offer and Answer, leaving out the ICE layer:
- The caller captures local media via MediaDevices.getUserMedia().
- The caller creates an RTCPeerConnection and calls RTCPeerConnection.addTrack() for each track (addStream() is deprecated).
- The caller calls RTCPeerConnection.createOffer() to create an Offer.
- The caller calls RTCPeerConnection.setLocalDescription() to set that Offer as the local description (that is, the description of the local end of the connection).
- After setLocalDescription(), the caller's ICE agent starts gathering ICE candidates, asking the configured STUN servers for its public address.
- The caller uses the signaling server to transmit the Offer to the intended receiver of the call.
- The recipient receives the Offer and calls RTCPeerConnection.setRemoteDescription() to record it as the remote description (the description of the other end of the connection).
- The recipient does any setup it needs for its end of the call: it captures its local media and attaches each media track to the peer connection via RTCPeerConnection.addTrack().
- The recipient then creates an Answer by calling RTCPeerConnection.createAnswer().
- The recipient calls RTCPeerConnection.setLocalDescription(), passing in the created Answer, to set the Answer as its local description. The recipient now knows the configuration of both ends of the connection.
- The recipient uses the signaling server to send the Answer to the caller.
- The caller receives the Answer.
- The caller calls RTCPeerConnection.setRemoteDescription() to set the Answer as the remote description for its end of the call. It now knows the configuration of both peers. Media begins to flow as configured.
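The same sequence, minus the ICE layer, can be sketched as three async functions. To keep the sketch self-contained and testable outside a browser, `PeerLike`, `SessionDesc`, `StreamLike`, and `Send` are hypothetical, simplified interfaces listing only the `RTCPeerConnection` and `MediaStream` members the steps use; in a page, a real `RTCPeerConnection` and a stream from `navigator.mediaDevices.getUserMedia()` play those roles, and `send` writes to whatever signaling channel the app uses.

```typescript
// Hypothetical minimal shapes mirroring the parts of the WebRTC API used below.
interface SessionDesc { type: string; sdp?: string; }
interface TrackLike { kind: string; }
interface StreamLike { getTracks(): TrackLike[]; }
interface PeerLike {
  addTrack(track: TrackLike, stream: StreamLike): unknown;
  createOffer(): Promise<SessionDesc>;
  createAnswer(): Promise<SessionDesc>;
  setLocalDescription(desc: SessionDesc): Promise<void>;
  setRemoteDescription(desc: SessionDesc): Promise<void>;
}
// Hypothetical: writes a description to the signaling channel.
type Send = (desc: SessionDesc) => void;

// Caller: attach local tracks, create the Offer, set it locally, signal it.
async function startCall(pc: PeerLike, local: StreamLike, send: Send): Promise<void> {
  for (const track of local.getTracks()) pc.addTrack(track, local);
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer); // ICE candidate gathering starts after this
  send(offer);
}

// Recipient: record the Offer, attach its own tracks, answer, signal back.
async function answerCall(pc: PeerLike, offer: SessionDesc, local: StreamLike, send: Send): Promise<void> {
  await pc.setRemoteDescription(offer);
  for (const track of local.getTracks()) pc.addTrack(track, local);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  send(answer);
}

// Caller again: the Answer becomes its remote description.
async function finishCall(pc: PeerLike, answer: SessionDesc): Promise<void> {
  await pc.setRemoteDescription(answer);
}
```

Passing the peer connection in, rather than creating it inside each function, keeps the signaling transport and media capture decisions with the caller of these helpers.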
Pending and current descriptions
During renegotiation, an offer might be rejected because it proposes an incompatible format, so each endpoint must be able to propose a new format without actually switching to it until the other peer accepts it. For that reason, WebRTC uses pending and current descriptions.
- The current description (returned by the RTCPeerConnection.currentLocalDescription and RTCPeerConnection.currentRemoteDescription properties) represents the description currently in actual use by the connection. This is the most recent description that both sides have fully agreed to use.
- The pending description (returned by RTCPeerConnection.pendingLocalDescription and RTCPeerConnection.pendingRemoteDescription) indicates a description that is currently under consideration following a call to setLocalDescription() or setRemoteDescription(), respectively.
- When reading the description (returned by RTCPeerConnection.localDescription and RTCPeerConnection.remoteDescription), the returned value is the value of pendingLocalDescription/pendingRemoteDescription if there's a pending description (that is, the pending description isn't null); otherwise, the current description (currentLocalDescription/currentRemoteDescription) is returned.
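As a rough model of these rules (not how a browser implements them), the sketch below keeps a current and a pending slot for one side of the connection. `DescriptionSlot`, `propose`, `commit`, `rollback`, and `effective` are hypothetical names. In the real API a proposal becomes pending via `setLocalDescription()`/`setRemoteDescription()`, becomes current once the Answer completing that exchange is applied, and can be discarded by setting a description of type `"rollback"`.

```typescript
interface Desc { type: "offer" | "answer"; sdp: string; }

// Hypothetical model of one side (local or remote) of the description state.
class DescriptionSlot {
  current: Desc | null = null; // last description both peers agreed on
  pending: Desc | null = null; // proposed, not yet accepted

  // What localDescription / remoteDescription would report: pending wins if set.
  effective(): Desc | null {
    return this.pending ?? this.current;
  }

  propose(desc: Desc): void {
    this.pending = desc;
  }

  // The other side accepted: the proposal becomes current.
  commit(): void {
    if (this.pending) {
      this.current = this.pending;
      this.pending = null;
    }
  }

  // The proposal was rejected: fall back to the current description.
  rollback(): void {
    this.pending = null;
  }
}

const localSlot = new DescriptionSlot();
localSlot.propose({ type: "offer", sdp: "v1" });
localSlot.commit(); // first negotiation completed
localSlot.propose({ type: "offer", sdp: "v2" }); // renegotiation with a new format
const duringRenegotiation = localSlot.effective()?.sdp; // "v2": pending is reported
localSlot.rollback(); // peer rejected the new format
const afterRollback = localSlot.effective()?.sdp; // "v1": back to current
```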
ICE candidates
As well as exchanging information about the media, peers must exchange information about the network connection. Each piece of this information is known as an ICE candidate, and it describes one method by which the peer is able to communicate (directly or through a TURN server).
Typically, each peer proposes its best candidates first, working its way down toward its worse candidates. Ideally, candidates are UDP (since it's faster, and media streams are able to recover from interruptions relatively easily), but the ICE standard does allow TCP candidates as well.
Generally, ICE candidates using TCP are only going to be used when UDP is not available or is restricted in ways that make it not suitable for media streaming. Not all browsers support ICE over TCP, however.
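"Best" here is expressed as a numeric priority. The ICE specification (RFC 8445) recommends priority = 2^24 × (type preference) + 2^8 × (local preference) + (256 − component ID), with type preferences of 126 for host, 110 for peer reflexive, 100 for server reflexive, and 0 for relay candidates. A minimal sketch of that formula follows; the default local preference of 65535 (the recommended value for a host with a single IP address), component ID 1, and the `Candidate` shape are assumptions, and browsers may choose other values.

```typescript
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Recommended type preferences from RFC 8445.
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,
  prflx: 110,
  srflx: 100,
  relay: 0,
};

// priority = 2^24 * typePref + 2^8 * localPref + (256 - componentId)
function candidatePriority(type: CandidateType, localPref = 65535, componentId = 1): number {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPref + (256 - componentId);
}

interface Candidate { type: CandidateType; address: string; }

// Highest priority first, i.e. the order a peer would typically propose them.
function sortByPriority(cands: Candidate[]): Candidate[] {
  return [...cands].sort((a, b) => candidatePriority(b.type) - candidatePriority(a.type));
}

const ordered = sortByPriority([
  { type: "relay", address: "198.51.100.7" },
  { type: "host", address: "192.168.1.5" },
  { type: "srflx", address: "203.0.113.5" },
]).map((c) => c.type);
// ordered: ["host", "srflx", "relay"]
```

With these defaults a host candidate gets 2130706431 and a relay candidate 16777215, so relayed paths are only preferred when nothing more direct works.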
Resource: Trickle ICE - WebRTC samples
UDP candidate types
Type | Definition |
---|---|
host | A host candidate is one whose IP address is the actual, direct IP address of the remote peer. |
prflx | A peer reflexive candidate is one whose IP address comes from a symmetric NAT between the two peers, usually as an additional candidate during trickle ICE (that is, additional candidate exchanges that occur after primary signaling but before the connection verification phase is finished). |
srflx | A server reflexive candidate is the public IP address and port that a NAT assigned to the peer, as observed by a STUN (or TURN) server: the peer sends a request to the server through its NAT, and the server replies with the address the request appeared to come from. |
relay | A relay candidate is an address allocated on a TURN server, which relays traffic between the peers. It is obtained much like a server reflexive candidate ("srflx"), but using TURN instead of STUN. |
TCP candidate types
Type | Definition |
---|---|
active | The transport will try to open an outbound connection but won't receive incoming connection requests. This is the most common type, and the only one that most user agents will gather. |
passive | The transport will receive incoming connection attempts but won't attempt a connection itself. |
so | The transport will attempt a TCP simultaneous open: both peers try to open the connection to each other at the same time. |
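On the wire, each candidate is one line of text (an `a=candidate:` line in SDP, or the `candidate` string of an `RTCIceCandidate`). Its `typ` field carries one of the UDP types above, and TCP candidates add a `tcptype` field carrying one of the TCP types (defined by RFC 6544). A minimal sketch of reading those fields, assuming well-formed input; `parseCandidate` and `ParsedCandidate` are hypothetical names, and the sample lines are made up using documentation address ranges.

```typescript
interface ParsedCandidate {
  foundation: string;
  component: number;
  protocol: string;        // "udp" or "tcp"
  priority: number;
  address: string;
  port: number;
  type: string;            // host | srflx | prflx | relay
  tcpType?: string;        // active | passive | so (TCP candidates only)
  relatedAddress?: string; // raddr, present on srflx/prflx/relay candidates
}

// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> [name value]...
function parseCandidate(line: string): ParsedCandidate {
  const body = line.replace(/^a=/, "").replace(/^candidate:/, "");
  const parts = body.trim().split(/\s+/);
  if (parts.length < 8 || parts[6] !== "typ") {
    throw new Error(`not a candidate line: ${line}`);
  }
  const [foundation, component, protocol, priority, address, port] = parts;
  const result: ParsedCandidate = {
    foundation,
    component: Number(component),
    protocol: protocol.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type: parts[7],
  };
  // Remaining tokens are name/value pairs such as raddr, rport, tcptype, generation.
  for (let i = 8; i + 1 < parts.length; i += 2) {
    const value = parts[i + 1];
    if (parts[i] === "tcptype") result.tcpType = value;
    if (parts[i] === "raddr") result.relatedAddress = value;
  }
  return result;
}

const srflx = parseCandidate(
  "candidate:842163049 1 udp 1677729535 203.0.113.5 50000 typ srflx raddr 192.168.1.5 rport 50000 generation 0",
);
const tcpHost = parseCandidate(
  "candidate:1 1 TCP 1518280447 192.168.1.5 9 typ host tcptype active generation 0",
);
```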