How Do Voice and Video Calls Work?

Voice over Internet Protocol (VoIP) is the general name for the family of technologies used to carry voice, and often video, calls over IP networks such as the Internet.

This post explains, at a high level, how voice and video calls work.

We all use voice and video calls on platforms such as WhatsApp, Skype, Messenger, and Facebook.

Basically, both voice and video calls depend on streaming media between two clients that are connected to each other. So there must be something that does the work of streaming media from one client to the other.

For media streaming, we need to know about WebRTC.

WebRTC is a free, open project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs. The WebRTC components have been optimized to best serve this purpose.

But WebRTC alone is not enough for a complete implementation; there are a few other things we need.

The other things are:

  • Signaling.
  • STUN Server.
  • TURN Server.

Signaling

What is Signaling?

To set up a call between two clients, the clients must first agree on how to talk to each other by exchanging key data, control messages, and metadata about the media (such as which formats each side supports). This exchange is called signaling.

We can use a WebSocket connection for signaling.

Signaling is only used to let the two clients know that they want to connect to each other for the call; the media itself does not travel over it.
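As a rough illustration, the exchange can be pictured as JSON messages passed through a relay that forwards each one to the named peer. The sketch below runs entirely in memory; the `SignalingRelay` class, the message fields (`type`, `from`, `to`, `payload`), and the client names are hypothetical, and a real application would send the same JSON over a WebSocket instead of a local queue.

```python
import json
from collections import defaultdict, deque


class SignalingRelay:
    """In-memory stand-in for a signaling server (hypothetical, not a real API)."""

    def __init__(self):
        # One inbox of JSON strings per client id.
        self.inboxes = defaultdict(deque)

    def send(self, sender, recipient, msg_type, payload):
        # Serialize to JSON, as the message would be on a real WebSocket.
        message = json.dumps(
            {"type": msg_type, "from": sender, "to": recipient, "payload": payload}
        )
        self.inboxes[recipient].append(message)

    def receive(self, client):
        # Return the oldest pending message for this client, or None.
        inbox = self.inboxes[client]
        return json.loads(inbox.popleft()) if inbox else None


relay = SignalingRelay()

# Alice asks Bob for a call and describes the media she wants to send.
relay.send("alice", "bob", "offer", {"media": ["audio", "video"]})
offer = relay.receive("bob")

# Bob accepts and describes what he can handle.
relay.send("bob", offer["from"], "answer", {"media": ["audio"]})
answer = relay.receive("alice")

print(offer["type"], "->", answer["type"])
```

Note that the relay never looks inside the payload. It only needs to know who a message is for, which is why almost any messaging channel can play this role.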

Peer to Peer Connection

After signaling, we need to connect the two clients peer to peer. To do that, we need the public IP address of each client.

So, to get the public IP address, we use a STUN server.

STUN Server

A STUN (Session Traversal Utilities for NAT) server is used to find out a client’s public IP address.

Why do we need a public IP address?

A public IP address is an IP address that is globally unique across the Internet. Only one device may hold a given public IP address at a time.

A private IP address is an IP address that is not globally unique and may exist simultaneously on many different devices. A device with a private IP address is never directly reachable from the Internet. Devices that possess private IP addresses sit in their own separate IP space (e.g. different companies or domains).
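Python’s standard `ipaddress` module can tell the two kinds of address apart, which makes the distinction easy to check. The `describe` helper and the sample addresses below are arbitrary choices made for illustration.

```python
import ipaddress


def describe(address: str) -> str:
    """Classify an IP address as private, public, or special-purpose."""
    ip = ipaddress.ip_address(address)
    if ip.is_private:
        return "private"
    if ip.is_global:
        return "public"
    return "special"


for sample in ["192.168.1.10", "10.0.0.5", "8.8.8.8"]:
    print(sample, "is", describe(sample))
```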

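A device behind NAT (Network Address Translation) only knows its local (private) IP address, which can’t be used over the Internet to connect peer to peer. The NAT router maps that private address to a public one, but the device itself can’t see the mapping. For WebRTC, we need the public IP address, and the STUN server provides it by reporting the address it sees the client’s request coming from.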
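To make this more concrete, here is a small sketch of the STUN message format defined in RFC 5389: the client sends a Binding Request, and the server replies with the client’s public address in an XOR-MAPPED-ADDRESS attribute. The code builds a request and decodes a response entirely in memory, with no network access. The helper names and the sample address `203.0.113.7:54321` are made up for illustration, and the response is constructed by hand rather than received from a real server.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: bytes) -> bytes:
    # Header: type, body length (0: no attributes), magic cookie, 12-byte id.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_xor_mapped_address(response: bytes):
    """Return (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        # Family byte 0x01 means IPv4; this sketch ignores IPv6.
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attributes are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    return None


def fake_success_response(transaction_id: bytes, ip: str, port: int) -> bytes:
    # Hand-built response standing in for what a STUN server would send back.
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


tid = os.urandom(12)
request = build_binding_request(tid)
response = fake_success_response(tid, "203.0.113.7", 54321)
print(len(request), parse_xor_mapped_address(response))
```

In a real client, the request bytes would be sent over UDP to a STUN server and the reply parsed the same way. In practice the WebRTC stack does this for us.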

If everything goes well, we get the public IP addresses of both clients, and then we connect the clients through WebRTC to start the call. WebRTC handles all the media streaming.

Real-world connectivity, however, is not ideal.

If we can’t get a usable public IP address for both clients, we can’t connect them peer to peer. In that case, we need a TURN server.

TURN Server

A TURN (Traversal Using Relays around NAT) server connects the two clients when the peer-to-peer connection fails, by acting as a mediator. Basically, it takes the data from one client and sends it to the other. So its job is to relay media.
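The relaying idea can be sketched in a few lines. The class below is a toy in-memory model, not the TURN protocol itself: `ToyRelay`, its methods, and the client names are hypothetical, and a real TURN server additionally handles authentication, address allocation, and actual network sockets.

```python
class ToyRelay:
    """Toy model of a media relay: forwards each packet to the sender's peer."""

    def __init__(self):
        self.peers = {}      # client id -> id of the client it is paired with
        self.delivered = {}  # client id -> list of packets received so far

    def pair(self, a, b):
        # Register both directions so either client can send to the other.
        self.peers[a], self.peers[b] = b, a
        self.delivered.setdefault(a, [])
        self.delivered.setdefault(b, [])

    def send(self, sender, packet: bytes):
        # The sender only talks to the relay; the relay passes the data on.
        recipient = self.peers[sender]
        self.delivered[recipient].append(packet)
        return recipient


turn = ToyRelay()
turn.pair("alice", "bob")
turn.send("alice", b"audio-frame-1")
turn.send("bob", b"audio-frame-2")
print(turn.delivered)
```

Because every packet passes through the relay, this path costs the server bandwidth and adds latency, which is why it is kept as a fallback rather than the default.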

This way, the two clients start talking to each other.

Other small pieces of data that are not media, such as a client ending the call, setting changes, and messages, are sent over the signaling channel.
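One way to picture this is a small dispatcher that updates the call state based on the type of each signaling message. The message types (`hangup`, `mute`, `chat`) and the `CallState` class below are hypothetical names chosen for this sketch; they are not part of WebRTC.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CallState:
    active: bool = True
    remote_muted: bool = False
    chat_log: list = field(default_factory=list)


def handle_signal(state: CallState, raw: str) -> CallState:
    """Apply one non-media control message (a JSON string) to the call state."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "hangup":
        state.active = False
    elif kind == "mute":
        state.remote_muted = bool(message.get("muted", True))
    elif kind == "chat":
        state.chat_log.append(message.get("text", ""))
    # Unknown types are ignored so that newer clients don't break older ones.
    return state


state = CallState()
for raw in ['{"type": "chat", "text": "hi"}',
            '{"type": "mute", "muted": true}',
            '{"type": "hangup"}']:
    handle_signal(state, raw)
print(state)
```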

A question naturally comes to mind:

Why can’t WebRTC do signaling?

Answer: To avoid redundancy and to maximize the compatibility with established technologies, the signaling methods and protocols are not specified by the WebRTC Standards.

WebRTC is optimized for media.

So, this is how voice and video calls work.

Very soon, I will write about how it works at a low level (like how WebRTC works internally). Stay tuned.

Happy Learning :)
