How Do Voice and Video Calls Work?


I am Amit Shekhar, writing this article to share my knowledge on video calling implementation.

Voice over Internet Protocol (VoIP) is the general name for the technologies that carry voice calls, and often video calls, over the Internet.

This post explains how voice and video calls work at a high level.

Banner Image Credit: WhatsApp Official Website

We all use voice and video calls on various platforms like WhatsApp, Skype, Messenger, Facebook, etc.

Basically, both voice and video calls depend on how we stream media between two clients that are connected to each other. So, there must be something that does the work of streaming media from one client to the other.

For media streaming, we need to know about WebRTC.

WebRTC is a free, open project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs. The WebRTC components have been optimized to best serve this purpose.

But WebRTC alone is not enough for a complete implementation. There are a few other things that we need to set up.

Other things are:

  • Signaling.
  • STUN Server.
  • TURN Server.


What is Signaling?

In order to set up a call between two clients, both clients must agree with each other by exchanging key data, messages, and metadata about the media (for example, which audio and video formats each side supports). We do all of this over signaling.

We can use WebSocket for the purpose of signaling.

Signaling is only used to let the two clients know that they want to connect to each other for the call.
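To make the idea concrete, here is a minimal sketch of the signaling exchange. It assumes an in-memory `SignalingHub` class in place of a real WebSocket server, and the client names and message payloads are simplified placeholders invented for illustration. A real WebRTC application would exchange full session descriptions instead.

```python
import json
from collections import defaultdict


class SignalingHub:
    """In-memory stand-in for a signaling server (hypothetical, not a WebRTC API)."""

    def __init__(self):
        # Maps a client id to the list of messages waiting for that client.
        self.inboxes = defaultdict(list)

    def send(self, sender, recipient, kind, payload):
        # Messages are serialized as JSON, as they typically would be over a WebSocket.
        message = json.dumps({"from": sender, "type": kind, "payload": payload})
        self.inboxes[recipient].append(message)

    def receive(self, client):
        # Hand over and clear everything queued for this client.
        messages = [json.loads(m) for m in self.inboxes[client]]
        self.inboxes[client].clear()
        return messages


hub = SignalingHub()

# The caller announces that it wants a call and describes its media.
hub.send("alice", "bob", "offer", {"media": ["audio", "video"]})

# The callee reads the offer and replies with what it accepts.
offer = hub.receive("bob")[0]
hub.send("bob", offer["from"], "answer", {"media": offer["payload"]["media"]})

answer = hub.receive("alice")[0]
print(answer["type"], answer["payload"]["media"])
```

Notice that no audio or video passes through the hub. It only carries the small messages that let the two clients agree on the call.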

Peer to Peer Connection

After signaling, we need to connect both clients peer to peer. And for connecting, we must have the public IP addresses of both clients.

So, in order to get the public IP address, we use the STUN Server.

STUN Server

A STUN (Session Traversal Utilities for NAT) server is used to discover the public IP address of a client.

Why do we need a public IP address?

A Public IP Address is an IP address that is globally unique across the Internet. At any given time, a public IP address can belong to only one device.

A Private IP Address is an IP address that is not globally unique and may exist simultaneously on many different devices. A private IP address is never directly connected to the Internet. Devices that possess a private IP address will be in their own unique IP space (e.g. different companies or domains).
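As a quick illustration of the difference, Python's standard `ipaddress` module can tell us which kind of address we are looking at. This is a small sketch. The `describe` helper and the sample addresses are chosen only for this example.

```python
import ipaddress


def describe(address):
    """Classify an IP address string as 'public', 'private', or 'other'."""
    ip = ipaddress.ip_address(address)
    if ip.is_global:
        return "public"
    if ip.is_private:
        return "private"
    # Covers special ranges that are neither global nor private.
    return "other"


for sample in ["192.168.1.10", "10.0.0.5", "172.16.0.1", "8.8.8.8"]:
    print(sample, describe(sample))
```

The first three samples fall in the reserved private ranges that home and office routers hand out, while the last one is reachable from anywhere on the Internet.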

A device behind a NAT (Network Address Translation) router only knows its local IP address, which can't be used publicly to connect peer to peer. For WebRTC, we need the public IP address that the NAT presents to the Internet. The STUN server provides that by telling the client which address its request came from.
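To see what a STUN exchange looks like on the wire, here is a hedged sketch that builds a Binding Request and decodes the address from a Binding Success Response, following the message layout in RFC 5389. The function names are made up for this example, the response is constructed locally instead of coming from a real server, and `203.0.113.7` is a documentation-only address.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442       # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request with no attributes."""
    transaction_id = transaction_id or os.urandom(12)
    # Header: message type, attribute length (0 here), magic cookie, transaction id.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def encode_xor_mapped_address(ip, port):
    """Encode an IPv4 address and port as an XOR-MAPPED-ADDRESS attribute."""
    x_port = port ^ (MAGIC_COOKIE >> 16)
    x_addr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, x_port, x_addr)  # 0x01 = IPv4 family
    return struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value


def parse_binding_response(data):
    """Return (ip, port) from a Binding Success Response, or None if absent."""
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset = 20
    end = 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[offset:offset + 4])
        value = data[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:
            x_port, x_addr = struct.unpack("!HI", value[2:8])
            port = x_port ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", x_addr ^ MAGIC_COOKIE))
            return ip, port
        # Attributes are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    return None


# Simulate the server side locally so the example needs no network access.
request = build_binding_request()
attribute = encode_xor_mapped_address("203.0.113.7", 54321)
header = struct.pack("!HHI", BINDING_SUCCESS, len(attribute), MAGIC_COOKIE)
response = header + request[8:20] + attribute  # echo the transaction id
print(parse_binding_response(response))
```

In a real application we would send `request` over UDP to a STUN server and parse whatever comes back. WebRTC does this for us internally, so we normally only need to give it the server's address.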

If everything is fine, we get the public IP addresses of both clients, and then we connect the clients through WebRTC to start the call. WebRTC handles all the media streaming.

Real-world connectivity is not ideal.

Sometimes we are not able to get a usable public IP address for both clients, for example because of a restrictive NAT or firewall. Then we can't connect peer to peer, and we need the TURN Server.

TURN Server

A TURN (Traversal Using Relays around NAT) server is used to connect both clients if peer to peer fails, by acting as a mediator. Basically, it takes the data from one client and sends it to the other client. So, its job is to relay the media.

This way, the two clients start talking to each other.
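The fallback idea can be sketched in a few lines. This is a toy model under simplifying assumptions: the `Relay` class and `choose_path` function are hypothetical names, and a real WebRTC stack decides between a direct and a relayed path by running connectivity checks, not by a simple "is the address known" test.

```python
class Relay:
    """Toy stand-in for a TURN server: it forwards data between two clients."""

    def __init__(self):
        # Maps a client id to the list of (sender, payload) pairs it received.
        self.delivered = {}

    def forward(self, sender, recipient, payload):
        # Every packet passes through the relay, which costs server bandwidth.
        self.delivered.setdefault(recipient, []).append((sender, payload))


def choose_path(caller_public_addr, callee_public_addr):
    """Pick 'p2p' when both public addresses are known, otherwise 'relay'."""
    if caller_public_addr and callee_public_addr:
        return "p2p"
    return "relay"


relay = Relay()

# Here the callee's public address could not be discovered, so we fall back.
path = choose_path(("203.0.113.7", 54321), None)
if path == "relay":
    relay.forward("alice", "bob", b"audio-frame-1")

print(path, relay.delivered)
```

Because the relay has to carry all the media, it is more expensive to run than a STUN server, which is why it is used only when the direct connection fails.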

Other small pieces of data that are not related to media, such as a client ending the call, setting changes, and messages, are sent over the signaling channel.

A natural question comes to mind.

Why can't WebRTC do signaling?

Answer: To avoid redundancy and to maximize the compatibility with established technologies, the signaling methods and protocols are not specified by the WebRTC Standards.

WebRTC is optimized for media.

So, this is how voice and video calls work.

That's it for now.

Happy Learning :)

Show your love by sharing this blog with your fellow developers.

Amit Shekhar

Also, let’s become friends on Twitter, LinkedIn, GitHub, Quora, and Facebook.