What's the Process Before My Voice Is Sent to a Cell Tower in a Phone Call?


So like, when I speak into my phone while on a call, what's the process before it gets sent out to the other person? Like my voice has to be turned into a digital signal somehow? And how does the software understand any of that?

All modern phones are completely digital these days. Everything is transmitted as 1s and 0s. When you speak into a cellphone, a tiny microphone in the handset converts the up-and-down pressure changes of your voice into a corresponding up-and-down pattern of electrical signals (think of ripping a vinyl record to an MP3). A microchip inside the phone turns these signals into strings of numbers. The numbers are encoded onto a radio wave (like your car radio signal, but digital and encrypted) and beamed out from the phone’s antenna. The radio wave races through the air at the speed of light until it reaches the nearest cellphone tower, where it is decoded back into numbers and passed along the network toward the person on the other end of the call.

An analog signal can be turned into digital information. We should all be pretty familiar with that at this point. This happens in CDs and DVDs, for example. So I assume that’s not exactly what you’re asking.

Interestingly, many cell phone standards specify that your voice be highly compressed by means of a *vocoder* before being sent over the radio link. This is a special processor that models the human vocal tract and extracts a series of symbols from the speech being spoken by the user. These symbols are then sent over the radio link as digital information and the process is reversed at the other end. The advantage of this arrangement is that a human voice can be reduced to a few thousand bits per second, a very low data rate compared to the 64,000 bits per second or more you get from just digitizing the analog signal. This is the reason that music and other non-speech sounds come out so horrible over a cell phone.
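
For anyone curious what "modeling the vocal tract" can look like, here is a minimal sketch of linear predictive coding (LPC), one classic technique behind vocoders. It is not what any particular phone standard does. The frame length (160 samples, i.e. 20 ms at 8000 Hz), the filter order of 10, the synthetic test signal, and the function names are all assumptions for illustration.

```python
import math
import random

def autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    return [sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Find predictor coefficients a[1..order] so that
    x[n] is approximated by sum(a[k] * x[n-k]).
    Returns (coefficients, remaining prediction error energy)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

# Synthetic stand-in for 20 ms of voiced speech (assumed, not real audio).
rng = random.Random(0)
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) + rng.uniform(-0.1, 0.1)
         for n in range(160)]

ORDER = 10
r = autocorrelation(frame, ORDER)
coeffs, residual = levinson_durbin(r, ORDER)

# 160 raw samples are summarized by 10 filter coefficients.
print(len(frame), "samples ->", len(coeffs), "coefficients")
```

A real vocoder would also quantize these coefficients and send pitch and excitation information alongside them. The sketch only shows the "describe the filter instead of shipping every sample" step.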

Fun fact: Vocoder designs get tuned and tested against many different human languages, since the speech sounds differ between them! Also, there is an annual competition for people who design these things to see who is doing the best job of compressing and rebuilding voice. It’s a whole technical field in its own right.

When the sound is recorded on the cellphone as part of a video, is it processed differently from the voice on a call?

Depends what technology you’re using for the small steps, but in the big picture, your voice is an analog signal that gets turned into a digitized PCM stream. It is first downsampled to 8000 Hz (landline quality, though HD Voice on VoLTE samples at 16000 Hz, and newer codecs go up to 32000 or 48000 Hz, closer to music quality), then fed into a codec that compresses it at a ratio of roughly 15:1 to 30:1 (for comparison, good-quality streaming music is compressed around 10:1 and sampled at 44100 Hz). Basically, the codec takes the audio stream, removes extraneous and unnecessary noise/signals, and produces a much smaller data stream to send to the tower.
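
To put rough numbers on those ratios, here is a quick back-of-the-envelope sketch. The 16 bits per sample is an assumption (the post above doesn't say), and the function names are made up for this example.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def compressed_bitrate(raw_bps, ratio):
    """Bit rate after compressing by ratio:1."""
    return raw_bps / ratio

# Phone call: 8000 Hz mono, assumed 16 bits per sample.
voice_raw = pcm_bitrate(8000, 16)           # 128000 bits per second
for ratio in (15, 30):
    kbps = compressed_bitrate(voice_raw, ratio) / 1000
    print(f"voice at {ratio}:1 is about {kbps:.1f} kbit/s")

# CD-style music: 44100 Hz stereo, 16 bits per sample, compressed 10:1.
music_raw = pcm_bitrate(44100, 16, channels=2)
print(f"music at 10:1 is about {compressed_bitrate(music_raw, 10) / 1000:.0f} kbit/s")
```

So under these assumptions a call needs only a few kilobits per second over the air, while decent streaming music needs well over a hundred.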

Once it is transmitted to the tower, the digitized stream traverses the carrier’s network until it reaches the Public Switched Telephone Network or another carrier’s gateway. It is routed across that destination network, decoded, and then re-encoded with that network’s technology to be transmitted to the other party’s receiver.

Edit: forgot to say the call is of course encrypted on its way to the tower

A microphone vibrates and moves electrons in sync with the vibrations, which is measured by a computer chip called an analog-to-digital converter. The digital data is fed to the phone system processor/modem (separate from the processor that runs apps!) and massaged into the right format to send over the air. The phone waits until it’s its turn to speak on the air and then sends the data.
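
The "waits until it's its turn" part can be pictured with a toy time-division scheme, where airtime is cut into repeating slots and each phone only transmits in its own. Real networks schedule things in far more elaborate ways; the slot count of 8 and the `Phone` class and `run` function here are invented for this sketch.

```python
from collections import deque

SLOTS_PER_FRAME = 8  # assumed for illustration

class Phone:
    """A phone that buffers packets and sends one only in its assigned slot."""
    def __init__(self, name, slot):
        self.name = name
        self.slot = slot
        self.queue = deque()

    def enqueue(self, packet):
        self.queue.append(packet)

    def on_slot(self, slot_index):
        """Return the next packet if this is our slot, otherwise None."""
        if slot_index % SLOTS_PER_FRAME == self.slot and self.queue:
            return self.queue.popleft()
        return None

def run(phones, total_slots):
    """Step through time slots and log (slot, phone, packet) transmissions."""
    log = []
    for t in range(total_slots):
        for phone in phones:
            packet = phone.on_slot(t)
            if packet is not None:
                log.append((t, phone.name, packet))
    return log

alice = Phone("alice", slot=2)
bob = Phone("bob", slot=5)
alice.enqueue("a0")
alice.enqueue("a1")
bob.enqueue("b0")
for entry in run([alice, bob], total_slots=16):
    print(entry)
```

Each phone stays quiet outside its slot, so two phones never talk over each other even though they share the same channel.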

Digital audio is just a measurement of where the microphone is, repeated thousands of times a second. Then we play it back on a speaker, making the same vibration pattern. Sound is air vibrations; air vibrates microphones and speakers vibrate air.
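
As a rough sketch of "a measurement repeated thousands of times a second", the snippet below samples a pure tone and rounds each measurement to an integer, which is roughly what an analog-to-digital converter hands to the rest of the phone. The 8000 Hz rate comes from the discussion above; the 16-bit depth, the 440 Hz tone, and the function names are assumptions for illustration.

```python
import math

SAMPLE_RATE = 8000   # measurements per second
BITS = 16            # bits per measurement (assumed)

def sample_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Measure a sine tone at evenly spaced instants (values in -1..1)."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def quantize(samples, bits=BITS):
    """Round each measurement to the nearest integer level."""
    max_level = 2 ** (bits - 1) - 1
    return [round(s * max_level) for s in samples]

# 10 ms of a 440 Hz tone becomes a short list of integers.
pcm = quantize(sample_tone(440.0, 0.01))
print(len(pcm), "samples, first few:", pcm[:5])
```

Playing it back is the reverse: feed those integers to a digital-to-analog converter at the same rate and the speaker traces out the same vibration pattern.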