Voice is sound, and sound is a wave of air pressure that changes over time. That means you can approximately represent a sound digitally by measuring its pressure at regular, very short time intervals.
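To make "measuring at regular intervals" concrete, here is a minimal sketch. It assumes a pure 440 Hz sine wave as a stand-in for a real voice, and assumes each measurement is stored as a signed 16-bit integer. Both choices are illustrative only.

```python
import math

SAMPLE_RATE = 48_000  # measurements per second (48 kHz, as in the text)
FREQUENCY = 440.0     # hypothetical test tone in Hz, standing in for a voice

def sample_tone(duration_s: float) -> list[int]:
    """Measure a sine-shaped pressure wave at regular time intervals and
    store each measurement as a signed 16-bit integer (assumed format)."""
    n = int(duration_s * SAMPLE_RATE)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE                               # time of this measurement
        pressure = math.sin(2 * math.pi * FREQUENCY * t)  # value between -1 and 1
        samples.append(round(pressure * 32767))           # scale to the 16-bit range
    return samples

one_second = sample_tone(1.0)
print(len(one_second))  # 48000
```

One second of sound becomes a list of 48000 numbers. A real microphone measures an actual pressure wave instead of computing a sine, but the resulting list looks the same.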
So now you have the sound as data, but represented as numbers (and those numbers are themselves stored as ones and zeroes). There are a lot of numbers: 48,000 every second, assuming 48 kHz audio. But no worries, internet bandwidth is more than enough for this amount of data.
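To see why bandwidth isn't a problem, here is some rough arithmetic. It assumes 16 bits per sample, a single (mono) channel, and no compression. These assumptions are not from the original answer.

```python
SAMPLE_RATE = 48_000   # samples per second
BITS_PER_SAMPLE = 16   # assumed sample size
CHANNELS = 1           # assumed mono voice

def bitrate_bps(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second needed to send uncompressed audio."""
    return sample_rate * bits_per_sample * channels

bps = bitrate_bps(SAMPLE_RATE, BITS_PER_SAMPLE, CHANNELS)
print(bps)              # 768000 bits per second
print(bps / 1_000_000)  # 0.768 megabits per second
```

Even uncompressed, that is under one megabit per second. In practice, voice applications usually compress the audio first, which shrinks the data further.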
On the other end, the receiving device converts these numbers back into voltages: higher numbers mean higher voltages, and lower numbers mean lower voltages. These voltages drive a speaker, whose cone moves in proportion to the voltage. The cone in turn pushes the air in proportion, recreating the pressure wave that you hear as sound.
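Here is a minimal sketch of the "numbers back to voltages" step. It assumes a hypothetical digital-to-analog converter that outputs between -1 V and +1 V and maps 16-bit samples onto that range linearly. Real hardware differs in range and detail.

```python
MAX_SAMPLE = 32767   # largest signed 16-bit value
FULL_SCALE_V = 1.0   # hypothetical converter output range: -1 V to +1 V

def sample_to_voltage(sample: int) -> float:
    """Map a 16-bit sample linearly onto the converter's voltage range,
    so higher numbers give higher voltages."""
    return sample / MAX_SAMPLE * FULL_SCALE_V

for s in (-32767, 0, 16384, 32767):
    print(s, round(sample_to_voltage(s), 3))
```

Because the mapping is linear, the voltage follows the original sequence of measurements. That voltage is what moves the speaker cone.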