If images are composed of the horizontal and vertical formation of pixels on a screen, what is “sound” as a data type?


I’m not a CS graduate or anything; I’m a self-taught developer. This is something I’ve been wondering about for quite a while now.

I kinda know how images work. Yes, the format specifications might differ: a PNG file and a JPG file differ in how they *store* the image data. But at the end of the day, an image is a horizontal and vertical grid of pixels on a screen.

Yet I do not know what “sound” is as data. Images have a unit, the “pixel”; what is the unit of sound?

Please note that I’m talking about “sound” as data, not as a physical event.


7 Answers

Anonymous

The difference between the two senses determines the way we encode the information digitally: sight works by capturing incoming photons at an instant, while hearing works by noticing pressure changes over time. So you can see ‘white’ as a single static value, but you can’t hear ‘1 sample of volume 255’; a lone sample means nothing until it changes over time.

Thought experiment time!

Imagine you have a screen with a 1×1 resolution. Yes, a single, gigantic pixel. Also, for simplicity, let’s assume that this is a greyscale 8-bit system: 0 is black, and 255 is full white.

We start with a pixel value of 0. Unwavering, frozen in time, a full black screen. This is a single frame.

Now, if I make it vary from 0 to 255 over 500 ms and back to 0 over the next 500 ms, I have one full cycle per second, or 1 hertz (1 Hz). This is the encoded frequency.
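If it helps to see that as data, here’s a rough Python sketch of the thought experiment (my own addition; the 60 Hz refresh rate is just an assumed number): one second of that giant pixel ramping from black to white and back.

```python
# Sketch of the 1x1 "screen" as data: sample the single pixel 60 times
# per second (assumed refresh rate) while it ramps 0 -> 255 -> 0 over
# exactly one second, i.e. a 1 Hz encoded frequency.

REFRESH_RATE = 60            # frames per second (assumed)
frames = []
for n in range(REFRESH_RATE):
    t = n / REFRESH_RATE                      # time in seconds
    if t < 0.5:                               # first 500 ms: 0 -> 255
        value = round(255 * t / 0.5)
    else:                                     # next 500 ms: 255 -> 0
        value = round(255 * (1.0 - t) / 0.5)
    frames.append(value)

print(frames)   # one second of "video" for our one-pixel screen
```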

Now imagine that, instead of a screen, I have a speaker connected to this system. Pixel value 128 is the speaker at rest, 255 pushes the diaphragm fully forward, and 0 pulls it fully backward. (You can see it in action here: https://www.youtube.com/watch?v=Qu5sqpFDYn8).

The musical note middle C has a frequency of about 261 Hz. So if I want to hear that note through our speaker system, I need to make full cycles (from 128 to 255 to 0 and back to 128) about 261 times per second. That can’t be done on a screen limited to, say, 60 Hz, but our audio system can change the value 44,100 times per second (44.1 kHz), so it easily accommodates an encoded frequency of 261 Hz. (Incidentally, 44.1 kHz is the sampling rate of audio CDs: https://en.wikipedia.org/wiki/Sampling_(signal_processing))
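Here’s a rough sketch of that in Python, purely as an illustration (the file name and exact numbers are my own choices): one second of middle C as 44,100 8-bit samples swinging around 128, written out as a mono WAV file with the standard-library `wave` module.

```python
import math
import wave

SAMPLE_RATE = 44_100         # CD-quality sampling rate
FREQ_HZ = 261.63             # middle C

samples = bytearray()
for n in range(SAMPLE_RATE):                  # one second of audio
    t = n / SAMPLE_RATE
    # sine wave swinging the "pixel" around its resting value of 128
    samples.append(int(128 + 127 * math.sin(2 * math.pi * FREQ_HZ * t)))

with wave.open("middle_c.wav", "wb") as wav:  # file name is arbitrary
    wav.setnchannels(1)          # mono: a single "pixel"
    wav.setsampwidth(1)          # 1 byte per sample = 8-bit audio
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(bytes(samples))
```

Play the resulting file and you’ll hear one second of middle C.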

Volume is controlled by limiting how far the ‘pixel’ can swing around the 128 value. Low ‘volume’ means small variations around medium gray (say, 118-138), while full blast makes the pixel travel the full range between 0 and 255.
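Continuing the sketch (again my own illustration, numbers assumed), volume is just the amplitude of that swing:

```python
import math

def tone(freq_hz, amplitude, sample_rate=44_100, seconds=1.0):
    """'amplitude' is how far samples may swing from the resting value 128."""
    return [
        int(128 + amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(int(sample_rate * seconds))
    ]

quiet = tone(261.63, amplitude=10)    # samples stay roughly within 118-138
loud  = tone(261.63, amplitude=127)   # samples sweep nearly the full 0-255
```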

Audio is encoded as a single ‘pixel’ per channel (1 for mono, 2 for stereo, 6 for 5.1, etc.), and songs are the audio equivalent of movies, since they describe how those ‘pixels’ change over time.
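And to close the loop on channels, one last sketch (my own illustration): a stereo file is just two of those ‘pixels’, with their samples interleaved left, right, left, right…

```python
import math
import wave

SAMPLE_RATE = 44_100

def tone(freq_hz):
    """One second of an 8-bit sine tone centred on 128."""
    return [int(128 + 127 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
            for n in range(SAMPLE_RATE)]

left, right = tone(261.63), tone(329.63)   # middle C on the left, E on the right

interleaved = bytearray()
for l, r in zip(left, right):
    interleaved += bytes((l, r))           # one frame = one sample per channel

with wave.open("stereo_demo.wav", "wb") as wav:   # file name is arbitrary
    wav.setnchannels(2)          # stereo: two "pixels"
    wav.setsampwidth(1)          # 8-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(bytes(interleaved))
```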

Phew, that was long. And certainly not for 5yos…
