If images are composed of the horizontal and vertical formation of pixels on a screen, what is “sound” as a data type?

I’m not a CS graduate or anything; I’m a self-taught developer. This is something I’ve been wondering about for quite a while now.

I kinda know how images work. Yes, the format specifications might differ: a PNG file and a JPG file differ in how they *store* the image data. However, at the end of the day, images are a horizontal and vertical formation of pixels on a screen.

Yet I do not know what “sound” is as data. Images have a unit, the “pixel” — what is the unit of sound?

Please note that I’m talking about “sound” as data, not as a physical event.

In: Technology

7 Answers

Anonymous 0 Comments

The differences between the two senses determine the way we encode information digitally: Hearing works by noticing pressure differences, while sight works by capturing incoming photons. So you can see ‘white’, but you can’t hear ‘1 Sample of Volume 255’.

Thought experiment time!

Imagine you have a screen with a 1×1 resolution. Yes, a single, gigantic pixel. Also, for simplicity, let’s assume that this is a greyscale 8-bit system: 0 is black, and 255 is full white.

We start with a pixel value of 0. Unwavering, frozen in time, a full black screen. This is a single frame.

Now, if I make it vary from 0 to 255 over 500 ms and back to 0 over another 500 ms, I now have one full cycle per second, or 1 Hertz (1 Hz). This is the encoded frequency.

Now imagine that, instead of a screen, I have a speaker connected to this system. Pixel value 128 is speaker at rest, 255 makes the diaphragm distend fully forward, and 0 fully backward. (You can see it in action here: https://www.youtube.com/watch?v=Qu5sqpFDYn8).

The musical note middle C has a frequency of about 261 Hz. So if I want to hear that note on our speaker system, I need to make full cycles (from 128 to 255 to 0 and back to 128) 261 times per second. This can’t be done on a screen limited to, say, 60 Hz – but our system can change that value 44,100 times per second (44.1 kHz), so it’s easy to accommodate a 261 Hz encoded frequency. (Incidentally, 44.1 kHz is the sampling rate for audio CDs: https://en.wikipedia.org/wiki/Sampling_(signal_processing))

Volume is controlled by limiting how much the ‘pixel’ can vary around the 128 value. Low ‘volume’ would be variances of medium gray (say, 118-138) while full blast would make the pixel travel the full range between 0 and 255.
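
If you want to try this at home, here’s a rough Python sketch of exactly that idea (the numbers and file name are just examples): it generates one second of a roughly-middle-C tone as unsigned 8-bit samples centred on 128, at 44.1 kHz, and writes it out as a WAV file.

```python
# Rough sketch (not part of the explanation above): one second of ~middle C
# as unsigned 8-bit samples centred on 128, at the CD rate of 44.1 kHz.
import math
import wave

SAMPLE_RATE = 44100   # samples ('pixel' changes) per second
FREQ = 261.0          # roughly middle C, in Hz
AMPLITUDE = 127       # full blast; use a smaller number for lower 'volume'

samples = bytearray()
for n in range(SAMPLE_RATE):              # one second of audio
    t = n / SAMPLE_RATE                   # time of this sample, in seconds
    value = 128 + AMPLITUDE * math.sin(2 * math.pi * FREQ * t)
    samples.append(int(round(value)))

# "tone.wav" is just an example file name.
with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)           # mono: a single 'pixel'
    f.setsampwidth(1)           # 1 byte = 8 bits per sample
    f.setframerate(SAMPLE_RATE)
    f.writeframes(bytes(samples))
```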

Audio is encoded as a single ‘pixel’ per channel (1 for mono, 2 for stereo, 6 for 5.1, etc.), and songs are the audio equivalent of movies, since they describe how the ‘pixel’ changes over time.

Phew, that was long. And certainly not for 5yos…

Anonymous 0 Comments

Sound waves (and pixels) are human-perceived concepts that computers do not understand. Everything is represented as numbers.

You can define a pixel as a position (x, y) and amounts of red, green, and blue (for example, 0 to 255).

When it comes to sound, you can also store sequence, frequency, and amplitude information digitally. And the unit will be… a simple dot on a wave graph.

Because real life is more detailed than the digital world, when reproducing or editing a sound file, computers can mathematically connect the saved “dots” to recreate the sound wave. So, more dots per second = higher resolution.
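
As a toy illustration of that “connecting the dots”, here’s a little Python sketch using plain linear interpolation (real playback hardware uses fancier reconstruction filters, so treat this as an idealised picture):

```python
# Estimate the wave's value at a fractional position between stored "dots".
def interpolate(samples, position):
    left = int(position)                     # nearest stored dot before
    right = min(left + 1, len(samples) - 1)  # nearest stored dot after
    frac = position - left
    return samples[left] * (1 - frac) + samples[right] * frac

# Four stored "dots" of a wave...
dots = [0.0, 0.7, 1.0, 0.7]
# ...and the value the computer would fill in halfway between the first two.
print(interpolate(dots, 0.5))   # 0.35
```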

Some audio formats store every captured dot losslessly (such as WAV, FLAC, ALAC), and some formats use algorithms to delete or modify information to compress the file and save space, (hopefully) without the difference being detectable by the human ear (such as AAC, MP3).

Anonymous 0 Comments

The basic unit of data in a sound file is “loudness”. An individual sound is a wave, which has two components: amplitude (loudness) and frequency (pitch). The mapping for amplitude is obvious (a bigger wave is louder), and the higher the frequency, the higher the pitch.

More complex sounds like music and speech are made by layering multiple individual sounds on top of each other. Some sound formats, such as MIDI, attempt to recreate this by having a library of individual sounds which can be played at the required pitch at the required time, but they are generally not used for recording.

Most sound recording is done by sampling the loudness of the sound extremely fast – fast enough to be able to recreate the sound when played back at the same speed. If you record at twice the frequency of the highest individual sound, it is mathematically possible to recreate it without any loss of data. So most sound files are just streams of loudnesses at an extremely high frequency.
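
A tiny Python sketch of that “twice the frequency” rule (known as the Nyquist criterion), just to make it concrete:

```python
# To capture a tone without loss you need at least two samples per cycle.
def can_capture(sample_rate_hz, tone_hz):
    return sample_rate_hz >= 2 * tone_hz

print(can_capture(44_100, 20_000))   # True  - around the highest pitch humans hear
print(can_capture(44_100, 30_000))   # False - above half the sample rate
```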

Anonymous 0 Comments

There are many different formats of sound, such as WAV and MP3, but if I’m right, the data is stored in bytes.

Anonymous 0 Comments

So when we’re talking about digital audio, we still have to talk about sound as a physical event, because we need to know how the mechanism that translates a pressure wave in the air into digital data works.

To record audio, a microphone (I won’t go into the details of how a microphone works because it isn’t relevant here) converts the pressure waves that are sound into fluctuations in electrical current. This fluctuating current is measured by a device called an analog-to-digital converter, which samples the current many times a second (the most common standard is 44.1 kHz, or 44,100 times per second) and creates a number representing that electrical signal (which itself represents the frequency and amplitude of the sound wave) which can be read by a program. These numbers are ultimately stored in a file (like all things on computers, they are ultimately stored as binary, but the file itself can use all available digits), so you might have a single sample that says “234325601” or something like that.
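
If you want to actually see those numbers, here’s a rough Python sketch (it assumes a 16-bit mono WAV file, and “recording.wav” is just a made-up example name):

```python
import struct
import wave

# Assumes a 16-bit mono WAV file; "recording.wav" is a hypothetical example.
with wave.open("recording.wav", "rb") as f:
    rate = f.getframerate()              # e.g. 44100 samples per second
    total = f.getnframes()               # total number of samples in the file
    raw = f.readframes(10)               # the first ten samples, as raw bytes
    values = struct.unpack("<10h", raw)  # ten signed 16-bit integers

print(f"{rate} samples per second, {total} samples in total")
print("first ten sample values:", values)
```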

To play back the sound out of speakers, the process works in reverse. A program reads those numbers off of wherever the file was stored and tells a device called a digital-to-analog converter to generate a strong or weak current on a wire depending on the number the program tells it to generate. Speakers then convert that electrical current into sound.

To get to the core of your question, you can say the “unit” of audio is the number that represents one sample, so if the sample rate is 44.1 kHz, there are 44,100 numbers, and thus 44,100 “units” of digital data per second of audio, ignoring compression.

Anonymous 0 Comments

The data represents the frequency (tone) and amplitude (volume) of a wave at any specific moment in time. This can be resolved as a sequence of simple voltage levels at a known, repeating rate.

The output system converts those voltages into a representation of the original input wave, and feeds it to a speaker, which oscillates according to the input wave, producing actual sound.

So the sample rate of audio is how many voltage levels per second are represented, and the encoding defines how many bits represent each momentary voltage. That then affects the quality of the reproduced audio.
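
As a rough sketch of what “how many bits per momentary voltage” means (purely illustrative):

```python
# Quantise a voltage between -1.0 and 1.0 into one of 2**bits levels.
def quantise(value, bits):
    levels = 2 ** bits                 # e.g. 65536 levels for 16-bit audio
    step = 2.0 / levels                # width of one level across -1..1
    index = int((value + 1.0) / step)
    return min(index, levels - 1)      # clamp the very top edge

print(quantise(0.5, 8))    # 192   - coarse, only 256 possible levels
print(quantise(0.5, 16))   # 49152 - much finer, 65536 possible levels
```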

Anonymous 0 Comments

An audio sample point is pretty much analogous to an image pixel.

Audio is a wave. The data stored is the position of the wave at that instant.

With something like Audacity you can zoom in close enough to see these individual data points.
Like this: https://i.imgur.com/6K494Jk.png
Each point is one sample. Usually there are 44100 samples in one second of audio (44.1 kHz sample rate).
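
To put that in perspective, here’s some back-of-the-envelope arithmetic (assuming a three-minute stereo track):

```python
SAMPLE_RATE = 44_100       # samples per second, per channel
song_seconds = 3 * 60      # a three-minute track
channels = 2               # stereo

total_samples = SAMPLE_RATE * song_seconds * channels
print(total_samples)       # 15876000 individual data points
```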