Why does making audio tracks shorter make them sound higher-pitched?



Because “pitch” is a measure of the number of vibrations per second in the sound. If you squeeze the same vibrations into less time, you get more vibrations per second, which means a higher pitch (frequency).

Sound comes from air wiggling, and the pitch of the sound is related to the speed; more rapid wiggling with less time between consecutive wiggles is a higher pitch. The simplest method of speeding up an audio recording is to just play it faster, which makes the wiggles happen more rapidly and reduces the time between wiggles, making the pitch rise. Because of how we perceive sound, doubling the speed will make the pitch rise by one octave.
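To put numbers on that, here is a rough Python sketch. It assumes a plain "play it faster" speed-up with no pitch correction, and the function names are made up for illustration. Playing at speed factor s multiplies every frequency by s, and the shift in octaves is log2(s).

```python
import math

def played_back_frequency(original_hz, speed_factor):
    # Plain speed-up: every frequency is multiplied by the speed factor.
    return original_hz * speed_factor

def pitch_shift(speed_factor):
    # Returns (octaves, semitones). One octave is a doubling of frequency,
    # and there are 12 semitones per octave.
    octaves = math.log2(speed_factor)
    return octaves, 12 * octaves

print(played_back_frequency(440.0, 2.0))  # 880.0
print(pitch_shift(2.0))                   # (1.0, 12.0)
```

So a 440 Hz note played at double speed comes out at 880 Hz, which is exactly one octave up.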

As a side note, it is possible to speed up audio without changing the pitch. Try changing the speed of a YouTube video to see that in action. However, it does take a computer doing a bunch of math to make that work.

I assume by shorter, you mean speeding up the track.

You can think of sound as a wave. Imagine you drop a pebble into a still pond, and you see a wave ripple across the top. The wave always travels out from where you drop the pebble at the same speed, but what can change is the distance between each peak of the wave, each ripple in the water. We call this the wavelength.

Sound is similar: it propagates through air at the same speed (with small variations, mainly due to the air's temperature) no matter how loud or how high-pitched it is. Since the speed is fixed, higher-pitched sound has a shorter wavelength.
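If you want to see that relationship as numbers, wavelength is just speed divided by frequency. This is only a sketch: the 343 m/s figure is an assumed value for air at roughly room temperature, and the function name is invented for the example.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: air at roughly 20 degrees C

def wavelength_m(frequency_hz, speed=SPEED_OF_SOUND_M_PER_S):
    # wavelength = speed / frequency
    return speed / frequency_hz

for freq in (220.0, 440.0, 880.0):
    print(f"{freq:.0f} Hz -> {wavelength_m(freq):.3f} m")
```

Each time the frequency doubles, the wavelength halves.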

When you have an audio track, it’s essentially a representation of the waves that make the sound. There are different ways of representing these waves, depending on the format (analog tape, vinyl, CD, mp3, etc.), but the representation gets converted to a wave at the end by your speakers or headphones. The representations you work with in audio editing software actually have the waves more or less drawn out. If you zoom into your editor, you’ll see a bunch of peaks and valleys, corresponding to the sound waves.

So if you speed up the track, what you’re essentially doing is compressing the whole representation so it fits into a shorter time. This means that the distance between each peak gets shorter. The wavelength gets shorter, so the sound gets higher-pitched.
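You can try the squeezing yourself with a toy example. The sketch below builds a one-second 440 Hz tone as a list of samples, "speeds it up" by keeping every other sample, and estimates the pitch by counting how often the wave crosses zero going upward. The sample rate and helper names are assumptions for illustration, and real audio software resamples more carefully than simply dropping samples.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)

def make_tone(freq_hz, seconds):
    # A pure sine wave stored as a list of sample values.
    count = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(count)]

def estimate_frequency(samples):
    # Count upward zero crossings (one per cycle), then divide by the duration.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings / (len(samples) / SAMPLE_RATE)

tone = make_tone(440.0, 1.0)
fast = tone[::2]  # same peaks and valleys, squeezed into half the time

print(len(tone), len(fast))      # 8000 4000
print(estimate_frequency(tone))  # roughly 440
print(estimate_frequency(fast))  # roughly 880
```

The sped-up version has the same number of peaks in half the samples, so when it is played back at the same sample rate the peaks arrive twice as often.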