As far as I understand, it's pretty similar to what you described for images. With an image you have RGB values to represent all your colors. With sound, the counterpart is a mix of various frequencies. And instead of arranging them in 2D as you would in a picture, you arrange them along a single axis: time. That's how I understand it, but I'm by no means an expert in this field.
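To make the idea a bit more concrete, here is a rough sketch in plain Python (standard library only). It builds a short "sound" as a list of samples over time by adding two sine waves, then uses a naive discrete Fourier transform to recover which frequencies are in the mix. The sample rate, sample count, the 50 Hz and 120 Hz tones, and the helper names `make_tone` and `dominant_frequencies` are all made up for illustration; real audio code would normally use a much higher sample rate and an FFT library.

```python
import cmath
import math

SAMPLE_RATE = 800  # samples per second (hypothetical, far below real audio rates)
N_SAMPLES = 80     # 0.1 s of sound, which gives 10 Hz per frequency bin

def make_tone(freqs, n=N_SAMPLES, sample_rate=SAMPLE_RATE):
    """Return n samples over time: the sum of one sine wave per frequency."""
    return [
        sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs)
        for i in range(n)
    ]

def dominant_frequencies(samples, sample_rate=SAMPLE_RATE, count=2):
    """Naive O(n^2) DFT; return the `count` strongest frequencies in Hz."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):  # only bins below the Nyquist frequency
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        magnitudes.append((abs(coeff), k * sample_rate / n))
    strongest = sorted(magnitudes, reverse=True)[:count]
    return sorted(freq for _, freq in strongest)

# The signal is a 1D series in time, yet it "contains" both frequencies.
signal = make_tone([50, 120])
print(dominant_frequencies(signal))  # prints [50.0, 120.0]
```

So where an image stores color values on a 2D grid, the sound here is just one number per time step, and the frequencies only show up once you analyze the series.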