I watched a show today where they talked about isolating certain sounds (such as a particular speaker) from the background of a noisy restaurant to allow better conversations in a busy environment. I have also seen software programs that can isolate particular instruments/vocals from a song. How does this work? I understand that in a video image you can use techniques like edge detection and the like to define the boundaries of an object and isolate it from the background, but that is because you have a matrix of pixels to compare. With audio, you only have a single waveform that is the combined contribution of all the sounds coming into the ear or that make up the recording. How do you split that back out into its individual components? How do you isolate particular sounds, especially speech or instruments that can span a wide range of frequencies and volumes, from the rest of the noise? I just don’t understand how you undo the summation of the individual waveforms and pull them out from everything else.
Sound travels as a set of vibrating waves. The shape of the wave determines how it sounds. If you measure how many waves arrive each second, which is the frequency, measured in hertz (Hz), you’ll see that certain sounds cluster around certain frequencies. If you take out, for example, all sound at or around x hertz, you may be eliminating low-end booming like a bass drum.
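To make that concrete, here is a minimal Python sketch of removing a band of frequencies from a combined waveform. It assumes NumPy is available, and the two pure tones (a 60 Hz "bass" and a 1000 Hz "voice") and the helper name `remove_band` are made up for illustration. It only works this cleanly because the two sounds sit in completely separate frequency ranges; real speech and instruments overlap, so this is the basic idea rather than how a finished product does it.

```python
import numpy as np

def remove_band(signal, sample_rate, low_hz, high_hz):
    """Zero out every frequency between low_hz and high_hz (hypothetical helper)."""
    # Break the single waveform into its frequency components.
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Silence the components that fall inside the unwanted band.
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[mask] = 0
    # Rebuild a waveform from what is left.
    return np.fft.irfft(spectrum, n=len(signal))

sample_rate = 8000                          # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate    # one second of time stamps

bass = np.sin(2 * np.pi * 60 * t)           # low "booming" tone
voice = 0.5 * np.sin(2 * np.pi * 1000 * t)  # higher tone we want to keep
mixed = bass + voice                        # the single summed waveform

filtered = remove_band(mixed, sample_rate, 0, 200)
```

After this runs, `filtered` is very close to `voice` alone: the summed waveform was split by frequency, the low band was discarded, and the rest was put back together.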
Latest Answers