So I understand the basic principles behind noise cancellation. You essentially use a microphone to record incoming sound waves and create an inverse wave that destructively interferes with the initial wave, thus, cancelling it out. But I don’t understand, practically, how this is done.
Let’s assume the sound wave makes contact with the microphone in the AirPod, which analyses the wave and shoots out an inverse wave, but by that point – the initial sound wave would surely have already reached my ears. The AirPod basically needs to cancel the sound wave before it moves roughly a centimetre or it’s too late.
The speed of sound (in a standard environment like air) is 343 metres per second or 34,300 centimetres per second; this means the AirPod has 1/34,300 seconds or ~0.03 milliseconds to do these operations to cancel the wave. That just seems absurd to me for such a tiny chip in the bloody AirPod.
Someone fix my confusion please.
30 microseconds is not a problem – even 1980s computers could perform several operations (additions/subtractions) in that time.
Modern cheap microcontrollers can perform about 20 operations in a single microsecond (a figure taken from the ATtiny, which is also quite small). So they can do about 600 operations in the required time.
30 microseconds is plenty of time.
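To put rough numbers on that, here is a quick sketch of the arithmetic. The 1 cm distance and 343 m/s come from the question, and the 20 operations per microsecond is the ATtiny-class figure quoted above; all three are ballpark assumptions, not measured AirPod specs.

```python
# Back-of-the-envelope time budget, using the thread's ballpark figures.
SPEED_OF_SOUND_M_S = 343.0   # speed of sound in air, as quoted in the question
MIC_TO_EAR_M = 0.01          # assumed 1 cm between microphone and ear
OPS_PER_MICROSECOND = 20     # assumed ATtiny-class throughput

def time_budget_us(distance_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Time in microseconds for sound to cover distance_m."""
    return distance_m / speed_m_s * 1e6

budget_us = time_budget_us(MIC_TO_EAR_M)
ops = budget_us * OPS_PER_MICROSECOND
print(f"budget: {budget_us:.1f} us, roughly {ops:.0f} operations")
```

That lands just under 30 microseconds and a little under 600 simple operations, which matches the rounded numbers above.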
One way to quickly process the signal is to do an FFT, converting a time domain signal into a frequency domain representation of the signal.
The frequency domain of a consistent signal is pretty much static, meaning you don’t need to act that fast in the time domain to be able to predict and generate the counter signal. You can pretty much just generate it constantly and it always works.
Of course noises have changing frequency domain characteristics, so the analysis is constantly being done and the anti-noise signal is constantly being generated using the latest analysis results.
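A toy sketch of that idea, assuming a single perfectly steady tone: measure its amplitude and phase once with a one-bin Fourier sum, then keep generating the inverse for sound that has not arrived yet. The sample rate, tone frequency and window length are made-up illustration values, and real headphones almost certainly use something more refined than this.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz, assumed for illustration
TONE_HZ = 200        # Hz, hypothetical steady drone
N = 400              # analysis window: exactly 10 periods of the tone

def make_tone(n_samples, amp, phase, start=0):
    """Samples of amp*cos(2*pi*f*t + phase), beginning at sample index `start`."""
    return [amp * math.cos(2 * math.pi * TONE_HZ * (start + n) / SAMPLE_RATE + phase)
            for n in range(n_samples)]

def estimate_tone(samples):
    """One-bin Fourier sum at TONE_HZ; returns (amplitude, phase)."""
    acc = sum(x * cmath.exp(-2j * math.pi * TONE_HZ * n / SAMPLE_RATE)
              for n, x in enumerate(samples))
    coeff = 2 * acc / len(samples)
    return abs(coeff), cmath.phase(coeff)

# "Listen" to the drone for one window and analyse it.
heard = make_tone(N, amp=0.8, phase=0.6)
amp, phase = estimate_tone(heard)

# Use the analysis to build anti-noise for the NEXT window, before it arrives.
future_noise = make_tone(N, 0.8, 0.6, start=N)
anti_noise = [-s for s in make_tone(N, amp, phase, start=N)]
residual = max(abs(a + b) for a, b in zip(future_noise, anti_noise))
print(f"estimated amp={amp:.3f}, phase={phase:.3f}, worst residual={residual:.2e}")
```

The residual is essentially zero here only because the tone never changes. With real noise the estimate goes stale, which is why the analysis has to keep running.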
Something to remember is that when singers perform live, the signal goes from the microphone through a stack of equipment, sometimes even things like pitch correction or auto-tune, and then to the speakers, and they still sound in time with the rest of the band. Sounds take time to happen and are not instantaneous. 100 ms is a short sound.
This is why noise cancellation works better on low-frequency sounds than high-frequency. Typically they work best on frequencies lower than [1000 Hz](https://www.nytimes.com/wirecutter/blog/what-noise-cancelling-headphones-do/), that is, 1000 cycles per second. The electronics must measure the sound and create a cancelling audio waveform much faster than that: say, 10,000 or 50,000 audio measurements per second.
I can’t find technical specs for the microprocessor in AirPods, but it’s quite common for embedded chips like these to have clock speeds of 10 or 50 *million* code operations per second. Which means they have time to do on the order of 1000 code operations per audio measurement.
Which is plenty.
Point is, sound is fast, but computers are faster.
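If you want to check that division yourself, here is a tiny sketch using the same guessed figures (10 or 50 million operations per second, 10,000 or 50,000 audio measurements per second). None of these are actual AirPods specs.

```python
def ops_per_sample(ops_per_second, sample_rate_hz):
    """How many operations the chip can spend on each audio measurement."""
    return ops_per_second / sample_rate_hz

# Guessed chip speeds and sampling rates from the answer above.
for ops_rate in (10e6, 50e6):
    for sample_rate in (10_000, 50_000):
        budget = ops_per_sample(ops_rate, sample_rate)
        print(f"{ops_rate / 1e6:.0f}M ops/s at {sample_rate} samples/s: "
              f"{budget:.0f} ops per sample")
```

Depending on which pair you pick, that works out to somewhere between a couple of hundred and a few thousand operations per measurement, so "on the order of 1000" is a fair summary.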
0.03 ms is 30 us.
Modern small microcontrollers operate at around 50 MHz (some lower, some much higher), which results in clock cycles of 1/50000000 s or 20 ns.
So at least for the digital side, 30 us is quite a lot of time: at 20 ns per cycle, that is about 1,500 clock cycles.
On top of that, you can use DSPs which are very fast and efficient in processing signals, as they are specifically made for that purpose.
It’s entirely possible to process a recorded sound in real time and determine the next amplitude, compensate for distances/phases and so on and control a speaker. You can probably even predict the waveform. And because a headphone is used, the distance from speaker to ear is always the same, making it much easier to optimize the system.
It would be quite possible to experiment with it using some off the shelf electronics. However I think the actual algorithms major brands use are most likely proprietary. You can create a working concept, but it won’t be as refined.
The microphones pick up the incoming sound and the chip runs a process called Fourier analysis to identify the fundamental notes and the *timing* (or *phase*) of those notes (simple sine waves at different frequencies that, when added together, make the final sound).
They then attempt to play the sum of the opposite waves at the right frequency *and* timing (phase) to cancel out that *actual* wave – not the repeated drone of that wave but that actual wave it heard. But this has to be synchronised correctly with the input wave (or it only makes things worse).
It can do this because it processes the signal so fast (and sends what it wants to play back so quickly over copper wires) while sound travels so slowly: electrical signals move at up to roughly 300 million m/s, sound at roughly 300 m/s. So it can identify the wave and start playing the anti-sound at the right time, before the wave passes the point in front of your ear where the speaker is. It’s listening an inch away from your eardrum, and calculating and playing the anti-sound in the time the sound takes to travel maybe half an inch through air.
And it’s doing this continuously, listening to each wave in realtime as it’s actually arriving and actively cancelling each actual constantly varying wave.
Now unfortunately, due to various theoretical limits of the mathematical model used, it needs to listen to the sound for some time (I think it’s for at least 1.5 wavelengths of each fundamental note it can identify). This is how Fourier analysis works. You also need to be sampling the sound at a rate of at least twice the frequency of each note you want to identify through the analysis.
If you have a 100 Hz wave, one full cycle is 10 ms long. So, if you sample the signal at 200 Hz or more, invert it and reproduce it with a lag of less than 1 ms, you can do that. If you have a 1 kHz wave (which is “mid range” in audio), one full cycle is only 1 ms long. You’d have to be sampling the sound at 2 kHz or more, and reproduce the signal in phase in less than 0.1 ms. As the frequency increases, this reaction speed becomes difficult to impossible. And maintaining phase coherence is not that easy at higher frequencies (amplitude difference is not such an issue) when you don’t know the precise distance between the mic and speaker.
So in the time it takes sound to travel that far, between the mic and the speaker, even if it could process the info in absolutely zero time, there’s a limit on the frequencies that can be effectively cancelled – low notes are easy but midrange and higher frequencies much less so.
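Here is a rough sketch of why a fixed lag hurts high notes more. It assumes a pure sine tone and a perfect inverse that simply arrives a bit late; in that idealised case the leftover amplitude is 2·|sin(π·f·lag)| times the original. The 30 microsecond lag is just the figure from the question, used as an assumed uncompensated delay.

```python
import math

def residual_ratio(freq_hz, lag_s):
    """Leftover amplitude relative to the original tone when a perfect
    inverse arrives lag_s seconds late (idealised pure-sine model)."""
    return 2 * abs(math.sin(math.pi * freq_hz * lag_s))

def residual_db(freq_hz, lag_s):
    """Same ratio in decibels; negative means quieter, positive means louder."""
    return 20 * math.log10(residual_ratio(freq_hz, lag_s))

LAG_S = 30e-6  # assumed uncompensated lag of 30 microseconds

for freq in (100, 1000, 5000, 10000):
    print(f"{freq:>6} Hz: {residual_db(freq, LAG_S):+.1f} dB")
```

In this model a 100 Hz tone is knocked down by more than 30 dB, a 1 kHz tone by roughly 15 dB, and by 10 kHz the late anti-sound actually makes things louder, which is the "it only makes things worse" case mentioned above.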