This isn’t exactly a standalone explanation (someone else already gave a good one), but I wanted to talk about interference in a bit more depth. When the speaker cone moves outward, it increases the pressure of the air in front of it. When it pulls back, it decreases the pressure. That alternating high/low/high/low pressure wave moves out from the speaker in roughly a sphere. After a few feet, though, the sound wave has collided with some of the things in the room and bounced off. This means you may be hearing two different wavefronts hitting your eardrums at the same time without even realizing it. The speed of sound is so fast that even if they don’t arrive at exactly the same time, you won’t be able to tell that there are two voices or anything, but interference can still happen.
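To put a rough number on "so fast", here is a small sketch. It assumes sound travels at about 343 m/s (dry air at around 20 °C), and the room distances are made up purely for illustration.

```python
# Approximate speed of sound in dry air at about 20 °C, in metres per second.
SPEED_OF_SOUND = 343.0

def arrival_delay_ms(direct_path_m: float, reflected_path_m: float) -> float:
    """Extra time (in ms) the reflected wavefront takes to reach the listener."""
    extra_path = reflected_path_m - direct_path_m
    return extra_path / SPEED_OF_SOUND * 1000.0

# Hypothetical room: 3 m straight to the listener, 5 m via a wall bounce.
print(round(arrival_delay_ms(3.0, 5.0), 2))  # 5.83
```

A couple of extra metres of travel only costs a few milliseconds, which is why the reflection blends in with the direct sound instead of sounding like a separate echo.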
So imagine you and the speaker are both centered left-right in the room. As the sound propagates from the speaker, the wavefront hits the left and right walls at the same time. Both reflect, and these new wavefronts reach you at the same time. That means both are high pressure at the same time and both are low pressure at the same time. The high adds to the high and the low adds to the low, making the sound even louder. Not a ton louder, because each bounce off a wall absorbs some of the energy, but some. Now imagine taking a small step left or right. You’re now closer to one wall than the other, so that reflection reaches you slightly sooner in its travel while the other reaches you slightly later. If you moved just the right amount (so that one path is half a wavelength longer than the other), the high from one hits you at the same time as the low from the other. These cancel out, and you have no pressure difference from the ambient air. Stay still and wait a tiny fraction of a second: because both are oscillating, you’ll now get low pressure from the first and high pressure from the other. That’s exactly the opposite of a moment ago, but they still cancel out. You hear no sound (or rather, less sound, because there are still many other reflections going on that can’t all cancel out perfectly; with two speakers and a very well padded room, you might get to a point where it’s inaudible).
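Here is a rough sketch of that two-wall scenario using the mirror-image trick: a reflection off a flat wall behaves like a copy of the speaker mirrored behind that wall. It is a simplification under stated assumptions: the two reflections are treated as equally strong, the direct sound and every other reflection are ignored, and the room size, listener distance and 500 Hz test tone are all hypothetical. The function names are made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, dry air at roughly 20 °C

def wall_reflection_paths(room_width, listener_x, listener_depth):
    """Path lengths of the left- and right-wall reflections (mirror-image model).

    The speaker sits at (0, 0), centred between walls at x = -room_width/2 and
    x = +room_width/2. Each wall reflection acts like a copy of the speaker
    mirrored in that wall, i.e. at x = -room_width or x = +room_width.
    """
    left = math.hypot(listener_x + room_width, listener_depth)
    right = math.hypot(listener_x - room_width, listener_depth)
    return left, right

def combined_amplitude(freq_hz, path_a, path_b, amplitude=1.0):
    """Peak pressure of two equal sine waves that travelled different distances.

    sin(wt) + sin(w(t - dt)) = 2 cos(pi f dt) * sin(w(t - dt/2)),
    so the sum peaks at 2 * amplitude * |cos(pi * f * dt)|.
    """
    dt = (path_b - path_a) / SPEED_OF_SOUND
    return 2 * amplitude * abs(math.cos(math.pi * freq_hz * dt))

def first_cancellation_offset(freq_hz, room_width, listener_depth):
    """Sideways step from centre where the two reflections first cancel.

    Cancellation needs the paths to differ by half a wavelength. The path
    difference grows steadily as you move toward one wall, so bisection works.
    Returns None if you would hit the wall before reaching that point.
    """
    half_wavelength = SPEED_OF_SOUND / freq_hz / 2

    def path_difference(x):
        left, right = wall_reflection_paths(room_width, x, listener_depth)
        return left - right

    lo, hi = 0.0, room_width / 2  # search between the centre and the wall
    if path_difference(hi) < half_wavelength:
        return None
    for _ in range(60):
        mid = (lo + hi) / 2
        if path_difference(mid) < half_wavelength:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical 4 m wide room, listener 3 m in front of the speaker, 500 Hz tone.
left, right = wall_reflection_paths(4.0, 0.0, 3.0)
print(combined_amplitude(500, left, right))  # 2.0: centred, the reflections add up
step = first_cancellation_offset(500, 4.0, 3.0)
print(round(step, 2))  # 0.21: about a 21 cm step sideways gives a null
```

In this toy setup a 20 Hz tone returns None: its half-wavelength is longer than any path difference the room allows, which hints at why very low bass is harder to cancel this way in small rooms.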
If you were to keep moving, you would eventually reach a point where both are high at the same time again, but one is a full cycle behind the other. The lowest frequency humans can usually hear is about 20Hz, and music with deep bass probably rarely goes below 50Hz. At 50Hz, one full cycle lasts 1/50 of a second, so being one cycle out of phase means a 20ms delay (at 20Hz it would be 50ms). Human reaction times are typically in the hundreds of milliseconds, so, like I said, that delay is probably imperceptible.
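The arithmetic behind those delays is just period = 1 / frequency. A minimal sketch, again assuming a speed of sound of about 343 m/s, also shows the wavelength, i.e. how much longer one path has to be for the waves to line up again a full cycle apart:

```python
SPEED_OF_SOUND = 343.0  # m/s, dry air at roughly 20 °C

def period_ms(freq_hz):
    """Length of one full cycle in milliseconds."""
    return 1000.0 / freq_hz

def wavelength_m(freq_hz):
    """Distance the wave travels during one cycle, in metres."""
    return SPEED_OF_SOUND / freq_hz

for f in (20, 50, 100):
    print(f"{f:>3} Hz: one cycle = {period_ms(f):.0f} ms, wavelength = {wavelength_m(f):.2f} m")
#  20 Hz: one cycle = 50 ms, wavelength = 17.15 m
#  50 Hz: one cycle = 20 ms, wavelength = 6.86 m
# 100 Hz: one cycle = 10 ms, wavelength = 3.43 m
```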