What is the information bottleneck method in deep learning?

I don’t have any background in machine learning or AI so I am struggling to understand the IB method.

In: Technology

Anonymous

Traditional statistics has long sought ways to separate signal from noise. Noise could be measurement error, random chance, or some extraneous factor we can't predict.

The problem with noise is that it compounds as problems get bigger. A common rule of thumb is that about ten variables is the most you can juggle before everything falls apart. Why is this? The signal you are looking for has a fixed size C. Each of your k variables brings its own noise, so you accumulate roughly k times as much noise. Eventually, those k pieces of noise will drown out C.
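One way to make the "noise compounds" claim concrete: a minimal simulation, assuming it means k independent noise terms piling onto one fixed signal. The function name `snr` and all parameters here are illustrative, not from any library.

```python
import random

random.seed(0)  # make the simulation repeatable

def snr(k, n=10_000, signal=1.0, noise_sd=1.0):
    """Estimate the signal-to-noise ratio when one fixed signal
    is observed through the sum of k independent noisy terms."""
    samples = []
    for _ in range(n):
        # fixed signal C plus k independent Gaussian noise draws
        obs = signal + sum(random.gauss(0, noise_sd) for _ in range(k))
        samples.append(obs)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return signal ** 2 / var  # signal power over noise power

# Noise variance grows roughly linearly with k, so SNR falls like 1/k:
print(snr(1))   # roughly 1
print(snr(10))  # roughly 0.1
```

With one variable the signal stands out; with ten, the accumulated noise has already shrunk the ratio by an order of magnitude.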

The bottleneck approach says something like this. I have a picture of a dog. The key concept is "dog." What if I could compress the picture of the dog (say, 100,000 bytes) into three bytes: the word "dog"? In doing so, you "squeeze" out the noise and also "squeeze" out some signal, such as that adorable puppy face, distilling the picture until you are left with only the essential truth: it's a picture of a dog!

This is a proposed principle for how deep learning works.

More specifically, it’s an analysis where you compress information while trying to keep the “essence” of the information the same.
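For anyone who wants the formal version: the information bottleneck of Tishby, Pereira, and Bialek states this trade-off using mutual information. You look for a compressed representation T of the input X that stays as informative as possible about the thing you care about, Y:

```latex
% Information bottleneck objective:
% choose a stochastic encoding p(t|x) of the input X into a
% compressed representation T that minimizes
\min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y)
% I(X;T): how much of the raw input T retains (compression term)
% I(T;Y): how much T still says about the target Y (relevance term)
% beta sets the trade-off between squeezing hard and keeping the "essence"
```

A small beta squeezes aggressively (you might lose the puppy face *and* the dog); a large beta keeps more detail at the cost of less compression.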

For example, a picture of a line could be compressed to the equation y=mx+b. Both are accurate descriptions of the same thing, but the latter takes up much less space.
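The line example can be shown directly: given points sampled from a line, an ordinary least-squares fit recovers just two numbers, m and b, in place of all the raw data. The points below are made up for illustration.

```python
# "Compressing" a picture of a line: instead of storing every point,
# store only the slope m and intercept b of y = mx + b.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # points sampled from y = 2x + 1

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form ordinary least-squares fit
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - m * mean_x

print(m, b)  # → 2.0 1.0
```

Five data points (or 100,000 pixels) collapse into two parameters, and the "essence" of the picture, the line itself, is preserved exactly.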

To sum up: traditional statistics tries to hear every word of the speech despite the loud crowd. The bottleneck approach instead aims to summarize the speech. As it turns out, the summary can stay very accurate despite the crowd. By letting go of the individual words, you still capture the "gist," and the gist is what you can reliably get right.