If you want to train someone or something to act in a particular way, the best thing to do is reward them, right?
The classic example (called an “[operant conditioning chamber](https://en.wikipedia.org/wiki/Operant_conditioning_chamber)”, or a Skinner box after the famous behaviorist B. F. Skinner) is a mouse in a box with a lever hooked up to a treat dispenser; when the mouse pushes the lever, it’s rewarded with a treat. Sometimes the lever is paired with lights that signal when the mouse should push it, and sometimes the setup is dressed up in other ways, but the result is the same: push lever, receive treat.
This is *positive reinforcement* — the desired behavior is trained by providing a reward linked to it.
But this has a catch: if the reward is provided every single time the desired action is taken, the subject learns it can always be obtained, so they’ll perform the action exactly and only when they want the reward, and if the treats ever stop coming, they’ll quickly give up on the lever altogether.
If we instead rig the lever to a little computer chip that dispenses the reward only some of the time, say on a random 40% of presses (that is, *intermittently*), we cement the behavior far more reliably than we do with *continuous* reinforcement, and we drive the subject to hammer at the lever a lot more, because they never know which press will pay off.
This is the essence of *intermittent reinforcement,* and it’s so powerful a conditioning mechanism that it’s what drives gambling in all its forms, along with all manner of other behaviors.
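If it helps to see the two reward schedules side by side, here’s a minimal sketch in Python. It only simulates the dispenser, not the mouse’s behavior; the 40% figure is carried over from the example above, and the function name and press count are made up for illustration:

```python
import random

def press_lever(reward_probability):
    """Simulate one lever press: dispense a treat with the given probability."""
    return random.random() < reward_probability

# Continuous reinforcement: every press pays off.
continuous_treats = sum(press_lever(1.0) for _ in range(100))

# Intermittent reinforcement: only ~40% of presses pay off,
# and the subject can't predict which ones.
intermittent_treats = sum(press_lever(0.4) for _ in range(100))

print(f"Continuous schedule:   {continuous_treats} treats in 100 presses")
print(f"Intermittent schedule: {intermittent_treats} treats in 100 presses")
```

The dispenser is the only part we can write down this simply; the interesting part, the subject pressing more persistently under the unpredictable schedule, is what the conditioning research demonstrates.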