ELI5: The sum rule for probability


The sum rule: the probability of an event is the sum of all the joint probabilities with another event: P(X = x) = Σ_{y ∈ Ω} P(X = x, Y = y)

Is this the same as the Law of Total Probability?



First, you should be careful with your terms. “Event” is not interchangeable with “random variable”, and your statement is about random variables: an event either happens or doesn’t, while a random variable takes on a numerical value. An event is a *set* of possible values of your random variables.

That said, there are two ways to approach this.


One way is to think in terms of the *joint* probability distribution – that is, think about every possible *pair* of values X and Y could take on, and ask what the probability of that *pair* of values is. For example, if X and Y are the results of rolling two fair dice independently, there are 36 ordered pairs starting with (1,1), (1,2), (1,3),… and ending with …(6,4), (6,5), (6,6). Each of those 36 pairs has some probability – in the case of independent fair dice, they all happen to be equal, 1/36 each – and those probabilities are a complete description of the entire system.
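If it helps to see the joint distribution spelled out, here’s a minimal sketch for the two-dice case, assuming the dice are fair and rolled independently. The name `joint` is just an illustrative choice; it maps each ordered pair (x, y) to its probability.

```python
from fractions import Fraction
from itertools import product

# Assumed example: two independent fair dice, so every ordered
# pair (x, y) has probability 1/6 * 1/6 = 1/36.
faces = range(1, 7)
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

print(len(joint))           # 36 ordered pairs
print(joint[(1, 1)])        # 1/36
print(sum(joint.values()))  # 1, since the pairs describe every possible outcome
```

Using `Fraction` instead of floats keeps the probabilities exact, so the total comes out to exactly 1.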

But you’re often interested in a *marginal* probability – that is, the distribution of X on its own, without reference to Y. In other words, we want to take all of those ordered pairs and group them by their X values, or, a bit more formally, we want to say P(X = x) = P(any of the pairs for which X = x and Y = anything).

Since exactly one of the ordered pairs happens on any given roll, the pairs are mutually exclusive: if one of them happens, none of the others do. And in general, the probability that any one of several mutually exclusive events occurs is just the sum of their individual probabilities.

So P(X = x), our marginal, equals P(any of the pairs with X = x and Y = anything) by definition, and this in turn equals sum(P(X = x and Y = y)) taken over every possible value y, because we’re adding up the probabilities of mutually exclusive pairs.
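Here’s a rough sketch of that “group by X and add up” step. To make it a little less trivial than two identical dice, it swaps in a different (hypothetical) pair of variables: X is the first die and Y is the total of both dice, so the pairs aren’t a neat 6×6 grid. The helper names `marginal_x` and `marginal_y` are made up for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical example: X is the first die, Y is the total of both dice.
# Each of the 36 equally likely rolls (a, b) adds 1/36 to the pair (a, a + b).
joint = defaultdict(Fraction)  # Fraction() is 0, so missing pairs start at 0
for a, b in product(range(1, 7), repeat=2):
    joint[(a, a + b)] += Fraction(1, 36)

def marginal_x(joint, x):
    """Sum rule: P(X = x) = sum over y of P(X = x, Y = y).

    Pairs with the same x but different y are mutually exclusive,
    so their probabilities simply add.
    """
    return sum((p for (xi, _), p in joint.items() if xi == x), Fraction(0))

def marginal_y(joint, y):
    """The same rule with the roles swapped: sum over x instead."""
    return sum((p for (_, yi), p in joint.items() if yi == y), Fraction(0))

print(marginal_x(joint, 3))  # 1/6: the six pairs (3, 4) ... (3, 9), each 1/36
print(marginal_y(joint, 7))  # 1/6
print(marginal_y(joint, 2))  # 1/36: only the pair (1, 2)
```

The same joint table gives you either marginal; you just choose which coordinate to group by before summing.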


The other way to think about it is that, yes, it’s just the law of total probability.

Define the event A to be the set of all ordered pairs where X = x. Then by the law of total probability:

P(X = x) = P(A) = sum(P(A intersect B_y)) for any collection of events B_y that partitions the sample space, meaning the B_y are pairwise disjoint and together cover every possible outcome. Then define your events B_y to be the sets of ordered pairs where Y = y; since Y takes exactly one value on every outcome, these form a partition. Intuitively, you can think of the event A as identifying a particular “row” of a table of ordered pairs, while the events B_y identify “columns”. Then:

P(X = x) = P(A) = sum(P(A intersect B_y)) = sum(P(X = x, Y = y)).
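To make the event version concrete, here’s a small sketch that treats events literally as sets of ordered pairs, assuming the two independent fair dice from earlier (so every outcome is equally likely). The names `omega`, `A`, `B` and `prob` are illustrative, not standard.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed sample space: all ordered pairs from two independent fair dice.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

x = 4
A = {w for w in omega if w[0] == x}                            # the "row" X = x
B = {y: {w for w in omega if w[1] == y} for y in range(1, 7)}  # the "columns" Y = y

# The B_y must form a partition: pairwise disjoint and covering omega.
assert all(B[i].isdisjoint(B[j]) for i, j in combinations(B, 2))
assert set().union(*B.values()) == omega

# Law of total probability: P(A) = sum over y of P(A ∩ B_y).
total = sum(prob(A & B[y]) for y in B)
print(prob(A), total)  # 1/6 1/6
```

The two `assert` lines are the partition condition; if the B_y overlapped or missed some outcomes, the sum could double-count or undercount.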

and you’re done. (The continuous case requires a little more formality, with the sum over y replaced by an integral of the joint density over y, but it works out the same way.)
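For a feel of the continuous version, here’s a rough numerical sketch using a made-up joint density f(x, y) = x + y on the unit square (it integrates to 1, so it’s a valid density). The marginal density is f_X(x) = ∫ f(x, y) dy, which for this example works out to x + 1/2; the code approximates the integral with a simple midpoint rule.

```python
# Hypothetical joint density on the unit square: f(x, y) = x + y for 0 <= x, y <= 1.
def joint_density(x, y):
    return x + y

def marginal_density(x, n=1000):
    """Continuous sum rule: f_X(x) = integral over y of f(x, y), via the midpoint rule."""
    h = 1.0 / n
    return sum(joint_density(x, (k + 0.5) * h) for k in range(n)) * h

for x in (0.0, 0.25, 1.0):
    # Integrating x + y over y in [0, 1] gives x + 1/2 exactly.
    print(x, marginal_density(x), x + 0.5)
```

The sum inside `marginal_density` is the discrete sum rule again, just over a fine grid of y values weighted by the grid spacing; shrinking the spacing turns it into the integral.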

Intuitively, what you’re saying is “the total value in a row is the sum of all the individual values across that row”, which is very straightforward.