The law of averages

Posted December 11, 2009

X̄ₙ → µ as n → ∞

where X₁, X₂, ... is an infinite sequence of independent and identically-distributed random variables with finite expected value E(X₁) = E(X₂) = ... = µ < ∞.

There's a lot going on here, which probably convinced laypeople to make up their own definitions. My goal is to set the record straight.

First up: the term "independent and identically-distributed random variable" is a long-winded way of explaining, for example, what number a die ends up showing after you roll it. If the die shows 3, that doesn't affect the next roll or the previous roll, and so it's "independent" of them. And die rolls are always "identically-distributed" because you're comparing die rolls to die rolls, not die rolls to coin tosses. "Identically-distributed" is the math cliché equivalent of comparing apples to apples. Finally, each die roll gives a "random variable" because it's a number you can't predict.

Got that? Die rolls are independent and identically-distributed random variables. If you're still shaky on all those words, don't bother squinting: just go along with the die-rolling example and you'll do fine.

Onward: E(X₁) is the "expected value" of roll number one. Since dice fall on any of 1, 2, 3, 4, 5 or 6 with equal chances, the "expected value" is the average, 3.5. Normal people don't "expect" 3.5, but mathematicians do, and they must be right because they taught Vegas how to win mountains of money.
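If you'd rather see where 3.5 comes from than take my word for it, here is a minimal Python sketch. It assumes a fair six-sided die, so every face has a chance of exactly 1/6; the names `faces` and `expected_value` are just labels I picked for the illustration.

```python
from fractions import Fraction

# The six faces of a fair die, each with probability 1/6.
faces = [1, 2, 3, 4, 5, 6]

# Expected value: each face weighted by its probability, then summed.
expected_value = sum(Fraction(face, 6) for face in faces)

print(expected_value)         # exact fraction
print(float(expected_value))  # the same number as a decimal
```

Using exact fractions instead of decimals keeps rounding out of the picture, so the result is the exact average rather than an approximation of it.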

The other bits of the equation should fall into place more easily. "Expected value E(X₁) = E(X₂)". That means the "expected value" of roll number one is the same as the "expected value" of roll number two.

"Well, of course it is," the careful reader observes, "because 3.5 is equal to 3.5, right?"

And the careful reader is right. "E(X₁) = E(X₂) = ..." is redundant, but mathematicians put it in anyway because they want to remind themselves what "independent and identically-distributed" means. They find symbols easier to understand than words.

And "µ"? That's just a made-up shorthand for "E(X₁)", which is 3.5 for us.

Okay. We've figured out "independent and identically-distributed" (die rolls). We've wrapped our heads around "finite expected value µ" (3.5). Now let's work on "X₁". That's what you get when you rummage through your drawer, find a die, and roll it. We expect 3.5, because we think we're smart, but we're wrong: it comes up with a whole number, like 2. We can do the same with X₂, but that might give us 5.

And the "X_n" with a bar above it? That's the average of all the Xs we get: X₁ and X₂ and X₃ and every other X there is. Each X is one roll, and we're going to do as many of them as we possibly can.

Got all that? Then you're done with context. The mind-blowing magic is contained in the equation's arrow. The arrow says, "if you roll enough times, the average result will get pretty darned close to 3.5."
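You can watch the arrow do its thing without touching a real die. The sketch below is only an illustration, not a proof: it assumes Python's pseudo-random generator is a fair stand-in for a die, and `average_of_rolls` and the seed value are names and numbers I made up.

```python
import random

def average_of_rolls(n_rolls, seed=None):
    """Roll a simulated fair die n_rolls times and return the average."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

# More rolls should, typically, pull the average closer to 3.5.
for n in (10, 1_000, 100_000):
    print(n, average_of_rolls(n, seed=2009))
```

With a handful of rolls the average can land almost anywhere between 1 and 6. With a hundred thousand, it would be very surprising to see it stray far from 3.5.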

It's not the same as just adding up all the sides and dividing by six. To prove the Law of Large Numbers, you really have to sit there and roll a die all day. And the next day. And every day after that. It'll take infinite quantities of paper and pencils, causing total deforestation and the end of life as we know it.

But mathematicians found shortcuts. They managed to prove the law in twelve hundred years.

This is the law of averages, in plain English: "If you do something lots of times, you'll find the average result comes close to the theoretical average."

On average, you'll roll a 6 one-sixth of the time. So you can make the law of averages tell you: "If you roll a die a thousand times, you're bound to roll 6, maybe around 167 times."

But you can't make it tell you: "That next roll is bound to come up six." The law doesn't say anything about individual rolls.
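Both halves of that claim can be poked at with a simulation. The sketch below is a rough check under the same assumption as any simulated die, namely that the pseudo-random generator behaves fairly. It tallies the overall share of sixes, and separately the share of sixes on rolls that come right after at least five non-sixes in a row, when a six might feel "due". The function name `six_rates` and the `dry_spell` parameter are my own inventions.

```python
import random

def six_rates(n_rolls, dry_spell=5, seed=None):
    """Return (overall share of sixes, share of sixes right after a dry spell)."""
    rng = random.Random(seed)
    sixes = 0
    after_spell_rolls = 0  # rolls made after at least dry_spell non-sixes in a row
    after_spell_sixes = 0  # how many of those rolls came up six
    since_last_six = 0     # current run of non-six rolls
    for _ in range(n_rolls):
        roll = rng.randint(1, 6)
        if since_last_six >= dry_spell:
            after_spell_rolls += 1
            after_spell_sixes += roll == 6
        if roll == 6:
            sixes += 1
            since_last_six = 0
        else:
            since_last_six += 1
    if after_spell_rolls == 0:
        return sixes / n_rolls, float("nan")  # no dry spell ever happened
    return sixes / n_rolls, after_spell_sixes / after_spell_rolls

overall, after_dry_spell = six_rates(600_000, seed=2009)
print(overall, after_dry_spell)
```

If a six really were "due" after a dry spell, the second number would come out noticeably higher than the first. With a fair die, both should hover around one-sixth.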

That's why mathematicians tried to name it the "law of large numbers" instead of the "law of averages": because the "large numbers" bit is the key to the whole phenomenon. They never said you could predict a single event.

When the top three Google News hits for "law of averages" are a basketball coach calling it the "law of averages" when his players get ten swooshes in a row, a schoolteacher calling it the "law of averages" when her curling team sweeps past rivals, and a public safety communication director using the "law of averages" to explain why there are glitches in radio systems, you can tell there's a rift between their math and real math.

The mistake is that they didn't know the law can only predict large numbers of results, not single ones.

By their logic, the next Google News hit for "law of averages" is bound to apply it correctly. And then, when it does, the next time I touch a piano a stunning rendition of Rachmaninoff's third piano concerto is bound to reverberate from its soundboard.

As a journalist, I sometimes feel the urge to quote people who use the "law of averages" (I want to fit in, after all), but so far I've checked myself. When I cave, my mathematician friends will gnash their teeth in rage and reach for their (perfectly spherical) machetes before calming themselves just enough to write furious blog posts lamenting the usage, which will evade prosecution despite clearly and provably being against the law.