26 April 2010


Two more arguments for learning statistics

One of my repeated themes here over the years is how genuinely lousy the human brain is at intuitively understanding probability and statistics. Two articles this week had me thinking about it again.

The first was Clive Thompson's latest opinion piece in Wired, "Why We Should Learn the Language of Data," where he argues for significantly more education about stats and probability in school, and in general, because:

If you don't understand statistics, you don't know what's going on—and you can't tell when you're being lied to.

Climate change? The changing state of the economy? Vaccination? Political polls? Gambling? Disease? Making decisions about any of them requires some understanding of how likelihoods and big groups of numbers interact in the world. "Statistics," Thompson writes, "is the new grammar."

The second article explains a key example. At the NPR Planet Money blog (incidentally, the Planet Money podcast is endlessly fascinating, the only one clever enough to get me interested in listening to business stories several times a week), Jacob Goldstein describes why people place bad bets on horse races.

After exhaustive statistical analyses (alas, this stuff isn't easy), economists Erik Snowberg and Justin Wolfers have figured out that even regular bettors at the track simply misperceive how bad their bets are, especially when wagering on long shots—those outcomes that are particularly unlikely, but pay off big if you win, because:

...people overestimate the probability of very rare events. "We're dreadful at perceiving the difference between a tiny probability and a small probability."
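To make that concrete, here is a minimal sketch of how a small misjudgment turns a bad bet into one that looks good. Every number in it is invented for illustration (the 100-to-1 payout, the "true" 1-in-200 chance, the gut-feeling 1-in-50 chance); none comes from the Snowberg and Wolfers analysis, and `expected_return` is just a helper name I made up.

```python
def expected_return(stake, payout_odds, win_probability):
    """Average profit per bet.

    A win pays payout_odds * stake in profit; a loss costs the stake.
    """
    win_part = win_probability * payout_odds * stake
    loss_part = (1 - win_probability) * stake
    return win_part - loss_part


# All values below are hypothetical, chosen only to illustrate the effect.
stake = 1.0          # one dollar on the long shot
payout_odds = 100    # the track pays 100-to-1
true_p = 0.005       # assumed real chance of winning: 1 in 200
perceived_p = 0.02   # assumed gut feeling: 1 in 50

perceived = expected_return(stake, payout_odds, perceived_p)
actual = expected_return(stake, payout_odds, true_p)

print(f"What the bet feels like it is worth: {perceived:+.3f} per dollar")
print(f"What the bet is actually worth:      {actual:+.3f} per dollar")
```

With these made-up figures, the bet feels like it earns about a dollar for every dollar wagered, while it actually loses about fifty cents. Both probabilities are "tiny," which is exactly why the gap between them is so hard to feel.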

In our heads, extremely unlikely things (being in a commercial jet crash, for instance) seem just as probable as, or even more probable than, simply somewhat unlikely things (being in a car crash on the way to the airport). That leads us to make funny decisions. For instance, on occasion couples (parents of young children, perhaps) choose to fly on separate planes so that, in the rare event that a plane crashes, one of them survives. But they both ride in the same car to the airport (and during much of the rest of their lives), which is far, far more likely to kill them both. (Though still not all that likely.)

Unfortunately, so much of probability is counterintuitive that I'm not sure how well we can educate ourselves about it for regular day-to-day decision-making. Even bringing along our iPhones, I don't think we should be using them to make statistical calculations before every outing or every meal. Besides, we could be so distracted by the little screens that we step out into traffic without noticing.

Our minds are required to be good at filtering out irrelevancies, so we're not overwhelmed by everything going on around us. But the modern world has changed what's relevant, both to our daily lives and to our long-term interests. The same big brains that helped us make it that way now oblige us to think more carefully about what we do, and why we do it.



you're pretty smart
Interesting article. I know quite a few companies that split their management team up when they go on a retreat, just in case a plane were to crash.

I think the quality of data is always bound to the ability to measure that data accurately. Yes, on paper, many events seem completely improbable, but we know that those events seem to happen anyways. A good example is the subprime mortgage crisis, which on paper, probably shouldn't have happened due to the low probability of all those mortgages defaulting at the same time. Most statistics measure the likelihood of independent events occurring, when most events are dependent on each other in the real world.
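The point about independent versus dependent events can be sketched in a few lines. This is a toy model, not a description of how any real mortgage security was priced: the pool size, the default rates, and the chance of a "housing downturn" are all invented, and `prob_at_least` is a hypothetical helper name.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n independent events occur), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# All values below are hypothetical, chosen only to illustrate the effect.
n = 100        # mortgages in the pool
k = 20         # defaults needed to count as a disaster
q = 0.05       # assumed chance of a shared housing downturn
p_good = 0.02  # assumed default rate in normal times
p_bad = 0.40   # assumed default rate during a downturn

# Average default rate per mortgage, identical in both models below.
p_marginal = q * p_bad + (1 - q) * p_good

# Model 1: treat every mortgage as independent at the average rate.
independent = prob_at_least(k, n, p_marginal)

# Model 2: mortgages share a common cause, so defaults cluster in a downturn.
dependent = q * prob_at_least(k, n, p_bad) + (1 - q) * prob_at_least(k, n, p_good)

print(f"Average default rate per mortgage: {p_marginal:.3f}")
print(f"P(disaster), assuming independence: {independent:.2e}")
print(f"P(disaster), with a shared cause:   {dependent:.2e}")
```

Each individual mortgage looks equally risky in both models. Only the second one admits that the mortgages can fail together, and under these assumptions that moves the disaster from something that should essentially never happen to something close to the chance of the downturn itself.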
I think the subprime mortgage collapse was more a matter of the statistics being divorced from reality. Stats and probability don't help if they're sucking in garbage data.

Predictions and pricing on the securities that were assembled from those mortgages were based on the way mortgages used to work, i.e. there were criteria that required you be able to pay your mortgage before you got one. So default rates were low then.

When mortgages started being sold in large numbers to people who should not have qualified for them, then the risk assessments should have changed, but they didn't.

If instead of being based on default statistics for previous bundles of mortgages, the analyses had looked at, say, the average income and job stability of the mortgage holders compared to the prices of the houses they were borrowing to buy—the actual mortgage-holders in the securities, not their predecessors buying houses in the '70s or '80s or '90s—then the problem might have been obvious. "Oh, in this investment, most of the people borrowing money probably won't be able to pay it back when their rates change or if housing prices flatten."

That's the "if you're being lied to" part of the Wired article. If journalists and politicians and investors and the general public had been able to look at the AAA bond ratings for the subprime mortgage securities and say, "Those are based on home buyers from the past, not from the present," and then asked some questions about the statistics of the real buyers in the present, maybe a potential collapse would have seemed more predictable.

There were a lot of mathematical whizzes working on creating and trading those collateralized debt obligations and credit default swaps, but a lot of their algorithms were working on vapour, not real information. Maybe on purpose.