## Wednesday, January 22, 2003

### A Request for Bad Statistics

For an MBA course I’m teaching this semester, I will be giving a lecture on the use and abuse of numbers and statistics, and also a lecture on the use and abuse of tables and figures. I plan to structure both lectures in “greatest misses” format – using examples of people employing numbers and figures in bogus or misleading ways. If you’ve read Darrell Huff’s book _How to Lie With Statistics_, you know the kind of thing I’m talking about; unfortunately, Huff’s examples are rather dated, and I’m looking for more recent ones. I have lots of old examples in my head, but I really prefer documented cases – “I saw an article one time that said…” isn’t good enough.

So, I’m hoping to solicit my readers’ assistance. If you know of any good examples of numbers/stats/tables/figures being used badly, please send me an email. I need full references or webpage locations if at all possible. I’m looking for examples in various categories, including:

• Confusion of mean and median (or mean and mode, or mode and median). For example, it’s often observed that the average (i.e., mean) time it takes a woman to conceive a child is several months, but the most common time (i.e., the mode) is much shorter – within a couple of months. The average is higher because a small number of women take a really long time to conceive.
• Confusion of absolute numbers with changes in numbers, or confusion of changes in numbers with percent changes in numbers. For example, during the 2000 election season I recall hearing that Houston’s air quality had actually gotten worse than L.A.’s – but it turned out that while Houston’s had gotten worse over time (compared to Houston before) and L.A.’s had gotten better (compared to L.A. before), L.A.’s air quality was still worse than Houston’s.
• Providing figures that are meaningless without a figure for comparison. For example, saying that some very high percentage (say, X%) of all traffic fatalities involve people not wearing seatbelts, without telling us what percentage of *all* drivers do not wear seatbelts. (If that figure were also X%, we might conclude that seatbelts have no effect!)
• Creating a pie chart that leaves out the “other” category. For example, showing a pie chart with market shares for only the top four firms in an industry, even though together they account for only part of the market.
• Creating a line graph that does not show a trend over time, but instead connects unlike things. (“What’s this big jump in the line show? Oh, it shows the jump in GDP you get when you go from Mexico to Japan.”)
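To make the first and third examples concrete, here is a minimal Python sketch using made-up numbers (not real fertility or traffic data). The helpers `summarize` and `unbelted_relative_risk` are hypothetical names for illustration. The first shows how a few long waits pull the mean well above the median and mode. The second assumes the only difference between the two groups is belt use and compares deaths per unbelted person with deaths per belted person. The same 60% headline figure means nothing until you know what share of people go unbelted.

```python
from statistics import mean, median, mode

# Hypothetical data, invented for illustration only: months taken to
# conceive for ten women. Most conceive quickly; a few take much longer.
months_to_conceive = [1, 1, 1, 2, 2, 3, 4, 6, 12, 24]


def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(values), median(values), mode(values)


def unbelted_relative_risk(share_of_deaths_unbelted, share_of_people_unbelted):
    """Crude fatality risk of unbelted people relative to belted people.

    Deaths per unbelted person are proportional to
    share_of_deaths_unbelted / share_of_people_unbelted; deaths per belted
    person are proportional to the complements of both shares. A ratio of
    1.0 means the headline share, by itself, shows no seatbelt effect.
    (Assumes belt use is the only difference between the groups.)
    """
    unbelted = share_of_deaths_unbelted / share_of_people_unbelted
    belted = (1 - share_of_deaths_unbelted) / (1 - share_of_people_unbelted)
    return unbelted / belted


if __name__ == "__main__":
    avg, mid, most_common = summarize(months_to_conceive)
    print(f"mean={avg}, median={mid}, mode={most_common}")
    # mean=5.6, median=2.5, mode=1

    # Same hypothetical 60% headline figure, two different base rates:
    print(unbelted_relative_risk(0.60, 0.60))  # 1.0: no apparent effect
    print(unbelted_relative_risk(0.60, 0.25))  # roughly 4.5
```

The point of the second function is that “60% of fatalities were unbelted” only becomes informative once it is paired with the share of all people who go unbelted.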

This is not a comprehensive list, of course. I’d appreciate any ideas or contributions you have.