Saturday, October 29, 2011

More about lying with statistics


"What have you got against statistics?" a Facebook friend asked after reading the piece "Lies, damned lies and statistics".  "Nothing at all," I replied.  "I only distrust those who work with statistics."  Then I told her a couple of real-life stories.

One of them will bear repeating.  Thirteen years ago, I was working on a research project which involved a lot of statistical analysis of data.  I consulted two well-known statisticians in Chennai who agreed to do the work for me for a hefty fee.  They said that, in their analysis, they would be using three tests (namely, Kruskal-Wallis, Wilcoxon-Mann-Whitney, and Kolmogorov-Smirnov) which, they claimed, were "spanking new".  A month later, they gave me the results.  When I checked their calculations at random, I found even their arithmetic incorrect.  I had to get the entire data reanalyzed, and it was done with great competence by a young man in Vijayawada.  The young man told me that the three tests used in the analysis were far from new: they had already been in use two decades earlier.  He showed me the 1977 edition of a book called Non-Parametric Statistical Inference by J. D. Gibbons, which discussed the tests.

Lots and lots of such lies pass unnoticed, thanks to what the mathematician John Allen Paulos calls our "mathematical illiteracy".  In his lovely little book Innumeracy, Paulos points out that most people are uncomfortable with basic mathematical principles, which makes them poor judges of the numbers they encounter.  We accept statistics, both good and bad, with reverence because we don't have a good head for figures.

I never had a good head for figures when I was a student.  But a friend of mine whom we called Statistics Srivatsan had a wonderful head.  At the slightest provocation, he would launch into a lengthy statistical explanation.  Once a language teacher asked him why he was often late.  "Often?" Srivatsan asked with a quizzical expression on his face, and continued, "So far in this academic year, you've taken 98 periods, sir.  And I've come to 82 of them on time, which is 8.3 per cent higher than the class average of …"

Once Srivatsan quizzed me with a statistic.  "Do you know, Ramanujam, that, in this world, every ten seconds, a woman gives birth to a child?  What do you think about it?"  "Think about it!" I burst out. "Who is that irresponsible woman? We must tell her to stop it at once."  Srivatsan fixed me with a gaze of cold hardness that would have frozen an Eskimo.

I wonder where Statistics Srivatsan is now.  Wherever he is, he must be a statistician; I can't picture him as anything else.  And he must be lying with statistics as impressively as he did when the teacher asked him why he was often late.

Lies, damned lies and statistics


A story I read in the Lifestyle magazine fascinated me.  A woman was told that she would be granted three wishes but that her husband would get ten times more or better than whatever she wished for.  For the first two wishes, she wanted to become the richest and the most beautiful woman in the world, and both were granted.  Her third wish was malicious: "I'd like a mild heart attack."

Of course, the woman had a mild heart attack, but what would the husband have got for the third wish?  He would certainly have had a heart attack.  But would it have been ten times more severe or ten times milder than the wife's?

That depends on how we interpret the condition in relation to the wish.  And if we are required to generalize the "sample" (one woman) to the "population" (all women), what conclusion can we come to about women in general?  That also depends on our interpretation.  If our interpretation is that the husband's heart attack was ten times more severe, then women are clever indeed.  If, on the contrary, our interpretation is that it was ten times milder, women are dumb, though they think they are smart.  Most women might argue that the first interpretation is the correct one, and most men, the second.  Truth, it is said, "will out".  But it looks as though there are no truths; there are only interpretations.

If you think I am splitting hairs, you are right.  And I am splitting hairs because I have just finished reading two books about statistics in which there is plenty of hair-splitting: How to Lie with Statistics, written more than forty-five years ago, and Damned Lies and Statistics, published in 2001.

Joel Best, the author of the second book, talks about the following statistic he once came across in an article in a prestigious journal: "Every year since 1950, the number of American children gunned down has doubled."  What does it mean?  Well, let's assume that the number of American children gunned down in 1950 was one.  If the number doubled each year, there must have been two children gunned down in 1951, four in 1952, eight in 1953, and so on.  By 1980, the number would have been one billion (more than four times the total US population that year).  By 1995, when the article was published in the journal, the annual number of children gunned down in America would have been over 35 trillion.  Absurd!  Actually, the author of the journal article had borrowed the statistic from a 1994 yearbook in which the information had been given as follows: "The number of American children killed each year by guns has doubled since 1950."  In other words, the deaths in 1994 were twice as many as in 1950.  But the statistic had got garbled in the journal article. And considering the reputation of the journal, the statistic must have been uncritically accepted and quoted by several researchers.
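
The arithmetic is easy to check for yourself.  Here is a minimal Python sketch, assuming (as above) that exactly one child was gunned down in 1950; the starting figure and the function names are my own illustration, not Best's.

```python
# Assumption taken from the paragraph above: one death in 1950.
START_YEAR = 1950
START_COUNT = 1


def garbled_reading(year):
    """Deaths in `year` if the number doubled EVERY year since 1950."""
    return START_COUNT * 2 ** (year - START_YEAR)


def yearbook_reading():
    """Deaths in 1994 under the yearbook's wording: twice the 1950 figure."""
    return START_COUNT * 2


# The garbled statistic explodes within a few decades.
for year in (1951, 1952, 1953, 1980, 1995):
    print(year, f"{garbled_reading(year):,}")

# The original statistic is far more modest.
print("1994 (yearbook wording):", yearbook_reading())
```

Thirty doublings take the count past a billion by 1980, and forty-five take it past 35 trillion by 1995, whereas the yearbook's wording amounts to a single doubling over the whole period.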

Let me go back to the point I made with reference to that funny story with a clever punchline: you can prove anything with statistics if only you know the "art" of interpretation.  The remark often attributed to Benjamin Disraeli is not wide of the mark: "There are three kinds of lies: lies, damned lies and statistics."