“There are three kinds of lies: lies, damned lies, and statistics.”
Attributed to Benjamin Disraeli
Truth or Truthiness: Distinguishing Fact from Fiction by Learning to Think Like a Data Scientist by Howard Wainer is a study of information and how it is used in modern society. Wainer is an American statistician, a past Principal Research Scientist at the Educational Testing Service, and an adjunct Professor of Statistics at the Wharton School of the University of Pennsylvania.
We are bombarded with statistics in our daily lives. One news station will show statistics that the economy is failing; another, that things are on the upswing. The death penalty is supposed to be a crime deterrent, but Texas continues to execute more people than any other state (though it ranks 11th in executions per capita) and ranks 16th in violent crime. Alaska ranks first in violent crime and has no death penalty. Vermont has the lowest level of violent crime and no death penalty. There seems to be little correlation between the death penalty and violent crime. Perhaps more information is needed. Wainer brings a simpler example to the table: there is a strong correlation between the number of people eating ice cream and the number of people drowning. When ice cream eating spikes, so does the number of drownings. There must be a connection between the two. Actually, there isn’t. When the weather gets warmer, more people eat ice cream and more people go swimming. The more people who swim, the higher the number of drowning victims. One could take these figures and, wrongly, conclude that swimming and eating ice cream lead to higher temperatures, perhaps a point for snowball-throwing Senator Jim Inhofe.
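The ice cream and drowning example can be tried out with a small simulation. What follows is only a sketch with invented numbers, not data from the book: the temperature range, the slopes, and the noise levels are all assumptions chosen for illustration, and the helpers `pearson` and `residuals` are hypothetical names. In the simulation, temperature drives both ice cream sales and drownings, and neither affects the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing its straight-line trend in xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

# All numbers below are made up for illustration (assumptions, not real data).
rng = random.Random(0)
temps = [rng.uniform(0, 35) for _ in range(2000)]        # daily temperature
ice_cream = [10 * t + rng.gauss(0, 40) for t in temps]   # depends only on temperature
drownings = [0.2 * t + rng.gauss(0, 1.5) for t in temps] # depends only on temperature

# Raw correlation: looks as if ice cream and drowning are linked.
raw = pearson(ice_cream, drownings)

# Correlation after taking temperature out of both series.
adjusted = pearson(residuals(ice_cream, temps), residuals(drownings, temps))

print(f"raw correlation:      {raw:.2f}")
print(f"adjusted for weather: {adjusted:.2f}")
```

Under these assumptions the raw correlation comes out strong, while the correlation left over after accounting for temperature is close to zero, which is the point of the example: the two series move together only because both follow the weather.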
Missing information and how it is treated is as important as the information present. A company questionnaire asks employees how happy they are with their jobs. The company reported that 80% of the respondents were happy or very happy. What is missing from the equation is that only 22% of the employees were motivated enough to complete the questionnaire. Many times missing information is much more complicated. In long-term studies, not everyone continues the study. If the study was following smokers, for example, what is to be done with the subjects who quit smoking? Those who die from non-smoking-related diseases and accidents? Those who just don’t want to participate anymore? Wainer gives examples and ways to deal with missing information without skewing the results.
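The questionnaire figures can be worked through in a few lines. This is a rough sketch rather than anything taken from the book: the 22% and 80% come from the example above, while the function name `happiness_bounds` and the two extreme scenarios (every non-respondent unhappy, or every non-respondent happy) are assumptions added for illustration.

```python
def happiness_bounds(response_rate, happy_among_respondents):
    """Lowest and highest possible share of ALL employees who are happy.

    Hypothetical helper: it assumes nothing is known about the people
    who did not answer, so they could all be unhappy or all be happy.
    """
    known_happy = response_rate * happy_among_respondents
    non_respondents = 1.0 - response_rate
    lower = known_happy                    # every non-respondent is unhappy
    upper = known_happy + non_respondents  # every non-respondent is happy
    return lower, upper

# Figures from the questionnaire example: 22% answered, 80% of those were happy.
low, high = happiness_bounds(0.22, 0.80)
print(f"Happy share of all employees: between {low:.1%} and {high:.1%}")
```

The only thing the survey establishes is that 80% of 22%, or 17.6% of all employees, said they were happy. Depending on what the silent 78% think, the true figure could be anywhere from 17.6% to 95.6%.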
Another problem is information whose importance was not realized at the time. Cigarette smoking was a leading cause of preventable death in America, and obesity was not considered that great a concern. The problem was that smokers generally tended to be thinner than nonsmokers, skewing the rate information on obesity. Thinner people died at a higher rate than the obese because of the number of smokers among them. Wainer takes on a variety of popular issues such as the SAT, teacher tenure, fracking, test cheating, standardized tests, and other hot social issues. He starts slowly with simple examples separating truth from truthiness and moves to more complex problems. He even examines graphs and shows how results can be hidden by the type of graph being used.
Truth or Truthiness is a study of understanding information and data and interpreting them in a useful manner. It asks us to question what we see and hear, to check the data and who supplies it, and to determine how truthful it really is or whether it is simply serving another group’s needs by appealing to our emotions and “gut feelings.” A very good read in our age of quick information, unofficial polls, and truthiness.