Caleb’s Concepts: A critique of statistical reasoning
October 30, 2020
Here’s a joke: what is two plus two? To a mathematician the answer is clearly four. To a statistician, the answer is between 3.99 and 4.01, while to an accountant it is whatever you want it to be. The joke illustrates an uncomfortable truth about numerical and statistical evidence: it is not absolute.
In our everyday lives we are bombarded with data on how we should live. For example, the Brookings Institution ranks universities based on the median salary of their graduates within 10 years of graduation. This “empirical” ranking aims to inform prospective students and their families, helping them choose their universities wisely. It is no secret that Ivy League graduates earn higher starting salaries than non-Ivy alumni. Based on the raw data, an Ivy League degree seems to guarantee success compared to other universities; therefore, the reasoning goes, Ivy League schools are better.
This conclusion is built on the idea that universities, and not the students, are responsible for economic success. Yet researchers who claim to prove empirically that Ivy League schools drive that success often overlook a question that challenges their assumptions: what happens if a Harvard-caliber student opts to attend a less prestigious institution? Economists Alan Krueger and Stacy Dale looked at the data to find answers.
They found that the impact of an Ivy League education shrinks considerably once other factors, such as GPA and SAT scores, are accounted for. To put it simply, Ivy League universities attract students who already earn exceptionally high test scores, giving the impression that the higher wages are a product of the university rather than of the student.
This error in statistics is known as omitted variable bias, which can make estimated effects look larger than they actually are. Think about it: many of the variables we use to answer a question are related to each other. For example, consider wage growth: there is a high correlation between wage growth and years of experience. More-experienced workers typically earn more than less-experienced workers, and in a simple model there is a significant difference between workers who have experience and workers who do not. However, experience alone cannot explain all the differences in wages, so we add variables to the equation, such as productivity. Productivity and experience are highly correlated as well, so experience’s impact is drastically reduced once productivity is included in the model. Therefore, productivity is a better predictor of future wage growth, according to the American Enterprise Institute. However, this conclusion has been disputed in recent years because some metrics suggest that wages have not kept pace with productivity, according to the Economic Policy Institute. The point here is not to settle whether productivity causes higher wages, but to highlight the power of omitted variable bias.
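To see omitted variable bias in action, here is a minimal simulation in Python. The numbers are made up for illustration, not drawn from real wage data: wages are generated from productivity and experience, and the estimated effect of experience shrinks once productivity enters the regression.

```python
# A minimal sketch of omitted variable bias with simulated (not real) wage data.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Productivity drives wages; experience is correlated with productivity.
productivity = rng.normal(size=n)
experience = 0.8 * productivity + rng.normal(scale=0.6, size=n)
wage = 2.0 * productivity + 0.5 * experience + rng.normal(size=n)

def ols(y, columns):
    """Ordinary least squares via np.linalg.lstsq; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Short model: experience only. Its coefficient soaks up productivity's effect.
print(ols(wage, [experience]))                 # experience looks far more important...
# Long model: add the omitted variable. Experience's coefficient falls toward 0.5.
print(ols(wage, [experience, productivity]))   # ...than it actually is
```

The exact coefficients depend on the made-up parameters, but the pattern is the point: leaving productivity out inflates the apparent payoff to experience.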
Ultimately, the goal of statistical reasoning is to make predictions based on the available data. One such method is Ordinary Least Squares (OLS), which fits a line through the data by minimizing the sum of the squared differences between the observed values and the model’s predictions. The result is a simple equation: you put in one number and get out a prediction. For example, to determine the effect poverty has on crime, we would regress crime levels on income, producing an equation that seeks to explain the variation in the data. However, most simple models cannot explain all of the data’s variation. How much they do explain is measured by the R-squared: the proportion of the variation captured by the model, where a perfect fit equals one. The higher the R-squared, the better the prediction. Yet there are rarely any relationships that are actually perfect. Take, for example, the link between cigarette smoking and cancer. The two are highly correlated, but not perfectly correlated, and smoking cannot be said with 100% certainty to cause cancer in any given person. In other words, not everyone who smokes will get cancer. Think about it: maybe you’ve heard of someone’s grandparent who smokes a pack a day and is still kicking at 90. These people are known as outliers, and they wreak havoc on otherwise-sound models. Does that mean we should ignore the model? Absolutely not. Rather, we should treat statistics as a general guide rather than an objective and unchanging truth.
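As a rough sketch of the regression described above, here is a hypothetical crime-on-income example (simulated data, not real figures) showing what OLS minimizes and what the R-squared measures.

```python
# A sketch of a simple OLS regression of crime on income, using simulated data.
import numpy as np

rng = np.random.default_rng(1)
income = rng.normal(loc=50, scale=10, size=200)               # hypothetical income levels
crime = 100 - 1.2 * income + rng.normal(scale=8, size=200)    # hypothetical crime levels

# Fit crime = b0 + b1 * income by minimizing the sum of squared residuals.
X = np.column_stack([np.ones_like(income), income])
(b0, b1), *_ = np.linalg.lstsq(X, crime, rcond=None)

predicted = b0 + b1 * income
residuals = crime - predicted

# R-squared: the share of the variation in crime that the model explains.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((crime - crime.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(b0, b1, r_squared)   # r_squared stays below one: the noise guarantees an imperfect fit
```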
The goal of statistics is to make inferences about an entire population based on available data. It is impossible for statisticians to measure every member of a group, so they rely on random sampling within that group to make generalizations about the whole population. Obviously, the more people sampled, the better the results, and the closer the sample size gets to infinity, the more accurate the results become. In his treatise Critique of Pure Reason, Immanuel Kant dissects this kind of claim and exposes its absurdity. Think about it: do you need to flip a coin an infinite number of times to realize that its probability of landing tails is 50%? You don’t, and you don’t need an infinite sample to make reasonable claims about a population. However, you can never assume that your data is perfect. Rather, any conclusion comes with a degree of confidence: the likelihood of obtaining similar results if we drew another random sample from the same population. The more confident we are about our sampling, the greater the likelihood of obtaining the same results from another randomly selected sample.
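A quick simulation makes the coin-flip point concrete. This is only an illustrative sketch: the estimate settles near 50% long before the sample approaches infinity, and a confidence interval quantifies how far off it is likely to be.

```python
# Estimating the probability of tails from finite samples of coin flips.
import numpy as np

rng = np.random.default_rng(2)

for n in (10, 100, 1_000, 10_000):
    flips = rng.integers(0, 2, size=n)        # 1 = tails, 0 = heads
    p_hat = flips.mean()                      # estimated probability of tails
    se = np.sqrt(p_hat * (1 - p_hat) / n)     # standard error of that estimate
    # A rough 95% confidence interval using the normal approximation.
    print(n, p_hat, (p_hat - 1.96 * se, p_hat + 1.96 * se))
```

Even at a few thousand flips the interval is already tight around 0.5; no infinite sample is required.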
In other words, the results returned from any analysis are subject to uncertainty, or how likely it is that the results are accurate. Most statistically inclined people, namely economists and statisticians, will tell you that we should only trust results reported at confidence levels of 90% or above. The higher the confidence level, the more credible the result. However, not every statistically significant variable has a practically significant effect. Therefore, a result must be both statistically significant and large enough to matter before it has any discernible impact.
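Here is a minimal, made-up illustration of that distinction: with a large enough sample, even a trivially small difference between two groups produces a vanishingly small p-value.

```python
# Statistical significance without practical significance, on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 1_000_000

group_a = rng.normal(loc=100.00, size=n)   # e.g. a control group
group_b = rng.normal(loc=100.05, size=n)   # a tiny 0.05-point difference

t_stat, p_value = stats.ttest_ind(group_a, group_b)
effect = group_b.mean() - group_a.mean()

print(p_value)   # far below 0.05: statistically significant
print(effect)    # about 0.05: probably too small to matter for any real decision
```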
So what does this mean? Statistics should never be treated as an “objective” source of information the way mathematics is, where there are clearly defined right and wrong answers. Rather, statistics is a tool for summarizing complicated problems in understandable ways, and for exploring questions that do not have clean right and wrong answers.
The beauty of statistical reasoning is that it is falsifiable, which, according to philosopher Karl Popper, makes it scientifically sound. However, unlike mathematics, statistical reasoning is not absolute, which makes its practitioners prone to bias. One such bias is motivated reasoning, where statistics are used to “empirically verify” the deep-seated beliefs researchers already hold. Therefore, take great care to avoid jumping to conclusions; there are many factors to consider. Lest we become empirical torturers, as Ronald Coase put it: “if you torture the data long enough, it will confess.”