Caleb’s Concepts: The difference between data and information


Caleb Garbuio, Columnist


If you tune in to any news channel, you will often hear experts explain what the data is telling them, be it about COVID-19 cases, inequality, police violence or something else. Yet what these experts often overlook is that many viewers may not know what data means and how it can be used to explain problems.

In the simplest terms, data is raw material that, on its own, is little more than noise. Imagine a piece of paper covered with a random scattering of words. Each word carries its own meaning, just as each number carries a value, yet together they say nothing. Take census data: census takers come to your home and ask demographic questions, and the answers are recorded for clerical purposes. That raw data becomes information when it is organized and processed to give it meaning. In the case of the U.S. Census, the resulting information puts the demographics of the United States in context.

Going back to our previous example, words are to sentences what data is to information. For example, the words “dog,” “the,” “ugly” and “dumb” can each be interpreted in different ways, yet when combined they take on a specific meaning, like “the dumb dog is ugly.” Data and information operate under similar rules. The number of children can have many different interpretations by itself, yet when you add additional data, the meaning becomes more specific. For example, if you want to know how many children live in poverty, a data sheet will give you that count. However, to understand the magnitude of the problem, you could divide the number of children living in poverty by the total number of children, and now you have information about the problem.
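To make the division concrete, here is a minimal sketch in Python. The counts are invented purely for illustration and do not come from any real data set, and the function name `poverty_rate` is hypothetical.

```python
# Hypothetical counts, invented for illustration only.
children_in_poverty = 1_200
total_children = 8_000


def poverty_rate(in_poverty: int, total: int) -> float:
    """Return the share of children living in poverty as a fraction of all children."""
    if total <= 0:
        raise ValueError("total must be a positive number of children")
    return in_poverty / total


rate = poverty_rate(children_in_poverty, total_children)

# The raw count (1,200) is data; the share of all children is information.
print(f"{rate:.1%} of children live in poverty")
```

The count alone cannot tell you whether 1,200 is a lot or a little. The ratio can, because it puts the count in the context of the whole.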

Yet the process doesn’t stop there. You can look for relationships between variables to find patterns in how the data behaves. You can compare income with age and poverty status to determine, for instance, whether there is a relationship between age and income. Statistical tests also exist that measure the strength of a relationship and whether it is positive or negative, which tells people how different variables relate to one another.
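One common measure of the strength and direction of a relationship is the Pearson correlation coefficient, which ranges from -1 (perfectly negative) to 1 (perfectly positive). The sketch below assumes that measure, which the column does not name, and uses made-up ages and incomes; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together, and how much each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return co_variation / sqrt(variation_x * variation_y)


# Hypothetical ages and annual incomes, invented for illustration only.
ages = [22, 30, 38, 46, 54]
incomes = [28_000, 39_000, 47_000, 58_000, 61_000]

r = pearson_r(ages, incomes)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.2f} ({direction} relationship)")
```

A value near 1 or -1 points to a strong relationship, and a value near 0 to a weak one. A strong correlation does not by itself show that one variable causes the other.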

When the experts speak about data, they are informing you about the questions they asked. For example, statisticians rely on the scientific method to determine the significance of a relationship. They test their ideas against a default assumption, called a null hypothesis, which typically states that no relationship exists. If the data suggests that the null hypothesis does not hold, they reject it in favor of an alternative hypothesis. There are problems with this method, and bias in the data or in the analysis can distort the results. However, any conclusion drawn from data is inherently falsifiable, meaning that it can be proven false with new information.
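As a rough illustration of rejecting a null hypothesis, consider a coin. The null hypothesis is that the coin is fair. The sketch below computes how likely a result at least as lopsided as the observed one would be if the null were true (the p-value). The 20 flips, the 16 heads and the 0.05 cutoff are assumptions chosen for this example, and `two_sided_p_value` is a hypothetical helper name.

```python
from math import comb


def two_sided_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for the null hypothesis that the coin is fair.

    Adds up the probability of every outcome that is no more likely
    than the observed one, assuming heads and tails are equally likely.
    """
    if not 0 <= heads <= flips:
        raise ValueError("heads must be between 0 and flips")
    observed_ways = comb(flips, heads)
    # Count the equally likely flip sequences that are at least as extreme.
    extreme_ways = sum(
        comb(flips, k) for k in range(flips + 1) if comb(flips, k) <= observed_ways
    )
    return extreme_ways / 2 ** flips


# Hypothetical experiment: 16 heads in 20 flips.
p_value = two_sided_p_value(heads=16, flips=20)
significance_level = 0.05  # a conventional cutoff, assumed for this example

reject_null = p_value < significance_level
print(f"p-value = {p_value:.4f}; reject the null hypothesis: {reject_null}")
```

Rejecting the null here does not prove the coin is unfair. It only says the observed result would be unusual for a fair coin, and more flips could still overturn that conclusion.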

That is not to say that the information is wrong. It just means that the conclusions can be proven false if new data is introduced to the data set.