Quantitative Data Analysis – Causal Explanations

Most people are probably familiar with the old adage: correlation does not necessarily mean causation. For many researchers, the proverbial end goal of their work is to find causal explanations: how does the introduction of an independent variable, x, affect a dependent variable, y? Unfortunately, as we will probably come to know with our own research, the real world isn’t quite as nice and easy to understand as a linear expression. Often, somewhere between the x and the y there are a t, u, v, and w that we need to account for, which also have an effect on the dependent variable.
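
To put that in rough notation (my own shorthand, not anything taken from Schutt): the tidy version of the world would look something like y = a + bx, while the messier, more realistic version looks like y = a + b1x + b2t + b3u + b4v + b5w, where t, u, v, and w stand in for the other influences that also move the dependent variable.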

That doesn’t necessarily mean that x and y don’t have some perceivable connection, however. As Schutt describes in Investigating the Social World, there are five criteria that must be satisfied when considering whether a causal relationship exists.

  1. Association: Does a change in x happen at the same time as a change in y? At its simplest, this involves seeing whether cases that differ on the independent variable also differ on the dependent variable. Take a crosstabulation of Hours Studied and Grades on an Exam, for example: if 16 people who studied for 3 hours received a mean grade of 79 on an exam, and the people who studied for 10 hours received a mean grade of 95, you might be able to say that the number of hours studied and grades on an exam are connected, or associated (see the short sketch after this list).
  2. Time Order: Did the change in y happen after the change in x? If you wanted to assert that an independent variable caused a change in a dependent variable, you would have to illustrate that the change in the dependent variable only happened after the variation in the independent variable.
  3. Nonspuriousness: Was the change in x and y due to a third variable? When deciding whether or not a causal relationship exists, we as researchers need to be certain that something else we are not accounting for is not happening at the same time. If we were to use the Hours Studied vs. Grades on an Exam example, what if students who studied more also saw tutors for extra help? If that were the case, we would not be able to say that studying more causes higher exam grades, because studying alone may not have caused the increase.
  4. Randomization: Were participants in the research randomly sorted into their respective groups? This is essentially a means of controlling for spuriousness as well; by randomly assigning participants to research groups, you alleviate the risk of some extraneous variable disproportionately affecting one of the conditions for your independent variable.
  5. Statistical Control: Can we hold other variables constant and focus solely on x and y? Again, this is a way of controlling for unforeseen influences by other variables. To use the Hours Studied vs. Grades on an Exam example, the researcher may acknowledge that tutoring can also influence exam grades and therefore gather data using only exams taken by students who do not receive tutoring.
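
To make the association and statistical-control ideas a bit more concrete, here is a small sketch in Python using pandas. Everything in it is hypothetical: the data, the numbers, and the column names (hours_studied, exam_grade, has_tutor) are invented for illustration and are not from Schutt or any real exam.

    import pandas as pd

    # Hypothetical exam data; every value here is invented for illustration.
    df = pd.DataFrame({
        "hours_studied": [3, 3, 3, 10, 10, 10],
        "exam_grade":    [75, 79, 83, 92, 95, 98],
        "has_tutor":     [False, False, True, True, True, False],
    })

    # Association (criterion 1): do cases that differ on the independent
    # variable (hours studied) also differ on the dependent variable (grade)?
    print(df.groupby("hours_studied")["exam_grade"].mean())

    # Statistical control (criterion 5): hold tutoring constant by looking
    # only at students without a tutor, then re-check the association.
    no_tutor = df[~df["has_tutor"]]
    print(no_tutor.groupby("hours_studied")["exam_grade"].mean())

If the gap in mean grades shrinks once tutoring is held constant, that is a hint the original association was at least partly spurious.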

As illustrated, the criteria for determining a causal relationship are quite stringent. It is nigh impossible to meet all of them when doing non-experimental research, especially if we are analyzing bivariate data. The possibility that a third, unforeseen variable was not accounted for in the research design is ever-present in non-experimental research, as is the difficulty of establishing a time order between changes in the independent and dependent variables.

I will go back to one of my favorite studies to illustrate this point: Did Welfare Reform Cause the Caseload Decline? After AFDC was replaced with TANF in the late 1990s, most politicians touted the success of welfare reform at changing the face of American poverty because the number of people receiving aid dropped so drastically. It seems to make sense: welfare reform passes and the caseload declines. Post hoc ergo propter hoc. However, the two researchers, Caroline Danielson and Jacob Klerman, believed that other variables had influenced this relationship.

The authors believed, and illustrated using multivariate regression modeling, that not all policy changes enacted during the 1990s had an equal effect on the caseload decline, and that the strengthening economy had a significant effect that was not accounted for in the “welfare reform caused the caseload decline” narrative. By focusing solely on the type of welfare program (independent variable) and the number of people receiving welfare (dependent variable), other important factors that contributed to the declining caseload were ignored (the strengthening economy, dismissals from the welfare rolls due to stringent work requirements, and so on).
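
As a rough illustration of what multivariate regression modeling means here, the toy sketch below regresses a caseload measure on both a policy indicator and an economic indicator, so the policy’s coefficient is estimated with the economy held constant. The variable names (reform_in_effect, unemployment_rate, caseload) and the data are entirely made up; this is not Danielson and Klerman’s actual model or data.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Invented data, purely for illustration.
    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({
        "reform_in_effect": rng.integers(0, 2, n),       # 1 = new welfare policy in place
        "unemployment_rate": rng.uniform(3.0, 9.0, n),   # stand-in for the economy
    })
    df["caseload"] = (
        100
        - 8 * df["reform_in_effect"]
        + 5 * df["unemployment_rate"]
        + rng.normal(0, 4, n)
    )

    # Multiple regression: the coefficient on reform_in_effect is the estimated
    # policy effect with the economic variable held constant.
    model = smf.ols("caseload ~ reform_in_effect + unemployment_rate", data=df).fit()
    print(model.params)

Leaving unemployment_rate out of the formula would push its influence into the policy coefficient, which is exactly the kind of spuriousness the “welfare reform did it all” narrative runs into.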

This piece also illustrates just how messy it can be to avoid spuriousness. The authors show that the majority of the caseload decline was not attributable to either the economy or the new welfare policies, but to some other variable(s) that they themselves could not determine. Even though this research satisfies most of the other criteria (the association is clear, and the time order is established by a clear starting point for the welfare reform policies), the authors still could not say exactly what caused the caseload decline. This research is still important, though; even if it cannot establish a causal explanation, it can at least dispel the myth that welfare reform alone was the cause of the caseload decline.
