Secondary Data Analysis

At its most basic, secondary data analysis is the process of reanalyzing an existing data source in order to test a hypothesis different from those tested by the original researchers. Most commonly this involves social science surveys and government data. The method can also be applied to qualitative data, although, as Schutt points out, qualitative data is not nearly as readily available as quantitative data.

Perhaps the most obvious advantage of secondary data analysis is that secondary data sets are easily accessible. Thanks to online repositories such as ICPSR, finding a data set with variables pertinent to your hypothesis is easier than ever. Even qualitative data sets are starting to find their way onto the internet via repositories such as Syracuse’s QDR. Some sites, such as ICPSR, even allow users to enter possible variable names to speed up the search for relevant data sets. This accessibility allows researchers to test hypotheses much faster than if they had to collect the data themselves.

Similarly, one of the greatest advantages of secondary data analysis is that it is incredibly cheap to carry out. Especially for students such as ourselves, who may not have the time and money to conduct large-scale research projects, secondary data analysis allows us to conduct our research without the burden of paying for data collection. For researchers pressed for time and money, it offers an opportunity to analyze data much faster, and typically on a much larger scale, than they could hope to achieve on their own.

Analyzing existing data sets isn’t without its disadvantages, however. First, because you did not collect the data yourself, secondary data is never designed around your specific research question. Existing data sets are usually collected with a specific hypothesis or goal in mind, which raises questions about whether the data is appropriate for the hypothesis you have. Using secondary data means that your hypotheses are often beholden to the data sets available, rather than being tested with survey items operationalized around your specific research question. Second, just because data is available does not mean that it is good data. Government data can typically be considered fairly safe, but when retrieving a data set from an online repository, it is important to evaluate the quality of the original data collection and analysis before using the data set to test another hypothesis.

A study that exemplifies the advantage of using secondary data is “Did Welfare Reform Cause the Caseload Decline?”, in which the authors analyze state-level welfare caseload counts collected by the U.S. Department of Health and Human Services between 1992 and 2002 to determine the degree to which the falling welfare caseload could be attributed to the replacement of AFDC with TANF in the mid-nineties. The authors perform a multivariate regression analysis to determine which policies caused the caseload to decline and whether other variables had an effect on the decline, ultimately concluding that TANF accounted for only about one-fifth of the decline in the caseload.

We can imagine that, as researchers, it would be quite arduous to carry out this kind of research if we did not know exactly where, and when, the number of people on the welfare rolls was changing. Considering that this research was conducted by just two people, compiling a ten-year, month-by-month count of welfare recipients for all 50 states would probably be an undertaking that could take many years and a good amount of money to complete. In this particular case there aren’t many concerns about whether the data is good data or whether it is applicable to the hypothesis being tested, since the secondary data being analyzed is simply a count of the number of people receiving TANF and was collected by a government agency.


Evaluation and Policy Research

At its most basic, evaluative research aims to understand how a program (whether a new drug, an educational curriculum, or a social policy) works the way that it does. This type of research can be guided by several questions: Is the program needed? What is the program’s impact? How efficient is the program? Evaluative research is a way for stakeholders, the groups who have some kind of concern with a program, to answer these questions and determine how they should move forward with the program in light of the findings.

Evaluative research is generally carried out for these stakeholders, whether they be business managers, government officials, or funders of a particular project. As Schutt points out, who the program stakeholders are and what role they play in the program has extraordinary ethical consequences in evaluative research. In many cases, the funding awarded to researchers by these stakeholders could result in questionable research methodology or interpretations of findings for the sake of remaining funded. Consider, for example, that nearly 75% of U.S. clinical trials in medicine are funded by pharmaceutical companies. This may seem benign, since as researchers we should favor a world where scientific research is generously funded and endorsed, but research funded by these companies is more likely to favor the drug under consideration than similar studies funded using government grants or charitable donations. Consider also a company like Coca-Cola, which has a legacy of funding university studies that obfuscate the connection between soda consumption and obesity. When reading evaluative research, knowing who the research was conducted for can be nearly as important as the findings of the research itself.

What I found particularly important in this chapter is that Schutt illustrates that impact analysis is just one type of evaluative research. Often we think of evaluative research as something that retroactively ascribes necessity or usefulness to a program, but as Schutt points out, research can also be carried out before a program is designed or implemented to determine whether it is needed or whether the program itself can even be evaluated. These are forms of evaluation research that I had not considered before, so I appreciated the distinction.

“Did Welfare Reform Cause the Caseload Decline?” is a piece that I think exemplifies well-done policy research. The authors, Caroline Danielson and Alex Klerman, use monthly state-level welfare case counts collected by the U.S. Department of Health and Human Services both before and after the replacement of AFDC with TANF policies to investigate how certain TANF policy changes affected the drastic reduction of the welfare caseload during the late 1990s and early 2000s. Using this data, the authors estimate a difference-in-differences model to detect how four major policy changes affected the caseload: the generosity of financial incentives, sanctioning from welfare rolls due to non-compliance with work requirements, time limits placed on how long families could receive aid, and other programs to divert families who needed temporary assistance from joining the welfare caseload. The authors also include the national unemployment rate for each given month to account for changes in the caseload that may be attributed to economic conditions.
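For readers who, like me, are new to difference-in-differences, the core idea can be sketched in a few lines of Python. This is only the simplest two-group, two-period version, not the authors' actual model, which works with a full panel of states and months and several policies at once. The caseload values, the record layout, and the function names below are all made up for illustration.

```python
from statistics import mean

# Hypothetical caseload index values, invented purely for illustration.
# Each record is (adopted_policy, period, caseload_index).
records = [
    (True, "before", 100.0),
    (True, "before", 96.0),
    (True, "after", 70.0),
    (True, "after", 66.0),
    (False, "before", 90.0),
    (False, "before", 94.0),
    (False, "after", 80.0),
    (False, "after", 76.0),
]


def group_mean(rows, adopted, period):
    """Average caseload index for one group in one period."""
    values = [y for a, p, y in rows if a == adopted and p == period]
    return mean(values)


def diff_in_diff(rows):
    """Change in the policy group minus change in the comparison group."""
    treated_change = group_mean(rows, True, "after") - group_mean(rows, True, "before")
    control_change = group_mean(rows, False, "after") - group_mean(rows, False, "before")
    return treated_change - control_change


estimate = diff_in_diff(records)
print(f"Difference-in-differences estimate: {estimate:.1f}")
```

In this toy example the caseload falls in both groups, so the estimate credits the policy only with the extra decline seen in the states that adopted it. That is the same logic that lets the authors separate the effect of policy changes from trends that would have occurred anyway.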

Their findings are quite grim. Admittedly DID models are a bit above my pay grade, but the findings suggest that these major policy changes explain only about 10 percentage points of the 56 percentage point decline in the welfare caseload that occurred between 1992 and 2005. Further, the booming economy of the late 1990s accounted for about 5 percentage points of the decline during this time period. This suggests that factors outside of state-level welfare reform accounted for the majority of the caseload decline, a finding which is quite eerie considering how quick the Clinton and Bush administrations were to tout TANF as a resounding success.
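To put those figures in proportion, here is a quick back-of-the-envelope calculation in Python. It uses only the rounded numbers quoted above and assumes the policy and economy contributions can simply be subtracted from the total, which is my simplification rather than something the authors state. The variable and function names are my own.

```python
# Rounded figures as quoted in the paragraph above (percentage points).
total_decline_pp = 56.0
policy_pp = 10.0
economy_pp = 5.0


def share_of_total(part_pp, total_pp):
    """Express a percentage-point contribution as a percent of the total decline."""
    return 100.0 * part_pp / total_pp


# Assumption: contributions are additive, so the remainder is "everything else".
unexplained_pp = total_decline_pp - policy_pp - economy_pp

print(f"Policy changes: {share_of_total(policy_pp, total_decline_pp):.0f}% of the decline")
print(f"Economy:        {share_of_total(economy_pp, total_decline_pp):.0f}% of the decline")
print(f"Unexplained:    {share_of_total(unexplained_pp, total_decline_pp):.0f}% of the decline")
```

Under that assumption, policy changes account for a bit under a fifth of the decline, and roughly three-quarters of it is left to other factors.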

This research firmly fits into what Schutt describes as impact analysis. The authors are not necessarily concerned with whether the effectiveness of TANF was worth the “cost”, just whether it was working as it was purported to at all. It is also more of a black box model, focusing not on how welfare reform should have theoretically operated, but attempting to dissect why the caseload declined the way that it did. It is hard to say what kind of stakeholders could have funded this research. The authors were employed by the Public Policy Institute of California and the RAND Corporation at the time of the piece’s publication, but neither of these think tanks is very forthcoming about who sponsors its work.
