Secondary data analysis blog

Secondary data analysis typically involves applying quantitative statistical techniques or qualitative coding techniques to previously collected data. Data sources can be large omnibus surveys, government statistics, research conducted by previous researchers, or even popular publications. This sort of research design carries both advantages and disadvantages.

Advantages of secondary data analysis primarily center on the ease of producing research projects. Data collection is often the most time-consuming and expensive stage of a research project, so the ability to skip this step increases our capacity to produce new knowledge. Furthermore, many large-scale data collection projects exist at a scale for which most researchers could never acquire funding, such as the GSS or even the US Census. The Internet age has enabled even larger data collection efforts, often performed by tech firms, allowing researchers to work with big data as well. Appealingly, these projects rarely carry serious ethical concerns, as there is no data collection or experimental stage, although consideration needs to be given to confidentiality when accessing certain data sources. Most repositories of secondary data restrict access to data that might compromise confidentiality, but researchers still need to be mindful of their sources.

That said, there are also disadvantages to using secondary data sources. Most obviously, researchers cannot direct data collection efforts. This restricts a researcher's ability to investigate certain subjects, and carries the added danger of focusing researchers' attention too narrowly on subjects covered by existing secondary data sources. Furthermore, the distance between the researcher and the data collection can itself be a disadvantage: presumably, researchers involved in the data collection stages of their own projects are more familiar with the scope, strengths, and weaknesses of their data. While many secondary data sources are well-suited for content analysis, few are appropriate for other qualitative techniques, meaning that secondary data analysis produces a disproportionate number of quantitative publications. This can obscure the observation and understanding of causal mechanisms, and privilege statistical methods within the field. Finally, secondary data analysis can arguably lead to over-reliance on specific samples, such as the GSS, although statistical theory suggests that random selection and sampling techniques should mitigate this effect.

The advantages and disadvantages of secondary data analysis are exemplified well by Roscigno, Karafin, and Tester's analysis of discrimination in the housing market, "The Complexities and Processes of Racial Housing Discrimination." Perhaps most interesting about this study is its novel data source: verified complaints received by the Ohio Civil Rights Commission. Previous attempts to document this phenomenon typically relied upon audit studies and large-scale statistical techniques to document racial segregation and modern redlining practices. Roscigno, Karafin, and Tester's analysis gets right to the heart of the matter by focusing on verified complaints made by real people, demonstrating the existence of discrimination in the housing market. However, that's not to say their data source is perfect. That everyone studied took the time to file an official complaint suggests that many of them are outliers, and the typical nature of housing discrimination might not be depicted here. Similarly, while it might be interesting to compare the rate of civil rights complaints in Ohio with that of other states, it is difficult to generalize this information to a broader rate of housing discrimination.