Quantitative data analysis

A frequent refrain in public discourse is the old axiom that “correlation does not equal causality.” Statistics can both muddle and clarify this picture. With modern statistics software it is very easy to detect an association, linear or otherwise. Establishing causality is more difficult: statistical measures can provide a degree of comfort, but causality cannot be established without careful analysis. Chi-square tests are often used in cross-tabular analysis to assess whether an observed association could plausibly be due to chance. Similarly, a careful interpretation of the statistics generated in a regression analysis can provide further hints, particularly when an association appears strong in the context of a highly predictive model. Perhaps one of the most challenging aspects of evaluating a relationship is accounting for the impact of other potential variables. In cross-tabular analysis this can be accomplished by adding control variables to the analysis, such as intervening variables, which lie in the causal chain between an independent variable and a dependent variable, and extraneous variables, which can be used to test whether a relationship is, in fact, spurious. In a regression analysis, the same is achieved by including these variables in the regression model. Regardless of the statistical method applied, however, a causal claim demands a deep, critical understanding of the time order and causal mechanics of the proposed relationship.
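The crosstab logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a reproduction of any study: the counts are invented, the names `chi_square_2x2`, `strata`, and `pooled` are hypothetical, and the p-value shortcut only holds for a 2x2 table (one degree of freedom) with no continuity correction. The pooled table shows a clear association between group and outcome, but once the table is split by an extraneous variable the association disappears within each stratum, which is what a spurious relationship looks like.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts, invented for illustration only.
# Rows: group A, group B. Columns: outcome yes, outcome no.
strata = {
    "extraneous variable = low": [[5, 45], [15, 135]],
    "extraneous variable = high": [[60, 90], [20, 30]],
}

# Pool the strata cell by cell to get the table that ignores the third variable.
pooled = [[0, 0], [0, 0]]
for table in strata.values():
    for i in range(2):
        for j in range(2):
            pooled[i][j] += table[i][j]

pooled_stat, pooled_p = chi_square_2x2(pooled)
print(f"pooled: chi-square = {pooled_stat:.2f}, p = {pooled_p:.4f}")

stratum_results = {name: chi_square_2x2(t) for name, t in strata.items()}
for name, (stat, p) in stratum_results.items():
    print(f"{name}: chi-square = {stat:.2f}, p = {p:.4f}")
```

In the pooled table, group A has the outcome 32.5% of the time against 17.5% for group B, yet within each stratum the two groups have identical rates (10% and 40%). The apparent effect comes entirely from group A being concentrated in the high stratum. A test result like this says nothing about time order or mechanism, which still have to be argued separately.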

Sociological studies frequently rely on one of the statistical techniques above (crosstabs and regression analysis), and both can be seen in Devah Pager’s “The Mark of a Criminal Record.” Pager uses an employer audit methodology, in which testers matched to be as similar as possible are used to test the effect of two independent variables (criminal record and race) on the dependent variable, callbacks received. In essence, Pager examines only three variables, so graphs are the primary means of communicating the results to the reader. This works well: the proposed relationship is so simple, and the attention to internal validity within the study so high, that further information would only confuse the reader. When one turns to the appendix, however, one finds a more complicated picture provided via regression analysis. Pager probably uses regression analysis in order to report statistical significance and the specific coefficients.

Considering the simplicity of the proposed relationship, it is not entirely clear that the regression analysis is needed; a simpler set of crosstabs could probably have been used. Notably, the regression table in the appendix lacks an R-squared value, which would help communicate how much of the variation in the outcome the model explains. There is also no attention to additional variables, presumably because of the experimental design, in which the testers were made as identical as possible (with the exception of the variables being measured). It would be useless to control for another variable whose value is presumably the same across every case. Ultimately, though, simplicity is a major boon to this study: it is difficult to argue with Pager’s results without falling back on the typical arguments against audit methodologies.


Pager, D. (2003). The Mark of a Criminal Record. American Journal of Sociology, 108(5), 937–975.
