Category Archives: socy508fall2016

Analysis Report Summary

While my previous posts specifically examined public opinion on federal funding of science, my data analysis project includes some broader variables on the public’s interest in science and the benefits of science. These variables provide a broader picture of whether public interest even exists and of science’s perceived effects.

The first new variable is intsci, which asks: Are you very interested (= 1), moderately interested, or not at all interested (= 3) in issues about new scientific discoveries?



The pie chart shows the differences in frequencies for the three possible responses. 85.9% responded either very interested (40%) or moderately interested (45.8%). The median response is moderately interested.

The second variable is bettrlfe, which asked: I’m going to read to you some statements like those you might find in a newspaper or magazine article. For each statement, please tell me if you strongly agree (= 1), agree, disagree, or strongly disagree (= 4). Science and technology are making our lives healthier, easier, and more comfortable. I ran bivariate analyses using sex (1 = male or 2 = female) and degree (0 = lt high school, high school, junior college, bachelor, or 4 = graduate).


From the table, men tend to strongly agree (6.6% difference from women) or strongly disagree (5.2% difference) that science makes our lives better. The research hypothesis would posit a relationship between gender and opinion that science makes our lives better, while the null hypothesis would be no relationship. Pearson’s chi-square has a value of 9.547 with 3 degrees of freedom. SPSS calculates the significance as .023 and gamma at .128 (chosen by identifying bettrlfe as an ordinal variable and sex as a dichotomous nominal variable). The null hypothesis can be rejected, but gamma indicates a very weak positive relationship.
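As a rough illustration of the workflow behind these numbers (not a reproduction of the SPSS output — the table counts below are hypothetical), a chi-square test of independence plus Goodman-Kruskal’s gamma can be computed in Python:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x4 crosstab: sex (rows) by agreement that science makes
# life better (columns: strongly agree ... strongly disagree).
table = np.array([
    [210, 370, 60, 25],   # male (made-up counts)
    [200, 450, 70, 15],   # female (made-up counts)
])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.3f}")

def goodman_kruskal_gamma(t):
    """Gamma = (C - D) / (C + D), with C/D counting concordant/discordant pairs."""
    t = np.asarray(t)
    n_rows, n_cols = t.shape
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # pairs in cells below-and-right of (i, j) are concordant
            concordant += t[i, j] * t[i + 1:, j + 1:].sum()
            # pairs in cells below-and-left are discordant
            discordant += t[i, j] * t[i + 1:, :j].sum()
    return (concordant - discordant) / (concordant + discordant)

print(f"gamma = {goodman_kruskal_gamma(table):.3f}")
```

A 2×4 table gives (2−1)(4−1) = 3 degrees of freedom, matching the SPSS output above; gamma is not in scipy, so it is implemented directly from concordant/discordant pairs.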


Higher levels of education appear to trend towards slightly higher positive responses (towards strongly agree). The research hypothesis would posit a relationship between level of education and opinion that science makes our lives better, while the null hypothesis would be no relationship. Pearson’s chi-square has a value of 35.214 with 12 degrees of freedom (and 4 cells with values under 5). SPSS calculates the significance as .000 and gamma at -.173 (chosen by identifying bettrlfe and degree as ordinal variables). The null hypothesis can be rejected, but gamma indicates a very weak negative relationship.

From previous analyses, I’m not surprised that gender has a very weak relationship to variables about science. However, I did expect a stronger relationship to emerge from level of education since it seems to be commonly cited as a measure of scientific literacy. Perhaps the relationship would seem a little stronger with collapsed categories (removing some high school, combining high school with junior college, and combining bachelor’s with graduate).

Chapters 10 & 11

I’m continuing the research questions and variables (with no edits to the properties this time) from my last data analysis post (November 14).


The bivariate table shows very similar percentages between male and female respondents: 54.9% of male respondents agree that science research is necessary and should be supported by the federal government versus 58.8% of female respondents, only a 3.9% difference (and also the largest difference among the response categories). The Chi-square has a value of 2.482 with 3 degrees of freedom (and 0 cells with a count less than 5). With alpha set at .05, the chi-square’s significance of .50 means that the null hypothesis cannot be rejected. Furthermore, ADVFRONT is an ordinal variable and SEX is a dichotomous nominal variable, which makes gamma and Kendall’s tau-b appropriate measures of association. Gamma is calculated at -.004 and Kendall’s tau-b at -.002, which indicate pretty much no association between the two variables.
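To see how Kendall’s tau-b works on a table like this, here is a sketch with hypothetical counts (not the GSS crosstab): scipy’s kendalltau computes tau-b by default, handling the heavy ties of an ordinal-by-dichotomous table, once the cell counts are expanded into paired scores.

```python
import numpy as np
from scipy.stats import kendalltau

# rows: sex (1 = male, 2 = female); columns: strongly agree (1) ... strongly disagree (4)
table = np.array([
    [150, 380, 120, 40],   # made-up counts
    [160, 420, 130, 45],
])

# expand each (row score, column score) pair by its cell count
rows, cols = np.indices(table.shape)
x = np.repeat(rows.ravel() + 1, table.ravel())
y = np.repeat(cols.ravel() + 1, table.ravel())

tau_b, p = kendalltau(x, y)  # tau-b adjusts for ties by default
print(f"tau-b = {tau_b:.3f}, p = {p:.3f}")
```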


The bivariate table for political views indicates that extremely liberal respondents strongly agree with the statement (60.5%, 17.3% more than liberals), while moderates (21.2%) and conservatives (20.7% slightly conservative, 24.3% conservative) show lower percentages. Moderate (62.5%) and slightly conservative (62.8%) respondents showed their strongest support at agree. Conservative (22.5%) and extremely conservative (22%) respondents had higher percentages at disagreeing than other political views (5.7% more than the next category). The highest percentage for strongly disagree came from the extremely conservative (14%). The Chi-square test appears problematic, with 8 cells (25%) having a count less than 5. Gamma (.263) and Kendall’s tau-b (.179) indicate a weak relationship.

While the prior data analysis indicated that the null hypothesis cannot be rejected for gender, these additional tests provided further support that a relationship between gender and the survey question does not exist. For political views, the bivariate table looked interesting, but the tests reinforced the idea of a fairly weak relationship and better illustrated sample issues. As I mentioned before, education might yield a stronger relationship.


I found a tweet about the first known statistical graphic drawn in 1644. A very simple linear graph shows the distribution of twelve estimates for the distance between Toledo and Rome. Edward Tufte is known for his work on how to present information well.

Another tweet had more material related to our coursework: an article about why surveys suck. The author briefly covers issues such as bad sampling (good sampling techniques seem doubtful if a quarter of the non-runners in the sample died), ranking university programs without taking other factors into account (an academic program rated poorly when it was only two years old), and badly worded survey questions that would not provide good data.

Finally, I looked at one more historical example on a hypothesis test that “proved” the existence of God. I lost the link to this and sadly cannot find it again.

Instead, I ended up finding a linked abstract about the decline of null hypothesis testing in medical journals in favor of point estimates and confidence intervals. I understand how null hypothesis testing confuses the medical narrative: it seems like a very roundabout way to state that an association between variables is statistically significant, whereas confidence intervals provide a simple numerical range.

Overall, I found that many tweets linking to #statistics involve throwing out a number without additional information about methodology or evidence. It seemed surprisingly difficult to find something a tiny bit substantive and related to our coursework. 


Data Analysis blog

This round of data analysis examines public opinion on supporting federal funding of scientific research: Should the federal government support scientific research? How does gender or political views (the liberal/conservative spectrum) affect opinions? Federal funding of research and development has been declining in recent years. A decline in public support might correlate with a decline in federal spending (although no provable connection probably exists between the two). Political views would be interesting in the context of the recent presidential election, as the Republicans hold both the executive and legislative branches.

In the GSS2014 data set, the following two variables address the research questions:

Advfront: I’m going to read to you some statements like those you might find in a newspaper or magazine article. For each statement, please tell me if you strongly agree, agree, disagree, or strongly disagree. D. Even if it brings no immediate benefits, scientific research that advances the frontiers of knowledge is necessary and should be supported by the federal government. (Responses: Strongly agree, agree, disagree, strongly disagree, don’t know, no answer, not applicable)

Sex: Respondents sex (Responses: Male or female)

Polviews: Think of self as liberal or conservative (Responses: Extremely liberal, liberal, slightly liberal, moderate, slightly conservative, conservative, extremely conservative, don’t know, no answer, not applicable). Note: I altered the variable properties to exclude the 98 value, which had a count of 65. I’m not entirely sure what this value means since it doesn’t correspond with the stated possible responses.

I started with a simple frequencies table to look at the basic results, then ran the confidence intervals based on gender and political views.



For sex, the similarities between the means (1.92/1.93) and 95% confidence intervals (1.86-1.98/1.87-1.98) indicate no real difference in responses to the question about federal funding. The mean for both sexes corresponds to agreeing that scientific research should be supported by the federal government. The political views revealed more exciting results. The extremely liberal respondents had a mean of 1.56 with a 95% confidence interval of 1.32-1.79, while the extremely conservative had a mean of 2.44 with a 95% confidence interval of 2.21-2.67. The extremely liberal respondents agreed and the extremely conservative disagreed that scientific research should be supported by the federal government. Note: I can’t get the table image for political views to copy/paste because it’s very long and truncates.
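Per-group intervals like these come from the usual mean ± t* · s/√n formula. A minimal sketch with simulated 1-4 agreement scores (not the GSS data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# hypothetical 1-4 agreement scores for one group of respondents
scores = rng.choice([1, 2, 3, 4], size=120, p=[0.30, 0.50, 0.15, 0.05])

n = len(scores)
mean = scores.mean()
se = scores.std(ddof=1) / np.sqrt(n)          # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)          # two-tailed 95% critical value
print(f"mean = {mean:.2f}, 95% CI = {mean - t_crit * se:.2f} to {mean + t_crit * se:.2f}")
```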


Then, I decided to produce a test of mean differences for gender to determine if a significant difference exists. The null hypothesis would be no difference between male and female views (with alpha set at .05). Levene’s Test checks the equal-variances assumption: its F statistic significance is .257, greater than .05, so equal variances can be assumed and the corresponding row of the t-test output applies. The t-test did not show a significant difference, so the null hypothesis cannot be rejected.


For political views, I defined the groups as 1 = extremely liberal and 2 = extremely conservative. Levene’s Test gives an F statistic significance of .920, greater than .05, so equal variances can again be assumed for the t-test; the t-test itself did not show a significant mean difference. I’m a little surprised at this, although having a scale from 1-7 (extremely liberal to extremely conservative) probably doesn’t help in seeing a big difference. Or I might be missing something here and a Type II error applies…
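The SPSS steps here can be mirrored in Python: Levene’s test checks whether the two groups have equal variances (its significance is not itself the test of the mean difference), and the independent-samples t-test then assesses the means. A sketch with simulated data, not the actual GSS responses:

```python
import numpy as np
from scipy.stats import levene, ttest_ind

rng = np.random.default_rng(0)
# hypothetical 1-4 agreement scores for two groups
# (e.g. extremely liberal vs extremely conservative respondents)
group1 = rng.choice([1, 2, 3, 4], size=80, p=[0.45, 0.40, 0.10, 0.05])
group2 = rng.choice([1, 2, 3, 4], size=80, p=[0.15, 0.40, 0.30, 0.15])

w, p_levene = levene(group1, group2)       # H0: equal variances
equal_var = p_levene > 0.05                # which t-test row to read
t, p_t = ttest_ind(group1, group2, equal_var=equal_var)
print(f"Levene p = {p_levene:.3f}; t = {t:.3f}, p = {p_t:.3f}")
```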

In looking at the differences between means, I did not find a significant difference by gender or political views for federal support of scientific research. I expected this with gender because the means and confidence intervals overlapped. I thought that political views seemed to show a difference because the means and confidence intervals put the two extremes (liberal and conservative) in different response categories, but I couldn’t reject the null hypothesis. This may mean looking at additional independent variables, such as age or educational attainment, to determine whether differences exist in responses.



Confidence intervals

A 95% confidence interval is usually sufficient for sample estimates of population parameters in most opinion polls. If the level of confidence increases, then the confidence interval becomes less precise (it widens). Choosing between 95% and 99% is a trade-off between reducing risk (and increasing confidence) and the width of the interval.

For example, in the chapter exercises, Question #3 is about opinions on global warming. The data show that 589 out of the 1,511 surveyed felt global warming is a very serious problem. When calculating the proportion of respondents, the 95% confidence interval is 0.365 to 0.415, and the 99% interval is 0.356 to 0.424. In this particular case, the intervals have a very small difference. Is there a good rationale for choosing 95% versus 99%? I think the discussion context might matter here more than the numbers. A news article seems okay reporting the 95% confidence interval. A high-level administrator trying to make an evidence-based decision about federal environmental science investments might want to use the 99% (or even 99.9% for a large budget worth billions of dollars).
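These bounds can be checked by hand with the standard formula for a proportion, p̂ ± z·√(p̂(1−p̂)/n). This is my own arithmetic from the chapter’s counts; tiny differences from the textbook’s rounded values may appear:

```python
import math

successes, n = 589, 1511
p_hat = successes / n                                  # ≈ 0.390
se = math.sqrt(p_hat * (1 - p_hat) / n)                # standard error of the proportion

for conf, z in [("95%", 1.96), ("99%", 2.576)]:
    margin = z * se
    print(f"{conf}: {p_hat - margin:.3f} to {p_hat + margin:.3f}")
```

The 99% interval is wider simply because its z multiplier (2.576) exceeds the 95% multiplier (1.96).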

Two other factors affect the width of a confidence interval: sample size and sample standard deviation. A larger sample size increases the precision of the confidence interval (it narrows), but also increases the time and cost of the survey. A higher sample standard deviation decreases the precision (the interval widens). Both might affect the choice between reporting a 95% versus a 99% confidence interval, as well as the reporting venue.

Choosing a 99.9% confidence interval means that the researcher really wants to be as close to 100% as possible (since 100% doesn’t actually seem possible in estimation). Health decisions might fit within this category, such as estimating the spread of deadly diseases in the country, as well as decisions involving billions of dollars. If there’s a rule of thumb for using a certain percentage, I haven’t come across it and probably won’t, because it’s too dependent on context.

For an internet resource, I looked at Stattrek: I’ll admit what drew me in is the similarity to Star Trek. Unfortunately, the website doesn’t use science fiction references or problem examples. The estimation section includes more on margin of error than our textbook, such as finding the critical value and expressing it as a t score (or z score). The section on confidence intervals is relatively short, simple, and compatible with our textbook. The estimation problems also go further than our text with regression slopes and calculating differences (between proportions, means, and matched data pairs).

Week 7: sampling distributions

I started by looking at the Khan Academy videos on sampling distributions; however, the first set of videos (sample proportions) is not covered by our textbook, so I skipped ahead to the sample means section. The central limit theorem video presents a simplified explanation similar to the textbook’s. The sampling distribution of the sample mean video continues by illustrating the central limit theorem and also adds definitions of skew (covered in our textbook under measures of central tendency) and kurtosis (a new concept). The Khan Academy videos don’t take theorems for granted and instead attempt to demonstrate them with examples using simple math written on a virtual chalkboard, such as how increasing the sample size makes the sampling distribution more normal. Although the textbook provides much more detail in the space of several pages, these videos spend about ten minutes going in-depth on one concept. Khan tends towards using variance (while explaining standard deviation as its square root), which may be an individual preference. In addition, the videos tend to summarize and reinforce prior concepts.
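A quick simulation makes the central limit theorem concrete: sample means drawn from even a skewed population pile up around the population mean, with spread shrinking as σ/√n. This is my own sketch, not an example from the videos:

```python
import numpy as np

rng = np.random.default_rng(42)
population_mean, population_sd = 1.0, 1.0   # exponential(scale=1) is heavily skewed

for n in (5, 30, 200):
    # 10,000 samples of size n, reduced to their sample means
    sample_means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:3d}: mean of means = {sample_means.mean():.3f} "
          f"(population mean {population_mean}), "
          f"SD of means = {sample_means.std():.3f}, "
          f"sigma/sqrt(n) = {population_sd / np.sqrt(n):.3f}")
```

As n grows, the observed standard deviation of the sample means tracks σ/√n closely, which is exactly the standard error relationship the videos demonstrate.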

The Khan Academy’s lessons are organized differently than the textbook. For example, I found some videos about constructing probability distributions in the random variable section. Within the sampling distribution section, he adds confidence intervals (the next chapter in our textbook), shows the difference between two sample mean distributions (whose variance is the sum of their variances), and covers sample proportions (where I had to look up Bernoulli). Sample proportions seemed confusing to me, especially because the textbook does not cover them. I also found some discussion of sampling methods in a sampling and surveys section.

Then, I took a look at an academic website, online lessons by Penn State, in the hope that it might match our textbook better. This resource also turned out to have videos demonstrating examples in a stats program (Statkey rather than SPSS). It also had a section on sample proportions and an introduction to the t distribution (similar to the standard normal distribution but with heavier tails that depend on the degrees of freedom). In comparison, I think the Khan Academy videos make the subject more approachable for everyone without losing too much information, although they make a better starting point than an ending.

Week 4-5 social stats

Using the GSS2014 data file, I came across two variables related to astrology:

  1. Do you ever read a horoscope or your personal astrology report?
    • Answers: yes or no. Other responses: don’t know, no answer, or not applicable.
  2. Would you say that astrology is very scientific, sort of scientific, or not at all scientific?
    • Scale: very scientific, sort of scientific, or not at all scientific. Other responses: don’t know, no answer, or not applicable.

Working backwards (having found my variables first), my research questions might ask: How popular is pseudoscience in the United States? Do Americans tend to recognize the difference between science and pseudoscience? More specifically, how does a person’s belief in astrology relate to their belief in science? I remember seeing the results of a poll in a news article indicating that some non-zero number of people mistook astrology for astronomy. The ramifications of this question include public understanding of science as well as public support for science and scientific research. This topic may also have relevance to beliefs on larger and more controversial issues, such as climate change.


For the first variable, slightly more people responded that they do not read a horoscope or personal astrology report (55.5% to 44.5%). For the second variable, 32.6% responded either very or sort of scientific (395 responses), while 67.4% responded that astrology is not at all scientific (818 responses). Presumably, those who called astrology scientific come largely from the horoscope readers, since it seems very unlikely that someone who does not read astrology would answer sort of or very scientific. Both questions had a surprisingly high “not applicable” response (marked as “missing”), slightly more than half of the total survey respondents.

Since the results should be generalizable, this comfortingly indicates that slightly more than half of the public completely ignores astrology. Of the astrology readers, roughly one-third believe that astrology has a little or a lot of basis in science, which may make them excellent targets for infomercials selling miracle products. This is fortunate for the manufacturers of modern-day snake oil and the astrology industry. However, two-thirds do not believe that astrology has a basis in science. Two-thirds is greater than the half from the first question, so some people read their horoscope but do not believe it has any basis in science (e.g. they read it for entertainment).

The first variable’s results indicate that astrology is only popular with about half of the population, which provides one case study answer for the first research question by looking at the popularity of astrology as a pseudoscience. The second variable’s results provide a partial answer to the second research question: a majority of people (67%) do not believe in the pseudoscience of astrology and can differentiate it from actual science. These variables don’t give enough information to look at how a person’s belief in astrology relates to their belief in science. It would be interesting to see what other things this subset of respondents believes is scientific and where they got the information that formed their beliefs.

Week 3 stats assignment

Research question 1: Where does the public predominantly choose to get their information, such as on health or medical topics? Print journalism currently seems to be at a crisis point, as newspapers have seen declining subscriptions and bankruptcy in recent years. Competitors include television/cable news and online sources. The Internet also allows alternate forms of news dissemination through non-journalistic sources such as social media and blogs.

I analyzed the results from the following question, asked about each area:

How much attention do you pay to information about health or medical topics from: The Internet? National or cable television news? Print newspapers? (Scale: A Lot, Some, A Little, None)


From the scaled responses, individuals paid a lot of attention to information from online sources (20%), more than three times the rate for television (6.1%) and five times that for print (4%). Slightly more individuals paid at least a little attention to online sources (81.5% cumulative) than television (78.9% cumulative) or print (71.9% cumulative).

This data seems to show that respondents chose to pay the most attention to online sources for information on health/medical topics and the least to print newspapers. Future research might look at other kinds of topics or ask a more general question.

Research question 2: How much do individuals trust doctors versus online sources? As an abundance of information is available online, individuals may choose to research their symptoms before going to a doctor. Once at the doctor’s office, patients may second guess their doctors to ask about various ailments or treatments that they found online. This is significant for possible effects on the quality of health care for both the doctor and patient.

I analyzed the results of the following questions:

In general, how much do you trust information about health or medical topics from: A Doctor? Internet? (Scale: A Lot, Some, A Little, Not At All)


From the scaled responses, far more individuals placed a lot of trust in doctors (68%) than in the Internet (18.3%). More respondents said they had some trust in online information (53.6%). 98.4% said they trusted information from doctors at least a little, while 91.0% trusted online information at least a little. More respondents chose not at all for online information (8.5%) than for doctors (1.5%).

This data seems to support that while medical information is easily available online, individuals still place higher trust in doctors than what they can research on the Internet. Future research might look at more specific questions, such as trusting information from doctors online (rather than encyclopedia type sites or ads), or perhaps trusting information from television sources.

Hello world!

I’m excited to begin this digital sociology program since my research interests have intersected with “new media” in a previous life as a historian. I currently work full-time for the federal government and live near Washington DC.

Fun fact about me: I love coffee and coffee-flavored foods (such as ice cream), but usually drink decaf ever since my first pregnancy. Two kids later, I’m still drinking decaf and feel good most of the time about not being caffeine dependent…

Numeric data has been difficult to find to support my research interests on public opinion because polling provides very specific quantitative information with limitations. For example, a poll asking whether you support increased spending on space results in a fairly straightforward answer (% yes/no), but not why or even how much. On the other hand, several published editorials and letters to the editor with detailed arguments would not indicate whether the majority of the public supports it (or even is paying attention to the issue). This gets to the contrast between deductive and inductive research.

My concerns about taking a social stats course would be 1) learning new software, 2) whether it is math intensive, in the context of not having taken math as an academic subject for a while, and 3) whether the first two factors make it very time intensive. I do hope to learn new skills and find out what kinds of different questions can be answered through statistical analysis.