We are going to continue working with data from the Gallup organization focusing on religious beliefs in the United States. Our article for today is from May of 2017; you may have noticed that quite a lot of Gallup data is collected in May! The article is called "In U.S., Belief in Creationist View of Humans at New Low." As the title suggests, the goal of this Gallup question was to estimate the percentage of Americans who hold a specific viewpoint on how human beings were created. The Creationist view is the belief that some being or power "created humans in their present form at some time within the last 10,000 years or so" (Gallup 2017).
In the United States, there is still a raging argument about how to discuss evolution in schools. Should educators present evolution as a theory, or fact? Should Creationism be discussed? What about Intelligent Design, or a myriad of other possibilities from different cultures and religions? We are not going to try to address this specific topic today, but we are going to use the data in the article to explore changes in American beliefs on human creation across time.
As we did in the last lab, we are going to begin by taking a look at the article and using it to answer a few starting questions.
As we read the article, we notice that participants in the survey were not merely asked "Do you hold the Creationist viewpoint?" with a yes or no answer. Instead, participants were presented with three choices and were asked which of the three aligned with their beliefs on human creation. The first option, which we will call Viewpoint 1, is that human beings did evolve, but a being or power guided the process. Of the given choices, this aligns most closely with the theory of Intelligent Design. The second option, which we will call Viewpoint 2, is that human beings evolved with no being or power guiding the process. This is generally termed the theory of evolution. The third option, which we will call Viewpoint 3, is the Creationist viewpoint.
In 2010, Gallup asked the same question, and presented the same answer choices, to a random sample of 1019 adults. In this sample, 38% of the participants chose Viewpoint 1, 16% chose Viewpoint 2, and 40% chose Viewpoint 3. 6% of the respondents chose not to respond. We are interested in determining if there is sufficient evidence to suggest that the breakdown of opinions on human creation in 2017 is different from the breakdown of opinion on human creation expressed in the 2010 survey. In other words, is the proportion breakdown from 2010 still a good representation of U.S. opinion in 2017?
In order to evaluate such a claim, we are going to use a hypothesis test, and specifically we are going to use a Chi-Square test.
Right now, our data is in the form of proportions. To run our test, we need to convert the data into counts. More specifically, we need to compute both the observed counts in each Viewpoint in 2017, and the expected counts.
Okay, so we need counts. We have seen examples in class of how to convert proportions to counts, but the process of individually converting each proportion to a count can be tedious. In R, we can do the conversion process much faster if we use vectors. Let's use the observed counts as an example. To get the observed counts in 2017, we know that we want to multiply each of the given proportions from 2017 by the number of participants in the 2017 survey, n.
Note: Include everyone in this count, including those who did not respond to the question posed by the researchers. Instead of multiplying each proportion (Viewpoint 1, Viewpoint 2, Viewpoint 3, and no response) by n one at a time, we can do the multiplication all at once by using a vector. The technique for doing this is shown in the code below. When you run it, be sure to replace n with the value you wrote in Question 9.
observedcount <- n*c(.38,.19,.38,.05);observedcount
Notice that the result gives you the observed counts for all four categories of interest at once!
Now we have the observed counts. To get our test statistic for a Chi-Square test, we also need the expected counts.
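Under the null hypothesis, opinion in 2017 follows the 2010 breakdown, so the expected counts come from multiplying the 2010 proportions by the 2017 sample size. Here is a minimal sketch of that step. It assumes n is 1011, the sample size used later in this lab when we build the education matrix; substitute your own answer to Question 9 if it differs. The name p2010 is ours, chosen only for illustration.

```r
# Assumption: n is the 2017 sample size (1011 is the value used later in this lab).
n <- 1011

# 2010 proportions, in the order Viewpoint 1, Viewpoint 2, Viewpoint 3, no response
p2010 <- c(.38, .16, .40, .06)

# Expected 2017 counts if the 2010 breakdown still held
expectedcount <- n * p2010
expectedcount
```

Because the four proportions sum to 1, the expected counts should sum to n, which is a quick way to catch a typing mistake.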
At this point, we have all the information we need to compute our test statistic. The test statistic uses the distance between the observed and the expected counts to help us decide how surprising it would be to see our observed counts if the null hypothesis were true. To compute the test statistic, we can use the code below.
chisq <- sum( (observedcount-expectedcount)^2/expectedcount); chisq
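To see where the test statistic comes from, it can help to look at each category's contribution before summing. The sketch below does this under the same assumption as before (n taken to be 1011; replace it with your own value). The name contribution is ours and is not part of the lab's code.

```r
# Assumption: n is the 2017 sample size (1011 is the value used later in this lab).
n <- 1011

# Observed counts use the 2017 proportions; expected counts use the 2010 proportions.
observedcount <- n * c(.38, .19, .38, .05)
expectedcount <- n * c(.38, .16, .40, .06)

# Each category's piece of the statistic: (observed - expected)^2 / expected
contribution <- (observedcount - expectedcount)^2 / expectedcount
names(contribution) <- c("Viewpoint1", "Viewpoint2", "Viewpoint3", "NoResponse")
contribution

# The test statistic is the sum of the four pieces
chisq <- sum(contribution)
chisq
```

Categories whose observed and expected counts are far apart contribute the most, so this view shows which viewpoint is driving the result.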
Once we have confirmed that the conditions for inference hold, we can plot the sampling distribution to get an idea of how unusual our test statistic would be if the null were true.
plot(dchisq(seq(0,10, by = 0.001),df= 3)~seq(0,10, by = 0.001),type = "l", xlab = "Possible Values of the Test Statistic", ylab = "")
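The sketch below pairs a quick check of the conditions with a version of the plot that marks our test statistic. It assumes n is 1011 and uses a common rule of thumb for the Chi-Square test, that every expected count is at least 5; your class may state the conditions differently. The dashed line is our addition for illustration.

```r
# Assumption: n is the 2017 sample size (1011 is the value used later in this lab).
n <- 1011
observedcount <- n * c(.38, .19, .38, .05)
expectedcount <- n * c(.38, .16, .40, .06)
chisq <- sum((observedcount - expectedcount)^2 / expectedcount)

# Rule-of-thumb condition (an assumption here): every expected count is at least 5
all(expectedcount >= 5)

# Degrees of freedom: number of categories minus 1
df <- length(observedcount) - 1

# Draw the Chi-Square density and mark the test statistic with a dashed line
curve(dchisq(x, df = df), from = 0, to = 10,
      xlab = "Possible Values of the Test Statistic", ylab = "")
abline(v = chisq, lty = 2)
```

The area under the curve to the right of the dashed line is the p-value.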
To obtain the p-value for our test, we use the following code, where we replace teststatistic with the actual value of our test statistic.
pchisq( teststatistic, df = 3, lower.tail = FALSE)
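As a cross-check on the hand computation, R's built-in chisq.test() can run the same goodness-of-fit test when it is given the observed counts and the null proportions through its p argument. This is a sketch under the same assumption as before (n taken to be 1011); the names p2010, pvalue, and check are ours.

```r
# Assumption: n is the 2017 sample size (1011 is the value used later in this lab).
n <- 1011
p2010 <- c(.38, .16, .40, .06)               # null proportions from 2010
observedcount <- n * c(.38, .19, .38, .05)   # observed 2017 counts
expectedcount <- n * p2010

# Hand computation of the statistic and its p-value
teststatistic <- sum((observedcount - expectedcount)^2 / expectedcount)
pvalue <- pchisq(teststatistic, df = 3, lower.tail = FALSE)
pvalue

# Built-in goodness-of-fit test against the same null proportions
check <- chisq.test(x = observedcount, p = p2010)
check$statistic
check$p.value
```

The two approaches should agree on both the statistic and the p-value; if they do not, look for a typing error in one of the proportion vectors.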
Now that we have examined the difference between 2010 and 2017, let's try a different approach. For 2017, we are provided a breakdown of opinion by education level. In other words, for each respondent, we know their highest level of education as well as their opinion on human origin. Is there sufficient evidence to claim that there is a relationship between opinion on human origin and education level?
The data needed to perform the test is presented in the table below.
Education Level | Viewpoint 1 | Viewpoint 2 | Viewpoint 3 | No Response |
---|---|---|---|---|
High School or Less | 33% | 12% | 48% | 7% |
Some College | 38% | 16% | 42% | 4% |
College graduate | 45% | 27% | 24% | 4% |
Postgraduate | 45% | 31% | 21% | 3% |
The data format above is called a matrix. A matrix is easy to draw on paper, but how do we create a matrix in R? For every other data set in this class, the data was already in a matrix form, and we just loaded it in from somewhere. Today, we are going to create the matrix ourselves.
The first step in creating our data is to build each of the rows.
HighSchool <- c(.33,.12,.48,.07)
SomeCollege <- c(.38,.16,.42,.04)
Collegegraduate <- c(.45,.27,.24,.04)
Postgraduate <- c(.45,.31,.21,.03)
Once we have all the rows created, we can bind them together to create our data matrix.
Data<- rbind( "HighSchool" = HighSchool, "SomeCollege" = SomeCollege, "Collegegraduate"= Collegegraduate, "Postgraduate" = Postgraduate)
At this point, type the name Data into a chunk and hit play. Check to make sure your matrix matches the table above!
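Besides comparing the matrix to the table by eye, we can let R help: each row holds the proportions for one education level, so each row should sum to 1. The sketch below rebuilds the matrix with the values from the table and checks the row sums. The column names are our addition for readability and are not required by the lab.

```r
# Rows of proportions, taken from the table above
HighSchool      <- c(.33, .12, .48, .07)
SomeCollege     <- c(.38, .16, .42, .04)
Collegegraduate <- c(.45, .27, .24, .04)
Postgraduate    <- c(.45, .31, .21, .03)

# Bind the rows into a matrix; rbind() uses the object names as row names
Data <- rbind(HighSchool, SomeCollege, Collegegraduate, Postgraduate)

# Optional column labels (our addition, for readability)
colnames(Data) <- c("Viewpoint1", "Viewpoint2", "Viewpoint3", "NoResponse")
Data

# Each row of proportions should sum to 1
rowSums(Data)
```

A row sum that is not 1 usually points to a mistyped value in that row.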
The last thing to do is to convert from proportions to counts by multiplying by the sample size.
Data<-Data*1011
We now have a data set called Data, and we are ready to run our test on this data set. In R, this is extremely simple: we only need one line of code! The function we need to use is chisq.test(). Inside the parentheses, place the name of the data set you want to test, and then hit play.
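Putting the steps together, the sketch below builds the matrix, converts it to counts the same way the lab does (multiplying every entry by 1011), and runs the test. Storing the output in an object, here called result (our name), lets us pull out the individual pieces.

```r
# Proportions by education level, taken from the table above
Data <- rbind(
  HighSchool      = c(.33, .12, .48, .07),
  SomeCollege     = c(.38, .16, .42, .04),
  Collegegraduate = c(.45, .27, .24, .04),
  Postgraduate    = c(.45, .31, .21, .03)
)

# Convert to counts, following the lab's approach of multiplying by 1011
Data <- Data * 1011

# Run the Chi-Square test on the two-way table of counts
result <- chisq.test(Data)

result$statistic  # the test statistic
result$parameter  # degrees of freedom: (rows - 1) * (columns - 1)
result$p.value    # the p-value
result$expected   # expected counts under no relationship
```

Looking at result$expected is a convenient way to check the expected-count condition for this test.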