STA 111 Lab 9: Chi Square

As we read the article, we notice that participants in the survey were not merely asked "Do you hold the Creationist viewpoint?" with a yes or no answer. Instead, participants were presented with three choices and were asked which of the three aligned with their beliefs on human creation. The first option, which we will call Viewpoint 1, is that human beings did evolve, but a being or power guided the process. Of the given choices, this aligns most closely with the theory of Intelligent Design. The second option, which we will call Viewpoint 2, is that human beings evolved with no being or power guiding the process. This is generally termed evolution theory. The third option, which we will call Viewpoint 3, is the Creationist viewpoint.

Changing Times

In 2010, Gallup asked the same question, and presented the same answer choices, to a random sample of 1019 adults. In this sample, 38% of the participants chose Viewpoint 1, 16% chose Viewpoint 2, and 40% chose Viewpoint 3. 6% of the respondents chose not to respond. We are interested in determining if there is sufficient evidence to suggest that the breakdown of opinions on human creation in 2017 is different from the breakdown of opinion on human creation expressed in the 2010 survey. In other words, is the proportion breakdown from 2010 still a good representation of U.S. opinion in 2017?

In order to evaluate such a claim, we are going to use a hypothesis test, and specifically we are going to use a Chi-Square test.

How do we know we need a \(\chi^2\) test here, rather than a one- or two-sample proportion test?
In order to determine if the proportion breakdown from 2010 is a good representation of U.S. opinion in 2017, do we need to perform a Goodness of Fit test, or a Test of Independence? Explain your reasoning.

Right now, our data is in the form of proportions. To run our test, we need to convert the data into counts. More specifically, we need to compute both the observed counts in each Viewpoint in 2017, and the expected counts.

What does expected count mean in the context of this question? The counts of what under what assumption? You do not need to provide numerical values here, just reasoning.

Okay, so we need counts. We have seen examples in class of how to convert proportions to counts, but the process of individually converting each proportion to a count can be tedious. In R, we can do the conversion process much faster if we use vectors. Let's use the observed counts as an example. To get the observed counts in 2017, we know that we want to multiply each of the given proportions from 2017 by the number of participants in the 2017 survey.

How many individuals participated in the 2017 survey? Call this value n. Note: Include everyone in this count, including those who did not respond to the question posed by the researchers.

Instead of multiplying each proportion (Viewpoint 1, Viewpoint 2, Viewpoint 3, and no response) by n one at a time, we can do the multiplication all at once by using a vector. The technique for doing this is shown in the code below. When you run it, be sure to replace n with the value you wrote in Question 9.

# Multiply all four 2017 proportions by the sample size n in one step, then print
observedcount <- n * c(.38, .19, .38, .05); observedcount

You will note that your result will give you the observed counts for all of the categories of interest!
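As a minimal sketch of the same vectorized multiplication, the example below uses a made-up survey. The names `n_example`, `props_example`, and `counts_example` and all of the numbers are hypothetical and are not the Gallup values. It also shows one way to round the counts and confirm that they still add up to the sample size.

```r
# Hypothetical survey: 120 respondents split across three categories
n_example <- 120
props_example <- c(.5, .3, .2)

# One multiplication converts every proportion to a count
counts_example <- round(n_example * props_example)
counts_example

# Rounded counts should still total the sample size
sum(counts_example) == n_example
```

If the final check returns FALSE for your own data, adjust the rounding of one category so that the counts add up to the sample size.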

What is the observed count of individuals in each of the 4 response categories (the three viewpoints and no response)? Remember that we round count data, and that the total count needs to be the same as the sample size.

Now we have the observed counts. To get our test statistic for a Chi-Square test, we also need the expected counts.

Using the same code structure as we did for observed counts, compute the expected counts for 2017. Show your code and the counts in your response. Note: You will need to change more than the name. If you get stuck, try filling in these blanks: "To get the expected counts in 2017, we know that we want to multiply each of the given proportions from the year ____ by the number of participants in the 2017 survey."

At this point, we have all the information we need to compute our test statistic. The test statistic uses the distance between the observed and the expected counts to help us decide how surprising it would be to see our observed counts if the null hypothesis were true. To compute the test statistic, we can use the code below.

chisq <- sum( (observedcount-expectedcount)^2/expectedcount); chisq
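To see what this line computes, here is a sketch on made-up numbers. The counts in `obs_example` and the proportions in `null_props` are hypothetical, not the survey data. The statistic is the sum, over all categories, of the squared difference between observed and expected counts, divided by the expected count. As a cross-check, `chisq.test()` called with a vector of counts and the `p` argument should report the same value.

```r
# Hypothetical observed counts and null proportions (not the Gallup data)
obs_example <- c(48, 35, 17)
null_props <- c(.5, .3, .2)

# Expected counts under the null: null proportions times the sample size
exp_example <- sum(obs_example) * null_props

# Sum of (observed - expected)^2 / expected over all categories
stat_by_hand <- sum((obs_example - exp_example)^2 / exp_example)
stat_by_hand

# chisq.test() with the p argument performs the same calculation
stat_builtin <- unname(chisq.test(x = obs_example, p = null_props)$statistic)
all.equal(stat_by_hand, stat_builtin)
```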

What is the test statistic? Based on this result, do you think we are likely to reject the null, not likely to reject the null, or can you not tell at this stage? Explain.

Check the conditions necessary for using the chi squared distribution as our sampling distribution.

Once we have confirmed that the conditions for inference hold, we can plot the sampling distribution to get an idea of how unusual our test statistic would be if the null were true.

plot(dchisq(seq(0,10, by = 0.001),df= 3)~seq(0,10, by = 0.001),type = "l", xlab = "Possible Values of the Test Statistic", ylab = "")
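One possible variation on this plot, sketched below, builds the grid of values once and marks a test statistic on the curve with a vertical line. The value in `stat_example` is hypothetical and is not the answer to this lab. The names `x_grid` and `dens` are also just illustrative.

```r
# Grid of possible test statistic values, built once and reused
x_grid <- seq(0, 10, by = 0.001)
dens <- dchisq(x_grid, df = 3)

plot(dens ~ x_grid, type = "l",
     xlab = "Possible Values of the Test Statistic", ylab = "")

# Hypothetical test statistic, marked with a dashed vertical line
stat_example <- 6.2
abline(v = stat_example, lty = 2)
```

The area under the curve to the right of the dashed line is the quantity the p-value measures.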

Describe the modality and skew of this \(\chi^2\) distribution. Based on the graph, how unusual does our test statistic seem under the null? Explain.

To obtain the p-value for our test, we use the following code, where we replace teststatistic with the actual value of our test statistic.

pchisq( teststatistic, df = 3, lower.tail = FALSE)
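As a quick illustration of what `lower.tail = FALSE` does, the sketch below uses a hypothetical test statistic chosen so that the answer is known in advance: `qchisq(0.95, df = 3)` is the value that cuts off the top 5% of the distribution, so its upper-tail area should be 0.05. The names `stat_example`, `p_upper`, and `p_complement` are illustrative only.

```r
# Hypothetical test statistic: the value that cuts off the top 5% when df = 3
stat_example <- qchisq(0.95, df = 3)

# Upper-tail area, using the same form of call as the lab
p_upper <- pchisq(stat_example, df = 3, lower.tail = FALSE)

# Same area written as one minus the lower tail
p_complement <- 1 - pchisq(stat_example, df = 3)

c(p_upper, p_complement)
```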

Based on the p-value above, what do you conclude? State your results in the context of the question of interest.

Does Education Level Make a Difference?

Now that we have examined the difference between 2010 and 2017, let's try a different approach. For 2017, we are provided a breakdown of opinion by education level. In other words, for each respondent, we know their highest level of education as well as their opinion on human origin. Is there sufficient evidence to claim that there is a relationship between opinion on human origin and education level?

To respond to this question, do we need to perform a Goodness of Fit test, or a Test of Independence? Explain your reasoning. State the hypotheses for your test.

The data needed to perform the test is presented in the table below.

	Viewpoint 1	Viewpoint 2	Viewpoint 3	No Response
High School or Less	33%	12%	48%	7%
Some College	38%	16%	42%	4%
College graduate	45%	27%	24%	4%
Postgraduate	45%	31%	21%	3%

The data format above is called a matrix. A matrix is easy to draw on paper, but how do we create a matrix in R? For every other data set in this class, the data was already in a matrix form, and we just loaded it in from somewhere. Today, we are going to create the matrix ourselves.

The first step in creating our data is to build each of the rows.

HighSchool <- c(.33,.12,.48,.07)

SomeCollege <- c(.38,.16,.42,.04)

Collegegraduate <- c(.45,.27,.24,.04)

Postgraduate <- c(.45,.31,.21,.03)

Once we have all the rows created, we can bind them together to create our data matrix.

Data<- rbind( "HighSchool" = HighSchool, "SomeCollege" = SomeCollege, "Collegegraduate"= Collegegraduate, "Postgraduate" = Postgraduate)

At this point, type the name Data into a chunk and hit play. Check to make sure your matrix matches the table above!
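A small sketch of the same construction on made-up proportions may help with checking. The names `GroupA`, `GroupB`, and `Example` and all of the numbers are hypothetical. Adding column names and inspecting the row sums are optional steps, not something the lab requires, but a row of proportions that does not sum to 1 is a quick sign of a typing error.

```r
# Two hypothetical rows of proportions (not the Gallup values)
GroupA <- c(.50, .30, .20)
GroupB <- c(.25, .25, .50)

# rbind() stacks the vectors as rows; the names become row names
Example <- rbind("GroupA" = GroupA, "GroupB" = GroupB)

# Column names are optional but make the printed matrix easier to read
colnames(Example) <- c("Option 1", "Option 2", "Option 3")
Example

# Each row of proportions should add up to 1; a typo shows up here
rowSums(Example)
```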

The last thing to do is to convert from proportions to counts by multiplying by the sample size.

Data<-Data*1011

We now have a data set called Data, and we are ready to run our test on this data set. In R, this is extremely simple - we only need one line of code!! The function we need to use is chisq.test(). Inside of the parentheses you will place the name of the data set you want to test, and then hit play.
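For a sense of what the function returns, here is a sketch on a small made-up table of counts. `Counts` and its values are hypothetical, not the Gallup data. Storing the result in an object, here called `result`, lets you pull out the individual pieces of the output.

```r
# Hypothetical 2 x 3 table of counts (not the Gallup data)
Counts <- rbind("GroupA" = c(30, 45, 25),
                "GroupB" = c(20, 35, 45))

result <- chisq.test(Counts)

result$statistic  # test statistic
result$parameter  # degrees of freedom: (rows - 1) * (columns - 1)
result$p.value    # p-value
result$expected   # expected counts under the null hypothesis
```

Looking at `result$expected` is a convenient way to check the conditions for using the chi-square distribution.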

Perform the appropriate chi-square test in R. Show your code as part of your answer, and state your conclusion in the context of the data.
Go to the very bottom of the Gallup article, and you will see links to other articles. You are also welcome to browse the categories at the top of the website. Choose any Gallup article from 2017 or 2018 that you like. State the article that you chose. Perform a chi-square test using the data. State which test you chose (Goodness of Fit or Test of Independence). State your hypotheses, show your code, and state your conclusion in the context of the data.
How long did it take you to complete this lab?

This work created by Nicole Dalzell is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Last updated 2018 December 4.

The css file used to format this lab was retrieved from the GitHub of Mine Çetinkaya-Rundel, version 2016 Jan 13.
