Gallup is an organization that conducts extensive polls on societal opinions, political views, and more. In December 2017, Gallup released an article called The 2017 Update on Americans and Religion. The article explored the relationship between religion and political party, the degree to which Americans considered themselves religious, and the proportion of Americans identifying with particular religions. In this lab, we are going to use the survey results to explore changes in how Americans self-identify when it comes to their religious beliefs.
While we usually start our process with EDA, in this case, Gallup has done this work for us. Accordingly, we are going to begin by taking a look at the article and using it to answer the following questions.
Early in this course, we talked about the need to confirm the reliability of your data before using it to make any conclusions. One thing to look for to assess the validity of supplied data is information like margins of error, sampling methods, and sample sizes. If this information is provided, this lends more support to the claim that the data presented can be safely used. It also allows us to assess any potential biases that may result from the data collection methods.
The Gallup poll provides sample statistics, that is, calculations made from the sample. We are more interested in what information this sample can provide us about population parameters. Based on the survey results, we are able to determine what proportion of people in the sample reported being highly religious. Our goal is to estimate the proportion of adults in the United States who would report being highly religious. As we have done for population means, we are going to use confidence intervals and hypothesis tests to take information from the sample and make conclusions about the population.
Confidence intervals require both a point estimate and a margin of error.
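As a minimal sketch of how the two pieces fit together, the code below builds intervals of the form point estimate ± margin of error at three confidence levels. The values of p_hat and n are hypothetical placeholders chosen for illustration, not figures from the Gallup article.

```r
# Hypothetical inputs (NOT taken from the Gallup article)
p_hat <- 0.40   # sample proportion (point estimate)
n     <- 1000   # sample size

# Standard error of a sample proportion
se <- sqrt(p_hat * (1 - p_hat) / n)

# Critical values for 90%, 95%, and 99% confidence
conf_levels <- c(0.90, 0.95, 0.99)
z_star <- qnorm(1 - (1 - conf_levels) / 2)

# Margin of error = critical value * standard error
me <- z_star * se

# Interval = point estimate +/- margin of error
intervals <- data.frame(
  conf_level = conf_levels,
  lower      = p_hat - me,
  upper      = p_hat + me
)
intervals
```

The point estimate is the same in every row; only the critical value, and therefore the margin of error, changes with the confidence level.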
We know that we can build a variety of different confidence intervals by changing the critical value, and hence changing the margin of error. It turns out that more than just the critical value impacts the margin of error. We have seen in class that sample size also impacts the margin of error, but that's not all, either. Imagine you've set out to survey 1000 people on two questions: "Is your favorite color blue?" and "Are you left-handed?" Since both of these sample proportions were calculated from the same sample size, they should have the same margin of error, right? Wrong! While the margin of error does change with sample size, it is also affected by the proportion.
Think back to the formula for the standard error: \(SE = \sqrt{p(1-p)/n}\). This is then used in the formula for the margin of error for a 95% confidence interval: \(ME = 1.96\times SE = 1.96\times\sqrt{p(1-p)/n}\). Since the population proportion \(p\) is in this \(ME\) formula, it should make sense that the margin of error is in some way dependent on the population proportion. We can visualize this relationship by creating a plot of \(ME\) vs. \(p\).
The first step is to make a vector p that is a sequence from \(0\) to \(1\) with each number separated by \(0.01\). This is our vector of proportions. We can then create a vector of the margin of error (me) associated with each of these values of p using the familiar approximate formula (\(ME = 2 \times SE\)). Lastly, we plot the two vectors against each other to reveal their relationship.
n <- 1000
p <- seq(0, 1, 0.01)
me <- 2*sqrt(p*(1 - p)/n)
plot(me ~ p)
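To put numbers on the curve, the sketch below reuses the same approximate formula and reports where the margin of error peaks. The helper me_at and the proportions 0.10 and 0.50 are illustrative assumptions; they stand in for the two survey questions and are not real survey results.

```r
# Approximate 95% margin of error (2 * SE), same formula as the plot above
me_at <- function(p, n) 2 * sqrt(p * (1 - p) / n)

n  <- 1000
p  <- seq(0, 1, 0.01)
me <- me_at(p, n)

# Proportion at which the margin of error is largest
p_max <- p[which.max(me)]
p_max

# Hypothetical proportions for the two questions (illustrative only)
me_two <- me_at(c(left_handed = 0.10, blue = 0.50), n)
me_two
```

With the sample size held fixed, the proportion closer to 0.5 has the larger margin of error, and the margin shrinks toward zero as the proportion approaches 0 or 1.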
To explore this a little, let's compute two proportions using our same article.
The Gallup poll has been conducted for multiple years, meaning that it is a longitudinal survey, or a survey with results that can be traced over time. To see trends over time, take a look at this article. We are going to use this to compare two different proportions.
We would like to build a confidence interval for the change in the proportion of adults who self-identify as Protestant in 2000 versus 2017. We could of course do this by hand, but R makes our life easier. To perform either a one-sample or two-sample proportion test, or to build confidence intervals, in R, we use the prop.test function.
Before we can use the prop.test function, we have to do one quick step. The function requires us to give counts rather than sample proportions. Luckily, it is easy to convert from one to the other. If we have a sample size and a sample proportion, we simply multiply the two to get the sample count.
count2017 <- 126965 * .38
count2000 <- 126965 * .52
Now we are ready to use the function to build our confidence interval.
prop.test(x = c(count2000,count2017), n = c(126965, 126965), alternative = "two.sided", conf.level = .95 )
Let's go through the arguments of this function.
x = c(count2000,count2017) gives the two counts that we are interested in comparing. Because of the order we have chosen, this function will test \(p_{2000}-p_{2017}\).
n = c(126965,126965) tells R the sample sizes for the 2000 and 2017 surveys. In this case, the sample sizes are the same.
alternative tells R what our alternative hypothesis is: less than ("less"), greater than ("greater"), or not equal to ("two.sided") the value specified in the null hypothesis. If we are only building a confidence interval, as we are here, we should set alternative = "two.sided".
conf.level is the confidence level used to define our confidence interval.
Now that we understand the structure, let's see the results.
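As a cross-check, the same kind of interval can be computed by hand and compared with what prop.test reports. This is only a sketch under two assumptions: the counts are rounded to whole numbers, and correct = FALSE is passed so that prop.test skips its default continuity correction and matches the plain formula. The variable names are illustrative.

```r
# Sample sizes and proportions used in this lab
n_both <- c(126965, 126965)          # 2000, 2017
x_both <- round(n_both * c(0.52, 0.38))  # counts, rounded to whole people
p_hat  <- x_both / n_both

# By hand: difference in proportions +/- z* times its standard error
diff_hat <- p_hat[1] - p_hat[2]
se_diff  <- sqrt(sum(p_hat * (1 - p_hat) / n_both))
z_star   <- qnorm(0.975)             # critical value for 95% confidence
ci_hand  <- diff_hat + c(-1, 1) * z_star * se_diff

# Using prop.test without the continuity correction
res  <- prop.test(x = x_both, n = n_both,
                  alternative = "two.sided",
                  conf.level = 0.95, correct = FALSE)
ci_r <- as.numeric(res$conf.int)

ci_hand
ci_r
```

With the default correct = TRUE, as in the call above, the interval from prop.test is slightly wider than the hand calculation because of the continuity correction.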