These are sample statistics. The data came from polls, which do not cover the whole population of each country.
Data must be random samples. The observations within each sample must be independent.
Whether it is a reasonable assumption depends on whether we have confidence in the pollster to do the sampling correctly.
I suppose it is reasonable to assume that the WIN-Gallup network of leading pollsters took truly random samples.
Turn your attention to Table 6 (pages 15 and 16), which reports the sample size and response percentages for all 57 countries. While this is a useful format to summarize the data, we will base our analysis on the original data set of individual responses to the survey. Load this data set into R with the following commands.
download.file("http://www.openintro.org/stat/data/atheism.RData", destfile = "atheism.RData")
load("atheism.RData")
Each row in Table 6 corresponds to a country and the percentage of respondents for each category.
Each row in atheism corresponds to one respondent’s answer in a particular country and year.
The proportion of atheist responses in atheism is 4.99%. This agrees with the value in Table 6, which is 5%.
us12 <- subset(atheism, nationality == "United States" & year == "2012")
sum(us12$response == "non-atheist")
## [1] 952
sum(us12$response == "atheist")
## [1] 50
sum(us12$response == "atheist")/nrow(us12)
## [1] 0.0499002
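The difference between a proportion (atheists out of all respondents) and odds (atheists per non-atheist) is easy to blur, so here is a minimal sketch that computes both. It assumes a stand-in vector rebuilt from the two counts printed above (50 and 952) rather than the downloaded data; the variable names are illustrative only.

```r
# Stand-in for us12$response, rebuilt from the counts printed above (assumption)
response <- c(rep("atheist", 50), rep("non-atheist", 952))

n_atheist <- sum(response == "atheist")
n_total   <- length(response)

# Proportion: atheists divided by ALL respondents
prop_atheist <- n_atheist / n_total
# Odds: atheists divided by non-atheists only
odds_atheist <- n_atheist / (n_total - n_atheist)

print(round(c(proportion = prop_atheist, odds = odds_atheist), 4))
```

Only the proportion is directly comparable with the percentage reported in Table 6.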
As was hinted at in Exercise 1, Table 6 provides statistics, that is, calculations made from the sample of 51,927 people. What we’d like, though, is insight into the population parameters. You answer the question “What proportion of people in your sample reported being atheists?” with a statistic, while the question “What proportion of people on earth would report being atheists?” is answered with an estimate of the parameter.
The inferential tools for estimating a population proportion are analogous to those used for means in the last chapter: the confidence interval and the hypothesis test.
No, I am not confident about the second condition. Participation in polls is voluntary, so it is harder to get a truly representative sample. The polling methods (in person, phone, or email) also yield different response rates and might bias the selection.
If the conditions for inference are reasonable, we can either calculate the standard error and construct the interval by hand, or allow the inference function to do it for us.
inference(us12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
Note that since the goal is to construct an interval estimate for a proportion, it’s necessary to specify what constitutes a “success”, which here is a response of “atheist”.
Although formal confidence intervals and hypothesis tests don’t show up in the report, suggestions of inference appear at the bottom of page 7: “In general, the error margin for surveys of this kind is ± 3-5% at 95% confidence”.
Margin of error = critical value (z) * SE = 1.96 * 0.0069 = 0.0135, or 1.35%
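The same numbers can be reproduced by hand. The following is a minimal sketch, assuming only the counts reported in the inference output above (50 successes out of n = 1002); the variable names are illustrative and not part of the lab.

```r
# Counts taken from the inference() output above
x <- 50      # number of "atheist" responses
n <- 1002    # sample size

p_hat  <- x / n                          # sample proportion
se     <- sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
z_star <- qnorm(0.975)                   # critical value for 95% confidence
me     <- z_star * se                    # margin of error
ci     <- p_hat + c(-1, 1) * me          # lower and upper bounds

print(round(c(p_hat = p_hat, se = se, me = me, lower = ci[1], upper = ci[2]), 4))
```

Using qnorm(0.975) instead of the rounded 1.96 changes the result only beyond the fourth decimal place.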
saudi12 <- subset(atheism, nationality == "Saudi Arabia" & year == "2012")
inference(saudi12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.05 ; n = 500
## Check conditions: number of successes = 25 ; number of failures = 475
## Standard error = 0.0097
## 95 % Confidence interval = ( 0.0309 , 0.0691 )
Saudi Arabia
Margin of error = 1.96 * 0.0097 = 0.0190
p-hat = 0.05, n = 500, np = 25 (met), n(1-p) = 475 (met)
95% CI: 0.05 +/- 0.019, or (0.031, 0.069)
turkey12 <- subset(atheism, nationality == "Turkey" & year == "2012")
inference(turkey12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0203 ; n = 1032
## Check conditions: number of successes = 21 ; number of failures = 1011
## Standard error = 0.0044
## 95 % Confidence interval = ( 0.0117 , 0.029 )
Turkey
Margin of error = 1.96 * 0.0044 = 0.0086
p-hat = 0.0203, n = 1032, np = 21 (met), n(1-p) = 1011 (met)
95% CI: 0.0203 +/- 0.0086, or (0.0117, 0.0289)
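Since the same calculation is repeated for each country, it can be wrapped in a small function. This is a sketch only: prop_ci is a hypothetical helper written for illustration (it is not the lab's inference function), and it takes the success counts and sample sizes reported in the output above.

```r
# Hypothetical helper: normal-approximation confidence interval for a proportion
prop_ci <- function(x, n, conf = 0.95) {
  p_hat  <- x / n
  se     <- sqrt(p_hat * (1 - p_hat) / n)
  z_star <- qnorm(1 - (1 - conf) / 2)
  me     <- z_star * se
  c(p_hat = p_hat, se = se, me = me, lower = p_hat - me, upper = p_hat + me)
}

# Counts taken from the inference() output for each country
saudi_ci  <- prop_ci(x = 25, n = 500)
turkey_ci <- prop_ci(x = 21, n = 1032)

print(round(rbind(`Saudi Arabia` = saudi_ci, Turkey = turkey_ci), 4))
```

The bounds should match the intervals reported by inference up to rounding.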
Imagine you’ve set out to survey 1000 people on two questions: are you female? and are you left-handed? Since both of these sample proportions were calculated from the same sample size, they should have the same margin of error, right? Wrong! While the margin of error does change with sample size, it is also affected by the proportion.
Think back to the formula for the standard error: $SE = \sqrt{p(1-p)/n}$. This is then used in the formula for the margin of error for a 95% confidence interval: $ME = 1.96 \times SE = 1.96 \times \sqrt{p(1-p)/n}$. Since the population proportion p is in this ME formula, it should make sense that the margin of error is in some way dependent on the population proportion. We can visualize this relationship by creating a plot of ME vs. p.
The first step is to make a vector p that is a sequence from 0 to 1 with each number separated by 0.01. We can then create a vector of the margin of error (me) associated with each of these values of p using the familiar approximate formula (ME=2×SE). Lastly, we plot the two vectors against each other to reveal their relationship.
n <- 1000
p <- seq(0, 1, 0.01)
me <- 2 * sqrt(p * (1 - p)/n)
plot(me ~ p, ylab = "Margin of Error", xlab = "Population Proportion")
This is a symmetrical curve: the margin of error increases with the proportion up to p = 0.5, where it peaks, and decreases by the same amount beyond that. It is zero at p = 0 and p = 1.
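To put numbers on the female versus left-handed example, here is a short sketch. The proportions 0.5 (female) and 0.1 (left-handed) are assumed values chosen for illustration, not figures from the lab, and margin_of_error is a hypothetical helper.

```r
# Hypothetical helper: 95% margin of error for a proportion p and sample size n
margin_of_error <- function(p, n, z = 1.96) {
  z * sqrt(p * (1 - p) / n)
}

n <- 1000
me_female <- margin_of_error(0.5, n)  # assumed proportion of women
me_left   <- margin_of_error(0.1, n)  # assumed proportion of left-handers

# Locate the proportion at which the margin of error peaks on a 0.01 grid
p_grid <- seq(0, 1, 0.01)
p_max  <- p_grid[which.max(margin_of_error(p_grid, n))]

print(round(c(me_female = me_female, me_left = me_left, p_at_max = p_max), 4))
```

With the same sample size, the margin of error for the proportion near 0.5 is noticeably larger than for the proportion near 0.1.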
The textbook emphasizes that you must always check conditions before making inference. For inference on proportions, the sample proportion can be assumed to be nearly normal if it is based upon a random sample of independent observations and if both np≥10 and n(1−p)≥10. This rule of thumb is easy enough to follow, but it makes one wonder: what’s so special about the number 10?
The short answer is: nothing. You could argue that we would be fine with 9 or that we really should be using 11. The “best” value for such a rule of thumb is, at least to some degree, arbitrary. However, when np and n(1−p) both reach 10, the sampling distribution is sufficiently normal to use confidence intervals and hypothesis tests that are based on that approximation.
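One way to read the rule of thumb is as a minimum sample size for a given proportion: n must be at least 10 divided by the smaller of p and 1 − p. The sketch below assumes that reading; min_n_for_conditions is a hypothetical helper, and the threshold is left as an argument because its value is a convention.

```r
# Hypothetical helper: smallest n for which both n*p and n*(1-p) reach the threshold
min_n_for_conditions <- function(p, threshold = 10) {
  ceiling(threshold / pmin(p, 1 - p))
}

# Proportions used in the simulations in this lab
needed <- min_n_for_conditions(c(0.1, 0.02))
print(needed)

# The requirement shifts if the conventional threshold is moved to 9 or 11
print(min_n_for_conditions(0.02, threshold = c(9, 11)))
```

For p = 0.1 the requirement is about 100 observations, and for p = 0.02 it is about 500, so rarer outcomes need much larger samples.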
We can investigate the interplay between n and p and the shape of the sampling distribution by using simulations. To start off, we simulate the process of drawing 5000 samples of size 1040 from a population with a true atheist proportion of 0.1. For each of the 5000 samples we compute p̂ and then plot a histogram to visualize their distribution.
p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)
for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats[i] <- sum(samp == "atheist")/n
}
hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))
summary(p_hats)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.07019 0.09327 0.09904 0.09969 0.10577 0.12981
sd(p_hats)
## [1] 0.009287382
This sampling distribution is approximately normal, centered at about 0.10, with values ranging from about 0.07 to 0.13 and a standard deviation of about 0.0093.
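The simulated standard deviation can be checked against the standard error formula. This sketch uses the p and n from the simulation and hard-codes the simulated value printed above, so it runs without repeating the simulation.

```r
p <- 0.1
n <- 1040

# Theoretical standard error of the sample proportion
se_theory <- sqrt(p * (1 - p) / n)

# Standard deviation of the 5000 simulated p-hats, copied from the output above
sd_simulated <- 0.009287382

print(round(c(theory = se_theory, simulated = sd_simulated,
              difference = sd_simulated - se_theory), 5))
```

The two values agree to about four decimal places, which is what we would expect if the simulation behaves as the formula predicts.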
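The three new sampling distributions are also approximately normal, but with different centers and spreads.
A larger n reduces the spread of the distribution.
A larger p (here 0.1 rather than 0.02) shifts the center to the right and increases the spread, so with the same number of simulated samples the histogram is wider and lower (lower frequencies per bin).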
# p = 0.1, n = 400
p <- 0.1
n <- 400
p_hats2 <- rep(0, 5000)
for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats2[i] <- sum(samp == "atheist")/n
}
# p = 0.02, n = 1040
p <- 0.02
n <- 1040
p_hats3 <- rep(0, 5000)
for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats3[i] <- sum(samp == "atheist")/n
}
# p = 0.02, n = 400
p <- 0.02
n <- 400
p_hats4 <- rep(0, 5000)
for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats4[i] <- sum(samp == "atheist")/n
}
par(mfrow = c(2, 2))
hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))
hist(p_hats2, main = "p = 0.1, n = 400", xlim = c(0, 0.18))
hist(p_hats3, main = "p = 0.02, n = 1040", xlim = c(0, 0.18))
hist(p_hats4, main = "p = 0.02, n = 400", xlim = c(0, 0.18))
mean(p_hats)
## [1] 0.09969
mean(p_hats2)
## [1] 0.099759
mean(p_hats3)
## [1] 0.01995423
mean(p_hats4)
## [1] 0.0198785
Once you’re done, you can reset the layout of the plotting window by using the command par(mfrow = c(1, 1)) or clicking on “Clear All” above the plotting window (if using RStudio). Note that the latter will get rid of all your previous plots.
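The four simulations repeat the same loop with different values of p and n, so they can be condensed. The sketch below is an alternative, not the lab's method: it swaps the sample() loop for rbinom, which draws the number of successes directly, and sim_p_hats is a hypothetical helper. The seed is arbitrary, so the numbers will not match the output above exactly.

```r
# Hypothetical helper: simulate `reps` sample proportions for given p and n
sim_p_hats <- function(p, n, reps = 5000) {
  rbinom(reps, size = n, prob = p) / n
}

set.seed(1)  # arbitrary seed, for reproducibility only

# Same four (p, n) combinations, in the same order as p_hats ... p_hats4
scenarios <- expand.grid(n = c(1040, 400), p = c(0.1, 0.02))
results <- lapply(seq_len(nrow(scenarios)), function(i) {
  sim_p_hats(scenarios$p[i], scenarios$n[i])
})

# Compare each simulated distribution with the theoretical standard error
scenarios$mean_p_hat <- sapply(results, mean)
scenarios$sd_p_hat   <- sapply(results, sd)
scenarios$theory_se  <- sqrt(scenarios$p * (1 - scenarios$p) / scenarios$n)

print(round(scenarios, 4))
```

Each simulated mean should sit close to its p, and each simulated standard deviation close to the theoretical standard error.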
Australia: p-hat = 0.1, n = 1040, np = 104 (met), n(1-p) = 936 (met)
Ecuador: p-hat = 0.02, n = 400, np = 8 (not met), n(1-p) = 392 (met)
For Australia, the conditions for inference are met. For Ecuador, the sample size is not large enough: the np >= 10 condition is not met. However, both simulated sampling distributions look approximately normal, and each is centered close to its true proportion. Since the distribution looks normal and the choice of “10” is somewhat arbitrary, it is reasonable to proceed with inference.
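The condition check can be written out as a small function. This is a sketch under stated assumptions: check_conditions is a hypothetical helper, and the skewness column uses the standard formula for a binomial distribution, (1 − 2p) / sqrt(np(1 − p)), as a rough numeric companion to judging the histograms by eye.

```r
# Hypothetical helper: success-failure check plus theoretical binomial skewness
check_conditions <- function(p, n, threshold = 10) {
  successes <- n * p
  failures  <- n * (1 - p)
  data.frame(
    p = p,
    n = n,
    successes = successes,
    failures = failures,
    met = successes >= threshold & failures >= threshold,
    skewness = (1 - 2 * p) / sqrt(n * p * (1 - p))
  )
}

# Australia (p = 0.1, n = 1040) and Ecuador (p = 0.02, n = 400)
res <- check_conditions(p = c(0.1, 0.02), n = c(1040, 400))
rownames(res) <- c("Australia", "Ecuador")
print(res)
```

The Ecuador row fails the check and has a larger skewness than the Australia row, which may be hard to see in a histogram; how much skew is tolerable remains a judgment call.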