To generalize the findings, we have to assume that participants were chosen randomly. Parts of the methodology were not random, but for such a large survey, important efforts were made to achieve an appropriate sample. According to WIN-Gallup International,
## nationality response year
## 1 Afghanistan non-atheist 2012
## 2 Afghanistan non-atheist 2012
## 3 Afghanistan non-atheist 2012
## 'data.frame': 88032 obs. of 3 variables:
## $ nationality: Factor w/ 57 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ response : Factor w/ 2 levels "atheist","non-atheist": 2 2 2 2 2 2 2 2 2 2 ...
## $ year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
Table 6 has columns for country, sample size, and the percentages of respondents within a country reporting “a religious person”, “not a religious person”, “A convinced atheist”, and “don’t know/no response”. Our data frame, “atheism”, contains three variables: “nationality”, “response”, and “year”. For response, we have only two categories: “atheist” and “non-atheist”.
us.people <- subset(atheism, nationality == "United States" & year == 2012)
us.people.atheist <- subset(us.people, response == "atheist")
us.people.non_atheist <- subset(us.people, response == "non-atheist")
# Proportion of atheists among the 2012 US respondents
nrow(us.people.atheist) / (nrow(us.people.non_atheist) + nrow(us.people.atheist))
## [1] 0.0499002
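The conditions for inference on a proportion are independence (a random sample that is less than 10% of the population) and an approximately normal sampling distribution (at least 10 successes and 10 failures). Respondents were chosen within countries by professional pollsters, and the more than 50,000 people surveyed are well below 10% of earth’s population, so we treat the observations as independent. With 50 atheists and 952 non-atheists among the 2012 US respondents, the success-failure condition is also met.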
inference(us.people$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
The margin of error for the US reported by R is +/- 0.0135.
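The margin of error can be reproduced by hand from the counts in the output. The sketch below assumes the interval is the usual normal-approximation interval, p_hat plus or minus z* times the standard error; the variable names are illustrative and not part of the lab code.

```r
# Counts taken from the inference() output for the US in 2012
successes <- 50
n <- 1002

p_hat <- successes / n
se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of a sample proportion
z_star <- qnorm(0.975)                # critical value for 95% confidence
me <- z_star * se                     # margin of error (half the interval width)
ci <- c(p_hat - me, p_hat + me)

round(c(se = se, me = me, lower = ci[1], upper = ci[2]), 4)
```

The rounded values should agree with the standard error and interval printed by `inference()`.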
china.people <- subset(atheism, nationality == "China" & year == 2012)
brazil.people <- subset(atheism, nationality == "Brazil" & year == 2012)
inference(china.people$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.47 ; n = 500
## Check conditions: number of successes = 235 ; number of failures = 265
## Standard error = 0.0223
## 95 % Confidence interval = ( 0.4263 , 0.5137 )
The margin of error for China reported by R is +/- 0.0437. The conditions for inference are met.
inference(brazil.people$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.01 ; n = 2002
## Check conditions: number of successes = 20 ; number of failures = 1982
## Standard error = 0.0022
## 95 % Confidence interval = ( 0.0056 , 0.0143 )
The margin of error for Brazil reported by R is +/- 0.00435. The conditions for inference are met.
n <- 1000
p <- seq(0, 1, 0.01)
me <- 2 * sqrt(p * (1 - p)/n)
plot(me ~ p, ylab = "Margin of Error", xlab = "Population Proportion")
The closer a proportion is to 50%, the higher the margin of error. The relationship is curved because the margin of error depends on the square root of p(1 - p), and it is symmetric around p = 0.5 because p(1 - p) takes the same value for p and 1 - p.
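The shape of the curve can also be checked numerically. This is a minimal sketch that reuses the formula from the plot (critical value rounded to 2, n = 1000); the function name `margin_of_error` is hypothetical.

```r
# Margin of error at roughly 95% confidence, using 2 as the critical value
# to match the plot; n = 1000 is the sample size used there
margin_of_error <- function(p, n = 1000) 2 * sqrt(p * (1 - p) / n)

p <- seq(0, 1, 0.01)
me <- margin_of_error(p)

p_at_max <- p[which.max(me)]                    # proportion with the largest margin
max_me <- max(me)                               # largest margin of error on the grid
is_symmetric <- isTRUE(all.equal(me, rev(me)))  # margin at p equals margin at 1 - p

c(p_at_max = p_at_max, max_me = max_me)
is_symmetric
```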
p <- 0.1
n <- 1040
p_hatsa <- rep(0, 5000)
# Draw 5000 samples of size n and record each sample proportion
for (i in 1:5000) {
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1 - p))
  p_hatsa[i] <- sum(samp == "atheist") / n
}
hist(p_hatsa, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))
The sampling distribution of the proportions appears to be approximately normal. It is centered around 0.09969, close to the true proportion of 0.1. The skewness, at 0.0567685, and the kurtosis, at -0.0911565, are both close to zero.
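One way to back up the visual impression is to compare the simulated center and spread with the values the normal approximation predicts. The sketch below is an assumption-laden variant of the simulation, not the lab's code: it uses `rbinom()` instead of the `sample()` loop, and the seed is arbitrary.

```r
set.seed(1)   # arbitrary seed so the sketch is reproducible
p <- 0.1
n <- 1040

# rbinom() draws the number of atheists in each of 5000 samples directly
p_hats <- rbinom(5000, size = n, prob = p) / n

# Standard error predicted by the normal approximation
theoretical_se <- sqrt(p * (1 - p) / n)
c(mean = mean(p_hats), sd = sd(p_hats), theoretical_se = theoretical_se)

# Success-failure check: both expected counts should be at least 10
c(successes = n * p, failures = n * (1 - p))
```

If the approximation holds, the simulated mean should sit near 0.1 and the simulated standard deviation near the theoretical standard error.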
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.01 ; n = 1002
## Check conditions: number of successes = 10 ; number of failures = 992
## Standard error = 0.0031
## 95 % Confidence interval = ( 0.0038 , 0.0161 )
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.09 ; n = 1145
## Check conditions: number of successes = 103 ; number of failures = 1042
## Standard error = 0.0085
## 95 % Confidence interval = ( 0.0734 , 0.1065 )
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.1003 ; n = 1146
## Check conditions: number of successes = 115 ; number of failures = 1031
## Standard error = 0.0089
## 95 % Confidence interval = ( 0.083 , 0.1177 )
a) In 2005, Spain’s confidence interval was 8.3% to 11.77%. In the 2012 survey, it was 7.34% to 10.65%. The two intervals overlap a great deal, so there is no evidence that the proportion of atheists in Spain changed over the seven years in between.
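Comparing two intervals by eye is an informal check. A two-proportion test is a common, more direct alternative; it is not the method used in this lab. The sketch below assumes the output with n = 1146 is the 2005 Spanish sample and the one with n = 1145 is the 2012 sample, which matches the intervals quoted above.

```r
# Counts taken from the inference() output for Spain
atheists <- c(115, 103)    # 2005, 2012
n <- c(1146, 1145)

# Two-sample test of equal proportions, without continuity correction
test <- prop.test(atheists, n, correct = FALSE)

test$p.value    # a large p-value means no evidence of a change
test$conf.int   # 95% interval for the difference (2005 minus 2012)
```

An interval for the difference that contains zero points to the same conclusion as the overlapping intervals.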
If there were no change in the atheism index, we would expect about 1 in 20 countries to show a statistically significant change by chance alone, or 1.95 out of the 39 countries in Table 4. The reality, though, is unlikely to be that clean. It is quite possible that some countries changed while others didn’t. It would be surprising if all countries around the world were experiencing the same cultural pressures and responding to them identically.
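The arithmetic behind the 1.95 figure is short enough to write out. This sketch assumes a 5% significance level and, for the second quantity, that the 39 tests are independent of one another, which the source does not state.

```r
alpha <- 0.05      # significance level implied by "1 in 20"
countries <- 39    # countries listed in Table 4

# Expected number of countries flagged by chance when nothing changed
expected_false_positives <- countries * alpha

# Chance that at least one country is flagged, assuming independent tests
p_at_least_one <- 1 - (1 - alpha)^countries

c(expected = expected_false_positives, at_least_one = p_at_least_one)
```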
To estimate a proportion with a margin of error of 1% at 95% confidence, we want z* × s.e. to equal 0.01. That gives s.e. = 0.01/1.96, or about 0.0051. For a binomial, the variance of the count of successes is npq, where q = 1 - p. The sample proportion is that count divided by n, so its variance is npq over n squared, which is the pq over n used above. If the square root of pq/n is 0.0051, then pq/n = 0.000026.
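Solving pq/n for n needs a value for p. The sketch below assumes the worst case p = 0.5, where pq is largest; that choice is an assumption made here and is not stated in the text above. It also uses `qnorm(0.975)` rather than the rounded 1.96.

```r
z_star <- qnorm(0.975)   # critical value for 95% confidence
target_me <- 0.01        # desired margin of error
p <- 0.5                 # assumed worst case, where p * (1 - p) is largest

# Rearranging me = z * sqrt(p * (1 - p) / n) to solve for n
n_required <- ceiling(p * (1 - p) * (z_star / target_me)^2)
n_required

# Margin of error actually achieved with that sample size
achieved_me <- z_star * sqrt(p * (1 - p) / n_required)
achieved_me
```

Under that assumption, a sample of roughly 9,600 respondents would be needed. A proportion further from 0.5 would need fewer.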