Because these findings come from a poll that did not include the entire world population, they are sample statistics.
To generalize the results, the samples must be random and representative of the world’s population. Because the poll was conducted by an international organization and used random sampling within each country, it meets those requirements at least as well as is practical. It seems to favor industrialized countries, but those countries are where the world’s population centers are.
Table 6 corresponds to sample statistics summarized by country. The atheism dataset’s observations correspond to individual responses.
us12 <- atheism[atheism$nationality == "United States" & atheism$year == "2012",]
sum(us12$response == 'atheist')/nrow(us12)
## [1] 0.0499002
The table agrees with the raw data.
We already know the independence requirement is satisfied. There are 50 “atheist” responses and 952 “non-atheist” responses, so both the success and failure categories have at least 10 responses. The requirements appear to be met.
inference(us12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Warning: package 'BHH2' was built under R version 3.4.2
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
The margin of error is (.0634 - .0364)/2 = 0.0135
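The same margin of error can be recovered directly from the counts rather than from the interval endpoints. The following is a minimal sketch in base R; the variable names are illustrative and the counts (50 atheists out of 1002) are taken from the output above.

```r
# Counts reported by inference() for the 2012 US sample
x_us <- 50      # number of "atheist" responses
n_us <- 1002    # sample size

p_hat_us <- x_us / n_us
se_us <- sqrt(p_hat_us * (1 - p_hat_us) / n_us)   # standard error of p_hat
me_us <- qnorm(0.975) * se_us                     # 95% margin of error
ci_us <- c(lower = p_hat_us - me_us, upper = p_hat_us + me_us)

round(c(p_hat = p_hat_us, se = se_us, me = me_us, ci_us), 4)
```

Half the width of the interval and the critical value times the standard error are the same quantity, so both routes should agree up to rounding.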
can12 <- atheism[atheism$nationality == "Canada" & atheism$year == "2012",]
sum(can12$response == 'atheist')/nrow(can12)
## [1] 0.08982036
cam12 <- atheism[atheism$nationality == "Cameroon" & atheism$year == "2012",]
sum(cam12$response == 'atheist')/nrow(cam12)
## [1] 0.0297619
inference(can12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0898 ; n = 1002
## Check conditions: number of successes = 90 ; number of failures = 912
## Standard error = 0.009
## 95 % Confidence interval = ( 0.0721 , 0.1075 )
inference(cam12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0298 ; n = 504
## Check conditions: number of successes = 15 ; number of failures = 489
## Standard error = 0.0076
## 95 % Confidence interval = ( 0.0149 , 0.0446 )
Each sample has at least 10 successes and 10 failures (the smallest count is Cameroon's 15 successes), so we can use the results.
Canada margin: (0.1075 - 0.0721)/2 = 0.0177
Cameroon margin: (0.0446 - 0.0149)/2 = 0.01485
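The margin of error is proportional to the square root of p(1 - p), and p(1 - p) is a downward-opening parabola in p. The margin of error therefore rises as p moves from 0 toward 0.5, reaches its maximum at p = 0.5, and falls symmetrically as p approaches 1.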
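The relationship between p and the margin of error can be traced numerically. This sketch assumes a sample size of 1000 purely for illustration; the names `p_grid`, `n_demo`, and `me_grid` are not from the original analysis.

```r
n_demo <- 1000                       # assumed sample size, for illustration only
p_grid <- seq(0, 1, by = 0.01)       # candidate population proportions

# 95% margin of error at each value of p
me_grid <- qnorm(0.975) * sqrt(p_grid * (1 - p_grid) / n_demo)

plot(p_grid, me_grid, type = "l",
     xlab = "Population proportion", ylab = "Margin of error")

# Proportion at which the margin of error is largest
p_grid[which.max(me_grid)]
```

A different sample size would rescale the curve vertically but should not move the location of its peak.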
p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
p_hats[i] <- sum(samp == "atheist")/n
}
mean(p_hats)
## [1] 0.1000144
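The sampling distribution has a median of 0.09904 and a mean of 0.09969. (These values differ slightly from the mean printed above, presumably because the simulation was rerun without a fixed seed.) The shape is fairly symmetric; the mean being slightly higher than the median could indicate a very minor right skew, but none is apparent in the histogram.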
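One way to check the center and spread of this sampling distribution is to compare a simulation against the theoretical standard error, sqrt(p(1 - p)/n). The sketch below swaps the `sample()` loop for `rbinom()`, which draws the number of atheists in each sample directly; the seed and the variable names are assumptions added for reproducibility, not part of the original analysis.

```r
set.seed(1)            # assumed seed, only so the run is repeatable
p_true <- 0.1
n_samp <- 1040

# Each draw is the count of "atheist" responses in one sample of size n_samp
p_hats_check <- rbinom(5000, size = n_samp, prob = p_true) / n_samp

se_theory <- sqrt(p_true * (1 - p_true) / n_samp)

c(mean = mean(p_hats_check),
  median = median(p_hats_check),
  sd = sd(p_hats_check),
  se_theory = se_theory)
```

If the normal approximation is reasonable, the simulated mean should sit close to 0.1 and the simulated standard deviation close to the theoretical standard error.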
samp_func <- function (p, n) {
p_hats <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
p_hats[i] <- sum(samp == "atheist")/n
}
p_hats
}
p_hats1 <- samp_func(.1, 400)
p_hats2 <- samp_func(.02, 1040)
p_hats3 <- samp_func(.02, 400)
par(mfrow = c(2, 2))
hist(p_hats1, main = "p = 0.1, n = 400", xlim = c(0, 0.18))
hist(p_hats2, main = "p = 0.02, n = 1040", xlim = c(0, .06))
hist(p_hats3, main = "p = 0.02, n = 400", xlim = c(0, .06))
par(mfrow = c(1, 1))
mean(p_hats1)
## [1] 0.100102
mean(p_hats2)
## [1] 0.02009788
mean(p_hats3)
## [1] 0.0200705
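Inference based on the n = 400, p = 0.02 distribution is questionable. With only 400 × 0.02 = 8 expected successes, the success-failure condition is not met, and there is noticeable skew in that sampling distribution. The other distributions look acceptable.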
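The visual impression from the histograms can be cross-checked against the success-failure condition. This is a small sketch using the four (p, n) combinations simulated above and the usual threshold of 10 expected successes and failures; the data frame name `scenarios` is illustrative.

```r
# The four simulated scenarios: (p, n) pairs used above
scenarios <- data.frame(p = c(0.1, 0.1, 0.02, 0.02),
                        n = c(1040, 400, 1040, 400))

scenarios$exp_success <- scenarios$n * scenarios$p
scenarios$exp_failure <- scenarios$n * (1 - scenarios$p)

# Condition holds when both expected counts are at least 10
scenarios$condition_met <-
  pmin(scenarios$exp_success, scenarios$exp_failure) >= 10

scenarios
```

Only the p = 0.02, n = 400 row should fail the check, which matches the skew seen in its histogram.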
For both:
H0: There is no difference between the 2005 and 2012 proportions of atheists.
HA: There is a difference between the 2005 and 2012 proportions of atheists.
a)
spa05 <- atheism[atheism$nationality == "Spain" & atheism$year == "2005",]
sum(spa05$response == 'atheist')
## [1] 115
spa12 <- atheism[atheism$nationality == "Spain" & atheism$year == "2012",]
sum(spa12$response == 'atheist')
## [1] 103
spa <- atheism[atheism$nationality == "Spain" & atheism$year %in% c("2012", "2005"),]
inference(spa05$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.1003 ; n = 1146
## Check conditions: number of successes = 115 ; number of failures = 1031
## Standard error = 0.0089
## 95 % Confidence interval = ( 0.083 , 0.1177 )
inference(spa12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.09 ; n = 1145
## Check conditions: number of successes = 103 ; number of failures = 1042
## Standard error = 0.0085
## 95 % Confidence interval = ( 0.0734 , 0.1065 )
inference(x = spa$year, y = spa$response, est = "proportion", type = "ht", method = "theoretical", alternative = "twosided", null = 0,
success = "atheist")
## Warning: Explanatory variable was numerical, it has been converted to
## categorical. In order to avoid this warning, first convert your explanatory
## variable to a categorical variable using the as.factor() function.
## Response variable: categorical, Explanatory variable: categorical
## Two categorical variables
## Difference between two proportions -- success: atheist
## Summary statistics:
## x
## y 2005 2012 Sum
## atheist 115 103 218
## non-atheist 1031 1042 2073
## Sum 1146 1145 2291
## Observed difference between proportions (2005-2012) = 0.0104
##
## H0: p_2005 - p_2012 = 0
## HA: p_2005 - p_2012 != 0
## Pooled proportion = 0.0952
## Check conditions:
## 2005 : number of expected successes = 109 ; number of expected failures = 1037
## 2012 : number of expected successes = 109 ; number of expected failures = 1036
## Standard error = 0.012
## Test statistic: Z = 0.848
## p-value = 0.3966
The confidence intervals overlap, suggesting that there is no clear difference. The hypothesis test confirms this: with a p-value of 0.3966, well above 0.05, we fail to reject the null hypothesis for Spain.
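The test statistic reported by `inference()` can be reproduced by hand with the pooled two-proportion z-test. The helper `two_prop_z()` below is a hypothetical function written for this illustration, not part of any package; the counts are those shown in the summary table above.

```r
# Pooled two-proportion z-test (two-sided), computed from raw counts
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p_pool <- (x1 + x2) / (n1 + n2)                      # pooled proportion under H0
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z <- (p1 - p2) / se
  c(diff = p1 - p2, pooled = p_pool, se = se,
    z = z, p_value = 2 * pnorm(-abs(z)))
}

# Spain: 115 of 1146 atheists in 2005, 103 of 1145 in 2012
spain_test <- two_prop_z(115, 1146, 103, 1145)
round(spain_test, 4)
```

The same function can be applied to any other pair of counts, for example the United States counts of 10 of 1002 and 50 of 1002.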
b)
us05 <- atheism[atheism$nationality == "United States" & atheism$year == "2005",]
sum(us05$response == 'atheist')
## [1] 10
us12 <- atheism[atheism$nationality == "United States" & atheism$year == "2012",]
sum(us12$response == 'atheist')
## [1] 50
us <- atheism[atheism$nationality == "United States" & atheism$year %in% c("2012", "2005"),]
inference(us05$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.01 ; n = 1002
## Check conditions: number of successes = 10 ; number of failures = 992
## Standard error = 0.0031
## 95 % Confidence interval = ( 0.0038 , 0.0161 )
inference(us12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
inference(x = us$year, y = us$response, est = "proportion", type = "ht", method = "theoretical", alternative = "twosided", null = 0, success = "atheist")
## Warning: Explanatory variable was numerical, it has been converted to
## categorical. In order to avoid this warning, first convert your explanatory
## variable to a categorical variable using the as.factor() function.
## Response variable: categorical, Explanatory variable: categorical
## Two categorical variables
## Difference between two proportions -- success: atheist
## Summary statistics:
## x
## y 2005 2012 Sum
## atheist 10 50 60
## non-atheist 992 952 1944
## Sum 1002 1002 2004
## Observed difference between proportions (2005-2012) = -0.0399
##
## H0: p_2005 - p_2012 = 0
## HA: p_2005 - p_2012 != 0
## Pooled proportion = 0.0299
## Check conditions:
## 2005 : number of expected successes = 30 ; number of expected failures = 972
## 2012 : number of expected successes = 30 ; number of expected failures = 972
## Standard error = 0.008
## Test statistic: Z = -5.243
## p-value = 0
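For the US, the confidence intervals do not overlap, and the hypothesis test gives Z = -5.243 with a p-value of essentially 0, so we reject the null hypothesis: the proportion of atheists in the US changed between 2005 and 2012.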
If there were truly no change in any country, we would expect about 2 Type 1 errors among the 40 countries at a 0.05 level of significance (0.05 × 40 = 2).
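The expected count is simply the significance level times the number of tests. The sketch below also shows the chance of at least one false positive, under the added assumption that the 40 tests are independent; the variable names are illustrative.

```r
alpha <- 0.05        # significance level
k_countries <- 40    # number of countries tested, as stated above

# Expected number of Type 1 errors if no country truly changed
expected_false_positives <- alpha * k_countries

# Probability of at least one Type 1 error, assuming independent tests
p_at_least_one <- 1 - (1 - alpha)^k_countries

c(expected = expected_false_positives, at_least_one = p_at_least_one)
```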
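Assume p = 0.5, the most conservative choice because it maximizes p(1 - p), and solve for the sample size that gives a 1% margin of error at 95% confidence: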
(1.96/.01)^2*.5^2
## [1] 9604
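The calculation generalizes to other margins of error and confidence levels. The helper `required_n()` below is a hypothetical function written for illustration; it uses the exact critical value from `qnorm()` instead of the rounded 1.96 and rounds up, since a sample size must be a whole number.

```r
# Smallest n giving a margin of error of at most `me`
# at confidence level `conf`, for an assumed proportion `p`
required_n <- function(me, conf = 0.95, p = 0.5) {
  z <- qnorm(1 - (1 - conf) / 2)       # two-sided critical value
  ceiling(z^2 * p * (1 - p) / me^2)
}

required_n(0.01)             # 1% margin of error, worst-case p = 0.5
required_n(0.01, p = 0.1)    # smaller n if p is believed to be near 0.1
```

Using p = 0.5 guards against underestimating the required sample size when the true proportion is unknown.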