library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.4
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.3
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Answer: These values are sample statistics, not population parameters, because they were computed from a poll of a sample of people in each country. Knowing the exact population parameters would require surveying everyone, which is impossible here.
Answer: To generalize the results, we must assume that the respondents were sampled independently and that each country's sample is less than 10% of that country's population. Both assumptions seem reasonable: it is very unlikely that one person's response to the poll influenced another person's response. In addition, the poll's total sample of roughly 50,000 participants is far less than 10% of the world's population.
download.file("http://www.openintro.org/stat/data/atheism.RData", destfile = "atheism.RData")
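load("atheism.RData")
Answer: Each row of Table 6 corresponds to a country that was polled and gives the percentage of respondents in that country who identified as convinced atheists. Each row of atheism, by contrast, corresponds to a single respondent and records that person's nationality, response, and survey year.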
us12 <- subset(atheism, nationality == "United States" & year == "2012")
us12ath <- subset(atheism, nationality == "United States" & year == "2012" & response == "atheist")
nrow(us12ath)/nrow(us12)
## [1] 0.0499002
table(us12$response)
##
##     atheist non-atheist
##          50         952
prop.table(table(us12$response))
##
##     atheist non-atheist
##   0.0499002   0.9500998
Answer: The proportion of atheists computed from the atheism responses is 0.0499, or about 5%, which agrees with the 5% reported in Table 6.
Answer: The observations can be treated as independent because the US respondents form a random sample of 1,002 people, which is far less than 10% of the US population. The success-failure condition is also satisfied: there are 50 successes (atheists) and 952 failures (non-atheists), both well above 10 (equivalently, 1002(0.05) = 50.1 >= 10). The conditions for inference are therefore fulfilled.
inference(us12$response, est = "proportion", type = "ci", method = "theoretical",
          success = "atheist")
## Warning: package 'BHH2' was built under R version 4.0.4
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
# Margin of error = z* x SE, using the SE reported by inference()
SE = 0.0069
Z_score = 1.96
ME = SE * Z_score
ME
## [1] 0.013524
Answer: The margin of error for the US estimate is 1.96 * 0.0069 = 0.013524, or about 0.0135.
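As a cross-check, the standard error can be computed directly from the counts instead of from the rounded 0.0069 printed by inference(). This sketch hard-codes the US 2012 counts from the table above (50 atheists out of 1002) and uses the exact critical value qnorm(0.975) rather than 1.96; the variable names are illustrative only.

```r
# US 2012 counts taken from table(us12$response): 50 atheists, 952 non-atheists
successes <- 50
n_us <- 1002

p_hat <- successes / n_us                 # sample proportion
se <- sqrt(p_hat * (1 - p_hat) / n_us)    # standard error of p_hat
z_star <- qnorm(0.975)                    # critical value for 95% confidence
me_us <- z_star * se                      # margin of error

round(c(p_hat = p_hat, se = se, me = me_us), 4)
```

The result should agree with the hand calculation to about four decimal places.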
fr12 <- subset(atheism, nationality == "France" & year == "2012")
de12 <- subset(atheism, nationality == "Germany" & year == "2012")
table(fr12$response)
##
## atheist non-atheist
## 485 1203
table(de12$response)
##
## atheist non-atheist
## 75 427
inference(fr12$response, est = "proportion", type = "ci", method = "theoretical",
          success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.2873 ; n = 1688
## Check conditions: number of successes = 485 ; number of failures = 1203
## Standard error = 0.011
## 95 % Confidence interval = ( 0.2657 , 0.3089 )
inference(de12$response, est = "proportion", type = "ci", method = "theoretical",
          success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.1494 ; n = 502
## Check conditions: number of successes = 75 ; number of failures = 427
## Standard error = 0.0159
## 95 % Confidence interval = ( 0.1182 , 0.1806 )
# For France
SE = 0.011
Z_score = 1.96
ME = SE * Z_score
ME
## [1] 0.02156
# For Germany
SE = 0.0159
Z_score = 1.96
ME = SE * Z_score
ME
## [1] 0.031164
Answer: The countries chosen were France and Germany, and the conditions for inference are met for both. Independence: the sample sizes, 1,688 for France and 502 for Germany, are each far less than 10% of the country's population. Success-failure: France has 485 successes and 1,203 failures, and Germany has 75 successes and 427 failures; all four counts are at least 10. The margin of error for the France estimate is 0.02156, and the 95% confidence interval for the proportion of atheists is (0.2657, 0.3089). The margin of error for the Germany estimate is 0.031164, and the 95% confidence interval is (0.1182, 0.1806).
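The condition checks and margins of error above can be reproduced in one place from the raw counts. This is a minimal sketch assuming the counts printed by table(fr12$response) and table(de12$response); the data frame and column names are hypothetical.

```r
# Counts from table(fr12$response) and table(de12$response)
counts <- data.frame(
  country   = c("France", "Germany"),
  successes = c(485, 75),
  failures  = c(1203, 427)
)

counts$n     <- counts$successes + counts$failures
counts$p_hat <- counts$successes / counts$n
# Success-failure condition: at least 10 successes and at least 10 failures
counts$sf_ok <- counts$successes >= 10 & counts$failures >= 10
# 95% margin of error with z* = 1.96, as in the hand calculations
counts$me    <- 1.96 * sqrt(counts$p_hat * (1 - counts$p_hat) / counts$n)

print(counts)
```

Computing the SE from the counts avoids the rounding in the printed SE values, so the margins of error differ from 0.02156 and 0.031164 only in the fourth decimal place.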
n <- 1000
p <- seq(0, 1, 0.01)
# Margin of error at 95% confidence, using z* = 2 as a rounded 1.96
me <- 2 * sqrt(p * (1 - p)/n)
plot(me ~ p, ylab = "Margin of Error", xlab = "Population Proportion")
Answer: Holding the sample size constant at n = 1000, the margin of error is largest when p = 0.5 (50%). The relationship is non-linear and symmetric about p = 0.5: the margin of error increases as p goes from 0 to 0.5 and decreases as p goes from 0.5 to 1 (100%), reaching 0 at both extremes.
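The location of the maximum can be confirmed numerically on the same grid. This sketch simply recomputes the curve and looks up where it peaks; with z* rounded to 2, the peak value 2 * sqrt(0.25 / n) simplifies to 1 / sqrt(n).

```r
n <- 1000
p <- seq(0, 1, 0.01)
me <- 2 * sqrt(p * (1 - p) / n)

p_at_max <- p[which.max(me)]   # proportion that maximizes the margin of error
me_max <- max(me)              # equals 2 * sqrt(0.25 / n) = 1 / sqrt(n)
c(p_at_max = p_at_max, me_max = me_max)
```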
p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
p_hats[i] <- sum(samp == "atheist")/n
}
hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))
summary(p_hats)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.07019 0.09327 0.09904 0.09969 0.10577 0.12981
sd(p_hats)
## [1] 0.009287382
IQR(p_hats)
## [1] 0.0125
mean.p_hats <- mean(p_hats)
median.p_hats <- median(p_hats)
boxplot(p_hats, ylab = "p_hats", xlab = "proportions")
abline(h = median.p_hats, col = "blue")              # median
abline(h = mean.p_hats, col = "red")                 # mean
abline(h = quantile(p_hats, 0.25), col = "orange")   # first quartile
abline(h = quantile(p_hats, 0.75), col = "green")    # third quartile
Answer: The sampling distribution is nearly normal and centered close to the true population proportion of 0.1 (mean 0.0997, median 0.0990), with a standard deviation of about 0.0093. There are a few outliers in both the upper and lower tails, but they are very few compared with the 5,000 simulated proportions.
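The simulated spread can be compared with the theoretical standard error sqrt(p(1 - p)/n). This sketch draws the number of atheists per sample with rbinom(), which has the same distribution as the sample() loop above but is vectorized; the seed is hypothetical and only makes the sketch reproducible.

```r
p <- 0.1
n <- 1040
se_theory <- sqrt(p * (1 - p) / n)   # theoretical SE of p_hat

set.seed(1)  # hypothetical seed, for reproducibility only
# Each draw is the count of atheists in one sample of size n
p_hats_sim <- rbinom(5000, size = n, prob = p) / n

c(theory = se_theory, simulated = sd(p_hats_sim))
```

Both numbers should be close to the sd(p_hats) value of about 0.0093 reported above.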
p <- 0.1
n <- 400
p_hats2 <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
p_hats2[i] <- sum(samp == "atheist")/n
}
p <- 0.2
n <- 1040
p_hats3 <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
p_hats3[i] <- sum(samp == "atheist")/n
}
p <- 0.2
n <- 400
p_hats4 <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
p_hats4[i] <- sum(samp == "atheist")/n
}
par(mfrow = c(2, 2))
hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.30))
hist(p_hats2, main = "p = 0.1, n = 400", xlim = c(0, 0.30))
hist(p_hats3, main = "p = 0.2, n = 1040", xlim = c(0, 0.30))
hist(p_hats4, main = "p = 0.2, n = 400", xlim = c(0, 0.30))
Answer: The larger the sample size n, the smaller the spread of the sampling distribution, and therefore the smaller the margin of error. Moving p toward 0.5 has the opposite effect: at the same n, the distributions for p = 0.2 are wider than those for p = 0.1, and each is centered at its own p. Thus the margin of error increases with p up to p = 0.5, after which it decreases.
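The four simulations can be condensed into one table of simulated and theoretical spreads. The helper sim_p_hats() below is hypothetical; it replaces the sample() loop with an equivalent rbinom() draw, and the seed is only for reproducibility.

```r
# Hypothetical helper: simulate the sampling distribution of p_hat
sim_p_hats <- function(p, n, reps = 5000) {
  rbinom(reps, size = n, prob = p) / n
}

set.seed(2)  # hypothetical seed
scenarios <- expand.grid(p = c(0.1, 0.2), n = c(1040, 400))
# Simulated standard deviation of p_hat for each (p, n) pair
scenarios$sd_sim <- mapply(function(p, n) sd(sim_p_hats(p, n)),
                           scenarios$p, scenarios$n)
# Theoretical standard error for the same pairs
scenarios$se_theory <- sqrt(scenarios$p * (1 - scenarios$p) / scenarios$n)
scenarios
```

The smallest spread should appear for p = 0.1, n = 1040 and the largest for p = 0.2, n = 400, matching the histograms.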
table(subset(atheism, nationality == "Australia" & year == "2012")$response)
##
##     atheist non-atheist
##         104         935
table(subset(atheism, nationality == "Ecuador" & year == "2012")$response)
##
##     atheist non-atheist
##           8         396
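Answer: Both samples can reasonably be treated as independent, since each is less than 10% of its country's population. However, while Australia meets the success-failure condition [1040(0.1) = 104 >= 10 successes and 1040(0.9) = 936 >= 10 failures], Ecuador does not [400(0.02) = 8 successes, which is below 10]. Australia's sampling distribution should therefore be nearly normal, but Ecuador's is likely to be skewed, so a normal-based margin of error is not reliable for Ecuador. In that regard, I would not follow the report's lead for Ecuador.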
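The success-failure check used here is simple enough to write as a function. This is a minimal sketch; success_failure_ok() is a hypothetical helper, and the inputs are the Table 6 proportions and sample sizes quoted above.

```r
# Hypothetical helper: success-failure condition for a single proportion.
# Expected successes n*p and expected failures n*(1 - p) must both be >= 10.
success_failure_ok <- function(p, n, threshold = 10) {
  n * p >= threshold && n * (1 - p) >= threshold
}

success_failure_ok(p = 0.10, n = 1040)  # Australia: TRUE
success_failure_ok(p = 0.02, n = 400)   # Ecuador: FALSE (only 8 expected successes)
```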
Answer: H0: In Spain, there is no difference between the proportion of atheists in 2005 and 2012 (p_2005 - p_2012 = 0).
HA: In Spain, there is a difference between the proportion of atheists in 2005 and 2012 (p_2005 - p_2012 != 0).
Inference conditions:
- Independence: the sample sizes, 1146 in 2005 and 1145 in 2012, are both less than 10% of Spain's population.
- Success-failure: using the pooled proportion (218/2291 = 0.0952), the expected numbers of successes (about 109 in each year) and of failures (about 1037 and 1036) are all at least 10.
Spain05 = subset(atheism, nationality == "Spain" & year == "2005")
Spain12 = subset(atheism, nationality == "Spain" & year == "2012")
nrow(Spain05)
## [1] 1146
nrow(Spain12)
## [1] 1145
# Use the vectorized %in% (not ||, which only compares the first element)
spain = atheism %>%
  subset(year %in% c(2005, 2012)) %>%
  subset(nationality == "Spain")
inference(spain$response, x = spain$year, est = "proportion", type = "ht", null = 0,
          alternative = "twosided", method = "theoretical", success = "atheist")
## Warning: Explanatory variable was numerical, it has been converted to
## categorical. In order to avoid this warning, first convert your explanatory
## variable to a categorical variable using the as.factor() function.
## Response variable: categorical, Explanatory variable: categorical
## Two categorical variables
## Difference between two proportions -- success: atheist
## Summary statistics:
## x
## y 2005 2012 Sum
## atheist 115 103 218
## non-atheist 1031 1042 2073
## Sum 1146 1145 2291
## Observed difference between proportions (2005-2012) = 0.0104
##
## H0: p_2005 - p_2012 = 0
## HA: p_2005 - p_2012 != 0
## Pooled proportion = 0.0952
## Check conditions:
## 2005 : number of expected successes = 109 ; number of expected failures = 1037
## 2012 : number of expected successes = 109 ; number of expected failures = 1036
## Standard error = 0.012
## Test statistic: Z = 0.848
## p-value = 0.3966
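# Sample proportions from the table above (115/1146 and 103/1145, rounded)
p_spain05 = 0.1003
n_spain05 = 1146
p_spain12 = 0.09
n_spain12 = 1145
# Point estimate and unpooled standard error for p_2012 - p_2005
PE_spain = p_spain12 - p_spain05
SE_spn = sqrt((p_spain05*(1-p_spain05)/n_spain05)+(p_spain12*(1-p_spain12)/n_spain12))
SE_spn
## [1] 0.01225854
# 95% confidence interval bounds
PE_spain + (1.96*SE_spn)
## [1] 0.01372674
PE_spain - (1.96*SE_spn)
## [1] -0.03432674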
Answer: Since the p-value (0.3966) is greater than the significance level of 0.05, we fail to reject the null hypothesis. The data do not provide convincing evidence that the proportion of atheists in Spain changed between 2005 and 2012. Consistent with this, the 95% confidence interval for p_2012 - p_2005, (-0.0343, 0.0137), contains 0.
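Base R's prop.test() offers an independent check of the inference() output. With correct = FALSE (no continuity correction), its chi-squared statistic is the square of the pooled two-proportion z statistic, so its p-value should match the 0.3966 above. Note that prop.test() reports the interval for p_2005 - p_2012, so its bounds are approximately the negatives of the hand-computed interval for p_2012 - p_2005. The counts are copied from the inference() summary table.

```r
# Counts from the inference() summary table: atheists and totals by year
atheists <- c(`2005` = 115, `2012` = 103)
totals   <- c(`2005` = 1146, `2012` = 1145)

# Two-sample test for equality of proportions, no continuity correction
spain_test <- prop.test(x = atheists, n = totals, correct = FALSE)

sqrt(spain_test$statistic)  # |Z|, should be close to 0.848
spain_test$p.value          # should be close to 0.3966
spain_test$conf.int         # 95% CI for p_2005 - p_2012
```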
Answer: H0: The proportion of atheists in the US in 2005 equals the proportion in 2012 (p_2005 - p_2012 = 0).
HA: The proportion of atheists in the US in 2005 differs from the proportion in 2012 (p_2005 - p_2012 != 0).
Inference conditions:
- Independence: each year's sample of 1002 respondents is less than 10% of the US population.
- Success-failure: using the pooled proportion (60/2004 = 0.0299), the expected number of successes is about 30 and of failures about 972 in each year, all at least 10.
usa05 <- subset(atheism, nationality == "United States" & year == "2005")
usa12 <- subset(atheism, nationality == "United States" & year == "2012")
nrow(usa05)
## [1] 1002
nrow(usa12)
## [1] 1002
# Use the vectorized %in% (not ||, which only compares the first element)
US = atheism %>%
  subset(year %in% c(2005, 2012)) %>%
  subset(nationality == "United States")
inference(US$response, x = US$year, est = "proportion", type = "ht", null = 0,
          alternative = "twosided", method = "theoretical", success = "atheist")
## Warning: Explanatory variable was numerical, it has been converted to
## categorical. In order to avoid this warning, first convert your explanatory
## variable to a categorical variable using the as.factor() function.
## Response variable: categorical, Explanatory variable: categorical
## Two categorical variables
## Difference between two proportions -- success: atheist
## Summary statistics:
## x
## y 2005 2012 Sum
## atheist 10 50 60
## non-atheist 992 952 1944
## Sum 1002 1002 2004
## Observed difference between proportions (2005-2012) = -0.0399
##
## H0: p_2005 - p_2012 = 0
## HA: p_2005 - p_2012 != 0
## Pooled proportion = 0.0299
## Check conditions:
## 2005 : number of expected successes = 30 ; number of expected failures = 972
## 2012 : number of expected successes = 30 ; number of expected failures = 972
## Standard error = 0.008
## Test statistic: Z = -5.243
## p-value = 0
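# Sample proportions from the table above (10/1002 and 50/1002, rounded)
p_usa05 = 0.01
n_usa05 = 1002
p_usa12 = 0.05
n_usa12 = 1002
# Point estimate and unpooled standard error for p_2012 - p_2005
PE_usa = p_usa12 - p_usa05
SE_usa = sqrt(((p_usa05*(1-p_usa05))/n_usa05)+((p_usa12*(1-p_usa12))/n_usa12))
SE_usa
## [1] 0.007568714
# 95% confidence interval bounds
PE_usa + (1.96*SE_usa)
## [1] 0.05483468
PE_usa - (1.96*SE_usa)
## [1] 0.02516532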
Answer: Since the p-value is approximately 0 (Z = -5.243), far below the significance level of 0.05, we reject the null hypothesis. There is very strong evidence that the proportion of atheists in the US differed between 2005 and 2012. The 95% confidence interval for p_2012 - p_2005, (0.0252, 0.0548), lies entirely above 0, indicating an increase.
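The p-value printed as 0 is rounded; the test can be carried out by hand to see how small it actually is. This sketch reproduces the pooled two-proportion z test from the counts in the inference() summary table; the variable names are illustrative.

```r
# Counts from the inference() summary table for the US
x05 <- 10;  n05 <- 1002
x12 <- 50;  n12 <- 1002

p05 <- x05 / n05
p12 <- x12 / n12
p_pool <- (x05 + x12) / (n05 + n12)   # pooled proportion under H0

# Standard error under H0 uses the pooled proportion
se_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n05 + 1 / n12))
z <- (p05 - p12) / se_pool            # test statistic for p_2005 - p_2012
p_value <- 2 * pnorm(-abs(z))         # two-sided p-value

c(p_pool = p_pool, se = se_pool, z = z, p_value = p_value)
```

The statistic matches Z = -5.243, and the p-value is on the order of 1e-7: tiny, but not exactly 0.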
Answer: A Type I error is detecting a change when there is none (rejecting H0 when H0 is true). At a significance level of 0.05, this happens in about 5% of such tests. If none of the 39 countries in Table 4 had actually changed, we would expect to detect a change in about 2 of them by chance alone [39(0.05) = 1.95].
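The expected count of false detections is simple arithmetic, and a quick simulation illustrates it. This sketch rests on an idealized assumption: no country truly changed, and each of the 39 tests independently rejects with probability exactly 0.05. The seed is hypothetical.

```r
alpha <- 0.05
n_countries <- 39

expected_false_positives <- n_countries * alpha   # 1.95

# Idealized simulation: every H0 is true and tests are independent,
# so the number of rejections per survey is Binomial(39, 0.05)
set.seed(3)  # hypothetical seed
false_positives <- rbinom(10000, size = n_countries, prob = alpha)
mean(false_positives)
```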
# Worst case p = 0.5 gives p(1 - p) = 0.25, the largest possible value
ME = 0.01
z = 1.96
n <- (z^2 * .25)/ ME^2
Answer: The sample size should be at least 9604.
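The same calculation can be wrapped in a small function. This is a sketch with a hypothetical helper, required_n(); it uses the exact critical value qnorm(0.975) (about 1.95996) instead of 1.96 and rounds up with ceiling(), because a sample size must be a whole number and rounding down would exceed the target margin of error.

```r
# Hypothetical helper: minimum n so that z* * sqrt(p(1 - p) / n) <= me.
# p = 0.5 is the conservative default because it maximizes p(1 - p).
required_n <- function(me, p = 0.5, conf = 0.95) {
  z_star <- qnorm(1 - (1 - conf) / 2)
  ceiling(z_star^2 * p * (1 - p) / me^2)
}

required_n(0.01)   # margin of error of 1 percentage point
required_n(0.02)   # margin of error of 2 percentage points
```

With the exact critical value the unrounded requirement for a 1% margin of error is about 9603.6, which still rounds up to 9604.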