The way it is phrased it seems to be a population parameters. However, in reallity it is a sample statistics.
The sample should be proportinally distributed (random) among world population with a big sample size for such a diverse population. I doubt it is trully independent sample of reasonable size. So, results might paint incorrect picture. Or in another words the article might end up describing a sample and not world population (overfitting).
download.file("http://www.openintro.org/stat/data/atheism.RData", destfile = "atheism.RData")
load("atheism.RData")
head(atheism)
## nationality response year
## 1 Afghanistan non-atheist 2012
## 2 Afghanistan non-atheist 2012
## 3 Afghanistan non-atheist 2012
## 4 Afghanistan non-atheist 2012
## 5 Afghanistan non-atheist 2012
## 6 Afghanistan non-atheist 2012
summary(atheism)
## nationality response year
## Pakistan : 5409 atheist : 5498 Min. :2005
## France : 3359 non-atheist:82534 1st Qu.:2005
## Korea, Rep (South): 3047 Median :2012
## Ghana : 2995 Mean :2009
## Macedonia : 2418 3rd Qu.:2012
## Peru : 2414 Max. :2012
## (Other) :68390
Each row in table 6 corresponds to a country sample and sample statistics.
Each row in the atheism file corresponds an individual response (with country residence and year of response provided).
us12 <- subset(atheism, nationality == "United States" & year == "2012")
prop.table(table(us12$response))
##
## atheist non-atheist
## 0.0499002 0.9500998
File ateism and the Table 6 agree, both show that 5% of USA sample are atheists.
Sample should be random, independent (less than 10% of population), and approximately normal (both success and failure more than 10). I am confident about independence and normality, but I am not sure that sample is random (it could be).
inference(us12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
Margin of error is 0.0499-0.0364 or 0.0135
0.0499-0.0364
## [1] 0.0135
Confidence interval for France is (0.2657,0.3089).
Confidence interval for Argentina is (0.0547,0.0866)
fr12 <- subset(atheism, nationality == "France" & year == "2012")
summary(fr12)
## nationality response year
## France :1688 atheist : 485 Min. :2012
## Afghanistan: 0 non-atheist:1203 1st Qu.:2012
## Argentina : 0 Median :2012
## Armenia : 0 Mean :2012
## Australia : 0 3rd Qu.:2012
## Austria : 0 Max. :2012
## (Other) : 0
arg12 <- subset(atheism, nationality == "Argentina" & year == "2012")
summary(arg12)
## nationality response year
## Argentina :991 atheist : 70 Min. :2012
## Afghanistan: 0 non-atheist:921 1st Qu.:2012
## Armenia : 0 Median :2012
## Australia : 0 Mean :2012
## Austria : 0 3rd Qu.:2012
## Azerbaijan : 0 Max. :2012
## (Other) : 0
inference(fr12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.2873 ; n = 1688
## Check conditions: number of successes = 485 ; number of failures = 1203
## Standard error = 0.011
## 95 % Confidence interval = ( 0.2657 , 0.3089 )
inference(arg12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0706 ; n = 991
## Check conditions: number of successes = 70 ; number of failures = 921
## Standard error = 0.0081
## 95 % Confidence interval = ( 0.0547 , 0.0866 )
n <- 1000
p <- seq(0, 1, 0.01)
me <- 2*sqrt(p*(1-p)/n)
plot(me~p, ylab="Margin of error", xlab = "Population proportion")
Margin of error keeps increasing as population proprtion incrreases from 0 to 0.5 and then keeps decreasing as population proportion increases from 0.5 to 1.0.
p <- 0.1
n <- 1040
p_hats <- rep(0,5000)
for(i in 1:5000){
samp<-sample(c("atheist", "non-atheist"), n, replace=TRUE, prob = c(p, 1-p))
p_hats[i]<-sum(samp=="atheist")/n
}
hist(p_hats, main="p = 0.1, n = 1040", xlim=c(0, 0.18))
summary(p_hats)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.07019 0.09327 0.09904 0.09969 0.10577 0.12981
The distribution appears near normal with mean of 0.1. It spreads from 0.067 to 0.137.
All 3 distributions appear near normal with first distribution having mean of 0.1 and 2nd and 3rd having mean of 0.02. N appears to effect distribution. Lower N’s have bigger spreads.
p <- 0.1
n <- 400
p_hats1 <- rep(0,5000)
for(i in 1:5000){
samp<-sample(c("atheist", "non-atheist"), n, replace=TRUE, prob = c(p, 1-p))
p_hats1[i]<-sum(samp=="atheist")/n
}
p <- 0.02
n <- 1040
p_hats2 <- rep(0,5000)
for(i in 1:5000){
samp<-sample(c("atheist", "non-atheist"), n, replace=TRUE, prob = c(p, 1-p))
p_hats2[i]<-sum(samp=="atheist")/n
}
p <- 0.02
n <- 400
p_hats3 <- rep(0,5000)
for(i in 1:5000){
samp<-sample(c("atheist", "non-atheist"), n, replace=TRUE, prob = c(p, 1-p))
p_hats3[i]<-sum(samp=="atheist")/n
}
par(mfrow = c(2, 2))
hist(p_hats, main="p = 0.1, n = 1040", xlim=c(0, 0.18))
hist(p_hats1, main="p = 0.1, n = 400", xlim=c(0, 0.18))
hist(p_hats2, main="p = 0.02, n = 1040", xlim=c(0, 0.18))
hist(p_hats3, main="p = 0.02, n = 400", xlim=c(0, 0.18))
summary(p_hats1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.05250 0.09000 0.10000 0.09976 0.11000 0.15500
summary(p_hats2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.005769 0.017308 0.020192 0.019954 0.023077 0.039423
summary(p_hats3)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.01500 0.02000 0.01988 0.02500 0.04750
Reporting Margin of error would provide more accurate picture of true population religiosity status, but at the same time it will make report also more confusing for general public who often do not know how to interpret confidence intervals.
H1: Spain proportions changed in 2012 in comparison to 2005 results.
Conclusion: confidence intervals overlap, so we do not have enough evidence to reject H0. We accept H0, as there is not enough eveidence to conclude that there was substantial change in proportion of atheist since 2005.
H1: United Sates proportions changed in 2012 in comparison to 2005 results.
Conclusion: confidence intervals do not overlap, so we have enough evidence to reject H0. We reject H0, as there is enough eveidence to conclude that there was substantial change in proportion of atheist since 2005.
sp5 <- subset(atheism, nationality == "Spain" & year == "2005")
sp12 <- subset(atheism, nationality == "Spain" & year == "2012")
inference(sp5$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.1003 ; n = 1146
## Check conditions: number of successes = 115 ; number of failures = 1031
## Standard error = 0.0089
## 95 % Confidence interval = ( 0.083 , 0.1177 )
inference(sp12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.09 ; n = 1145
## Check conditions: number of successes = 103 ; number of failures = 1042
## Standard error = 0.0085
## 95 % Confidence interval = ( 0.0734 , 0.1065 )
us5 <- subset(atheism, nationality == "United States" & year == "2005")
us12 <- subset(atheism, nationality == "United States" & year == "2012")
inference(us5$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.01 ; n = 1002
## Check conditions: number of successes = 10 ; number of failures = 992
## Standard error = 0.0031
## 95 % Confidence interval = ( 0.0038 , 0.0161 )
inference(us12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
n <- 10000
p <- seq(0, 1, 0.01)
me <- 2*sqrt(p*(1-p)/n)
plot(me~p, ylab="Margin of error", xlab = "Population proportion")