These are sample statistics.
We must assume that the data came from truly worldwide sources and that respondents were randomly sampled from every source. That is unlikely to have happened, so this is not a fully reasonable assumption.
download.file("http://www.openintro.org/stat/data/atheism.RData", destfile = "atheism.RData")
load("atheism.RData")
Each row in Table 6 corresponds to a country where a sample of the population was surveyed about their self-perceived level of religiosity. Each row of the atheism data set corresponds to a single response, classified only as atheist or non-atheist.
us12 <- subset(atheism, nationality == "United States" & year == "2012")
proportion <- 49/1000
proportion
## [1] 0.049
Percent <- proportion * 100
Percent
## [1] 4.9
The calculated proportion of atheists is 0.049, or 4.9%. This closely matches the 5% reported in Table 6.
inference(us12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
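Based on the R output, the standard error is 0.0069. The margin of error at 95% confidence is about 1.96 × 0.0069 ≈ 0.0135, which is half the width of the interval (0.0364, 0.0634).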
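As a rough cross-check, the standard error and margin of error can be recomputed by hand from the counts in the output above (50 atheists out of 1002 respondents). This is only a sketch of the usual normal-approximation formula, not necessarily the exact steps inference() performs, and the variable names are illustrative.

```r
# Counts taken from the inference() output for the 2012 US sample
successes <- 50
n <- 1002

p_hat <- successes / n                 # sample proportion
se <- sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
z_star <- qnorm(0.975)                 # critical value for 95% confidence (about 1.96)
me <- z_star * se                      # margin of error = half-width of the interval
ci <- c(p_hat - me, p_hat + me)        # 95% confidence interval

round(c(p_hat = p_hat, se = se, me = me), 4)
round(ci, 4)
```

The margin of error comes out at roughly twice the standard error, which is why the two numbers should not be reported interchangeably.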
Arg12 <- subset(atheism, nationality == "Argentina" & year == "2012")
Aus12 <- subset(atheism, nationality == "Australia" & year == "2012")
inference(Arg12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0706 ; n = 991
## Check conditions: number of successes = 70 ; number of failures = 921
## Standard error = 0.0081
## 95 % Confidence interval = ( 0.0547 , 0.0866 )
inference(Aus12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.1001 ; n = 1039
## Check conditions: number of successes = 104 ; number of failures = 935
## Standard error = 0.0093
## 95 % Confidence interval = ( 0.0818 , 0.1183 )
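Argentina: margin of error ≈ 1.96 × 0.0081 ≈ 0.016. Australia: margin of error ≈ 1.96 × 0.0093 ≈ 0.018. The values 0.0081 and 0.0093 in the output are standard errors; each margin of error is half the width of the corresponding confidence interval.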
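The same calculation can be wrapped in a small function and applied to both countries. The helper below, margin_of_error(), is a hypothetical name introduced for illustration; it applies the normal-approximation formula to the success counts and sample sizes printed in the output above and is not part of the lab's inference() function.

```r
# Hypothetical helper: margin of error for a single proportion,
# using the normal approximation (not the inference() internals).
margin_of_error <- function(successes, n, conf_level = 0.95) {
  p_hat <- successes / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  z_star * se
}

# Counts taken from the "Check conditions" lines of the output above
me_arg <- margin_of_error(successes = 70, n = 991)    # Argentina, 2012
me_aus <- margin_of_error(successes = 104, n = 1039)  # Australia, 2012

round(c(Argentina = me_arg, Australia = me_aus), 4)
```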
n <- 1000
p <- seq(0, 1, 0.01)
me <- 2 * sqrt(p * (1 - p)/n)
plot(me ~ p, ylab = "Margin of Error", xlab = "Population Proportion")
### Exercise 8
The margin of error peaks at a population proportion of 0.5, the midpoint of the possible range, and shrinks toward zero as the proportion approaches 0 or 1.
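The location of the peak can also be read off numerically instead of from the plot. The sketch below reuses the same n, p grid, and approximate formula as the plotting code; p_at_max and max_me are illustrative names.

```r
n <- 1000
p <- seq(0, 1, 0.01)
me <- 2 * sqrt(p * (1 - p) / n)   # same approximation as in the plot above

p_at_max <- p[which.max(me)]      # proportion with the largest margin of error
max_me <- max(me)                 # the largest margin of error itself

p_at_max
max_me
```

Because p * (1 - p) is largest when p = 0.5, the worst-case margin of error for a given n always occurs there.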
p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
p_hats[i] <- sum(samp == "atheist")/n
}
hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))
mean(p_hats)
## [1] 0.09969
For p = 0.1 and n = 1040, the sampling distribution is approximately normal, with values ranging from about 0.065 to 0.14. It is centered just shy of 0.10, with a mean of 0.0997.
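One way to sanity-check this description is to compare the simulated center and spread with what the formula for the standard error predicts. The sketch below swaps the sample() loop for rbinom(), which draws the number of "atheist" responses per sample directly. The seed and the name p_hats_sim are arbitrary choices, so the exact numbers will differ slightly from the run above.

```r
set.seed(1)   # arbitrary seed so the sketch is reproducible

p <- 0.1
n <- 1040

# rbinom() draws the number of "atheist" responses in each simulated sample
p_hats_sim <- rbinom(5000, size = n, prob = p) / n

se_theory <- sqrt(p * (1 - p) / n)   # standard error predicted by the formula

c(sim_mean = mean(p_hats_sim), sim_sd = sd(p_hats_sim), se_theory = se_theory)
```

The simulated mean should land very close to p, and the simulated standard deviation close to the theoretical standard error.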
par(mfrow = c(2, 2))
p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
p_hats[i] <- sum(samp == "atheist")/n
}
hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))
p <- 0.1
n <- 400
p_hats <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
p_hats[i] <- sum(samp == "atheist")/n
}
hist(p_hats, main = "p = 0.1, n = 400", xlim = c(0, 0.18))
p <- 0.02
n <- 1040
p_hats <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
p_hats[i] <- sum(samp == "atheist")/n
}
hist(p_hats, main = "p = 0.02, n = 1040", xlim = c(0, 0.18))
p <- 0.02
n <- 400
p_hats <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
p_hats[i] <- sum(samp == "atheist")/n
}
hist(p_hats, main = "p = 0.02, n = 400", xlim = c(0, 0.18))
With p = 0.1 and n = 400, the distribution is still roughly normal, with a peak similar to the first plot, but the spread is wider and now covers about 0.03 to 0.16. With p = 0.02 and n = 1040, the distribution is shifted far to the left: it lies between about 0.00 and 0.03 and is centered near 0.02. With p = 0.02 and n = 400, the distribution is also shifted left, with a similar center and a comparable, slightly wider spread. Overall, p determines where the sampling distribution is centered, so lowering p from 0.1 to 0.02 moved the whole distribution left, while n mainly affects the spread around that center.
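The visual comparison of the four panels can be backed up with numbers. The sketch below simulates each (p, n) combination and tabulates the simulated mean and standard deviation next to the theoretical standard error. It uses rbinom() rather than the sample() loop, and simulate_p_hats() is a hypothetical helper introduced only for this illustration.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical helper: simulate p-hats for one (p, n) combination
simulate_p_hats <- function(p, n, reps = 5000) {
  rbinom(reps, size = n, prob = p) / n
}

# Rows follow the order of the four panels above
settings <- expand.grid(n = c(1040, 400), p = c(0.1, 0.02))

settings$sim_mean <- NA_real_
settings$sim_sd <- NA_real_
for (i in seq_len(nrow(settings))) {
  p_hats_i <- simulate_p_hats(settings$p[i], settings$n[i])
  settings$sim_mean[i] <- mean(p_hats_i)
  settings$sim_sd[i] <- sd(p_hats_i)
}

# Standard error predicted by sqrt(p * (1 - p) / n)
settings$se_theory <- sqrt(settings$p * (1 - settings$p) / settings$n)

settings
```

Reading down the table, the mean tracks p, while the standard deviation grows when n drops from 1040 to 400.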
Since both distributions are roughly normal, it could be sensible to proceed with inference and report margins of error. However, the p-hat and n behind each estimate should be stated so that every calculation is on the appropriate scale and the comparison is not a false apples-to-apples one. One caveat: with p = 0.02 and n = 400 there are only 0.02 × 400 = 8 expected successes, so the normal approximation is weakest in that case.
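Judging normality by eye can be supplemented with the success-failure check, the same kind of condition the inference() output reports under "Check conditions". The sketch below assumes the commonly used threshold of at least 10 expected successes and 10 expected failures; check_success_failure() is a hypothetical helper, and the two settings are the p = 0.1, n = 1040 and p = 0.02, n = 400 cases simulated above.

```r
# Hypothetical helper: success-failure check for the normal approximation.
# The threshold of 10 is a common rule of thumb, assumed here.
check_success_failure <- function(p, n, threshold = 10) {
  expected_successes <- n * p
  expected_failures <- n * (1 - p)
  data.frame(
    p = p,
    n = n,
    expected_successes = expected_successes,
    expected_failures = expected_failures,
    condition_met = expected_successes >= threshold &
      expected_failures >= threshold
  )
}

checks <- rbind(
  check_success_failure(p = 0.1, n = 1040),
  check_success_failure(p = 0.02, n = 400)
)
checks
```

Under this rule of thumb, a setting that fails the check calls for more caution when reporting a margin of error based on the normal model.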