Of those surveyed, 13% are atheists, 23% are non-religious, and 59% are religious. These percentages are sample statistics, computed from a sample of 50,000 men and women across 57 countries.
To perform inference we must assume that the respondents were randomly sampled and that their answers are independent of one another. This assumption matters because bias in how respondents are selected could easily skew the findings toward someone's beliefs and expectations.
Each row of Table 6 corresponds to a country and its percentage in each category, whereas each row of `atheism` corresponds to a single respondent, whose answer is recorded as either atheist or non-atheist.
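To make the difference concrete, here is a minimal sketch on a small made-up data frame. The name `toy` and all of its values are hypothetical, not survey data; it only mimics the three columns of `atheism`. Each row is one respondent, and turning the per-country counts into row proportions gives the kind of per-country percentages shown in Table 6.

```r
# Hypothetical toy data: one row per respondent, same columns as `atheism`
toy <- data.frame(
  nationality = c(rep("A", 4), rep("B", 5)),
  response    = c("atheist", rep("non-atheist", 3),
                  rep("atheist", 2), rep("non-atheist", 3)),
  year        = 2012
)

# Count responses within each country, then convert the counts to
# row proportions (each row sums to 1), like one row of Table 6
tab <- prop.table(table(toy$nationality, toy$response), margin = 1)
tab
```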
download.file("http://www.openintro.org/stat/data/atheism.RData", destfile = "atheism.RData")
load("atheism.RData")
Yes, it agrees. The sample proportion is 50/1002 ≈ 0.0499, which is not exactly 0.05 (5%) but rounds to it.
us12 <- subset(atheism, nationality == "United States" & year == "2012")
nrow(us12)
## [1] 1002
us12a <- subset(us12, response == "atheist")
nrow(us12a)
## [1] 50
us12n <- subset(us12, response == "non-atheist")
nrow(us12n)
## [1] 952
usathp <- nrow(us12a) / nrow(us12)   # proportion of atheist responses
usathp
## [1] 0.0499002
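The same proportion can be computed without typing the counts by hand, because the mean of a logical vector is the share of `TRUE` values. The sketch below uses a stand-in vector built from the counts reported above (50 atheist, 952 non-atheist) so it runs without the data file; with the real data the equivalent call would be `mean(us12$response == "atheist")`.

```r
# Stand-in for us12$response, rebuilt from the counts reported above
responses <- c(rep("atheist", 50), rep("non-atheist", 952))

# Proportion of atheist responses: mean of a logical vector = share of TRUEs
p_hat <- mean(responses == "atheist")
p_hat

# The same proportions from a frequency table
prop.table(table(responses))
```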
inference(us12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
The margin of error is 1.96 × SE = 1.96 × 0.0069 ≈ 0.01352, which is half the width of the 95% confidence interval.
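As a check, the margin of error can be recomputed from the standard-error formula for a proportion, SE = √(p̂(1 − p̂)/n), using only the counts shown in the output above. This is a sketch; the result (about 0.0135) differs from 1.96 × 0.0069 only in the fourth significant digit, because `inference` rounds the standard error before printing it.

```r
# Margin of error for a proportion: z* times the standard error,
# SE = sqrt(p_hat * (1 - p_hat) / n), with z* = 1.96 for 95% confidence
p_hat <- 50 / 1002
n <- 1002
se <- sqrt(p_hat * (1 - p_hat) / n)
me <- 1.96 * se
me

# Equivalently, half the width of the reported interval (0.0364, 0.0634)
(0.0634 - 0.0364) / 2
```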
I chose Romania and Brazil because the inference conditions are met for both (each has at least 10 successes and 10 failures). Romania has a margin of error of 1.96 × 0.003 ≈ 0.00588, while Brazil has one of 1.96 × 0.0022 ≈ 0.004312.
Romania12 <- subset(atheism, nationality == "Romania" & year == "2012")
inference(Romania12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0096 ; n = 1039
## Check conditions: number of successes = 10 ; number of failures = 1029
## Standard error = 0.003
## 95 % Confidence interval = ( 0.0037 , 0.0156 )
margER <- 1.96 * 0.003   # z* times the reported standard error
margER
## [1] 0.00588
Brazil12 <- subset(atheism, nationality == "Brazil" & year == "2012")
inference(Brazil12$response, est = "proportion", type = "ci", method = "theoretical", success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.01 ; n = 2002
## Check conditions: number of successes = 20 ; number of failures = 1982
## Standard error = 0.0022
## 95 % Confidence interval = ( 0.0056 , 0.0143 )
margEB <- 1.96 * 0.0022   # z* times the reported standard error
margEB
## [1] 0.004312
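Typing the standard error by hand is where slips creep in, so a small helper that computes the margin of error directly from a response vector can be safer. The function name `me_atheist` is hypothetical, and the two stand-in vectors are rebuilt from the counts in the outputs above (10 of 1039 for Romania, 20 of 2002 for Brazil). With the real data, a call such as `me_atheist(Romania12$response)` should give the same value up to rounding.

```r
# Hypothetical helper: 95% margin of error for the share of "atheist" answers
me_atheist <- function(responses, z = 1.96) {
  n <- length(responses)
  p_hat <- mean(responses == "atheist")
  z * sqrt(p_hat * (1 - p_hat) / n)
}

# Stand-ins built from the counts reported above
romania <- c(rep("atheist", 10), rep("non-atheist", 1029))
brazil  <- c(rep("atheist", 20), rep("non-atheist", 1982))

me_atheist(romania)  # about 0.0059
me_atheist(brazil)   # about 0.0044
```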
For a fixed sample size, the margin of error is largest when p = 0.5 and decreases as p moves toward 0 or 1, because it is proportional to √(p(1 − p)), which peaks at p = 0.5.
n <- 1000
p <- seq(0, 1, 0.01)                   # grid of population proportions
me <- 2 * sqrt(p * (1 - p)/n)          # margin of error, using z* ≈ 2
plot(me ~ p, ylab = "Margin of Error", xlab = "Population Proportion")
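The peak at p = 0.5 can also be confirmed without reading it off the plot. The sketch below keeps the same n = 1000 and z* ≈ 2 as the code above and uses base R's `optimize` to find where the margin of error is largest; the function name `me_fun` is just a local name for this illustration.

```r
# Margin of error as a function of p for fixed n, using z = 2 as above
n <- 1000
me_fun <- function(p) 2 * sqrt(p * (1 - p) / n)

# p * (1 - p) has derivative 1 - 2p, which is zero at p = 0.5,
# so the margin of error peaks there; optimize() confirms it numerically
peak <- optimize(me_fun, interval = c(0, 1), maximum = TRUE)
peak$maximum    # approximately 0.5
peak$objective  # approximately 2 * sqrt(0.25 / 1000), about 0.0316

# The curve is symmetric: p and 1 - p give the same margin of error
me_fun(0.1)
me_fun(0.9)
```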
### 9.) Describe the sampling distribution of sample proportions at n = 1040 and p = 0.1. Be sure to note the center, spread, and shape. Hint: Remember that R has functions such as mean to calculate summary statistics.
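The distribution is roughly symmetric and approximately normal in shape, centered at about 0.1: the mean (0.0997) and median (0.0990) are nearly identical. The spread is small, with a standard deviation of about 0.0093 and values ranging from about 0.070 to 0.130.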
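p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)    # storage for 5000 simulated sample proportions
for(i in 1:5000){
  # draw one sample of size n; each person is an atheist with probability p
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  # record the sample proportion of atheists
  p_hats[i] <- sum(samp == "atheist")/n
}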
hist(p_hats, main = "p = 0.1, n = 1040", col = "lavender", xlim = c(0, 0.18))
summary(p_hats)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.07019 0.09327 0.09904 0.09969 0.10577 0.12981
sd(p_hats)
## [1] 0.009287382
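The simulated standard deviation can be checked against the theoretical standard error of a sample proportion, √(p(1 − p)/n). This short sketch plugs in the same p = 0.1 and n = 1040; the result, about 0.0093, is close to the value of `sd(p_hats)` above.

```r
# Theoretical standard error of p_hat: sqrt(p * (1 - p) / n)
p <- 0.1
n <- 1040
se_theory <- sqrt(p * (1 - p) / n)
se_theory  # about 0.0093
```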
The spread decreases as the sample size increases, so larger samples tend to give estimates closer to the population proportion. Larger samples also make the shape of the sampling distribution more nearly normal. Finally, for these values of p (well below 0.5), a larger p gives a larger spread and therefore a larger margin of error.
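The same pattern appears in the theoretical standard errors for the four (p, n) combinations simulated in this exercise. This sketch tabulates √(p(1 − p)/n) for each; the data frame name `scenarios` is just a local name. For fixed p the standard error grows as n shrinks, and for fixed n it is larger at p = 0.1 than at p = 0.02.

```r
# Theoretical spread (standard error) of p_hat for the four scenarios
scenarios <- data.frame(
  p = c(0.1, 0.1, 0.02, 0.02),
  n = c(1040, 400, 1040, 400)
)
scenarios$se <- sqrt(scenarios$p * (1 - scenarios$p) / scenarios$n)
scenarios
```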
n1 = 400
p1 = 0.1
n2 = 1040
p2 = 0.02
n3 = 400
p3 = 0.02
p_hats1 <- rep(0,5000)
for(i in 1:5000){
samp1 <- sample(c("atheist","non_atheist"),n1,replace = TRUE,prob = c(p1, (1-p1)))
p_hats1[i] <- sum(samp1 == "atheist")/n1
}
p_hats2 <- rep(0,5000)
for(i in 1:5000){
samp2 <- sample(c("atheist","non_atheist"),n2,replace = TRUE,prob = c(p2, (1-p2)))
p_hats2[i] <- sum(samp2 == "atheist")/n2
}
p_hats3 <- rep(0,5000)
for(i in 1:5000){
samp3 <- sample(c("atheist","non_atheist"),n3,replace = TRUE,prob = c(p3, (1-p3)))
p_hats3[i] <- sum(samp3 == "atheist")/n3
}
par(mfrow=c(2,2))
hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18), col = "lavender")
hist(p_hats1, main = "p = 0.1, n = 400", xlim = c(0, 0.18), col = "lavender")
hist(p_hats2, main = "p = 0.02, n = 1040", xlim = c(0, 0.05), col = "lavender")
hist(p_hats3, main = "p = 0.02, n = 400", xlim = c(0, 0.05), col = "lavender")
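Each loop above draws n independent answers with success probability p, so the number of "atheist" answers in one sample follows a Binomial(n, p) distribution. A vectorized sketch can therefore draw all 5000 counts at once with `rbinom` instead of calling `sample` inside a loop. The helper name `sim_p_hats` and the seed are arbitrary choices for illustration; the results match the loop in distribution, not draw for draw.

```r
# Hypothetical helper: simulate `reps` sample proportions in one call.
# The count of successes in n independent draws is Binomial(n, p).
sim_p_hats <- function(n, p, reps = 5000) {
  rbinom(reps, size = n, prob = p) / n
}

set.seed(1)  # arbitrary seed, only for reproducibility
p_hats_alt <- sim_p_hats(400, 0.02)
summary(p_hats_alt)
```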
### 11.) If you refer to Table 6, you’ll find that Australia has a sample proportion of 0.1 on a sample size of 1040, and that Ecuador has a sample proportion of 0.02 on 400 subjects. Let’s suppose for this exercise that these point estimates are actually the truth. Then given the shape of their respective sampling distributions, do you think it is sensible to proceed with inference and report margins of error, as the report does?
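Australia is the only one for which I would feel confident reporting a margin of error: np = 1040 × 0.1 = 104 and n(1 − p) = 936 are both at least 10, so its sampling distribution is approximately normal. Ecuador fails this success-failure condition, since np = 400 × 0.02 = 8 is less than 10, so its sampling distribution is right-skewed and a normal-based margin of error would be unreliable.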
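The check used in this answer can be written as a small function. The name `success_failure` and the default threshold of 10 are choices for this sketch; the function returns `TRUE` only when both the expected successes and the expected failures reach the threshold.

```r
# Success-failure condition: n * p and n * (1 - p) should both be at least 10
# for the normal approximation behind the margin of error to be reasonable
success_failure <- function(n, p, threshold = 10) {
  n * p >= threshold && n * (1 - p) >= threshold
}

success_failure(1040, 0.1)   # Australia: 104 and 936, TRUE
success_failure(400, 0.02)   # Ecuador: 8 and 392, FALSE
```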