These percentages appear to be sample statistics. However, the report discusses them as if they were population parameters.
For inference, we must assume the sample is a simple random sample. The report states that they sampled using a “national probability sample” method, which is technically not a simple random sample. However, for the sake of analysis, we treat the sample as random enough to conduct inference.
load("more/atheism.RData")
What does each row of atheism correspond to?
head(atheism)
## nationality response year
## 1 Afghanistan non-atheist 2012
## 2 Afghanistan non-atheist 2012
## 3 Afghanistan non-atheist 2012
## 4 Afghanistan non-atheist 2012
## 5 Afghanistan non-atheist 2012
## 6 Afghanistan non-atheist 2012
Each row of Table 6 corresponds to a single country and the relative proportions of religious beliefs in the sample taken from that country.
Each row of atheism corresponds to one respondent: that person's country (nationality), their answer (atheist or non-atheist), and the survey year.
Create a data frame us12 that contains only the rows in atheism associated with respondents to the 2012 survey from the United States. Next, calculate the proportion of atheist responses. Does it agree with the percentage in Table 6? If not, why?
us12 <- subset(atheism, nationality == "United States" & year == "2012")
library(dplyr)
us12 %>%
  group_by(response) %>%
  summarise(proportion = n() / nrow(us12))
## # A tibble: 2 x 2
## response proportion
## <fct> <dbl>
## 1 atheist 0.0499
## 2 non-atheist 0.950
The proportion of atheist responses in us12 is 0.0499, which agrees with the 5% reported for the United States in Table 6; the report rounded 4.99% up to 5%.
Observations must be independent: the sample was not a simple random sample, so there may be some sampling bias; however, it is random enough for us to treat the observations as independent. The sample of 1002 respondents is also less than 10% of the US population.
Success-failure condition: there are at least 10 expected successes (50 atheists) and at least 10 expected failures (952 non-atheists).
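The success-failure condition is a quick arithmetic check: multiply the sample size by the sample proportion and by one minus the sample proportion, and confirm both products are at least 10. A minimal sketch using the US values that inference() reports for this sample (p_hat = 0.0499, n = 1002):

```r
# Success-failure check for the 2012 US sample (values from the inference() output)
n_us <- 1002
p_hat_us <- 0.0499

expected_successes <- n_us * p_hat_us        # about 50 atheists
expected_failures  <- n_us * (1 - p_hat_us)  # about 952 non-atheists

# The normal approximation is reasonable when both counts are at least 10
conditions_met <- expected_successes >= 10 && expected_failures >= 10
conditions_met
```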
inference(us12$response, est = "proportion", type = "ci", method = "theoretical",
          success = "atheist")
## Single proportion -- success: atheist
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
(0.0634 - 0.0364)/2
## [1] 0.0135
The margin of error is 0.0135.
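Half the width of the interval is the margin of error, z* × SE, where SE = sqrt(p̂(1 - p̂)/n). The sketch below rebuilds the US interval from the counts in the output (50 atheists out of 1002 respondents), assuming inference() uses this standard normal-approximation interval; its printed standard error and bounds match this formula.

```r
# Rebuild the 2012 US 95% confidence interval from the counts
successes <- 50
n <- 1002

p_hat <- successes / n                # sample proportion, about 0.0499
se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error, about 0.0069
z_star <- qnorm(0.975)                # critical value for 95% confidence
me <- z_star * se                     # margin of error, about 0.0135

ci <- c(lower = p_hat - me, upper = p_hat + me)
round(ci, 4)
```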
Using the inference function, calculate confidence intervals for the proportion of atheists in 2012 in two other countries of your choice, and report the associated margins of error. Be sure to note whether the conditions for inference are met. It may be helpful to create new data sets for each of the two countries first, and then use these data sets in the inference function to construct the confidence intervals.
Observations must be independent: the samples were not simple random samples, so there may be some sampling bias; however, they are random enough for us to treat the observations as independent. Each sample (1036 respondents for Serbia, 525 for Poland) is also less than 10% of its country's population.
Success-failure condition: both samples have at least 10 successes and 10 failures (31 and 1005 for Serbia; 26 and 499 for Poland).
serbia12 <- subset(atheism, nationality == "Serbia" & year == "2012")
serbia12 %>%
  group_by(response) %>%
  summarise(proportion = n() / nrow(serbia12))
## # A tibble: 2 x 2
## response proportion
## <fct> <dbl>
## 1 atheist 0.0299
## 2 non-atheist 0.970
0.03 * 1036  # expected number of atheist responses
## [1] 31.08
inference(serbia12$response, est = "proportion", type = "ci", method = "theoretical",
          success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0299 ; n = 1036
## Check conditions: number of successes = 31 ; number of failures = 1005
## Standard error = 0.0053
## 95 % Confidence interval = ( 0.0195 , 0.0403 )
(0.0403 - 0.0195)/2
## [1] 0.0104
Serbia margin of error is 0.0104.
poland12 <- subset(atheism, nationality == "Poland" & year == "2012")
poland12 %>%
  group_by(response) %>%
  summarise(proportion = n() / nrow(poland12))
## # A tibble: 2 x 2
## response proportion
## <fct> <dbl>
## 1 atheist 0.0495
## 2 non-atheist 0.950
0.0495 * 525  # expected number of atheist responses
## [1] 25.9875
inference(poland12$response, est = "proportion", type = "ci", method = "theoretical",
          success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0495 ; n = 525
## Check conditions: number of successes = 26 ; number of failures = 499
## Standard error = 0.0095
## 95 % Confidence interval = ( 0.031 , 0.0681 )
(0.0681 - 0.031)/2
## [1] 0.01855
Poland margin of error is 0.01855.
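The Serbia and Poland intervals can be reproduced the same way from the counts in the output above. prop_ci() is a hypothetical helper written for this sketch, not part of the lab's code; it applies the normal-approximation formula p̂ ± z* sqrt(p̂(1 - p̂)/n).

```r
# Hypothetical helper: normal-approximation CI for a single proportion
prop_ci <- function(successes, n, conf = 0.95) {
  p_hat <- successes / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
  z_star <- qnorm(1 - (1 - conf) / 2)
  me <- z_star * se
  data.frame(p_hat = p_hat, se = se, me = me,
             lower = p_hat - me, upper = p_hat + me)
}

# Counts taken from the inference() output for each 2012 sample
results <- rbind(
  Serbia = prop_ci(31, 1036),
  Poland = prop_ci(26, 525)
)
round(results, 4)
```

The margins of error come out at about 0.0104 for Serbia and 0.0186 for Poland; the small gap from 0.01855 comes from halving the rounded interval bounds.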
Describe the relationship between p and me.
n <- 1000
p <- seq(0, 1, 0.01)
me <- 2 * sqrt(p * (1 - p)/n)
plot(me ~ p, ylab = "Margin of Error", xlab = "Population Proportion")
The margin of error is largest at p = 0.5 and shrinks toward zero as p approaches 0 or 1; the curve is symmetric about p = 0.5.
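Why the maximum sits at p = 0.5: the margin of error depends on p only through p(1 - p), and the square root is increasing, so maximizing p(1 - p) maximizes the margin of error. Using the same formula as the code above (with 2 as the critical value):

```latex
\begin{aligned}
\mathrm{ME}(p) &= 2\sqrt{\frac{p(1-p)}{n}} \\
\frac{d}{dp}\bigl[p(1-p)\bigr] &= 1 - 2p = 0 \quad\Longrightarrow\quad p = \tfrac{1}{2},
  \qquad \frac{d^2}{dp^2}\bigl[p(1-p)\bigr] = -2 < 0 \\
\mathrm{ME}\bigl(\tfrac{1}{2}\bigr) &= 2\sqrt{\frac{0.25}{1000}} \approx 0.0316 \quad (n = 1000)
\end{aligned}
```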
Hint: R has functions such as mean to calculate summary statistics.
p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)
for (i in 1:5000) {
  # draw one sample of n responses and record the proportion of atheists
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1 - p))
  p_hats[i] <- sum(samp == "atheist")/n
}
hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))
mean(p_hats)
## [1] 0.09969
The sampling distribution is centered at about 0.1 (the mean of the simulated proportions is 0.0997), is approximately normal (unimodal and symmetric), and is fairly narrow, so its spread is small.
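As a cross-check on this description, the spread of the simulated proportions can be compared with the theoretical standard error sqrt(p(1 - p)/n). This sketch swaps the sample() loop for rbinom(), which draws the number of atheists in each simulated sample directly and gives the same sampling distribution; the seed is arbitrary, so the numbers differ slightly from the run above.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
p <- 0.1
n <- 1040

# Number of atheists in each of 5000 simulated samples, divided by n
p_hats <- rbinom(5000, size = n, prob = p) / n

c(mean = mean(p_hats),                # should be close to p = 0.1
  sd = sd(p_hats),                    # simulated spread
  se_theory = sqrt(p * (1 - p) / n))  # theoretical SE, about 0.0093
```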
Use the par(mfrow = c(2, 2)) command before creating the histograms. You may need to expand the plot window to accommodate the larger two-by-two plot. Describe the three new sampling distributions. Based on these limited plots, how does \(n\) appear to affect the distribution of \(\hat{p}\)? How does \(p\) affect the sampling distribution?
# Simulate 5000 sample proportions for a given p and n, then plot their histogram
Ex10Sim <- function(p_input, n_input, title) {
  p <- p_input
  n <- n_input
  p_hats <- rep(0, 5000)
  for (i in 1:5000) {
    samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1 - p))
    p_hats[i] <- sum(samp == "atheist")/n
  }
  hist(p_hats, main = title, xlim = c(0, 0.18))
}
par(mfrow = c(2, 2))
Ex10Sim(0.1, 1040, "n1040 p0.1")
Ex10Sim(0.1, 400, "n400 p0.1")
Ex10Sim(0.02, 1040, "n1040 p0.02")
Ex10Sim(0.02, 400, "n400 p0.02")
par(mfrow = c(1, 1))
Increasing n decreases the spread.
Moving p closer to zero shifts the center of the distribution toward zero and makes it more right-skewed, most noticeably for n = 400.
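The effect of n on spread can also be read off the standard error formula sqrt(p(1 - p)/n), which shrinks in proportion to 1/sqrt(n). This sketch computes it for the four simulated settings. The formula describes spread only; it says nothing about the skew seen for p = 0.02, which is what the success-failure condition addresses.

```r
# Theoretical standard error of p_hat for the four simulated settings
settings <- expand.grid(p = c(0.1, 0.02), n = c(1040, 400))
settings$se <- sqrt(settings$p * (1 - settings$p) / settings$n)
settings
```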
It is sensible to report the margin of error for Australia: its sample is larger and its sample proportion is farther from zero than Ecuador's, so the sampling distribution of the sample proportion is close to normal. It is not sensible to report the margin of error for Ecuador: with a smaller sample and a sample proportion very close to zero, the sampling distribution is skewed (as in the p = 0.02, n = 400 simulation), so a normal-based margin of error is unreliable.
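The reasoning above is the success-failure condition. A sketch, assuming the Australia and Ecuador figures are the ones behind the p = 0.1, n = 1040 and p = 0.02, n = 400 simulations above (these values are an assumption here; the report does not state them explicitly):

```r
# Assumed values, matching the simulation settings used above
countries <- data.frame(
  country = c("Australia", "Ecuador"),
  p_hat = c(0.10, 0.02),
  n = c(1040, 400)
)
countries$successes <- countries$n * countries$p_hat        # about 104 and 8
countries$failures  <- countries$n * (1 - countries$p_hat)  # about 936 and 392
countries$sf_ok <- countries$successes >= 10 & countries$failures >= 10
countries
```

Ecuador's expected number of atheists is about 8, below 10, so the normal approximation behind its margin of error is not justified.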
Answer the following questions using the inference function. As always, write out the hypotheses for any tests you conduct and outline the status of the conditions for inference.
Observations must be independent: the samples were not simple random samples, so there may be some sampling bias; however, they are random enough for us to treat the observations as independent. Each Spanish sample (1145 respondents in 2012, 1146 in 2005) is also less than 10% of the population.
Success-failure condition: both samples have at least 10 successes and 10 failures (103 and 1042 in 2012; 115 and 1031 in 2005).
H0: There is no change in the atheism index of Spain between 2005 and 2012.
HA: There is a change in the atheism index of Spain between 2005 and 2012.
Spain12 <- subset(atheism, nationality == "Spain" & year == "2012")
inference(Spain12$response, est = "proportion", type = "ci", method = "theoretical",
          success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.09 ; n = 1145
## Check conditions: number of successes = 103 ; number of failures = 1042
## Standard error = 0.0085
## 95 % Confidence interval = ( 0.0734 , 0.1065 )
Spain5 <- subset(atheism, nationality == "Spain" & year == "2005")
inference(Spain5$response, est = "proportion", type = "ci", method = "theoretical",
          success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.1003 ; n = 1146
## Check conditions: number of successes = 115 ; number of failures = 1031
## Standard error = 0.0089
## 95 % Confidence interval = ( 0.083 , 0.1177 )
The two confidence intervals overlap substantially: each year's sample proportion lies inside the other year's interval. There is therefore not convincing evidence that Spain has seen a change in its atheism index between 2005 and 2012.
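Comparing two separate confidence intervals is an informal check. A more direct test of H0 is a two-proportion z-test on the difference between the 2012 and 2005 proportions, using the pooled proportion in the standard error. A minimal sketch with the Spain counts from the output above (103 of 1145 in 2012, 115 of 1146 in 2005); base R's prop.test() with correct = FALSE should give the same p-value.

```r
# Two-proportion z-test for Spain, 2012 vs 2005
x <- c(y2012 = 103, y2005 = 115)    # atheist responses
n <- c(y2012 = 1145, y2005 = 1146)  # sample sizes

p_hat <- x / n
p_pool <- sum(x) / sum(n)  # pooled proportion under H0: no change
se <- sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))
z <- unname((p_hat[1] - p_hat[2]) / se)
p_value <- 2 * pnorm(-abs(z))

# Same test in base R, without the continuity correction
check <- prop.test(x, n, correct = FALSE)
c(z = z, p_value = p_value, prop_test_p = check$p.value)
```

With z of about -0.85 and a p-value of about 0.40, this agrees with the conclusion above: no convincing evidence of a change.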
Observations must be independent: the samples were not simple random samples, so there may be some sampling bias; however, they are random enough for us to treat the observations as independent. Each sample of 1002 respondents is also less than 10% of the US population.
Success-failure condition: both samples have at least 10 successes and 10 failures (50 and 952 in 2012; 10 and 992 in 2005). The 2005 sample has exactly 10 atheists, so this condition is only just met.
H0: There is no change in the atheism index of the United States between 2005 and 2012.
HA: There is a change in the atheism index of the United States between 2005 and 2012.
inference(us12$response, est = "proportion", type = "ci", method = "theoretical",
          success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
us5 <- subset(atheism, nationality == "United States" & year == "2005")
inference(us5$response, est = "proportion", type = "ci", method = "theoretical",
          success = "atheist")
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.01 ; n = 1002
## Check conditions: number of successes = 10 ; number of failures = 992
## Standard error = 0.0031
## 95 % Confidence interval = ( 0.0038 , 0.0161 )
Since the confidence intervals do not overlap (the 2005 interval ends at 0.0161, while the 2012 interval starts at 0.0364), there is convincing evidence that the United States’ atheism index changed between 2005 and 2012.
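The same conclusion can be checked with a confidence interval for the difference in proportions. This sketch uses base R's prop.test() with correct = FALSE and the counts from the output above (50 of 1002 in 2012, 10 of 1002 in 2005); an interval for the 2012 minus 2005 difference that excludes 0 points to a real change.

```r
# 2012 vs 2005 US atheist counts, both out of 1002 respondents
res <- prop.test(x = c(50, 10), n = c(1002, 1002), correct = FALSE)
res$estimate  # sample proportions, about 0.0499 and 0.0100
res$conf.int  # 95% CI for the 2012 minus 2005 difference
res$p.value
```

The interval runs from roughly 0.025 to 0.055, well away from 0. Because the 2005 sample has exactly 10 atheists, this result leans on a condition that is only just met.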
At a significance level of 0.05, if no country's atheism index had actually changed, we would expect to detect a change in about 5% of the countries simply by chance.
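The 5% figure is the significance level, the probability of rejecting a true null hypothesis. A simulation sketch under hypothetical settings (5000 countries, n = 1000 respondents per year, and a true atheism rate of 0.05 that does not change): each country gets a pooled two-proportion z-test, and roughly 5% of them are flagged as changed even though nothing changed.

```r
set.seed(2)    # arbitrary seed
reps <- 5000   # hypothetical number of countries with no real change
n <- 1000      # hypothetical sample size in each year
p <- 0.05      # hypothetical true atheism rate, the same in both years

x1 <- rbinom(reps, n, p)  # "2005" atheist counts
x2 <- rbinom(reps, n, p)  # "2012" atheist counts
p_pool <- (x1 + x2) / (2 * n)
se <- sqrt(p_pool * (1 - p_pool) * (2 / n))
z <- (x2 / n - x1 / n) / se
p_values <- 2 * pnorm(-abs(z))

mean(p_values < 0.05)  # fraction of countries wrongly flagged as changed
```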
Since the margin of error is maximized at p = 0.5, calculating the sample size based on p = 0.5 ensures that the margin of error will be no greater than 1% no matter what the true p is.
n_required <- 0.25/((0.01/1.96)^2)
n_required
## [1] 9604
The sample size should be at least 9604 to ensure a margin of error of at most 1% at 95% confidence.
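The calculation above solves ME = z* sqrt(p(1 - p)/n) for n. A minimal sketch of the same formula that uses qnorm(0.975) instead of the rounded 1.96 and rounds up with ceiling(), since a sample size must be a whole number and rounding down would push the margin of error slightly above 1%:

```r
# Conservative sample size for a 1% margin of error at 95% confidence
me_target <- 0.01
p <- 0.5                # worst case: maximizes p * (1 - p)
z_star <- qnorm(0.975)  # about 1.96

n_required <- ceiling(z_star^2 * p * (1 - p) / me_target^2)
n_required
```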