load("more/atheism.RData")
head(atheism)## nationality response year
## 1 Afghanistan non-atheist 2012
## 2 Afghanistan non-atheist 2012
## 3 Afghanistan non-atheist 2012
## 4 Afghanistan non-atheist 2012
## 5 Afghanistan non-atheist 2012
## 6 Afghanistan non-atheist 2012
atheism correspond to?library(plyr)
us12 <- subset(atheism, nationality == "United States" & year == "2012")
count(us12)## nationality response year freq
## 1 United States atheist 2012 50
## 2 United States non-atheist 2012 952
our_pct <- count(us12)[1,4]/1002
paste(round(our_pct*100,2),"%")## [1] "4.99 %"
The sample observations are independent - YES
The success failure conditions are met-YES
If the conditions for inference are reasonable, we can either calculate the standard error and construct the interval by hand, or allow the inference function to do it for us.
inference(us12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")## Warning: package 'BHH2' was built under R version 3.4.4
## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
Z <- 1.96
Stand_err <- 0.0069
Mar_Err <- Z * Stand_err
Mar_Err ## [1] 0.013524
inference function, calculate confidence intervals for the proportion of atheists in 2012 in two other countries of your choice, and report the associated margins of error. Be sure to note whether the conditions for inference are met. It may be helpful to create new data sets for each of the two countries first, and then use these data sets in the inference function to construct the confidence intervals.pak12 <- subset(atheism, nationality == "Pakistan" & year == "2012")
count(pak12)## nationality response year freq
## 1 Pakistan atheist 2012 54
## 2 Pakistan non-atheist 2012 2650
inference(pak12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.02 ; n = 2704
## Check conditions: number of successes = 54 ; number of failures = 2650
## Standard error = 0.0027
## 95 % Confidence interval = ( 0.0147 , 0.0252 )
Spain12 <- subset(atheism, nationality == "Spain" & year == "2012")
count(Spain12)## nationality response year freq
## 1 Spain atheist 2012 103
## 2 Spain non-atheist 2012 1042
inference(Spain12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.09 ; n = 1145
## Check conditions: number of successes = 103 ; number of failures = 1042
## Standard error = 0.0085
## 95 % Confidence interval = ( 0.0734 , 0.1065 )
pak_se <- .0027
z <-1.96
spain_se <- .0085
pak_me <-spain_se*z
spain_me <- pak_se*zpaste("Margin of error in pakistan is ",pak_me," margin of error in Spain is ",spain_me)## [1] "Margin of error in pakistan is 0.01666 margin of error in Spain is 0.005292"
p and me.The margin of error reaches the maximum when the population proportion is .5 and is parabolic before and after.
p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
p_hats[i] <- sum(samp == "atheist")/n
}
hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))mean to calculate summary statistics.library(psych)
summary(p_hats)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.07019 0.09327 0.09904 0.09969 0.10577 0.12981
describe(p_hats)## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 5000 0.1 0.01 0.1 0.1 0.01 0.07 0.13 0.06 0.06 -0.09
## se
## X1 0
par(mfrow = c(2, 2)) command before creating the histograms. You may need to expand the plot window to accommodate the larger two-by-two plot. Describe the three new sampling distributions. Based on these limited plots, how does \(n\) appear to affect the distribution of \(\hat{p}\)? How does \(p\) affect the sampling distribution?p <- 0.1
n <- 400
phats1 <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
phats1[i] <- sum(samp == "atheist")/n
}
p <- 0.02
n <- 1040
phats2 <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
phats2[i] <- sum(samp == "atheist")/n
}
p <- 0.02
n <- 400
phats3 <- rep(0, 5000)
for(i in 1:5000){
samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
phats3[i] <- sum(samp == "atheist")/n
}
par(mfrow = c(2, 2))
hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))
hist(phats1, main = "p = 0.1, n = 400", xlim = c(0, 0.18))
hist(phats2, main = "p = 0.02, n = 1040", xlim = c(0, 0.18))
hist(phats3, main = "p = 0.02, n = 400", xlim = c(0, 0.18))Once you’re done, you can reset the layout of the plotting window by using the command par(mfrow = c(1, 1)) command or clicking on “Clear All” above the plotting window (if using RStudio). Note that the latter will get rid of all your previous plots.
par(mfrow = c(1, 1))n_aus <- 1040
p_aus <- .1
aus_atheist <- 1040*.1
aus_not_atheist <- 1040*.9
ecuador_n <- 400
ecuador_atheist <- 400*.02
ecuador_not_atheist <- 400 * .98
paste("any samples under 10 arent large enough for inference. Austraillia's samples are Atheist=",aus_atheist,"not atheist =",aus_not_atheist,"ecuador's numbers are atheist=",ecuador_atheist," not atheist",ecuador_not_atheist)## [1] "any samples under 10 arent large enough for inference. Austraillia's samples are Atheist= 104 not atheist = 936 ecuador's numbers are atheist= 8 not atheist 392"
The question of atheism was asked by WIN-Gallup International in a similar survey that was conducted in 2005. (We assume here that sample sizes have remained the same.) Table 4 on page 13 of the report summarizes survey results from 2005 and 2012 for 39 countries.
Answer the following two questions using the inference function. As always, write out the hypotheses for any tests you conduct and outline the status of the conditions for inference.
a. Is there convincing evidence that Spain has seen a change in its atheism index between 2005 and 2012?
\(H_O\) There is no change in atheism index from 2005-2012
\(H_A\) There is a change in the atheism index from 2005-2012
The confidence intervals in Spain 05 and 12 overlap, which indicates that we fail to reject the null and can’t state that there is a change in the atheism index between those years
library(kableExtra)
library(knitr)
Spain05 <- subset(atheism, nationality == "Spain" & year == "2005")
kable(count(Spain05$response == 'atheist'))| x | freq |
|---|---|
| FALSE | 1031 |
| TRUE | 115 |
Spain_inf_05 <- inference(Spain05$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.1003 ; n = 1146
## Check conditions: number of successes = 115 ; number of failures = 1031
## Standard error = 0.0089
## 95 % Confidence interval = ( 0.083 , 0.1177 )
Spain12 <- subset(atheism, nationality == "Spain" & year == "2012")
count(Spain12$response == 'atheist')## x freq
## 1 FALSE 1042
## 2 TRUE 103
Spain_inf_12 <- inference(Spain12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.09 ; n = 1145
## Check conditions: number of successes = 103 ; number of failures = 1042
## Standard error = 0.0085
## 95 % Confidence interval = ( 0.0734 , 0.1065 )
b. Is there convincing evidence that the United States has seen a change in its atheism index between 2005 and 2012?
$H_O$ There is no change in atheism index from 2005-2012
$H_A$ There is a change in the atheism index from 2005-2012
United_States05 <- subset(atheism, nationality == "United States" & year == "2005")
kable(count(United_States05$response == 'atheist'))| x | freq |
|---|---|
| FALSE | 992 |
| TRUE | 10 |
United_States_inf_05 <- inference(United_States05$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.01 ; n = 1002
## Check conditions: number of successes = 10 ; number of failures = 992
## Standard error = 0.0031
## 95 % Confidence interval = ( 0.0038 , 0.0161 )
United_States12 <- subset(atheism, nationality == "United States" & year == "2012")
kable(count(United_States12$response == 'atheist'))| x | freq |
|---|---|
| FALSE | 952 |
| TRUE | 50 |
United_States_inf_12 <- inference(United_States12$response, est = "proportion", type = "ci", method = "theoretical",
success = "atheist")## Single proportion -- success: atheist
## Summary statistics:
## p_hat = 0.0499 ; n = 1002
## Check conditions: number of successes = 50 ; number of failures = 952
## Standard error = 0.0069
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
If in fact there has been no change in the atheism index in the countries listed in Table 4, in how many of those countries would you expect to detect a change (at a significance level of 0.05) simply by chance?
Hint: Look in the textbook index under Type 1 error.
39*.05## [1] 1.95
SE = \(\sqrt {\frac {P_0(1-P_0)}{N}}\)= \(\sqrt {\frac {.5(1-.5)}{N}}\)
ME= Z* SE
\(ME^2= Z^2 {\frac {.5(1-.5)}{N}}\)
\(N= (Z^2 {.5(1-.5)})/ ME^2\)
n= 9604
z=1.96
ME=.01
n= z^2*(.5*.5)/ME^2*Hint:* Refer to your plot of the relationship between $p$ and margin of
error. Do not use the data set to answer this question.
This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported. This lab was written for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel.