1. In the first paragraph, several key findings are reported. Do these percentages appear to be sample statistics (derived from the data sample) or population parameters?
  1. The title of the report is “Global Index of Religiosity and Atheism”. To generalize the report’s findings to the global human population, what must we assume about the sampling method? Does that seem like a reasonable assumption?
load("more/atheism.RData")
head(atheism)
##   nationality    response year
## 1 Afghanistan non-atheist 2012
## 2 Afghanistan non-atheist 2012
## 3 Afghanistan non-atheist 2012
## 4 Afghanistan non-atheist 2012
## 5 Afghanistan non-atheist 2012
## 6 Afghanistan non-atheist 2012
  1. What does each row of Table 6 correspond to? What does each row of atheism correspond to?
  1. Does it agree with the percentage in Table 6? If not, why?
library(plyr)
us12 <- subset(atheism, nationality == "United States" & year == "2012")
count(us12)
##     nationality    response year freq
## 1 United States     atheist 2012   50
## 2 United States non-atheist 2012  952
our_pct <- count(us12)[1,4]/1002
paste(round(our_pct*100,2),"%")
## [1] "4.99 %"

Inference on proportions

  1. Write out the conditions for inference to construct a 95% confidence interval for the proportion of atheists in the United States in 2012. Are you confident all conditions are met?

If the conditions for inference are reasonable, we can either calculate the standard error and construct the interval by hand, or allow the inference function to do it for us.

inference(us12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Warning: package 'BHH2' was built under R version 3.4.4
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0499 ;  n = 1002 
## Check conditions: number of successes = 50 ; number of failures = 952 
## Standard error = 0.0069 
## 95 % Confidence interval = ( 0.0364 , 0.0634 )
  1. Based on the R output, what is the margin of error for the estimate of the proportion of the proportion of atheists in US in 2012?
Z <- 1.96
Stand_err <- 0.0069
Mar_Err <- Z * Stand_err
Mar_Err 
## [1] 0.013524
  1. Using the inference function, calculate confidence intervals for the proportion of atheists in 2012 in two other countries of your choice, and report the associated margins of error. Be sure to note whether the conditions for inference are met. It may be helpful to create new data sets for each of the two countries first, and then use these data sets in the inference function to construct the confidence intervals.
pak12 <- subset(atheism, nationality == "Pakistan" & year == "2012")
count(pak12)
##   nationality    response year freq
## 1    Pakistan     atheist 2012   54
## 2    Pakistan non-atheist 2012 2650
inference(pak12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.02 ;  n = 2704 
## Check conditions: number of successes = 54 ; number of failures = 2650 
## Standard error = 0.0027 
## 95 % Confidence interval = ( 0.0147 , 0.0252 )
Spain12 <- subset(atheism, nationality == "Spain" & year == "2012")
count(Spain12)
##   nationality    response year freq
## 1       Spain     atheist 2012  103
## 2       Spain non-atheist 2012 1042
inference(Spain12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.09 ;  n = 1145 
## Check conditions: number of successes = 103 ; number of failures = 1042 
## Standard error = 0.0085 
## 95 % Confidence interval = ( 0.0734 , 0.1065 )
pak_se <- .0027
z <-1.96
spain_se <- .0085

pak_me <-spain_se*z
spain_me <- pak_se*z
paste("Margin of error in pakistan is ",pak_me," margin of error in Spain is ",spain_me)
## [1] "Margin of error in pakistan is  0.01666  margin of error in Spain is  0.005292"
  1. Describe the relationship between p and me.

The margin of error reaches the maximum when the population proportion is .5 and is parabolic before and after.

p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats[i] <- sum(samp == "atheist")/n
}

hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))

  1. Describe the sampling distribution of sample proportions at \(n = 1040\) and \(p = 0.1\). Be sure to note the center, spread, and shape.
    Hint: Remember that R has functions such as mean to calculate summary statistics.
library(psych)

summary(p_hats)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.07019 0.09327 0.09904 0.09969 0.10577 0.12981
describe(p_hats)
##    vars    n mean   sd median trimmed  mad  min  max range skew kurtosis
## X1    1 5000  0.1 0.01    0.1     0.1 0.01 0.07 0.13  0.06 0.06    -0.09
##    se
## X1  0
  1. Repeat the above simulation three more times but with modified sample sizes and proportions: for \(n = 400\) and \(p = 0.1\), \(n = 1040\) and \(p = 0.02\), and \(n = 400\) and \(p = 0.02\). Plot all four histograms together by running the par(mfrow = c(2, 2)) command before creating the histograms. You may need to expand the plot window to accommodate the larger two-by-two plot. Describe the three new sampling distributions. Based on these limited plots, how does \(n\) appear to affect the distribution of \(\hat{p}\)? How does \(p\) affect the sampling distribution?
p <- 0.1
n <- 400
phats1 <- rep(0, 5000)

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  phats1[i] <- sum(samp == "atheist")/n
}

p <- 0.02
n <- 1040
phats2 <- rep(0, 5000)

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  phats2[i] <- sum(samp == "atheist")/n
}

p <- 0.02
n <- 400
phats3 <- rep(0, 5000)

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  phats3[i] <- sum(samp == "atheist")/n
}

par(mfrow = c(2, 2))

hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))
hist(phats1, main = "p = 0.1, n = 400", xlim = c(0, 0.18))
hist(phats2, main = "p = 0.02, n = 1040", xlim = c(0, 0.18))
hist(phats3, main = "p = 0.02, n = 400", xlim = c(0, 0.18))

Once you’re done, you can reset the layout of the plotting window by using the command par(mfrow = c(1, 1)) command or clicking on “Clear All” above the plotting window (if using RStudio). Note that the latter will get rid of all your previous plots.

par(mfrow = c(1, 1))
  1. If you refer to Table 6, you’ll find that Australia has a sample proportion of 0.1 on a sample size of 1040, and that Ecuador has a sample proportion of 0.02 on 400 subjects. Let’s suppose for this exercise that these point estimates are actually the truth. Then given the shape of their respective sampling distributions, do you think it is sensible to proceed with inference and report margin of errors, as the reports does?
n_aus <- 1040
p_aus <- .1
aus_atheist <- 1040*.1
aus_not_atheist <- 1040*.9
ecuador_n <- 400
ecuador_atheist <- 400*.02
ecuador_not_atheist <- 400 * .98

paste("any samples under 10 arent large enough for inference. Austraillia's samples are Atheist=",aus_atheist,"not atheist =",aus_not_atheist,"ecuador's numbers are atheist=",ecuador_atheist," not atheist",ecuador_not_atheist)
## [1] "any samples under 10 arent large enough for inference. Austraillia's samples are Atheist= 104 not atheist = 936 ecuador's numbers are atheist= 8  not atheist 392"

On your own

The question of atheism was asked by WIN-Gallup International in a similar survey that was conducted in 2005. (We assume here that sample sizes have remained the same.) Table 4 on page 13 of the report summarizes survey results from 2005 and 2012 for 39 countries.

library(kableExtra)
library(knitr)
Spain05 <- subset(atheism, nationality == "Spain" & year == "2005")
kable(count(Spain05$response == 'atheist'))
x freq
FALSE 1031
TRUE 115
Spain_inf_05 <- inference(Spain05$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.1003 ;  n = 1146 
## Check conditions: number of successes = 115 ; number of failures = 1031 
## Standard error = 0.0089 
## 95 % Confidence interval = ( 0.083 , 0.1177 )
Spain12 <- subset(atheism, nationality == "Spain" & year == "2012")
count(Spain12$response == 'atheist')
##       x freq
## 1 FALSE 1042
## 2  TRUE  103
Spain_inf_12 <- inference(Spain12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.09 ;  n = 1145 
## Check conditions: number of successes = 103 ; number of failures = 1042 
## Standard error = 0.0085 
## 95 % Confidence interval = ( 0.0734 , 0.1065 )

b. Is there convincing evidence that the United States has seen a change in its atheism index between 2005 and 2012?

$H_O$ There is no change in atheism index from 2005-2012
$H_A$ There is a change in the atheism index from 2005-2012
United_States05 <- subset(atheism, nationality == "United States" & year == "2005")
kable(count(United_States05$response == 'atheist'))
x freq
FALSE 992
TRUE 10
United_States_inf_05 <- inference(United_States05$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.01 ;  n = 1002 
## Check conditions: number of successes = 10 ; number of failures = 992 
## Standard error = 0.0031 
## 95 % Confidence interval = ( 0.0038 , 0.0161 )
United_States12 <- subset(atheism, nationality == "United States" & year == "2012")
kable(count(United_States12$response == 'atheist'))
x freq
FALSE 952
TRUE 50
United_States_inf_12 <- inference(United_States12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0499 ;  n = 1002 
## Check conditions: number of successes = 50 ; number of failures = 952 
## Standard error = 0.0069 
## 95 % Confidence interval = ( 0.0364 , 0.0634 )

If in fact there has been no change in the atheism index in the countries listed in Table 4, in how many of those countries would you expect to detect a change (at a significance level of 0.05) simply by chance?
Hint: Look in the textbook index under Type 1 error.

39*.05
## [1] 1.95

SE = \(\sqrt {\frac {P_0(1-P_0)}{N}}\)= \(\sqrt {\frac {.5(1-.5)}{N}}\)

ME= Z* SE

\(ME^2= Z^2 {\frac {.5(1-.5)}{N}}\)

\(N= (Z^2 {.5(1-.5)})/ ME^2\)

n= 9604

z=1.96
ME=.01
n= z^2*(.5*.5)/ME^2
*Hint:* Refer to your plot of the relationship between $p$ and margin of 
error. Do not use the data set to answer this question.

This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported. This lab was written for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel.