Lab 8

Name: James Tian

Section: 3

Date: 10/22/2013

Exercises

Load data & inference function:

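options(digits = 5)  # print results to 5 significant digits
# inference() helper function used throughout the lab
source("http://stat.duke.edu/~kkl13/courses/sta102F13/labs/inference.R")
# atheism survey data (one row per respondent)
download.file("http://stat.duke.edu/~kkl13/courses/sta102F13/labs/atheism.RData", 
    destfile = "atheism.RData")
load("atheism.RData")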

Exercise 1:

51,927 people were interviewed. WIN-Gallup International interviewed people from 57 countries. In each country, a national probability sample of around 1,000 men and women was interviewed either face to face (35 countries; n = 33,890), via telephone (11 countries; n = 7,661) or online (11 countries; n = 10,376).
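
As a quick arithmetic check, the per-mode counts above add up to the reported totals of 51,927 respondents and 57 countries:

```r
# Interview counts and country counts by survey mode, as reported above
mode_n <- c(face_to_face = 33890, telephone = 7661, online = 10376)
mode_countries <- c(face_to_face = 35, telephone = 11, online = 11)
sum(mode_n)          # total respondents
sum(mode_countries)  # total countries
```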

Exercise 2:

The sample was stratified by country, which ensured that people from many countries were represented. The results from the random sample within each country were weighted by that country's population relative to the world, giving a seemingly accurate representation of the world. However, one could argue that some countries are not represented at all, so this is not a truly random sample of the world: people in the non-represented countries had no way of being included in the study.

Exercise 3:

They are sample statistics, because they were computed from the study's sample rather than from the entire population.

Exercise 4:

Each row of Table 6 gives the sample size and response percentages for one country. Each row in the atheism data set is a single data point: one person surveyed.

Exercise 5:

us12 = subset(atheism, atheism$nationality == "United States" & atheism$year == 
    "2012")
sum(us12$response == "atheist")/length(us12$response)
## [1] 0.0499

About 5% (0.0499) of the 2012 U.S. respondents are atheist. This agrees with Table 6.
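
Because R treats TRUE as 1 and FALSE as 0, the same proportion can be computed with mean(). The sketch below rebuilds a stand-in response vector from the counts that inference() reports in Exercise 7 (50 atheists, 952 others); the label "non-atheist" is a hypothetical placeholder, not necessarily the level name used in the atheism data.

```r
# Stand-in for us12$response, rebuilt from the counts reported in Exercise 7
resp <- rep(c("atheist", "non-atheist"), times = c(50, 952))
# The mean of a logical vector is the proportion of TRUE values,
# so this equals sum(resp == "atheist") / length(resp)
p_hat <- mean(resp == "atheist")
p_hat
```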

Exercise 6:

We assume that the sample was random and that responses were independent. We also assume that the numbers of successes and failures (atheists and non-atheists) are each at least 10. The sample size requirement is more than met (50 atheists and 952 non-atheists in the U.S. sample). However, in a stratified sample the responses within each country are not fully independent and, by some interpretations, the sample is not truly random.
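
The success-failure part of these assumptions is easy to check in code. A minimal sketch, assuming a threshold of 10 (the value used in Exercise 11) and plugging in the counts reported by inference() in Exercises 7 and 8, plus the expected counts from Exercise 11; the function name success_failure_ok is hypothetical:

```r
# TRUE when both the success count and the failure count reach the threshold
success_failure_ok <- function(successes, failures, threshold = 10) {
  successes >= threshold && failures >= threshold
}
us_ok    <- success_failure_ok(50, 952)    # 2012 U.S. sample (Exercise 7)
china_ok <- success_failure_ok(235, 265)   # 2012 China sample (Exercise 8)
small_ok <- success_failure_ok(8, 392)     # np = 8, n(1 - p) = 392 (Exercise 11)
c(us = us_ok, china = china_ok, small = small_ok)
```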

Exercise 7:

inference(data = us12$response, est = "proportion", type = "ci", method = "theoretical", 
    success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0499 ;  n =  1002 
## Check conditions: number of successes = 50 ; number of failures = 952 
## Standard error = 0.0069 
## 95 % Confidence interval = ( 0.0364 , 0.0634 )

The margin of error is 0.0135, half the width of the interval: (0.0634 - 0.0364)/2 = 0.0135.

Exercise 8:

china12 = subset(atheism, atheism$nationality == "China" & atheism$year == "2012")
sum(china12$response == "atheist")/length(china12$response)
## [1] 0.47
inference(data = china12$response, est = "proportion", type = "ci", method = "theoretical", 
    success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.47 ;  n =  500 
## Check conditions: number of successes = 235 ; number of failures = 265 
## Standard error = 0.0223 
## 95 % Confidence interval = ( 0.4263 , 0.5137 )

The margin of error is 0.0437, half the width of the interval: (0.5137 - 0.4263)/2 = 0.0437.
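
Both margins of error can be reproduced from p_hat and n alone as z* times sqrt(p_hat(1 - p_hat)/n). A minimal sketch, using z* = qnorm(0.975) (about 1.96) and the counts reported by inference(); margin_of_error is a hypothetical helper name:

```r
# 95% margin of error for a single proportion (normal approximation)
margin_of_error <- function(p_hat, n, z = qnorm(0.975)) {
  z * sqrt(p_hat * (1 - p_hat) / n)
}
me_us    <- margin_of_error(50 / 1002, 1002)  # Exercise 7
me_china <- margin_of_error(0.47, 500)        # Exercise 8
round(c(us = me_us, china = me_china), 4)
```

The China margin is about three times larger, both because p_hat is much closer to 0.5 and because n is about half as large.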

Exercise 9:

n = 100
p = seq(0, 1, 0.01)
me = 2 * sqrt(p * (1 - p)/n)
plot(me ~ p)

Here p is the population proportion and me is the margin of error for n = 100, using z = 2 (roughly 1.96 for 95% confidence). The margin of error is largest at p = 0.5 and shrinks toward 0 as p approaches 0 or 1.
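
The shape of this curve follows from calculus: d/dp [p(1 - p)] = 1 - 2p, which is zero at p = 0.5, so the margin of error peaks there at 2 * sqrt(0.25/100) = 0.1. A short check on the same grid as Exercise 9:

```r
# Same grid as Exercise 9: z = 2, n = 100
n <- 100
p <- seq(0, 1, 0.01)
me <- 2 * sqrt(p * (1 - p) / n)
p_at_max <- p[which.max(me)]   # p(1 - p) peaks at p = 0.5
me_max <- max(me)              # 2 * sqrt(0.25 / 100) = 0.1
c(p_at_max = p_at_max, me_max = me_max)
```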

Exercise 10:

p = 0.1
n = 1040
p_hats = rep(0, 5000)
for (i in 1:5000) {
    samp = sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 
        1 - p))
    p_hats[i] = sum(samp == "atheist")/n
}
hist(p_hats, main = "p = 0.1, n = 1040")

The sampling distribution of p_hat is approximately normal: unimodal, roughly symmetric, and centered near p = 0.1.
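
The loop above can also be written without a loop, since the number of "atheist" draws in each simulated sample is a Binomial(n, p) count. This sketch swaps sample() for base R's rbinom() (an equivalent simulation route, not the lab's method) and compares the spread of the simulated p_hat values with the theoretical standard error sqrt(p(1 - p)/n); the seed is arbitrary.

```r
set.seed(8)                         # arbitrary seed for reproducibility
p <- 0.1
n <- 1040
# Each draw is the number of successes in n trials, divided by n
p_hats <- rbinom(5000, size = n, prob = p) / n
se_theory <- sqrt(p * (1 - p) / n)  # about 0.0093
c(mean = mean(p_hats), sd = sd(p_hats), se_theory = se_theory)
```

The simulated mean should sit close to 0.1 and the simulated standard deviation close to se_theory, which is what the normal model used for theoretical inference assumes.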

Exercise 11:

p = 0.1
n = 1040
n * p
## [1] 104
n * (1 - p)
## [1] 936
p = 0.02
n = 400
n * p
## [1] 8
n * (1 - p)
## [1] 392

Both np and n(1 - p) are well over 10 when n = 1040 and p = 0.1. However, when n = 400 and p = 0.02, n(1 - p) = 392 is well over 10 but np is only 8, so the success-failure condition fails. As a result, the sampling distribution for this case appears asymmetric and slightly right-skewed.

Exercise 12:

p = 0.02
n = 400
p_hats = rep(0, 5000)
for (i in 1:5000) {
    samp = sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 
        1 - p))
    p_hats[i] = sum(samp == "atheist")/n
}
hist(p_hats, main = "p = 0.02, n = 400")
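
The right skew visible in this histogram can be put into numbers. The sketch below (rbinom() again standing in for the sample() loop, with an arbitrary seed) computes the sample skewness of the simulated p_hat values and compares it with the skewness of a Binomial(n, p) count, (1 - 2p)/sqrt(np(1 - p)), which dividing by n leaves unchanged:

```r
set.seed(12)                        # arbitrary seed for reproducibility
p <- 0.02
n <- 400
p_hats <- rbinom(5000, size = n, prob = p) / n
# Sample skewness: average cubed deviation divided by the cubed SD
skew_sim <- mean((p_hats - mean(p_hats))^3) / sd(p_hats)^3
# Theoretical skewness for this case and for p = 0.1, n = 1040
skew_theory <- (1 - 2 * p) / sqrt(n * p * (1 - p))            # 0.96 / 2.8
skew_large  <- (1 - 2 * 0.1) / sqrt(1040 * 0.1 * (1 - 0.1))   # about 0.08
c(simulated = skew_sim, theory = skew_theory, p0.1_n1040 = skew_large)
```

A skewness near 0 indicates symmetry; the p = 0.02, n = 400 case is roughly four times as skewed as the p = 0.1, n = 1040 case.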

Exercise 13:

No. For the p = 0.02, n = 400 case, the report should not use theoretical inference or report a margin of error for the true population proportion, because the success-failure condition is not met. Since the sampling distribution is not approximately normal, a normal-based confidence interval cannot be trusted to capture the true population proportion at the stated confidence level. They could still use the sample proportion as a point estimate of the true population proportion; however, they should not use the normal model to make inferences, report probabilities, or report margins of error.
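
To see how much this matters, the sketch below computes the exact coverage of the usual 95% interval p_hat ± 1.96 × sqrt(p_hat(1 - p_hat)/n) by summing binomial probabilities over every possible count. It assumes, as in Exercise 11, that p = 0.1 with n = 1040 and p = 0.02 with n = 400 are the true values; the function name wald_coverage is hypothetical.

```r
# Probability that the normal-approximation 95% interval captures the true p
wald_coverage <- function(n, p, z = 1.96) {
  x <- 0:n                                  # every possible number of successes
  p_hat <- x / n
  me <- z * sqrt(p_hat * (1 - p_hat) / n)
  covers <- (p_hat - me <= p) & (p <= p_hat + me)
  sum(dbinom(x, n, p)[covers])              # total probability of covering counts
}
cov_large <- wald_coverage(1040, 0.1)
cov_small <- wald_coverage(400, 0.02)
c(p0.1_n1040 = cov_large, p0.02_n400 = cov_small)
```

The first coverage should come out close to the nominal 0.95, while the second falls noticeably short of it, so a "95%" margin of error would overstate the report's precision in the p = 0.02 case.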

Exercise 14:

# Spanish respondents from the 2005 survey (stored in spain12)
spain12 = subset(atheism, atheism$nationality == "Spain" & atheism$year == "2005")
sum(spain12$response == "atheist")/length(spain12$response)
## [1] 0.10035
inference(data = spain12$response, est = "proportion", type = "ci", method = "theoretical", 
    success = "atheist")
## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.1003 ;  n =  1146 
## Check conditions: number of successes = 115 ; number of failures = 1031 
## Standard error = 0.0089 
## 95 % Confidence interval = ( 0.083 , 0.1177 )
inference(data = spain12$response, est = "proportion", type = "ht", method = "theoretical", 
    success = "atheist", null = 0.08, alternative = "twosided")
## Single proportion -- success: atheist 
## Summary statistics:
## p_hat = 0.1003 ;  n =  1146 
## H0: p = 0.08 
## HA: p != 0.08 
## Check conditions: number of expected successes = 92 ; number of expected failures = 1054 
## Standard error = 0.008 
## Test statistic: Z =  2.539 
## p-value =  0.0112

Our 95% confidence interval, (0.083, 0.1177), does not contain the null value of 0.08, and the hypothesis test gives a p-value of 0.0112, which is below 0.05. Both lead us to reject the null hypothesis that the true proportion of atheists in Spain is 0.08.
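
The test statistic and p-value can be reproduced by hand from the counts that inference() reports (115 atheists out of 1146). Note that the test's standard error uses the null value p0 = 0.08, while the confidence interval's uses p_hat, which is why the two outputs list different standard errors (0.008 vs 0.0089):

```r
x  <- 115                          # atheists in the 2005 Spanish sample
n  <- 1146
p0 <- 0.08                         # null value
p_hat <- x / n
se0 <- sqrt(p0 * (1 - p0) / n)     # standard error computed under H0
z <- (p_hat - p0) / se0
p_value <- 2 * pnorm(-abs(z))      # two-sided p-value
c(z = z, p_value = p_value)
```

Both values should agree with the inference() output up to rounding.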

Exercise 15:

1.96^2 * 0.25/0.03^2  # n = z^2 * p(1 - p) / ME^2, with worst case p = 0.5
## [1] 1067.1

Using the worst case p = 0.5, we would need a sample size of at least 1068 (1067.1 rounded up), or 1064 if we use the adjusted confidence interval, which adds 4 to the sample size.
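
The calculation can be wrapped in a small function that always rounds up, since a sample size of 1067 would leave the margin of error slightly above 0.03. A minimal sketch; sample_size_needed is a hypothetical name, and the "minus 4" step assumes the adjusted interval is the one that adds two successes and two failures to the sample:

```r
# Smallest n such that z * sqrt(p * (1 - p) / n) <= me
sample_size_needed <- function(me, p = 0.5, z = 1.96) {
  ceiling(z^2 * p * (1 - p) / me^2)
}
n_needed   <- sample_size_needed(0.03)  # 1067.1 rounded up
n_adjusted <- n_needed - 4              # assumed: adjusted interval adds 4 observations
c(n_needed = n_needed, n_adjusted = n_adjusted)
```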