Lab 6

Exercises

Exercise 1

Since these findings come from a poll that did not include the entire world population, they are sample statistics.

Exercise 2

To generalize the results, the sample must be random and representative of the world’s population. Because the poll was conducted by an international organization and used random sampling within each country, it meets those requirements about as well as is practical. It seems to favor industrialized countries, but those countries are where the world’s population centers are.

Exercise 3

Table 6 corresponds to sample statistics summarized by country. The atheism dataset’s observations correspond to individual responses.

Exercise 4

us12 <- atheism[atheism$nationality == "United States" & atheism$year == "2012",]
sum(us12$response == 'atheist')/nrow(us12)

## [1] 0.0499002

The proportion computed from the raw data, about 5%, agrees with the table.

Exercise 5

We already know the independence requirement is satisfied. There are 50 “atheist” responses and 952 “non-atheist” responses, so both the success and failure categories have at least 10 observations. The requirements appear to be met.

Exercise 6

inference(us12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Warning: package 'BHH2' was built under R version 3.4.2

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0499 ;  n = 1002 
## Check conditions: number of successes = 50 ; number of failures = 952 
## Standard error = 0.0069 
## 95 % Confidence interval = ( 0.0364 , 0.0634 )

The margin of error is (.0634 - .0364)/2 = 0.0135
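
As a rough cross-check, the same margin can be recovered from the standard error instead of the interval endpoints. The sketch below assumes the counts printed in the output above (50 successes out of 1002); the variable names are illustrative only.

```r
# Counts taken from the inference() output above
successes <- 50
n <- 1002

p_hat <- successes / n
se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of a sample proportion
z <- qnorm(0.975)                     # critical value for 95% confidence
me <- z * se                          # margin of error

round(c(p_hat = p_hat, se = se, me = me), 4)
```

The rounded values should line up with the standard error and the half-width of the interval reported by inference().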

Exercise 7

can12 <- atheism[atheism$nationality == "Canada" & atheism$year == "2012",]
sum(can12$response == 'atheist')/nrow(can12)

## [1] 0.08982036

cam12 <- atheism[atheism$nationality == "Cameroon" & atheism$year == "2012",]
sum(cam12$response == 'atheist')/nrow(cam12)

## [1] 0.0297619

inference(can12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0898 ;  n = 1002 
## Check conditions: number of successes = 90 ; number of failures = 912 
## Standard error = 0.009 
## 95 % Confidence interval = ( 0.0721 , 0.1075 )

inference(cam12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0298 ;  n = 504 
## Check conditions: number of successes = 15 ; number of failures = 489 
## Standard error = 0.0076 
## 95 % Confidence interval = ( 0.0149 , 0.0446 )

Each sample has at least 10 successes and 10 failures (Cameroon has the fewest successes, at 15), so we can use the results.

Canada margin: (0.1075 - 0.0721)/2 = 0.0177

Cameroon margin: (0.0446 - 0.0149)/2 = 0.01485

Exercise 8

The margin of error follows a parabola-like curve in p: it rises from 0 at p = 0, peaks at p = .5, and falls symmetrically back to 0 at p = 1.
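
One way to see this relationship is to compute the margin of error over a grid of p values. This is a minimal sketch that assumes a fixed sample size of 1000 (an arbitrary choice for illustration) and a 95% confidence level.

```r
n <- 1000                                    # assumed sample size, held fixed
p <- seq(0, 1, by = 0.01)                    # grid of population proportions
me <- qnorm(0.975) * sqrt(p * (1 - p) / n)   # 95% margin of error at each p

plot(p, me, type = "l",
     xlab = "Population proportion", ylab = "Margin of error")

p[which.max(me)]                             # value of p where the margin peaks
```

Because p * (1 - p) is largest at p = .5, any fixed n gives the same peak location; only the height of the curve changes.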

Exercise 9

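# Simulate 5000 samples of size n from a population with true proportion p
p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)   # storage for the simulated sample proportions

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats[i] <- sum(samp == "atheist")/n   # proportion of atheists in this sample
}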

mean(p_hats)

## [1] 0.1000144

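In the run summarized here, the simulated proportions have a median of .09904 and a mean of .09969; the simulation is random, so these differ slightly from the output above. The shape is fairly symmetric. A mean higher than the median can indicate a very minor right skew, but it is not apparent from the histogram.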
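
The spread of the sampling distribution can also be compared with the theoretical standard error. The sketch below is an assumption-laden shortcut: it uses rbinom() instead of the sample() loop above, fixes an arbitrary seed for reproducibility, and stores the draws in a separate hypothetical vector named sims.

```r
set.seed(1)                                    # arbitrary seed, reproducibility only
p <- 0.1
n <- 1040
sims <- rbinom(5000, size = n, prob = p) / n   # 5000 simulated sample proportions

sd_sim <- sd(sims)                             # spread of the simulated proportions
se_theory <- sqrt(p * (1 - p) / n)             # about 0.0093

c(simulated = sd_sim, theoretical = se_theory)
```

The two numbers should be close, which is consistent with the nearly normal shape described above.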

Exercise 10

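# Return 5000 simulated sample proportions for true proportion p and sample size n
samp_func <- function (p, n) { 
  p_hats <- rep(0, 5000)
  for(i in 1:5000){
    samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
    p_hats[i] <- sum(samp == "atheist")/n   # proportion of atheists in this sample
  }
  p_hats
}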
p_hats1 <- samp_func(.1, 400)
p_hats2 <- samp_func(.02, 1040)
p_hats3 <- samp_func(.02, 400)


par(mfrow = c(2, 2))

hist(p_hats1, main = "p = 0.1, n = 400", xlim = c(0, 0.18))
hist(p_hats2, main = "p = 0.02, n = 1040", xlim = c(0, .06))
hist(p_hats3, main = "p = 0.02, n = 400", xlim = c(0, .06))
     
par(mfrow = c(1, 1))

mean(p_hats1)

## [1] 0.100102

mean(p_hats2)

## [1] 0.02009788

mean(p_hats3)

## [1] 0.0200705

Exercise 11

Inference for the n = 400, p = .02 distribution is questionable. The expected number of successes is only 400 × .02 = 8, below the 10 required, and there is noticeable skew in that sampling distribution. The other ones look acceptable.
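
The success-failure check behind this judgment can be tabulated for all four simulated scenarios. This is a minimal sketch; check_conditions is a hypothetical helper, and the threshold of 10 is the rule of thumb used earlier in this lab.

```r
# Hypothetical helper: expected successes and failures for each (p, n) pair
check_conditions <- function(p, n, threshold = 10) {
  exp_success <- n * p
  exp_failure <- n * (1 - p)
  data.frame(p = p, n = n,
             exp_success = exp_success,
             exp_failure = exp_failure,
             ok = exp_success >= threshold & exp_failure >= threshold)
}

scenarios <- check_conditions(p = c(0.1, 0.1, 0.02, 0.02),
                              n = c(1040, 400, 1040, 400))
scenarios
```

Only the last row (p = .02, n = 400) should fail the check, since it expects just 8 successes.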

On your own

Question 1

For both:

H0: There is no difference between the 2005 and 2012 proportions.

HA: There is a difference between the 2005 and 2012 proportions.

spa05 <- atheism[atheism$nationality == "Spain" & atheism$year == "2005",]
sum(spa05$response == 'atheist')

## [1] 115

spa12 <- atheism[atheism$nationality == "Spain" & atheism$year == "2012",]
sum(spa12$response == 'atheist')

## [1] 103

spa <- atheism[atheism$nationality == "Spain" & atheism$year %in% c("2012", "2005"),]

inference(spa05$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.1003 ;  n = 1146 
## Check conditions: number of successes = 115 ; number of failures = 1031 
## Standard error = 0.0089 
## 95 % Confidence interval = ( 0.083 , 0.1177 )

inference(spa12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.09 ;  n = 1145 
## Check conditions: number of successes = 103 ; number of failures = 1042 
## Standard error = 0.0085 
## 95 % Confidence interval = ( 0.0734 , 0.1065 )

inference(x = spa$year, y = spa$response, est = "proportion", type = "ht", method = "theoretical", alternative = "twosided", null = 0,
          success = "atheist")

## Warning: Explanatory variable was numerical, it has been converted to
## categorical. In order to avoid this warning, first convert your explanatory
## variable to a categorical variable using the as.factor() function.

## Response variable: categorical, Explanatory variable: categorical
## Two categorical variables
## Difference between two proportions -- success: atheist
## Summary statistics:
##              x
## y             2005 2012  Sum
##   atheist      115  103  218
##   non-atheist 1031 1042 2073
##   Sum         1146 1145 2291

## Observed difference between proportions (2005-2012) = 0.0104
## 
## H0: p_2005 - p_2012 = 0 
## HA: p_2005 - p_2012 != 0 
## Pooled proportion = 0.0952 
## Check conditions:
##    2005 : number of expected successes = 109 ; number of expected failures = 1037 
##    2012 : number of expected successes = 109 ; number of expected failures = 1036 
## Standard error = 0.012 
## Test statistic: Z =  0.848 
## p-value =  0.3966

The confidence intervals overlap, indicating that there is not a clear difference. The hypothesis test confirms this: with a p-value of 0.3966, we fail to reject the null hypothesis at the .05 level.
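
The test statistic can be reproduced by hand from the counts in the summary table. The sketch below assumes a pooled standard error, which is what the printed "Pooled proportion" line suggests inference() uses; the variable names are illustrative.

```r
# Spain counts from the summary table: atheists and sample sizes for 2005, 2012
x <- c(115, 103)
n <- c(1146, 1145)

p_hat <- x / n                          # sample proportions by year
p_pool <- sum(x) / sum(n)               # pooled proportion under H0
se <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))

z <- (p_hat[1] - p_hat[2]) / se         # test statistic (2005 - 2012)
p_value <- 2 * pnorm(-abs(z))           # two-sided p-value

round(c(p_pool = p_pool, se = se, z = z, p_value = p_value), 4)
```

These values should match the pooled proportion, standard error, Z, and p-value in the output above up to rounding.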

us05 <- atheism[atheism$nationality == "United States" & atheism$year == "2005",]
sum(us05$response == 'atheist')

## [1] 10

us12 <- atheism[atheism$nationality == "United States" & atheism$year == "2012",]
sum(us12$response == 'atheist')

## [1] 50

us <- atheism[atheism$nationality == "United States" & atheism$year %in% c("2012", "2005"),]

inference(us05$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.01 ;  n = 1002 
## Check conditions: number of successes = 10 ; number of failures = 992 
## Standard error = 0.0031 
## 95 % Confidence interval = ( 0.0038 , 0.0161 )

inference(us12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0499 ;  n = 1002 
## Check conditions: number of successes = 50 ; number of failures = 952 
## Standard error = 0.0069 
## 95 % Confidence interval = ( 0.0364 , 0.0634 )

inference(x = us$year, y = us$response, est = "proportion", type = "ht", method = "theoretical", alternative = "twosided", null = 0, success = "atheist")

## Warning: Explanatory variable was numerical, it has been converted to
## categorical. In order to avoid this warning, first convert your explanatory
## variable to a categorical variable using the as.factor() function.

## Response variable: categorical, Explanatory variable: categorical
## Two categorical variables
## Difference between two proportions -- success: atheist
## Summary statistics:
##              x
## y             2005 2012  Sum
##   atheist       10   50   60
##   non-atheist  992  952 1944
##   Sum         1002 1002 2004

## Observed difference between proportions (2005-2012) = -0.0399
## 
## H0: p_2005 - p_2012 = 0 
## HA: p_2005 - p_2012 != 0 
## Pooled proportion = 0.0299 
## Check conditions:
##    2005 : number of expected successes = 30 ; number of expected failures = 972 
##    2012 : number of expected successes = 30 ; number of expected failures = 972 
## Standard error = 0.008 
## Test statistic: Z =  -5.243 
## p-value =  0

For the US, the confidence intervals do not overlap, and the hypothesis test gives Z = -5.243 with a p-value of approximately 0, so we reject the null hypothesis.

Question 2

If there were truly no change in any country, we would expect about 2 Type 1 errors out of the 40 countries at a .05 level of significance (.05 × 40 = 2).

Question 3

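Assume a p of .5, the most conservative choice because it maximizes the margin of error. For a margin of error of .01 at 95% confidence (z = 1.96):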

(1.96/.01)^2*.5^2

## [1] 9604
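
The same calculation can be wrapped in a small function to try other margins or confidence levels. This is a sketch only: required_n is a hypothetical name, it uses qnorm() for the critical value rather than the rounded 1.96, and it rounds up to a whole respondent.

```r
# Hypothetical helper: smallest n giving a margin of error no larger than me
required_n <- function(me, p = 0.5, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)   # two-sided critical value
  ceiling((z / me)^2 * p * (1 - p))
}

required_n(0.01)   # 9604
```

Halving the margin of error roughly quadruples the required sample size, because n scales with 1/me^2.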