Lab week 9 - Inference for categorical data

Exercise 1

These percentages appear to be sample statistics.

Exercise 2

To generalize the findings, we must assume the respondents were randomly sampled and that their responses are independent of one another. This seems reasonable.

download.file("http://www.openintro.org/stat/data/atheism.RData", destfile = "atheism.RData")
load("atheism.RData")

Exercise 3

Each row of Table 6 is a country. Each row of atheism is an individual respondent and records whether or not they chose atheist.

Exercise 4

us12 <- subset(atheism, nationality == "United States" & year == "2012")
table(us12$response)

## 
##     atheist non-atheist 
##          50         952

50/1002

## [1] 0.0499002

The proportion of atheist responses is 0.0499, or 4.99%. This agrees with the percentage in Table 6.

Exercise 5

The two conditions are independent observations and the success-failure condition (at least 10 expected successes and at least 10 expected failures).

1002*(4.99/100)

## [1] 49.9998

1002*(1-(4.99/100))

## [1] 952.0002

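The expected counts are about 50 successes and 952 failures, both at least 10, so the success-failure condition is met. Assuming a random sample, the observations are independent, so both conditions are met.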

inference(us12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Warning: package 'BHH2' was built under R version 3.6.3

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0499 ;  n = 1002 
## Check conditions: number of successes = 50 ; number of failures = 952 
## Standard error = 0.0069 
## 95 % Confidence interval = ( 0.0364 , 0.0634 )

Exercise 6

1.96*.0069

## [1] 0.013524

The margin of error is +/- 1.35%.
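
As a check on the arithmetic above, the standard error and margin of error can be recomputed from the raw counts instead of the rounded 0.0069. This is only a sketch: the names n_success, n_total, se, and me are illustrative and not part of the lab, and it assumes the usual formula SE = sqrt(p_hat * (1 - p_hat) / n) with the 95% critical value taken from qnorm().

```r
# Counts taken from table(us12$response) above
n_success <- 50
n_total   <- 1002

p_hat <- n_success / n_total
se    <- sqrt(p_hat * (1 - p_hat) / n_total)  # standard error of p_hat
z     <- qnorm(0.975)                          # about 1.96 for 95% confidence
me    <- z * se                                # margin of error

round(c(p_hat = p_hat, se = se, me = me), 4)
c(lower = p_hat - me, upper = p_hat + me)      # 95% confidence interval
```

The interval should match the one reported by inference() up to rounding.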

Exercise 7

canada12 <- subset(atheism, nationality == "Canada" & year == "2012")
inference(canada12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.0898 ;  n = 1002 
## Check conditions: number of successes = 90 ; number of failures = 912 
## Standard error = 0.009 
## 95 % Confidence interval = ( 0.0721 , 0.1075 )

germany12 <- subset(atheism, nationality == "Germany" & year == "2012")
inference(germany12$response, est = "proportion", type = "ci", method = "theoretical", 
          success = "atheist")

## Single proportion -- success: atheist 
## Summary statistics:

## p_hat = 0.1494 ;  n = 502 
## Check conditions: number of successes = 75 ; number of failures = 427 
## Standard error = 0.0159 
## 95 % Confidence interval = ( 0.1182 , 0.1806 )

.009*1.96

## [1] 0.01764

.0159*1.96

## [1] 0.031164

Canada margin of error = 1.76%. Germany margin of error = 3.12%. In both cases the success-failure condition is met: the counts 90 and 912 (Canada) and 75 and 427 (Germany) are all above 10.

n <- 1000
p <- seq(0, 1, 0.01)
me <- 2 * sqrt(p * (1 - p)/n)
plot(me ~ p, ylab = "Margin of Error", xlab = "Population Proportion")

Exercise 8

The margin of error increases as p goes from 0 to 0.5, peaks at p = 0.5, and then decreases as p goes from 0.5 to 1.
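
The peak can also be read off numerically instead of from the plot. The sketch below rebuilds the same p and me vectors used for the plot and looks up where me is largest. Using which.max() here is my own choice and is not something the lab asks for.

```r
n  <- 1000
p  <- seq(0, 1, 0.01)
me <- 2 * sqrt(p * (1 - p) / n)

p[which.max(me)]  # value of p where the margin of error is largest
max(me)           # the largest margin of error for n = 1000
range(me)         # me is 0 at p = 0 and p = 1
```

Because p * (1 - p) is symmetric around 0.5, the curve rises and falls at the same rate on either side of the peak.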

p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats[i] <- sum(samp == "atheist")/n
}

hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))

summary(p_hats)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.07019 0.09327 0.09904 0.09969 0.10577 0.12981

Exercise 9

The distribution looks approximately normal. It is centered at about 0.100 and ranges from about 0.07 to about 0.13.
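
One way to back up the description of center and spread is to compare the simulated values with what theory predicts. The sketch below repeats the simulation with replicate() instead of the for loop and compares sd(p_hats) with sqrt(p * (1 - p) / n). The seed and the name se_theory are assumptions added for illustration, so the exact numbers will differ from the summary above.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
p <- 0.1
n <- 1040

# Each replicate draws one sample of size n and records its proportion of "atheist"
p_hats <- replicate(5000, {
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1 - p))
  mean(samp == "atheist")
})

se_theory <- sqrt(p * (1 - p) / n)  # theoretical standard error of p_hat
c(mean = mean(p_hats), sd = sd(p_hats), se_theory = se_theory)
```

The mean should sit close to p and the simulated standard deviation close to se_theory.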

Exercise 10

par(mfrow=c(2,2))
p <- 0.1
n <- 1040
p_hats <- rep(0, 5000)

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats[i] <- sum(samp == "atheist")/n
}

hist(p_hats, main = "p = 0.1, n = 1040", xlim = c(0, 0.18))

p <- 0.1
n <- 400
p_hats <- rep(0, 5000)

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats[i] <- sum(samp == "atheist")/n
}

hist(p_hats, main = "p = 0.1, n = 400", xlim = c(0, 0.18))

p <- 0.02
n <- 1040
p_hats <- rep(0, 5000)

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats[i] <- sum(samp == "atheist")/n
}

hist(p_hats, main = "p = 0.02, n = 1040", xlim = c(0, 0.18))

p <- 0.02
n <- 400
p_hats <- rep(0, 5000)

for(i in 1:5000){
  samp <- sample(c("atheist", "non_atheist"), n, replace = TRUE, prob = c(p, 1-p))
  p_hats[i] <- sum(samp == "atheist")/n
}

hist(p_hats, main = "p = 0.02, n = 400", xlim = c(0, 0.18))

par(mfrow = c(1, 1))

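Most of the distributions are fairly normal. p appears to affect the skewness: with p = 0.02 the histograms sit close to 0 and look more right-skewed, especially at n = 400. n affects the spread: a larger n gives a narrower distribution.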
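
The four simulations above differ only in p and n, so they can be wrapped in one function. The sketch below is one possible way to do that. sim_p_hats is a hypothetical helper, not part of the lab. It draws the counts with rbinom() instead of the sample() loop, which gives the same sampling distribution. The seed is an assumption.

```r
# Hypothetical helper: simulate `reps` sample proportions and plot their histogram
sim_p_hats <- function(p, n, reps = 5000) {
  p_hats <- rbinom(reps, size = n, prob = p) / n
  hist(p_hats, main = paste0("p = ", p, ", n = ", n), xlim = c(0, 0.18))
  invisible(p_hats)
}

set.seed(1)  # assumed seed for reproducibility
par(mfrow = c(2, 2))
# Rows come out in the same order as the four panels above
settings <- expand.grid(n = c(1040, 400), p = c(0.1, 0.02))
sims <- Map(sim_p_hats, settings$p, settings$n)
par(mfrow = c(1, 1))

# Spread of each simulated distribution, in the same order as `settings`
cbind(settings, sd = sapply(sims, sd))
```

The sd column makes the effect of n on the spread easier to compare than the histograms alone.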

Exercise 11

For Australia (p = 0.1, n = 1040) the normal approximation is reasonable, since 0.1 * 1040 = 104 is at least 10. For Ecuador (p = 0.02, n = 400), 0.02 * 400 = 8 is less than 10, so the success-failure condition is not met.
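
The success-failure check can be written as a small function so that the same rule is applied to both countries. meets_success_failure and its threshold argument are hypothetical names introduced for this sketch. The values p = 0.1, n = 1040 and p = 0.02, n = 400 are the ones used in the simulations above.

```r
# Hypothetical helper: TRUE when both expected counts are at least `threshold`
meets_success_failure <- function(p, n, threshold = 10) {
  n * p >= threshold && n * (1 - p) >= threshold
}

c(successes = 1040 * 0.1, failures = 1040 * (1 - 0.1))  # Australia
meets_success_failure(p = 0.1, n = 1040)

c(successes = 400 * 0.02, failures = 400 * (1 - 0.02))  # Ecuador
meets_success_failure(p = 0.02, n = 400)
```

Only the expected number of successes for Ecuador falls short of 10. Its expected number of failures is well above it.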

Carolina Fuentes

4/5/2020