Confidence Intervals and Hypothesis Testing for a Proportion


Question 1:

  1. What are the cases in the dataset?
       Each case is one US resident. The dataset is a random sample of 1% of US residents.

  1. What is the sample proportion of US residents that have health insurance?
       sample proportion = 0.861

# count the insured residents (coded 1) and the non-missing responses
k <- sum(ACS$HealthInsurance == 1, na.rm = TRUE)
n <- length(na.omit(ACS$HealthInsurance))
p.hat <- k/n
p.hat
## [1] 0.861

Question 2:

  1. What type of estimate is the one you found in question 1: a point estimate or an interval estimate?
  1. Which do you think is a better estimate to report, a point estimate or an interval estimate? Explain your reasoning!

Question 3:

Suppose we want to construct a confidence interval. Are the conditions met to assume the sampling distribution of sample proportions is approximately normal (i.e., the CLT is valid)? Explain.

  1. Random Sample: Yes, it's a random sample pulled from 3.5 million households.

  2. n < 10% of the population: Yes, the sample is 1% of the population.

  3. np >= 10 and n(1-p) >= 10: Yes. Using p.hat in place of p, n*p.hat = k (the residents with health insurance) and n*(1 - p.hat) = n - k (the residents without), and both counts are well above 10.
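
The success-failure condition can be checked numerically. The sketch below uses stand-in counts (k = 861 and n = 1000, assumed here because they reproduce the reported p.hat of 0.861); with the real data, k and n come from the code in Question 1.

```r
# Stand-in counts for illustration only (assumed, not read from ACS)
k <- 861
n <- 1000
p.hat <- k / n

# Expected successes and failures, using p.hat in place of the unknown p
successes <- n * p.hat
failures <- n * (1 - p.hat)
c(successes = successes, failures = failures)

# TRUE when both counts are at least 10
condition.met <- successes >= 10 && failures >= 10
condition.met
```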


Using the normal distribution:

Question 4:

What is the value of the estimated standard error? Use the formula from the Week 5 slides and estimate the standard error using the normal distribution.

se <- sqrt((p.hat*(1-p.hat))/n)
se
## [1] 0.01093979

Question 5:

  1. Find a confidence interval for the true proportion of US residents who have health insurance based on a confidence level that you choose.

# conf.level must be named; the third positional argument is the null proportion p
prop.test(k, n, conf.level = 0.95)
## 
##  1-sample proportions test with continuity correction
## 
## data:  k out of n, null probability 0.5
## X-squared = 519.84, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.8376433 0.8815296
## sample estimates:
##     p 
## 0.861

  1. Explain why you chose the confidence level that you did. Use qnorm() to find the z needed.

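z <- qnorm(0.975, 0, 1)  # upper critical value for a 95% confidence level
z
## [1] 1.959964
qnorm(0.025, 0, 1)  # lower critical value, symmetric about 0
## [1] -1.959964
margin <- z * se  # margin of error
upper <- p.hat + margin
lower <- p.hat - margin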
lower
## [1] 0.8395584
upper
## [1] 0.8824416
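
The hand-computed interval differs slightly from the prop.test() interval in part 1 because prop.test() reports a Wilson score interval (with continuity correction by default) rather than p.hat plus or minus z times se. A minimal sketch of the comparison, assuming stand-in counts k = 861 and n = 1000 that reproduce the numbers above:

```r
# Stand-in counts for illustration only (assumed, not read from ACS)
k <- 861
n <- 1000
p.hat <- k / n
z <- qnorm(0.975)

# Wald interval: p.hat +/- z * se, as computed by hand above
wald <- p.hat + c(-1, 1) * z * sqrt(p.hat * (1 - p.hat) / n)

# Wilson score intervals from prop.test(), with and without continuity correction
wilson.cc <- as.numeric(prop.test(k, n, conf.level = 0.95)$conf.int)
wilson <- as.numeric(prop.test(k, n, conf.level = 0.95, correct = FALSE)$conf.int)

round(rbind(wald, wilson.cc, wilson), 4)
```

With a sample this large the three intervals agree to about two decimal places, so the choice of method matters little here.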

  1. Interpret this confidence interval.

Using bootstrap simulations:


Question 6:

What is the value of the estimated standard error? Use bootstrap simulations like in HW 4 to find the standard error.

SE = 0.01101536 (the value changes slightly from run to run because the resampling is random)

B <- 10000                # number of bootstrap resamples
boot.phats <- numeric(B)  # preallocate instead of growing the vector
for (i in 1:B) {
  # resample n cases with replacement and record the proportion insured
  boot.samp <- sample(ACS$HealthInsurance, size = n, replace = TRUE)
  boot.phats[i] <- sum(boot.samp == 1, na.rm = TRUE) / n
}

hist(boot.phats)

mean(boot.phats)
## [1] 0.8611106
SE <-sd(boot.phats)
SE
## [1] 0.01101536
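
The same bootstrap can be written more compactly with replicate(), and set.seed() makes the result repeatable. This is a sketch under stated assumptions: the vector insured is a hypothetical stand-in for ACS$HealthInsurance (861 ones and 139 zeros), and the seed value is arbitrary.

```r
set.seed(1)  # arbitrary seed so the resampling is reproducible

# Hypothetical stand-in for ACS$HealthInsurance: 861 insured, 139 uninsured
insured <- rep(c(1, 0), times = c(861, 139))
n <- length(insured)

# Each replicate resamples n cases with replacement and returns the proportion insured
boot.phats <- replicate(10000, mean(sample(insured, size = n, replace = TRUE) == 1))

# The bootstrap standard error is the standard deviation of the resampled proportions
boot.se <- sd(boot.phats)
boot.se
```

The result should land close to the formula-based standard error from Question 4, since both estimate the spread of the same sampling distribution.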

Question 7:

Find a confidence interval for the true proportion of US residents who have health insurance based on a confidence level that you choose and the standard error you calculated in question 6. Your confidence interval should be very similar to question 5.

Confidence Interval = (0.839, 0.882), taken from the 2.5th and 97.5th percentiles of the bootstrap proportions.

CI.lb <- (sort(boot.phats)[250])
CI.ub <- (sort(boot.phats)[9750])
CI.lb
## [1] 0.839
CI.ub
## [1] 0.882
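
Question 7 asks for an interval built from the bootstrap standard error, while the code above uses percentiles. A sketch of the standard-error version, plugging in the p.hat and SE values printed earlier (the names boot.se and ci.se are illustrative):

```r
p.hat <- 0.861         # sample proportion from Question 1
boot.se <- 0.01101536  # bootstrap standard error printed in Question 6
z <- qnorm(0.975)      # critical value for a 95% confidence level

# Interval of the form p.hat +/- z * SE
ci.se <- p.hat + c(-1, 1) * z * boot.se
round(ci.se, 3)
```

This agrees with the percentile interval to within about 0.001, which is what one would expect when the bootstrap distribution is roughly symmetric and bell-shaped.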