R Homework Week 5

Confidence Intervals and Hypothesis Testing for a Proportion

Question 1:

What are the cases in the dataset?

The cases in this dataset are US residents, drawn as a random sample of 1% of US residents.

What is the sample proportion of US residents that have health insurance?

sample proportion = 0.861

# k = residents with health insurance (coded 1); n = non-missing responses
k <- sum(ACS$HealthInsurance == 1, na.rm = TRUE)
n <- sum(!is.na(ACS$HealthInsurance))
p.hat <- k / n
p.hat

## [1] 0.861

Question 2:

What type of estimate is the one you found in question 1: a point estimate or an interval estimate?

It is a point estimate because it is a single value rather than a range of values like an interval estimate.

Which do you think is a better estimate to report, a point estimate or an interval estimate? Explain your reasoning!

I believe the interval estimate is better. A point estimate is a single value and says nothing about how far it might be from the truth. An interval estimate gives a range of plausible values for the true proportion, so it also conveys how uncertain the estimate is.

Question 3:

Suppose we want to construct a confidence interval. Are the conditions met to assume the sampling distribution of sample proportions is approximately normal (i.e., the CLT is valid)? Explain.

Random sample: Yes, it's a random sample pulled from 3.5 million households.
n < 10% of the population: Yes, the sample is 1% of the population.
n*p.hat >= 10 and n*(1 - p.hat) >= 10: Yes, n*p.hat = k counts the residents with health insurance and n*(1 - p.hat) = n - k counts those without; with p.hat = 0.861, both counts are well above 10.
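The success/failure condition can be checked numerically. The sketch below assumes k = 861 and n = 1000; these counts are not stated in the write-up but are consistent with the reported sample proportion and standard error.

```r
# Assumed counts (inferred from the reported values, not given in the write-up)
k <- 861
n <- 1000
p.hat <- k / n

successes <- n * p.hat        # count with health insurance
failures  <- n * (1 - p.hat)  # count without health insurance

# Both counts must be at least 10 for the normal approximation
conditions.met <- successes >= 10 && failures >= 10
conditions.met
```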

Using the normal distribution:

Question 4:

What is the value of the estimated standard error? Use the formula from the Week 5 slides and estimate the standard error using the normal distribution.

Standard Error = 0.01093979

se <- sqrt((p.hat*(1-p.hat))/n)
se

## [1] 0.01093979
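As a check on the value above, the formula can be worked by hand. This assumes n = 1000, which is not stated in the write-up but is the sample size consistent with the reported p.hat and standard error.

```latex
\[
SE = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
   = \sqrt{\frac{0.861 \times 0.139}{1000}}
   \approx 0.01094
\]
```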

Question 5:

Find a confidence interval for the true proportion of US residents who have health insurance based on a confidence level that you choose.

95%, (0.8376, 0.8815)

prop.test(k, n, conf.level = 0.95)

## 
##  1-sample proportions test with continuity correction
## 
## data:  k out of n, null probability 0.5
## X-squared = 519.84, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.8376433 0.8815296
## sample estimates:
##     p 
## 0.861

Explain why you chose the confidence interval that you did. Use qnorm() to find the z needed.

I chose a 95% confidence level because I wanted a wide net: a higher confidence level gives a wider interval that is more likely to capture the true proportion. The z for this level is 1.959964.

z = qnorm(.975, 0, 1)
z

## [1] 1.959964

qnorm(.025,0, 1)

## [1] -1.959964
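To show how the choice of confidence level changes z, here is a small sketch that computes the critical value for three common levels. The variable names are illustrative and not part of the original homework.

```r
# Two-sided critical values: z = qnorm(1 - alpha/2), where alpha = 1 - confidence level
conf.levels <- c(0.90, 0.95, 0.99)
z.values <- qnorm(1 - (1 - conf.levels) / 2)
names(z.values) <- c("90%", "95%", "99%")
round(z.values, 4)
```

A higher confidence level gives a larger z and therefore a wider interval.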

x <- z * se          # margin of error
upper <- p.hat + x
lower <- p.hat - x
lower

## [1] 0.8395584

upper

## [1] 0.8824416
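The hand-computed interval differs slightly from the prop.test() interval because prop.test() does not use p.hat ± z*se; it reports a Wilson score interval with a continuity correction. The sketch below puts the two side by side, assuming k = 861 and n = 1000 (counts inferred from the reported values, not stated in the write-up).

```r
# Assumed counts (inferred from the reported p.hat and standard error)
k <- 861
n <- 1000
p.hat <- k / n
z <- qnorm(0.975)

# Normal-approximation (Wald) interval: p.hat +/- z * se
wald.ci <- p.hat + c(-1, 1) * z * sqrt(p.hat * (1 - p.hat) / n)

# Interval reported by prop.test (score interval with continuity correction)
score.ci <- as.numeric(prop.test(k, n, conf.level = 0.95)$conf.int)

rbind(wald = wald.ci, score = score.ci)
```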

Interpret this confidence interval.

We are 95% confident that the true proportion of US residents who have health insurance is between 0.8376 and 0.8815.

Using bootstrap simulations:

Question 6:

What is the value of the estimated standard error? Use bootstrap simulations like in HW 4 to find the standard error.

SE = 0.01101536

# Draw 10,000 bootstrap resamples of size n and store each sample proportion
boot.phats <- numeric(10000)
for (i in 1:10000) {
  boot.samp <- sample(ACS$HealthInsurance, size = n, replace = TRUE)
  boot.k <- sum(boot.samp == 1, na.rm = TRUE)   # insured count in this resample
  boot.phats[i] <- boot.k / n
}

hist(boot.phats)

mean(boot.phats)

## [1] 0.8611106

SE <-sd(boot.phats)
SE

## [1] 0.01101536
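The bootstrap standard error should land close to the formula-based value from Question 4. The sketch below illustrates this with stand-in data rather than the ACS file: a vector of 861 ones and 139 zeros, which is an assumption consistent with the reported p.hat. The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Stand-in data (assumed counts, not the actual ACS column)
insured <- rep(c(1, 0), times = c(861, 139))
n <- length(insured)

# Each replicate resamples n values with replacement and records the proportion
boot.phats <- replicate(10000, mean(sample(insured, size = n, replace = TRUE)))

boot.se <- sd(boot.phats)
formula.se <- sqrt(mean(insured) * (1 - mean(insured)) / n)
c(bootstrap = boot.se, formula = formula.se)
```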

Question 7:

Find a confidence interval for the true proportion of US residents who have health insurance based on a confidence level that you choose and the standard error you calculated in question 6. Your confidence interval should be very similar to question 5.

Confidence Interval = (0.839, 0.882), taken from the 2.5th and 97.5th percentiles of the 10,000 bootstrap proportions.

CI.lb <- (sort(boot.phats)[250])
CI.ub <- (sort(boot.phats)[9750])
CI.lb

## [1] 0.839

CI.ub

## [1] 0.882
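The question asks for an interval built from the Question 6 standard error, while the code above uses bootstrap percentiles. A minimal sketch of the standard-error version follows, plugging in the p.hat and SE values printed earlier; the name ci.se is illustrative.

```r
p.hat <- 0.861     # sample proportion from Question 1
SE <- 0.01101536   # bootstrap standard error printed in Question 6
z <- qnorm(0.975)  # critical value for 95% confidence

# Interval of the form p.hat +/- z * SE
ci.se <- p.hat + c(-1, 1) * z * SE
round(ci.se, 3)
```

The two approaches should give nearly the same endpoints when the bootstrap distribution is roughly symmetric and bell-shaped.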

Tully O’Leary

3/6/2021