Question 1:
a. What are the cases in the dataset?

b. What is the sample proportion of US residents that have health insurance?

##Calculate the point estimate

##The length() function counts the number of values in the object, and the which() function finds the positions of the values for which the condition is TRUE (NA results are skipped).

k<-length(which(ACS$HealthInsurance== 1))

##The na.omit() function removes the missing (NA) values from the variable, so only the non-missing
##observations are counted.
n<-length(na.omit(ACS$HealthInsurance))

##calculate the sample proportion
k
## [1] 861
n
## [1] 1000
p.hat<-k/n
p.hat
## [1] 0.861
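The which()/length() and na.omit() steps above can be checked on a small made-up vector. `toy_ins` is hypothetical and not part of ACS. The sketch also shows that, assuming HealthInsurance is coded 1/0, taking mean() of the logical comparison with `na.rm = TRUE` gives the same proportion k/n in one step.

```r
# Hypothetical toy vector standing in for ACS$HealthInsurance (1 = insured, 0 = not, NA = missing)
toy_ins <- c(1, 0, NA, 1, 1)

# which() returns the positions where the comparison is TRUE; the NA comparison is skipped
which(toy_ins == 1)

# k: number of insured; n: number of non-missing values
toy_k <- length(which(toy_ins == 1))
toy_n <- length(na.omit(toy_ins))
toy_phat <- toy_k / toy_n

# One-step equivalent: the mean of a logical vector is the proportion of TRUEs
toy_phat_mean <- mean(toy_ins == 1, na.rm = TRUE)

c(toy_k, toy_n, toy_phat, toy_phat_mean)
```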

Question 2:

a. What type of estimate is the one you found in question 1: a point estimate or an interval estimate?

b. Which do you think is a better estimate to report, a point estimate or an interval estimate? Explain your reasoning!

Question 3:

Suppose we want to construct a confidence interval. Are the conditions met to assume the sampling distribution of sample proportions is approximately normal (i.e., that the Central Limit Theorem (CLT) applies)? Explain.
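A common textbook rule of thumb for the normal approximation, in addition to independent observations, is the success-failure condition: at least 10 successes and at least 10 failures. Below is a minimal sketch using the counts printed in Question 1 (k = 861, n = 1000). The cutoff of 10 is an assumption based on the usual textbook version and may differ from the course slides.

```r
# Counts from the Question 1 output
succ <- 861          # residents with health insurance
fail <- 1000 - 861   # residents without health insurance

# Success-failure condition: both counts should be at least 10 (assumed cutoff)
succ >= 10 && fail >= 10
```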

Using the normal distribution:

Question 4:

What is the value of the estimated standard error? Use the formula from the Week 5 slides and estimate the standard error using the normal distribution.

## estimate the standard error.
SE=sqrt(p.hat*(1-p.hat)/n)
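Plugging the Question 1 values into this formula gives a quick hand check of the SE (rounded):

```latex
SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.861 \times 0.139}{1000}} = \sqrt{0.000119679} \approx 0.0109
```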

Question 5:

a. Find a confidence interval for the true proportion of US residents who have health insurance based on a confidence level that you choose.

b. Explain why you chose the confidence level that you did. Use qnorm() to find the z needed.

c. Interpret this confidence interval.

## find SE (standard error)
SE=sqrt(p.hat*(1-p.hat)/n)
## find z for 99% confidence
Z<-qnorm(.995)
##find the confidence interval
CI1 <- p.hat-Z*SE
CI2 <- p.hat+Z*SE
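The three steps above (SE, z from qnorm(), then p.hat ± z·SE) can be wrapped in one function so the confidence level is easy to change. `prop_ci` is a hypothetical helper name, and the sketch assumes the same normal-approximation formula used above.

```r
# Hypothetical helper: normal-approximation CI for a single proportion
prop_ci <- function(p.hat, n, conf = 0.95) {
  se <- sqrt(p.hat * (1 - p.hat) / n)   # estimated standard error
  z  <- qnorm(1 - (1 - conf) / 2)       # e.g. conf = 0.99 gives qnorm(0.995)
  c(lower = p.hat - z * se, upper = p.hat + z * se)
}

# Using the Question 1 estimate with 99% confidence
prop_ci(0.861, 1000, conf = 0.99)
```

Changing `conf` to 0.95 or 0.90 narrows the interval, which is the trade-off to weigh when choosing a confidence level.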

Using bootstrap simulations:

Question 6:

What is the value of the estimated standard error? Use bootstrap simulations like in HW 4 to find the standard error.

boot.phats <- numeric(10000) # Initializing (preallocating) the vector of bootstrap statistics
for(i in 1:10000){ # each iteration i draws one bootstrap sample; we take 10000 samples
  boot.samp <- sample(ACS$HealthInsurance, n, replace = TRUE) # Take a random sample of size n, with replacement
  #Now we need to calculate our bootstrap statistic
  ##(this is analogous to the sample statistic we compute from a sample)
  boot.k <- length(which(boot.samp == 1)) # how many events or "successes" we have in our sample
  boot.phat <- boot.k/n # a bootstrap statistic
  boot.phats[i] <- boot.phat # store the newly computed bootstrap statistic in position i
}

##The bootstrap statistics vary roughly the way p.hat would vary from sample to sample,
##so we estimate the SE by computing the standard deviation of our bootstrap distribution.
SEBoot <- sd(boot.phats) # estimate of the SE for the sampling distribution of the proportion
SEBoot
## [1] 0.01088075
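The same bootstrap can be written more compactly with replicate(), which evaluates an expression many times and collects the results. This sketch rebuilds a stand-in sample from the Question 1 counts (861 ones and 139 zeros), which assumes the variable has no missing values. set.seed() makes the run reproducible; the SE will be close to, but not exactly, the value above.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Stand-in for ACS$HealthInsurance built from the Question 1 counts (assumes no NAs)
ins <- c(rep(1, 861), rep(0, 139))
n.ins <- length(ins)

# Each replicate: resample with replacement and compute the bootstrap proportion
boot.phats2 <- replicate(10000, mean(sample(ins, n.ins, replace = TRUE) == 1))

# Bootstrap estimate of the standard error
SEBoot2 <- sd(boot.phats2)
SEBoot2
```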

Question 7:

Find a confidence interval for the true proportion of US residents who have health insurance based on a confidence level that you choose and the standard error you calculated in question 6.

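## center the interval at p.hat and use the bootstrap SE from question 6; a multiplier of 2 gives roughly 95% confidence
CIboot1 <- p.hat-2*SEBoot
CIboot1
## [1] 0.8392385
CIboot2 <- p.hat+2*SEBoot
CIboot2
## [1] 0.8827615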
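A different bootstrap interval, the percentile method, skips the SE and takes the middle 95% of the bootstrap statistics with quantile(). It is shown only for comparison and is not the SE-based method the question asks for. The sketch uses its own stand-in sample built from the Question 1 counts (assuming no missing values), so its endpoints will be close to, but not identical to, the interval above.

```r
set.seed(2)  # arbitrary seed for reproducibility

# Stand-in sample from the Question 1 counts (861 insured, 139 not; assumes no NAs)
ins <- c(rep(1, 861), rep(0, 139))

# 10000 bootstrap proportions (sample() defaults to size = length(ins))
boot.props <- replicate(10000, mean(sample(ins, replace = TRUE) == 1))

# Percentile interval: 2.5th and 97.5th percentiles of the bootstrap distribution
perc.ci <- quantile(boot.props, probs = c(0.025, 0.975))
perc.ci
```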

Question 8:

Suppose we’d like to test if the true proportion of US residents who have health insurance is 80% vs. the true proportion of US residents who have health insurance is NOT 80%. What would be the hypotheses for this test? Please write your hypotheses in non-technical language AND using notation. Specify which hypothesis is which (null or alternative).
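In notation, with p denoting the true proportion of US residents who have health insurance, the two-sided hypotheses described in the question take this form:

```latex
H_0: p = 0.80 \qquad \text{versus} \qquad H_A: p \neq 0.80
```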

Question 9:

Conduct a hypothesis test for the hypotheses specified in Question 8 using the confidence interval calculated in Question 5. State your conclusions in layman’s terms and in the context of this question. Hint: look at the Week 5 part 2 slides.

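## n = 1000; hypothesized (null) proportion p0 = 0.8

## Calculate the SE under the null hypothesis, i.e. using p0 = 0.8 instead of p.hat
SE9=sqrt(0.8*(1-0.8)/n)
## z was calculated in question 5 for 99%
## 99% interval centered at p0; H0 is rejected if p.hat = 0.861 falls outside it
CI1Hy <- .8-Z*SE9
CI2Hy <- .8+Z*SE9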
CI1Hy
## [1] 0.7674181
CI2Hy
## [1] 0.8325819
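The question asks to use the Question 5 interval directly: at the 1% level, H0 is rejected when 0.8 falls outside that 99% interval. This minimal sketch recomputes the interval from the Question 1 values and checks this. As a cross-check, it also computes the matching two-sided z statistic and p-value, using the null SE as the code above does. The variable names are new and chosen for illustration.

```r
phat1 <- 0.861   # sample proportion from Question 1
n1 <- 1000
p0 <- 0.8        # hypothesized proportion under H0
z99 <- qnorm(0.995)

# 99% confidence interval from Question 5
se.hat <- sqrt(phat1 * (1 - phat1) / n1)
ci99 <- c(phat1 - z99 * se.hat, phat1 + z99 * se.hat)

# Reject H0 if p0 lies outside the interval
reject.by.ci <- p0 < ci99[1] || p0 > ci99[2]

# Cross-check: z statistic and two-sided p-value using the null SE
se0 <- sqrt(p0 * (1 - p0) / n1)
z.stat <- (phat1 - p0) / se0
p.value <- 2 * pnorm(-abs(z.stat))

reject.by.ci
z.stat
p.value
```

Both approaches agree here: 0.8 lies below the lower end of the 99% interval, and the p-value is far below 0.01.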