Question 1:
a. What are the cases in the dataset?
b. What is the sample proportion of US residents that have health insurance?
##Calculate the point estimate
##The length() function counts the number of values in the object, and the which() function finds the
##positions of the values that satisfy the condition.
k<-length(which(ACS$HealthInsurance== 1))
##The na.omit() function drops the missing values (NA) from the variable, so you are left with
##only the non-missing values.
n<-length(na.omit(ACS$HealthInsurance))
##calculate the sample proportion
k
## [1] 861
n
## [1] 1000
p.hat<-k/n
p.hat
## [1] 0.861
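The same count-and-divide logic can be written more compactly with sum() and mean(). The sketch below is only an illustration: the vector x is a made-up stand-in for ACS$HealthInsurance (assumed coding: 1 = insured, 0 = not insured, NA = missing), and the .toy names are hypothetical.

```r
# Hypothetical toy vector standing in for ACS$HealthInsurance
# (assumed coding: 1 = insured, 0 = not insured, NA = missing)
x <- c(1, 1, 0, NA, 1, 0, 1, 1)

# Number of successes: same result as length(which(x == 1))
k.toy <- sum(x == 1, na.rm = TRUE)

# Number of non-missing values: same result as length(na.omit(x))
n.toy <- sum(!is.na(x))

# Sample proportion in one step; na.rm = TRUE drops the missing value
p.hat.toy <- mean(x == 1, na.rm = TRUE)

k.toy
n.toy
p.hat.toy
```

With this toy vector there are 5 successes out of 7 non-missing values, so both routes give 5/7.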
Question 2:
a. What type of estimate is the one you found in question 1: a point estimate or an interval estimate?
b. Which do you think is a better estimate to report, a point estimate or an interval estimate? Explain your reasoning!
Question 3:
Suppose we want to construct a confidence interval. Are the conditions met to assume the sampling distribution of sample proportions is approximately normal (i.e., the CLT - Central Limit Theorem is valid)? Explain.
- Independence: Yes. It is a random sample, and the 1000 sampled residents are far fewer than 10% of all US residents (the population).
- Sample size/success-failure condition: Yes. The observed sample has 861 successes and 139 failures, so both counts are at least 10.
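The success-failure check can also be done in code. This is a minimal sketch that re-enters the k and n values printed in Question 1 so it runs on its own; the names successes, failures, and conditions.met are illustrative.

```r
# Values printed in Question 1 (re-entered here so the snippet is self-contained)
k <- 861    # observed successes (residents with health insurance)
n <- 1000   # non-missing observations

successes <- k
failures  <- n - k

# Success-failure condition: at least 10 of each
conditions.met <- successes >= 10 && failures >= 10

c(successes = successes, failures = failures)
conditions.met
```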
Using the normal distribution:
Question 4:
What is the value of the estimated standard error? Use the formula from the Week 5 slides and estimate the standard error using the normal distribution.
## estimate the standard error.
SE=sqrt(p.hat*(1-p.hat)/n)
Question 5:
a. Find a confidence interval for the true proportion of US residents who have health insurance based on a confidence level that you choose.
b. Explain why you chose the confidence interval that you did. Use qnorm() to find the z needed.
c. Interpret this confidence interval.
##find SE - standard error
SE=sqrt(p.hat*(1-p.hat)/n)
##find z for a 99% confidence level
Z<-qnorm(.995)
##find the confidence interval
CI1 <- p.hat-Z*SE
CI2 <- p.hat+Z*SE
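To see how the choice of confidence level changes the interval, the steps above can be wrapped in a small helper. This is a sketch, not part of the assignment: wald.ci is a hypothetical function name, and the inputs 861 and 1000 are the k and n printed in Question 1.

```r
# Hypothetical helper: normal-approximation confidence interval for a proportion
wald.ci <- function(k, n, conf.level = 0.99) {
  p.hat <- k / n
  se <- sqrt(p.hat * (1 - p.hat) / n)
  # Two-sided interval: put half of (1 - conf.level) in each tail
  z <- qnorm(1 - (1 - conf.level) / 2)
  c(lower = p.hat - z * se, upper = p.hat + z * se)
}

ci99 <- wald.ci(861, 1000, conf.level = 0.99)
ci95 <- wald.ci(861, 1000, conf.level = 0.95)

ci99
ci95
```

For conf.level = 0.99 the helper calls qnorm(.995), the same z used above. The 99% interval is wider than the 95% one because a higher confidence level requires a larger z.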
Using bootstrap simulations:
Question 6:
What is the value of the estimated standard error? Use bootstrap simulations like in HW 4 to find the standard error.
10000 samples
SE=0.01085488
boot.phats <- numeric(10000) #Preallocate the vector of bootstrap statistics
for(i in 1:10000){ #i indexes the bootstrap sample; we take 10000 samples
  boot.samp <- sample(ACS$HealthInsurance, n, replace = TRUE) #Resample n values with replacement
  #Now we need to calculate our bootstrap statistic
  ##(this is analogous to the sample statistic we compute from a sample)
  boot.k <- length(which(boot.samp == 1)) #how many events or "successes" do we have in our sample
  boot.phat <- boot.k/n #a bootstrap statistic
  boot.phats[i] <- boot.phat #Store the newly computed bootstrap statistic in the vector of bootstrap statistics
}
##Recall that each bootstrap proportion is analogous to a sample estimate of the population proportion
SEBoot <- sd(boot.phats) # estimate of the SE for the sampling distribution of the proportion.
##We estimate the SE by computing the standard deviation of our bootstrap distribution.
SEBoot
## [1] 0.01088075
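Because sample() is random, the bootstrap SE changes slightly from run to run, which is why the value reported above (0.01085488) and the printed one (0.01088075) differ. The sketch below shows one way to make the simulation reproducible with set.seed() and to write the loop with replicate(). Assumptions: the ACS data are not loaded here, so x.rebuilt reconstructs a 0/1 vector from the printed counts (861 ones, 139 zeros); the seed value and the .rep names are arbitrary.

```r
set.seed(1)  # arbitrary seed, only so that reruns give the same numbers

# Assumed stand-in for ACS$HealthInsurance, rebuilt from k = 861 and n = 1000
x.rebuilt <- rep(c(1, 0), times = c(861, 139))
n.rebuilt <- length(x.rebuilt)

# Each replicate resamples n values with replacement and returns a bootstrap proportion
boot.phats.rep <- replicate(10000, mean(sample(x.rebuilt, n.rebuilt, replace = TRUE) == 1))

# Bootstrap SE = standard deviation of the bootstrap distribution
SEBoot.rep <- sd(boot.phats.rep)

# Formula-based SE from Question 4, for comparison
SEformula <- sqrt(0.861 * (1 - 0.861) / n.rebuilt)

SEBoot.rep
SEformula
```

The two standard errors should agree to about three decimal places, which is what the normal and bootstrap approaches in this assignment suggest as well.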
Question 7:
Find a confidence interval for the true proportion of US residents who have health insurance based on a confidence level that you choose and the standard error you calculated in question 6.
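##Approximately 95% confidence: p.hat +/- 2*SE, centered at the sample proportion
##and using the bootstrap standard error from Question 6
CIboot1 <- p.hat-2*SEBoot
CIboot1
## [1] 0.8392385
CIboot2 <- p.hat+2*SEBoot
CIboot2
## [1] 0.8827615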
Question 8:
Suppose we’d like to test if the true proportion of US residents who have health insurance is 80% vs. the true proportion of US residents who have health insurance is NOT 80%. What would be the hypotheses for this test? Please write your hypotheses in non-technical language AND using notation. Specify which hypothesis is which (null or alternative).
Null hypothesis (H0): p = 0.8. The true proportion of US residents who have health insurance is 80%.
Alternative hypothesis (HA): p != 0.8. The true proportion of US residents who have health insurance is different from 80%.
Question 9:
Conduct a hypothesis test for the hypotheses specified in Question 8 using the confidence interval calculated in Question 5. State your conclusions in layman’s terms and in the context of this question. Hint: look at the Week 5 part 2 slides.
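Null hypothesis (H0): p = 0.8
Alternative hypothesis (HA): p != 0.8
Confidence level: 99%
Our data provide evidence that the true proportion of US residents who have health insurance is not equal to 80%. If the true proportion were 80%, we would expect 99% of sample proportions (n = 1000) to fall between about 77% and 83%. Our sample proportion of 86.1% falls outside that range. Equivalently, 0.80 lies below the lower end of the 99% confidence interval from Question 5 (p.hat - Z*SE, about 0.833).
## n = 1000, null value p0 = 0.8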
##Calculate SE
SE9=sqrt(0.8*(1-0.8)/n)
##z was calculated in question 5 for 99%
CI1Hy <- .8-Z*SE9
CI2Hy <- .8+Z*SE9
CI1Hy
## [1] 0.7674181
CI2Hy
## [1] 0.8325819
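As a cross-check on the interval-based conclusion, the same hypotheses can be tested with a z statistic and a two-sided p-value. This is a complementary sketch, not the method the question asks for; the values 0.861 and 1000 come from Question 1, and the names p0, se0, z.stat, and p.value are illustrative.

```r
p0     <- 0.8     # null value from Question 8
p.obs  <- 0.861   # sample proportion from Question 1
n.obs  <- 1000    # sample size from Question 1

# Under the null hypothesis the standard error uses p0, not the sample proportion
se0 <- sqrt(p0 * (1 - p0) / n.obs)

# How many null standard errors the sample proportion is from p0
z.stat <- (p.obs - p0) / se0

# Two-sided p-value from the standard normal distribution
p.value <- 2 * pnorm(-abs(z.stat))

z.stat
p.value
p.value < 0.01   # TRUE means we reject the null at the 1% significance level
```

A p-value below 0.01 matches the 99% interval result: 0.861 is too far from 0.8 to be explained by sampling variability alone.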