ACS<-read.csv(file.choose(),header = TRUE)

Question 1:
a. What are the cases in the dataset? The cases are the individual US residents in the sample (1000 people). The variables are USCitizen (1 = citizen, 0 = noncitizen) and HealthInsurance (1 = has health insurance, 0 = no health insurance).

  1. What is the sample proportion of US residents that have health insurance? The sample proportion is 861/1000 = 0.861.
k<-length(which(ACS$USCitizen== 1))
k
## [1] 939
n<-length(na.omit(ACS$USCitizen))
n
## [1] 1000
p.hat<-k/n
p.hat
## [1] 0.939
A<-length(which(ACS$HealthInsurance== 1))
A
## [1] 861
b<-length(na.omit(ACS$HealthInsurance))
b
## [1] 1000
p.hat<-A/b
p.hat
## [1] 0.861
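# Caution: k/A divides the citizen count by the insured count; it is not a proportion
# (it exceeds 1), and it overwrites the 0.861 stored in p.hat above.
p.hat<-k/A
p.hat
## [1] 1.090592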

Question 2: a. What type of estimate is the one you found in question 1: a point estimate or an interval estimate?
It is a point estimate.

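b. Which do you think is a better estimate to report, a point estimate or an interval estimate? Explain your reasoning! An interval estimate. The point estimate 0.861 comes from one sample and would change from sample to sample, so on its own it says nothing about how far it may be from the population proportion. An interval estimate reports a range of plausible values for the population proportion together with a confidence level.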

Question 3: Suppose we want to construct a confidence interval. Are the conditions met to assume the sampling distribution of sample proportions is approximately normal (i.e., the CLT is valid)? Explain.

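Yes. The sample is assumed to be a random sample of US residents, so the observations are independent of each other. The sample size is also large enough: there are 861 residents with health insurance and 139 without, both well above the usual minimum of 10.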
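
The sample-size condition can be checked numerically. The lines below are a minimal sketch, assuming the common rule of thumb of at least 10 successes and 10 failures (the Week 5 slides may state a different threshold). The names n.obs, n.success, n.failure and clt.ok are hypothetical, and the counts are copied from the output in Question 1.

```r
# Counts copied from the Question 1 output: 861 insured out of 1000 residents.
n.obs <- 1000                    # sample size (hypothetical name)
n.success <- 861                 # residents with health insurance
n.failure <- n.obs - n.success   # residents without health insurance

# Assumed rule of thumb: at least 10 successes and at least 10 failures.
clt.ok <- (n.success >= 10) && (n.failure >= 10)
clt.ok
```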

Using the normal distribution:

Question 4: What is the value of the estimated standard error? Use the formula from the Week 5 slides and estimate the standard error using the normal distribution.

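# Standard error of a sample proportion: sqrt(p.hat*(1-p.hat)/n).
# sd(ACS$HealthInsurance) = 0.3461196 is the spread of the individual 0/1 values, not the standard error.
ACS.se <- sqrt((A/b)*(1-A/b)/b)
ACS.se
## [1] 0.01093979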
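
The standard deviation of the 0/1 values and the standard error of the sample proportion are related but not the same quantity. As a rough sketch, the vector below is rebuilt from the counts reported in Question 1 (861 ones and 139 zeros) rather than read from the data file, and the names x, n.obs, p.obs, se.formula and se.from.sd are hypothetical.

```r
# Rebuild a 0/1 vector with the same counts as the Question 1 output (an assumption,
# used only so this sketch runs without the original CSV file).
x <- c(rep(1, 861), rep(0, 139))
n.obs <- length(x)
p.obs <- mean(x)

# Standard error from the proportion formula.
se.formula <- sqrt(p.obs * (1 - p.obs) / n.obs)

# Dividing the sample standard deviation by sqrt(n) gives nearly the same value;
# the small gap comes from sd() using n - 1 in its denominator.
se.from.sd <- sd(x) / sqrt(n.obs)

c(formula = se.formula, from.sd = se.from.sd)
```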

Question 5: a. Find a confidence interval for the true proportion of US residents who have health insurance based on a confidence level that you choose.

95 percent confidence interval: 0.8376433 0.8815296

  1. Explain why you chose the confidence interval that you did. Use qnorm() to find the z needed.
    [1] 1.959964 I chose a 95% confidence level because it is the conventional choice and balances confidence against interval width. The z needed is qnorm(0.975), which leaves 2.5% in each tail.

  2. Interpret this confidence interval. We are 95% confident that the true proportion of US residents who have health insurance is between 0.8376433 and 0.8815296.

se <- sqrt((A/b)*(1-A/b)/b)   # standard error of the sample proportion
z <- qnorm(0.975)             # critical value for a 95% confidence level
z
## [1] 1.959964
CI <- (A/b) + c(-1,1)*z*se    # normal-approximation interval around 0.861
CI
## [1] 0.8395584 0.8824416
prop.test(A, b, conf.level=0.95)
## 
##  1-sample proportions test with continuity correction
## 
## data:  A out of b, null probability 0.5
## X-squared = 519.84, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.8376433 0.8815296
## sample estimates:
##     p 
## 0.861
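
The interval reported by prop.test() is a score interval with continuity correction, so it does not match a hand-computed p.hat plus or minus z times SE interval exactly. The comparison below is a sketch that uses the counts from Question 1 directly; the names successes, trials, p.obs, wald and score are hypothetical.

```r
# Counts copied from the Question 1 output.
successes <- 861
trials <- 1000
p.obs <- successes / trials

# Hand-computed normal-approximation (Wald) interval at the 95% level.
z <- qnorm(0.975)
wald <- p.obs + c(-1, 1) * z * sqrt(p.obs * (1 - p.obs) / trials)

# Interval reported by prop.test() with its default continuity correction.
score <- as.numeric(prop.test(successes, trials, conf.level = 0.95)$conf.int)

rbind(wald = wald, score = score)
```

The two intervals differ only in the third decimal place here, which is typical when the sample is large.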

Using bootstrap simulations:

Question 6: What is the value of the estimated standard error? Use bootstrap simulations like in HW 4 to find the standard error. SE = 0.01092115 (the value changes slightly on each run because the resampling is random).

boot.samp <-sample(ACS$HealthInsurance, size = b, replace = TRUE)
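boot.phats <- numeric(10000)  # preallocate instead of growing the vector
for(i in 1:10000){
  boot.samp <- sample(ACS$HealthInsurance, b, replace = TRUE)  # resample with replacement
  boot.A <- length(which(boot.samp==1))                        # insured count in the resample
  boot.phats[i] <- boot.A/b                                    # bootstrap sample proportion
}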
SE <-sd(boot.phats)
SE
## [1] 0.01092115
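
Because HealthInsurance is a 0/1 variable, the insured count in each bootstrap resample follows a binomial distribution with size 1000 and probability 0.861, so the loop can be cross-checked without the data file. This is only a sketch of that shortcut, not the HW 4 method; the seed is arbitrary and the names n.obs, p.obs, boot.counts, boot.se and theory.se are hypothetical.

```r
set.seed(1)      # arbitrary seed, chosen only to make the sketch repeatable

n.obs <- 1000    # sample size from the Question 1 output
p.obs <- 0.861   # observed proportion with health insurance

# Each draw is the number of insured residents in one simulated resample.
boot.counts <- rbinom(10000, size = n.obs, prob = p.obs)

# Bootstrap standard error: spread of the simulated sample proportions.
boot.se <- sd(boot.counts / n.obs)

# Formula-based standard error for comparison.
theory.se <- sqrt(p.obs * (1 - p.obs) / n.obs)

c(bootstrap = boot.se, formula = theory.se)
```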

Question 7: Find a confidence interval for the true proportion of US residents who have health insurance based on a confidence level that you choose and the standard error you calculated in question 6. Your confidence interval should be very similar to question 5. [1] 0.8391577 0.8828423

# p.hat was overwritten with k/A earlier, so use the insured proportion A/b directly.
CI <- (A/b) + c(-1,1)*2*SE
CI
## [1] 0.8391577 0.8828423
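
An alternative to the estimate plus or minus 2 times SE construction is the percentile interval, which takes the middle 95% of the bootstrap proportions directly. The sketch below swaps in that technique for comparison only; it rebuilds the 0/1 vector from the Question 1 counts, uses an arbitrary seed, and the names x, boot.props and pct.ci are hypothetical.

```r
set.seed(2)   # arbitrary seed, chosen only to make the sketch repeatable

# 0/1 vector with the same counts as the Question 1 output (861 insured, 139 not).
x <- c(rep(1, 861), rep(0, 139))

# 10000 bootstrap sample proportions, each from a resample of the same size.
boot.props <- replicate(10000, mean(sample(x, length(x), replace = TRUE)))

# Percentile interval: the 2.5th and 97.5th percentiles of the bootstrap proportions.
pct.ci <- quantile(boot.props, c(0.025, 0.975))
pct.ci
```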

Question 8: Suppose we’d like to test if the true proportion of US residents who have health insurance is 80% vs. the true proportion of US residents who have health insurance is NOT 80%. What would be the hypotheses for this test? Please write your hypotheses in non-technical language AND using notation. Specify which hypothesis is which (null or alternative). Null hypothesis (H0: p = 0.80): the true proportion of US residents who have health insurance is 80%. Alternative hypothesis (Ha: p ≠ 0.80): the true proportion of US residents who have health insurance is not 80%.

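Question 9: Conduct a hypothesis test for the hypotheses specified in Question 8 using the confidence interval calculated in Question 5. State your conclusions in layman’s terms and in the context of this question. Hint: look at the Week 5 part 2 slides.

The 95% confidence interval from Question 5 (0.8376433, 0.8815296) does not contain 0.80, so we reject the null hypothesis at the 5% significance level. In layman's terms, the data give strong evidence that the share of US residents with health insurance is not 80%; the sample suggests it is higher, somewhere around 84% to 88%.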
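
The confidence-interval approach to this test can be sketched in a few lines: the null value 0.80 is rejected at the 5% level when it falls outside the 95% interval. The counts come from the Question 1 output, and the names successes, trials, null.p, test, ci and reject are hypothetical.

```r
# Counts copied from the Question 1 output.
successes <- 861
trials <- 1000
null.p <- 0.80   # hypothesized proportion from Question 8

# Two-sided test of H0: p = 0.80; the reported interval does not depend on null.p.
test <- prop.test(successes, trials, p = null.p, conf.level = 0.95)
ci <- as.numeric(test$conf.int)

# Reject H0 when the null value lies outside the 95% confidence interval.
reject <- (null.p < ci[1]) || (null.p > ci[2])

ci
test$p.value
reject
```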