download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")

population <- ames$Gr.Liv.Area
samp <- sample(population, 60)

Question 1

Using the plot_ci(lower_vector, upper_vector, mean(population)) function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.

samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60


for(i in 1:50){
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}

lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n) 
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)

plot_ci(lower_vector, upper_vector, mean(population))

The first time I ran this command, 47 out of the 50 intervals captured the true population mean (47/50 = 0.94), so the observed proportion was 94% rather than the nominal 95%. The proportion isn’t exactly equal to the confidence level for two reasons. First, it can’t be: 95% of 50 is 47.5, and each interval either captures the population mean or doesn’t, so the closest achievable proportions are 94% and 96%. Second, the samples are random, so the number of intervals that capture the mean changes from run to run; if you ran this multiple times, I’d expect some runs with 48 out of 50 (96%) and others a little higher or lower. The confidence level describes long-run behavior: over many repetitions, the proportion of intervals that capture the mean should settle close to 95%.
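The proportion can also be computed directly instead of counting bars on the plot, and a quick binomial calculation shows how ordinary a count of 47 is. This is only a sketch: `capture_rate` is a helper name I made up (it is not part of the code downloaded with the data set), the three toy intervals are invented numbers, and the binomial lines assume each interval captures the mean independently with probability exactly 0.95.

```r
# Hypothetical helper: share of intervals [lower, upper] that contain `truth`.
capture_rate <- function(lower, upper, truth) {
  mean(lower <= truth & truth <= upper)
}

# Toy check with made-up numbers: the first two intervals contain 10, the third does not.
toy_lower <- c(8, 9.5, 10.2)
toy_upper <- c(11, 10.5, 12)
toy_rate <- capture_rate(toy_lower, toy_upper, truth = 10)

# Idealized model (an assumption): captures out of 50 follow Binomial(50, 0.95).
p_exactly_47  <- dbinom(47, size = 50, prob = 0.95)  # chance of exactly 47 captures
p_47_or_fewer <- pbinom(47, size = 50, prob = 0.95)  # chance of 47 or fewer captures
```

With the vectors from the loop, the call would be `capture_rate(lower_vector, upper_vector, mean(population))`. Under the idealized model, exactly 47 captures comes up in roughly 22% of runs, so seeing 94% once is not evidence that anything went wrong.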

Question 2

Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?

#No R code needed.

I will make a 99% confidence interval, so my critical value is 2.576.
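Although the question needs no code, the value can be double-checked with `qnorm`, base R's normal quantile function. For a confidence level C, the critical value is the quantile that leaves (1 - C)/2 in each tail. The variable names below are my own.

```r
conf_level <- 0.99
alpha <- 1 - conf_level          # total probability left outside the interval
z_star <- qnorm(1 - alpha / 2)   # standard normal quantile with alpha/2 in the upper tail
round(z_star, 3)                 # 2.576
```

The same call with `conf_level <- 0.95` gives the familiar 1.96 used in Question 1.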

Question 3

Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples, simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?

lower_vector <- samp_mean - 2.576 * samp_sd / sqrt(n) 
upper_vector <- samp_mean + 2.576 * samp_sd / sqrt(n)

plot_ci(lower_vector, upper_vector, mean(population))

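The first time I ran this command, 49 out of 50 intervals (98%) captured the population mean. Again this is slightly lower than the confidence level I chose (99%), although higher than the 94% I got with the 95% intervals. The proportion could not have gone down: the 99% intervals are built from the same sample means and standard deviations and are wider, so any interval that captured the mean at 95% still captures it at 99%. As before, an exact match is impossible with 50 intervals because 99% of 50 is 49.5. I believe the proportion would approach 99% if the whole procedure were repeated many times with new samples.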
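The comparison between the two confidence levels can be checked in a small simulation. This is a sketch under stated assumptions: `fake_population` is a made-up right-skewed stand-in generated with `rexp`, not the Ames data, and the seed and helper name `covers` are arbitrary choices of mine.

```r
set.seed(1)                                   # arbitrary seed, only for reproducibility
fake_population <- rexp(2000, rate = 1/1500)  # made-up stand-in, NOT the Ames data
n <- 60
means <- numeric(50)
sds <- numeric(50)

for (i in 1:50) {
  s <- sample(fake_population, n)  # one sample of size n
  means[i] <- mean(s)
  sds[i] <- sd(s)
}

se <- sds / sqrt(n)            # estimated standard error for each sample
mu <- mean(fake_population)    # true mean of the stand-in population

# TRUE where the interval mean +/- z * se contains mu
covers <- function(z) (means - z * se <= mu) & (mu <= means + z * se)

covered_95 <- covers(1.96)
covered_99 <- covers(2.576)

all(covered_99[covered_95])    # every 95% capture is also a 99% capture
c(rate_95 = mean(covered_95), rate_99 = mean(covered_99))
```

Because both sets of intervals share the same centers and standard errors, the 99% rate can equal the 95% rate but never fall below it, whatever the seed.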