download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
n <- 60
population <- ames$Gr.Liv.Area
samp <- sample(population, n)
summary(samp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     708    1088    1378    1441    1664    2640
hist(population)
hist(samp, breaks = 10)
popMean <- mean(population)
popMean
## [1] 1499.69
sample_mean <- mean(samp)
se <- sd(samp) / sqrt(n)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1316.370 1566.063
mean(population)
## [1] 1499.69
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
Now we’re ready for the loop where we calculate the means and standard deviations of 50 random samples.
for (i in 1:50) {
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])
## [1] 1408.242 1694.391
Using the following function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.
plot_ci(lower_vector, upper_vector, mean(population))
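The proportion asked about above can also be computed directly instead of being counted off the plot. The following is a minimal sketch with three made-up intervals; the names lower_demo, upper_demo, mu_demo, captured_demo and coverage_demo are hypothetical and the endpoints are not taken from the lab's samples.

```r
# Hypothetical interval endpoints, invented for illustration only
lower_demo <- c(1400, 1350, 1510)
upper_demo <- c(1600, 1480, 1700)
mu_demo <- 1499.69  # population mean reported earlier in the lab

# Elementwise comparison: one TRUE/FALSE per interval
captured_demo <- lower_demo < mu_demo & mu_demo < upper_demo
captured_demo

# The mean of a logical vector is the proportion of TRUE values
coverage_demo <- mean(captured_demo)
coverage_demo
```

Here only the first interval contains the mean, so the proportion is 1/3. With the lab's own objects the same expression would be mean(lower_vector < mean(population) & mean(population) < upper_vector).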
Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?
Let’s pick 99%. For this, the critical value will be 2.58.
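The critical value does not have to be read from a table. As a rough sketch, assuming the normal-based interval used throughout this lab, qnorm gives the standard normal quantile, and a two-sided interval at confidence level C leaves (1 - C) / 2 in each tail. The names conf_level and z_star are illustrative.

```r
conf_level <- 0.99

# Two-sided interval: put (1 - conf_level) / 2 in each tail
z_star <- qnorm(1 - (1 - conf_level) / 2)
round(z_star, 2)        # 2.58

# Same calculation for the 95% level used earlier
round(qnorm(0.975), 2)  # 1.96
```

So 2.58 is the critical value for a 99% interval, while 1.96 is the value for 95%.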
Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples, simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?
lower_vector99 <- samp_mean - 2.58 * samp_sd / sqrt(n)
upper_vector99 <- samp_mean + 2.58 * samp_sd / sqrt(n)
plot_ci(lower_vector99, upper_vector99, mean(population))
# Use elementwise & to get one TRUE/FALSE per interval; && is meant for single values
capMean <- lower_vector99 < popMean & popMean < upper_vector99
sum(capMean) / length(capMean)
## [1] 1
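In this case all 50 intervals captured the true mean, a coverage of 100%, which is just 1 percentage point above the 99% confidence level. With only 50 intervals the observed proportion moves in steps of 2%, so it cannot equal 99% exactly.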
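Seeing every interval capture the mean is not surprising at this confidence level. The sketch below assumes each of the 50 intervals captures the true mean independently with probability exactly 0.99, which is an idealization, since the normal-based interval is only approximate. The names conf_level, n_intervals, p_all, p_all_binom and expected_misses are illustrative.

```r
conf_level <- 0.99
n_intervals <- 50

# Probability that all 50 intervals capture the mean
p_all <- conf_level^n_intervals

# The same probability from the binomial distribution
p_all_binom <- dbinom(n_intervals, size = n_intervals, prob = conf_level)

# Expected number of intervals that miss the mean
expected_misses <- n_intervals * (1 - conf_level)

c(p_all, p_all_binom, expected_misses)
```

Under these assumptions, a run of 50 intervals with no misses happens about 60% of the time, and on average only half an interval per run misses the mean.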