Lab 6: Foundations for Statistical Inference - Confidence Intervals
set.seed(64588)
download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
Exercise 1: Describe the distribution of your sample. What would you say is the “typical” size within your sample? Also state precisely what you interpreted “typical” to mean.
population <- ames$Gr.Liv.Area
samp <- sample(population, 60)
summary(samp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     438    1178    1465    1561    1841    3500
hist(samp)

Answer: The distribution of this random sample of size 60 is right-skewed. I would say that the “typical” house size in my sample is 1561 square feet. I interpreted “typical” to mean the average value in the sample, or the mean. Because of the skew, the mean sits above the median of 1465 square feet.
Exercise 2: Would you expect another student’s distribution to be identical to yours? Would you expect it to be similar? Why or why not?
Answer: I would not expect another student’s distribution to be identical to mine, but I would expect it to be similar. Everyone in the class samples from the same population, but each student draws their own random sample, so the individual observations differ while the overall shape, center, and spread should be roughly alike.
Exercise 3: For the confidence interval to be valid, the sample mean must be normally distributed and have standard error $s/\sqrt{n}$. What conditions must be met for this to be true?
sample_mean <- mean(samp)
se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1424.436 1698.330
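Answer: The sample observations must be independent of each other, which holds when the sample is random and makes up only a small share (less than about 10%) of a large population. The sample size should also be reasonably large (at least about 30) and the population should not be strongly skewed; if n is small, the population itself must be approximately normal.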
Exercise 4: What does “95% confidence” mean? If you’re not sure, see Section 4.2.2.
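Answer: “95% confidence” means that if we drew many random samples and built an interval from each one in the same way, about 95% of those intervals would contain the true population mean. It is a statement about the method, not a 95% probability that the parameter falls inside any one computed interval. Each interval here is the sample mean plus or minus 1.96 standard errors.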
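A common reading of “95% confidence” is a long-run one: across many repeated samples, about 95% of the intervals built this way should capture the true mean. The sketch below illustrates that idea under stated assumptions. It uses a simulated right-skewed population from rlnorm() as a stand-in, not the Ames data, and the names sim_pop, n_reps, and covered are hypothetical.

```r
# Simulated stand-in for a right-skewed population (NOT the Ames data).
set.seed(1)
sim_pop <- rlnorm(3000, meanlog = 7.2, sdlog = 0.35)
true_mean <- mean(sim_pop)

n <- 60          # sample size, as in the lab
n_reps <- 2000   # number of repeated samples (an arbitrary choice)

# For each repetition: draw a sample, build a 95% interval,
# and record whether it contains the true mean.
covered <- replicate(n_reps, {
  s <- sample(sim_pop, n)
  se <- sd(s) / sqrt(n)
  lower <- mean(s) - 1.96 * se
  upper <- mean(s) + 1.96 * se
  lower <= true_mean && true_mean <= upper
})

# Share of intervals that capture the true mean; expected to land near 0.95.
coverage <- mean(covered)
print(coverage)
```

The exact value depends on the seed and on how skewed the simulated population is, so it should be read as an illustration rather than a result about Ames.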
Exercise 5: Does your confidence interval capture the true average size of houses in Ames? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?
mean(population)
## [1] 1499.69
Answer: My confidence interval does capture the true average size of houses in Ames. The true mean is 1499.69 sq ft, which lies inside my confidence interval (from 1424.436 to 1698.330 sq ft). Most likely, my neighbor’s interval will also capture the true mean, because at a 95% confidence level the method captures it for about 95% of samples.
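The capture check can also be done in code rather than by eye. This small sketch plugs in the values printed above; the name captures is hypothetical.

```r
# Values reported above in this lab.
lower <- 1424.436
upper <- 1698.330
true_mean <- 1499.69

# TRUE when the interval contains the population mean.
captures <- lower <= true_mean & true_mean <= upper
print(captures)
## [1] TRUE
```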
Exercise 6: Each student in your class should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why? If you are working in this lab in a classroom, collect data on the intervals created by other students in the class and calculate the proportion of intervals that capture the true population mean.
Answer: I would expect about 95% of the class’s intervals to capture the true population mean and about 5% to miss it. This follows from the meaning of a 95% confidence level: before a sample is drawn, the procedure has a 95% chance of producing an interval that contains the true population mean.
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for (i in 1:50) {
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])
## [1] 1466.730 1744.003
On Your Own 1: Using the following function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.
plot_ci(lower_vector, upper_vector, mean(population))

Answer: Out of 50 intervals, 5 did not include the true population mean, so 45 did (45/50 x 100 = 90%). This proportion is not exactly equal to the confidence level of 95%. We can expect the proportion to be close to the confidence level, but because the samples are random, the observed proportion varies from one set of 50 intervals to the next.
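To get a feel for how much the observed proportion can vary, one rough model treats the number of capturing intervals as Binomial(50, 0.95). This is an assumption: it takes the 50 intervals as independent and the true coverage as exactly 95%. The variable names are hypothetical.

```r
n_intervals <- 50
level <- 0.95

# Expected number of intervals that capture the true mean.
expected_hits <- n_intervals * level   # 47.5

# Probability of exactly 45 captures, and of 45 or fewer,
# under the Binomial(50, 0.95) model.
p_exactly_45 <- dbinom(45, n_intervals, level)
p_at_most_45 <- pbinom(45, n_intervals, level)

round(c(expected = expected_hits,
        exactly_45 = p_exactly_45,
        at_most_45 = p_at_most_45), 3)
```

Under this model the chance of seeing 45 or fewer captures is roughly 0.10, so 45 out of 50 is not an unusual outcome for 95% intervals.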
On Your Own 2: Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?
qnorm(0.95,0,1)
## [1] 1.644854
Answer: For a confidence level of 90%, the appropriate critical value is 1.645. The middle 90% of the standard normal distribution is centered on the mean 0, which leaves 5% in each tail. The upper cutoff is therefore the 95th percentile, which is why qnorm is called with 0.95 rather than 0.90.
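The same tail reasoning works for any confidence level: split the leftover area evenly between the two tails and take the upper cutoff. A minimal sketch follows; z_star is a hypothetical helper, not something supplied with the lab.

```r
# Hypothetical helper: two-sided critical value for a given confidence level.
z_star <- function(level) {
  stopifnot(level > 0, level < 1)
  # Area left of the upper cutoff = 1 minus half of the leftover area.
  qnorm(1 - (1 - level) / 2)
}

round(z_star(c(0.90, 0.95, 0.99)), 3)
## [1] 1.645 1.960 2.576
```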
On Your Own 3: Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples, simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?
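# Note: this loop draws 50 new samples. The exercise allows reusing the
# samp_mean and samp_sd values collected earlier, in which case only the
# critical value below needs to change.
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for (i in 1:50) {
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}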
lower_vector <- samp_mean - 1.645 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.645 * samp_sd / sqrt(n)
plot_ci(lower_vector, upper_vector, mean(population))

Answer: Out of 50 confidence intervals, 6 did not include the true population mean, so 44 did (44/50 x 100 = 88%). The percentage of intervals that contain the true population mean, 88%, is slightly lower than the selected confidence level of 90%, a difference that random sampling can plausibly produce.
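The proportion can be computed directly instead of counting lines on the plot, and the same sample means and standard deviations can serve for more than one confidence level. The sketch below assumes a simulated stand-in population from rlnorm(), not the Ames data, and the names sim_pop, sim_mean, sim_se, and coverage_at are hypothetical.

```r
# Simulated stand-in population (NOT the Ames data).
set.seed(2)
sim_pop <- rlnorm(3000, meanlog = 7.2, sdlog = 0.35)
true_mean <- mean(sim_pop)
n <- 60

# Draw the 50 samples once and keep their means and standard errors.
samples <- replicate(50, sample(sim_pop, n))   # n x 50 matrix, one sample per column
sim_mean <- colMeans(samples)
sim_se <- apply(samples, 2, sd) / sqrt(n)

# Proportion of the 50 intervals that capture the true mean at a given level.
coverage_at <- function(level) {
  z <- qnorm(1 - (1 - level) / 2)
  mean(sim_mean - z * sim_se <= true_mean & true_mean <= sim_mean + z * sim_se)
}

cover_95 <- coverage_at(0.95)
cover_90 <- coverage_at(0.90)
c(cover_90 = cover_90, cover_95 = cover_95)
```

When the samples are reused, each 90% interval is a narrower interval around the same sample mean as its 95% counterpart, so any interval that misses at 95% also misses at 90%, and the 90% proportion cannot exceed the 95% one.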