download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
Describe the distribution of your sample. What would you say is the “typical” size within your sample? Also state precisely what you interpreted “typical” to mean.
population <- ames$Gr.Liv.Area
samp <- sample(population, 60)
hist(samp)
summary(samp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     691    1038    1434    1428    1722    2872
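The distribution of the sample does not look normal: it is skewed to the right, and there appear to be outliers at the high end. I would say that the typical size is about 1,430 square feet. I interpreted “typical” to mean the average (the sample mean of 1428, which is close to the median of 1434).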
hist(samp, breaks = 10)
Would you expect another student’s distribution to be identical to yours? Would you expect it to be similar? Why or why not?
I would not expect another student’s distribution to be identical to mine. I would, however, expect it to be similar, because both samples were randomly selected from the same population. Random sampling produces some variation from sample to sample, but the overall shape, center, and spread should be close.
sample_mean <- mean(samp)
For the confidence interval to be valid, the sample mean must be normally distributed and have standard error s/√n. What conditions must be met for this to be true?
se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1315.792 1540.741
The conditions are that the observations come from a random sample and are independent of one another. In addition, n must be large enough (a common rule of thumb is n ≥ 30), and the population distribution must not be strongly skewed.
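The size-related conditions can be checked roughly in code. The sketch below is only illustrative: `population_sim` is a simulated right-skewed stand-in for `ames$Gr.Liv.Area` (an assumption, so the code runs without the data file), and the n ≥ 30 and 10% thresholds are common rules of thumb rather than strict requirements.

```r
# Simulated right-skewed stand-in for ames$Gr.Liv.Area (assumption, not the real data)
set.seed(1)
population_sim <- rlnorm(2930, meanlog = 7.25, sdlog = 0.33)

n <- 60
samp_sim <- sample(population_sim, n)  # simple random sample, without replacement

# Rule-of-thumb checks on the sample size
large_enough <- n >= 30                               # large-sample condition
under_10_pct <- n / length(population_sim) < 0.10     # supports treating draws as independent

c(large_enough = large_enough, under_10_pct = under_10_pct)
```

Skewness still has to be judged from a histogram of the sample; the two checks above cover only the sample size.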
What does “95% confidence” mean? If you’re not sure, see Section 4.2.2.
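“95% confidence” means that if we took many random samples and built an interval from each one in the same way, about 95% of those intervals would contain the true population mean. For my sample, I am 95% confident that the true mean living area of houses in Ames lies between 1315.792 and 1540.741 square feet.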
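The long-run reading of “95% confidence” can be illustrated with a small simulation. This is a sketch under stated assumptions: `population_sim` is a hypothetical normal population standing in for the Ames data, and `reps` is an arbitrary number of repeated samples.

```r
# Hypothetical population standing in for ames$Gr.Liv.Area (assumption)
set.seed(42)
population_sim <- rnorm(5000, mean = 1500, sd = 500)
mu <- mean(population_sim)

n <- 60       # sample size, as in the lab
reps <- 2000  # number of repeated samples (arbitrary choice)

# For each repeated sample, build a 95% interval and record whether it contains mu
covered <- replicate(reps, {
  s <- sample(population_sim, n)
  se <- sd(s) / sqrt(n)
  lower <- mean(s) - 1.96 * se
  upper <- mean(s) + 1.96 * se
  lower <= mu && mu <= upper
})

coverage <- mean(covered)  # proportion of intervals that captured mu
coverage
```

Under these assumptions the proportion should land close to 0.95, though it will not match it exactly in any single run.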
mean(population)
## [1] 1499.69
Does your confidence interval capture the true average size of houses in Ames? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?
My 95% confidence interval (1315.792, 1540.741) does capture the true average size of houses in Ames (1499.69 sq ft).
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
Each student in your class should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why? If you are working in this lab in a classroom, collect data on the intervals created by other students in the class and calculate the proportion of intervals that capture the true population mean.
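I would expect about 95% of the confidence intervals from our class to capture the true population mean, because each interval was built with a procedure that captures the true mean in 95% of random samples.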
for(i in 1:50){
samp <- sample(population, n)
samp_mean[i] <- mean(samp)
samp_sd[i] <- sd(samp)
}
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])
## [1] 1332.130 1550.904
Using the following function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.
plot_ci(lower_vector, upper_vector, mean(population))
5 of the 50 intervals do not include the true population mean, so 45 of 50, or 90%, do. Our confidence level was 95%, so this proportion is slightly lower than expected. The proportion is not exactly equal to the confidence level because the 95% describes the long-run capture rate of the procedure over many repeated samples, not a guarantee for any particular set of 50 intervals. With only 50 intervals we would expect about 47 or 48 to capture the mean, and random sampling variation can easily move that count by a few in either direction.
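How unusual is 45 out of 50? A rough check, assuming the 50 intervals are independent and each captures the mean with probability exactly 0.95 (an idealization, since the intervals here use the sample standard deviation and a skewed population), treats the number of captures as binomial:

```r
n_intervals <- 50
conf_level <- 0.95

# Expected number of intervals that capture the mean
expected_captures <- n_intervals * conf_level

# Probability that 45 or fewer of the 50 intervals capture the mean,
# under the idealized binomial model described above
p_45_or_fewer <- pbinom(45, size = n_intervals, prob = conf_level)

expected_captures
p_45_or_fewer
```

Under that model a result of 45 or fewer happens in roughly one set of 50 intervals out of ten, so it is not strong evidence that anything went wrong.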
Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?
qt(.95,df=49)
## [1] 1.676551
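I chose a 90% confidence level. Using the t distribution with 49 degrees of freedom, the critical value is about 1.676. Note that each sample has n = 60, so df = 59 would be the matching choice; it gives nearly the same value (about 1.671), and the corresponding normal critical value is about 1.645.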
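For any two-sided confidence level, the critical value is the quantile that leaves half of the leftover probability in each tail. A minimal sketch, assuming samples of size 60 as in this lab (the variable names are illustrative):

```r
conf_level <- 0.90
alpha <- 1 - conf_level  # total probability left in the two tails

z_star <- qnorm(1 - alpha / 2)            # normal critical value
t_star <- qt(1 - alpha / 2, df = 60 - 1)  # t critical value for a sample of 60

round(c(z = z_star, t = t_star), 3)
```

The same pattern with `conf_level <- 0.95` reproduces the familiar 1.96 used earlier for the normal case.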
Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples, simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?
lower_vector <- samp_mean - 1.676 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.676 * samp_sd / sqrt(n)
plot_ci(lower_vector, upper_vector, mean(population))
6 of the 50 intervals do not include the true population mean, so 44 of 50, or 88%, do. This is close to, but slightly lower than, the 90% confidence level I selected.
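The proportion can also be tallied in code rather than counted from the plot. The sketch below repeats the lab's loop on a hypothetical simulated population (`population_sim` is an assumption standing in for `ames$Gr.Liv.Area`) and uses the normal critical value for a 90% level.

```r
# Hypothetical population standing in for ames$Gr.Liv.Area (assumption)
set.seed(7)
population_sim <- rnorm(5000, mean = 1500, sd = 500)
mu <- mean(population_sim)
n <- 60

# Collect 50 sample means and standard deviations, as in the lab
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
for (i in 1:50) {
  s <- sample(population_sim, n)
  samp_mean[i] <- mean(s)
  samp_sd[i] <- sd(s)
}

# 90% intervals using the normal critical value
z_star <- qnorm(0.95)
lower_vector <- samp_mean - z_star * samp_sd / sqrt(n)
upper_vector <- samp_mean + z_star * samp_sd / sqrt(n)

# TRUE where an interval contains the population mean
captured <- lower_vector <= mu & mu <= upper_vector
sum(captured)   # number of intervals that capture mu
mean(captured)  # proportion of intervals that capture mu
```

With the real data, the same last three lines work unchanged once `mu` is set to `mean(population)`.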