download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
population <- ames$Gr.Liv.Area
samp <- sample(population, 60)
hist(samp)
The distribution of my sample is somewhat right skewed, with a peak between 1000 and 1500 square feet. Based on where the peak is, I would say the typical house is between 1000 and 1500 square feet. "Typical" here means the range where house sizes are most concentrated.
I would not expect another student's distribution to be identical, since the chance of drawing the exact same 60 values out of 2930 total values is extremely small. The shape should be similar, because both samples come from the same population, but there is virtually no way the two distributions could be identical.
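To make this concrete, here is a small sketch that draws two independent samples of 60 and checks whether they match. So that it runs on its own, it uses a simulated right-skewed stand-in for `population` (an assumption, not the Ames data). With `ames.RData` loaded, the same lines could be run on `ames$Gr.Liv.Area` instead. The names `samp_a`, `samp_b`, and `same_sample` are hypothetical.

```r
# Stand-in for ames$Gr.Liv.Area: 2930 simulated right-skewed areas
# (assumption for illustration only, not the real data)
set.seed(42)
population <- round(rgamma(2930, shape = 9, scale = 167))

# Two students each draw their own sample of 60 houses
samp_a <- sample(population, 60)
samp_b <- sample(population, 60)

# Do the two samples contain exactly the same values?
same_sample <- identical(sort(samp_a), sort(samp_b))
print(same_sample)

# The two sample means come from the same population but are computed
# from different houses, so they will generally differ
print(c(mean(samp_a), mean(samp_b)))
```

Plotting `hist(samp_a)` and `hist(samp_b)` side by side is a quick way to compare the two shapes.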
sample_mean <- mean(samp)
se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1459.875 1719.859
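For this interval to be valid, the observations must come from a random sample and be independent of each other, the sample standard deviation must be known so the standard error can be estimated, and n must be greater than or equal to 30 so that the sampling distribution of the mean is approximately normal.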
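95% confidence means that we are 95% confident that the true average size of houses in Ames lies between 1459.875 sq feet and 1719.859 sq feet, the interval computed above. Put another way, about 95% of intervals built this way from repeated samples would contain the true mean.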
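The interval calculation above can also be wrapped in a small function, which makes the role of the 1.96 multiplier explicit. This is only a sketch: `ci_mean` is a hypothetical helper, not part of the lab, and `samp_demo` is simulated stand-in data (an assumption) so the block runs without `ames.RData`.

```r
# Hypothetical helper: normal-approximation confidence interval for a mean
ci_mean <- function(x, conf = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)              # estimated standard error of the mean
  z <- qnorm(1 - (1 - conf) / 2)     # critical value, about 1.96 when conf = 0.95
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

# Simulated stand-in for a sample of 60 house areas (assumption, not real data)
set.seed(1)
samp_demo <- rgamma(60, shape = 9, scale = 167)

ci_demo <- ci_mean(samp_demo)
print(ci_demo)
```

Calling `ci_mean(samp)` on the sample drawn earlier should give almost the same bounds as `c(lower, upper)`. The tiny difference comes from `qnorm(0.975)` being slightly less than the rounded value 1.96.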
mean(population)
## [1] 1499.69
Yes, the confidence interval ranges from 1459.875 to 1719.859, and the true mean of 1499.69 falls within that range.
Because each student builds a 95% confidence interval, I would expect about 95% of the intervals to capture the true population mean. Each interval is centered on a different sample mean, so roughly 5% of them will miss the true mean purely by chance.
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for (i in 1:50) {
  samp <- sample(population, n)
  samp_mean[i] <- mean(samp)
  samp_sd[i] <- sd(samp)
}
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])
## [1] 1316.141 1537.392
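One way to check the 95% expectation is to count how many of the 50 intervals actually contain the population mean. The sketch below repeats the loop on a simulated stand-in population (an assumption, so the block runs without `ames.RData`) and then computes the proportion captured. The names `n_intervals`, `captured`, and `coverage` are hypothetical.

```r
# Stand-in for ames$Gr.Liv.Area (assumption for illustration, not the real data)
set.seed(123)
population <- rgamma(2930, shape = 9, scale = 167)

n <- 60
n_intervals <- 50
samp_mean <- rep(NA, n_intervals)
samp_sd <- rep(NA, n_intervals)

for (i in 1:n_intervals) {
  samp <- sample(population, n)
  samp_mean[i] <- mean(samp)
  samp_sd[i] <- sd(samp)
}

lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)

# TRUE for each interval that contains the true population mean
captured <- lower_vector <= mean(population) & mean(population) <= upper_vector

# Proportion of the 50 intervals that captured the true mean
coverage <- mean(captured)
print(coverage)
```

With the lab's own `lower_vector` and `upper_vector` already in the workspace, only the last three statements are needed. With just 50 intervals the proportion will vary from run to run, so it will often land near 0.95 without matching it exactly.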