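# Download and load the Ames, Iowa housing data used in this lab
download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
# Above-ground living area (square feet) of every house, treated as the population
population <- ames$Gr.Liv.Area
# Simple random sample of 60 houses, drawn without replacement
samp <- sample(population, 60)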
Exercise 1: Describe the distribution of your sample. What would you say is the “typical” size within your sample? Also state precisely what you interpreted “typical” to mean.
summary(samp)
The typical size within my sample is about 1479 square feet. I interpreted “typical” to mean the average, that is, the sample mean, which is also the midpoint of the confidence interval computed below.
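Because “typical” can mean either the mean or the median, it can help to compute both. The sketch below does this for a simulated, right-skewed stand-in sample (`sim_samp` is an assumption for illustration, not real Ames data); for a right-skewed variable such as living area, the mean is usually pulled above the median.

```r
# Sketch: two common meanings of "typical" for a numeric sample.
# ASSUMPTION: sim_samp is a simulated right-skewed stand-in for samp,
# not the real Ames data.
set.seed(1)
sim_samp <- round(rlnorm(60, meanlog = log(1500), sdlog = 0.3))

# Mean and median side by side
typical <- c(mean = mean(sim_samp), median = median(sim_samp))
typical

# Five-number summary plus the mean, as summary() prints it
summary(sim_samp)
```

On the real sample, `hist(samp)` shows the shape (skew, outliers) that a single “typical” number hides.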
Exercise 2: Would you expect another student’s distribution to be identical to yours? Would you expect it to be similar? Why or why not?
It will be similar but not identical. We have the whole population, but each student uses only a random sample of 60 houses, so there is sampling variability. Different samples should give similar numbers, but they will not be identical: every time a new sample is drawn, the summary statistics will be slightly different.
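Sampling variability can be seen directly by letting two hypothetical “students” sample from the same population. In this sketch `sim_pop` is a simulated stand-in for `population` (an assumption, not the Ames data); each student draws 60 values without replacement, as `sample()` does by default.

```r
# Sketch: two students sampling from the same population.
# ASSUMPTION: sim_pop is a simulated stand-in for population.
set.seed(2)
sim_pop <- rlnorm(2930, meanlog = log(1450), sdlog = 0.33)

# Each student draws an independent sample of 60 (without replacement)
student_a <- sample(sim_pop, 60)
student_b <- sample(sim_pop, 60)

# The two sample means differ from each other and from the population mean
c(a = mean(student_a), b = mean(student_b), population = mean(sim_pop))
```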
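sample_mean <- mean(samp)         # point estimate of the population mean
se <- sd(samp) / sqrt(60)         # standard error s / sqrt(n), with n = 60
lower <- sample_mean - 1.96 * se  # 1.96 = z critical value for 95% confidence
upper <- sample_mean + 1.96 * se
c(lower, upper)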
## [1] 1360.223 1596.843
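The interval calculation above can be wrapped in a small function. This is a sketch: `ci_mean` is a hypothetical helper name, it uses `qnorm()` for the critical value (1.959964 rather than the rounded 1.96), and it takes n from `length(x)` instead of hard-coding 60.

```r
# Hypothetical helper: normal-approximation confidence interval for a mean.
ci_mean <- function(x, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)  # about 1.959964 when conf = 0.95
  se <- sd(x) / sqrt(length(x))   # standard error s / sqrt(n)
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

# Small worked example: mean 3, sd sqrt(2.5), n = 5
ci_mean(c(1, 2, 3, 4, 5))
```

Calling `ci_mean(samp)` should give essentially the same interval as the lines above, differing only by the rounding of 1.96.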
Exercise 3: For the confidence interval to be valid, the sample mean must be normally distributed and have standard error $s/\sqrt{n}$. What conditions must be met for this to be true?
The sample consists of at least 30 independent observations (for example, a simple random sample that is less than 10% of the population), and the data are not strongly skewed.
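Some of these conditions can be checked roughly in code. In this sketch `check_ci_conditions` is a hypothetical helper; it assumes the sample was drawn at random (which cannot be verified from the data), and it approximates “not strongly skewed” by a simple moment-based skewness below 1 in absolute value, a common rule of thumb rather than a fixed standard.

```r
# Hypothetical helper: rough checks of the conditions for a CI on a mean.
# ASSUMPTIONS: random sampling; |skewness| < 1 as the "not strongly skewed" cutoff.
check_ci_conditions <- function(x, pop_size) {
  n <- length(x)
  # Simple moment-based skewness estimate
  skew <- mean((x - mean(x))^3) / sd(x)^3
  c(n_at_least_30       = n >= 30,
    under_10pct_of_pop  = n < 0.10 * pop_size,
    not_strongly_skewed = abs(skew) < 1)
}

# Symmetric toy sample of 40 values from a population of 1000
check_ci_conditions(c(rep(10, 20), rep(20, 20)), pop_size = 1000)
```

In the lab, the call would be `check_ci_conditions(samp, length(population))`.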
Exercise 4: What does “95% confidence” mean? If you’re not sure, see Section 4.2.2.
“95% confidence” refers to the procedure, not to individual data values. If we took many random samples and built a 95% confidence interval from each one, about 95% of those intervals would contain the true population mean. It does not mean that 95% of the data fall between the two endpoints.
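This repeated-sampling interpretation can be checked by simulation. The sketch below assumes a simulated normal population in place of the Ames data, builds 1000 intervals with the same 1.96 formula used above, and reports the fraction that capture the known population mean; it should come out close to 0.95.

```r
# Sketch: what "95% confidence" means, via repeated sampling.
# ASSUMPTION: a simulated normal population stands in for the Ames data.
set.seed(42)
sim_pop <- rnorm(10000, mean = 1500, sd = 500)
mu <- mean(sim_pop)  # the "true" mean, known here only because we simulated it

# For each of 1000 samples of size 60, does the interval contain mu?
covers <- replicate(1000, {
  x <- sample(sim_pop, 60)
  se <- sd(x) / sqrt(60)
  (mean(x) - 1.96 * se <= mu) && (mu <= mean(x) + 1.96 * se)
})

mean(covers)  # proportion of intervals that captured mu
```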
mean(population)
## [1] 1499.69
Exercise 5: Does your confidence interval capture the true average size of houses in Ames? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?
Yes, the 95% confidence interval I got captures the true average size of houses in Ames. The population mean is 1499.69, and my 95% confidence interval runs from 1360.223 to 1596.843, which contains it.
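The capture check is a single pair of comparisons. The sketch below uses the numbers printed earlier in this lab; the names `ci_lo`, `ci_hi`, and `true_mean` are illustrative and chosen so they do not overwrite `lower` and `upper`.

```r
# Values copied from the printed output above
ci_lo <- 1360.223      # printed lower bound
ci_hi <- 1596.843      # printed upper bound
true_mean <- 1499.69   # mean(population)

# The interval captures the true mean if lower <= mu <= upper
captured <- ci_lo <= true_mean && true_mean <= ci_hi
captured  # TRUE
```

In a live session the same check is `lower <= mean(population) && mean(population) <= upper`.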
Exercise 6: Each student in your class should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why? If you are working in this lab in a classroom, collect data on the intervals created by other students in the class and calculate the proportion of intervals that capture the true population mean.
About 95% of the intervals should capture the true population mean. Each interval is built from a different random sample, and a 95% confidence procedure captures the true mean in about 95% of repeated samples, so roughly 1 interval in 20 will miss it. The true population mean is therefore not guaranteed to fall inside every student’s interval.
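A short calculation shows why “always” is too strong. Under the idealized assumption that each of 50 intervals independently captures the mean with probability 0.95, the expected number of captures is 47.5, and the chance that all 50 capture it is only about 7.7%.

```r
# Idealized model: each interval independently captures mu with probability 0.95
n_intervals <- 50
p_capture <- 0.95

expected_hits <- n_intervals * p_capture  # 47.5 intervals on average
p_all_capture <- p_capture^n_intervals    # chance that every interval captures mu
p_all_capture                             # about 0.077

# Chance that at most 47 of the 50 intervals capture mu (at least 3 miss)
pbinom(47, size = n_intervals, prob = p_capture)
```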
# Preallocate storage for 50 sample means and standard deviations
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60  # sample size
for (i in 1:50) {
  samp <- sample(population, n)  # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)     # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)         # save sample sd in ith element of samp_sd
}
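The loop above can also be written with `replicate()`, which collects the 50 samples as columns of a matrix. In this sketch `sim_pop` is a simulated stand-in for `population` (an assumption), and the results go into `sim_means` and `sim_sds` so the lab’s own `samp_mean` and `samp_sd` are not overwritten.

```r
# Sketch: 50 samples of size 60 with replicate() instead of a for loop.
# ASSUMPTION: sim_pop is a simulated stand-in for population.
set.seed(3)
sim_pop <- rlnorm(2930, meanlog = log(1450), sdlog = 0.33)

# A 60 x 50 matrix: each column is one sample
draws <- replicate(50, sample(sim_pop, 60))

sim_means <- colMeans(draws)      # one mean per sample (column)
sim_sds <- apply(draws, 2, sd)    # one sd per sample (column)
```

The loop and the `replicate()` version compute the same kind of result; the loop is more explicit, while `replicate()` avoids preallocation.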
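# 95% confidence bounds for each of the 50 samples (vectorized over samples)
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])  # interval from the first sample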
## [1] 1300.925 1510.075
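The lab uses the normal critical value 1.96. A common alternative when the population standard deviation is estimated by the sample sd is the Student’s t critical value with n − 1 degrees of freedom; this is a swapped-in technique, not the lab’s method. For n = 60 the two are close, and the t interval is only about 2% wider.

```r
# Critical values for a 95% interval with n = 60
z_crit <- qnorm(0.975)            # about 1.96, as used in the lab
t_crit <- qt(0.975, df = 60 - 1)  # about 2.00, Student's t with n - 1 df

# Ratio of interval widths (t interval relative to z interval)
t_crit / z_crit
```

To use it with the lab’s vectors, replace `1.96` with `t_crit` in the `lower_vector` and `upper_vector` lines.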