load("ames.RData")
population <- ames$Gr.Liv.Area
samp <- sample(population, 60)
Ex 1, Describe the distribution of your sample. What would you say is the “typical” size within your sample? Also state precisely what you interpreted “typical” to mean.
summary(samp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 438 1168 1332 1453 1675 3493
hist(samp)

hist(population)

Answer: The sample distribution is right-skewed and bimodal. The ‘typical’ size is about 1489; I interpreted ‘typical’ to mean the mode.
Ex 2, Would you expect another student’s distribution to be identical to yours? Would you expect it to be similar? Why or why not?
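Answer: I would not expect it to be identical, because another student's sample is a different random draw of 60 houses. I would expect it to be similar, since both are random samples with n > 30 taken from the same population.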
sample_mean <- mean(samp)
se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1323.843 1582.023
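The interval calculation above can be wrapped in a small function so the same steps apply to any sample and confidence level. This is only a sketch: `ci_mean` is a hypothetical helper, not part of the lab materials, and the data vector is made up for illustration (in the lab it would be `samp`).

```r
# Hypothetical helper (not from the lab): normal-approximation CI for a mean.
ci_mean <- function(x, conf_level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                    # estimated standard error of the mean
  z  <- qnorm(1 - (1 - conf_level) / 2)    # two-sided critical value (about 1.96 for 95%)
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

# Made-up values, only to show the call.
x <- c(1200, 1500, 980, 2100, 1650, 1340, 1780, 1120)
ci_mean(x)
```

Using `qnorm()` instead of the hard-coded 1.96 keeps the critical value tied to the chosen confidence level.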
Ex 3, For the confidence interval to be valid, the sample mean must be normally distributed and have standard error $s / \sqrt{n}$. What conditions must be met for this to be true?
The conditions are as follows: the sample observations must be independent, the sample size should be at least 30, and the population distribution should not be strongly skewed.
Ex 4, What does “95% confidence” mean? If you’re not sure, see Section 4.2.2.
It means that we are 95% confident that the population mean lies between the lower and upper bounds computed from the sample mean (the point estimate). In other words, if we kept drawing samples and computing an interval from each one, about 95% of those intervals would contain the true population mean.
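The repeated-sampling reading of "95% confidence" can be checked with a short simulation. The sketch below assumes a synthetic right-skewed population generated with `rgamma()` as a stand-in (it is not the Ames data), and the seed and the number of repetitions are arbitrary choices.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Synthetic right-skewed stand-in population (assumption; not the Ames data).
pop <- rgamma(3000, shape = 4, scale = 375)
mu  <- mean(pop)

n    <- 60
reps <- 2000
covered <- logical(reps)
for (i in seq_len(reps)) {
  s  <- sample(pop, n)                 # one random sample of size n
  se <- sd(s) / sqrt(n)                # its estimated standard error
  covered[i] <- (mean(s) - 1.96 * se <= mu) && (mu <= mean(s) + 1.96 * se)
}
mean(covered)  # share of intervals containing mu; expected to be near 0.95
```

Any single interval either contains the population mean or does not; the 95% describes how often the procedure succeeds over many samples.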
mean(population)
## [1] 1499.69
Ex 5, Does your confidence interval capture the true average size of houses in Ames? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?
Yes, the confidence interval (1323.843, 1582.023) captures the true population mean of 1499.69.
Ex 6, Each student in your class should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why? If you are working in this lab in a classroom, collect data on the intervals created by other students in the class and calculate the proportion of intervals that capture the true population mean.
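As described in answer 4, I would expect about 95% of the intervals to capture the true population mean. The sample mean is approximately normally distributed, and in a normal distribution 95% of values lie within 1.96 standard deviations of the mean; for the sample mean, that standard deviation is the standard error.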
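The link between the multiplier 1.96 and the 95% level can be confirmed directly from the standard normal distribution. A minimal check using base R's `pnorm()`:

```r
# Probability that a standard normal value lies within 1.96 standard deviations of 0.
central_area <- pnorm(1.96) - pnorm(-1.96)
central_area  # approximately 0.95
```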
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for (i in 1:50) {
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])
## [1] 1339.345 1634.655
On Your Own
1, Using the following function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.
plot_ci(lower_vector, upper_vector, mean(population))

pop_mean <- mean(population)
# count intervals that lie entirely above or entirely below the population mean
outside_intervals <- sum(lower_vector > pop_mean | upper_vector < pop_mean)
Out of the fifty confidence intervals, 3 do not include the true population mean, so 47 of 50 (94%) do. The 6% miss rate is close to, but not exactly equal to, the 5% implied by the confidence level. An exact match is not possible with 50 intervals, since it would require 2.5 intervals to miss the population mean; in addition, the number of misses varies randomly from one set of samples to the next.
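How much the number of missed intervals can vary by chance can be sketched with a binomial model. This assumes each of the 50 intervals misses the population mean independently with probability exactly 5%, which is an idealization of the lab setup; the variable names are illustrative.

```r
n_intervals <- 50
p_miss      <- 0.05  # assumption: each interval misses independently with probability 5%

expected_misses <- n_intervals * p_miss                        # 2.5
sd_misses       <- sqrt(n_intervals * p_miss * (1 - p_miss))   # spread of the miss count

# Probability of exactly 0, 1, ..., 5 misses among the 50 intervals.
dbinom(0:5, size = n_intervals, prob = p_miss)

# Probability of 3 or more misses.
p_three_or_more <- pbinom(2, size = n_intervals, prob = p_miss, lower.tail = FALSE)
p_three_or_more
```

Under this model, three misses is an unremarkable outcome, so a 94% observed proportion is consistent with a 95% confidence level.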
2, Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?
I picked a confidence level of 90%. The appropriate critical value is qnorm(0.95) = 1.6448536.
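The critical value for any two-sided confidence level comes from the standard normal quantile function: for level C, it is the quantile at 1 - (1 - C)/2. A short sketch for a few common levels (the variable names are illustrative):

```r
conf_levels <- c(0.90, 0.95, 0.99)
alpha  <- 1 - conf_levels        # total probability left in both tails
z_star <- qnorm(1 - alpha / 2)   # upper-tail cutoff that leaves alpha/2 in each tail
round(z_star, 3)                 # 1.645 1.960 2.576
```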
3, Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples, simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?
lower_vector90 <- samp_mean - qnorm(0.95) * samp_sd / sqrt(n)
upper_vector90 <- samp_mean + qnorm(0.95) * samp_sd / sqrt(n)
plot_ci(lower_vector90, upper_vector90, mean(population))
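The proportion of intervals that include the population mean can be computed alongside the plot. The sketch below uses a hypothetical helper, `coverage`, and made-up interval bounds purely to show the call; in the lab the arguments would be `lower_vector90`, `upper_vector90` and `mean(population)`.

```r
# Hypothetical helper: share of intervals [lower, upper] that contain mu.
coverage <- function(lower, upper, mu) {
  mean(lower <= mu & mu <= upper)
}

# Made-up bounds for illustration only (not results from the lab).
lower_demo <- c(1400, 1350, 1520, 1450)
upper_demo <- c(1600, 1480, 1700, 1650)
coverage(lower_demo, upper_demo, mu = 1500)  # 2 of the 4 demo intervals contain 1500
```

Because 90% intervals are narrower than 95% intervals built from the same samples, the proportion that captures the population mean would be expected to be somewhat lower, around 90%.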
