## mean_area
## 1 10069.13
It appears from the plot that the distribution is unimodal and roughly normal, with a slight right skew.
The typical size of homes in this sample is 10,069.13 square feet. I interpreted “typical” to mean the average size of the homes in the sample, or more specifically, the mean.
If other classmates have set the same seed, then yes, we should expect the same results. If we are all pulling random samples, however, there should be some differences.
## [1] 1.959964
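The value printed above matches the 97.5th percentile of the standard normal distribution. The chunk that defined `z_star_95` is not shown here, so the following is only a minimal sketch that assumes it was obtained with base R's `qnorm`:

```r
# Assumption: z_star_95 is the two-sided 95% critical value of the standard normal.
# A 95% interval leaves 2.5% in each tail, so we ask for the 0.975 quantile.
z_star_95 <- qnorm(0.975)
z_star_95
```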
Now we can calculate the confidence interval:
samp %>%
  summarise(lower = mean(Lot.Area) - z_star_95 * (sd(Lot.Area) / sqrt(n)),
            mean = mean(Lot.Area),
            upper = mean(Lot.Area) + z_star_95 * (sd(Lot.Area) / sqrt(n)))
## lower mean upper
## 1 8390.28 10069.13 11747.99
We are 95% confident that the true average size of houses in Ames lies between 8390.28 and 11747.99 square feet.
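As a quick arithmetic check, the margin of error and the standard error can be recovered from the printed bounds alone. This sketch uses only the three numbers reported in the output and assumes the critical value came from `qnorm(0.975)`; the variable names are illustrative:

```r
# Bounds and mean as printed in the output above
lower <- 8390.28
upper <- 11747.99
xbar  <- 10069.13

# Half the interval width is the margin of error, z* times the standard error
margin_of_error <- (upper - lower) / 2

# Dividing by the critical value recovers the standard error, sd / sqrt(n)
z_star_95 <- qnorm(0.975)
std_error <- margin_of_error / z_star_95

# The interval is symmetric, so its midpoint should match the sample mean
midpoint <- (lower + upper) / 2

c(margin_of_error = margin_of_error, std_error = std_error, midpoint = midpoint)
```

The midpoint agrees with the reported sample mean up to rounding, which confirms that the interval is centered on the point estimate.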
For this interval to be valid, the sampled observations need to be independent. Independence is more likely if random sampling is used and, when sampling without replacement, the sample size is less than 10% of the population. In addition, either the population distribution should be normal, or n should be greater than 30 with a population distribution that is not extremely skewed.
"95% confidence" refers to the long-run success rate of this method: about 95% of the confidence intervals constructed this way will capture the population parameter of interest, in this case the mean Lot Area of homes in Ames.
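The long-run interpretation can be illustrated with a small simulation. The sketch below does not use the Ames data: the population is a hypothetical right-skewed one generated with `rlnorm`, and the sample size of 60 and the seed are assumptions chosen only for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical right-skewed population standing in for the Ames lot areas
population <- rlnorm(3000, meanlog = 9, sdlog = 0.5)
mu <- mean(population)

n <- 60                      # assumed sample size, for illustration only
z_star_95 <- qnorm(0.975)

# Draw one sample, build its interval, and report whether it captures mu
captures_mu <- function() {
  samp <- sample(population, n)
  se <- sd(samp) / sqrt(n)
  lower <- mean(samp) - z_star_95 * se
  upper <- mean(samp) + z_star_95 * se
  lower < mu && mu < upper
}

# Proportion of 5000 intervals that capture the population mean
coverage <- mean(replicate(5000, captures_mu()))
coverage
```

With many repetitions the capture proportion should settle near the nominal 95%, although skew in the population can pull it slightly below that.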
Looking at the population
## mean_area
## 1 10147.92
The true average size of houses in Ames is 10147.92. My confidence interval has a lower value of 8390.28 and an upper value of 11747.99, so it does capture the true average size.
Because the whole class did not set the same seed before drawing their samples, everyone constructed their confidence intervals from different randomly selected samples. I would expect about 95% of the constructed confidence intervals to capture the true population mean.
ci <- do(50) * ames %>%
  sample_n(n) %>%
  summarise(lower = mean(Lot.Area) - z_star_95 * (sd(Lot.Area) / sqrt(n)),
            upper = mean(Lot.Area) + z_star_95 * (sd(Lot.Area) / sqrt(n)))
## lower upper .row .index
## 1 8800.031 10805.569 1 1
## 2 9018.124 11819.843 1 2
## 3 8355.396 9887.704 1 3
## 4 8708.582 10808.218 1 4
## 5 8964.277 10923.823 1 5
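The plotting code in this report reads a `capture_mu` column that does not appear in the printed output, so the step that created it is not shown. A minimal sketch of how such a flag could be computed, using only the five intervals printed above and assuming `mu` is the population mean of 10147.92 reported earlier (`ci_head` and `mu` are illustrative names):

```r
# First five intervals as printed above
ci_head <- data.frame(
  lower = c(8800.031, 9018.124, 8355.396, 8708.582, 8964.277),
  upper = c(10805.569, 11819.843, 9887.704, 10808.218, 10923.823)
)

# Assumption: mu is the population mean lot area reported in this report
mu <- 10147.92

# Flag each interval as "yes" if it contains mu, otherwise "no"
ci_head$capture_mu <- ifelse(ci_head$lower < mu & ci_head$upper > mu, "yes", "no")
ci_head
```

Of these five, only the third interval misses the population mean, because its upper bound of 9887.704 falls below 10147.92.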
ci_data <- data.frame(ci_id = c(1:50, 1:50),
                      ci_bounds = c(ci$lower, ci$upper),
                      capture_mu = c(ci$capture_mu, ci$capture_mu))
ggplot(ci_data, aes(x = ci_bounds, y = ci_id,
                    group = ci_id, color = capture_mu)) +
  geom_point(size = 2) + # add points at the ends, size = 2
  geom_line() + # connect with lines
  geom_vline(xintercept = params$mu, color = "darkgray") # draw vertical line
Four confidence intervals out of 50 did not capture the true population mean, so 46 of 50, a proportion of 0.92, did. This is not exactly equal to the 95% confidence level because the confidence level describes the long-run capture rate of the method, so the proportion in any finite set of 50 intervals varies by chance. The plot shows that the majority of confidence intervals did capture the population mean.
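How unusual is 46 out of 50? If each interval is treated as an independent trial that captures the mean with probability 0.95 (an idealizing assumption), the number of captures follows a binomial distribution, and a rough check is:

```r
n_intervals <- 50
conf_level  <- 0.95

# Expected number of intervals that capture the mean
expected_captures <- n_intervals * conf_level

# Probability that 46 or fewer of the 50 intervals capture the mean
p_46_or_fewer <- pbinom(46, size = n_intervals, prob = conf_level)

c(expected_captures = expected_captures, p_46_or_fewer = p_46_or_fewer)
```

Under this assumption the expected count is 47.5, and a result of 46 or fewer happens in roughly a quarter of such experiments, so a proportion of 0.92 is consistent with a 95% method.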
I am choosing a 99% confidence level and finding the two-tailed critical value, which leaves 0.5% in each tail of the standard normal distribution.
Our critical value is 2.58.
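The chunk that produced this value is not shown. As a sketch, assuming `z_star_99` (the name used in the code below) was computed with `qnorm`, the critical value for any two-sided confidence level can be found like this:

```r
conf_level <- 0.99

# Two-sided interval: split the remaining 1% evenly between the two tails
z_star_99 <- qnorm(1 - (1 - conf_level) / 2)

round(z_star_99, 2)
```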
ci99 <- do(50) * ames %>%
  sample_n(n) %>%
  summarise(lower = mean(Lot.Area) - z_star_99 * (sd(Lot.Area) / sqrt(n)),
            upper = mean(Lot.Area) + z_star_99 * (sd(Lot.Area) / sqrt(n)))
## lower upper .row .index
## 1 8238.424 10693.34 1 1
## 2 8830.868 10850.16 1 2
## 3 7868.305 11983.06 1 3
## 4 6954.670 16566.93 1 4
## 5 7844.596 10547.04 1 5
ci99_data <- data.frame(ci_id = c(1:50, 1:50),
                        ci_bounds = c(ci99$lower, ci99$upper),
                        capture_mu = c(ci99$capture_mu, ci99$capture_mu))
ggplot(ci99_data, aes(x = ci_bounds, y = ci_id,
                      group = ci_id, color = capture_mu)) +
  geom_point(size = 2) + # add points at the ends, size = 2
  geom_line() + # connect with lines
  geom_vline(xintercept = params$mu, color = "darkgray") # draw vertical line
All of the confidence intervals at the 99% confidence level captured the true population mean. 50 out of 50 is a 100% success rate, or a proportion of 1. This is very close to the 99% confidence level selected for the intervals.
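A perfect 50 out of 50 is in fact the most likely outcome at this confidence level. Treating the intervals as independent trials with a 0.99 capture probability (an idealizing assumption), the chance that all 50 succeed can be sketched as:

```r
# Probability that all 50 independent 99% intervals capture the mean
p_all_50 <- dbinom(50, size = 50, prob = 0.99)  # equivalent to 0.99^50
p_all_50
```

This probability is about 0.6, so observing no misses among 50 intervals does not suggest that the true capture rate is higher than 99%.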