Data 606 - Lab 4b

Questions to answer:

1. Describe the distribution of your sample. What would you say is the “typical” size within your sample? Also state precisely what you interpreted “typical” to mean.

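The sample is strongly right-skewed and almost bimodal. I interpreted “typical” as the center of the distribution. The sample mean is 1477, which is roughly where a typical value sits. Because the right skew pulls the mean upward, the median (1372) is probably the better measure of a typical size.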

2. Would you expect another student’s distribution to be identical to yours? Would you expect it to be similar? Why or why not?

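No. Each student draws a different random sample, so the distributions will not be identical. I would expect them to be similar, though: a mean close to mine, and most likely also right-skewed, because a sample of 60 should roughly reflect the shape of the right-skewed population.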

3. For the confidence interval to be valid, the sample mean must be normally distributed and have standard error \(s / \sqrt{n}\). What conditions must be met for this to be true?

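The observations must be independent (for example, a simple random sample that is less than 10% of the population).
The sample size must be large enough, typically n ≥ 30 (here n = 60).
The population distribution should not be too strongly skewed; the larger n is, the more skew the normal approximation can tolerate. A roughly normal population is only required when the sample is small.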
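These conditions can be checked informally by simulation. The sketch below is a minimal illustration that assumes a hypothetical right-skewed population simulated with `rlnorm` (the parameters are made up and are not the Ames data). It draws 5,000 samples of size 60 and compares the spread of the sample means with the theoretical standard error \(\sigma / \sqrt{n}\).

```r
# Hypothetical right-skewed "population" standing in for the Ames house sizes.
# Simulated with rlnorm; these are NOT the real data.
set.seed(606)
population <- rlnorm(2930, meanlog = 7.2, sdlog = 0.4)

n <- 60
# Draw 5,000 samples of size n and keep each sample mean
sample_means <- replicate(5000, mean(sample(population, n)))

se_theory <- sd(population) / sqrt(n)  # sigma / sqrt(n)
se_sim    <- sd(sample_means)          # observed spread of the sample means

c(theory = se_theory, simulated = se_sim)
# hist(sample_means) should look roughly bell-shaped despite the skewed population
```

The two standard errors should agree closely (the simulated one is slightly smaller because sampling is without replacement from a finite population).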

4. What does “95% confidence” mean? If you’re not sure, see Section 4.2.2.

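If we took many random samples and built a confidence interval from each one in the same way, we would expect about 95% of those intervals to contain the true population mean. The 95% describes the long-run success rate of the method, not the probability for any single interval.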
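One way to see this interpretation is to repeat the procedure many times. The minimal sketch below again assumes a hypothetical simulated population in place of the Ames data. It builds 10,000 intervals of the form \(\bar{x} \pm 1.96 \, s / \sqrt{n}\) and reports the proportion that contain the true mean; the result should be close to 0.95, possibly a little below it, partly because 1.96 is a normal rather than a t critical value.

```r
# Hypothetical simulated population (not the real Ames data)
set.seed(42)
population <- rlnorm(2930, meanlog = 7.2, sdlog = 0.4)
mu <- mean(population)  # the true population mean
n <- 60

# For each repetition: sample, build a 95% interval, record whether it covers mu
covers <- replicate(10000, {
  samp  <- sample(population, n)
  se    <- sd(samp) / sqrt(n)
  lower <- mean(samp) - 1.96 * se
  upper <- mean(samp) + 1.96 * se
  lower <= mu && mu <= upper
})

mean(covers)  # long-run proportion of intervals that capture mu
```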

5. Does your confidence interval capture the true average size of houses in Ames?

Yes

6. Each student in your class should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why?

About 95%. The sampling distribution of the sample mean is approximately normal, and each interval is built as the sample mean ± 1.96 standard errors, using the sample mean and sample standard deviation as stand-ins for the population parameters. Intervals built this way capture the true mean about 95% of the time, although the exact proportion within one class will vary by chance.

On Your Own:

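```
# Assumes `population` (the Ames house sizes from earlier in the lab) already exists
samp_mean <- rep(NA, 50)   # will hold the 50 sample means
samp_sd <- rep(NA, 50)     # will hold the 50 sample standard deviations
n <- 60

for(i in 1:50){
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}

# 95% confidence intervals: sample mean +/- 1.96 standard errors
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
```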

Using the following function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.
```
plot_ci(lower_vector, upper_vector, mean(population))
```

```
2/50
## [1] 0.04
```

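48 of the 50 intervals (96%) include the true population mean, which is close to, but not exactly equal to, the 95% confidence level. The confidence level describes the long-run proportion of intervals that capture the mean over many repeated samples. Each interval here is built from a random sample, so with only 50 intervals the observed proportion varies around 95% by chance.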
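Rather than counting misses on the plot, the proportion can be computed directly. The sketch below uses a hypothetical helper, `ci_coverage` (not one of the lab's functions), checked on toy intervals. It also computes a central 95% range for the number of captures among 50 intervals, assuming each interval captures the mean independently with probability 0.95, so that the count is approximately Binomial(50, 0.95).

```r
# Hypothetical helper: proportion of intervals [lower, upper] containing mu
ci_coverage <- function(lower, upper, mu) {
  mean(lower <= mu & mu <= upper)
}

# Toy check: the first and third intervals contain 2.5, the second does not
ci_coverage(lower = c(1, 4, 2), upper = c(3, 6, 5), mu = 2.5)

# Plausible range of capture counts for 50 independent 95% intervals
qbinom(c(0.025, 0.975), size = 50, prob = 0.95)
```

With the lab's objects, `ci_coverage(lower_vector, upper_vector, mean(population))` should reproduce the proportion obtained by counting.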

Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?

For 90% confidence, the critical value is z* = qnorm(0.95) ≈ 1.645; the intervals below round it to 1.65.
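The critical value for any confidence level can be computed rather than looked up. A minimal sketch with a hypothetical helper, `z_star`, built on R's `qnorm`: a two-sided interval at level `level` uses the (1 + level) / 2 quantile of the standard normal distribution.

```r
# Critical value z* for a two-sided interval at confidence level `level`
z_star <- function(level) qnorm((1 + level) / 2)

z_star(0.90)  # about 1.645
z_star(0.95)  # about 1.960
z_star(0.99)  # about 2.576
```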

```
lower_vector <- samp_mean - 1.65 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.65 * samp_sd / sqrt(n)

plot_ci(lower_vector, upper_vector, mean(population))
```

```
7/50
## [1] 0.14
```

Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples, simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?

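At 90% confidence the intervals are narrower, so we expect fewer of them (about 90%) to include the true population mean. Here 43 of the 50 intervals (86%) include μ, slightly below the 90% confidence level but within the chance variation expected from 50 intervals. As expected, more intervals miss μ than at 95% (7 versus 2).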
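Recomputing intervals at a new level needs only the stored sample means and standard deviations. The sketch below uses a hypothetical helper, `ci_bounds`, and toy inputs (not lab results) to show that the 90% intervals are narrower than the 95% ones built from the same samples, which is why more of them miss μ.

```r
# Hypothetical helper: intervals at any confidence level from stored means and SDs
ci_bounds <- function(samp_mean, samp_sd, n, level = 0.95) {
  z  <- qnorm((1 + level) / 2)  # critical value, about 1.645 for 90%
  se <- samp_sd / sqrt(n)       # estimated standard error of each mean
  list(lower = samp_mean - z * se, upper = samp_mean + z * se)
}

# Toy values: two samples of size 25
ci90 <- ci_bounds(samp_mean = c(100, 110), samp_sd = c(10, 20), n = 25, level = 0.90)
ci95 <- ci_bounds(samp_mean = c(100, 110), samp_sd = c(10, 20), n = 25, level = 0.95)

# Each 90% interval is narrower than the matching 95% interval
(ci90$upper - ci90$lower) < (ci95$upper - ci95$lower)  # TRUE TRUE
```

With the lab's objects this would be `ci_bounds(samp_mean, samp_sd, n, level = 0.90)`.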