download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
population <- ames$Gr.Liv.Area
samp <- sample(population, 60)

Exercise 1 -

Describe the distribution of your sample. What would you say is the “typical” size within your sample? Also state precisely what you interpreted “typical” to mean.

hist(samp, breaks = 15)

The distribution of this sample is not normal; it is slightly right-skewed. The “typical” size within my sample is around 1800 square feet. For this example, I interpreted “typical” to mean “most frequent,” i.e. the tallest bar of the histogram.

Exercise 2 -

Would you expect another student’s distribution to be identical to yours? Would you expect it to be similar? Why or why not?

I wouldn’t expect another student’s distribution to be identical to mine, because each of us drew a different random sample and so obtained different summary statistics. I would, however, expect the distributions to be similar, because every sample was taken from the same population.

sample_mean <- mean(samp)

se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower,upper)
## [1] 1285.240 1501.093
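
The interval calculation above can be wrapped in a small helper so that the sample size and critical value are not hard-coded. This is only a sketch: `mean_ci` is a hypothetical name that is not part of the lab, and the simulated vector `x` is a stand-in for the Ames sample.

```r
# Hypothetical helper (not part of the lab): normal-approximation CI for a mean.
mean_ci <- function(x, level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                # estimated standard error of the mean
  z  <- qnorm(1 - (1 - level) / 2)     # two-sided critical value
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

# Stand-in data (an assumption): a simulated right-skewed sample, NOT the Ames data.
set.seed(1)
x <- rlnorm(60, meanlog = 7.2, sdlog = 0.3)
ci <- mean_ci(x)
ci
```

With the lab's data, `mean_ci(samp)` should give nearly the same interval as `c(lower, upper)`; any small difference comes from using `qnorm(0.975)` instead of the rounded 1.96.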

Exercise 3 -

For the confidence interval to be valid, the sample mean must be normally distributed and have standard error s/sqrt(n). What conditions must be met for this to be true?

- Random Sampling

- Independent Observations

- Sample size relative to the shape of the population: if n is small, the population must be approximately normally distributed. If 15 < n < 30, the population distribution should not be strongly skewed or contain outliers. If n is large, some skew or a few outliers in the population can be tolerated.

Exercise 4 -

What does “95% confidence” mean?

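95% confidence means that the method used to build the interval captures the true population parameter (here, the mean) for about 95% of random samples. For any single interval, we say we are 95% confident that the true value lies between its two endpoints.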
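
One way to make the long-run meaning of a confidence level concrete is to repeat the sampling many times and count how often the interval contains the true mean. The sketch below does this on a simulated right-skewed population, which is an assumption standing in for the Ames data; the names `pop`, `reps`, and `captured` are illustrative only.

```r
# Stand-in population (an assumption): simulated right-skewed values, NOT the Ames data.
set.seed(42)
pop <- rlnorm(3000, meanlog = 7.2, sdlog = 0.3)
mu  <- mean(pop)

n    <- 60
reps <- 2000
captured <- logical(reps)

for (i in seq_len(reps)) {
  s  <- sample(pop, n)
  se <- sd(s) / sqrt(n)
  # TRUE when the 95% interval from this sample contains the true mean
  captured[i] <- (mean(s) - 1.96 * se <= mu) && (mu <= mean(s) + 1.96 * se)
}

# Long-run share of intervals that contain the true mean; expected to be close to 0.95.
coverage <- mean(captured)
coverage
```

The share will not be exactly 0.95, both because the number of repetitions is finite and because the normal approximation is only approximate for a skewed population.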

mean(population)
## [1] 1499.69

Exercise 5 -

Does your confidence interval capture the true average size of houses in Ames?

Yes. The interval computed above runs from 1285.24 to 1501.09, which contains the true mean of 1499.69.

Exercise 6 -

Each student in your class should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean?

samp_mean <- rep(NA, 50)
samp_sd <- rep(NA,50)
n <- 60

for(i in 1:50) {
  samp <- sample(population, n)
  samp_mean[i] <- mean(samp)
  samp_sd[i] <- sd(samp)
}

lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)

c(lower_vector[1],upper_vector[1])
## [1] 1396.999 1731.067
c(lower_vector[2],upper_vector[2])
## [1] 1250.381 1573.719

Since we are working with 95% CIs, I would expect about 95% of the intervals to capture the true mean.

On Your Own

1. Using the following function, plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.

plot_ci(lower_vector,upper_vector,mean(population))

p <- 49/50
p
## [1] 0.98

49 out of 50 intervals included the true population mean, a proportion of 0.98. This is higher than the 95% confidence level, which is fine: the confidence level describes the share of intervals that capture the mean over many repeated samples, not the exact share in any one batch of 50. With 50 intervals, exactly 95% is not even possible (it would require 47.5 intervals), and the observed proportion varies from batch to batch.
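
The proportion can also be computed from the interval vectors instead of being counted off the plot, and a rough binomial calculation shows how much it can wander with only 50 intervals. This is a sketch under an idealization: each interval is treated as an independent trial that captures the mean with probability 0.95. The helper `coverage` and the toy intervals are hypothetical and are not the lab's results.

```r
# Hypothetical helper: share of intervals [lower, upper] that contain mu.
coverage <- function(lower, upper, mu) {
  mean(lower <= mu & mu <= upper)
}

# Toy intervals for illustration only (not the lab's results): 3 of 4 contain 10.
toy <- coverage(lower = c(8, 9, 11, 7), upper = c(12, 10.5, 13, 10), mu = 10)
toy

# Idealization: the number of capturing intervals is Binomial(50, 0.95).
k     <- 0:50
probs <- dbinom(k, size = 50, prob = 0.95)

p_exact_95   <- sum(probs[k / 50 == 0.95])  # no whole k gives 0.95, since 0.95 * 50 = 47.5
p_49_or_more <- sum(probs[k >= 49])         # chance of seeing 49 or 50 captures
sd_prop      <- sqrt(0.95 * 0.05 / 50)      # standard deviation of the observed proportion

c(p_exact_95 = p_exact_95, p_49_or_more = p_49_or_more, sd_prop = sd_prop)
```

With the lab's objects, the call would be `coverage(lower_vector, upper_vector, mean(population))`.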

2. Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?

Confidence level = 85%

z <- qnorm(.925)
z
## [1] 1.439531

Critical Value = 1.44
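
The same rule gives the critical value for any confidence level: put half of the leftover probability in each tail and take the matching normal quantile. A minimal sketch, where `crit_value` is a hypothetical helper name that is not part of the lab:

```r
# Hypothetical helper: two-sided z critical value for a given confidence level.
crit_value <- function(level) {
  stopifnot(level > 0, level < 1)
  qnorm(1 - (1 - level) / 2)   # upper quantile that leaves (1 - level) / 2 in each tail
}

levels  <- c(0.85, 0.90, 0.95, 0.99)
z_table <- data.frame(level = levels, z = round(crit_value(levels), 3))
z_table
```

For 85% this reproduces the value above, since `1 - (1 - 0.85) / 2` is 0.925.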

3. Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples; simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?

lower_vector2 <- samp_mean - 1.44 * samp_sd / sqrt(n)
upper_vector2 <- samp_mean + 1.44 * samp_sd / sqrt(n)

plot_ci(lower_vector2,upper_vector2,mean(population))

p2 <- 44/50
p2
## [1] 0.88

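At the 85% confidence level, 88% of the intervals contained the true population mean. This percentage is again slightly higher than the confidence level, which is expected: with only 50 intervals, the observed proportion fluctuates around the nominal 85% rather than matching it exactly.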