The Data

download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")

Downloading and loading the data set.

population <- ames$Gr.Liv.Area
samp <- sample(population, 60)

We focus only on the size of the houses (ames$Gr.Liv.Area) and take a random sample of n = 60 from that population.

Exercise 1

Describe the distribution of your sample. What would you say is the “typical” size within your sample? Also state precisely what you interpreted “typical” to mean.

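The typical size would be the sample mean. I interpreted “typical” to mean the center of the sample’s distribution, the value around which most house sizes in the sample fall.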
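A description like this can be backed up with numerical summaries and a histogram of the sample. The sketch below is only illustrative: it uses a simulated, right-skewed stand-in for `population` (an assumption, so the snippet runs without the Ames download). With the real data, `population <- ames$Gr.Liv.Area` would take its place.

```r
# Hypothetical stand-in for ames$Gr.Liv.Area: a simulated right-skewed
# population, used only so this sketch runs on its own.
set.seed(1)
population <- rlnorm(2000, meanlog = 7.2, sdlog = 0.3)

samp <- sample(population, 60)   # random sample of n = 60

summary(samp)                    # min, quartiles, median, mean, max
sd(samp)                         # spread of the sample
hist(samp, main = "Sample of 60 house sizes", xlab = "Living area")
```

Comparing the mean and median in the `summary()` output, together with the shape of the histogram, gives a basis for saying what “typical” means for the sample.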

Exercise 2

Would you expect another student’s distribution to be identical to yours? Would you expect it to be similar? Why or why not?

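I would not expect it to be identical, because each student draws a different random sample. I would expect it to be similar, because every sample of 60 comes from the same population, so there’s a really strong possibility that each sample mean would be near the population mean.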

Confidence Intervals

sample_mean <- mean(samp)

The sample mean was calculated as a point estimate of the center of the distribution.

se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)

## [1] 1358.668 1601.465

Calculating a 95% confidence interval for the population mean by subtracting 1.96 standard errors from the point estimate and adding 1.96 standard errors to it.
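Written as a formula, the calculation above takes the following form. The symbols are notation introduced here for illustration: $\bar{x}$ is the sample mean (`sample_mean`), $s$ is the sample standard deviation (`sd(samp)`), and $n = 60$ is the sample size.

$$
\bar{x} \pm 1.96 \cdot \frac{s}{\sqrt{n}}
$$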

Exercise 3

For the confidence interval to be valid, the sample mean must be normally distributed and have standard error $s/\sqrt{n}$. What conditions must be met for this to be true?

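The observations must be independent, which holds when the sample is random and makes up less than 10% of the population. “n” must also be sufficiently large (usually at least 30), and larger still if the population distribution is strongly skewed.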

Confidence Levels

Exercise 4

What does “95% confidence” mean?

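It means that we are 95% confident that the true population mean lies within the confidence interval we’ve constructed from the sample. More precisely, about 95% of intervals constructed this way from random samples would contain the true population mean.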

mean(population)

## [1] 1499.69

The true population mean can be calculated here since we have data on the entire population. This is a rare occurrence.

Exercise 5

Does your confidence interval capture the true average size of houses in Ames? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?

Yes, it does: the population mean of 1499.69 lies between 1358.668 and 1601.465. My neighbor’s interval likely will as well, because we are working with a 95% confidence interval.

Exercise 6

Each student in your class should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why? If you are working in this lab in a classroom, collect data on the intervals created by other students in the class and calculate the proportion of intervals that capture the true population mean.

I would expect about 95% of the students in my class to obtain a CI that contains the true population mean, because each interval has only a 5% chance of missing it.

samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60

Empty vectors were created in order to save the means and standard deviations that would be calculated from each sample. The desired sample size is stored as “n”. The repetition will occur 50 times.

The following is an outline of the loop. It draws many samples in order to show how sample means and confidence intervals vary from one sample to another:
1) Obtain a random sample.
2) Calculate and store the sample’s mean and standard deviation.
3) Repeat steps (1) and (2) 50 times.
4) Use these stored statistics to calculate many confidence intervals.

for(i in 1:50){
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}

The loop, where we calculate the means and standard deviations of 50 random samples.

lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n) 
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)

The lower and upper bounds of the 50 confidence intervals.

c(lower_vector[1], upper_vector[1])

## [1] 1373.153 1617.147

Viewing the first (index [1]) of the 50 confidence intervals.

On My Own

Using the following function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.

plot_ci(lower_vector, upper_vector, mean(population))

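I would say that the plot is an accurate visualization of 95% confidence intervals. Of the 50 intervals plotted, only four did not include the population mean, so 46 of 50 (92%) captured it. This proportion is not exactly equal to the 95% confidence level, because 95% describes the long-run capture rate over many samples, and any particular set of 50 intervals varies around it by chance.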
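Rather than counting intervals by eye on the plot, the proportion can also be computed directly. The sketch below repeats the loop on a simulated stand-in for `population` (an assumption, so the snippet runs on its own). The name `captured` is introduced here for illustration and is not part of the lab.

```r
# Hypothetical stand-in for ames$Gr.Liv.Area so the sketch runs on its own.
set.seed(2)
population <- rlnorm(2000, meanlog = 7.2, sdlog = 0.3)

n <- 60
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
for (i in 1:50) {
  samp <- sample(population, n)   # draw a sample of size n
  samp_mean[i] <- mean(samp)      # store its mean
  samp_sd[i] <- sd(samp)          # store its standard deviation
}

lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)

# TRUE where an interval contains the population mean
captured <- lower_vector <= mean(population) & upper_vector >= mean(population)
sum(captured)    # number of intervals that capture the mean
mean(captured)   # proportion of intervals that capture the mean
```

Because the samples are random, the proportion changes from run to run and will usually be close to, but not exactly, 0.95.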

Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?

Trouble
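One possible way to approach this question is to take the critical value from the standard normal distribution with `qnorm()`. The sketch below is an illustration, not the lab’s own solution: the helper name `critical_value` is hypothetical, and the 90% level is only an example.

```r
# Two-sided critical value for a given confidence level, based on the
# standard normal distribution. (critical_value is a hypothetical helper.)
critical_value <- function(level) {
  qnorm(1 - (1 - level) / 2)
}

critical_value(0.95)  # approximately 1.96, the value used earlier
critical_value(0.90)  # approximately 1.645 (illustrative choice of level)
```

A lower confidence level gives a smaller critical value and therefore narrower intervals.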

Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples, simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?

trouble
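One possible way to carry out this step is sketched below, assuming a 90% confidence level (an illustrative choice) and a simulated stand-in for `population` so the snippet runs on its own. With the lab’s data loaded, the existing `samp_mean` and `samp_sd` vectors would be reused instead of drawing new samples.

```r
# Hypothetical stand-in for ames$Gr.Liv.Area so the sketch runs on its own.
set.seed(3)
population <- rlnorm(2000, meanlog = 7.2, sdlog = 0.3)
n <- 60

# Means and standard deviations of 50 samples, playing the role of the
# samp_mean and samp_sd vectors already collected in the lab.
samples <- replicate(50, sample(population, n))  # n x 50 matrix
samp_mean <- colMeans(samples)
samp_sd <- apply(samples, 2, sd)

level <- 0.90                      # illustrative confidence level
z <- qnorm(1 - (1 - level) / 2)    # matching critical value

lower_vector <- samp_mean - z * samp_sd / sqrt(n)
upper_vector <- samp_mean + z * samp_sd / sqrt(n)

# Proportion of intervals that include the true population mean
mean(lower_vector <= mean(population) & upper_vector >= mean(population))

# With the lab's data loaded, the intervals could then be drawn with:
# plot_ci(lower_vector, upper_vector, mean(population))
```

The resulting proportion should be close to the chosen confidence level, though with only 50 intervals it will rarely match it exactly.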

Foundations for Statistical Inference - Confidence intervals

Shanille Allo

10/21/2019