download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")

The Data

population <- ames$Gr.Liv.Area
samp <- sample(population, 60)


## Plot a histogram of the sample so the shape of its distribution can be described
hist(samp, breaks = 6)

Exercise 1

Describe the distribution of your sample. What would you say is the “typical” size within your sample? Also state precisely what you interpreted “typical” to mean.

The distribution of the sample is skewed right, and the typical size is between 1000 and 1500 square feet. By “typical” I mean the range in which most of the observations in my sample lie.
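One way to make “typical” more precise is to report a center and the range holding the middle half of the values. The sketch below is only an illustration: `sim_samp` is a made-up right-skewed sample standing in for `samp`, not the Ames data.

```r
# Made-up right-skewed sample standing in for `samp`; replace `sim_samp`
# with the lab's `samp` to summarize the real data.
set.seed(1)
sim_samp <- rlnorm(60, meanlog = 7.25, sdlog = 0.33)

typical_median <- median(sim_samp)                # resistant to the long right tail
middle_half <- quantile(sim_samp, c(0.25, 0.75))  # bounds of the middle 50% of values
typical_mean <- mean(sim_samp)                    # sensitive to a few very large values

typical_median
middle_half
typical_mean
```

For a right-skewed sample the median and the middle-half range are usually a safer description of “typical” than the mean.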

Exercise 2

Would you expect another student’s distribution to be identical to yours? Would you expect it to be similar? Why or why not?

Another student’s distribution will not be identical to mine because their random sample of 60 will almost certainly contain different houses. It should be similar, though, because both samples of 60 are drawn from the same population of 2930 houses.

Confidence Intervals

sample_mean <- mean(samp)
## 1.96 is the critical value for a 95% interval; a different confidence level would use a different critical value

se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1286.397 1502.370
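The same calculation can be wrapped in a small function so the standard error and margin of error are explicit. This is a minimal sketch: `mean_ci` is a hypothetical helper that is not part of the lab, and `sim_samp` is made-up data standing in for `samp`.

```r
# Hypothetical helper, not part of the lab: z-based interval for a mean.
# `z` defaults to 1.96, the critical value the lab uses for 95% confidence.
mean_ci <- function(x, z = 1.96) {
  n <- length(x)
  se <- sd(x) / sqrt(n)   # standard error of the sample mean, s / sqrt(n)
  margin <- z * se        # margin of error
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

# Made-up sample standing in for `samp`.
set.seed(1)
sim_samp <- rlnorm(60, meanlog = 7.25, sdlog = 0.33)
mean_ci(sim_samp)
```

Calling `mean_ci(samp)` in the lab session should reproduce the `c(lower, upper)` result for whatever sample is currently stored in `samp`.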

Exercise 3

For the confidence interval to be valid, the sample mean must be normally distributed and have standard error s/√n. What conditions must be met for this to be true?

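The sample has to be random, the observations have to be independent (the sample should be less than 10% of the population), and the sample size has to be large enough (at least 30) with a population distribution that is not strongly skewed.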
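Two of these conditions are commonly checked with rules of thumb: a sample size of at least 30 and a sample that is under 10% of the population. A quick check with the numbers used in this lab, assuming those rules of thumb, might look like this:

```r
n <- 60      # sample size used in the lab
N <- 2930    # population size stated in the Exercise 2 answer

large_enough <- n >= 30               # rule of thumb for a "large enough" sample
under_ten_percent <- n / N < 0.10     # supports treating observations as independent

large_enough
under_ten_percent
```

Randomness comes from `sample()` itself, and skew still has to be judged from the histogram.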

Confidence Levels

Exercise 4

What does “95% confidence” mean? If you’re not sure, see Section 4.2.2.

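95% confidence means that if we took many random samples and built an interval from each one in the same way, about 95% of those intervals would contain the true population mean. For this problem, we are 95% confident that the true mean living area of houses in Ames is between the lower and upper bounds of the interval.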

mean(population)
## [1] 1499.69

Exercise 5

Does your confidence interval capture the true average size of houses in Ames? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?

The confidence interval does capture the true average size of houses in Ames because 1499.69 lies between 1286.397 and 1502.370.
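The same check can be done in code instead of by eye. The values below are copied from the interval and population mean printed earlier in this report; the names `ci_lower`, `ci_upper`, and `mu` are only for illustration.

```r
ci_lower <- 1286.397   # interval printed in the Confidence Intervals section
ci_upper <- 1502.370
mu <- 1499.69          # mean(population) printed above

captured <- ci_lower <= mu && mu <= ci_upper
captured   # TRUE
```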

Exercise 6

Each student in your class should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why? If you are working in this lab in a classroom, collect data on the intervals created by other students in the class and calculate the proportion of intervals that capture the true population mean.

Because the intervals were built at the 95% confidence level, I would expect about 95% of them to capture the true population mean. The method captures the mean for about 95% of random samples, so a different confidence level would give a different expected proportion.
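The 95% figure is a long-run proportion, which is easier to see with many more intervals than a classroom would produce. The sketch below assumes a made-up right-skewed population, `sim_population`, in place of the Ames data, and 2000 repetitions chosen arbitrarily.

```r
# Made-up population standing in for `population`.
set.seed(42)
sim_population <- rlnorm(2930, meanlog = 7.25, sdlog = 0.33)

n <- 60
reps <- 2000                 # arbitrary, just large enough to smooth out noise
mu <- mean(sim_population)

# For each repetition, draw a sample and record whether its interval covers mu.
covered <- replicate(reps, {
  s <- sample(sim_population, n)
  se <- sd(s) / sqrt(n)
  (mean(s) - 1.96 * se <= mu) && (mu <= mean(s) + 1.96 * se)
})

mean(covered)                # observed coverage, expected to be near 0.95
```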

## empty vectors to hold the mean and sd of each of the 50 samples
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
## draw 50 samples and record the mean and sd of each one
for(i in 1:50){
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}
## lower and upper bounds of the 95% confidence interval for each sample
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
## print the first interval
c(lower_vector[1], upper_vector[1])
## [1] 1259.033 1463.300

On your own

1. Using the following function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.

plot_ci(lower_vector, upper_vector, mean(population))

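48 of the 50 intervals (96%) include the true population mean. This proportion is not exactly equal to the confidence level because 95% describes the long-run coverage over many samples, so the proportion in any one set of 50 random intervals will vary. Exactly 95% is also impossible here, since 95% of 50 is 47.5 intervals.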
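How much the count can vary by chance can be sketched with a binomial model. This assumes each of the 50 intervals independently captures the mean with probability 0.95, which is an idealization of the simulation above.

```r
k <- 50          # number of intervals
level <- 0.95    # nominal confidence level

k * level        # 47.5, so exactly 95% of 50 intervals is not attainable

# Central range of capture counts under the binomial assumption.
plausible_counts <- qbinom(c(0.025, 0.975), size = k, prob = level)
plausible_counts

# Chance of seeing exactly 48 captures under the same assumption.
dbinom(48, size = k, prob = level)
```

A count of 48 falls inside that range, so 96% is consistent with a 95% confidence level.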

2. Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?

I am going to pick a confidence level of 90%. The appropriate critical value is ±1.645.
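The critical value does not have to be looked up in a table. Base R's `qnorm` gives the standard normal quantile, so a short sketch for a few common confidence levels (the vector `conf_levels` is just for illustration) is:

```r
conf_levels <- c(0.90, 0.95, 0.99)

# Two-sided critical value: the standard normal quantile that leaves
# (1 - level) / 2 in each tail.
z_star <- qnorm(1 - (1 - conf_levels) / 2)
round(z_star, 3)   # 1.645 1.960 2.576
```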

3. Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples, simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?

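44 of the 50 intervals (88%) include the true population mean, which is close to the 90% confidence level. It is not exactly equal because the confidence level describes long-run coverage, so the proportion in any one set of 50 intervals will vary.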

lower_vector90 <- samp_mean - 1.645 * samp_sd / sqrt(n) 
upper_vector90 <- samp_mean + 1.645 * samp_sd / sqrt(n)

plot_ci(lower_vector90, upper_vector90, mean(population))
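The proportion can also be computed directly rather than counted from the plot. The sketch below is an illustration under assumptions: `sim_population` is made-up data standing in for `population`, and `coverage` is a hypothetical helper, not something supplied with the lab.

```r
# Made-up population standing in for `population`.
set.seed(7)
sim_population <- rlnorm(2930, meanlog = 7.25, sdlog = 0.33)
mu <- mean(sim_population)
n <- 60

# 50 samples, keeping each mean and standard deviation (as in the lab's loop).
samp_stats <- replicate(50, {
  s <- sample(sim_population, n)
  c(mean = mean(s), sd = sd(s))
})
se <- samp_stats["sd", ] / sqrt(n)

# Proportion of intervals that capture mu for a given critical value.
coverage <- function(z) {
  lower <- samp_stats["mean", ] - z * se
  upper <- samp_stats["mean", ] + z * se
  mean(lower <= mu & mu <= upper)
}

coverage(1.645)   # 90% intervals
coverage(1.96)    # 95% intervals, built from the same 50 samples
```

Because each 90% interval sits inside the 95% interval from the same sample, the 90% proportion can never be higher than the 95% proportion. With the lab's own vectors, `mean(lower_vector90 <= mean(population) & mean(population) <= upper_vector90)` gives the same kind of proportion.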