Lab Week 7 - Confidence intervals

download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
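population <- ames$Gr.Liv.Area   # above-grade living area (square feet) of every house in the data
samp <- sample(population, 60)   # simple random sample of 60 houses; differs on every run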
hist(samp)

summary(samp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     630    1161    1568    1551    1767    3395

Exercise 1

I would say the typical size is roughly 1,150 to 1,750 square feet (the middle half of the sample), with the mean (1551) and median (1568) both a little above 1,500. To me, “typical” means what there is the most of: if most houses in a certain neighborhood have the same floor plan, then I would consider that house the “typical” house.

Exercise 2

I would not expect another student's distribution to be identical, but I would expect it to be similar, with the mean and/or median near 1,500 square feet.

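sample_mean <- mean(samp)             # point estimate of the population mean
se <- sd(samp) / sqrt(60)             # standard error of the sample mean (n = 60)
lower <- sample_mean - 1.96 * se      # 1.96 = critical z value for 95% confidence
upper <- sample_mean + 1.96 * se
c(lower, upper)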
## [1] 1411.307 1689.793
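The interval above can be wrapped in a small function so the same steps work for any confidence level. This is a sketch, not part of the lab: the name `z_ci` is hypothetical, and it gets the critical value from `qnorm()`, which returns 1.959964 rather than the rounded 1.96 used above, so its endpoints differ very slightly.

```r
# Hypothetical helper (not from the lab): normal-approximation confidence
# interval for a population mean, assuming a random sample with n >= 30.
z_ci <- function(x, conf = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                # standard error of the sample mean
  z_star <- qnorm(1 - (1 - conf) / 2)  # critical value; about 1.96 when conf = 0.95
  c(lower = mean(x) - z_star * se,
    upper = mean(x) + z_star * se)
}

# Usage, assuming `samp` from the lab code is in the workspace:
# z_ci(samp)               # 95% interval
# z_ci(samp, conf = 0.85)  # 85% interval
```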

Exercise 3

The sample should be random (so that the observations are independent), the sample size should be at least 30, and the population distribution should not be strongly skewed.
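Some of these conditions can be checked from the data. Independence is usually supported by also requiring the sample to be under 10% of the population. The sketch below is a hypothetical helper, `check_ci_conditions`, that reports the sample-size check, the 10% check, and a moment-based sample skewness. It cannot check randomness, which depends on how the sample was drawn, and it assumes no fixed skewness cut-off, so that number still needs judgment alongside the histogram.

```r
# Hypothetical helper (not from the lab): rough checks of the conditions
# for a z-based confidence interval for a mean.
check_ci_conditions <- function(x, population_size) {
  n <- length(x)
  dev <- x - mean(x)
  list(
    n_at_least_30 = n >= 30,                       # sample size rule of thumb
    under_10_pct  = n <= 0.10 * population_size,   # sampling without replacement
    skewness      = mean(dev^3) / mean(dev^2)^1.5  # 0 for symmetric data, > 0 for right skew
  )
}

# Usage, assuming `samp` and `population` from the lab code:
# check_ci_conditions(samp, length(population))
```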

Exercise 4

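A 95% confidence level means that if we took many random samples and built an interval from each one in the same way, about 95% of those intervals would capture the true mean (“typical”) size of houses in Ames. It does not mean there is a 95% chance that one particular interval contains the mean.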
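The “many samples” interpretation can be checked by simulation. This sketch does not use the Ames data, so it runs without downloading anything; it assumes a hypothetical right-skewed population of 3,000 values drawn from a gamma distribution with mean 1,500, takes 1,000 random samples of size 60, and records how often the 95% interval contains the population mean.

```r
set.seed(7)  # for reproducibility
# Hypothetical stand-in population: right-skewed values with mean about 1500.
population_sim <- rgamma(3000, shape = 4, scale = 375)
mu <- mean(population_sim)  # the "true" mean each interval tries to capture
n <- 60

# For each of 1000 samples, build a 95% interval and record whether it contains mu.
covered <- replicate(1000, {
  s <- sample(population_sim, n)
  se <- sd(s) / sqrt(n)
  mean(s) - 1.96 * se <= mu && mu <= mean(s) + 1.96 * se
})

mean(covered)  # proportion of intervals that captured mu; should be near 0.95
```

Each element of `covered` is simply `TRUE` or `FALSE`; the 95% describes the proportion across many intervals, not any single one.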

mean(population)
## [1] 1499.69

Exercise 5

Yes, my confidence interval captures the true average size of houses in Ames: the interval is 1411.307 to 1689.793 square feet, and the population mean is 1499.69.

Exercise 6

I would expect about 95% of my classmates' confidence intervals to capture the true population mean.

samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for(i in 1:50){
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n) 
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])
## [1] 1324.579 1542.021
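The loop above can also be written with `replicate()`, which repeats an expression and collects the results, so no vectors need to be pre-filled with `NA`. This is an alternative sketch, not the lab's code; the function name `sim_cis` and its defaults (50 samples of size 60, z = 1.96) are assumptions chosen to match the loop.

```r
# Hypothetical alternative to the for loop: draw `reps` samples of size n and
# return the lower and upper bounds of each z-interval.
sim_cis <- function(population, n = 60, reps = 50, z = 1.96) {
  # Each column of `draws` holds one sample's mean and standard deviation.
  draws <- replicate(reps, {
    s <- sample(population, n)
    c(mean = mean(s), sd = sd(s))
  })
  half_width <- z * draws["sd", ] / sqrt(n)
  list(lower = draws["mean", ] - half_width,
       upper = draws["mean", ] + half_width)
}

# Usage, assuming `population` and `plot_ci` from the lab are loaded:
# cis <- sim_cis(population)
# plot_ci(cis$lower, cis$upper, mean(population))
```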

On your own

Number 1

plot_ci(lower_vector, upper_vector, mean(population))

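45/50 * 100   # 5 of the 50 intervals missed, so 45 of 50 captured the true mean
## [1] 90

Since 5 of the 50 intervals did not capture the true mean, 45 of 50, or 90%, included the true population mean. This proportion is not exactly equal to the 95% confidence level because the confidence level describes the long-run rate over many repeated samples; with only 50 intervals, random sampling variation makes the observed proportion land somewhat above or below 95%. Most of the intervals did in fact capture the true mean.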
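Instead of counting misses on the plot, the proportion can be computed directly from the interval bounds. In this sketch the helper name `coverage` is hypothetical; the denominator is the number of intervals (the length of the bound vectors), not the sample size behind each interval.

```r
# Hypothetical helper: how many intervals miss mu, and what share capture it.
coverage <- function(lower, upper, mu) {
  hit <- lower <= mu & mu <= upper  # TRUE where an interval contains mu
  c(missed = sum(!hit), proportion = mean(hit))
}

# Usage, assuming lower_vector, upper_vector and population from the lab code:
# coverage(lower_vector, upper_vector, mean(population))
```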

Number 2

I chose a confidence level of 85%. For a two-sided interval, the critical value z* leaves 7.5% of the standard normal distribution in each tail, so it is the 92.5th percentile, which qnorm() gives as 1.4395 (about 1.44).

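100 - 85      # percent of the distribution left for the two tails
## [1] 15
.15 / 2       # split equally between the lower and upper tail
## [1] 0.075
1 - .075      # cumulative probability below the upper critical value
## [1] 0.925
qnorm(.925)   # 85% critical value z*
## [1] 1.439531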
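The steps above collapse into one expression: for confidence level C, the critical value is the (1 + C)/2 quantile of the standard normal distribution. A minimal sketch, with a hypothetical helper name:

```r
# Hypothetical helper: critical value z* for a two-sided confidence level.
# (1 + conf) / 2 is the same as 1 - (1 - conf) / 2, e.g. 0.925 for 85%.
z_star <- function(conf) qnorm((1 + conf) / 2)

z_star(0.85)                 # same value as qnorm(.925) above
z_star(c(0.90, 0.95, 0.99))  # vectorised over several levels
```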

Number 3

lower_vector <- samp_mean - 1.44 * samp_sd / sqrt(n) 
upper_vector <- samp_mean + 1.44 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])
## [1] 1353.423 1513.177
plot_ci(lower_vector, upper_vector, mean(population))

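50 - 9   # 9 of the 50 intervals missed the true mean
## [1] 41
41 / 50
## [1] 0.82

I chose an 85% confidence level. Nine of the 50 intervals did not capture the true mean, so 41 of 50, or 82%, did. This is close to, but not exactly, the 85% confidence level; as with the 95% intervals, some deviation from the nominal rate is expected when there are only 50 intervals.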
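How close to the confidence level should the count be? If each interval captures the mean with probability equal to the confidence level and the 50 samples are independent, the number of capturing intervals follows a binomial distribution, so `qbinom()` gives a range of counts that would not be surprising. This is a sketch under that assumption; the helper name `plausible_hits` and the choice of a middle-95% range are illustrative.

```r
# Hypothetical helper: middle `range_prob` range of the number of intervals
# (out of k) expected to capture the mean, if each one does so with probability conf.
plausible_hits <- function(conf, k = 50, range_prob = 0.95) {
  alpha_half <- (1 - range_prob) / 2
  qbinom(c(alpha_half, 1 - alpha_half), size = k, prob = conf)
}

plausible_hits(0.95)  # range of counts for 50 intervals at 95% confidence
plausible_hits(0.85)  # range of counts for 50 intervals at 85% confidence
```

If a count like the ones in Numbers 1 and 3 falls inside the returned range, the gap from the nominal confidence level is consistent with ordinary sampling variation.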