DATA606_W4_LAB4b_MatheeshaThambeliyagodage

The data

download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")

n <- 60
population <- ames$Gr.Liv.Area
samp <- sample(population, n)
summary(samp)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     708    1088    1378    1441    1664    2640

hist(population)

hist(samp, breaks = 10)

popMean <- mean(population)
popMean

## [1] 1499.69

sample_mean <- mean(samp)
se <- sd(samp) / sqrt(n)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)

## [1] 1316.370 1566.063
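
As a quick arithmetic check on the printed interval, the sketch below recovers the center and margin of error from the two endpoints and tests whether the interval contains the population mean. It uses only the numbers printed in this report; the variable names `center`, `margin`, and `covers` are illustrative and not part of the lab.

```r
# Endpoints and population mean copied from the printed output in this report
lower   <- 1316.370
upper   <- 1566.063
popMean <- 1499.69

center <- (lower + upper) / 2   # midpoint of the interval, i.e. the sample mean
margin <- (upper - lower) / 2   # margin of error, 1.96 * se
se     <- margin / 1.96         # implied standard error of the sample mean

# TRUE when the interval contains the population mean
covers <- lower < popMean & popMean < upper

c(center = center, margin = margin, se = se)
covers
```

The midpoint agrees with the sample mean of 1441 reported by `summary(samp)`, and this particular interval does contain the population mean of 1499.69.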

mean(population)

## [1] 1499.69

samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60

Now we’re ready for the loop where we calculate the means and standard deviations of 50 random samples.

for(i in 1:50){
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}

lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n) 
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)

c(lower_vector[1], upper_vector[1])

## [1] 1408.242 1694.391

On your own

1. Using the following function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.

plot_ci(lower_vector, upper_vector, mean(population))
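
The proportion asked about in this question can also be computed directly rather than read off the plot. The following is a minimal, self-contained sketch: because the Ames data may not be available when it is run, it assumes a simulated right-skewed population as a stand-in (the gamma parameters, population size, and seed are arbitrary choices, not values from the lab), then repeats the sampling loop and counts the intervals that contain the population mean.

```r
set.seed(606)   # arbitrary seed, only so the sketch is reproducible

# Simulated stand-in for ames$Gr.Liv.Area (assumption, not the real data):
# a right-skewed population with mean near 1500 and sd near 500
population <- rgamma(3000, shape = 9, rate = 0.006)

n <- 60
samp_mean <- rep(NA, 50)
samp_sd   <- rep(NA, 50)

for (i in 1:50) {
  samp <- sample(population, n)   # sample of size n from the population
  samp_mean[i] <- mean(samp)
  samp_sd[i]   <- sd(samp)
}

lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)

# Element-wise `&` gives one TRUE/FALSE per interval
covered <- lower_vector < mean(population) & mean(population) < upper_vector

mean(covered)   # proportion of the 50 intervals that contain the population mean
```

With the real data, the same last two lines can be applied to the `lower_vector` and `upper_vector` already computed above.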

2. Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?

Let’s pick 99%. For a 99% confidence level, the critical value is approximately 2.58.
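
For a two-sided interval at confidence level C, the critical value is the 1 - (1 - C)/2 quantile of the standard normal distribution. A short sketch using base R's `qnorm` shows that 2.58 corresponds to a 99% level, while 95% gives 1.96 (the names `levels` and `z_star` are illustrative):

```r
# Confidence levels to look up
levels <- c(0.90, 0.95, 0.99)

# Two-sided critical values: upper-tail area is (1 - level) / 2
z_star <- qnorm(1 - (1 - levels) / 2)

round(z_star, 2)   # 1.64 1.96 2.58
```
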

3. Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples, simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?

# 2.58 is the critical value for a 99% confidence level
lower_vector99 <- samp_mean - 2.58 * samp_sd / sqrt(n)
upper_vector99 <- samp_mean + 2.58 * samp_sd / sqrt(n)

plot_ci(lower_vector99, upper_vector99, mean(population))

# Use element-wise `&` (not `&&`) so that every interval is checked
capMean <- lower_vector99 < popMean & popMean < upper_vector99
sum(capMean) / length(capMean)

## [1] 1

In this case, all 50 intervals captured the true population mean. The observed proportion (100%) is 1 percentage point above the 99% confidence level implied by the critical value of 2.58.
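
An observed proportion that differs slightly from the confidence level is expected. With 50 intervals the proportion can only be a multiple of 0.02, so it cannot equal 0.99 exactly. The sketch below goes one step further under a simplifying assumption: if each interval independently covers the true mean with probability exactly 0.99 (the nominal level, which the normal approximation only approximates), the number of covering intervals is binomial, and the chance that all 50 cover can be computed directly. The names `level`, `k`, and `p_all` are illustrative.

```r
level <- 0.99   # assumed per-interval coverage probability (nominal level)
k     <- 50     # number of intervals

# Probability that all k intervals cover the true mean, assuming independence
p_all <- level^k
p_all                                    # about 0.605

# Same quantity from the binomial distribution
dbinom(k, size = k, prob = level)

# Expected number of covering intervals
k * level                                # 49.5
```

Under these assumptions, seeing all 50 intervals capture the mean happens in roughly 6 out of 10 repetitions of the experiment, so a result of 100% is consistent with a 99% confidence level.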


Matheesha Thambeliyagodage

May 1, 2017
