load("more/ames.RData")
population <- ames$Gr.Liv.Area
samp <- sample(population, 60)
summary(samp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 816 1194 1542 1540 1713 2728
hist(samp)
sample_mean <- mean(samp)
se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1425.747 1653.953
mean(population)
## [1] 1499.69
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for (i in 1:50) {
  samp <- sample(population, n)  # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)     # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)         # save sample sd in ith element of samp_sd
}
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
plot_ci(lower_vector, upper_vector, mean(population))
47/50 = 0.94, so 94% of the confidence intervals include the population mean. This proportion is close to the 95% confidence level. With 50 samples the proportion can only change in steps of 2% (1/50), so exactly 95% is impossible; 94% and 96% are the nearest achievable values. The more samples that are drawn, the closer the proportion of intervals capturing the population mean is expected to get to 95%.
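Counting intervals by eye from the plot is easy to get wrong, so the proportion can also be computed directly. The sketch below is only an illustration: `covers` is a hypothetical helper that is not part of the lab, and `lower_demo` and `upper_demo` hold made-up values. The first call uses the single interval and population mean reported earlier.

```r
# Hypothetical helper (not from the lab): TRUE for each interval
# that contains the true value
covers <- function(lower, upper, truth) {
  lower <= truth & truth <= upper
}

# The single interval reported earlier, checked against the population mean
single_hit <- covers(1425.747, 1653.953, 1499.69)

# With vectors, mean() of the logical result is the proportion captured.
# These three intervals are made-up illustration values.
lower_demo <- c(1400, 1510, 1450)
upper_demo <- c(1600, 1700, 1550)
prop_demo <- mean(covers(lower_demo, upper_demo, 1499.69))
```

In the lab itself, the same idea would be `mean(covers(lower_vector, upper_vector, mean(population)))`, which should agree with the count read off the plot.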
I am choosing an 80% confidence level. This means that about 80% of the intervals built from repeated samples should contain the population mean.
Z score = 1.28
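The value 1.28 does not have to come from a table. As a small sketch, `qnorm` from base R returns the normal quantile that leaves half of the excluded probability in each tail; the names `conf_level`, `z80`, and `z95` are illustrative only.

```r
# Critical value for a two-sided interval at a given confidence level:
# leave (1 - level) / 2 in each tail of the standard normal distribution
conf_level <- 0.80
z80 <- qnorm(1 - (1 - conf_level) / 2)  # about 1.28
z95 <- qnorm(1 - (1 - 0.95) / 2)        # about 1.96, the value used earlier
```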
lower_vector <- samp_mean - 1.28 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.28 * samp_sd / sqrt(n)
plot_ci(lower_vector, upper_vector, mean(population))
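35/50 = 0.70, so 70% of the confidence intervals include the population mean. This is below the expected 80%. With only 50 samples the observed proportion varies considerably from run to run (its standard error is about sqrt(0.8 * 0.2 / 50) ≈ 0.057), so a result of 70% is plausible. Drawing more than 50 samples would not make any single interval more accurate, but it would bring the observed proportion closer to 80%.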
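How the number of samples affects the observed proportion can be explored with a short simulation. This is a rough sketch under stated assumptions: the population here is a synthetic right-skewed stand-in generated with `rlnorm`, not the real `ames$Gr.Liv.Area`, and `coverage_rate` is a hypothetical helper. With the real data loaded, `population` could be taken from the lab instead.

```r
set.seed(1)  # make the illustration reproducible

# ASSUMPTION: synthetic right-skewed stand-in for the real population
population <- rlnorm(3000, meanlog = 7.25, sdlog = 0.33)
n <- 60
z <- qnorm(0.90)  # critical value for an 80% interval, about 1.28

# Hypothetical helper: proportion of `reps` intervals that capture
# the population mean
coverage_rate <- function(reps) {
  hits <- replicate(reps, {
    samp <- sample(population, n)
    half_width <- z * sd(samp) / sqrt(n)
    abs(mean(samp) - mean(population)) <= half_width
  })
  mean(hits)
}

rate_50 <- coverage_rate(50)      # can land well away from 0.80
rate_5000 <- coverage_rate(5000)  # expected to sit much closer to 0.80
```

Rerunning `coverage_rate(50)` several times without the seed shows how much a 50-sample proportion moves around, which is why a single result such as 70% is not surprising.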