load("more/ames.RData")
set.seed(13)
population <- ames$Gr.Liv.Area
samp <- sample(population, 60)
hist(samp, breaks = 10)
The distribution of the sample is right-skewed and bimodal, and the ‘typical’ size is about 1250 square feet. I interpreted ‘typical’ to mean the mode.
I would not expect another sample's distribution to be identical, because random samples vary from one draw to the next.
The observations must be independent, the sample should be large (ideally at least 30 observations), and the population distribution should not be strongly skewed.
“95% Confidence” means that if we took many samples and built an interval from each one in the same way, about 95% of those intervals would contain the true population mean.
sample_mean <- mean(samp)
se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1288.893 1526.940
mean(population)
## [1] 1499.69
In this case the confidence interval does capture the true population mean.
We would expect 95% of the confidence intervals to capture the true population mean, because the interval's boundaries are set by the z-score (1.96) that encloses the middle 95% of the normal distribution.
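The link between the confidence level and the z-score can be sketched in code. The helper below, `z_interval`, is a hypothetical function and not part of the lab materials; it derives the critical value from `qnorm` instead of hard-coding 1.96. The `toy` vector is made-up data used only to show the arithmetic (eight values would be too few for this method in practice).

```r
# Hypothetical helper (not part of the lab materials): z-based confidence
# interval for a mean at an arbitrary confidence level.
z_interval <- function(x, conf_level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                      # standard error of the mean
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # about 1.96 when conf_level = 0.95
  c(lower = mean(x) - z_star * se, upper = mean(x) + z_star * se)
}

# Toy data, made up for illustration only
toy <- c(1200, 1450, 980, 1710, 1330, 1625, 1100, 1890)
z_interval(toy)
```

Calling `z_interval(samp)` on the lab's sample should give nearly the same bounds as the manual calculation, differing only because `qnorm(0.975)` is slightly less than the rounded 1.96.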
set.seed(868)
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for (i in 1:50) {
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
Using the following function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.
The proportion of confidence intervals that include the true mean is not exactly equal to the confidence level. This is due to chance: with only 50 samples the proportion can land a little above or below 95%, and as the number of samples increases it approaches the confidence level.
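A small simulation can illustrate this chance variation. It is only a sketch under stated assumptions: `fake_population` is a synthetic, roughly normal stand-in for `ames$Gr.Liv.Area` (so it runs without the data file), and `coverage` is a hypothetical helper, not something supplied with the lab.

```r
set.seed(1)  # arbitrary seed, chosen only for reproducibility

# Assumption: a synthetic population stands in for ames$Gr.Liv.Area
fake_population <- rnorm(3000, mean = 1500, sd = 500)
mu <- mean(fake_population)

# Proportion of 95% z-intervals (samples of size n) that capture mu
coverage <- function(reps, n = 60) {
  hits <- replicate(reps, {
    s <- sample(fake_population, n)
    half_width <- 1.96 * sd(s) / sqrt(n)
    abs(mean(s) - mu) <= half_width  # TRUE if this interval contains mu
  })
  mean(hits)
}

cov_50 <- coverage(50)      # one batch of 50 intervals, as in the lab
cov_5000 <- coverage(5000)  # many more intervals
c(cov_50, cov_5000)
```

With 50 intervals the proportion can only move in steps of 0.02, so it cannot equal 0.95 exactly; with 5000 intervals it should sit much closer to the nominal level.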
For a confidence level of 99%, the critical value is 2.58.
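The critical value can be checked with `qnorm`, which returns normal quantiles. A 99% interval leaves 0.5% in each tail, so the critical value is the 0.995 quantile. The name `z_star` below is a hypothetical helper introduced for illustration.

```r
# Critical value z* for a two-sided interval at a given confidence level
z_star <- function(conf_level) qnorm(1 - (1 - conf_level) / 2)

round(z_star(c(0.90, 0.95, 0.99)), 2)
# 1.64 1.96 2.58
```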
Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?
With a confidence level of 99%, all of the intervals in this example include the true population mean. This is in line with what we would expect at the 99% confidence level, and as the number of samples increases we would expect the proportion to get close to 99%.
lower_vector <- samp_mean - 2.58 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 2.58 * samp_sd / sqrt(n)
plot_ci(lower_vector, upper_vector, mean(population))
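The proportion can also be computed directly instead of counting intervals on the plot. The helper `prop_captured` is a hypothetical name introduced here, and the toy intervals are made up so that the block runs on its own.

```r
# Hypothetical helper: share of intervals [lower, upper] that contain mu
prop_captured <- function(lower, upper, mu) {
  mean(lower <= mu & mu <= upper)
}

# Toy intervals, made up for illustration:
# the first two contain 2.5, the third does not
prop_captured(lower = c(1, 2, 5), upper = c(3, 4, 6), mu = 2.5)
```

In the lab's workspace, `prop_captured(lower_vector, upper_vector, mean(population))` would give the proportion for the 50 intervals plotted above.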