download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
population <- ames$Gr.Liv.Area
samp <- sample(population, 60)
summary(samp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 792 1115 1482 1472 1803 2599
hist(samp)
My sample is unimodal and right-skewed. The typical size, which I interpret as the most common size (the mode), is between about 1000 and 1500 square feet.
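One way to make "most common size" concrete is to find the tallest bar of the histogram. The sketch below assumes a simulated right-skewed stand-in sample (`stand_in`, drawn with `rlnorm`), not the Ames data; with the real data you would pass `samp` instead. `hist(..., plot = FALSE)` returns the bin breaks and counts without drawing anything.

```r
# Simulated stand-in sample (an assumption, NOT the Ames data)
set.seed(1)
stand_in <- rlnorm(60, meanlog = log(1450), sdlog = 0.33)

# Compute the histogram bins without plotting
h <- hist(stand_in, plot = FALSE)

# The modal bin is the one with the highest count; report its left and right edges
modal_bin <- h$breaks[which.max(h$counts) + c(0, 1)]
modal_bin
```

The modal range depends on R's default binning, so a different choice of `breaks` can shift it.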
I would not expect another student’s distribution to be identical to mine. Each of us draws a different random sample of only 60 houses from a much larger population, so our samples will differ by chance. Another student’s distribution may look similar (for example, also right-skewed), but it is very unlikely to be identical.
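A quick way to see this sampling variability is to draw two samples of 60 from the same population and compare their summaries. This sketch assumes a simulated right-skewed stand-in population (`pop`), not the Ames data.

```r
# Simulated stand-in population (an assumption, NOT the Ames data)
set.seed(1)
pop <- rlnorm(2930, meanlog = log(1450), sdlog = 0.33)

# Two "students" each draw their own random sample of 60 houses
samp_a <- sample(pop, 60)
samp_b <- sample(pop, 60)

# Side-by-side five-number summaries (plus means) differ from sample to sample
rbind(a = summary(samp_a), b = summary(samp_b))
```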
sample_mean <- mean(samp)
se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1366.089 1578.611
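The same calculation can be wrapped in a small function so the confidence level is a parameter. `z_ci` is a hypothetical helper, not part of base R or the lab; it uses `qnorm` to get the critical value instead of hard-coding 1.96.

```r
# Hypothetical helper: large-sample z confidence interval for a mean
z_ci <- function(x, conf = 0.95) {
  z_star <- qnorm(1 - (1 - conf) / 2)   # about 1.96 for 95%, about 2.58 for 99%
  se <- sd(x) / sqrt(length(x))         # standard error of the sample mean
  c(lower = mean(x) - z_star * se,
    upper = mean(x) + z_star * se)
}

z_ci(1:5)
z_ci(1:5, conf = 0.99)
```

Calling `z_ci(samp)` should give essentially the interval above, differing only slightly because `qnorm(0.975)` is 1.959964 rather than exactly 1.96.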
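The data must come from a random sample, and the observations must be independent (when sampling without replacement, the sample should be less than 10% of the population). The sampling distribution of the sample mean should also be nearly normal. The data themselves need not be normal: with n = 60, the Central Limit Theorem applies as long as the population is not extremely skewed.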
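The two conditions that can be checked from the sizes alone are sketched below. `check_ci_conditions` is a hypothetical helper, n ≥ 30 is a common rule of thumb rather than a hard cutoff, and the population size of 2930 is an assumption you can confirm with `length(population)`. Random sampling itself cannot be verified from the numbers; it depends on how the data were collected.

```r
# Hypothetical helper: rule-of-thumb checks for a z interval for a mean
check_ci_conditions <- function(n, N) {
  c(
    independent  = n <= 0.10 * N,  # sample is under 10% of the population
    large_sample = n >= 30         # common rule of thumb for the CLT
  )
}

# Assumed population size; check with length(population)
check_ci_conditions(n = 60, N = 2930)
```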
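The 95% confidence level describes the reliability of the method: if we repeated the sampling many times and built an interval from each sample, about 95% of those intervals would contain the true population mean. For our single interval, we say we are 95% confident that the true population mean lies between the two calculated values.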
mean(population)
## [1] 1499.69
Yes. My confidence interval is (1366.089, 1578.611), which includes the true population mean of 1499.69.
I would expect about 95% of the class to have confidence intervals that include the true population mean. Each of us calculated a 95% confidence interval from our own independent random sample of the same size, and before sampling, each such interval has a 95% chance of capturing the true mean.
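A simulated "class" makes this expectation concrete: each student takes one sample of 60, builds a 95% interval, and we record the fraction of intervals that contain the true mean. The population here is a simulated right-skewed stand-in (an assumption, not the Ames data), and 1000 students is an arbitrary choice to reduce noise.

```r
# Simulated stand-in population (an assumption, NOT the Ames data)
set.seed(42)
pop <- rlnorm(2930, meanlog = log(1450), sdlog = 0.33)
true_mean <- mean(pop)

# Each simulated student draws n = 60 and checks whether their 95% CI covers the mean
n_students <- 1000
covered <- replicate(n_students, {
  s  <- sample(pop, 60)
  se <- sd(s) / sqrt(60)
  (mean(s) - 1.96 * se) <= true_mean && true_mean <= (mean(s) + 1.96 * se)
})

# Fraction of the class whose interval contains the true mean (close to 0.95)
mean(covered)
```

The result lands near, but usually not exactly at, 0.95; the exact fraction varies with the seed and with how skewed the population is.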
samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60
for (i in 1:50) {
  samp <- sample(population, n)  # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)     # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)         # save sample sd in ith element of samp_sd
}
lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)
c(lower_vector[1], upper_vector[1])
## [1] 1277.675 1529.125
plot_ci(lower_vector, upper_vector, mean(population))
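Rather than counting intervals by eye on the plot, the number that contain the true mean can be computed directly. `count_coverage` is a hypothetical helper; the toy intervals below are made up purely to show how it behaves.

```r
# Hypothetical helper: how many intervals [lower, upper] contain mu?
count_coverage <- function(lower, upper, mu) {
  sum(lower <= mu & mu <= upper)
}

# Toy example with three made-up intervals: [1, 4], [2, 5], [3, 6]
count_coverage(c(1, 2, 3), c(4, 5, 6), mu = 4.5)  # 2
```

With the lab's objects, the call would be `count_coverage(lower_vector, upper_vector, mean(population))`.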
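47/50 (94%) of my confidence intervals include the true population mean. This is slightly lower than the 95% confidence level, but that is not a contradiction: a 95% confidence level means that about 95% of intervals include the true mean in the long run, not that at least 95% do in every set of 50. With only 50 intervals, 47 is well within ordinary random variation around the expected 47.5.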
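Because the 50 intervals come from independent samples and each covers the true mean with probability 0.95, the number that cover it follows a Binomial(50, 0.95) distribution. The sketch below uses base R's `pbinom` to show that 47 or fewer is a common outcome.

```r
# Count of covering intervals out of 50, each covering with probability 0.95
expected <- 50 * 0.95                 # 47.5
spread   <- sqrt(50 * 0.95 * 0.05)    # about 1.54
p_47_or_fewer <- pbinom(47, size = 50, prob = 0.95)  # P(X <= 47), about 0.46

c(expected = expected, sd = spread, p_47_or_fewer = p_47_or_fewer)
```

Getting 47 or fewer happens in nearly half of all such experiments, so 47/50 gives no evidence that the method is miscalibrated.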
For a 99% confidence level, the critical value is z* ≈ 2.58, so each interval is the sample mean ± 2.58 × standard error.
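The critical values 1.96 and 2.58 are normal quantiles that leave half of the remaining probability, (1 − confidence) / 2, in each tail. Base R's `qnorm` computes them for any confidence level.

```r
# Two-sided critical values z* for several confidence levels
conf   <- c(0.90, 0.95, 0.99)
z_star <- qnorm(1 - (1 - conf) / 2)
round(z_star, 3)
# [1] 1.645 1.960 2.576
```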
lower_vector <- samp_mean - 2.58 * samp_sd / sqrt(n)
upper_vector <- samp_mean + 2.58 * samp_sd / sqrt(n)
plot_ci(lower_vector, upper_vector, mean(population))
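With the 99% critical value, all 50 of my intervals (100%) include the true population mean, a higher proportion than the 94% I observed at the 95% level. This is expected, because the larger critical value makes every interval wider.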
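How much wider each interval becomes follows from the ratio of the critical values, since the standard error is the same for a given sample. This short sketch uses `qnorm` for the exact values and also shows the ratio for the rounded values used in the lab.

```r
# Ratio of 99% to 95% interval widths for the same sample (same SE)
ratio_exact   <- qnorm(0.995) / qnorm(0.975)  # about 1.31
ratio_rounded <- 2.58 / 1.96                  # about 1.32
c(exact = ratio_exact, rounded = ratio_rounded)
```

Each 99% interval is therefore roughly 31% to 32% wider than the 95% interval from the same sample, which is why more of them capture the true mean.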