A college counselor is interested in estimating how many credits a student typically enrolls in each semester. The counselor decides to randomly sample 100 students by using the registrar’s database of students. The histogram below shows the distribution of the number of credits taken by these students. Sample statistics for this distribution are also provided.
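# Sample statistics reported alongside the histogram.
# These names mask base R functions such as mean() and sd() as variables;
# calls like mean(x) still resolve to the functions.
min <- 8
q1 <- 13
median <- 14
mean <- 13.65
sd <- 1.91
q3 <- 15
max <- 18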
# Approximate the histogram with 100 normal draws that use the sample mean and SD
set.seed(12345)
simulation <- rnorm(100, mean = mean, sd = sd)
hist(simulation, main = "Number of credits", breaks = 10)
The point estimate for the average number of credits per semester is the sample mean, 13.65. The point estimate for the median is the sample median, 14.
The point estimate for the standard deviation is the sample standard deviation, 1.91. The IQR is q3 - q1 = 15 - 13 = 2.
# simulation
set.seed(12345)
plot(density(x = rnorm(10000, mean, sd)),
     main = "simulation of number of college credits per semester",
     xlab = "college credits")
abline(v = mean, col = "black", lty = 2) # mean
abline(v = 16, col = "blue", lty = 2) # 16-credit
abline(v = 18, col = "red", lty = 2) # 18-credit
# 16-credit
print(paste("percent of area on the right of 16 credits (beyond blue line) is approximately ",
            round(100 * pnorm(16, mean, sd, lower.tail = FALSE), 1),
            "%",
            sep = ""))
## [1] "percent of area on the right of 16 credits (beyond blue line) is approximately 10.9%"
# 18-credit
print(paste("percent of area on the right of 18 credits (beyond red line) is approximately ",
            round(100 * pnorm(18, mean, sd, lower.tail = FALSE), 1),
            "%",
            sep = ""))
## [1] "percent of area on the right of 18 credits (beyond red line) is approximately 1.1%"
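The tail areas above can also be read as z-scores, which state how many standard deviations each credit load lies above the sample mean. The sketch below is a minimal illustration: the names `sample_mean`, `sample_sd`, `credits`, and `z_scores` are introduced here, and the cutoff of 2 standard deviations for "unusual" is a common rule of thumb assumed for this example, not something fixed by the problem.

```r
# Summary statistics reported for the sample of 100 students
sample_mean <- 13.65
sample_sd   <- 1.91

# Standardize the two credit loads of interest
credits  <- c(16, 18)
z_scores <- (credits - sample_mean) / sample_sd
round(z_scores, 2)   # 1.23 2.28

# Assumed rule of thumb: flag loads more than 2 SDs from the mean
abs(z_scores) > 2    # FALSE TRUE
```

Under that rule of thumb, 16 credits is not unusual, while 18 credits is unusually high.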
# 95% confidence interval for the mean
margin_of_error <- 1.96 * (sd / sqrt(100))
confidence_intervals <- round(c(mean - margin_of_error,
                                mean + margin_of_error), 2)
confidence_intervals
## [1] 13.28 14.02
She should not be too surprised. Sample means vary from one random sample to the next, so a second sample is unlikely to reproduce 13.65 exactly. With a significance level of 5% (the probability of making a type 1 error), the 95% confidence interval from the original sample runs from 13.28 to 14.02, and we are 95% confident that it contains the true mean number of credits. Her number is still within that range.
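To see how much sample means move from one sample to the next, one option is to simulate many samples of 100 students. This is only a sketch under a stated assumption: the population is treated as normal, with the sample mean and sample SD standing in for the unknown population values, as in the simulations earlier in this answer. The names `pop_mean`, `pop_sd`, `n`, and `sample_means` are introduced here for illustration.

```r
set.seed(12345)
pop_mean <- 13.65   # assumption: sample mean stands in for the population mean
pop_sd   <- 1.91    # assumption: sample SD stands in for the population SD
n        <- 100     # students per sample

# Draw 1,000 samples of 100 students each and record every sample mean
sample_means <- replicate(1000, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

# Spread of the simulated sample means; theory gives pop_sd / sqrt(n) = 0.191
sd(sample_means)

# Share of simulated sample means that fall between 13.28 and 14.02
mean(sample_means >= 13.28 & sample_means <= 14.02)
```

Most simulated means land inside the interval, but they rarely equal 13.65 exactly, which is why a slightly different second sample mean is unsurprising.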
# 95% confidence interval
margin_of_error <- 1.96 * (sd / sqrt(100))
confidence_intervals <- round(c(mean - margin_of_error,
                                mean + margin_of_error), 2)
confidence_intervals
## [1] 13.28 14.02
# 99% confidence interval
margin_of_error99 <- 2.58 * (sd / sqrt(100))
confidence_intervals99 <- round(c(mean - margin_of_error99,
                                  mean + margin_of_error99), 2)
confidence_intervals99
## [1] 13.16 14.14
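We quantify the variability of the sample mean with its standard error, sd / sqrt(n) = 1.91 / sqrt(100) = 0.191. Confidence intervals build on this quantity: the code above constructs a 95% and a 99% interval, and the 99% interval is wider because it uses the larger critical value (2.58 instead of 1.96).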
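As a compact restatement, the standard error and both intervals can be computed with critical values taken from `qnorm()` instead of the rounded constants 1.96 and 2.58. The helper `normal_ci()` and the variable names below are hypothetical, introduced only for this sketch; it assumes the normal approximation used throughout this answer.

```r
sample_mean <- 13.65
sample_sd   <- 1.91
n           <- 100

# Standard error of the sample mean
standard_error <- sample_sd / sqrt(n)
standard_error   # 0.191

# Hypothetical helper: normal-based confidence interval at a given level
normal_ci <- function(center, se, level) {
  z_star <- qnorm(1 - (1 - level) / 2)  # two-sided critical value
  center + c(-1, 1) * z_star * se
}

round(normal_ci(sample_mean, standard_error, 0.95), 2)  # 13.28 14.02
round(normal_ci(sample_mean, standard_error, 0.99), 2)  # 13.16 14.14
```

Rounded to two decimals, the results match the intervals computed with the hard-coded constants.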