Here I will use R to generate a plot that demonstrates how confidence intervals work. First, I will set the significance level alpha (so the confidence level is 1 - alpha = 95%), the sample size n, and the number of trials N.
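alpha <- 0.05  # significance level; confidence level is 1 - alpha = 95%
n <- 10        # sample size of each trial
N <- 20        # number of trials, i.e. number of intervals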
To keep track of everything, I will form a data frame.
thisDF <- data.frame(intervalID = seq_len(N))  # one row per trial
thisDF$sizes <- rep(n, N)                        # sample size used in each trial
Next, I will write a function that takes the sample size n, draws n random numbers from the uniform distribution on (0, 1), and returns the sample mean and sample standard deviation.
sampleStats <- function(n){
  this_sample <- runif(n)    # n draws from U(0, 1)
  xbar <- mean(this_sample)  # sample mean
  s <- sd(this_sample)       # sample standard deviation
  c(xbar, s)
}
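Giving the elements of the returned vector names makes the output self-describing, and `sapply()` carries those names along as row names of its result. A minimal sketch of that variant, using the hypothetical name `sampleStatsNamed` (not part of the original code):

```r
# Hypothetical variant of sampleStats that returns a *named* vector
sampleStatsNamed <- function(n) {
  thisSample <- runif(n)                        # n draws from U(0, 1)
  c(xbar = mean(thisSample), s = sd(thisSample))
}

set.seed(1)              # arbitrary seed, only so the example is reproducible
stats <- sampleStatsNamed(10)
stats                    # a length-2 vector with names "xbar" and "s"
stats[["xbar"]]          # elements can now be pulled out by name
```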
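Now we can use sapply to call that function once for each entry of thisDF$sizes, that is, N times. Each call returns a vector of length 2, so sapply returns a 2 x N matrix, which t() transposes so that each trial becomes a row.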
thisDF <- cbind(thisDF, t(sapply(thisDF$sizes, sampleStats)))
colnames(thisDF) <- c("intervalID", "size", "xbar", "s")
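When every trial uses the same sample size, `replicate()` is a slightly shorter way to get the same table. A self-contained sketch, assuming a fixed `n`; `sampleStatsNamed` and `simDF` are illustrative names, not from the original:

```r
n <- 10
N <- 20

# Hypothetical helper: same statistics as sampleStats, returned as a named vector
sampleStatsNamed <- function(n) {
  thisSample <- runif(n)
  c(xbar = mean(thisSample), s = sd(thisSample))
}

# replicate() calls sampleStatsNamed(n) N times and simplifies the length-2
# results into a 2 x N matrix; t() turns it into N rows of (xbar, s).
stats <- t(replicate(N, sampleStatsNamed(n)))

# The matrix column names (xbar, s) become data frame column names
simDF <- data.frame(intervalID = seq_len(N), size = n, stats)
head(simDF)
```

The `sapply()` version in the post has one advantage: because it maps over `thisDF$sizes`, it would also work if different trials used different sample sizes.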
Now that we have N samples of size n, along with their sample statistics, we can form the two-sided confidence intervals.
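In symbols, each trial's interval is the usual two-sided t-interval, where $t_{1-\alpha/2,\,n-1}$ is the $1-\alpha/2$ quantile of the t distribution with $n-1$ degrees of freedom (`qt(1 - alpha/2, df = n - 1)` in R, about 2.262 for $n = 10$ and $\alpha = 0.05$):

$$
\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
$$

Splitting $\alpha$ evenly between the two tails is what gives 95% coverage; using the $1-\alpha$ quantile instead (about 1.833 here) would produce a narrower interval with only 90% two-sided coverage.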
# Two-sided interval: put alpha/2 in each tail, so use the 1 - alpha/2 quantile of t
thisDF$left <- with(thisDF, xbar - qt(1 - alpha/2, df = size - 1) * s / sqrt(size))
thisDF$right <- with(thisDF, xbar + qt(1 - alpha/2, df = size - 1) * s / sqrt(size))
head(thisDF)
##   intervalID size      xbar         s      left     right
## 1          1   10 0.5371323 0.2103760 0.3866384 0.6876262
## 2          2   10 0.5663907 0.2952669 0.3551695 0.7776119
## 3          3   10 0.3451050 0.1865836 0.2116311 0.4785789
## 4          4   10 0.4922720 0.3308136 0.2556222 0.7289218
## 5          5   10 0.5549684 0.3213257 0.3251058 0.7848310
## 6          6   10 0.4269955 0.2620609 0.2395284 0.6144626
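A quick way to check the hand-built formula is to compare it with the interval that base R's `t.test()` reports in its `conf.int` component. A self-contained sketch on one fresh sample (the seed and the names `manual`, `builtin`, `tCrit` are arbitrary choices for illustration):

```r
set.seed(42)                 # arbitrary seed, only for reproducibility
alpha <- 0.05
n <- 10
x <- runif(n)                # one sample of size n from U(0, 1)

# Hand-built two-sided interval: xbar +/- t_{1 - alpha/2, n - 1} * s / sqrt(n)
tCrit <- qt(1 - alpha/2, df = n - 1)   # about 2.262 for n = 10
halfWidth <- tCrit * sd(x) / sqrt(n)
manual <- c(mean(x) - halfWidth, mean(x) + halfWidth)

# Base R's one-sample t-test reports its interval in the conf.int component
builtin <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)

all.equal(manual, builtin)   # TRUE
```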
Finally, we can plot all of the confidence intervals.
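# Empty plotting region: one x position per interval, U(0,1) values on the y-axis
plot(NULL, xlim = c(1, N), ylim = c(0, 1),
     main = paste0(100 * (1 - alpha), "% Confidence Intervals"),
     xlab = "interval", ylab = "values from U(0,1) distribution")
# Upper endpoints as up-pointing triangles, lower endpoints as down-pointing triangles
points(thisDF$intervalID, thisDF$right, col = "blue", pch = 24)
points(thisDF$intervalID, thisDF$left, col = "blue", pch = 25)
# One vertical segment per interval
segments(thisDF$intervalID, thisDF$left, thisDF$intervalID, thisDF$right, col = "black")
# Dotted horizontal line at the true mean of U(0, 1)
abline(h = 0.5, col = "red", lty = 3)
legend("topright", c("confidence interval", "true mean: mu = 0.5"), lty = c(1, 3),
       col = c("black", "red"))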
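To make the misses easier to spot, one option is to color each interval by whether it contains mu. The sketch below is self-contained: it draws its own fresh samples, stored column by column in a matrix rather than in the data frame, so its picture will differ from the one above. The names `samples`, `halfWidth`, and `covers` are illustrative.

```r
alpha <- 0.05
n <- 10
N <- 20
mu <- 0.5                                   # true mean of U(0, 1)

samples <- matrix(runif(n * N), nrow = n)   # each column is one sample of size n
xbar <- colMeans(samples)
s <- apply(samples, 2, sd)
halfWidth <- qt(1 - alpha/2, df = n - 1) * s / sqrt(n)
left <- xbar - halfWidth
right <- xbar + halfWidth

covers <- left <= mu & mu <= right          # TRUE where the interval contains mu

plot(NULL, xlim = c(1, N), ylim = c(0, 1),
     main = paste0(100 * (1 - alpha), "% Confidence Intervals"),
     xlab = "interval", ylab = "values from U(0,1) distribution")
segments(seq_len(N), left, seq_len(N), right,
         col = ifelse(covers, "black", "red"), lwd = 2)
abline(h = mu, col = "red", lty = 3)
legend("topright", c("contains mu", "misses mu", "true mean: mu = 0.5"),
       col = c("black", "red", "red"), lty = c(1, 1, 3))

sum(covers)   # how many of the N intervals contain mu; varies from run to run
```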
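If the intervals behave as advertised, roughly 95% of them, about 19 of the 20 here, should contain the true mean mu = 0.5. The 95% is only approximate, since the t-interval assumes normally distributed data and these samples are uniform. With only 20 intervals the exact count also varies from run to run, so rerun the code to generate new pictures and see how the count changes.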
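Twenty intervals give a noisy picture, so one way to check the 95% claim more precisely is to repeat the experiment many times and record the fraction of intervals that contain mu. A minimal sketch, assuming 10,000 trials is enough for a stable estimate; the helper `coverage` is hypothetical, and the second call uses the one-sided quantile `qt(1 - alpha, ...)` only for comparison:

```r
set.seed(123)                 # arbitrary seed, only for reproducibility
alpha <- 0.05
n <- 10
mu <- 0.5                     # true mean of U(0, 1)
trials <- 10000               # far more intervals than the 20 plotted

# Fraction of simulated intervals xbar +/- crit * s / sqrt(n) that contain mu
coverage <- function(crit) {
  hits <- replicate(trials, {
    x <- runif(n)
    abs(mean(x) - mu) <= crit * sd(x) / sqrt(n)
  })
  mean(hits)
}

cov95 <- coverage(qt(1 - alpha/2, df = n - 1))  # two-sided critical value
cov90 <- coverage(qt(1 - alpha, df = n - 1))    # one-sided critical value
c(cov95 = cov95, cov90 = cov90)
```

We would expect the first estimate to land close to 0.95 and the second close to 0.90; small departures are expected from simulation noise and from the uniform data not being normal.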