Confidence Intervals

Here I will use R to generate a plot that demonstrates how confidence intervals work. First, I will set the significance level alpha, the sample size n, and the number of trials N.

alpha <- 0.05
n <- 10
N <- 20

To keep track of everything, I will form a data frame.

thisDF <- data.frame(intervalID = seq_len(N))
thisDF$sizes <- rep(n, N)

Next, I will create a function in R that takes in the sample size n, produces n random numbers from a uniform distribution, and then outputs the sample mean and sample standard deviation.

sampleStats <- function(n){
  this_sample <- runif(n)    # n draws from U(0,1)
  xbar <- mean(this_sample)  # sample mean
  s <- sd(this_sample)       # sample standard deviation
  c(xbar, s)                 # returned as a length-2 numeric vector
}
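
As a quick sanity check on a function like this, the statistics of a large sample should land near the theoretical values for U(0,1): mean 1/2 and standard deviation sqrt(1/12), about 0.2887. The sketch below re-declares the function so that it runs on its own; the seed, the sample size of 1e5, and the names big and theory are arbitrary choices for illustration.

```r
# Sanity check (illustrative): for U(0,1), mu = 1/2 and sigma = sqrt(1/12).
set.seed(2015)  # arbitrary seed, only for reproducibility

sampleStats <- function(n){
  this_sample <- runif(n)
  c(mean(this_sample), sd(this_sample))
}

big <- sampleStats(1e5)          # a large sample should sit close to theory
theory <- c(0.5, sqrt(1 / 12))

# side-by-side comparison of sample statistics and theoretical values
print(rbind(sample = big, theory = theory))
```

With only n = 10 draws per sample, the statistics scatter much more widely around these values, which is what gives the intervals their differing centers and widths.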

Now we can use sapply to run that function once per row (N times in total) and bind the transposed results onto the data frame.

thisDF <- cbind(thisDF, t(sapply(thisDF$sizes, sampleStats)))
colnames(thisDF) <- c("intervalID", "size", "xbar", "s")
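
The call to t() is there because sapply places each call's result in its own column. A small sketch with a stand-in function (twoStats and raw are hypothetical names, not part of the code above) shows the shapes involved:

```r
# twoStats is a hypothetical stand-in that returns two numbers per call
twoStats <- function(n) c(n, n^2)

raw <- sapply(rep(10, 20), twoStats)

dim(raw)     # 2 rows, 20 columns: one column per call
dim(t(raw))  # 20 rows, 2 columns: one row per trial, ready for cbind
```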

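Now that we have N samples of size n, along with their sample statistics, we can finally form the confidence intervals: the sample mean plus or minus a t quantile (with n - 1 degrees of freedom) times the standard error s/sqrt(n).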

tstar <- qt(1 - alpha/2, df = n - 1)  # two-sided interval: alpha/2 in each tail
thisDF$left  <- with(thisDF, xbar - tstar*s/sqrt(n))
thisDF$right <- with(thisDF, xbar + tstar*s/sqrt(n))
head(round(thisDF, 4))  # rounded for display

##   intervalID size   xbar      s   left  right
## 1          1   10 0.5371 0.2104 0.3866 0.6876
## 2          2   10 0.5664 0.2953 0.3552 0.7776
## 3          3   10 0.3451 0.1866 0.2116 0.4786
## 4          4   10 0.4923 0.3308 0.2556 0.7289
## 5          5   10 0.5550 0.3213 0.3251 0.7848
## 6          6   10 0.4270 0.2621 0.2395 0.6145
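
One way to cross-check hand-computed endpoints is against R's built-in t.test, whose conf.int component is a two-sided t interval at the requested conf.level. The sketch below draws its own sample x (a stand-in, not a row of thisDF) and builds the two-sided interval from the 1 - alpha/2 quantile; the seed and the names manual and builtin are assumptions made for illustration.

```r
set.seed(1)    # arbitrary seed, only for reproducibility
alpha <- 0.05
n <- 10
x <- runif(n)  # stand-in sample from U(0,1)

# two-sided interval by hand: xbar -/+ t quantile * s / sqrt(n)
tstar <- qt(1 - alpha / 2, df = n - 1)
manual <- mean(x) + c(-1, 1) * tstar * sd(x) / sqrt(n)

# built-in interval for comparison (as.numeric drops the conf.level attribute)
builtin <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)

all.equal(manual, builtin)  # TRUE when the two agree up to floating-point tolerance
```

Using qt(1 - alpha, ...) in place of qt(1 - alpha/2, ...) would leave alpha in each tail and produce a narrower, 90% interval.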

Finally, we can plot all of the confidence intervals.

# empty canvas: type = "n" sets up the axes without drawing any points
plot(c(1, N), c(0, 1), type = "n",
     main = "95% Confidence Intervals", xlab = "interval", ylab = "values from U(0,1) distribution")
points(thisDF$intervalID, thisDF$right, col = "blue", pch = 24)
points(thisDF$intervalID, thisDF$left, col = "blue", pch = 25)
segments(thisDF$intervalID, thisDF$left, thisDF$intervalID, thisDF$right, col = "black")
abline(h = 0.5, col = "red", lty = 3)
legend("topright", c("confidence interval", "true mean: mu = 0.5"), lty = c(1,3),
       col = c("black", "red"))

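In the long run, about 95% of intervals built this way contain the true mean, so in a plot of N = 20 intervals we expect roughly 19 of them to cross the dashed line at 0.5. Any single picture may show more or fewer misses; rerunning the code draws new samples and generates a new picture.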
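
To put a number on "about 95%", the same construction can be repeated many more times than fit in a plot. This is a minimal sketch that is independent of thisDF; the seed, the trial count, and the names trials and covers are assumptions made for illustration. Because the data are uniform rather than normal, the t interval is only approximately a 95% interval at n = 10, so the simulated coverage should come out close to 0.95 rather than exactly equal to it.

```r
set.seed(42)     # arbitrary seed, only for reproducibility
alpha <- 0.05
n <- 10
trials <- 10000  # many more trials than the N = 20 shown in the plot

covers <- replicate(trials, {
  x <- runif(n)
  half <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)  # half-width of the interval
  abs(mean(x) - 0.5) <= half  # TRUE when the interval contains mu = 0.5
})

mean(covers)  # empirical coverage: the fraction of intervals containing 0.5
```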

Derek Sollberger

Wednesday, July 29, 2015