Simulated Confidence Intervals

Introduction

This RMarkdown document presents R code for generating and visualizing multiple simulated confidence intervals. It is intended to demonstrate how confidence intervals differ by confidence level and sample size. Intervals are color-coded to show whether or not each one contains the true parameter value.

Initial Sample Mean Simulation and Data Frame Preparation

Sample means are simulated from a normal distribution with a mean of $50,000 and a standard deviation of $1,000, which plays the role of the standard error of the sample mean.

The first block of code accomplishes the following:

Load appropriate R packages.
Simulate a set of 50 sample means.

library(dplyr)
library(ggplot2)
library(gridExtra)

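# Simulate 50 sample means (true mean 50000, standard error 1000)
meanset <- rnorm(50, mean = 50000, sd = 1000)
# Store the means in a one-column data frame named "Mean"
meanset <- data.frame(Mean = meanset)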

Creating Interval Boundaries Varying by Confidence Level

Upper and lower boundaries are calculated by adding and subtracting a fixed margin of error for each of the 90%, 95%, and 99% confidence levels. A variable identifying each sample is also created and combined with the boundaries to form one data frame per confidence level.

# Add interval bounds: sample mean plus or minus the margin of error
meanset90 <- meanset %>% mutate(upper = Mean + 1677, lower = Mean - 1677)
meanset95 <- meanset %>% mutate(upper = Mean + 2010, lower = Mean - 2010)
meanset99 <- meanset %>% mutate(upper = Mean + 2680, lower = Mean - 2680)

# Sample index (1 to 50), used to position each interval in the plots
Sample <- data.frame(Sample = seq(1, 50, 1))

# One data frame per confidence level
ci90 <- cbind(Sample, meanset90)
ci95 <- cbind(Sample, meanset95)
ci99 <- cbind(Sample, meanset99)
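
The document does not say where the margins 1677, 2010, and 2680 come from. A minimal sketch, assuming they are two-sided t critical values with 49 degrees of freedom multiplied by the standard error of 1000 (the degrees of freedom and the names `conf_levels`, `se`, `df`, `t_crit`, and `margins` are assumptions for illustration, not taken from the source):

```r
# Assumption: margins are t critical values (df = 49) times the
# standard error of 1000 used in the simulation.
conf_levels <- c(0.90, 0.95, 0.99)
se <- 1000
df <- 49

# Two-sided critical value for each level, e.g. the 0.975 quantile for 95%
t_crit <- qt(1 - (1 - conf_levels) / 2, df)

# Margin of error, rounded to the nearest dollar
margins <- round(t_crit * se)
names(margins) <- paste0(conf_levels * 100, "%")
margins
```

Under that assumption the rounded values agree with the constants used in the code above.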

Determining Whether a Confidence Interval Captures the True Parameter Value

For each confidence interval, an indicator variable is created to track whether the interval contains the true parameter value (a mean income of $50,000).

# Capture = 1 when the interval contains the true mean (lower < 50000 < upper), otherwise 0
ci90 <- ci90 %>% mutate(Capture = ifelse(lower < 50000 & upper > 50000, 1, 0))
ci95 <- ci95 %>% mutate(Capture = ifelse(lower < 50000 & upper > 50000, 1, 0))
ci99 <- ci99 %>% mutate(Capture = ifelse(lower < 50000 & upper > 50000, 1, 0))

# Store Capture as a factor so both levels can be mapped to plot colors
ci90$Capture <- factor(ci90$Capture, levels = c(0, 1))
ci95$Capture <- factor(ci95$Capture, levels = c(0, 1))
ci99$Capture <- factor(ci99$Capture, levels = c(0, 1))

head(ci90)

##   Sample     Mean    upper    lower Capture
## 1      1 51615.75 53292.75 49938.75       1
## 2      2 51196.33 52873.33 49519.33       1
## 3      3 50354.89 52031.89 48677.89       1
## 4      4 49284.53 50961.53 47607.53       1
## 5      5 50678.65 52355.65 49001.65       1
## 6      6 49779.49 51456.49 48102.49       1
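
The capture indicator can also be summarized numerically. The following is a rough sketch, not part of the original analysis, that computes the proportion of intervals containing the true mean at each confidence level. The seed and the names `means`, `margins`, and `capture_rate` are hypothetical, and the simulated values will differ from the output shown above.

```r
set.seed(1)  # hypothetical seed, only so the sketch is reproducible

true_mean <- 50000
means <- rnorm(50, mean = true_mean, sd = 1000)

# Margins of error taken from the code above
margins <- c("90%" = 1677, "95%" = 2010, "99%" = 2680)

# Proportion of the 50 intervals that contain the true mean, per level
capture_rate <- sapply(margins, function(m) {
  mean(means - m < true_mean & means + m > true_mean)
})
capture_rate
```

Because a wider margin contains every narrower one, the capture proportion can only stay the same or increase as the confidence level rises.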

Generating Confidence Interval Plots in a Grid (with Varying Confidence Levels)

Error bar plots are created for each of the three confidence levels. Each point estimate is drawn as a point, and each interval estimate as an error bar. A dashed vertical line marks the true parameter value. Intervals that fail to capture the true parameter value are drawn in red. The result is a set of plots that show how the proportion of intervals capturing the true value changes with the confidence level.

# Red for intervals that miss the true mean, black for those that capture it
colorset <- c('0' = 'red', '1' = 'black')

ci_plot_90 <- ci90 %>% ggplot(aes(x = Sample, y = Mean)) +
  geom_point(aes(color = Capture)) +
  geom_errorbar(aes(ymin = lower, ymax = upper, color = Capture)) +
  scale_color_manual(values = colorset) +
  coord_flip() +
  geom_hline(yintercept = 50000, linetype = "dashed", color = "blue") +
  labs(title = "90% Confidence Intervals") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylim(40000, 60000)

ci_plot_95 <- ci95 %>% ggplot(aes(x = Sample, y = Mean)) +
  geom_point(aes(color = Capture)) +
  geom_errorbar(aes(ymin = lower, ymax = upper, color = Capture)) +
  scale_color_manual(values = colorset) +
  coord_flip() +
  geom_hline(yintercept = 50000, linetype = "dashed", color = "blue") +
  labs(title = "95% Confidence Intervals") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylim(40000, 60000)

ci_plot_99 <- ci99 %>% ggplot(aes(x = Sample, y = Mean)) +
  geom_point(aes(color = Capture)) +
  geom_errorbar(aes(ymin = lower, ymax = upper, color = Capture)) +
  scale_color_manual(values = colorset) +
  coord_flip() +
  geom_hline(yintercept = 50000, linetype = "dashed", color = "blue") +
  labs(title = "99% Confidence Intervals") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylim(40000, 60000)

# Arrange the three plots side by side
grid.arrange(ci_plot_90, ci_plot_95, ci_plot_99, ncol = 3)
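
Since the three plots differ only in their data and title, the plotting steps could be wrapped in a function. The sketch below is one possible way to do that; `plot_ci` and its `truth` argument are hypothetical names that do not appear in the original code, and the small `demo` data frame is made up purely for illustration.

```r
library(ggplot2)

# Hypothetical helper: draws one interval plot from a data frame with
# columns Sample, Mean, lower, upper, and Capture (factor with levels 0/1).
plot_ci <- function(df, title, truth = 50000) {
  ggplot(df, aes(x = Sample, y = Mean, color = Capture)) +
    geom_point() +
    geom_errorbar(aes(ymin = lower, ymax = upper)) +
    scale_color_manual(values = c("0" = "red", "1" = "black")) +
    geom_hline(yintercept = truth, linetype = "dashed", color = "blue") +
    coord_flip() +
    labs(title = title) +
    theme(plot.title = element_text(hjust = 0.5)) +
    ylim(40000, 60000)
}

# Made-up data, for illustration only
demo <- data.frame(Sample = 1:3, Mean = c(49000, 50500, 53000))
demo$lower <- demo$Mean - 2010
demo$upper <- demo$Mean + 2010
demo$Capture <- factor(
  as.integer(demo$lower < 50000 & demo$upper > 50000),
  levels = c(0, 1)
)

p <- plot_ci(demo, "95% Confidence Intervals")
```

With the data frames built earlier, a call such as `plot_ci(ci90, "90% Confidence Intervals")` would be expected to produce a plot equivalent to `ci_plot_90`.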

Simulating Confidence Intervals for Different Sample Sizes

The same simulation and data preparation steps are repeated, this time varying the sample size (n = 9, 25, and 100) to create three confidence interval scenarios.

# Simulate 50 sample means per scenario; the standard error shrinks as n grows
meanset_9 <- data.frame(Mean = rnorm(50, mean = 50000, sd = 1333))
meanset_25 <- data.frame(Mean = rnorm(50, mean = 50000, sd = 1000))
meanset_100 <- data.frame(Mean = rnorm(50, mean = 50000, sd = 500))

# Add interval bounds: sample mean plus or minus the margin of error
meanset_9 <- meanset_9 %>% mutate(upper = Mean + 3075, lower = Mean - 3075)
meanset_25 <- meanset_25 %>% mutate(upper = Mean + 2064, lower = Mean - 2064)
meanset_100 <- meanset_100 %>% mutate(upper = Mean + 992, lower = Mean - 992)

# Sample index (1 to 50), used to position each interval in the plots
Sample <- data.frame(Sample = seq(1, 50, 1))

# One data frame per sample size
n9 <- cbind(Sample, meanset_9)
n25 <- cbind(Sample, meanset_25)
n100 <- cbind(Sample, meanset_100)

# Capture = 1 when the interval contains the true mean (lower < 50000 < upper), otherwise 0
n9 <- n9 %>% mutate(Capture = ifelse(lower < 50000 & upper > 50000, 1, 0))
n25 <- n25 %>% mutate(Capture = ifelse(lower < 50000 & upper > 50000, 1, 0))
n100 <- n100 %>% mutate(Capture = ifelse(lower < 50000 & upper > 50000, 1, 0))

# Store Capture as a factor so both levels can be mapped to plot colors
n9$Capture <- factor(n9$Capture, levels = c(0, 1))
n25$Capture <- factor(n25$Capture, levels = c(0, 1))
n100$Capture <- factor(n100$Capture, levels = c(0, 1))

head(n25)

##   Sample     Mean    upper    lower Capture
## 1      1 51174.91 53238.91 49110.91       1
## 2      2 50069.86 52133.86 48005.86       1
## 3      3 50928.97 52992.97 48864.97       1
## 4      4 50483.16 52547.16 48419.16       1
## 5      5 49128.56 51192.56 47064.56       1
## 6      6 51325.77 53389.77 49261.77       1
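
The margins used for the three sample sizes (3075, 2064, and 992) are not derived in the document. As a tentative check, assuming a 95% confidence level and t critical values with n - 1 degrees of freedom (both assumptions, as are the names `n`, `se`, `t_crit`, and `margins`), they can be approximated from the standard errors passed to `rnorm`:

```r
# Sample sizes and the standard errors used in the simulation above
n <- c(9, 25, 100)
se <- c(1333, 1000, 500)

# Assumption: 95% confidence level, t distribution with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)
margins <- t_crit * se

data.frame(n = n, se = se, t_crit = round(t_crit, 3), margin = round(margins))
```

Under these assumptions the computed margins land within a dollar or two of the constants in the code, and they shrink as n grows because both the critical value and the standard error decrease.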

Generating Confidence Interval Plots in a Grid (with Varying Sample Sizes)

Plots with the same formatting are generated for this set of simulated confidence intervals. The result is a set of plots that show how the intervals narrow as the sample size increases.

# Red for intervals that miss the true mean, black for those that capture it
colorset <- c('0' = 'red', '1' = 'black')

n_plot_9 <- n9 %>% ggplot(aes(x = Sample, y = Mean)) +
  geom_point(aes(color = Capture)) +
  geom_errorbar(aes(ymin = lower, ymax = upper, color = Capture)) +
  scale_color_manual(values = colorset) +
  coord_flip() +
  geom_hline(yintercept = 50000, linetype = "dashed", color = "blue") +
  labs(title = "Confidence Intervals, n = 9") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylim(40000, 60000)

n_plot_25 <- n25 %>% ggplot(aes(x = Sample, y = Mean)) +
  geom_point(aes(color = Capture)) +
  geom_errorbar(aes(ymin = lower, ymax = upper, color = Capture)) +
  scale_color_manual(values = colorset) +
  coord_flip() +
  geom_hline(yintercept = 50000, linetype = "dashed", color = "blue") +
  labs(title = "Confidence Intervals, n = 25") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylim(40000, 60000)

n_plot_100 <- n100 %>% ggplot(aes(x = Sample, y = Mean)) +
  geom_point(aes(color = Capture)) +
  geom_errorbar(aes(ymin = lower, ymax = upper, color = Capture)) +
  scale_color_manual(values = colorset) +
  coord_flip() +
  geom_hline(yintercept = 50000, linetype = "dashed", color = "blue") +
  labs(title = "Confidence Intervals, n = 100") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylim(40000, 60000)

# Arrange the three plots side by side
grid.arrange(n_plot_9, n_plot_25, n_plot_100, ncol = 3)
