This is an RMarkdown document displaying R code for generating and visualizing multiple simulated confidence intervals. It was created to demonstrate how confidence intervals differ by confidence level and sample size. Confidence intervals are color-coded to show whether or not each interval contains the true parameter value.
Sample means are simulated from a normal distribution centered on the assumed true mean of $50,000.
The first block of code loads the required packages and simulates 50 sample means with a standard deviation of 1,000, stored in a one-column data frame:
```r
library(dplyr)
library(ggplot2)
library(gridExtra)

# Simulate 50 sample means: true mean 50,000, standard error 1,000
meanset <- data.frame(Mean = rnorm(50, mean = 50000, sd = 1000))
```
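The standard deviation of 1,000 passed to `rnorm()` plays the role of the standard error of the mean. As a minimal sketch of where such a value could come from, the following assumes a hypothetical population standard deviation of 5,000 and a sample size of 25 (neither is stated in this document) and checks that the means of raw samples vary by roughly sigma / sqrt(n).

```r
# Hypothetical values, chosen only so that sigma / sqrt(n) equals 1000
set.seed(1)
mu <- 50000
sigma <- 5000
n <- 25

# Draw 2000 raw samples of size n and keep each sample mean
sim_means <- replicate(2000, mean(rnorm(n, mean = mu, sd = sigma)))

theoretical_se <- sigma / sqrt(n)  # standard error implied by sigma and n
empirical_se <- sd(sim_means)      # spread of the simulated means

c(theoretical = theoretical_se, empirical = empirical_se)
```

Simulating the means directly, as the document does, skips the raw samples and goes straight to this sampling distribution.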
Upper and lower bounds are calculated for 90%, 95%, and 99% confidence intervals by adding and subtracting a fixed margin of error from each simulated mean. A variable identifying each sample is also created and combined with the bounds to build one data frame per confidence level.
```r
# Margins of error for the 90%, 95%, and 99% intervals
meanset90 <- meanset %>% mutate(upper = Mean + 1677, lower = Mean - 1677)
meanset95 <- meanset %>% mutate(upper = Mean + 2010, lower = Mean - 2010)
meanset99 <- meanset %>% mutate(upper = Mean + 2680, lower = Mean - 2680)

# Sample identifier (1 to 50), used as the plotting position
Sample <- data.frame(Sample = seq(1, 50, 1))

ci90 <- cbind(Sample, meanset90)
ci95 <- cbind(Sample, meanset95)
ci99 <- cbind(Sample, meanset99)
```
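The margins of error used above (1677, 2010, and 2680) are not derived in the code. They appear consistent with t critical values on 49 degrees of freedom multiplied by the standard error of 1,000; that reading is an inference from the numbers, not something the document states. A short sketch under that assumption:

```r
se <- 1000   # standard error used in rnorm() above
df <- 49     # assumed degrees of freedom (not stated in the document)
conf_levels <- c(0.90, 0.95, 0.99)

# Two-sided critical value: the upper tail area is (1 - level) / 2
t_crit <- qt(1 - (1 - conf_levels) / 2, df = df)
margins <- round(t_crit * se)

data.frame(level = conf_levels, t_crit = round(t_crit, 3), margin = margins)
```

Computing the margins this way, rather than typing them in, would make it easy to try other confidence levels.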
For each confidence level, an indicator variable is created to track whether each confidence interval contains the true parameter value (a mean income of $50,000).
```r
# Capture = 1 when the interval contains the true mean of 50,000, otherwise 0
ci90 <- ci90 %>% mutate(Capture = ifelse(lower < 50000 & upper > 50000, 1, 0))
ci95 <- ci95 %>% mutate(Capture = ifelse(lower < 50000 & upper > 50000, 1, 0))
ci99 <- ci99 %>% mutate(Capture = ifelse(lower < 50000 & upper > 50000, 1, 0))

# Store as a factor so both levels can be mapped to colors in the plots
ci90$Capture <- factor(ci90$Capture, levels = c(0, 1))
ci95$Capture <- factor(ci95$Capture, levels = c(0, 1))
ci99$Capture <- factor(ci99$Capture, levels = c(0, 1))

head(ci90)
```
```
##   Sample     Mean    upper    lower Capture
## 1      1 51615.75 53292.75 49938.75       1
## 2      2 51196.33 52873.33 49519.33       1
## 3      3 50354.89 52031.89 48677.89       1
## 4      4 49284.53 50961.53 47607.53       1
## 5      5 50678.65 52355.65 49001.65       1
## 6      6 49779.49 51456.49 48102.49       1
```
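The `Capture` indicator can also be summarized numerically. The sketch below is self-contained rather than reusing `ci90`: it rebuilds the three sets of intervals from one set of simulated means (with an arbitrary seed) and reports the share of intervals that contain 50,000. The helper `capture_rate()` is a hypothetical name introduced here for illustration.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
true_mean <- 50000
means <- rnorm(50, mean = true_mean, sd = 1000)

# Proportion of intervals (mean +/- margin) that contain the true mean
capture_rate <- function(means, margin, mu) {
  mean(means - margin < mu & means + margin > mu)
}

margins <- c("90%" = 1677, "95%" = 2010, "99%" = 2680)
rates <- sapply(margins, capture_rate, means = means, mu = true_mean)
rates
```

With only 50 intervals per level, the observed shares will fluctuate around the nominal levels from run to run.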
Error bar plots are created for each of the three confidence levels. Each point estimate is drawn as a point, and the interval estimate is drawn as an error bar around it. A dashed blue vertical line marks the true parameter value. Intervals drawn in red fail to capture the true parameter value. Together, the plots show how the share of intervals that capture the true value changes with the confidence level.
```r
# Red = interval misses the true mean, black = interval captures it
colorset <- c('0' = 'red', '1' = 'black')

ci_plot_90 <- ci90 %>%
  ggplot(aes(x = Sample, y = Mean)) +
  geom_point(aes(color = Capture)) +
  geom_errorbar(aes(ymin = lower, ymax = upper, color = Capture)) +
  scale_color_manual(values = colorset) +
  coord_flip() +
  geom_hline(yintercept = 50000, linetype = "dashed", color = "blue") +
  labs(title = "90% Confidence Intervals") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylim(40000, 60000)

ci_plot_95 <- ci95 %>%
  ggplot(aes(x = Sample, y = Mean)) +
  geom_point(aes(color = Capture)) +
  geom_errorbar(aes(ymin = lower, ymax = upper, color = Capture)) +
  scale_color_manual(values = colorset) +
  coord_flip() +
  geom_hline(yintercept = 50000, linetype = "dashed", color = "blue") +
  labs(title = "95% Confidence Intervals") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylim(40000, 60000)

ci_plot_99 <- ci99 %>%
  ggplot(aes(x = Sample, y = Mean)) +
  geom_point(aes(color = Capture)) +
  geom_errorbar(aes(ymin = lower, ymax = upper, color = Capture)) +
  scale_color_manual(values = colorset) +
  coord_flip() +
  geom_hline(yintercept = 50000, linetype = "dashed", color = "blue") +
  labs(title = "99% Confidence Intervals") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylim(40000, 60000)

# Show the three panels side by side
grid.arrange(ci_plot_90, ci_plot_95, ci_plot_99, ncol = 3)
```
The same simulation and data preparation steps are repeated, this time varying the sample size (n = 9, 25, and 100) to create three confidence interval scenarios.
```r
# Simulate 50 sample means per sample size; the standard error shrinks as n grows
meanset_9   <- data.frame(Mean = rnorm(50, mean = 50000, sd = 1333))
meanset_25  <- data.frame(Mean = rnorm(50, mean = 50000, sd = 1000))
meanset_100 <- data.frame(Mean = rnorm(50, mean = 50000, sd = 500))

# Interval bounds, using a separate margin of error for each sample size
meanset_9   <- meanset_9   %>% mutate(upper = Mean + 3075, lower = Mean - 3075)
meanset_25  <- meanset_25  %>% mutate(upper = Mean + 2064, lower = Mean - 2064)
meanset_100 <- meanset_100 %>% mutate(upper = Mean + 992,  lower = Mean - 992)

# Sample identifier (1 to 50), used as the plotting position
Sample <- data.frame(Sample = seq(1, 50, 1))

n9   <- cbind(Sample, meanset_9)
n25  <- cbind(Sample, meanset_25)
n100 <- cbind(Sample, meanset_100)

# Capture = 1 when the interval contains the true mean of 50,000, otherwise 0
n9   <- n9   %>% mutate(Capture = ifelse(lower < 50000 & upper > 50000, 1, 0))
n25  <- n25  %>% mutate(Capture = ifelse(lower < 50000 & upper > 50000, 1, 0))
n100 <- n100 %>% mutate(Capture = ifelse(lower < 50000 & upper > 50000, 1, 0))

# Store as a factor so both levels can be mapped to colors in the plots
n9$Capture   <- factor(n9$Capture,   levels = c(0, 1))
n25$Capture  <- factor(n25$Capture,  levels = c(0, 1))
n100$Capture <- factor(n100$Capture, levels = c(0, 1))

head(n25)
```
```
##   Sample     Mean    upper    lower Capture
## 1      1 51174.91 53238.91 49110.91       1
## 2      2 50069.86 52133.86 48005.86       1
## 3      3 50928.97 52992.97 48864.97       1
## 4      4 50483.16 52547.16 48419.16       1
## 5      5 49128.56 51192.56 47064.56       1
## 6      6 51325.77 53389.77 49261.77       1
```
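The margins in this second simulation (3075, 2064, and 992) appear consistent with 95% t critical values on n - 1 degrees of freedom multiplied by the standard errors passed to `rnorm()`. This is an inference from the numbers, not something the document states. A sketch under that assumption:

```r
n  <- c(9, 25, 100)
se <- c(1333, 1000, 500)  # standard errors used in rnorm() above

# Assumed: 95% two-sided t interval with n - 1 degrees of freedom
t_crit <- qt(0.975, df = n - 1)
margins <- t_crit * se

data.frame(n = n, t_crit = round(t_crit, 3), margin = round(margins))
```

Both the critical value and the standard error shrink as n grows, which is why the intervals narrow from n = 9 to n = 100. The first margin comes out about 1 below 3075, a gap that rounding of the standard error could account for.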
The same plots are generated for this set of simulated confidence intervals. Together, they show how the intervals, and in particular their widths, change with the sample size.
```r
# Red = interval misses the true mean, black = interval captures it
colorset <- c('0' = 'red', '1' = 'black')

n_plot_9 <- n9 %>%
  ggplot(aes(x = Sample, y = Mean)) +
  geom_point(aes(color = Capture)) +
  geom_errorbar(aes(ymin = lower, ymax = upper, color = Capture)) +
  scale_color_manual(values = colorset) +
  coord_flip() +
  geom_hline(yintercept = 50000, linetype = "dashed", color = "blue") +
  labs(title = "Confidence Intervals, n = 9") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylim(40000, 60000)

n_plot_25 <- n25 %>%
  ggplot(aes(x = Sample, y = Mean)) +
  geom_point(aes(color = Capture)) +
  geom_errorbar(aes(ymin = lower, ymax = upper, color = Capture)) +
  scale_color_manual(values = colorset) +
  coord_flip() +
  geom_hline(yintercept = 50000, linetype = "dashed", color = "blue") +
  labs(title = "Confidence Intervals, n = 25") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylim(40000, 60000)

n_plot_100 <- n100 %>%
  ggplot(aes(x = Sample, y = Mean)) +
  geom_point(aes(color = Capture)) +
  geom_errorbar(aes(ymin = lower, ymax = upper, color = Capture)) +
  scale_color_manual(values = colorset) +
  coord_flip() +
  geom_hline(yintercept = 50000, linetype = "dashed", color = "blue") +
  labs(title = "Confidence Intervals, n = 100") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylim(40000, 60000)

# Show the three panels side by side
grid.arrange(n_plot_9, n_plot_25, n_plot_100, ncol = 3)
```