We would like to run nsims of simulations and in each, we will generate two datasets with different means and same variance (or standard deviation). For each simulation, we will generate n random numbers (following a normal distribution).
nsims <- 100 # number of simulation
n <- 30 # number of points per simulation
In each iteration, we will run a t-test on the two generated sets and will obtain the 95% confidence interval (two numbers, low and high). Our result will thus be stored in a data.frame or a matrix with two columns (for low and high) and nsims number of rows.
final.result <- matrix(NA, ncol = 2, nrow = nsims)
head(final.result)
## [,1] [,2]
## [1,] NA NA
## [2,] NA NA
## [3,] NA NA
## [4,] NA NA
## [5,] NA NA
## [6,] NA NA
dim(final.result)
## [1] 100 2
The algorithm goes like this. For each iteration i of the simulation:
i-th row of final.resultfor (i in 1:nsims) {
first.sample <- rnorm(n, mean = 10, sd = 5)
second.sample <- rnorm(n, mean = 15, sd = 5)
# par(mfrow = c(1, 2))
# hist(first.sample)
# hist(second.sample)
result <- t.test(first.sample, second.sample)
final.result[i, ] <- result$conf.int
rm(result, first.sample, second.sample)
}
To reiterate, thein following chunk we use two equivalent ways to fill a matrix of values, by hand and using a for loop.
fr <- matrix(NA, nrow = 5, ncol = 2)
for (i in 1:5) {
fr[i, ] <- runif(2)
}
# by hand
fr[1, ] <- runif(2)
fr[2, ] <- runif(2)
fr[3, ] <- runif(2)
fr[4, ] <- runif(2)
fr[5, ] <- runif(2)
We convert the matrix final.result into a data.frame which is what ggplot2 expects. We also give the data.frame more informative names and add a column (or variable) called sn (simulation number). See why below.
final.result <- as.data.frame(final.result)
names(final.result) <- c("lowCI", "highCI")
head(final.result)
## lowCI highCI
## 1 -6.117 -1.363
## 2 -7.760 -2.563
## 3 -8.406 -2.482
## 4 -6.604 -2.055
## 5 -8.882 -3.626
## 6 -9.168 -2.915
final.result$sn <- 1:nrow(final.result) # sn = simulation number
library(ggplot2)
ggplot(final.result) + #initialize plot
theme_bw() + # black and white theme
geom_linerange(aes(x = sn, ymin = lowCI, ymax = highCI)) + # add line ranges
geom_hline(y = 0, linetype = "dashed") # add dashed line from y = 0
cz <- final.result$lowCI < 0 & final.result$highCI > 0
We have recorded 1 in 100 (1%) confidence intervals that overlap 0.