Simple simulation with a t-test (ECPR 2014)

We would like to run nsims of simulations and in each, we will generate two datasets with different means and same variance (or standard deviation). For each simulation, we will generate n random numbers (following a normal distribution).

nsims <- 100 # number of simulation
n <- 30 # number of points per simulation

In each iteration, we will run a t-test on the two generated sets and will obtain the 95% confidence interval (two numbers, low and high). Our result will thus be stored in a data.frame or a matrix with two columns (for low and high) and nsims number of rows.

final.result <- matrix(NA, ncol = 2, nrow = nsims)
head(final.result)

##      [,1] [,2]
## [1,]   NA   NA
## [2,]   NA   NA
## [3,]   NA   NA
## [4,]   NA   NA
## [5,]   NA   NA
## [6,]   NA   NA

dim(final.result)

## [1] 100   2

The algorithm goes like this. For each iteration i of the simulation:

generate sample of mean 10 and sd 5 from a normal distribution
generate sample of mean 15 and sd 5 from a normal distribution
run a t-test and on the two samples (use defaults)
obtain 95% confidence intervals and save it to the i-th row of final.result
(optional) remove created objects

for (i in 1:nsims) {
  first.sample <- rnorm(n, mean = 10, sd = 5)
  second.sample <- rnorm(n, mean = 15, sd = 5)
  #   par(mfrow = c(1, 2))
  #   hist(first.sample)
  #   hist(second.sample)
  result <- t.test(first.sample, second.sample)
  final.result[i, ] <- result$conf.int
  rm(result, first.sample, second.sample)
}

To reiterate, thein following chunk we use two equivalent ways to fill a matrix of values, by hand and using a for loop.

fr <- matrix(NA, nrow = 5, ncol = 2)
for (i in 1:5) {
  fr[i, ] <- runif(2)
}

# by hand
fr[1, ] <- runif(2)
fr[2, ] <- runif(2)
fr[3, ] <- runif(2)
fr[4, ] <- runif(2)
fr[5, ] <- runif(2)

We convert the matrix final.result into a data.frame which is what ggplot2 expects. We also give the data.frame more informative names and add a column (or variable) called sn (simulation number). See why below.

final.result <- as.data.frame(final.result)
names(final.result) <- c("lowCI", "highCI")
head(final.result)

##    lowCI highCI
## 1 -6.117 -1.363
## 2 -7.760 -2.563
## 3 -8.406 -2.482
## 4 -6.604 -2.055
## 5 -8.882 -3.626
## 6 -9.168 -2.915

final.result$sn <- 1:nrow(final.result) # sn = simulation number

library(ggplot2)

ggplot(final.result) + #initialize plot
  theme_bw() +         # black and white theme
  geom_linerange(aes(x = sn, ymin = lowCI, ymax = highCI)) + # add line ranges
  geom_hline(y = 0, linetype = "dashed") # add dashed line from y = 0

plot of chunk plotting

cz <- final.result$lowCI < 0 & final.result$highCI > 0

We have recorded 1 in 100 (1%) confidence intervals that overlap 0.

Simple simulation with a t-test (ECPR 2014)

Roman Lustrik (roman.lustrik atsign biolitika.si)

Saturday, July 26, 2014