This simulation is trying to answer “how many trials do you need of response time data before the mean stabilises?” By stable we mean the point at which adding more trials to the average produces diminishing returns.

This is being addressed using the bootstrapping method first published by Schonbrodt and Perugini, and more recently by Hehman and colleagues. Adapted to response times, the procedure is as follows:

Below is an example of one repetition with n_sim = 1000 (but only traces of 10 simulations are shown in sky blue). The POS in this repetition was 119.

Functions

Below are the functions for each of the main chunks of the simulation.

Sampling from an ex-Gaussian distribution

This function has the following parameters:

  • n: How many RTs to produce
  • mu: Mu component of ex-Gaussian distribution
  • sigma: Sigma component of ex-Gaussian distribution
  • tau: Tau component of ex-Gaussian distribution
### This function samples n trials from a ex-Gaussian distribution
### and then computes & returns the average of this sample

r_exgauss <- function(n, mu, sigma, tau){
  
  # get the RTs. The "abs" is required to ensure positive return
  rt <- abs(rnorm(n, mu, sigma) + rexp(n, 1/tau))
  
  # calculate & return the mean
  return(mean(rt))
  
}

Calculating one point of stability

This function has the following parameters:

  • n_sims: Number of simulations (i.e., individual blue lines on the above graph).
  • max_trials: The maximum number of trials to explore. In the above graph this was set to 250, but should typically be larger (about 500).
  • cos_criterion: The width (in percentage) of the corridor of stability. This was set to 10 in the above graph.
  • stable_thresh: How many trials must the 95% CI remain in the COS before we consider it “stable”? In the above graph this was set to 10.
  • mu: Mu component of ex-Gaussian distribution
  • sigma: Sigma component of ex-Gaussian distribution
  • tau: Tau component of ex-Gaussian distribution

Note that this function uses a nested for loop, which is supposed to be very slow in R. However, in the comments in this function I have proposed a vectorised solution; in theory this should speed the computation, but in practice it doesn’t! Extensive testing shows the for loop and vectorised components produce the same running time.

### this code runs one simulation (i.e., produces one POS from n_sims)
do_simulation <- function(n_sims, max_trials, cos_criterion, stable_thresh,
                          mu, sigma, tau){
  
  # simulate the data itself
  sim_results <- matrix(0, nrow = n_sims, ncol = max_trials)
  for(i in 1:n_sims){
    for(j in 1:max_trials){
      sim_results[i, j] <- r_exgauss(j, mu, sigma, tau)
    }
  }
  
  # # apply version to remove outer i loop 
  # # If you use this,  make sure the remaining operations work on rows
  # # (they are currently set to work on columns)
  # sim_results <- matrix(1:max_trials, nrow = max_trials, ncol = n_sims)
  # sim_results <- apply(sim_results, 1:2, r_exgauss, mu = mu, sigma = sigma,
  #                      tau = tau)
  
 
  # what is the final mean of the sample?
  final_mean <- mean(sim_results[, max_trials])
  
  # calculate 95% range of means at each N
  range_upper <- apply(sim_results, 2, quantile, probs = c(0.975))
  range_lower <- apply(sim_results, 2, quantile, probs = c(0.025))
  
  # calculate COS based on cos_criterion
  cos_width <- 0.5 * ((final_mean / 100) * cos_criterion)
  cos_upper <- final_mean + cos_width
  cos_lower <- final_mean - cos_width
  
  ## find the point at which the COS is breached & maintained by the 95% interval
  
  # which RTs are in the COS?
  in_cos <- range_lower > cos_lower
  
  # have a sliding window through the RTs and find the point that the 
  # stable_thresh is reached
  stable <- logical()
  offset <- stable_thresh - 1
  len <- length(in_cos) - offset
  
  for(i in 1:len){
    j <- i + offset
    stable[i] <- all(in_cos[i:j])
  }
  
  # extract the point of stability. If "any(stable) == TRUE" returns FALSE, 
  # just store max_trials as POS (this means stability was not obtained)
  if(any(stable) == TRUE){
    pos <- min(which(stable == TRUE))
  } else {
    pos <- max_trials
  }
  
  return(pos)
  
}

Main parent function call

This is the function that runs the show by calling the above functions. Running the above do_simulation function produces just one POS. Ideally, we would like to repeat this many times to get a stable estimate of the mean POS for a set of ex-Gaussian parameters.

The parameters to this parent function are the same as the above. The only new parameter is: * n_repetitions: How many POSs to calulate (i.e., how many times to repeat all of the above)

### run the simulation
do_repetitions <- function(n_repetitions, n_sims, max_trials, cos_criterion,
                           stable_thresh, mu, sigma, tau){

  
  # data container to store POS from each simulation
  repetition_data <- numeric(n_repetitions)
  
  # loop over each repetition required & run a simulation. Store the
  # resulting POS for each simulation. 
  for(i in 1:n_repetitions){
    
      
    # print the current repetition to show progress to the user
    print(i)
    
    # perform the sim
    repetition_data[i] <- do_simulation(n_sims, max_trials, cos_criterion, 
                                        stable_thresh, mu, sigma, tau)
    

  }

  return(repetition_data)

}

Example Run of the Whole Simulation

Note that below I have set the number of repetitions (n_repetitions) to 5 just to show how slow the whole process is without you having to wait an eternity for it. In reality, I need n_repetitions to be closer to 1000.

On my machine, one repetition takes 25 seconds. Therefore, 1000 repetitions will take about 7 hours. The full simulation (not shown here) intends to explore about 64 different ex-Gaussian parameter settings, meaning the total time of the whole thing will be about 18 days!

#--- declare some parameters

# set seed for reproducibility
set.seed(42)

# maximum number of trials for point of stability check
max_trials <- 500

# how many repetitions of the whole thing 
n_repetitions = 5

# how many simulations per repetition
n_sims = 1000

# declare the ex-Gaussian parameters
mu <- 500
sigma <- 120
tau <- 120

# establish COS criterion & stabiity threshold
cos_criterion <- 10
stable_thresh <- 10

# run the simulation (& time it for demonstration only)
system.time(
sim_results <- do_repetitions(n_repetitions = n_repetitions, n_sims = n_sims,
                              max_trials = max_trials, 
                              cos_criterion = cos_criterion,
                              stable_thresh = stable_thresh, 
                              mu = mu, sigma = sigma, tau = tau)
)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
##    user  system elapsed 
##  126.27    0.05  127.78