Function to Generate Artificial Data

The following code allows us to generate data from a multivariate normal distribution with a known correlation between the variables. So, in our case, we would like to generate artificial data from two distributions with a null correlation (i.e., \(r\) = 0).

mv_rnorm <- function(n, p, u, s, cor) {
  Z <- matrix(rnorm(n * p), p, n)
  t(u + s * t(chol(cor)) %*% Z)
}

In this function, \(n\) is the number of data points we want to generate (in our case, this is the number of participants in the simulated experiment), \(p\) is the number of variables for each participant (in our case, \(p\) = 2), \(u\) is the means for each variable (as we are using z scores, \(u\) would be zero for each variable), \(s\) is the standard deviation of each variable (1 for each variable), and \(cor\) is the correlation matrix we would like the two variables to have across all participants.

Let’s see how this works:

n <- 100
p <- 2
u <- c(0, 0)
s <- c(1, 1)
cor <- 0

# establish the correlation matrix
cor <- matrix(c(1, cor, 
              cor, 1), nrow = 2, ncol = 2, byrow = TRUE)

Let’s look at the correlation matrix:

##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1

Now let’s generate some data and look at it:

# set the random seed so the results are reproducible
set.seed(42)

# generate the data
data <- mv_rnorm(n, p, u, s, cor)
head(data)
##           [,1]        [,2]
## [1,] 1.3709584 -0.56469817
## [2,] 0.3631284  0.63286260
## [3,] 0.4042683 -0.10612452
## [4,] 1.5115220 -0.09465904
## [5,] 2.0184237 -0.06271410
## [6,] 1.3048697  2.28664539

So, we generated data for 100 participants on two variables, each with a mean of zero and a standard deviation of one. We wanted the correlation between the two to be zero. Due to sampling error, this won’t always be exact, but let’s see whether this was the case:

# get the mean of the first and second column. Each should be around zero
apply(data, MARGIN = 2, FUN = mean)
## [1] -0.058966660  0.003997769
# get the SD of each column. They should be around one
apply(data, MARGIN = 2, FUN = sd)
## [1] 0.9600414 0.9927461
# get the correlation between each column. This should be around zero
observed_correlation <- cor(data[, 1], data[, 2])
observed_correlation
## [1] -0.02980491

Now let’s apply the Bayes Factor to this data:

# first, let's copy the BF function from the Wetzels and Wagenmakers (2012) paper
jzs_corbf <- function(r, n){
  
  int <- function(r, n, g){
    (1 + g) ^ ((n - 2) / 2) * (1 + (1 - r ^ 2) * g) ^ (-(n - 1) / 2) *
      g ^ (-3 / 2) * exp(-n / (2 * g))
  }
  
  bf10 <- sqrt((n / 2)) / gamma(1 / 2) * integrate(int, lower = 0, upper = Inf, 
                                                   r = r, n = n)$value
  return(bf10)
}

# now calculate the BF for the observed correlation
bf_cor <- jzs_corbf(r = observed_correlation, n = nrow(data))
bf_cor
## [1] 0.0824977