Using Bayes factor to examine group differences

Use of the Bayes factor (BF) has become more popular in cognitive training studies during the recent years. It allows researchers to examine the likelihood of their data both under null hypothesis (e.g., no effect following an experimental manipulation) and the alternative hypothesis (e.g., an effect following an experimental manipulation). In this project I will demonstrate how to conduct Bayesian inference in an experimental study design. For those who are interested to take a look on the whole code related to this project, see https://osf.io/q7xe2/.

In the present project, we employed a randomized controlled trial including three groups that received different intervention programs. One group received cognitive training with a strategy instruction (n = 73), one group received cognitive training without strategy instruction (n = 118) and one group served as a Passive control group that didn’t receive any training (n = 67). In such a setup, it’s important to make sure that none of the intervention group does not differ in any key-dependent variables before intervention. here, I will demonstrate how you can examine whether the three groups differed with respect to age, gender distribution, and eductaion length using Bayesian inference.

For computing the BFs, I have been using the “BayesFactor” package developed by Richard Morey. It allows one compute the BFs easily with fairly minimal effort. I also use “Tidyverse” for various preprocessing tasks before conducting the primariy analyses.

library(BayesFactor)
library(tidyverse)

Next, I load the dataset used in the analyses

alldata <- read_excel("C:/Users/dafe0017/Dropbox/R training/wmt/df_test/alldata.xlsx")

I also need to transform the group variable used as the independent variable to a factor. Otherwise, R will retrieve an error in the BF analysis.

alldata <- alldata %>%
  mutate(group = as.factor(group))

First, I create a function for retrieving the bayes factors in a data frame format the data output

# Create function for generating Demographic BFs in a dataframe format
BaselineDemographicsBFs <- function(bf) {
  #BFs
  BaselineBF1 <- as.data.frame(bf[[1]])[[1]] #ST vs. CT
  BaselineBF2 <- as.data.frame(bf[[2]])[[1]] #ST vs. PC
  BaselineBF3 <- as.data.frame(bf[[3]])[[1]] #CT vs. PC
  
  #Bind rows
  BaselineDemBFs <- as.data.frame(rbind(BaselineBF1, BaselineBF2, BaselineBF3))%>%
    rename(BF = V1)
  return(BaselineDemBFs)
}

Next, I create a function for computing the BFs between each group comparison. Note also that I compute separate BF functions for each of the three background variables. Here, I compute indepndent-sample One-Way Anovas between each group using a scaling factor r = .707 which is a typical scaling factor in the Bayesian approach. Note that strategy refers to the training group that received a strategy instruction, whereas active refers to the training group that reveived training without strategy instruction. passive refers to the group that didn’t receive any training.

# Get BF:s for age differences at baseline
getbaseline_age_bf <- function(data) {
  # ST vs PC
  ST.CT = anovaBF(age ~ group,  rscaleFixed = 0.707, 
                  data = as.data.frame(subset(data, group != "passive")))
  # ST vs PC
  ST.PC = anovaBF(age ~ group, rscaleFixed = 0.707, 
                  data = as.data.frame(subset(data, group != "active")))
  
  # CT vs PC
  CT.PC = anovaBF(age ~ group, rscaleFixed = 0.707, 
                  data = as.data.frame(subset(data, group != "strategy")))
  
  #Retrieve BFs
  bf <- list(ST.CT, ST.PC, CT.PC)
  
  return(bf)
}

# Get BF:s for education differences at baseline
getbaseline_education_bf <- function(data) {
  # ST vs PC
  ST.CT = anovaBF(education_years~ group,  rscaleFixed = 0.707, 
                  data = as.data.frame(subset(data, group != "passive")))
  # ST vs PC
  ST.PC = anovaBF(education_years ~ group, rscaleFixed = 0.707, 
                  data = as.data.frame(subset(data, group != "active")))
  
  # CT vs PC
  CT.PC = anovaBF(education_years ~ group, rscaleFixed = 0.707, 
                  data = as.data.frame(subset(data, group != "strategy")))
  
  #Retrieve BFs
  bf <- list(ST.CT, ST.PC, CT.PC)
  
  return(bf)
}

# Get BF:s for gender distribution
getbaseline_gender_bf <- function(data) {
  # ST vs CT
  gender.ST.CT <- subset(data, group != "passive", select = c("group", "gender"))%>%
    droplevels()
  
  gender.ST.CT <- with(gender.ST.CT, table(group, gender))
  gender.ST.CT.bf <- contingencyTableBF(gender.ST.CT, sampleType = "indepMulti",
                                        fixedMargin = "cols", rscaleFixed = 0.707)
  
  # ST vs CT
  gender.ST.PC <- subset(data, group != "active", select = c("group", "gender"))%>%
    droplevels()
  
  gender.ST.PC <- with(gender.ST.PC, table(group, gender))
  gender.ST.PC.bf <- contingencyTableBF(gender.ST.PC, sampleType = "indepMulti",
                                        fixedMargin = "cols", rscaleFixed = 0.707)
  
  # CT vs PC
  gender.CT.PC <- subset(data, group != "strategy", select = c("group", "gender"))%>%
    droplevels()
  gender.CT.PC <- with(gender.CT.PC, table(group, gender))
  gender.CT.PC.bf <- contingencyTableBF(gender.CT.PC, sampleType = "indepMulti",
                                        fixedMargin = "cols", rscaleFixed = 0.707)
  
  
  #Retrieve BFs
  bf <- list(gender.ST.CT.bf, gender.ST.PC.bf, gender.CT.PC.bf)
  
  return(bf)
}

Now we can compute the BFs for each background variable using only a few code snippets.

# Gender
baseline_gender_bf <- BaselineDemographicsBFs(getbaseline_gender_bf(alldata))%>%
  rename(Gender_BFs = BF)

# Age
baseline_age_bf <- BaselineDemographicsBFs(getbaseline_age_bf(alldata))%>%
  rename(Age_BFs = BF)

# Education
baseline_education_bf <- BaselineDemographicsBFs(getbaseline_education_bf(alldata))%>%
  rename(Education_BFs = BF)

Finally, we can retrieve the output. I typically like to retrieve the output in a table format. Here is an example how you can do that.

BackgrBFs <- bind_cols(baseline_gender_bf, baseline_age_bf, baseline_education_bf)%>%
  round(2)%>%
  mutate(agenull = ifelse(Age_BFs < 1, 1/Age_BFs , NA))%>%
  mutate(Age_BFs = ifelse(Age_BFs > 100, "> 100", Age_BFs ))%>%
  mutate(agenull = round(agenull, 2))%>%
  mutate(agenull = ifelse(agenull > 0, paste("1", agenull, sep = "/"), NA))%>%
  mutate(Age_BFs  = ifelse(!is.na(agenull), agenull, Age_BFs ))%>%
  mutate(gendernull = ifelse(Gender_BFs < 1, 1/Gender_BFs , NA))%>%
  mutate(Gender_BFs = ifelse(Gender_BFs > 100, "> 100", Gender_BFs ))%>%
  mutate(gendernull = round(gendernull, 2))%>%
  mutate(gendernull = ifelse(gendernull > 0, paste("1", gendernull, sep = "/"), NA))%>%
  mutate(Gender_BFs  = ifelse(!is.na(gendernull), gendernull, Age_BFs ))%>%
  mutate(edunull = ifelse(Education_BFs < 1, 1/Education_BFs, NA))%>%
  mutate(Education_BFs= ifelse(Gender_BFs > 100, "> 100", Education_BFs))%>%
  mutate(edunull = round(edunull, 2))%>%
  mutate(edunull = ifelse(edunull > 0, paste("1", edunull, sep = "/"), NA))%>%
  mutate(Education_BFs  = ifelse(!is.na(edunull), edunull, Education_BFs))%>%
  mutate(comparison = c("ST vs. CT", "ST vs. PC", "CT vs. PC"))%>%
  select(comparison, Age_BFs, Gender_BFs, Education_BFs)

As you can see below, there doesn’t seem to be any differences in the three background variables between the groups.

print(BackgrBFs)
##   comparison Age_BFs Gender_BFs Education_BFs
## 1  ST vs. CT  1/2.27        1/5        1/8.33
## 2  ST vs. PC  1/5.26     1/3.03        1/4.76
## 3  CT vs. PC  1/6.67     1/4.55        1/5.56