Introduction

We plan to use Bayes Factors as our inferential statistic. Unlike frequentist p values, this statistic allows us to evaluate both evidence for an effect (H1) and for the null. We plan to use an optional stopping procedure where we collect data in batches and until we have reached pre-set evidence thresholds to conclude evidence or for the null. (See Dienes 2016 for discussion of why optional stopping rules do not present the same problem for Bayes factors as p values since we stop when the evidence reaches a threshold in either direction). Our goal here is to establish a maximum sample size at which we will stop collecting data regardless of the status of the evidence. This should be determined such that we have a good chance of obtaining non-ambiguous evidence, whilst balancing this against feasibility.

Eye-tracking Simulation

Simulation was used to explore the relationship between sample size and evidence for H1/H0, focusing on our key hypothesis that there would be an interaction between condition (control vs experimental) and gender matchtype (match and mismatch). The dependent variable used in the model was calculated by finding the difference between correct logit transformed proportion of looks to the target versus the foil, with positive values representing more looks to the target than the foil region. For different values of N (total number of participants, split evenly between the control and experimental groups) we simulated 1000 data sets (140 responses per gender matchtype for each condition) using models in which (1) a parameter set where H1 was true and (2) a parameter set where H0 was true:

H1 is true:

We generated responses using a model with a grandmean of 1.383, a within participants factor of gender match type (two levels) with an effect of 0.279, one between participants factor of condition (two levels) with an effect of 0.056, and a gender matchtype by condition interaction effect of 0.296. We also generated a residual term for each participant based on the standard mean of the residual error term. These values were extracted from a model run over the pilot data. There were random intercepts for participant and a random by participant slope for gender matchtype (intercept: sd = 0.506, slope for gender matchtype =0.133 , correlation = 0.576). The random effects were taken from a model run over a subset of the pilot data, as a model run over the full dataset did not converge with a maximal random slopes structure.

H0 is true:

Responses were generated based on a model identical to the previous model except that the interaction between gender matchtype and condition was set to 0.

Code to Run Simulation

clear R and load libaries

rm(list = ls())

# for lmer
library(lme4)
## Loading required package: Matrix
library(lmerTest)
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
# for  inv.logit function
library(boot)
# MASS for generating multivariate random distributions (for correlated subject effects)
library(MASS)
library(ggplot2)
library(grid)
library(tidyverse)
## Warning: package 'stringr' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ tidyr::pack()   masks Matrix::pack()
## ✖ dplyr::select() masks MASS::select()
## ✖ tidyr::unpack() masks Matrix::unpack()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
loadFunctionsGithub <-function(listFunctions){
  
  # set Github url folders
  urlFolder <- 'https://api.github.com/repos/n400peanuts/languagelearninglab/git/trees/master?recursive=1'
  urlRaw <- 'https://raw.githubusercontent.com/n400peanuts/languagelearninglab/master/tools/'
  
  if (!require(httr)) {
    stop("httr not installed")
  } else {
    print('----Downloading. Please wait----')
  };
  
  httr::GET(urlFolder)-> req #establish connection
  stop_for_status(req) # store errors if any
  filelist <- unlist(lapply(content(req)$tree, "[", "path"), use.names = F) # list all the functions in the folder
  urlFunctions <- grep("docs/tools/", filelist, value = TRUE, fixed = TRUE) # select the ones stored in the "docs" folder
  gsub("docs/tools/", "", urlFunctions) -> functions # remove unwanted strings
  
  for(x in listFunctions){
    if(!(x %in%functions)){
      stop(paste0("Could not find function: '", x,"'"))
    }
  }
  
  if (length(listFunctions) == 0){ #load all
    for (i in 1:length(functions)){
      httr::GET(paste0(urlRaw, functions[i]), ssl.verifypeer = FALSE)-> temp
      content(temp)->p3
      eval(parse(text = p3), envir = .GlobalEnv) #load them in the R environment
    } 
  } else {
    functions[functions %in% listFunctions]-> functionsIlike #only the one that I specified
    for (i in 1:length(functionsIlike)){
      httr::GET(paste0(urlRaw, functionsIlike[i]), ssl.verifypeer = FALSE)-> temp
      content(temp)->p3
      eval(parse(text = p3), envir = .GlobalEnv) #load them in the R environment
    }
  };
  print('----Download completed----')
}
listFunctions <- c("summarySEwithin.R", "summarySE.R","normDataWithin.R","lizCenter.R","myCenter.R","Bf.R","Bf_powercalc.R")
loadFunctionsGithub(listFunctions)
## Loading required package: httr
## [1] "----Downloading. Please wait----"
## [1] "----Download completed----"

Analyse Pilot Study Data

load("datalgd.rData")

# Run lmer on this data, then extract relevant values

model4 = lmer(datalgd ~ 1+condition.ct *gender_matchtype.ct + (gender_matchtype.ct||participant_id),
              control = lmerControl(optimizer = "bobyqa"),
              data = real_data)
## boundary (singular) fit: see help('isSingular')
summary(model4)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## datalgd ~ 1 + condition.ct * gender_matchtype.ct + (gender_matchtype.ct ||  
##     participant_id)
##    Data: real_data
## Control: lmerControl(optimizer = "bobyqa")
## 
## REML criterion at convergence: 63856.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2663 -0.6415 -0.1097  0.8181  2.5638 
## 
## Random effects:
##  Groups           Name                Variance Std.Dev.
##  participant_id   (Intercept)          0.3408  0.5838  
##  participant_id.1 gender_matchtype.ct  0.0000  0.0000  
##  Residual                             14.0287  3.7455  
## Number of obs: 11638, groups:  participant_id, 42
## 
## Fixed effects:
##                                   Estimate Std. Error        df t value
## (Intercept)                      1.383e+00  9.654e-02 3.996e+01  14.326
## condition.ct                     5.617e-02  1.933e-01 3.996e+01   0.291
## gender_matchtype.ct              2.790e-01  6.951e-02 1.160e+04   4.013
## condition.ct:gender_matchtype.ct 2.963e-01  1.392e-01 1.160e+04   2.129
##                                  Pr(>|t|)    
## (Intercept)                       < 2e-16 ***
## condition.ct                       0.7729    
## gender_matchtype.ct              6.03e-05 ***
## condition.ct:gender_matchtype.ct   0.0333 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) cndtn. gndr_.
## conditin.ct -0.004              
## gndr_mtcht.  0.000  0.002       
## cndtn.ct:_.  0.002  0.000  0.000
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
grandmean = summary(model4)$coefficients["(Intercept)", "Estimate" ]
withinCondition = summary(model4)$coefficients["gender_matchtype.ct", "Estimate" ]
betweenCondition = summary(model4)$coefficients["condition.ct", "Estimate" ]
interaction= summary(model4)$coefficients["condition.ct:gender_matchtype.ct", "Estimate" ]
load("subj_v_cov.rData")
residual= subset(as.data.frame(VarCorr(model4)), grp=="Residual")[["sdcor"]]

Create Functions

Function to Simulate Dataset

  • n_sub is the number of participants we assume in our simulation (must be even as code assumes half in each of the two conditions)
  • n_obs is the number of observations for each of the subjects (which is assumed to be the same in both conditions)
  • grandmean is grandmean
  • withinCondition is estimate of the difference between the conditions (i.e. estimate of coefficient for main effect of withincondition.ct)
  • betweenCondition is difference between the two levels of between condition (Estimate of coefficient for main effect of betweencondition.ct)
  • interaction is difference of differences (interaction between the two variables)
  • subj_v_cov is the variance co-variance matrix for subject effects. We use this to generate correlated by-participant intercept and slope
  • residual is the standard mean of the residual error term
# function to simulate dataset
generate_LINEAR_v4 <- function(n_subj, n_obs, grandmean, withinCondition,betweenCondition, interaction,subj_v_cov, subj_tau,residual) {
    # create empty data frame where number of rows = number of subjects *     number of trials per subject
    data <- data.frame(matrix(0, ncol=0, nrow = n_subj * n_obs ))
    # make participant vector and add to this data frame
    data$participant <- as.factor(rep(seq(1:n_subj), each = n_obs))
    # add within condition- assumes each participant has equal n trials in both conditions
    data$withinCondition<-as.factor(rep((rep(c(0,1), each = n_obs/2)),n_subj)) #this is within
    # make centred version
    data$withinCondition.ct <- as.numeric(data$withinCondition) - mean(as.numeric(data$withinCondition))
    # add between condition
    data$betweenCondition <- as.factor(c(rep(1, n_subj/2*n_obs), rep(0, n_subj/2*n_obs)))
    # make centred version
    data$betweenCondition.ct <- as.numeric(data$betweenCondition) - mean(as.numeric(data$betweenCondition))

    # create table with participant random intercept and participant random slope for the withincondition where the sd of each and  correlation between the two is from the variance covariance matrix for subject effects sub_v_cov matrix
    u <- mvrnorm(n = n_subj, c(0,0), subj_v_cov)

    # get the residual error- have one per row
    resid <- rnorm(n = n_subj*n_obs, 0,residual )

    # finally, generate data on the basis of these parameters we begin by generating a linear term (note eta is the same as y in the linear version of the function, except if was linear would need to add a residual)
    # = grandmean + random intercept for participants + effect of condition + random slope for condition for this participant+interaction

    data$y = resid + grandmean + u[data$participant,1]  + data$withinCondition.ct * (withinCondition +u[data$participant,2])
    + data$betweenCondition.ct * (betweenCondition) + (data$withinCondition.ct *data$betweenCondition.ct)* interaction

    data$y = grandmean +  u[data$participant,1]  + data$withinCondition.ct * (withinCondition +u[data$participant,2]) + data$betweenCondition.ct * (betweenCondition) + (data$withinCondition.ct *data$betweenCondition.ct)* interaction +resid

    return(data)

  }

Run Simulation

Ns=c(50, 100, 150, 200) # a vector the size of N we want to simulate
nsims = 1000 # number of simulations - should be large for montecarlo (1000 or more)
n = vector()
H1estimate = betweenCondition
H1condition_Bf = vector()
H1condition_pvalue = vector()
H2estimate = withinCondition
H2condition_Bf = vector()
H2condition_pvalue = vector()
H3estimate = interaction
H3condition_Bf = vector()
H3condition_pvalue = vector()

singular = vector()
convergeError = vector()
i = 1

 for (n_subj in Ns) {
   for (j in 1:nsims){

     d = generate_LINEAR_v4(n_subj, n_obs= 280, grandmean=grandmean, withinCondition=withinCondition,betweenCondition=betweenCondition, interaction=interaction, subj_v_cov = subj_v_cov, residual=residual)

     model_simulated = suppressMessages( lmer(y ~ withinCondition.ct *betweenCondition.ct + (withinCondition.ct||participant),
                            control = lmerControl(optimizer = "bobyqa"),
                            data = d))

     # collect singularity error if not duplicate of singularity warning
     singular[i] = isSingular(model_simulated)
     # collect convergence error if not duplicate of singularity warning
     convergeError[i]=summary(model_simulated)$optinfo$conv$opt# optimizer checks convergence
     #print(i)
     n[i]=n_subj
     H1condition_pvalue[i] = summary(model_simulated)$coeff[ "betweenCondition.ct","Pr(>|t|)"]
     H1condition_est = summary(model_simulated)$coeff[ "betweenCondition.ct","Estimate"]
     H1condition_se = summary(model_simulated)$coeff[ "betweenCondition.ct","Std. Error"]
     H1condition_Bf[i] = Bf(sd=H1condition_se,  obtained=H1condition_est, uniform=0, meanoftheory=0, sdtheory= H1estimate, tail=1)$BayesFactor

     H2condition_pvalue[i] = summary(model_simulated)$coeff[ "withinCondition.ct","Pr(>|t|)"]
     H2condition_est = summary(model_simulated)$coeff[ "withinCondition.ct","Estimate"]
     H2condition_se = summary(model_simulated)$coeff[ "withinCondition.ct","Std. Error"]
     H2condition_Bf[i] = Bf(sd=H2condition_se,  obtained=H2condition_est, uniform=0, meanoftheory=0, sdtheory= H2estimate, tail=1)$BayesFactor

     H3condition_pvalue[i] = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Pr(>|t|)"]
     H3condition_est = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Estimate"]
     H3condition_se = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Std. Error"]
     H3condition_Bf[i] = Bf(sd=H3condition_se,  obtained=H3condition_est, uniform=0, meanoftheory=0, sdtheory= H3estimate, tail=1)$BayesFactor

     #
     i=i+1


   }
 }
## Warning: Model failed to converge with 1 negative eigenvalue: -1.2e+01
## Warning: Model failed to converge with 1 negative eigenvalue: -4.7e+02
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
## Warning: Model failed to converge with 1 negative eigenvalue: -5.5e+02
## Warning: Model failed to converge with 1 negative eigenvalue: -1.0e+03
## Warning: Model failed to converge with 1 negative eigenvalue: -6.0e+03
 results =data.frame(n, H1condition_Bf,  H1condition_pvalue, H2condition_Bf, H2condition_pvalue,H3condition_Bf, H3condition_pvalue, singular,convergeError)

 eyetracking_sim <- results

sample_size <- unique(eyetracking_sim$n)

bayes_power_3 <- vector()

bayes_power_3[1] <- mean((subset(eyetracking_sim, n==50)$H3condition_Bf>3)*1)
bayes_power_3[2] <- mean((subset(eyetracking_sim, n==100)$H3condition_Bf>3)*1)
bayes_power_3[3] <- mean((subset(eyetracking_sim, n==150)$H3condition_Bf>3)*1)
bayes_power_3[4] <- mean((subset(eyetracking_sim, n==200)$H3condition_Bf>3)*1)

bayes_power_.3_3 <- vector()

bayes_power_.3_3[1] <- mean((subset(eyetracking_sim, n==50)$H3condition_Bf>(1/3)&subset(eyetracking_sim, n==50)$H3condition_Bf<3)*1)
bayes_power_.3_3[2] <- mean((subset(eyetracking_sim, n==100)$H3condition_Bf>(1/3)&subset(eyetracking_sim, n==100)$H3condition_Bf<3)*1)
bayes_power_.3_3[3] <- mean((subset(eyetracking_sim, n==150)$H3condition_Bf>(1/3)&subset(eyetracking_sim, n==150)$H3condition_Bf<3)*1)
bayes_power_.3_3[4] <- mean((subset(eyetracking_sim, n==200)$H3condition_Bf>(1/3)&subset(eyetracking_sim, n==200)$H3condition_Bf<3)*1)


bayes_power_.3 <- vector()

bayes_power_.3[1] <- mean((subset(eyetracking_sim, n==50)$H3condition_Bf<(1/3))*1)
bayes_power_.3[2] <- mean((subset(eyetracking_sim, n==100)$H3condition_Bf<(1/3))*1)
bayes_power_.3[3] <- mean((subset(eyetracking_sim, n==150)$H3condition_Bf<(1/3))*1)
bayes_power_.3[4] <- mean((subset(eyetracking_sim, n==200)$H3condition_Bf<(1/3))*1)


bayes_total<-data.frame(sample_size,bayes_power_.3,bayes_power_.3_3,bayes_power_3)

bayes_total<- pivot_longer(bayes_total,cols=starts_with("bayes"),names_to = "Bayes_Factor",values_to = "Power")

bayes_total$Bayes_Factor[bayes_total$Bayes_Factor=="bayes_power_.3"]="Substaintial Evidence for H0"
bayes_total$Bayes_Factor[bayes_total$Bayes_Factor=="bayes_power_.3_3"]="Ambiguous"
bayes_total$Bayes_Factor[bayes_total$Bayes_Factor=="bayes_power_3"]="Substaintial Evidence for H1"


bayes_power_.3
## [1] 0.004 0.004 0.000 0.000
bayes_power_.3_3
## [1] 0.333 0.107 0.036 0.008
bayes_power_3
## [1] 0.663 0.889 0.964 0.992
  Ns=c(50, 100, 150, 200) # a vector the size of N we want to simulate
  nsims=1000 # number of simulations - should be large for montecarlo (1000 or more)
  n=vector()
  H1estimate = betweenCondition
  H1condition_Bf =vector()
  H1condition_pvalue=vector()
  H2estimate = withinCondition
  H2condition_Bf =vector()
  H2condition_pvalue=vector()
  H3estimate = interaction
  H3condition_Bf =vector()
  H3condition_pvalue=vector()
  singular=vector()
  convergeError = vector()
  i=1
  for (n_subj in Ns){
    for (j in 1:nsims){

      d = generate_LINEAR_v4(n_subj, n_obs= 280, grandmean=grandmean, withinCondition=withinCondition,betweenCondition=betweenCondition, interaction=0, subj_v_cov = subj_v_cov, residual=residual)

      model_simulated = suppressMessages( lmer(y ~ withinCondition.ct *betweenCondition.ct + (withinCondition.ct||participant),
                                               control = lmerControl(optimizer = "bobyqa"),
                                               data = d))

      # collect singularity error if not duplicate of singularity warning
      singular[i] = isSingular(model_simulated)
      # collect convergence error if not duplicate of singularity warning
      convergeError[i]=summary(model_simulated)$optinfo$conv$opt# optimizer checks convergence
      #print(i)
      n[i]=n_subj
      H1condition_pvalue[i] = summary(model_simulated)$coeff[ "betweenCondition.ct","Pr(>|t|)"]
      H1condition_est = summary(model_simulated)$coeff[ "betweenCondition.ct","Estimate"]
      H1condition_se = summary(model_simulated)$coeff[ "betweenCondition.ct","Std. Error"]
      H1condition_Bf[i] = Bf(sd=H1condition_se,  obtained=H1condition_est, uniform=0, meanoftheory=0, sdtheory= H1estimate, tail=1)$BayesFactor

      H2condition_pvalue[i] = summary(model_simulated)$coeff[ "withinCondition.ct","Pr(>|t|)"]
      H2condition_est = summary(model_simulated)$coeff[ "withinCondition.ct","Estimate"]
      H2condition_se = summary(model_simulated)$coeff[ "withinCondition.ct","Std. Error"]
      H2condition_Bf[i] = Bf(sd=H2condition_se,  obtained=H2condition_est, uniform=0, meanoftheory=0, sdtheory= H2estimate, tail=1)$BayesFactor

      H3condition_pvalue[i] = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Pr(>|t|)"]
      H3condition_est = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Estimate"]
      H3condition_se = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Std. Error"]
      H3condition_Bf[i] = Bf(sd=H3condition_se,  obtained=H3condition_est, uniform=0, meanoftheory=0, sdtheory= H3estimate, tail=1)$BayesFactor

      #
      i=i+1

    }
  }
## Warning: Model failed to converge with 1 negative eigenvalue: -1.7e+02
## Warning: Model failed to converge with 1 negative eigenvalue: -2.6e+01
## Warning: Model failed to converge with 1 negative eigenvalue: -8.5e+01
## Warning: Model failed to converge with 1 negative eigenvalue: -5.7e+02
## Warning: Model failed to converge with 1 negative eigenvalue: -5.6e+02
## Warning: Model failed to converge with 1 negative eigenvalue: -1.0e+02
  null_results =data.frame(n, H1condition_Bf,  H1condition_pvalue, H2condition_Bf, H2condition_pvalue,H3condition_Bf, H3condition_pvalue, singular,convergeError)

  null_eyetracking_sim <- null_results

sample_size <- unique(null_eyetracking_sim$n)

null_bayes_power_3 <- vector()

null_bayes_power_3[1] <- mean((subset(null_eyetracking_sim, n==50)$H3condition_Bf>3)*1)
null_bayes_power_3[2] <- mean((subset(null_eyetracking_sim, n==100)$H3condition_Bf>3)*1)
null_bayes_power_3[3] <- mean((subset(null_eyetracking_sim, n==150)$H3condition_Bf>3)*1)
null_bayes_power_3[4] <- mean((subset(null_eyetracking_sim, n==200)$H3condition_Bf>3)*1)


null_bayes_power_.3_3 <- vector()

null_bayes_power_.3_3[1] <- mean((subset(null_eyetracking_sim, n==50)$H3condition_Bf>(1/3)&subset(null_eyetracking_sim, n==50)$H3condition_Bf<3)*1)
null_bayes_power_.3_3[2] <- mean((subset(null_eyetracking_sim, n==100)$H3condition_Bf>(1/3)&subset(null_eyetracking_sim, n==100)$H3condition_Bf<3)*1)
null_bayes_power_.3_3[3] <- mean((subset(null_eyetracking_sim, n==150)$H3condition_Bf>(1/3)&subset(null_eyetracking_sim, n==150)$H3condition_Bf<3)*1)
null_bayes_power_.3_3[4] <- mean((subset(null_eyetracking_sim, n==200)$H3condition_Bf>(1/3)&subset(null_eyetracking_sim, n==200)$H3condition_Bf<3)*1)


null_bayes_power_.3 <- vector()

null_bayes_power_.3[1] <- mean((subset(null_eyetracking_sim, n==50)$H3condition_Bf<(1/3))*1)
null_bayes_power_.3[2] <- mean((subset(null_eyetracking_sim, n==100)$H3condition_Bf<(1/3))*1)
null_bayes_power_.3[3] <- mean((subset(null_eyetracking_sim, n==150)$H3condition_Bf<(1/3))*1)
null_bayes_power_.3[4] <- mean((subset(null_eyetracking_sim, n==200)$H3condition_Bf<(1/3))*1)


null_bayes_total<-data.frame(sample_size,null_bayes_power_.3,null_bayes_power_.3_3,null_bayes_power_3)

null_bayes_total<- pivot_longer(null_bayes_total,cols=starts_with("null_bayes"),names_to = "Bayes_Factor",values_to = "Power")


null_bayes_total$Bayes_Factor[null_bayes_total$Bayes_Factor=="null_bayes_power_.3"]="Substaintial Evidence for H0"
null_bayes_total$Bayes_Factor[null_bayes_total$Bayes_Factor=="null_bayes_power_.3_3"]="Ambiguous"
null_bayes_total$Bayes_Factor[null_bayes_total$Bayes_Factor=="null_bayes_power_3"]="Substaintial Evidence for H1"

null_bayes_power_.3
## [1] 0.386 0.547 0.636 0.684
null_bayes_power_.3_3
## [1] 0.588 0.430 0.344 0.297
null_bayes_power_3
## [1] 0.026 0.023 0.020 0.019

Plot Power Analysis

  df_lines <- c("Substaintial Evidence for H0" = "dashed","Ambiguous" = "dotted", "Substaintial Evidence for H1" = "solid")


bayes_power_fig <- ggplot(bayes_total, aes(x=sample_size,y = Power,shape = Bayes_Factor,linetype = Bayes_Factor)) +
  geom_line()+
  scale_linetype_manual(values = df_lines)+
  geom_point() +
  scale_shape_manual(values=c("Substaintial Evidence for H0" = 1,"Ambiguous" = 4,"Substaintial Evidence for H1" = 15))+
  scale_y_continuous(limits=c(0, 1)) +
  labs(x = "Sample Size",
         y = "Power",
         shapes= "Legend")+
  ggtitle("H1 is true") +
  theme_classic()


null_bayes_power_fig <- ggplot(null_bayes_total, aes(x=sample_size,y = Power,shape = Bayes_Factor,linetype = Bayes_Factor)) +
  geom_line()+
  scale_linetype_manual(values = df_lines)+
  geom_point() +
  scale_shape_manual(values=c("Substaintial Evidence for H0" = 1,"Ambiguous" = 4,"Substaintial Evidence for H1" = 15))+
  scale_y_continuous(limits=c(0, 1)) +
  labs(x = "Sample Size",
         y = "Power",
         shapes= "Legend")+
  ggtitle("H0 is true") +
  theme_classic()

  library(ggpubr)
  ggarrange(bayes_power_fig, null_bayes_power_fig, ncol=2, nrow=1, common.legend = TRUE, legend="bottom")

For each of the 1000 datasets for each model for each N, we ran a linear mixed effect model and extracted the estimate and standard error for the interaction. We then used this information as our model of the data to compute a Bayes factor using the method recommended by Dienes (2008) (see also Silvey, Dienes & Wonnacott (under review; https://doi.org/10.31234/osf.io/m4hju) and Brekelmans Lavan Clayards & Wonnacott (2022), comparing a point null against a half normal with a mean = 0 and sd = 0.296 (i.e. equal to our predicted effect size for interaction effect based on the pilot data analysis). For each of the datasets where H1 is true and where H0 is true, we look at the proportion of runs where the BF met the criteria for substantial evidence for H1 (BF>3), substantial evidence for the null (BF<1/3), or did not meet the criteria for either and was therefore ambiguous. This is shown in Figure 1. (Note that we also show proportions meeting the criteria for “strong” evidence i.e. above 10/ below 0.1 - this is shown for information however based our analyses on meeting the threshold for substantial criteria - which in the case of H1 is roughly the same strength of evidence as the more familiar p<.05) with larger sample sizes (N = 150 and N = 200), it is rare to incorrectly accept H1 when H0 is true, and vice versa. However, the challenge is avoiding obtaining evidence which is ambiguous (i.e., 0.33<BF<3). We can see from the plots that obtaining unambiguous evidence for H0 requires a larger sample size than H1, and that an increase between N=150 and N=200 does not lead to substantial gains in this respect. Thus balancing with feasibility, in each strand 2 experiment, N=150 participants seems reasonable, which will provide us with almost 100% chance of obtaining evidence for H1 if H1 is true, and approximately 66% chance of obtaining substantial evidence (BF<0.33) of H0 if H0 is true.