We plan to use Bayes Factors as our inferential statistic. Unlike frequentist p values, this statistic allows us to evaluate both evidence for an effect (H1) and for the null. We plan to use an optional stopping procedure where we collect data in batches and until we have reached pre-set evidence thresholds to conclude evidence or for the null. (See Dienes 2016 for discussion of why optional stopping rules do not present the same problem for Bayes factors as p values since we stop when the evidence reaches a threshold in either direction). Our goal here is to establish a maximum sample size at which we will stop collecting data regardless of the status of the evidence. This should be determined such that we have a good chance of obtaining non-ambiguous evidence, whilst balancing this against feasibility.
Simulation was used to explore the relationship between sample size and evidence for H1/H0, focusing on our key hypothesis that there would be an interaction between condition (control vs experimental) and gender matchtype (match and mismatch). The dependent variable used in the model was calculated by finding the difference between correct logit transformed proportion of looks to the target versus the foil, with positive values representing more looks to the target than the foil region. For different values of N (total number of participants, split evenly between the control and experimental groups) we simulated 1000 data sets (140 responses per gender matchtype for each condition) using models in which (1) a parameter set where H1 was true and (2) a parameter set where H0 was true:
H1 is true:
We generated responses using a model with a grandmean of 1.383, a within participants factor of gender match type (two levels) with an effect of 0.279, one between participants factor of condition (two levels) with an effect of 0.056, and a gender matchtype by condition interaction effect of 0.296. We also generated a residual term for each participant based on the standard mean of the residual error term. These values were extracted from a model run over the pilot data. There were random intercepts for participant and a random by participant slope for gender matchtype (intercept: sd = 0.506, slope for gender matchtype =0.133 , correlation = 0.576). The random effects were taken from a model run over a subset of the pilot data, as a model run over the full dataset did not converge with a maximal random slopes structure.
H0 is true:
Responses were generated based on a model identical to the previous model except that the interaction between gender matchtype and condition was set to 0.
clear R and load libaries
rm(list = ls())
# for lmer
library(lme4)
## Loading required package: Matrix
library(lmerTest)
##
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
##
## lmer
## The following object is masked from 'package:stats':
##
## step
# for inv.logit function
library(boot)
# MASS for generating multivariate random distributions (for correlated subject effects)
library(MASS)
library(ggplot2)
library(grid)
library(tidyverse)
## Warning: package 'stringr' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tidyr::pack() masks Matrix::pack()
## ✖ dplyr::select() masks MASS::select()
## ✖ tidyr::unpack() masks Matrix::unpack()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
loadFunctionsGithub <-function(listFunctions){
# set Github url folders
urlFolder <- 'https://api.github.com/repos/n400peanuts/languagelearninglab/git/trees/master?recursive=1'
urlRaw <- 'https://raw.githubusercontent.com/n400peanuts/languagelearninglab/master/tools/'
if (!require(httr)) {
stop("httr not installed")
} else {
print('----Downloading. Please wait----')
};
httr::GET(urlFolder)-> req #establish connection
stop_for_status(req) # store errors if any
filelist <- unlist(lapply(content(req)$tree, "[", "path"), use.names = F) # list all the functions in the folder
urlFunctions <- grep("docs/tools/", filelist, value = TRUE, fixed = TRUE) # select the ones stored in the "docs" folder
gsub("docs/tools/", "", urlFunctions) -> functions # remove unwanted strings
for(x in listFunctions){
if(!(x %in%functions)){
stop(paste0("Could not find function: '", x,"'"))
}
}
if (length(listFunctions) == 0){ #load all
for (i in 1:length(functions)){
httr::GET(paste0(urlRaw, functions[i]), ssl.verifypeer = FALSE)-> temp
content(temp)->p3
eval(parse(text = p3), envir = .GlobalEnv) #load them in the R environment
}
} else {
functions[functions %in% listFunctions]-> functionsIlike #only the one that I specified
for (i in 1:length(functionsIlike)){
httr::GET(paste0(urlRaw, functionsIlike[i]), ssl.verifypeer = FALSE)-> temp
content(temp)->p3
eval(parse(text = p3), envir = .GlobalEnv) #load them in the R environment
}
};
print('----Download completed----')
}
listFunctions <- c("summarySEwithin.R", "summarySE.R","normDataWithin.R","lizCenter.R","myCenter.R","Bf.R","Bf_powercalc.R")
loadFunctionsGithub(listFunctions)
## Loading required package: httr
## [1] "----Downloading. Please wait----"
## [1] "----Download completed----"
Analyse Pilot Study Data
load("datalgd.rData")
# Run lmer on this data, then extract relevant values
model4 = lmer(datalgd ~ 1+condition.ct *gender_matchtype.ct + (gender_matchtype.ct||participant_id),
control = lmerControl(optimizer = "bobyqa"),
data = real_data)
## boundary (singular) fit: see help('isSingular')
summary(model4)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula:
## datalgd ~ 1 + condition.ct * gender_matchtype.ct + (gender_matchtype.ct ||
## participant_id)
## Data: real_data
## Control: lmerControl(optimizer = "bobyqa")
##
## REML criterion at convergence: 63856.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2663 -0.6415 -0.1097 0.8181 2.5638
##
## Random effects:
## Groups Name Variance Std.Dev.
## participant_id (Intercept) 0.3408 0.5838
## participant_id.1 gender_matchtype.ct 0.0000 0.0000
## Residual 14.0287 3.7455
## Number of obs: 11638, groups: participant_id, 42
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 1.383e+00 9.654e-02 3.996e+01 14.326
## condition.ct 5.617e-02 1.933e-01 3.996e+01 0.291
## gender_matchtype.ct 2.790e-01 6.951e-02 1.160e+04 4.013
## condition.ct:gender_matchtype.ct 2.963e-01 1.392e-01 1.160e+04 2.129
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## condition.ct 0.7729
## gender_matchtype.ct 6.03e-05 ***
## condition.ct:gender_matchtype.ct 0.0333 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) cndtn. gndr_.
## conditin.ct -0.004
## gndr_mtcht. 0.000 0.002
## cndtn.ct:_. 0.002 0.000 0.000
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
grandmean = summary(model4)$coefficients["(Intercept)", "Estimate" ]
withinCondition = summary(model4)$coefficients["gender_matchtype.ct", "Estimate" ]
betweenCondition = summary(model4)$coefficients["condition.ct", "Estimate" ]
interaction= summary(model4)$coefficients["condition.ct:gender_matchtype.ct", "Estimate" ]
load("subj_v_cov.rData")
residual= subset(as.data.frame(VarCorr(model4)), grp=="Residual")[["sdcor"]]
Function to Simulate Dataset
# function to simulate dataset
generate_LINEAR_v4 <- function(n_subj, n_obs, grandmean, withinCondition,betweenCondition, interaction,subj_v_cov, subj_tau,residual) {
# create empty data frame where number of rows = number of subjects * number of trials per subject
data <- data.frame(matrix(0, ncol=0, nrow = n_subj * n_obs ))
# make participant vector and add to this data frame
data$participant <- as.factor(rep(seq(1:n_subj), each = n_obs))
# add within condition- assumes each participant has equal n trials in both conditions
data$withinCondition<-as.factor(rep((rep(c(0,1), each = n_obs/2)),n_subj)) #this is within
# make centred version
data$withinCondition.ct <- as.numeric(data$withinCondition) - mean(as.numeric(data$withinCondition))
# add between condition
data$betweenCondition <- as.factor(c(rep(1, n_subj/2*n_obs), rep(0, n_subj/2*n_obs)))
# make centred version
data$betweenCondition.ct <- as.numeric(data$betweenCondition) - mean(as.numeric(data$betweenCondition))
# create table with participant random intercept and participant random slope for the withincondition where the sd of each and correlation between the two is from the variance covariance matrix for subject effects sub_v_cov matrix
u <- mvrnorm(n = n_subj, c(0,0), subj_v_cov)
# get the residual error- have one per row
resid <- rnorm(n = n_subj*n_obs, 0,residual )
# finally, generate data on the basis of these parameters we begin by generating a linear term (note eta is the same as y in the linear version of the function, except if was linear would need to add a residual)
# = grandmean + random intercept for participants + effect of condition + random slope for condition for this participant+interaction
data$y = resid + grandmean + u[data$participant,1] + data$withinCondition.ct * (withinCondition +u[data$participant,2])
+ data$betweenCondition.ct * (betweenCondition) + (data$withinCondition.ct *data$betweenCondition.ct)* interaction
data$y = grandmean + u[data$participant,1] + data$withinCondition.ct * (withinCondition +u[data$participant,2]) + data$betweenCondition.ct * (betweenCondition) + (data$withinCondition.ct *data$betweenCondition.ct)* interaction +resid
return(data)
}
Ns=c(50, 100, 150, 200) # a vector the size of N we want to simulate
nsims = 1000 # number of simulations - should be large for montecarlo (1000 or more)
n = vector()
H1estimate = betweenCondition
H1condition_Bf = vector()
H1condition_pvalue = vector()
H2estimate = withinCondition
H2condition_Bf = vector()
H2condition_pvalue = vector()
H3estimate = interaction
H3condition_Bf = vector()
H3condition_pvalue = vector()
singular = vector()
convergeError = vector()
i = 1
for (n_subj in Ns) {
for (j in 1:nsims){
d = generate_LINEAR_v4(n_subj, n_obs= 280, grandmean=grandmean, withinCondition=withinCondition,betweenCondition=betweenCondition, interaction=interaction, subj_v_cov = subj_v_cov, residual=residual)
model_simulated = suppressMessages( lmer(y ~ withinCondition.ct *betweenCondition.ct + (withinCondition.ct||participant),
control = lmerControl(optimizer = "bobyqa"),
data = d))
# collect singularity error if not duplicate of singularity warning
singular[i] = isSingular(model_simulated)
# collect convergence error if not duplicate of singularity warning
convergeError[i]=summary(model_simulated)$optinfo$conv$opt# optimizer checks convergence
#print(i)
n[i]=n_subj
H1condition_pvalue[i] = summary(model_simulated)$coeff[ "betweenCondition.ct","Pr(>|t|)"]
H1condition_est = summary(model_simulated)$coeff[ "betweenCondition.ct","Estimate"]
H1condition_se = summary(model_simulated)$coeff[ "betweenCondition.ct","Std. Error"]
H1condition_Bf[i] = Bf(sd=H1condition_se, obtained=H1condition_est, uniform=0, meanoftheory=0, sdtheory= H1estimate, tail=1)$BayesFactor
H2condition_pvalue[i] = summary(model_simulated)$coeff[ "withinCondition.ct","Pr(>|t|)"]
H2condition_est = summary(model_simulated)$coeff[ "withinCondition.ct","Estimate"]
H2condition_se = summary(model_simulated)$coeff[ "withinCondition.ct","Std. Error"]
H2condition_Bf[i] = Bf(sd=H2condition_se, obtained=H2condition_est, uniform=0, meanoftheory=0, sdtheory= H2estimate, tail=1)$BayesFactor
H3condition_pvalue[i] = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Pr(>|t|)"]
H3condition_est = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Estimate"]
H3condition_se = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Std. Error"]
H3condition_Bf[i] = Bf(sd=H3condition_se, obtained=H3condition_est, uniform=0, meanoftheory=0, sdtheory= H3estimate, tail=1)$BayesFactor
#
i=i+1
}
}
## Warning: Model failed to converge with 1 negative eigenvalue: -1.2e+01
## Warning: Model failed to converge with 1 negative eigenvalue: -4.7e+02
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
## Warning: Model failed to converge with 1 negative eigenvalue: -5.5e+02
## Warning: Model failed to converge with 1 negative eigenvalue: -1.0e+03
## Warning: Model failed to converge with 1 negative eigenvalue: -6.0e+03
results =data.frame(n, H1condition_Bf, H1condition_pvalue, H2condition_Bf, H2condition_pvalue,H3condition_Bf, H3condition_pvalue, singular,convergeError)
eyetracking_sim <- results
sample_size <- unique(eyetracking_sim$n)
bayes_power_3 <- vector()
bayes_power_3[1] <- mean((subset(eyetracking_sim, n==50)$H3condition_Bf>3)*1)
bayes_power_3[2] <- mean((subset(eyetracking_sim, n==100)$H3condition_Bf>3)*1)
bayes_power_3[3] <- mean((subset(eyetracking_sim, n==150)$H3condition_Bf>3)*1)
bayes_power_3[4] <- mean((subset(eyetracking_sim, n==200)$H3condition_Bf>3)*1)
bayes_power_.3_3 <- vector()
bayes_power_.3_3[1] <- mean((subset(eyetracking_sim, n==50)$H3condition_Bf>(1/3)&subset(eyetracking_sim, n==50)$H3condition_Bf<3)*1)
bayes_power_.3_3[2] <- mean((subset(eyetracking_sim, n==100)$H3condition_Bf>(1/3)&subset(eyetracking_sim, n==100)$H3condition_Bf<3)*1)
bayes_power_.3_3[3] <- mean((subset(eyetracking_sim, n==150)$H3condition_Bf>(1/3)&subset(eyetracking_sim, n==150)$H3condition_Bf<3)*1)
bayes_power_.3_3[4] <- mean((subset(eyetracking_sim, n==200)$H3condition_Bf>(1/3)&subset(eyetracking_sim, n==200)$H3condition_Bf<3)*1)
bayes_power_.3 <- vector()
bayes_power_.3[1] <- mean((subset(eyetracking_sim, n==50)$H3condition_Bf<(1/3))*1)
bayes_power_.3[2] <- mean((subset(eyetracking_sim, n==100)$H3condition_Bf<(1/3))*1)
bayes_power_.3[3] <- mean((subset(eyetracking_sim, n==150)$H3condition_Bf<(1/3))*1)
bayes_power_.3[4] <- mean((subset(eyetracking_sim, n==200)$H3condition_Bf<(1/3))*1)
bayes_total<-data.frame(sample_size,bayes_power_.3,bayes_power_.3_3,bayes_power_3)
bayes_total<- pivot_longer(bayes_total,cols=starts_with("bayes"),names_to = "Bayes_Factor",values_to = "Power")
bayes_total$Bayes_Factor[bayes_total$Bayes_Factor=="bayes_power_.3"]="Substaintial Evidence for H0"
bayes_total$Bayes_Factor[bayes_total$Bayes_Factor=="bayes_power_.3_3"]="Ambiguous"
bayes_total$Bayes_Factor[bayes_total$Bayes_Factor=="bayes_power_3"]="Substaintial Evidence for H1"
bayes_power_.3
## [1] 0.004 0.004 0.000 0.000
bayes_power_.3_3
## [1] 0.333 0.107 0.036 0.008
bayes_power_3
## [1] 0.663 0.889 0.964 0.992
Ns=c(50, 100, 150, 200) # a vector the size of N we want to simulate
nsims=1000 # number of simulations - should be large for montecarlo (1000 or more)
n=vector()
H1estimate = betweenCondition
H1condition_Bf =vector()
H1condition_pvalue=vector()
H2estimate = withinCondition
H2condition_Bf =vector()
H2condition_pvalue=vector()
H3estimate = interaction
H3condition_Bf =vector()
H3condition_pvalue=vector()
singular=vector()
convergeError = vector()
i=1
for (n_subj in Ns){
for (j in 1:nsims){
d = generate_LINEAR_v4(n_subj, n_obs= 280, grandmean=grandmean, withinCondition=withinCondition,betweenCondition=betweenCondition, interaction=0, subj_v_cov = subj_v_cov, residual=residual)
model_simulated = suppressMessages( lmer(y ~ withinCondition.ct *betweenCondition.ct + (withinCondition.ct||participant),
control = lmerControl(optimizer = "bobyqa"),
data = d))
# collect singularity error if not duplicate of singularity warning
singular[i] = isSingular(model_simulated)
# collect convergence error if not duplicate of singularity warning
convergeError[i]=summary(model_simulated)$optinfo$conv$opt# optimizer checks convergence
#print(i)
n[i]=n_subj
H1condition_pvalue[i] = summary(model_simulated)$coeff[ "betweenCondition.ct","Pr(>|t|)"]
H1condition_est = summary(model_simulated)$coeff[ "betweenCondition.ct","Estimate"]
H1condition_se = summary(model_simulated)$coeff[ "betweenCondition.ct","Std. Error"]
H1condition_Bf[i] = Bf(sd=H1condition_se, obtained=H1condition_est, uniform=0, meanoftheory=0, sdtheory= H1estimate, tail=1)$BayesFactor
H2condition_pvalue[i] = summary(model_simulated)$coeff[ "withinCondition.ct","Pr(>|t|)"]
H2condition_est = summary(model_simulated)$coeff[ "withinCondition.ct","Estimate"]
H2condition_se = summary(model_simulated)$coeff[ "withinCondition.ct","Std. Error"]
H2condition_Bf[i] = Bf(sd=H2condition_se, obtained=H2condition_est, uniform=0, meanoftheory=0, sdtheory= H2estimate, tail=1)$BayesFactor
H3condition_pvalue[i] = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Pr(>|t|)"]
H3condition_est = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Estimate"]
H3condition_se = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Std. Error"]
H3condition_Bf[i] = Bf(sd=H3condition_se, obtained=H3condition_est, uniform=0, meanoftheory=0, sdtheory= H3estimate, tail=1)$BayesFactor
#
i=i+1
}
}
## Warning: Model failed to converge with 1 negative eigenvalue: -1.7e+02
## Warning: Model failed to converge with 1 negative eigenvalue: -2.6e+01
## Warning: Model failed to converge with 1 negative eigenvalue: -8.5e+01
## Warning: Model failed to converge with 1 negative eigenvalue: -5.7e+02
## Warning: Model failed to converge with 1 negative eigenvalue: -5.6e+02
## Warning: Model failed to converge with 1 negative eigenvalue: -1.0e+02
null_results =data.frame(n, H1condition_Bf, H1condition_pvalue, H2condition_Bf, H2condition_pvalue,H3condition_Bf, H3condition_pvalue, singular,convergeError)
null_eyetracking_sim <- null_results
sample_size <- unique(null_eyetracking_sim$n)
null_bayes_power_3 <- vector()
null_bayes_power_3[1] <- mean((subset(null_eyetracking_sim, n==50)$H3condition_Bf>3)*1)
null_bayes_power_3[2] <- mean((subset(null_eyetracking_sim, n==100)$H3condition_Bf>3)*1)
null_bayes_power_3[3] <- mean((subset(null_eyetracking_sim, n==150)$H3condition_Bf>3)*1)
null_bayes_power_3[4] <- mean((subset(null_eyetracking_sim, n==200)$H3condition_Bf>3)*1)
null_bayes_power_.3_3 <- vector()
null_bayes_power_.3_3[1] <- mean((subset(null_eyetracking_sim, n==50)$H3condition_Bf>(1/3)&subset(null_eyetracking_sim, n==50)$H3condition_Bf<3)*1)
null_bayes_power_.3_3[2] <- mean((subset(null_eyetracking_sim, n==100)$H3condition_Bf>(1/3)&subset(null_eyetracking_sim, n==100)$H3condition_Bf<3)*1)
null_bayes_power_.3_3[3] <- mean((subset(null_eyetracking_sim, n==150)$H3condition_Bf>(1/3)&subset(null_eyetracking_sim, n==150)$H3condition_Bf<3)*1)
null_bayes_power_.3_3[4] <- mean((subset(null_eyetracking_sim, n==200)$H3condition_Bf>(1/3)&subset(null_eyetracking_sim, n==200)$H3condition_Bf<3)*1)
null_bayes_power_.3 <- vector()
null_bayes_power_.3[1] <- mean((subset(null_eyetracking_sim, n==50)$H3condition_Bf<(1/3))*1)
null_bayes_power_.3[2] <- mean((subset(null_eyetracking_sim, n==100)$H3condition_Bf<(1/3))*1)
null_bayes_power_.3[3] <- mean((subset(null_eyetracking_sim, n==150)$H3condition_Bf<(1/3))*1)
null_bayes_power_.3[4] <- mean((subset(null_eyetracking_sim, n==200)$H3condition_Bf<(1/3))*1)
null_bayes_total<-data.frame(sample_size,null_bayes_power_.3,null_bayes_power_.3_3,null_bayes_power_3)
null_bayes_total<- pivot_longer(null_bayes_total,cols=starts_with("null_bayes"),names_to = "Bayes_Factor",values_to = "Power")
null_bayes_total$Bayes_Factor[null_bayes_total$Bayes_Factor=="null_bayes_power_.3"]="Substaintial Evidence for H0"
null_bayes_total$Bayes_Factor[null_bayes_total$Bayes_Factor=="null_bayes_power_.3_3"]="Ambiguous"
null_bayes_total$Bayes_Factor[null_bayes_total$Bayes_Factor=="null_bayes_power_3"]="Substaintial Evidence for H1"
null_bayes_power_.3
## [1] 0.386 0.547 0.636 0.684
null_bayes_power_.3_3
## [1] 0.588 0.430 0.344 0.297
null_bayes_power_3
## [1] 0.026 0.023 0.020 0.019
df_lines <- c("Substaintial Evidence for H0" = "dashed","Ambiguous" = "dotted", "Substaintial Evidence for H1" = "solid")
bayes_power_fig <- ggplot(bayes_total, aes(x=sample_size,y = Power,shape = Bayes_Factor,linetype = Bayes_Factor)) +
geom_line()+
scale_linetype_manual(values = df_lines)+
geom_point() +
scale_shape_manual(values=c("Substaintial Evidence for H0" = 1,"Ambiguous" = 4,"Substaintial Evidence for H1" = 15))+
scale_y_continuous(limits=c(0, 1)) +
labs(x = "Sample Size",
y = "Power",
shapes= "Legend")+
ggtitle("H1 is true") +
theme_classic()
null_bayes_power_fig <- ggplot(null_bayes_total, aes(x=sample_size,y = Power,shape = Bayes_Factor,linetype = Bayes_Factor)) +
geom_line()+
scale_linetype_manual(values = df_lines)+
geom_point() +
scale_shape_manual(values=c("Substaintial Evidence for H0" = 1,"Ambiguous" = 4,"Substaintial Evidence for H1" = 15))+
scale_y_continuous(limits=c(0, 1)) +
labs(x = "Sample Size",
y = "Power",
shapes= "Legend")+
ggtitle("H0 is true") +
theme_classic()
library(ggpubr)
ggarrange(bayes_power_fig, null_bayes_power_fig, ncol=2, nrow=1, common.legend = TRUE, legend="bottom")
For each of the 1000 datasets for each model for each N, we ran a linear mixed effect model and extracted the estimate and standard error for the interaction. We then used this information as our model of the data to compute a Bayes factor using the method recommended by Dienes (2008) (see also Silvey, Dienes & Wonnacott (under review; https://doi.org/10.31234/osf.io/m4hju) and Brekelmans Lavan Clayards & Wonnacott (2022), comparing a point null against a half normal with a mean = 0 and sd = 0.296 (i.e. equal to our predicted effect size for interaction effect based on the pilot data analysis). For each of the datasets where H1 is true and where H0 is true, we look at the proportion of runs where the BF met the criteria for substantial evidence for H1 (BF>3), substantial evidence for the null (BF<1/3), or did not meet the criteria for either and was therefore ambiguous. This is shown in Figure 1. (Note that we also show proportions meeting the criteria for “strong” evidence i.e. above 10/ below 0.1 - this is shown for information however based our analyses on meeting the threshold for substantial criteria - which in the case of H1 is roughly the same strength of evidence as the more familiar p<.05) with larger sample sizes (N = 150 and N = 200), it is rare to incorrectly accept H1 when H0 is true, and vice versa. However, the challenge is avoiding obtaining evidence which is ambiguous (i.e., 0.33<BF<3). We can see from the plots that obtaining unambiguous evidence for H0 requires a larger sample size than H1, and that an increase between N=150 and N=200 does not lead to substantial gains in this respect. Thus balancing with feasibility, in each strand 2 experiment, N=150 participants seems reasonable, which will provide us with almost 100% chance of obtaining evidence for H1 if H1 is true, and approximately 66% chance of obtaining substantial evidence (BF<0.33) of H0 if H0 is true.