Introduction

We plan to use Bayes Factors as our inferential statistic. Unlike frequentist p values, this statistic allows us to evaluate both evidence for an effect (H1) and for the null. We plan to use an optional stopping procedure where we collect data in batches and until we have reached pre-set evidence thresholds to conclude evidence or for the null. (See Dienes 2016 for discussion of why optional stopping rules do not present the same problem for Bayes factors as p values since we stop when the evidence reaches a threshold in either direction). Our goal here is to establish a maximum sample size at which we will stop collecting data regardless of the status of the evidence. This should be determined such that we have a good chance of obtaining non-ambiguous evidence, whilst balancing this against feasibility.

Simulation 1: Eye-tracking (Strand 2)

Simulation was used to explore the relationship between sample size and evidence for H1/H0, focusing on our key hypothesis that there would be an interaction between condition (control vs experimental) and gender matchtype (match and mismatch). The dependent variable used in the model was calculated by finding the difference between correct logit transformed proportion of looks to the target versus the foil, with positive values representing more looks to the target than the foil region. For different values of N (total number of participants, split evenly between the control and experimental groups) we simulated 1000 data sets (140 responses per gender matchtype for each condition) using models in which (1) a parameter set where H1 was true and (2) a parameter set where H0 was true:

H1 is true:

We generated responses using a model with a grandmean of 1.383, a within participants factor of gender match type (two levels) with an effect of 0.279, one between participants factor of condition (two levels) with an effect of 0.056, and a gender matchtype by condition interaction effect of 0.296. We also generated a residual term for each participant based on the standard mean of the residual error term. These values were extracted from a model run over the pilot data. There were random intercepts for participant and a random by participant slope for gender matchtype (intercept: sd = 0.506, slope for gender matchtype =0.133 , correlation = 0.576). The random effects were taken from a model run over a subset of the pilot data, as a model run over the full dataset did not converge with a maximal random slopes structure.

H0 is true:

Responses were generated based on a model identical to the previous model except that the interaction between gender matchtype and condition was set to 0.

Code to Run Simulation

clear R and load libaries

rm(list = ls())

# for lmer
library(lme4)
## Loading required package: Matrix
library(lmerTest)
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
# for  inv.logit function
library(boot)
# MASS for generating multivariate random distributions (for correlated subject effects)
library(MASS)
library(ggplot2)
library(grid)
library(tidyverse)
## Warning: package 'stringr' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ tidyr::pack()   masks Matrix::pack()
## ✖ dplyr::select() masks MASS::select()
## ✖ tidyr::unpack() masks Matrix::unpack()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
loadFunctionsGithub <-function(listFunctions){
  
  # set Github url folders
  urlFolder <- 'https://api.github.com/repos/n400peanuts/languagelearninglab/git/trees/master?recursive=1'
  urlRaw <- 'https://raw.githubusercontent.com/n400peanuts/languagelearninglab/master/tools/'
  
  if (!require(httr)) {
    stop("httr not installed")
  } else {
    print('----Downloading. Please wait----')
  };
  
  httr::GET(urlFolder)-> req #establish connection
  stop_for_status(req) # store errors if any
  filelist <- unlist(lapply(content(req)$tree, "[", "path"), use.names = F) # list all the functions in the folder
  urlFunctions <- grep("docs/tools/", filelist, value = TRUE, fixed = TRUE) # select the ones stored in the "docs" folder
  gsub("docs/tools/", "", urlFunctions) -> functions # remove unwanted strings
  
  for(x in listFunctions){
    if(!(x %in%functions)){
      stop(paste0("Could not find function: '", x,"'"))
    }
  }
  
  if (length(listFunctions) == 0){ #load all
    for (i in 1:length(functions)){
      httr::GET(paste0(urlRaw, functions[i]), ssl.verifypeer = FALSE)-> temp
      content(temp)->p3
      eval(parse(text = p3), envir = .GlobalEnv) #load them in the R environment
    } 
  } else {
    functions[functions %in% listFunctions]-> functionsIlike #only the one that I specified
    for (i in 1:length(functionsIlike)){
      httr::GET(paste0(urlRaw, functionsIlike[i]), ssl.verifypeer = FALSE)-> temp
      content(temp)->p3
      eval(parse(text = p3), envir = .GlobalEnv) #load them in the R environment
    }
  };
  print('----Download completed----')
}
listFunctions <- c("summarySEwithin.R", "summarySE.R","normDataWithin.R","lizCenter.R","myCenter.R","Bf.R","Bf_powercalc.R")
loadFunctionsGithub(listFunctions)
## Loading required package: httr
## [1] "----Downloading. Please wait----"
## [1] "----Download completed----"

Analyse Pilot Study Data

load("datalgd.rData")

# Run lmer on this data, then extract relevant values

model4 = lmer(datalgd ~ 1+condition.ct *gender_matchtype.ct + (gender_matchtype.ct||participant_id),
              control = lmerControl(optimizer = "bobyqa"),
              data = real_data)
## boundary (singular) fit: see help('isSingular')
summary(model4)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## datalgd ~ 1 + condition.ct * gender_matchtype.ct + (gender_matchtype.ct ||  
##     participant_id)
##    Data: real_data
## Control: lmerControl(optimizer = "bobyqa")
## 
## REML criterion at convergence: 63856.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2663 -0.6415 -0.1097  0.8181  2.5638 
## 
## Random effects:
##  Groups           Name                Variance Std.Dev.
##  participant_id   (Intercept)          0.3408  0.5838  
##  participant_id.1 gender_matchtype.ct  0.0000  0.0000  
##  Residual                             14.0287  3.7455  
## Number of obs: 11638, groups:  participant_id, 42
## 
## Fixed effects:
##                                   Estimate Std. Error        df t value
## (Intercept)                      1.383e+00  9.654e-02 3.996e+01  14.326
## condition.ct                     5.617e-02  1.933e-01 3.996e+01   0.291
## gender_matchtype.ct              2.790e-01  6.951e-02 1.160e+04   4.013
## condition.ct:gender_matchtype.ct 2.963e-01  1.392e-01 1.160e+04   2.129
##                                  Pr(>|t|)    
## (Intercept)                       < 2e-16 ***
## condition.ct                       0.7729    
## gender_matchtype.ct              6.03e-05 ***
## condition.ct:gender_matchtype.ct   0.0333 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) cndtn. gndr_.
## conditin.ct -0.004              
## gndr_mtcht.  0.000  0.002       
## cndtn.ct:_.  0.002  0.000  0.000
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
grandmean = summary(model4)$coefficients["(Intercept)", "Estimate" ]
withinCondition = summary(model4)$coefficients["gender_matchtype.ct", "Estimate" ]
betweenCondition = summary(model4)$coefficients["condition.ct", "Estimate" ]
interaction= summary(model4)$coefficients["condition.ct:gender_matchtype.ct", "Estimate" ]
load("subj_v_cov.rData")
residual= subset(as.data.frame(VarCorr(model4)), grp=="Residual")[["sdcor"]]

Create Functions

Function to Simulate Dataset

  • n_sub is the number of participants we assume in our simulation (must be even as code assumes half in each of the two conditions)
  • n_obs is the number of observations for each of the subjects (which is assumed to be the same in both conditions)
  • grandmean is grandmean
  • withinCondition is estimate of the difference between the conditions (i.e. estimate of coefficient for main effect of withincondition.ct)
  • betweenCondition is difference between the two levels of between condition (Estimate of coefficient for main effect of betweencondition.ct)
  • interaction is difference of differences (interaction between the two variables)
  • subj_v_cov is the variance co-variance matrix for subject effects. We use this to generate correlated by-participant intercept and slope
  • residual is the standard mean of the residual error term
# function to simulate dataset
generate_LINEAR_v4 <- function(n_subj, n_obs, grandmean, withinCondition,betweenCondition, interaction,subj_v_cov, subj_tau,residual) {
    # create empty data frame where number of rows = number of subjects *     number of trials per subject
    data <- data.frame(matrix(0, ncol=0, nrow = n_subj * n_obs ))
    # make participant vector and add to this data frame
    data$participant <- as.factor(rep(seq(1:n_subj), each = n_obs))
    # add within condition- assumes each participant has equal n trials in both conditions
    data$withinCondition<-as.factor(rep((rep(c(0,1), each = n_obs/2)),n_subj)) #this is within
    # make centred version
    data$withinCondition.ct <- as.numeric(data$withinCondition) - mean(as.numeric(data$withinCondition))
    # add between condition
    data$betweenCondition <- as.factor(c(rep(1, n_subj/2*n_obs), rep(0, n_subj/2*n_obs)))
    # make centred version
    data$betweenCondition.ct <- as.numeric(data$betweenCondition) - mean(as.numeric(data$betweenCondition))

    # create table with participant random intercept and participant random slope for the withincondition where the sd of each and  correlation between the two is from the variance covariance matrix for subject effects sub_v_cov matrix
    u <- mvrnorm(n = n_subj, c(0,0), subj_v_cov)

    # get the residual error- have one per row
    resid <- rnorm(n = n_subj*n_obs, 0,residual )

    # finally, generate data on the basis of these parameters we begin by generating a linear term (note eta is the same as y in the linear version of the function, except if was linear would need to add a residual)
    # = grandmean + random intercept for participants + effect of condition + random slope for condition for this participant+interaction

    data$y = resid + grandmean + u[data$participant,1]  + data$withinCondition.ct * (withinCondition +u[data$participant,2])
    + data$betweenCondition.ct * (betweenCondition) + (data$withinCondition.ct *data$betweenCondition.ct)* interaction

    data$y = grandmean +  u[data$participant,1]  + data$withinCondition.ct * (withinCondition +u[data$participant,2]) + data$betweenCondition.ct * (betweenCondition) + (data$withinCondition.ct *data$betweenCondition.ct)* interaction +resid

    return(data)

  }

Run Simulation

Ns=c(50, 100, 150, 200) # a vector the size of N we want to simulate
nsims = 1000 # number of simulations - should be large for montecarlo (1000 or more)
n = vector()
H1estimate = betweenCondition
H1condition_Bf = vector()
H1condition_pvalue = vector()
H2estimate = withinCondition
H2condition_Bf = vector()
H2condition_pvalue = vector()
H3estimate = interaction
H3condition_Bf = vector()
H3condition_pvalue = vector()

singular = vector()
convergeError = vector()
i = 1

 for (n_subj in Ns) {
   for (j in 1:nsims){

     d = generate_LINEAR_v4(n_subj, n_obs= 280, grandmean=grandmean, withinCondition=withinCondition,betweenCondition=betweenCondition, interaction=interaction, subj_v_cov = subj_v_cov, residual=residual)

     model_simulated = suppressMessages( lmer(y ~ withinCondition.ct *betweenCondition.ct + (withinCondition.ct||participant),
                            control = lmerControl(optimizer = "bobyqa"),
                            data = d))

     # collect singularity error if not duplicate of singularity warning
     singular[i] = isSingular(model_simulated)
     # collect convergence error if not duplicate of singularity warning
     convergeError[i]=summary(model_simulated)$optinfo$conv$opt# optimizer checks convergence
     #print(i)
     n[i]=n_subj
     H1condition_pvalue[i] = summary(model_simulated)$coeff[ "betweenCondition.ct","Pr(>|t|)"]
     H1condition_est = summary(model_simulated)$coeff[ "betweenCondition.ct","Estimate"]
     H1condition_se = summary(model_simulated)$coeff[ "betweenCondition.ct","Std. Error"]
     H1condition_Bf[i] = Bf(sd=H1condition_se,  obtained=H1condition_est, uniform=0, meanoftheory=0, sdtheory= H1estimate, tail=1)$BayesFactor

     H2condition_pvalue[i] = summary(model_simulated)$coeff[ "withinCondition.ct","Pr(>|t|)"]
     H2condition_est = summary(model_simulated)$coeff[ "withinCondition.ct","Estimate"]
     H2condition_se = summary(model_simulated)$coeff[ "withinCondition.ct","Std. Error"]
     H2condition_Bf[i] = Bf(sd=H2condition_se,  obtained=H2condition_est, uniform=0, meanoftheory=0, sdtheory= H2estimate, tail=1)$BayesFactor

     H3condition_pvalue[i] = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Pr(>|t|)"]
     H3condition_est = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Estimate"]
     H3condition_se = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Std. Error"]
     H3condition_Bf[i] = Bf(sd=H3condition_se,  obtained=H3condition_est, uniform=0, meanoftheory=0, sdtheory= H3estimate, tail=1)$BayesFactor

     #
     i=i+1


   }
 }
## Warning: Model failed to converge with 1 negative eigenvalue: -1.9e+02
## Warning: Model failed to converge with 1 negative eigenvalue: -2.8e+02
## Warning: Model failed to converge with 1 negative eigenvalue: -2.6e+02
## Warning: Model failed to converge with 1 negative eigenvalue: -8.7e+00
 results =data.frame(n, H1condition_Bf,  H1condition_pvalue, H2condition_Bf, H2condition_pvalue,H3condition_Bf, H3condition_pvalue, singular,convergeError)

 eyetracking_sim <- results

sample_size <- unique(eyetracking_sim$n)

bayes_power_3 <- vector()

bayes_power_3[1] <- mean((subset(eyetracking_sim, n==50)$H3condition_Bf>3)*1)
bayes_power_3[2] <- mean((subset(eyetracking_sim, n==100)$H3condition_Bf>3)*1)
bayes_power_3[3] <- mean((subset(eyetracking_sim, n==150)$H3condition_Bf>3)*1)
bayes_power_3[4] <- mean((subset(eyetracking_sim, n==200)$H3condition_Bf>3)*1)

bayes_power_.3_3 <- vector()

bayes_power_.3_3[1] <- mean((subset(eyetracking_sim, n==50)$H3condition_Bf>(1/3)&subset(eyetracking_sim, n==50)$H3condition_Bf<3)*1)
bayes_power_.3_3[2] <- mean((subset(eyetracking_sim, n==100)$H3condition_Bf>(1/3)&subset(eyetracking_sim, n==100)$H3condition_Bf<3)*1)
bayes_power_.3_3[3] <- mean((subset(eyetracking_sim, n==150)$H3condition_Bf>(1/3)&subset(eyetracking_sim, n==150)$H3condition_Bf<3)*1)
bayes_power_.3_3[4] <- mean((subset(eyetracking_sim, n==200)$H3condition_Bf>(1/3)&subset(eyetracking_sim, n==200)$H3condition_Bf<3)*1)


bayes_power_.3 <- vector()

bayes_power_.3[1] <- mean((subset(eyetracking_sim, n==50)$H3condition_Bf<(1/3))*1)
bayes_power_.3[2] <- mean((subset(eyetracking_sim, n==100)$H3condition_Bf<(1/3))*1)
bayes_power_.3[3] <- mean((subset(eyetracking_sim, n==150)$H3condition_Bf<(1/3))*1)
bayes_power_.3[4] <- mean((subset(eyetracking_sim, n==200)$H3condition_Bf<(1/3))*1)


bayes_total<-data.frame(sample_size,bayes_power_.3,bayes_power_.3_3,bayes_power_3)

bayes_total<- pivot_longer(bayes_total,cols=starts_with("bayes"),names_to = "Bayes_Factor",values_to = "Power")

bayes_total$Bayes_Factor[bayes_total$Bayes_Factor=="bayes_power_.3"]="Substaintial Evidence for H0"
bayes_total$Bayes_Factor[bayes_total$Bayes_Factor=="bayes_power_.3_3"]="Ambiguous"
bayes_total$Bayes_Factor[bayes_total$Bayes_Factor=="bayes_power_3"]="Substaintial Evidence for H1"


bayes_power_.3
## [1] 0.009 0.000 0.002 0.000
bayes_power_.3_3
## [1] 0.314 0.112 0.018 0.004
bayes_power_3
## [1] 0.677 0.888 0.980 0.996
  Ns=c(50, 100, 150, 200) # a vector the size of N we want to simulate
  nsims=1000 # number of simulations - should be large for montecarlo (1000 or more)
  n=vector()
  H1estimate = betweenCondition
  H1condition_Bf =vector()
  H1condition_pvalue=vector()
  H2estimate = withinCondition
  H2condition_Bf =vector()
  H2condition_pvalue=vector()
  H3estimate = interaction
  H3condition_Bf =vector()
  H3condition_pvalue=vector()
  singular=vector()
  convergeError = vector()
  i=1
  for (n_subj in Ns){
    for (j in 1:nsims){

      d = generate_LINEAR_v4(n_subj, n_obs= 280, grandmean=grandmean, withinCondition=withinCondition,betweenCondition=betweenCondition, interaction=0, subj_v_cov = subj_v_cov, residual=residual)

      model_simulated = suppressMessages( lmer(y ~ withinCondition.ct *betweenCondition.ct + (withinCondition.ct||participant),
                                               control = lmerControl(optimizer = "bobyqa"),
                                               data = d))

      # collect singularity error if not duplicate of singularity warning
      singular[i] = isSingular(model_simulated)
      # collect convergence error if not duplicate of singularity warning
      convergeError[i]=summary(model_simulated)$optinfo$conv$opt# optimizer checks convergence
      #print(i)
      n[i]=n_subj
      H1condition_pvalue[i] = summary(model_simulated)$coeff[ "betweenCondition.ct","Pr(>|t|)"]
      H1condition_est = summary(model_simulated)$coeff[ "betweenCondition.ct","Estimate"]
      H1condition_se = summary(model_simulated)$coeff[ "betweenCondition.ct","Std. Error"]
      H1condition_Bf[i] = Bf(sd=H1condition_se,  obtained=H1condition_est, uniform=0, meanoftheory=0, sdtheory= H1estimate, tail=1)$BayesFactor

      H2condition_pvalue[i] = summary(model_simulated)$coeff[ "withinCondition.ct","Pr(>|t|)"]
      H2condition_est = summary(model_simulated)$coeff[ "withinCondition.ct","Estimate"]
      H2condition_se = summary(model_simulated)$coeff[ "withinCondition.ct","Std. Error"]
      H2condition_Bf[i] = Bf(sd=H2condition_se,  obtained=H2condition_est, uniform=0, meanoftheory=0, sdtheory= H2estimate, tail=1)$BayesFactor

      H3condition_pvalue[i] = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Pr(>|t|)"]
      H3condition_est = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Estimate"]
      H3condition_se = summary(model_simulated)$coeff[ "withinCondition.ct:betweenCondition.ct","Std. Error"]
      H3condition_Bf[i] = Bf(sd=H3condition_se,  obtained=H3condition_est, uniform=0, meanoftheory=0, sdtheory= H3estimate, tail=1)$BayesFactor

      #
      i=i+1

    }
  }
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
## Warning: Model failed to converge with 1 negative eigenvalue: -8.7e+03
## Warning: Model failed to converge with 1 negative eigenvalue: -2.4e+03
## Warning: Model failed to converge with 1 negative eigenvalue: -4.5e+01
  null_results =data.frame(n, H1condition_Bf,  H1condition_pvalue, H2condition_Bf, H2condition_pvalue,H3condition_Bf, H3condition_pvalue, singular,convergeError)

  null_eyetracking_sim <- null_results

sample_size <- unique(null_eyetracking_sim$n)

null_bayes_power_3 <- vector()

null_bayes_power_3[1] <- mean((subset(null_eyetracking_sim, n==50)$H3condition_Bf>3)*1)
null_bayes_power_3[2] <- mean((subset(null_eyetracking_sim, n==100)$H3condition_Bf>3)*1)
null_bayes_power_3[3] <- mean((subset(null_eyetracking_sim, n==150)$H3condition_Bf>3)*1)
null_bayes_power_3[4] <- mean((subset(null_eyetracking_sim, n==200)$H3condition_Bf>3)*1)


null_bayes_power_.3_3 <- vector()

null_bayes_power_.3_3[1] <- mean((subset(null_eyetracking_sim, n==50)$H3condition_Bf>(1/3)&subset(null_eyetracking_sim, n==50)$H3condition_Bf<3)*1)
null_bayes_power_.3_3[2] <- mean((subset(null_eyetracking_sim, n==100)$H3condition_Bf>(1/3)&subset(null_eyetracking_sim, n==100)$H3condition_Bf<3)*1)
null_bayes_power_.3_3[3] <- mean((subset(null_eyetracking_sim, n==150)$H3condition_Bf>(1/3)&subset(null_eyetracking_sim, n==150)$H3condition_Bf<3)*1)
null_bayes_power_.3_3[4] <- mean((subset(null_eyetracking_sim, n==200)$H3condition_Bf>(1/3)&subset(null_eyetracking_sim, n==200)$H3condition_Bf<3)*1)


null_bayes_power_.3 <- vector()

null_bayes_power_.3[1] <- mean((subset(null_eyetracking_sim, n==50)$H3condition_Bf<(1/3))*1)
null_bayes_power_.3[2] <- mean((subset(null_eyetracking_sim, n==100)$H3condition_Bf<(1/3))*1)
null_bayes_power_.3[3] <- mean((subset(null_eyetracking_sim, n==150)$H3condition_Bf<(1/3))*1)
null_bayes_power_.3[4] <- mean((subset(null_eyetracking_sim, n==200)$H3condition_Bf<(1/3))*1)


null_bayes_total<-data.frame(sample_size,null_bayes_power_.3,null_bayes_power_.3_3,null_bayes_power_3)

null_bayes_total<- pivot_longer(null_bayes_total,cols=starts_with("null_bayes"),names_to = "Bayes_Factor",values_to = "Power")


null_bayes_total$Bayes_Factor[null_bayes_total$Bayes_Factor=="null_bayes_power_.3"]="Substaintial Evidence for H0"
null_bayes_total$Bayes_Factor[null_bayes_total$Bayes_Factor=="null_bayes_power_.3_3"]="Ambiguous"
null_bayes_total$Bayes_Factor[null_bayes_total$Bayes_Factor=="null_bayes_power_3"]="Substaintial Evidence for H1"

null_bayes_power_.3
## [1] 0.378 0.542 0.640 0.707
null_bayes_power_.3_3
## [1] 0.586 0.431 0.335 0.274
null_bayes_power_3
## [1] 0.036 0.027 0.025 0.019

Plot Power Analysis

  df_lines <- c("Substaintial Evidence for H0" = "dashed","Ambiguous" = "dotted", "Substaintial Evidence for H1" = "solid")


bayes_power_fig <- ggplot(bayes_total, aes(x=sample_size,y = Power,shape = Bayes_Factor,linetype = Bayes_Factor)) +
  geom_line()+
  scale_linetype_manual(values = df_lines)+
  geom_point() +
  scale_shape_manual(values=c("Substaintial Evidence for H0" = 1,"Ambiguous" = 4,"Substaintial Evidence for H1" = 15))+
  scale_y_continuous(limits=c(0, 1)) +
  labs(x = "Sample Size",
         y = "Power",
         shapes= "Legend")+
  ggtitle("H1 is true") +
  theme_classic()


null_bayes_power_fig <- ggplot(null_bayes_total, aes(x=sample_size,y = Power,shape = Bayes_Factor,linetype = Bayes_Factor)) +
  geom_line()+
  scale_linetype_manual(values = df_lines)+
  geom_point() +
  scale_shape_manual(values=c("Substaintial Evidence for H0" = 1,"Ambiguous" = 4,"Substaintial Evidence for H1" = 15))+
  scale_y_continuous(limits=c(0, 1)) +
  labs(x = "Sample Size",
         y = "Power",
         shapes= "Legend")+
  ggtitle("H0 is true") +
  theme_classic()

  library(ggpubr)
  ggarrange(bayes_power_fig, null_bayes_power_fig, ncol=2, nrow=1, common.legend = TRUE, legend="bottom")

For each of the 1000 datasets for each model for each N, we ran a linear mixed effect model and extracted the estimate and standard error for the interaction. We then used this information as our model of the data to compute a Bayes factor using the method recommended by Dienes (2008) (see also Silvey, Dienes & Wonnacott (under review; https://doi.org/10.31234/osf.io/m4hju) and Brekelmans Lavan Clayards & Wonnacott (2022), comparing a point null against a half normal with a mean = 0 and sd = 0.296 (i.e. equal to our predicted effect size for interaction effect based on the pilot data analysis). For each of the datasets where H1 is true and where H0 is true, we look at the proportion of runs where the BF met the criteria for substantial evidence for H1 (BF>3), substantial evidence for the null (BF<1/3), or did not meet the criteria for either and was therefore ambiguous. This is shown in Figure 1. (Note that we also show proportions meeting the criteria for “strong” evidence i.e. above 10/ below 0.1 - this is shown for information however based our analyses on meeting the threshold for substantial criteria - which in the case of H1 is roughly the same strength of evidence as the more familiar p<.05) with larger sample sizes (N = 150 and N = 200), it is rare to incorrectly accept H1 when H0 is true, and vice versa. However, the challenge is avoiding obtaining evidence which is ambiguous (i.e., 0.33<BF<3). We can see from the plots that obtaining unambiguous evidence for H0 requires a larger sample size than H1, and that an increase between N=150 and N=200 does not lead to substantial gains in this respect. Thus balancing with feasibility, in each strand 2 experiment, N=150 participants seems reasonable, which will provide us with almost 100% chance of obtaining evidence for H1 if H1 is true, and approximately 66% chance of obtaining substantial evidence (BF<0.33) of H0 if H0 is true.

Simulation 2: Elicitation Task (Strand 3)

Simulation was used to explore the relationship between sample size and evidence for H1/H0, focusing on our key hypothesis that there will be more modification of nouns using adjectives at first mention than subsequent mentions. The dependent variable was the proportion of nouns that were modified per sequence. The key hypothesis in the elicitation task relates to the difference in the proportion of modified adjectives at first and later mentions. The dependent variable was the proportion of nouns that were modified per sequence. Once again, for different values of N (total number of participants) we simulated 1000 data sets (with 2 responses for first and 2 for later mentions) given (1) a parameter set where H1 was true and (2) a parameter set where H0 was true:

H1 is true:

We generated responses using a model with a grandmean of 0.173, and two within participants factors: mention (two levels) with an effect of 0.085, and condition (two levels) with an effect of 0.039, and an interaction between mention and condition with and effect of 0.025. There were random intercepts for participant and a random by participant slope for mention and condition (intercept: sd = 0.167; slope for mention = .111; slope for condition = 0.143; slope for mention by condition interaction = 0.143; correlations: intercept and mention = -.197; intercept and condition = 0.333; intercept and interaction = -.06; mention and condition = -.514; mention and interaction = 0.681; condition and interaction = 0.269). We also generated a residual term for each participant based on the standard mean of the residual error term. These values were extracted from a model run over the pilot data.

H0 is true:

Responses were generated based on a model identical to the previous model except that the within subjects effect of mention was set to 0.

Code to Run Simulation

clear R

rm(list = ls())
loadFunctionsGithub <-function(listFunctions){
  
  # set Github url folders
  urlFolder <- 'https://api.github.com/repos/n400peanuts/languagelearninglab/git/trees/master?recursive=1'
  urlRaw <- 'https://raw.githubusercontent.com/n400peanuts/languagelearninglab/master/tools/'
  
  if (!require(httr)) {
    stop("httr not installed")
  } else {
    print('----Downloading. Please wait----')
  };
  
  httr::GET(urlFolder)-> req #establish connection
  stop_for_status(req) # store errors if any
  filelist <- unlist(lapply(content(req)$tree, "[", "path"), use.names = F) # list all the functions in the folder
  urlFunctions <- grep("docs/tools/", filelist, value = TRUE, fixed = TRUE) # select the ones stored in the "docs" folder
  gsub("docs/tools/", "", urlFunctions) -> functions # remove unwanted strings
  
  for(x in listFunctions){
    if(!(x %in%functions)){
      stop(paste0("Could not find function: '", x,"'"))
    }
  }
  
  if (length(listFunctions) == 0){ #load all
    for (i in 1:length(functions)){
      httr::GET(paste0(urlRaw, functions[i]), ssl.verifypeer = FALSE)-> temp
      content(temp)->p3
      eval(parse(text = p3), envir = .GlobalEnv) #load them in the R environment
    } 
  } else {
    functions[functions %in% listFunctions]-> functionsIlike #only the one that I specified
    for (i in 1:length(functionsIlike)){
      httr::GET(paste0(urlRaw, functionsIlike[i]), ssl.verifypeer = FALSE)-> temp
      content(temp)->p3
      eval(parse(text = p3), envir = .GlobalEnv) #load them in the R environment
    }
  };
  print('----Download completed----')
}
listFunctions <- c("summarySEwithin.R", "summarySE.R","normDataWithin.R","lizCenter.R","myCenter.R","Bf.R","Bf_powercalc.R")
loadFunctionsGithub(listFunctions)
## [1] "----Downloading. Please wait----"
## [1] "----Download completed----"

Analyse Pilot Study Data

elicitation = read.csv("elicitation_data.csv")
elicitation$mention<- as.factor(elicitation$mention)
elicitation$mention= relevel(elicitation$mention,ref="Other")
elicitation<- lizCenter(elicitation, list("mention","condition"))

# Run lmer on this data, then extract relevant values

tmp_model = lmer(mod ~ 1+mention.ct * condition.ct +(mention.ct*condition.ct|PID), 
              control = lmerControl(optimizer = "bobyqa"), 
              data = elicitation)
## boundary (singular) fit: see help('isSingular')
summary(tmp_model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: mod ~ 1 + mention.ct * condition.ct + (mention.ct * condition.ct |  
##     PID)
##    Data: elicitation
## Control: lmerControl(optimizer = "bobyqa")
## 
## REML criterion at convergence: -81.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.9788 -0.4475 -0.0438  0.2494  3.8131 
## 
## Random effects:
##  Groups   Name                    Variance Std.Dev. Corr             
##  PID      (Intercept)             0.02799  0.1673                    
##           mention.ct              0.01245  0.1116   -0.20            
##           condition.ct            0.03746  0.1935    0.33 -0.51      
##           mention.ct:condition.ct 0.02046  0.1430   -0.06  0.68  0.27
##  Residual                         0.01677  0.1295                    
## Number of obs: 201, groups:  PID, 34
## 
## Fixed effects:
##                         Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)              0.17336    0.03019 33.03000   5.743 2.05e-06 ***
## mention.ct               0.08572    0.02725 33.91925   3.146  0.00344 ** 
## condition.ct             0.03837    0.03813 32.86403   1.006  0.32163    
## mention.ct:condition.ct  0.02422    0.04591 38.43044   0.528  0.60076    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) mntn.c cndtn.
## mention.ct  -0.130              
## conditin.ct  0.268 -0.317       
## mntn.ct:cn. -0.033  0.254  0.129
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
grandmean = summary(tmp_model)$coefficients["(Intercept)", "Estimate" ]
withinConditionA = summary(tmp_model)$coefficients["mention.ct", "Estimate" ]
withinConditionB = summary(tmp_model)$coefficients["condition.ct", "Estimate" ]
interactionAB = summary(tmp_model)$coefficients["mention.ct:condition.ct", "Estimate" ]
subj_v_cov=as.matrix(Matrix::bdiag(VarCorr(tmp_model)))
residual= subset(as.data.frame(VarCorr(tmp_model)), grp=="Residual")[["sdcor"]]

Create Functions

Function to Simulate Dataset

  • n_sub is the number of participants we assume in our simulation (must be even as code assumes half in each of the two conditions)
  • n_obs is the number of observations for each of the subjects (which is assumed to be the same in both conditions)
  • grandmean is grandmean
  • withinCondition is estimate of the difference between the conditions (i.e. estimate of coefficient for main effect of withincondition.ct)
  • betweenCondition is difference between the two levels of between condition (Estimate of coefficient for main effect of betweencondition.ct)
  • interaction is difference of differences (interaction between the two variables)
  • subj_v_cov is the variance co-variance matrix for subject effects. We use this to generate correlated by-participant intercept and slope
  • residual is the standard mean of the residual error term
# function to simulate dataset
generate_linear_v5 <- function(n_subj, n_obs, grandmean, withinConditionA,withinConditionB, interactionAB,subj_v_cov,residual) {
  # create empty data frame where number of rows = number of subjects *     number of trials per subject 
  data <- data.frame(matrix(0, ncol=0, nrow = n_subj * n_obs ))
  # make participant vector and add to this data frame
  data$participant <- as.factor(rep(seq(1:n_subj), each = n_obs))
  # add in condition- assumes each participant has equal n trials in both conditions 
  data$withinConditionA<-as.factor(rep((rep(c(0,1), each = n_obs/2)),n_subj)) #this is within
  # make centred version
  data$withinConditionA.ct <- as.numeric(data$withinConditionA) - mean(as.numeric(data$withinConditionA))
  data$withinConditionB<-as.factor(rep(c(rep(0,n_obs/4),rep(1,n_obs/4)),n_subj))
  # make centred version
  data$withinConditionB.ct <- as.numeric(data$withinConditionB) - mean(as.numeric(data$withinConditionB))
  
  # create table four columns for (1)  participant random intercept (2) participant random slope for the withinconditionA (3)participant random slope for the withinconditionB (4) interaction. The sd of each and  correlation between each pair the two is from the variance covariance matrix for subject effects sub_v_cov matrix 
  
  u <- mvrnorm(n = n_subj, c(0,0,0,0), subj_v_cov)
  
  ## NOTE: from here is different from if it was linear version
  
  # finally, generate data on the basis of these parameters we begin by generating a linear term (note eta is the same as y in the linear version of the function, except if was linear would need to add a residual)
  # = grandmean + random intercept for participants 
  #+ effect of withinconditionA + random slope for withinconditionA for this participant
  #+ effect of withinconditionB + random slope for withinconditionB for this participant
  #+ effect of interaction + random slope for interaction for this participant
  #+ 
  #+ # get the residual error- have one per row
  resid <- rnorm(n = n_subj*n_obs, 0,residual )
  
  data$y = resid + grandmean + u[data$participant,1]  + data$withinConditionA.ct * (withinConditionA +u[data$participant,2]) + data$withinConditionB.ct * (withinConditionB +u[data$participant,3]) + (data$withinConditionA.ct *data$withinConditionB.ct)* (interactionAB +u[data$participant,4])     


  return(data)
  
}

Run Simulation

We are interested in the interaction between condition (between subjects) and gender matchtype (within subjects), which is “H3” throughout.

Ns=c(50,100,150,200) # a vector the size of N we want to simulate
nsims=1000 # number of simulations - this is small but should be large for montecarlo (1000 or more) 
H1estimate = withinConditionA
H2estimate = withinConditionB
H3estimate = interactionAB
n=vector()
H1condition_Bf =vector()
H1condition_pvalue=vector()
singular=vector()
convergeError = vector()
i=1
for (n_subj in Ns){
  for (j in 1:nsims){
    d = generate_linear_v5(n_subj, n_obs= 4, grandmean=grandmean, withinConditionA=withinConditionA,withinConditionB=withinConditionB, interactionAB=interactionAB, subj_v_cov = subj_v_cov, residual=residual)

    model_simulated <- suppressMessages( lmer(y ~ 1+withinConditionA.ct * withinConditionB.ct+(withinConditionA.ct||participant),
                                              control = lmerControl(optimizer = "bobyqa"),
                                              data = d))

    # collect singularity error if not duplicate of singularity warning
    singular[i] = isSingular(model_simulated)
    # collect convergence error if not duplicate of singularity warning
    convergeError[i]=summary(model_simulated)$optinfo$conv$opt# optimizer checks convergence
    #print(i)
    n[i]=n_subj
    H1condition_pvalue[i] = summary(model_simulated)$coeff[ "withinConditionA.ct","Pr(>|t|)"]
    H1condition_est = summary(model_simulated)$coeff[ "withinConditionA.ct","Estimate"]
    H1condition_se = summary(model_simulated)$coeff[ "withinConditionA.ct","Std. Error"]
    H1condition_Bf[i] = Bf(sd=H1condition_se,  obtained=H1condition_est, uniform=0, meanoftheory=0, sdtheory=H1estimate, tail=1)$BayesFactor

    i=i+1


  }

}

results =data.frame(n, H1condition_Bf,  H1condition_pvalue, singular,convergeError)

elicitation_sim <- results

sample_size <- unique(elicitation_sim$n)

bayes_power_3 <- vector()

bayes_power_3[1] <- mean((subset(elicitation_sim, n==50)$H1condition_Bf>3)*1)
bayes_power_3[2] <- mean((subset(elicitation_sim, n==100)$H1condition_Bf>3)*1)
bayes_power_3[3] <- mean((subset(elicitation_sim, n==150)$H1condition_Bf>3)*1)
bayes_power_3[4] <- mean((subset(elicitation_sim, n==200)$H1condition_Bf>3)*1)

bayes_power_.3_3 <- vector()

bayes_power_.3_3[1] <- mean((subset(elicitation_sim, n==50)$H1condition_Bf>(1/3)&subset(elicitation_sim, n==50)$H1condition_Bf<3)*1)
bayes_power_.3_3[2] <- mean((subset(elicitation_sim, n==100)$H1condition_Bf>(1/3)&subset(elicitation_sim, n==100)$H1condition_Bf<3)*1)
bayes_power_.3_3[3] <- mean((subset(elicitation_sim, n==150)$H1condition_Bf>(1/3)&subset(elicitation_sim, n==150)$H1condition_Bf<3)*1)
bayes_power_.3_3[4] <- mean((subset(elicitation_sim, n==200)$H1condition_Bf>(1/3)&subset(elicitation_sim, n==200)$H1condition_Bf<3)*1)


bayes_power_.3 <- vector()

bayes_power_.3[1] <- mean((subset(elicitation_sim, n==50)$H1condition_Bf<(1/3))*1)
bayes_power_.3[2] <- mean((subset(elicitation_sim, n==100)$H1condition_Bf<(1/3))*1)
bayes_power_.3[3] <- mean((subset(elicitation_sim, n==150)$H1condition_Bf<(1/3))*1)
bayes_power_.3[4] <- mean((subset(elicitation_sim, n==200)$H1condition_Bf<(1/3))*1)


bayes_total<-data.frame(sample_size,bayes_power_.3,bayes_power_.3_3,bayes_power_3)

bayes_total<- pivot_longer(bayes_total,cols=starts_with("bayes"),names_to = "Bayes_Factor",values_to = "Power")

bayes_total$Bayes_Factor[bayes_total$Bayes_Factor=="bayes_power_.3"]="Substaintial Evidence for H0"
bayes_total$Bayes_Factor[bayes_total$Bayes_Factor=="bayes_power_.3_3"]="Ambiguous"
bayes_total$Bayes_Factor[bayes_total$Bayes_Factor=="bayes_power_3"]="Substaintial Evidence for H1"

bayes_power_.3
## [1] 0 0 0 0
bayes_power_.3_3
## [1] 0.085 0.001 0.000 0.000
bayes_power_3
## [1] 0.915 0.999 1.000 1.000
Ns=c(50,100,150,200) # a vector the size of N we want to simulate
nsims=1000 # number of simulations - this is small but should be large for montecarlo (1000 or more)
H1estimate = withinConditionA
H2estimate = withinConditionB
H3estimate = interactionAB
n=vector()
H1condition_Bf =vector()
H1condition_pvalue=vector()
singular=vector()
convergeError = vector()
i=1
for (n_subj in Ns){

  for (j in 1:nsims){
    d = generate_linear_v5(n_subj, n_obs= 4, grandmean=grandmean, withinConditionA=0,withinConditionB=withinConditionB, interactionAB=interactionAB, subj_v_cov = subj_v_cov, residual=residual)

    model_simulated <- suppressMessages( lmer(y ~ 1+withinConditionA.ct * withinConditionB.ct+(withinConditionA.ct||participant),
                                              control = lmerControl(optimizer = "bobyqa"),
                                              data = d))

    # collect singularity error if not duplicate of singularity warning
    singular[i] = isSingular(model_simulated)
    # collect convergence error if not duplicate of singularity warning
    convergeError[i]=summary(model_simulated)$optinfo$conv$opt# optimizer checks convergence
    #print(i)
    n[i]=n_subj
    H1condition_pvalue[i] = summary(model_simulated)$coeff[ "withinConditionA.ct","Pr(>|t|)"]
    H1condition_est = summary(model_simulated)$coeff[ "withinConditionA.ct","Estimate"]
    H1condition_se = summary(model_simulated)$coeff[ "withinConditionA.ct","Std. Error"]
    H1condition_Bf[i] = Bf(sd=H1condition_se,  obtained=H1condition_est, uniform=0, meanoftheory=0, sdtheory=H1estimate, tail=1)$BayesFactor

    i=i+1

  }

}

null_results =data.frame(n, H1condition_Bf,  H1condition_pvalue, singular,convergeError)

null_elicitation_sim <- null_results

null_bayes_power_3 <- vector()

null_bayes_power_3[1] <- mean((subset(null_elicitation_sim, n==50)$H1condition_Bf>3)*1)
null_bayes_power_3[2] <- mean((subset(null_elicitation_sim, n==100)$H1condition_Bf>3)*1)
null_bayes_power_3[3] <- mean((subset(null_elicitation_sim, n==150)$H1condition_Bf>3)*1)
null_bayes_power_3[4] <- mean((subset(null_elicitation_sim, n==200)$H1condition_Bf>3)*1)

null_bayes_power_.3_3 <- vector()

null_bayes_power_.3_3[1] <- mean((subset(null_elicitation_sim, n==50)$H1condition_Bf>(1/3)&subset(null_elicitation_sim, n==50)$H1condition_Bf<3)*1)
null_bayes_power_.3_3[2] <- mean((subset(null_elicitation_sim, n==100)$H1condition_Bf>(1/3)&subset(null_elicitation_sim, n==100)$H1condition_Bf<3)*1)
null_bayes_power_.3_3[3] <- mean((subset(null_elicitation_sim, n==150)$H1condition_Bf>(1/3)&subset(null_elicitation_sim, n==150)$H1condition_Bf<3)*1)
null_bayes_power_.3_3[4] <- mean((subset(null_elicitation_sim, n==200)$H1condition_Bf>(1/3)&subset(null_elicitation_sim, n==200)$H1condition_Bf<3)*1)


null_bayes_power_.3 <- vector()

null_bayes_power_.3[1] <- mean((subset(null_elicitation_sim, n==50)$H1condition_Bf<(1/3))*1)
null_bayes_power_.3[2] <- mean((subset(null_elicitation_sim, n==100)$H1condition_Bf<(1/3))*1)
null_bayes_power_.3[3] <- mean((subset(null_elicitation_sim, n==150)$H1condition_Bf<(1/3))*1)
null_bayes_power_.3[4] <- mean((subset(null_elicitation_sim, n==200)$H1condition_Bf<(1/3))*1)


null_bayes_total<-data.frame(sample_size,null_bayes_power_.3,null_bayes_power_.3_3,null_bayes_power_3)

null_bayes_total<- pivot_longer(null_bayes_total,cols=starts_with("null_bayes"),names_to = "Bayes_Factor",values_to = "Power")


null_bayes_total$Bayes_Factor[null_bayes_total$Bayes_Factor=="null_bayes_power_.3"]="Substaintial Evidence for H0"
null_bayes_total$Bayes_Factor[null_bayes_total$Bayes_Factor=="null_bayes_power_.3_3"]="Ambiguous"
null_bayes_total$Bayes_Factor[null_bayes_total$Bayes_Factor=="null_bayes_power_3"]="Substaintial Evidence for H1"


null_bayes_power_.3
## [1] 0.554 0.697 0.776 0.808
null_bayes_power_.3_3
## [1] 0.432 0.294 0.206 0.186
null_bayes_power_3
## [1] 0.014 0.009 0.018 0.006

Plot Power Analysis

  df_lines <- c("Substaintial Evidence for H0" = "dashed","Ambiguous" = "dotted", "Substaintial Evidence for H1" = "solid")


bayes_power_fig <- ggplot(bayes_total, aes(x=sample_size,y = Power,shape = Bayes_Factor,linetype = Bayes_Factor)) +
  geom_line()+
  scale_linetype_manual(values = df_lines)+
  geom_point() +
  scale_shape_manual(values=c("Substaintial Evidence for H0" = 1,"Ambiguous" = 4,"Substaintial Evidence for H1" = 15))+
  scale_y_continuous(limits=c(0, 1)) +
  labs(x = "Sample Size",
         y = "Power",
         shapes= "Legend")+
  ggtitle("H1 is true") +
  theme_classic()

null_bayes_power_fig <- ggplot(null_bayes_total, aes(x=sample_size,y = Power,shape = Bayes_Factor,linetype = Bayes_Factor)) +
  geom_line()+
  scale_linetype_manual(values = df_lines)+
  geom_point() +
  scale_shape_manual(values=c("Substaintial Evidence for H0" = 1,"Ambiguous" = 4,"Substaintial Evidence for H1" = 15))+
  scale_y_continuous(limits=c(0, 1)) +
  labs(x = "Sample Size",
         y = "Power",
         shapes= "Legend")+
  ggtitle("H0 is true") +
  theme_classic()

  library(ggpubr)
  ggarrange(bayes_power_fig, null_bayes_power_fig, ncol=2, nrow=1, common.legend = TRUE, legend="bottom")

For each of the 1000 datasets for each model for each N, we ran a linear mixed effect model and extracted the estimate and standard error for the within subjects effect of mention. We compared a point null against a half normal with a mean = 0 and sd = 0.085 (taken from the within subjects effect of mention from our pilot data analysis). For each of the datasets where H1 is true and where H0 is true, we look at the proportion of runs where the BF met the criteria for substantial evidence for H1 (BF>3), substantial evidence for the null (BF<1/3), or did not meet the criteria for either and was therefore ambiguous (shown in Figure 3). We also show proportions meeting the criteria for “strong” evidence i.e. above 10/ below 0.1 - this is shown for information however based our analyses on meeting the threshold for substantial criteria - which in the case of H1 is roughly the same strength of evidence as the more familiar p<.05. With larger sample sizes (N = 150 and N = 200), it is rare to incorrectly accept H1 when H0 is true, and vice versa. However, the challenge is avoiding obtaining evidence which is ambiguous (i.e.⅓ < BF<3). We can see from the plots that obtaining unambiguous evidence for H0 requires a larger sample size than H1, and that an increase between N=150 and N=200 does not lead to substantial gains in this respect. Thus balancing with feasibility, in each Strand 3 experiment, N=150 participants seems reasonable, which will provide us with almost 100% chance of obtaining evidence for H1 if H1 is true, and approximately 76% chance of obtaining substantial evidence (BF<0.33) of H0 if H0 is true.