All that has changed compared with the previous script posted 18.06.17 (https://rstudio-pubs-static.s3.amazonaws.com/285969_da2f695ddb9540ac85089744626ca729.html) is that a menu has been added, some points about the conditions have been clarified, and typos have been fixed. (Note in particular that there was a typo regarding Bayes factors on line 26.)
This contains planned analyses as of 18/06/2017 for a data set which we will start to collect on 19/06/2017. Plans may change as a result of statistical consultation (we are currently seeking advice on some details).
The study is a training study to be conducted with Year 3 (7-8 year old) children with no previous knowledge of Mandarin or any other tonal language, training them on the four tones (6 tone contrasts) in Mandarin. There will initially be two conditions: (1) pictures-only: they are trained using a 2AFC task where they hear a (real) Mandarin word and identify which of two pictures it refers to (one correct, one a foil which differs only by tone); (2) pictures+diacritics - same task but the pictures are accompanied by pictures of the four diacritics. [If resources permit, we may later run a diacritics only condition]. Data will be collected from the training task and from some test tasks conducted pre and post training.
Our main analyses will be Bayesian and we will compute Bayes factors using the method advocated by Dienes (2008, 2015); alongside these we will also provide frequentist statistics. The frequentist statistics will be computed using logistic mixed-effects models, since in each case the dependent variable is a correct/incorrect response. Note that although mixed models allow us to include both participants and items, we do not intend to include items in the analyses since power on this dimension will be low. (NB: (i) we counterbalance trained/untrained word lists across participants; (ii) it is not common for by-items analyses to be included in work in this area; (iii) increasing the number of items in a learning study of this type with children is not feasible.) We will use a full random-slopes structure for the predictors of interest (generally, condition and session/pre->post). Factors such as the tone contrast for the trial will be included as fixed factors, but we will not include random slopes for these.
For the Bayesian analyses, we will continue to work in log odds space (as for the frequentist analyses) to meet assumptions of normality, using estimates and standard errors taken from the logistic mixed-effects models. Following Dienes (2008) we will model H1 by using an estimate of the mean predicted by the theory (an estimate coming either from logistic mixed-effects models run over pilot data, or in some cases from elsewhere in the data) as the SD of a half-normal (for one-tailed predictions) or full normal (for two-tailed predictions) distribution. We will say we have substantial evidence for H1 if BF > 3 and for H0 if BF < .33. [TYPO FIXED HERE FROM PREVIOUS]
See: https://doi.org/10.1016/j.jml.2017.01.005 for published paper using this approach and Rscript http://rpubs.com/ewonnacott/242454.
Note that the pilot data referred to below comes from three undergraduate student projects. None were exactly the same as the experiment planned here, but materials were very similar and in some cases similar questions were asked. The models of the pilot data presented below also serve to exemplify the approach to be taken with the current data.
Where possible, we conducted power analyses. Although lme's will be used for the actual analyses, we conducted the power analyses as though for (near-equivalent) t-tests over by-subject proportions. This means that the values are only approximate, but since our main analyses are Bayes factors, our aim here is simply to get an idea of approximately suitable sample sizes.
In addition to power analyses, where possible we also conducted Bayesian analyses on relevant parts of the pilot data to see whether we had a sufficient sample for BF>3 or BF<.33. We also look at what size of sample might be expected to give BF>3 or BF<.33, given that the SE scales with 1/sqrt(N).
The analyses below suggest that there are cases where a sample of 20 participants per condition could be sufficient to achieve 90% power. Interactions with condition will be harder to power; as discussed below, there may be situations where there is evidence for H1 (i.e. the data reflect learning) in condition 1 but evidence for H0 (no evidence of learning) in condition 2. If so, we will deem this sufficient to draw conclusions about the different training methods.
Our policy will be as follows: we will inspect the data after we have approximately 20 participants in each condition. (Note: this is approximate since we aim to collect these data by the end of the current academic year (July 2017); if we don't quite get this many children, we will still look at the sample we have.) If the results of the key analyses are inconclusive then, resources permitting, we will continue to test more participants, collecting 10 per condition before inspecting the data again at each step. We will continue until N per condition = 50, resources permitting.
Final note: we have designed the experiment to have four sessions as follows:
session 1: pre-tests (discrimination and word repetition); training
session 2: training; post-tests (discrimination and word repetition; 2AFC pic test / 2AFC tone test)
session 3: training
session 4: training; post-tests (discrimination and word repetition; 2AFC pic test / 2AFC tone test)
For our first inspection of the data, we will conduct analyses on both the session 2 data and the session 4 data (each compared to the relevant pre-tests; for the training data itself, we will compare sessions 1 and 2 with sessions 1, 2, 3 & 4). If using the data from the extra two sessions (sessions 3 and 4) does not affect the key pattern of results then, given the large expenditure of resources per participant, we will drop the final two sessions for ongoing participant collection.
Note that the frequentist analyses will not be valid at the .05 level, due to the optional stopping problem. The current plan is nevertheless to report them with that caveat; the BFs will be considered the critical analyses.
rm(list=ls())
#suppressPackageStartupMessages(library(stringdist))
#library(languageR)
library(lattice)
suppressPackageStartupMessages(library(lme4))
library(plotrix)
#suppressPackageStartupMessages(library(irr))
library(plyr)
library(knitr)
library(ggplot2) #for graphs
#library(reshape)
library(reshape2)
library(lsr)
library(pwr)
This function can be found on the website “Cookbook for R”
http://www.cookbook-r.com/Manipulating_data/Summarizing_data/
It summarizes data, giving count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%).
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                      conf.interval=.95, .drop=TRUE) {
  require(plyr)

  # New version of length which can handle NA's: if na.rm==T, don't count them
  length2 <- function(x, na.rm=FALSE) {
    if (na.rm) sum(!is.na(x))
    else length(x)
  }

  # This does the summary. For each group's data frame, return a vector with
  # N, mean, and sd
  datac <- ddply(data, groupvars, .drop=.drop,
                 .fun = function(xx, col) {
                   c(N    = length2(xx[[col]], na.rm=na.rm),
                     mean = mean(xx[[col]], na.rm=na.rm),
                     sd   = sd(xx[[col]], na.rm=na.rm)
                   )
                 },
                 measurevar
  )

  # Rename the "mean" column
  datac <- rename(datac, c("mean" = measurevar))

  # Calculate standard error of the mean
  datac$se <- datac$sd / sqrt(datac$N)

  # Confidence interval multiplier for standard error
  # Calculate t-statistic for confidence interval:
  # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
  ciMult <- qt(conf.interval/2 + .5, datac$N - 1)
  datac$ci <- datac$se * ciMult

  return(datac)
}
This function can be found on the website “Cookbook for R”
http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions
From that website:
Summarizes data, handling within-subjects variables by removing inter-subject variability. It will still work if there are no within-subjects variables. Gives count, un-normed mean, normed mean (with same between-group mean), standard deviation, standard error of the mean, and confidence interval. If there are within-subject variables, calculate adjusted values using method from Morey (2008).
summarySEwithin <- function(data=NULL, measurevar, betweenvars=NULL, withinvars=NULL,
                            idvar=NULL, na.rm=FALSE, conf.interval=.95, .drop=TRUE) {

  # Ensure that the betweenvars and withinvars are factors
  factorvars <- vapply(data[, c(betweenvars, withinvars), drop=FALSE],
                       FUN=is.factor, FUN.VALUE=logical(1))
  if (!all(factorvars)) {
    nonfactorvars <- names(factorvars)[!factorvars]
    message("Automatically converting the following non-factors to factors: ",
            paste(nonfactorvars, collapse = ", "))
    data[nonfactorvars] <- lapply(data[nonfactorvars], factor)
  }

  # Get the means from the un-normed data
  datac <- summarySE(data, measurevar, groupvars=c(betweenvars, withinvars),
                     na.rm=na.rm, conf.interval=conf.interval, .drop=.drop)

  # Drop all the unused columns (these will be calculated with normed data)
  datac$sd <- NULL
  datac$se <- NULL
  datac$ci <- NULL

  # Norm each subject's data
  ndata <- normDataWithin(data, idvar, measurevar, betweenvars, na.rm, .drop=.drop)

  # This is the name of the new column
  measurevar_n <- paste(measurevar, "_norm", sep="")

  # Collapse the normed data - now we can treat between and within vars the same
  ndatac <- summarySE(ndata, measurevar_n, groupvars=c(betweenvars, withinvars),
                      na.rm=na.rm, conf.interval=conf.interval, .drop=.drop)

  # Apply correction from Morey (2008) to the standard error and confidence interval
  # Get the product of the number of conditions of within-S variables
  nWithinGroups <- prod(vapply(ndatac[, withinvars, drop=FALSE], FUN=nlevels,
                               FUN.VALUE=numeric(1)))
  correctionFactor <- sqrt(nWithinGroups / (nWithinGroups - 1))

  # Apply the correction factor
  ndatac$sd <- ndatac$sd * correctionFactor
  ndatac$se <- ndatac$se * correctionFactor
  ndatac$ci <- ndatac$ci * correctionFactor

  # Combine the un-normed means with the normed results
  merge(datac, ndatac)
}
This function is used by the summarySEwithin function above. It can be found on the website “Cookbook for R”
http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions
From that website:
Norms the data within specified groups in a data frame; it normalizes each subject (identified by idvar) so that they have the same mean, within each group specified by betweenvars.
normDataWithin <- function(data=NULL, idvar, measurevar, betweenvars=NULL,
                           na.rm=FALSE, .drop=TRUE) {
  #library(plyr)

  # Measure var on left, idvar + between vars on right of formula.
  data.subjMean <- ddply(data, c(idvar, betweenvars), .drop=.drop,
                         .fun = function(xx, col, na.rm) {
                           c(subjMean = mean(xx[,col], na.rm=na.rm))
                         },
                         measurevar,
                         na.rm
  )

  # Put the subject means with original data
  data <- merge(data, data.subjMean)

  # Get the normalized data in a new column
  measureNormedVar <- paste(measurevar, "_norm", sep="")
  data[,measureNormedVar] <- data[,measurevar] - data[,"subjMean"] +
    mean(data[,measurevar], na.rm=na.rm)

  # Remove this subject mean column
  data$subjMean <- NULL

  return(data)
}
This function outputs the centered values of a variable, which can be a numeric variable, a factor, or a data frame. It was taken from Florian Jaeger’s blog.
https://hlplab.wordpress.com/2009/04/27/centering-several-variables/.
From his blog:
myCenter = function(x) {
  if (is.numeric(x)) { return(x - mean(x, na.rm=T)) }
  if (is.factor(x)) {
    x = as.numeric(x)
    return(x - mean(x, na.rm=T))
  }
  if (is.data.frame(x) || is.matrix(x)) {
    m = matrix(nrow=nrow(x), ncol=ncol(x))
    colnames(m) = paste("c", colnames(x), sep="")
    for (i in 1:ncol(x)) {
      m[,i] = myCenter(x[,i])
    }
    return(as.data.frame(m))
  }
}
This function provides a wrapper around myCenter allowing you to center a specific list of variables from a data frame.
The output is a copy of the data frame with a column (always a numeric variable) added for each of the centered variables. These columns are labelled with the column’s previous name, but with “.ct” appended (e.g., “variable1” will become “variable1.ct”).
lizCenter = function(x, listfname) {
  for (i in 1:length(listfname)) {
    fname = as.character(listfname[i])
    x[paste(fname, ".ct", sep="")] = myCenter(x[fname])
  }
  return(x)
}
This function allows us to inspect particular coefficients from the output of an lme model by putting them in a table.
#get_coeffs <- function(x,list){(kable(as.data.frame(summary(x)$coefficients)[list,],digits=3))}
get_coeffs <- function(x,list){(as.data.frame(summary(x)$coefficients)[list,])}
This function is equivalent to the Dienes (2008) calculator, which can be found here: http://www.lifesci.sussex.ac.uk/home/Zoltan_Dienes/inference/Bayes.htm.
The code was provided by Baguley and Kaye (2010) and can be found here: http://www.academia.edu/427288/Review_of_Understanding_psychology_as_a_science_An_introduction_to_scientific_and_statistical_inference
Bf <- function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0, sdtheory=1, tail=2){
  area <- 0
  if(identical(uniform, 1)){
    theta <- lower
    range <- upper - lower
    incr <- range / 2000
    for (A in -1000:1000){
      theta <- theta + incr
      dist_theta <- 1 / range
      height <- dist_theta * dnorm(obtained, theta, sd)
      area <- area + height * incr
    }
  } else {
    theta <- meanoftheory - 5 * sdtheory
    incr <- sdtheory / 200
    for (A in -1000:1000){
      theta <- theta + incr
      dist_theta <- dnorm(theta, meanoftheory, sdtheory)
      if(identical(tail, 1)){
        if (theta <= 0){
          dist_theta <- 0
        } else {
          dist_theta <- dist_theta * 2
        }
      }
      height <- dist_theta * dnorm(obtained, theta, sd)
      area <- area + height * incr
    }
  }
  LikelihoodTheory <- area
  Likelihoodnull <- dnorm(obtained, 0, sd)
  BayesFactor <- LikelihoodTheory / Likelihoodnull
  ret <- list("LikelihoodTheory" = LikelihoodTheory, "Likelihoodnull" = Likelihoodnull, "BayesFactor" = BayesFactor)
  ret
}
This works with the Bf function above. It requires the same values as that function (i.e. the obtained mean and SE for the current sample, and a value for the predicted mean, which is set as sdtheory with meanoftheory = 0), plus the current number of participants N. However, rather than returning a BF for the current sample, it works out what the BF would be for a range of different subject numbers (assuming that the SE scales with 1/sqrt(N)).
Bf_powercalc <- function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0, sdtheory=1, tail=2, N, min, max)
{
  x = c(0)
  y = c(0)
  # note: working out what the difference between N and df is (for the contrast between
  # two groups, this is 2; for contrasts where there are 4 groups this will be 3, etc.)
  for(newN in min : max)
  {
    B = as.numeric(Bf(sd = sd*sqrt(N/newN), obtained, uniform, lower, upper, meanoftheory, sdtheory, tail)[3])
    x = append(x, newN)
    y = append(y, B)
    output = cbind(x, y)
  }
  output = output[-1,]
  return(output)
}
DV: Each training trial produces a data point (did they pick the correct picture - 1/0; note they get feedback on this)
The child data below comes from a two session study where children were trained on stimuli similar to the “picture-only” condition in the current study. We have data from both 7 year olds and 11 year olds; the 11 year old data here are relevant in providing an estimate that can be used to inform H1 in our BF example analyses.
The adult data below come from a two session study where there were two conditions: (1) trained on stimuli similar to the “picture-only” condition in the current study (2) trained on pictures JUST of the diacritics. This is useful for considering possible effects of condition in the data (though with the caveat that the comparison is somewhat different).
Note: tone contrast is a fixed effect with 6 levels (six possible contrasts). It is expected to contribute to the model and is thus included but isn’t of specific interest.
child.train = read.csv("kidstrain_clean.csv")
adult.train = read.csv ("adultstrain_clean.csv")
child.train$agegroup = relevel(child.train$agegroup, ref="7years")
adult.train$condition = relevel(adult.train$condition, ref="i")
child.train.7 = subset(child.train, agegroup == "7years")
child.train.7 <- lizCenter(child.train.7, list("session", "tonecontrast"))
child.train.7.mod = glmer (result ~
+ session.ct
+ tonecontrast.ct
+ (session.ct|subject),
data = child.train.7, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(child.train.7.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.157 0.059 2.659 0.008
session.ct 0.158 0.077 2.062 0.039
tonecontrast.ct -0.026 0.022 -1.171 0.242
child.train.11 = subset(child.train, agegroup == "11years")
child.train.11 <- lizCenter(child.train.11, list("session", "tonecontrast"))
child.train.11.mod = glmer (result ~
+ session.ct
+ tonecontrast.ct
+ (session.ct|subject),
data = child.train.11, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(child.train.11.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.538 0.096 5.602 0.000
session.ct 0.301 0.085 3.538 0.000
tonecontrast.ct -0.051 0.023 -2.233 0.026
adult.train <- lizCenter(adult.train, list("session", "tonecontrast", "condition"))
adult.train.mod = glmer (result ~
+ session.ct * condition.ct
+ tonecontrast.ct
+ (session.ct|subject),
data = adult.train, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(adult.train.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.961 0.091 10.555 0.000
session.ct 0.487 0.067 7.263 0.000
condition.ct 0.416 0.181 2.291 0.022
tonecontrast.ct 0.067 0.010 6.740 0.000
session.ct:condition.ct -0.032 0.133 -0.241 0.809
We saw this in the pilot data from the seven-year-olds above, who were trained in a condition like the pictures-only condition. The pictures+diacritics condition contains the same information (plus some additional information), so we predict (at least - see below) above-chance performance here too.
mean: 53.9%; log odds (beta from lme): 0.1572362; odds (exp(beta)):1.1702719;
We plan to use an lme model similar to that used on the 7-year-old pilot data above but with data from both conditions (pictures-only vs. pictures+diacritics). We will fit separate intercepts for each condition and compare each to chance = .50 (i.e. default value returned). Note that since all predictors are centered, the intercept reflects the overall average (rather than at base levels of a factor).
We will do this both for a model with just the first two sessions and for a model with all the data (with session then having four possible values and coded as a centered numerical predictor).
Summary of data for each condition: mean and SE for the intercept from the lme. Value to inform H1: the intercept estimate from the seven-year-old pilot data above, 0.1572362.
NOTE: If (e.g.) we find an effect for condition 1, we can also use the estimate of the intercept for that condition to inform H1 when looking at the data from condition 2, and vice versa. This is a better estimate since the conditions are more closely matched to each other than either is to the pilot data.
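To make the plan concrete, here is a minimal sketch of this model. It assumes the new data sit in a data frame (here called new.train, a hypothetical name) with the same column names as the pilot training data plus a condition factor; the coefficient name "conditionpictures_only" is likewise a placeholder that will depend on the actual condition labels.

# Sketch only (hypothetical object/column names): suppressing the global intercept
# gives one intercept per condition, each a log-odds estimate compared to chance (0 = 50%).
new.train <- lizCenter(new.train, list("session", "tonecontrast"))

new.train.mod = glmer(result ~ 0 + condition
                      + session.ct
                      + tonecontrast.ct
                      + (session.ct|subject),
                      data = new.train, family = binomial,
                      control = glmerControl(optimizer = "bobyqa"))

# BF against chance for one condition, with H1's half-normal SD set to the
# intercept estimate from the 7-year-old pilot model above:
h1mean = summary(child.train.7.mod)$coefficients["(Intercept)", "Estimate"]
meanBF = summary(new.train.mod)$coefficients["conditionpictures_only", "Estimate"]   # hypothetical coefficient name
seBF   = summary(new.train.mod)$coefficients["conditionpictures_only", "Std. Error"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)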
Look at the sample size required to obtain different levels of power with pilot data from 7 year olds:
dataS = as.numeric(with(droplevels(child.train.7), tapply(result,list(subject), mean, na.rm=T)))
d = cohensD(x = dataS, mu=0.5)
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .9, type = c("one.sample"), alternative = c( "greater"))
One-sample t test power calculation
n = 20.5642
d = 0.6690231
sig.level = 0.05
power = 0.9
alternative = greater
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .8, type = c("one.sample"), alternative = c( "greater"))
One-sample t test power calculation
n = 15.26043
d = 0.6690231
sig.level = 0.05
power = 0.8
alternative = greater
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .7, type = c("one.sample"), alternative = c( "greater"))
One-sample t test power calculation
n = 11.97662
d = 0.6690231
sig.level = 0.05
power = 0.7
alternative = greater
rm(dataS)
rm(d)
Here we look at the pilot data from the 7-year-olds, using an H1 informed by the estimate for the 11-year-olds. In the actual analyses of the new data from 7-year-olds, we will use an estimate from the current 7-year-old pilot data to inform H1.
meanBF = summary(child.train.7.mod)$coefficients["(Intercept)", "Estimate"]
seBF = summary(child.train.7.mod)$coefficients["(Intercept)", "Std. Error"]
h1mean = summary(child.train.11.mod)$coefficients["(Intercept)", "Estimate"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)
$LikelihoodTheory
[1] 1.408113
$Likelihoodnull
[1] 0.1968384
$BayesFactor
[1] 7.153651
rm(meanBF)
rm(seBF)
We have substantial evidence for H1 over H0 (BF > 3) with the current sample of N = 15.
We saw this in the 7 year olds in the pilot study.
7-year-olds' means: 0.52, 0.56; log odds (beta from lme): 0.1579623; odds (exp(beta)): 1.171122
We will use an lme model similar to that used on the pilot data above but with data from both conditions (pictures-only vs. pictures+diacritics). We will fit separate slopes for session for each condition. We will do this both for a model with just the first two sessions and for a model with all data.
Summary of data for each condition: mean and SE for session.ct at each level of condition from the lme. Value to inform H1: the session estimate from the seven-year-old pilot data above, 0.1579623.
If it appears that one of the conditions (c1) shows a positive effect of session and one doesn’t (c2), we can also do BF analyses for c2 data with H1 informed by the estimate from c1 (this is a better estimate than the pilot data since the conditions are more closely matched to each other than either is to the pilot data).
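A minimal sketch of this model, again with hypothetical object and coefficient names; entering condition as a centered main effect and then condition:session.ct (without a session.ct main effect) yields one session slope per condition.

# Sketch only (hypothetical names): separate session slopes per condition.
new.train <- lizCenter(new.train, list("session", "tonecontrast", "condition"))

new.train.mod2 = glmer(result ~ condition.ct
                       + condition:session.ct
                       + tonecontrast.ct
                       + (session.ct|subject),
                       data = new.train, family = binomial,
                       control = glmerControl(optimizer = "bobyqa"))

# BF for the session slope in one condition, with H1's half-normal SD set to the
# session estimate from the 7-year-old pilot model above:
h1mean = summary(child.train.7.mod)$coefficients["session.ct", "Estimate"]
meanBF = summary(new.train.mod2)$coefficients["conditionpictures_only:session.ct", "Estimate"]   # hypothetical coefficient name
seBF   = summary(new.train.mod2)$coefficients["conditionpictures_only:session.ct", "Std. Error"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)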
dataS = with(droplevels(child.train.7), tapply(result,list(subject,session), mean, na.rm=T))
d= cohensD(x = dataS[,1], y = dataS[,2], method = "paired")
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .9, type = c("paired"), alternative = c("greater"))
Paired t test power calculation
n = 28.38133
d = 0.5634754
sig.level = 0.05
power = 0.9
alternative = greater
NOTE: n is number of *pairs*
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .8, type = c("paired"), alternative = c("greater"))
Paired t test power calculation
n = 20.89366
d = 0.5634754
sig.level = 0.05
power = 0.8
alternative = greater
NOTE: n is number of *pairs*
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .7, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 16.25431
d = 0.5634754
sig.level = 0.05
power = 0.7
alternative = greater
NOTE: n is number of *pairs*
rm(dataS)
rm(d)
As for Prediction 1, we again look at the pilot data from the seven-year-olds, using an H1 informed by the estimate for the 11-year-olds. In the actual analyses we will use an estimate from the pilot data to inform H1.
meanBF = summary(child.train.7.mod)$coefficients["session.ct", "Estimate"]
seBF = summary(child.train.7.mod)$coefficients["session.ct", "Std. Error"]
h1mean = summary(child.train.11.mod)$coefficients["session.ct", "Estimate"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)
$LikelihoodTheory
[1] 2.206111
$Likelihoodnull
[1] 0.6215463
$BayesFactor
[1] 3.549392
rm(meanBF)
rm(seBF)
We have sufficient evidence to accept H1 over H0 on the basis of current sample.
We don’t have any pilot data with this type of manipulation for children.
This prediction is made on the basis of a related effect seen in the adult data: adults showed greater overall performance in a condition where they saw just diacritics rather than just pictures. Note that because the diacritics are iconic, participants can potentially use them to respond when they are present. It is possible that having the diacritics present will help children too, though (a) children could well be different (the diacritics might be too abstract) and (b) it could be that having the picture there as well as the diacritic changes things.
Means from the pilot:
adult means: 0.67, 0.75; adult log odds (beta from lme): 0.4156183; adult odds (exp(beta)): 1.626952
We will run an lme model similar to that used on the pilot data above but with data from both conditions; we will look for a fixed effect of condition. We will do this both for a model with just the first two sessions and for a model with all the data.
Summary of data: mean and SE for condition.ct from the lme. Value to inform H1: the estimate from the ADULT pilot data above, 0.4156183.
There is a caveat here since this value comes from adults and will likely overestimate any difference for children (making it a conservative estimate, biasing towards evidence for H0).
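A minimal sketch of the corresponding Bayes factor step, assuming a model of the same form as adult.train.mod has been fit to the new child data (new.train.mod3 is a hypothetical name) and assuming the directional prediction of an advantage for the diacritics condition (hence tail = 1; the sign of condition.ct will depend on how the condition factor is coded).

# Sketch only (hypothetical model name): BF for the condition effect, with H1's
# half-normal SD set to the adult pilot estimate for condition (0.4156183).
h1mean = summary(adult.train.mod)$coefficients["condition.ct", "Estimate"]
meanBF = summary(new.train.mod3)$coefficients["condition.ct", "Estimate"]
seBF   = summary(new.train.mod3)$coefficients["condition.ct", "Std. Error"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)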
NOTE: this is based on ADULT data. It likely underestimates the sample size required to see this difference in children.
dataS = with(droplevels(adult.train), tapply(result,list(subject,condition), mean, na.rm=T))
d = cohensD(x = dataS[,1], y = dataS[,2], method = "pooled")
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .9, type = c("two.sample"), alternative = c("greater"))
Two-sample t test power calculation
n = 58.37074
d = 0.5449171
sig.level = 0.05
power = 0.9
alternative = greater
NOTE: n is number in *each* group
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .8, type = c("two.sample"), alternative = c("greater"))
Two-sample t test power calculation
n = 42.3355
d = 0.5449171
sig.level = 0.05
power = 0.8
alternative = greater
NOTE: n is number in *each* group
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .7, type = c("two.sample"), alternative = c("greater"))
Two-sample t test power calculation
n = 32.39217
d = 0.5449171
sig.level = 0.05
power = 0.7
alternative = greater
NOTE: n is number in *each* group
rm(dataS)
rm(d)
This suggests that for adults we would need 58 per condition to get 90% power; we would be likely to need more with children.
Given practical constraints, we are unlikely to obtain a sufficient sample for this. However this effect isn’t critical in terms of demonstrating what leads to greater learning. (It is therefore more important to demonstrate that they are above chance in each condition).
We don’t have any clear evidence to base this on. However it seems reasonable that the type of training will lead to different learning slopes. One possibility is that children may show greater learning in the condition with diacritics, due to more information per trial. However it is also possible that they could show steeper learning in the other condition, due to needing to pay more attention to and remember individual stimuli. We saw means in this latter direction in the adult pilot data i.e. more improvement in the condition with pictures (labelled condition “i” for implicit) than with diacritics (labelled “e” for explicit) (though the conditions were not exactly the same, see above) though it wasn’t significant.
adult means: 0.62, 0.71, 0.72, 0.79; beta: -0.0321743
We will use an lme model similar to that used on the pilot data above but with data from both conditions; we will look for the condition.ct:session.ct interaction. We will do this both for a model with just the first two sessions and for a model with all the data.
Summary of data: mean and SE for the interaction between session and condition (session.ct:condition.ct) from the lme. Value to inform H1: the estimate of session from the same model (e.g. summary(model.newdata)$coefficients["session.ct", "Estimate"]).
This will provide a relatively high estimate of the likely effect of the interaction (if one is present) given the actual data set (so again, a conservative estimate).
[Note: we are checking this with a statistics consultant]
Nothing to base power analysis on here (as there was no effect in the pilot data).
Note that this is with adult data; it exemplifies the process we will use to conduct these analyses on the actual data.
meanBF = summary(adult.train.mod)$coefficients["session.ct:condition.ct","Estimate"]
seBF = summary(adult.train.mod)$coefficients["session.ct:condition.ct","Std. Error"]
h1mean = summary(adult.train.mod)$coefficients["session.ct","Estimate"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 2)
$LikelihoodTheory
[1] 0.7889355
$Likelihoodnull
[1] 2.905926
$BayesFactor
[1] 0.271492
rm(meanBF)
rm(seBF)
We use tail = 2 here as we don't have a clear directional prediction for this hypothesis.
So, with the adult pilot data we do have evidence in favor of H0 from the current data set.
One possibility with children is that we will not be able to obtain a large enough sample to provide evidence for/against the hypothesis that there is more learning in one condition than the other, but we may find evidence for H1 (that there is an effect of session) in one condition and for H0 (that there is no effect of session) in the other condition. If so, we will deem this sufficient to address this prediction.
These two tests are very similar to training but with either just pictures or just diacritics. The tests are only done as post tests, not as pre-tests (unlike the rest of the tests discussed after this one).
Children in the pictures+diacritics condition will get the 2AFC pictures test followed by the 2AFC diacritics test; children in the pictures-only condition will get just the 2AFC pictures test but will get it twice (so that total exposure doesn't differ across conditions).
Where we look at the conditions separately, for the pictures test we will look at all of the data (i.e. for the pictures-only condition, data from both parts of the repeated test). Where we compare conditions, we will only use data from the first 2AFC test.
We did not include this test in the pilot studies; however, the test is similar to the training task, so we use a model based on just the last 24 trials of training to inform power decisions.
child.train.7.s2.24 = droplevels(subset(child.train, agegroup == "7years" & session =="session2" & order>72))
child.train.7.s2.24 = lizCenter(child.train.7.s2.24, c("tonecontrast"))
child.train.7.s2.s4.mod = glmer (result ~
+ tonecontrast.ct
+ (1|subject),
data = child.train.7.s2.24, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(child.train.7.s2.s4.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.201 0.106 1.894 0.058
tonecontrast.ct 0.008 0.062 0.124 0.901
child.train.11.s2.24 = droplevels(subset(child.train, agegroup == "11years" & session =="session2" & order>72))
child.train.11.s2.24 = lizCenter(child.train.11.s2.24, c("tonecontrast"))
child.train.11.s2.s4.mod = glmer (result ~
+ tonecontrast.ct
+ (1|subject),
data = child.train.11.s2.24, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(child.train.11.s2.s4.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.686 0.117 5.839 0.000
tonecontrast.ct -0.075 0.066 -1.144 0.253
adult.train.s2.24 = subset(adult.train, session =="session2" & order>264)
adult.train.s2.24 = lizCenter(adult.train.s2.24, list("tonecontrast"))
adult.train.s2.24.mod = glmer (result ~
+ tonecontrast.ct
+ (1|subject),
data = adult.train.s2.24, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(adult.train.s2.24.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.284 0.136 9.428 0
tonecontrast.ct 0.218 0.055 3.977 0
Training data in pilot (see above)
We will use an lme model similar to that for the training data. Again we fit separate intercepts for each condition and compare each to chance = .50 (i.e. default value returned). Since all predictors are centered, the intercept reflects the overall average (rather than at base levels of a factor).
We will do this both for a model with just the session 2 test and for a model with the session 4 test; as discussed above, since the conditions are analysed separately here, we will use the full double data set for the 2AFC picture test for the participants in the pictures-only condition.
Summary of data for each condition: mean and SE for intercept from lme
Value to inform H1: assuming they show learning, we will use an estimate from the equivalent lme for the training data from the current data set (NOT from the pilot data), unless the intercept from the training data does not suggest above-chance performance. Again, note that if we find an effect for condition 1, we can also use the estimate of the intercept for that condition to inform H1 when looking at the data from condition 2, and vice versa. This is a better estimate since the conditions are more closely matched to each other than either is to the pilot data.
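A minimal sketch of this model, assuming the new 2AFC test data are in a data frame (new.pictest, a hypothetical name) with the same kind of columns as the training data, and re-using the hypothetical new.train.mod from the training sketch above for H1.

# Sketch only (hypothetical names): one intercept per condition, compared to chance;
# H1's half-normal SD is taken from the equivalent model of the new training data.
new.pictest <- lizCenter(new.pictest, list("tonecontrast"))

new.pictest.mod = glmer(result ~ 0 + condition
                        + tonecontrast.ct
                        + (1|subject),
                        data = new.pictest, family = binomial,
                        control = glmerControl(optimizer = "bobyqa"))

h1mean = summary(new.train.mod)$coefficients["conditionpictures_only", "Estimate"]    # from the training sketch above
meanBF = summary(new.pictest.mod)$coefficients["conditionpictures_only", "Estimate"]  # hypothetical coefficient name
seBF   = summary(new.pictest.mod)$coefficients["conditionpictures_only", "Std. Error"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)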
Here we look at power using the values from the subset of training data from the above
dataS=as.numeric(with(droplevels(child.train.7.s2.24), tapply(result,list(subject), mean, na.rm=T)))
d= cohensD(x = dataS, mu=0.5)
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .9, type = c("one.sample"), alternative = c( "greater"))
One-sample t test power calculation
n = 20.29292
d = 0.6738353
sig.level = 0.05
power = 0.9
alternative = greater
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .8, type = c("one.sample"), alternative = c( "greater"))
One-sample t test power calculation
n = 15.06511
d = 0.6738353
sig.level = 0.05
power = 0.8
alternative = greater
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .7, type = c("one.sample"), alternative = c( "greater"))
One-sample t test power calculation
n = 11.82842
d = 0.6738353
sig.level = 0.05
power = 0.7
alternative = greater
rm(dataS)
rm(d)
This suggests a sample size of 21 is sufficient for 90% power.
Here we use the subset of the pilot data from the 7-year-olds, with an H1 informed by the estimate from the 11-year-olds' whole training data set (note: in the actual analyses, H1 will be informed by an estimate from the model of the current training data with 7-year-olds).
meanBF = summary(child.train.7.s2.s4.mod)$coefficients["(Intercept)" ,"Estimate"]
seBF = summary(child.train.7.s2.s4.mod)$coefficients["(Intercept)" ,"Std. Error"]
h1mean = summary(child.train.11.mod)$coefficients["(Intercept)" ,"Estimate"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean,tail = 1)
$LikelihoodTheory
[1] 1.319399
$Likelihoodnull
[1] 0.6261729
$BayesFactor
[1] 2.107085
Bf_powercalc(seBF, meanBF, uniform = 0, meanoftheory = 0, N = 15, min = 15, max = 20, sdtheory = h1mean, tail = 1)
x y
[1,] 15 2.107085
[2,] 16 2.311559
[3,] 17 2.538967
[4,] 18 2.791849
[5,] 19 3.073052
[6,] 20 3.385763
rm(meanBF)
rm(seBF)
Suggests don’t currently have sufficient sample (N = 15) to get reliable evidence for H1, but could do with 19 participants.
Children in the condition where diacritics are present might have become overly reliant on them during training and thus unable to do the test when they are absent, although we don’t currently have specific data to base this on.
We will run an lme model on the 7-year-olds' data, looking at the effect of condition. We will do this both for a model with just the session 2 test and for a model with the session 4 test; as discussed above, when comparing conditions we will only use the first half of the doubled 2AFC picture test data for the participants in the pictures-only condition.
Summary of data for each condition: mean and SE for condition from the lme. Value to inform H1: assuming they show learning, we will use the estimate of the intercept from the current model (note: the other predictors are centered, so this is equivalent to the grand mean; it will be conservative).
(We are seeking statistical advice to confirm that this is a reasonable, if conservative, estimate for H1)
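A minimal sketch of this comparison, with hypothetical object names (new.pictest as above); given the centered predictors, the model's own intercept (the grand mean in log odds) supplies the SD for H1, and tail = 1 assumes the directional prediction stated above.

# Sketch only (hypothetical names): centered condition effect, with H1 scaled by the
# same model's intercept (the grand mean).
new.pictest <- lizCenter(new.pictest, list("condition", "tonecontrast"))

new.pictest.mod2 = glmer(result ~ condition.ct
                         + tonecontrast.ct
                         + (1|subject),
                         data = new.pictest, family = binomial,
                         control = glmerControl(optimizer = "bobyqa"))

meanBF = summary(new.pictest.mod2)$coefficients["condition.ct", "Estimate"]
seBF   = summary(new.pictest.mod2)$coefficients["condition.ct", "Std. Error"]
h1mean = summary(new.pictest.mod2)$coefficients["(Intercept)", "Estimate"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)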
We don’t have any data that we can base analyses on here (comparison of conditions in the pilot adult data is too dissimilar).
Note that it may prove difficult to obtain a sufficient sample to gain evidence for H1 or H0. However, it may be that we see above chance performance only for the picture-only condition. In that case, we may look to the previous analyses (Prediction 1) and be able to show that there is evidence for H1 over H0 in one condition, but for H0 over H1 in the other. If so, we will be able to say that there is evidence of learning in one condition and not the other, and this will be deemed sufficient to address this hypothesis.
This test will be completed at both pre-test and post-test. Children will hear three frogs each produce a Mandarin word. One produces the target ("different") word and the other two each produce the same foil word, which differs from the target only in tone (e.g. mao [t1], mao [t1], mao [t4]).
Each of the 6 tone contrasts is tested. We will put tone contrast into the model, since it is likely to have an effect, but we don't have specific hypotheses of interest here. There are four talkers (3 female, 1 male). Trials in which the male talker is the odd one out are known to be easiest, and those where one of the talkers is male but is not the odd one out are hardest. There are an equal number of each trial type and we put trial type into the model as a factor, though again it isn't a question for which we have predictions.
For the initial data set collected, we will consider both pre-test -> posttest1 (session 2) and pre-test -> post-test2 (session 4). If equivalent results are obtained, subsequent testing will drop the final two sessions (see note above).
The pilot child data below comes from same two session study as the pilot training data (recall, they were trained on stimuli similar to the “picture-only” condition in the current study.)
The adult data below come from a different two-session study in which there were two conditions: (1) trained on stimuli similar to the "picture-only" condition in the current study; (2) trained on pictures of JUST the diacritics. This is useful for considering possible effects of condition in the data (with the caveat that the comparison is somewhat different).
Note that for both adults and children in the pilot experiment we included both trained and untrained items in the discrimination test. For the current experiment we will use only untrained items in this test so we remove the trained items from the analyses below.
As above, tone contrast is a fixed effect with 6 levels (six possible contrasts). It is expected to contribute to model and is thus included but isn’t of specific interest.
child.discrim = read.csv("kidsdiscrim_clean.csv")
child.discrim$pre_post = relevel(child.discrim$pre_post, ref="pre")
child.discrim$agegroup = as.factor(child.discrim$agegroup)
child.discrim$pre_post = relevel(child.discrim$pre_post, ref="pre")
child.discrim <- droplevels(subset(child.discrim, neworold == "newword"))
adult.discrim = read.csv ("adultsdiscrim_clean.csv")
adult.discrim$pre_post = relevel(adult.discrim$pre_post, ref="pre")
adult.discrim$condition = relevel(adult.discrim$condition, ref="i")
adult.discrim <- droplevels(subset(adult.discrim, neworold == "newword"))
child.discrim.7 = droplevels(subset(child.discrim, agegroup == "7"))
child.discrim.7 <- lizCenter(child.discrim.7, list("pre_post", "trialtype", "tonecontrast"))
child.discrim.7.mod = glmer (correct ~
+ pre_post.ct
+ trialtype.ct
+ tonecontrast.ct
+ (pre_post.ct|participantnumber),
data = child.discrim.7, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(child.discrim.7.mod)$coefficients, 3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.445 0.128 -3.472 0.001
pre_post.ct -0.365 0.228 -1.604 0.109
trialtype.ct -0.252 0.139 -1.815 0.070
tonecontrast.ct 0.115 0.033 3.504 0.000
adult.discrim <- lizCenter(adult.discrim, list("pre_post", "trialtype", "condition", "tonecontrast"))
adult.discrim.mod = glmer (correct ~
+ condition.ct * pre_post.ct
+ trialtype.ct
+ tonecontrast.ct
+ (pre_post.ct|participantnumber),
data = adult.discrim, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(adult.discrim.mod)$coefficients, 3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.345 0.075 4.615 0.000
condition.ct -0.147 0.149 -0.984 0.325
pre_post.ct 0.287 0.099 2.897 0.004
trialtype.ct 0.066 0.054 1.232 0.218
tonecontrast.ct 0.041 0.013 3.251 0.001
condition.ct:pre_post.ct -0.071 0.198 -0.360 0.719
In the pilot data, adults showed improvement with session, but 7-year-olds didn't (nor did 11-year-olds).
adult means: 0.55, 0.61; log odds (beta from lme): 0.2868916; odds (exp(beta)): 1.3322798
7-year-olds don't show this (in fact the means are reversed):
means: 0.44, 0.36; log odds (beta from lme): -0.3652203; odds (exp(beta)): 0.6940437
We will use an lme model similar to that used on the pilot data above but with data from both conditions; we will fit separate pre-to-post slopes for each condition.
For the initial data set collected, we will do this both to compare pre-test to post-test 1 (session 2) and pre-test to post-test 2 (session 4).
Summary of data for each condition: mean and SE for pre-post from the lme. Value to inform H1: we will use the estimate from the adult pilot data above, i.e. 0.2868916.
Note: this adult value will likely overestimate the effect for children. A better value would come from the new child data themselves; therefore, if children show learning in one condition we will use that estimate to inform the BF for the other condition (and if both conditions show learning, we will use each to inform the BF for the other).
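A minimal sketch of this model, assuming the new discrimination data use the same column names as the pilot data (correct, pre_post, trialtype, tonecontrast, participantnumber) plus a condition factor; all object and coefficient names are assumptions.

# Sketch only (hypothetical names): separate pre->post slopes per condition; H1's
# half-normal SD is the adult pilot pre_post.ct estimate (0.2868916).
new.discrim <- lizCenter(new.discrim, list("pre_post", "trialtype", "tonecontrast", "condition"))

new.discrim.mod = glmer(correct ~ condition.ct
                        + condition:pre_post.ct
                        + trialtype.ct
                        + tonecontrast.ct
                        + (pre_post.ct|participantnumber),
                        data = new.discrim, family = binomial,
                        control = glmerControl(optimizer = "bobyqa"))

h1mean = summary(adult.discrim.mod)$coefficients["pre_post.ct", "Estimate"]
meanBF = summary(new.discrim.mod)$coefficients["conditionpictures_only:pre_post.ct", "Estimate"]   # hypothetical coefficient name
seBF   = summary(new.discrim.mod)$coefficients["conditionpictures_only:pre_post.ct", "Std. Error"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)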
Note: This analysis is included for completeness but is not really relevant since it is based on adult data. If children do show an effect, it is likely to be smaller and they will have larger SE
dataS = with(droplevels(adult.discrim), tapply(correct,list(participantnumber, pre_post), mean, na.rm=T))
d = cohensD(x = as.numeric(dataS[,1]),y = as.numeric(dataS[,2]), method = "paired")
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .9, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 31.24664
d = 0.5356902
sig.level = 0.05
power = 0.9
alternative = greater
NOTE: n is number of *pairs*
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .8, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 22.95973
d = 0.5356902
sig.level = 0.05
power = 0.8
alternative = greater
NOTE: n is number of *pairs*
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .7, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 17.82436
d = 0.5356902
sig.level = 0.05
power = 0.7
alternative = greater
NOTE: n is number of *pairs*
rm(d)
rm(dataS)
Here we look at data from 7 year olds in the pilot study
meanBF = summary(child.discrim.7.mod)$coefficients["pre_post.ct" ,"Estimate"]
seBF = summary(child.discrim.7.mod)$coefficients["pre_post.ct" ,"Std. Error"]
h1mean = summary(adult.discrim.mod)$coefficients["pre_post.ct" ,"Estimate"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)
$LikelihoodTheory
[1] 0.1393478
$Likelihoodnull
[1] 0.4839102
$BayesFactor
[1] 0.2879621
rm(meanBF)
rm(seBF)
rm(h1mean)
Here we have evidence for the null hypothesis (H0) from just 15 participants. Note however, that we are using a relatively high estimate for H1, which may make it easier to find evidence for H0.
We don’t have any specific evidence to base this on. The closest contrast was in the adult pilot experiment (comparing a condition with picture-only training versus diacritic-only training) where they showed numerically more improvement in the implicit condition, but the difference was NS.
means : 0.56, 0.63, 0.54, 0.59 log odds (beta from lme): -0.0711115 odd (exp(beta))0.931358
For the initial data set collected we will do this both to compare pre test to post-test 1 (session2) and pre-test to post-test2 (session4).
We will use an lme model similar to that used on the pilot data above but with data from both conditions; we will look for an interaction between pre-post and condition.
Summary of data for each condition: mean and SE for the pre-post by condition interaction from the lme. Value to inform H1: we will use the estimate for pre-post from the current model (i.e. summary(new.data)$coefficients["pre_post.ct", "Estimate"]).
Note: we will only do this if we have evidence of an effect of pre-post in at least one of the conditions. If they don’t learn in either condition then this test is inappropriate.
Also note that this will be a relatively conservative estimate for a large difference in how much the conditions lead to a change pre to post.
(We are seeking statistical advice here)
We have no data to look at this (since there was no sig. difference in the pilot data collected)
Here we are looking at this with adult pilot data.
meanBF = summary(adult.discrim.mod)$coefficients["condition.ct:pre_post.ct" ,"Estimate"]
seBF = summary(adult.discrim.mod)$coefficients["condition.ct:pre_post.ct" ,"Std. Error"]
h1mean = summary(adult.discrim.mod)$coefficients["pre_post.ct" ,"Estimate"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory =h1mean, tail = 2)
$LikelihoodTheory
[1] 1.121749
$Likelihoodnull
[1] 1.89307
$BayesFactor
[1] 0.5925559
plot(Bf_powercalc(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, N = 15, min = 900, max = 50, tail = 2))
abline(h = 3)
Bf_powercalc(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, N = 15, min = 811, max = 815, tail = 2)
x y
[1,] 811 3.007075
[2,] 812 3.018251
[3,] 813 3.029471
[4,] 814 3.040735
[5,] 815 3.052043
rm(meanBF)
rm(seBF)
rm(h1mean)
This analysis suggests that even for adults we would need a sample of over 800 participants per condition to see this interaction. This is implausible (though it may be partly because we are basing the analysis on an overly conservative value for H1).
Either way, this analysis suggests it will be difficult to gain power for this interaction. On the other hand, the previous analyses may make it clear that there is no improvement on this test in EITHER condition (see Prediction 1 above). Another possibility is that we won't have power for the interaction itself, but will have power to say that we have evidence for learning in one condition while, for the other, there is more evidence for the null than for an H1 based on that condition.
This test is completed in the pre- and post-tests. Children hear a word and repeat it back. We will look to see whether they repeat back the tone correctly (the audio recordings will be coded by native speakers who are blind to both condition and the target word; the tone of the transcribed word will be compared to the tone of the target and thus coded as correct/incorrect). There are equal numbers of words for each of the four tones.
For the initial data set collected, we will consider both pre-test -> post-test 1 (session 2) and pre-test -> post-test 2 (session 4). If equivalent results are obtained, subsequent testing will drop the final two sessions (see note above).
The child data below come from the same two session study as the pilot training data/discrimination data (i.e. where children were trained on stimuli similar to the “picture-only” condition in the current study).
The adult data below do not come from the same two-day pilot study as the other adult data used above (that particular student project did not include this test), but instead from a much longer 9-session experiment conducted by a current PhD student. That data set did not include a contrast between conditions of the type relevant to the current study.
Note that for both adults and children in the pilot experiments using this test we included both trained and untrained items. For this experiment we use only untrained items in this test so we remove the trained items from the analyses below.
child.wr = read.csv("kidswordrep_clean.csv")
child.wr$pre_post = relevel(child.wr$pre_post, ref = "pre")
child.wr = droplevels(subset(child.wr, wordtype == "new"))
adult.wr = read.csv("adultswordrep_clean.csv")
adult.wr$pre_post = relevel(adult.wr$pre_post, ref = "pre")
adult.wr = droplevels(subset(adult.wr, wordtype == "Untrained"))
child.wr.7 = droplevels(subset(child.wr, Age=="young"))
child.wr.7 <- lizCenter(child.wr.7, list("pre_post", "correcttone"))
child.wr.7.mod = glmer (resulttone ~
+ pre_post.ct
+ correcttone.ct
+ (pre_post.ct|participantname),
data = child.wr.7, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(child.wr.7.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.553 0.147 3.764 0.000
pre_post.ct 0.309 0.141 2.183 0.029
correcttone.ct 0.066 0.043 1.537 0.124
adult.wr <- lizCenter(adult.wr, list( "pre_post","correcttone"))
adult.wr.mod = glmer (resulttone ~
+ pre_post.ct
+ correcttone.ct
+ (pre_post.ct|participantname),
data = adult.wr, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(adult.wr.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.110 0.077 14.321 0
pre_post.ct 0.412 0.118 3.491 0
correcttone.ct -0.142 0.032 -4.409 0
Both children and adults showed this in the relevant pilot studies.
7-year-olds' means: 0.59, 0.65; log odds (beta from lme): 0.3085348; odds (exp(beta)): 1.3614289
adult means: 0.7, 0.77; adult log odds (beta from lme): 0.4117959; adult odds (exp(beta)): 1.5095264
We will use an lme model similar to that used on the pilot data above but with data from both conditions; we will fit separate pre-to-post slopes for each condition.
For the initial data set collected, we will do this both to compare pre-test to post-test 1 and pre-test to post-test 2 (see notes above).
Summary of data for each condition: mean and SE for pre-post from the lme. Value to inform H1: we will use the estimate from the child pilot data above, i.e. 0.3085348.
NOTE: If (e.g.) we find an effect for condition 1, we can also use the estimate for pre-post for that condition to inform H1 when looking at the data from condition 2, and vice versa. This is a better estimate since the conditions are more closely matched to each other than either is to the pilot data.
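A minimal sketch of this model, assuming the new word repetition data use the same column names as the pilot data (resulttone, pre_post, correcttone, participantname) plus a condition factor; object and coefficient names are assumptions.

# Sketch only (hypothetical names): separate pre->post slopes per condition; H1's
# half-normal SD is the 7-year-old pilot pre_post.ct estimate (0.3085348).
new.wr <- lizCenter(new.wr, list("pre_post", "correcttone", "condition"))

new.wr.mod = glmer(resulttone ~ condition.ct
                   + condition:pre_post.ct
                   + correcttone.ct
                   + (pre_post.ct|participantname),
                   data = new.wr, family = binomial,
                   control = glmerControl(optimizer = "bobyqa"))

h1mean = summary(child.wr.7.mod)$coefficients["pre_post.ct", "Estimate"]
meanBF = summary(new.wr.mod)$coefficients["conditionpictures_only:pre_post.ct", "Estimate"]   # hypothetical coefficient name
seBF   = summary(new.wr.mod)$coefficients["conditionpictures_only:pre_post.ct", "Std. Error"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)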
Note: this power analysis is based on the 7-year-old pilot data.
dataS = with(droplevels(child.wr.7), tapply(resulttone, list(participantname,pre_post), mean, na.rm=T))
d = cohensD(x = dataS[,1], y = dataS[,2], method = "paired")
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .9, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 32.46102
d = 0.5250962
sig.level = 0.05
power = 0.9
alternative = greater
NOTE: n is number of *pairs*
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .8, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 23.83549
d = 0.5250962
sig.level = 0.05
power = 0.8
alternative = greater
NOTE: n is number of *pairs*
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .7, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 18.48998
d = 0.5250962
sig.level = 0.05
power = 0.7
alternative = greater
NOTE: n is number of *pairs*
rm(dataS)
rm(d)
Sample size estimation suggests that we would need N = 33 per condition for 90% power.
Note that here we use the mean from the adult pilot data to inform H1 (whereas for actual analyses we will be using a value from pilot data with 7 year olds to inform H1)
meanBF = summary(child.wr.7.mod)$coefficients["pre_post.ct" ,"Estimate"]
seBF = summary(child.wr.7.mod)$coefficients["pre_post.ct" ,"Std. Error"]
h1mean = summary(adult.wr.mod)$coefficients["pre_post.ct" ,"Estimate"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)
$LikelihoodTheory
[1] 1.398479
$Likelihoodnull
[1] 0.2603624
$BayesFactor
[1] 5.371278
rm(meanBF)
rm(seBF)
We have sufficient evidence for H1 (BF > 3) from the sample of N = 15 children.
We have no data to base this on as we haven’t previously used this test in an experiment contrasting these (or similar) conditions. If the presence of the diacritics boosts learning, children may improve more in the diacritics+picture condition.
We will use an lme model similar to that used on the pilot data above but with data from both conditions; we will look for an interaction of pre-post and condition.
For the initial data set collected, we will do this both to compare pre-test to post-test 1 and pre-test to post-test 2 (see notes above).
Summary of data for each condition: mean and SE for the pre-post by condition interaction from the lme. Value to inform H1: we will use the estimate for pre-post from the model of the new data (i.e. summary(new.data)$coefficients["pre_post.ct", "Estimate"]).
We only do this if we have evidence of an effect of pre-post in at least one of the conditions. If they don’t learn in either condition then this test is inappropriate.
Also note that this will be a relatively conservative estimate for a large difference in how much the conditions lead to a change from pre to post.
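A minimal sketch of this Bayes factor step, assuming an interaction model of the new word repetition data analogous to adult.discrim.mod above (new.wr.mod2 is a hypothetical name); tail = 2 follows the two-tailed approach used for the training interaction above, though tail = 1 could be used if the directional prediction (more improvement with diacritics) is adopted.

# Sketch only (hypothetical names): BF for the pre-post by condition interaction,
# with H1 scaled by the same model's pre_post.ct estimate.
new.wr.mod2 = glmer(resulttone ~ condition.ct * pre_post.ct
                    + correcttone.ct
                    + (pre_post.ct|participantname),
                    data = new.wr, family = binomial,
                    control = glmerControl(optimizer = "bobyqa"))

meanBF = summary(new.wr.mod2)$coefficients["condition.ct:pre_post.ct", "Estimate"]
seBF   = summary(new.wr.mod2)$coefficients["condition.ct:pre_post.ct", "Std. Error"]
h1mean = summary(new.wr.mod2)$coefficients["pre_post.ct", "Estimate"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 2)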
We have no relevant data to base this on for either frequentist or BF analyses. One possibility is that we won’t be able to get a sufficiently large sample to have power for this interaction. As for other tests, here we may have to draw whatever conclusions are possible based on which of the conditions individually provides evidence for H1/H0.