General notes
This document contains the planned analyses, as of 18/06/17, for a data set that we will start to collect on 19/06/17. Plans may change as a result of statistical consultation (we are currently seeking advice on some details).
The study is a training study to be conducted with Year 3 children (aged 7-8 years) with no previous knowledge of Mandarin or any other tone language, training them on the four tones of Mandarin (6 tone contrasts). There will initially be two conditions: (1) pictures-only: children are trained using a 2AFC task in which they hear a (real) Mandarin word and identify which of two pictures it refers to (one correct, one a foil whose label differs only in tone); (2) pictures+diacritics: the same task, but the pictures are accompanied by pictures of the four diacritics. [If resources permit, we may later run a diacritics-only condition.] Data will be collected both from training and from tests conducted before and after training.
Our main analyses will be Bayesian: we will compute Bayes factors (BFs) using the method advocated by Dienes (2008, 2015). We will also provide frequentist statistics alongside these. Frequentist statistics will be computed using logistic mixed effects models, since in each case the dependent variable is a correct/incorrect response. Note that although mixed models allow us to include both participants and items, we do not intend to include items in the analyses, since power on this dimension will be low. (NB: by-items analyses are not commonly included in work in this area, and increasing the number of items in a learning study of this type with children is not feasible.) We use a full random slopes structure for predictors of interest (generally, condition and session/pre->post). Factors such as the tone contrast on a trial are included as fixed factors, and we will not include random slopes for these.
For the Bayesian analyses, we will continue to work in log-odds space (as for the frequentist analyses) to meet assumptions of normality, using estimates and standard errors from the logistic mixed effects models. Following Dienes (2008), we will model H1 by using an estimate of the expected effect (taken either from logistic mixed effects models run over pilot data or, in some cases, from elsewhere in the data) as the SD of a half-normal distribution (one-tailed) or a full normal distribution (two-tailed), each with mean 0. We will say we have substantial evidence for H1 if BF > 3 and for H0 if BF < 0.33 (accepted if BF > 4).
See https://doi.org/10.1016/j.jml.2017.01.005 for a published paper using this approach, and http://rpubs.com/ewonnacott/242454 for the accompanying R script.
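As a cross-check on the grid-based Bf() helper defined under Helper Functions, the same style of Bayes factor can be sketched in base R with stats::integrate(). This is a minimal sketch, not the linked script: the name bf_normal_h1 is hypothetical, H1 is assumed to be a normal (tail = 2) or half-normal (tail = 1) distribution with mean 0 and SD h1_sd, and the inputs are a log-odds estimate and its SE from a logistic mixed effects model.

```r
# Hypothetical helper: Bayes factor for a log-odds estimate.
# H1: theta ~ Normal(0, h1_sd) if tail = 2, half-normal on theta > 0 if tail = 1.
# H0: theta = 0. Likelihood of the data: Normal(obtained; theta, se).
bf_normal_h1 <- function(obtained, se, h1_sd, tail = 1) {
  # A half-normal density is twice the normal density, restricted to theta > 0
  prior_scale <- if (tail == 1) 2 else 1
  lower <- if (tail == 1) 0 else -Inf
  integrand <- function(theta) {
    prior_scale * dnorm(theta, 0, h1_sd) * dnorm(obtained, theta, se)
  }
  # Marginal likelihood of the observed estimate under H1
  lik_h1 <- integrate(integrand, lower = lower, upper = Inf)$value
  # Likelihood of the observed estimate under H0
  lik_h0 <- dnorm(obtained, 0, se)
  lik_h1 / lik_h0
}

# Rounded pilot values from the Training section: 7-year-old session effect
# (0.158, SE 0.077), with H1 SD taken from the 11-year-old estimate (0.301)
bf_normal_h1(obtained = 0.158, se = 0.077, h1_sd = 0.301, tail = 1)
```

Because the inputs here are rounded, the result should be close to, but not identical to, the BayesFactor reported for the unrounded values later in the document.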
Note that the pilot data referred to below come from three student projects. None was exactly the same as the experiments planned here, but the materials were very similar, and in some cases similar questions were asked. The models of the pilot data presented below also serve to exemplify the approach to be taken with the current data.
Where possible, we conducted power analyses. Although logistic mixed effects models (lmes) will be used for the actual analyses, we conducted the power analyses as though for (near-equivalent) t-tests over by-subject proportions. This means that the values are only approximate, but since our main analyses are Bayes factors, our aim here is only to get an idea of approximately suitable sample sizes.
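The step of collapsing trial-level 0/1 data to by-subject proportions and computing a paired effect size can be sketched as follows. This is a minimal sketch on hypothetical simulated data (the 48 trials per session and the accuracy rates are assumptions for illustration); d_z is computed as the mean of the paired differences divided by their SD, which is how we understand cohensD(..., method = "paired") to be defined.

```r
set.seed(1)
# Hypothetical toy data: 20 children x 2 sessions x 48 trials, 0/1 accuracy
toy <- expand.grid(trial = 1:48,
                   session = c("session1", "session2"),
                   subject = paste0("s", 1:20))
# Assumed accuracy rates, for illustration only
toy$result <- rbinom(nrow(toy), 1, ifelse(toy$session == "session1", 0.52, 0.56))

# Collapse trials to by-subject proportions correct (rows = subjects, cols = sessions)
props <- with(toy, tapply(result, list(subject, session), mean))

# Paired effect size d_z: mean of the differences divided by their SD
diffs <- props[, "session2"] - props[, "session1"]
d_z <- mean(diffs) / sd(diffs)
d_z
```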
In addition to power analyses, where possible we also conducted Bayesian analyses on relevant parts of the pilot data to see if we had a sufficient sample for BF > 3 or BF < 0.33. We also looked at what sample size might be expected to give BF > 3 or BF < 0.33, given that the SE scales with 1/sqrt(N).
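The SE-scaling argument can be sketched in closed form for a half-normal H1. This is a minimal sketch, not the Bf_powercalc() helper used in this document: the names bf_halfnormal and project_bf are hypothetical, the obtained estimate is held fixed while the SE is rescaled by sqrt(N/newN), and the current N of 20 in the usage line is an assumption for illustration.

```r
# Hypothetical helper: closed-form BF for a half-normal H1 (mean 0, SD h1_sd)
# against H0: theta = 0. Under a full normal prior the estimate is marginally
# Normal(0, sqrt(h1_sd^2 + se^2)); truncating the prior to theta > 0 multiplies
# this by 2 * P(theta > 0 | data) under the full-normal posterior.
bf_halfnormal <- function(obtained, se, h1_sd) {
  total_sd <- sqrt(h1_sd^2 + se^2)
  post_mean <- obtained * h1_sd^2 / total_sd^2
  post_sd <- h1_sd * se / total_sd
  lik_h1 <- 2 * dnorm(obtained, 0, total_sd) * pnorm(post_mean / post_sd)
  lik_h1 / dnorm(obtained, 0, se)
}

# Hypothetical helper: project the BF to other sample sizes, holding the
# estimate fixed and rescaling the SE, since SE is proportional to 1/sqrt(N)
project_bf <- function(obtained, se, h1_sd, N, newN) {
  data.frame(N = newN,
             BF = vapply(newN,
                         function(n) bf_halfnormal(obtained, se * sqrt(N / n), h1_sd),
                         numeric(1)))
}

# Pilot values from the Training section, with an assumed current N of 20
project_bf(obtained = 0.158, se = 0.077, h1_sd = 0.301, N = 20, newN = c(20, 30, 40, 50))
```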
The analyses below suggest that there are cases where a sample of 20 participants per condition could be sufficient to achieve 90% power. Interactions with condition will be harder to power; as discussed below, there may be situations where there is evidence for H1 (i.e. the data reflect learning) in condition 1 but evidence for H0 (no learning) in condition 2. If so, we will deem this sufficient to draw conclusions about the different training methods.
Our policy will be as follows: we will inspect the data once we have approximately 20 participants in each condition. (This is approximate since we aim to reach this number by the end of the academic year; if we don't quite get this many children, we will still look at the sample we have.) If results for key analyses are inconclusive, resources permitting, we will continue to test more participants, collecting 10 more per condition at each step before inspecting the data again. We will continue until N per condition = 50, resources permitting.
Final note: we have designed the experiment to have four sessions, as follows:
- Session 1: pre-tests (discrimination and word repetition); training
- Session 2: training; post-tests (discrimination and word repetition; 2AFC picture test / 2AFC tone test)
- Session 3: training
- Session 4: training; post-tests (discrimination and word repetition; 2AFC picture test / 2AFC tone test)
For our first inspection of the data, we will conduct analyses on both the session 2 data and the session 4 data (each compared to the relevant pre-tests; for the training data itself, we will analyse sessions 1 and 2 versus sessions 1, 2, 3 and 4). If the extra two sessions (3 and 4) do not affect the key pattern of results, then, given the large expenditure of resources per participant, we will drop the final two sessions for ongoing participant collection.
Note that the frequentist analyses will not be valid at the .05 level, due to the optional stopping problem. The current plan is nevertheless to report them with that caveat; BFs will be considered the critical analyses.
Data will be collected from children in year 3 (aged 7-8)
Helper Functions
SummarySE
This function can be found on the website “Cookbook for R”.
http://www.cookbook-r.com/Manipulating_data/Summarizing_data/
It summarizes data, giving count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%).
- data: a data frame.
- measurevar: the name of a column that contains the variable to be summarized.
- groupvars: a vector containing names of columns that contain grouping variables.
- na.rm: a boolean that indicates whether to ignore NA’s.
- conf.interval: the percent range of the confidence interval (default is 95%).
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
conf.interval=.95, .drop=TRUE) {
require(plyr)
# New version of length which can handle NA's: if na.rm==T, don't count them
length2 <- function (x, na.rm=FALSE) {
if (na.rm) sum(!is.na(x))
else length(x)
}
# This does the summary. For each group's data frame, return a vector with
# N, mean, and sd
datac <- ddply(data, groupvars, .drop=.drop,
.fun = function(xx, col) {
c(N = length2(xx[[col]], na.rm=na.rm),
mean = mean (xx[[col]], na.rm=na.rm),
sd = sd (xx[[col]], na.rm=na.rm)
)
},
measurevar
)
# Rename the "mean" column
datac <- rename(datac, c("mean" = measurevar))
datac$se <- datac$sd / sqrt(datac$N) # Calculate standard error of the mean
# Confidence interval multiplier for standard error
# Calculate t-statistic for confidence interval:
# e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
ciMult <- qt(conf.interval/2 + .5, datac$N-1)
datac$ci <- datac$se * ciMult
return(datac)
}
SummarySEwithin
This function can be found on the website “Cookbook for R”.
http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions
From that website:
Summarizes data, handling within-subjects variables by removing inter-subject variability. It will still work if there are no within-S variables. Gives count, un-normed mean, normed mean (with same between-group mean), standard deviation, standard error of the mean, and confidence interval. If there are within-subject variables, calculates adjusted values using the method from Morey (2008).
- data: a data frame.
- measurevar: the name of a column that contains the variable to be summarized.
- betweenvars: a vector containing names of columns that are between-subjects variables.
- withinvars: a vector containing names of columns that are within-subjects variables.
- idvar: the name of a column that identifies each subject (or matched subjects).
- na.rm: a boolean that indicates whether to ignore NA’s.
- conf.interval: the percent range of the confidence interval (default is 95%).
summarySEwithin <- function(data=NULL, measurevar, betweenvars=NULL, withinvars=NULL,
idvar=NULL, na.rm=FALSE, conf.interval=.95, .drop=TRUE) {
# Ensure that the betweenvars and withinvars are factors
factorvars <- vapply(data[, c(betweenvars, withinvars), drop=FALSE],
FUN=is.factor, FUN.VALUE=logical(1))
if (!all(factorvars)) {
nonfactorvars <- names(factorvars)[!factorvars]
message("Automatically converting the following non-factors to factors: ",
paste(nonfactorvars, collapse = ", "))
data[nonfactorvars] <- lapply(data[nonfactorvars], factor)
}
# Get the means from the un-normed data
datac <- summarySE(data, measurevar, groupvars=c(betweenvars, withinvars),
na.rm=na.rm, conf.interval=conf.interval, .drop=.drop)
# Drop all the unused columns (these will be calculated with normed data)
datac$sd <- NULL
datac$se <- NULL
datac$ci <- NULL
# Norm each subject's data
ndata <- normDataWithin(data, idvar, measurevar, betweenvars, na.rm, .drop=.drop)
# This is the name of the new column
measurevar_n <- paste(measurevar, "_norm", sep="")
# Collapse the normed data - now we can treat between and within vars the same
ndatac <- summarySE(ndata, measurevar_n, groupvars=c(betweenvars, withinvars),
na.rm=na.rm, conf.interval=conf.interval, .drop=.drop)
# Apply correction from Morey (2008) to the standard error and confidence interval
# Get the product of the number of conditions of within-S variables
nWithinGroups <- prod(vapply(ndatac[,withinvars, drop=FALSE], FUN=nlevels,
FUN.VALUE=numeric(1)))
correctionFactor <- sqrt( nWithinGroups / (nWithinGroups-1) )
# Apply the correction factor
ndatac$sd <- ndatac$sd * correctionFactor
ndatac$se <- ndatac$se * correctionFactor
ndatac$ci <- ndatac$ci * correctionFactor
# Combine the un-normed means with the normed results
merge(datac, ndatac)
}
normDataWithin
This function is used by the summarySEwithin function above. It can be found on the website “Cookbook for R”.
http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions
From that website:
Norms the data within specified groups in a data frame; it normalizes each subject (identified by idvar) so that they have the same mean, within each group specified by betweenvars.
- data: a data frame.
- idvar: the name of a column that identifies each subject (or matched subjects).
- measurevar: the name of a column that contains the variable to be summarized.
- betweenvars: a vector containing names of columns that are between-subjects variables.
- na.rm: a boolean that indicates whether to ignore NA’s.
normDataWithin <- function(data=NULL, idvar, measurevar, betweenvars=NULL,
na.rm=FALSE, .drop=TRUE) {
#library(plyr)
# Measure var on left, idvar + between vars on right of formula.
data.subjMean <- ddply(data, c(idvar, betweenvars), .drop=.drop,
.fun = function(xx, col, na.rm) {
c(subjMean = mean(xx[,col], na.rm=na.rm))
},
measurevar,
na.rm
)
# Put the subject means with original data
data <- merge(data, data.subjMean)
# Get the normalized data in a new column
measureNormedVar <- paste(measurevar, "_norm", sep="")
data[,measureNormedVar] <- data[,measurevar] - data[,"subjMean"] +
mean(data[,measurevar], na.rm=na.rm)
# Remove this subject mean column
data$subjMean <- NULL
return(data)
}
myCenter
This function outputs the centered values of a variable, which can be a numeric variable, a factor, or a data frame. It was taken from Florian Jaeger's blog:
https://hlplab.wordpress.com/2009/04/27/centering-several-variables/.
From his blog:
If the input is a numeric variable, the output is the centered variable.
If the input is a factor, the output is a numeric variable with centered factor level values. That is, the factor’s levels are converted into numerical values in their inherent order (if not specified otherwise, R defaults to alphanumerical order). More specifically, this centers any binary factor so that the value below 0 will be the 1st level of the original factor, and the value above 0 will be the 2nd level.
If the input is a data frame or matrix, the output is a new matrix of the same dimension and with the centered values and column names that correspond to the colnames() of the input preceded by “c” (e.g. “Variable1” will be “cVariable1”).
myCenter= function(x) {
if (is.numeric(x)) { return(x - mean(x, na.rm=T)) }
if (is.factor(x)) {
x= as.numeric(x)
return(x - mean(x, na.rm=T))
}
if (is.data.frame(x) || is.matrix(x)) {
m= matrix(nrow=nrow(x), ncol=ncol(x))
colnames(m)= paste("c", colnames(x), sep="")
for (i in 1:ncol(x)) {
m[,i]= myCenter(x[,i])
}
return(as.data.frame(m))
}
}
lizCenter
This function provides a wrapper around myCenter allowing you to center a specific list of variables from a dataframe.
- x: data frame
- listfname: a list (or character vector) of the names of the variables to be centered (e.g. list("variable1", "variable2"))
The output is a copy of the data frame with a column (always a numeric variable) added for each of the centered variables. These columns are labelled with each column’s previous name, but with “.ct” appended (e.g., “variable1” will become “variable1.ct”).
lizCenter= function(x, listfname)
{
for (i in 1:length(listfname))
{
fname = as.character(listfname[i])
x[paste(fname,".ct", sep="")] = myCenter(x[fname])
}
return(x)
}
get_coeffs
This function allows us to inspect particular coefficients from the output of an lme model by putting them in a table.
- x: the output returned when running lmer or glmer (i.e. an object of type lmerMod or glmerMod)
- list: a list of names of the coefficients to be extracted (e.g. c(“variable1”,“variable1:variable2”))
#get_coeffs <- function(x,list){(kable(as.data.frame(summary(x)$coefficients)[list,],digits=3))}
get_coeffs <- function(x,list){(as.data.frame(summary(x)$coefficients)[list,])}
Bf
This function is equivalent to the Dienes (2008) calculator which can be found here: http://www.lifesci.sussex.ac.uk/home/Zoltan_Dienes/inference/Bayes.htm.
The code was provided by Baguley and Kaye (2010) and can be found here: http://www.academia.edu/427288/Review_of_Understanding_psychology_as_a_science_An_introduction_to_scientific_and_statistical_inference
Bf<-function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0,sdtheory=1, tail=2){
area <- 0
if(identical(uniform, 1)){
theta <- lower
range <- upper - lower
incr <- range / 2000
for (A in -1000:1000){
theta <- theta + incr
dist_theta <- 1 / range
height <- dist_theta * dnorm(obtained, theta, sd)
area <- area + height * incr
}
}else
{theta <- meanoftheory - 5 * sdtheory
incr <- sdtheory / 200
for (A in -1000:1000){
theta <- theta + incr
dist_theta <- dnorm(theta, meanoftheory, sdtheory)
if(identical(tail, 1)){
if (theta <= 0){
dist_theta <- 0
} else {
dist_theta <- dist_theta * 2
}
}
height <- dist_theta * dnorm(obtained, theta, sd)
area <- area + height * incr
}
}
LikelihoodTheory <- area
Likelihoodnull <- dnorm(obtained, 0, sd)
BayesFactor <- LikelihoodTheory / Likelihoodnull
ret <- list("LikelihoodTheory" = LikelihoodTheory,"Likelihoodnull" = Likelihoodnull, "BayesFactor" = BayesFactor)
ret
}
Bf_powercalc
This works with the Bf function above. It requires the same values as that function (i.e. the obtained mean and SE for the current sample, and a value for the predicted mean, which is used as sdtheory with meanoftheory = 0), plus the current number of participants N. However, rather than returning a BF for the current sample, it works out what the BF would be for a range of participant numbers (from min to max), assuming that the SE scales with 1/sqrt(N).
Bf_powercalc <- function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0,
                         sdtheory=1, tail=2, N, min, max)
{
  # Candidate sample sizes to evaluate
  newNs <- min:max
  # For each candidate N, rescale the SE by sqrt(N/newN) (SE is proportional to
  # 1/sqrt(N)) and recompute the Bayes factor, holding the obtained mean fixed.
  # Note (still being worked out): the difference between N and df depends on the
  # design (e.g. 2 for a contrast between two groups, and so on for more groups).
  BFs <- vapply(newNs, function(newN) {
    Bf(sd = sd*sqrt(N/newN), obtained, uniform, lower, upper,
       meanoftheory, sdtheory, tail)$BayesFactor
  }, numeric(1))
  # Two columns: x = candidate N, y = projected Bayes factor
  cbind(x = newNs, y = BFs)
}
Training
DV: Each training trial produces a data point (did they pick the correct picture - 1/0; note they get feedback on this)
load up and run lmes on relevant pilot data
The child data below come from a two session study where they were trained on stimuli similar to the “picture-only” condition in the current study. We have data from both 7 year olds and 11 year olds; the 11 year old data here are relevant in providing an estimate that can be used to inform H1 in our BF example analyses.
The adult data below come from a two-session study with two conditions: (1) trained on stimuli similar to the “picture-only” condition in the current study; (2) trained on pictures JUST of the diacritics. This is useful for considering possible effects of condition in the data (with the caveat that the comparison is not exact).
Note: tonecontrast is a fixed effect with 6 levels (six possible contrasts). It is expected to contribute to the model and is thus included, but it isn't of specific interest.
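library(lme4)   # glmer(), glmerControl()
library(lsr)    # cohensD()
library(pwr)    # pwr.t.test()
# stringsAsFactors = TRUE so that relevel() below works in R >= 4.0
child.train = read.csv("kidstrain_clean.csv", stringsAsFactors = TRUE)
adult.train = read.csv("adultstrain_clean.csv", stringsAsFactors = TRUE)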
child.train$agegroup = relevel(child.train$agegroup, ref="7years")
adult.train$condition = relevel(adult.train$condition, ref="i")
child.train.7 = subset(child.train, agegroup == "7years")
child.train.7 <- lizCenter(child.train.7, list("session", "tonecontrast"))
child.train.7.mod = glmer (result ~
+ session.ct
+ tonecontrast.ct
+ (session.ct|subject),
data = child.train.7, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(child.train.7.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.157 0.059 2.659 0.008
session.ct 0.158 0.077 2.062 0.039
tonecontrast.ct -0.026 0.022 -1.171 0.242
child.train.11 = subset(child.train, agegroup == "11years")
child.train.11 <- lizCenter(child.train.11, list("session", "tonecontrast"))
child.train.11.mod = glmer (result ~
+ session.ct
+ tonecontrast.ct
+ (session.ct|subject),
data = child.train.11, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(child.train.11.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.538 0.096 5.602 0.000
session.ct 0.301 0.085 3.538 0.000
tonecontrast.ct -0.051 0.023 -2.233 0.026
adult.train <- lizCenter(adult.train, list("session", "tonecontrast","condition"))
adult.train.mod = glmer (result ~
+ session.ct *condition.ct
+ tonecontrast.ct
+ (session.ct|subject),
data = adult.train, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(adult.train.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.961 0.091 10.555 0.000
session.ct 0.487 0.067 7.263 0.000
condition.ct 0.416 0.181 2.291 0.022
tonecontrast.ct 0.067 0.010 6.740 0.000
session.ct:condition.ct -0.032 0.133 -0.241 0.809
Prediction2: Improvement with session in each condition
Based on:
We saw this in 7-year-olds in the previous (pilot) study.
7-year-olds: means 0.52, 0.56; log odds (beta from lme): 0.1579623; odds ratio (exp(beta)): 1.171122
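The conversion between the log-odds estimate and the odds ratio above is a single exponentiation. The sketch below also compares the estimate with the log-odds difference implied directly by the raw means; we would expect these to be close but not identical, since the mixed model estimate is conditional on subjects and adjusts for tonecontrast. It assumes session.ct is the centred two-level session variable, so that a one-unit change corresponds to session 1 -> session 2.

```r
beta <- 0.1579623        # session.ct estimate for 7-year-olds (pilot lme)
odds_ratio <- exp(beta)  # multiplicative change in the odds of a correct response
odds_ratio
# Log-odds difference implied directly by the raw session means (0.52 -> 0.56)
raw_diff <- qlogis(0.56) - qlogis(0.52)
raw_diff
```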
Planned Analyses: Frequentist
An lme model similar to the pilot model above, but with data from both conditions, fitting separate session slopes for each condition. We will do this both for a model with just the first two sessions and for a model with all data.
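One way to fit separate session slopes per condition is nested coding, where `condition / session.ct` expands to `condition + condition:session.ct`, giving one session coefficient per condition rather than a pooled slope plus an interaction. This is a minimal sketch on hypothetical simulated data, using base glm() so that it runs without lme4; the planned analysis would instead be a glmer() with by-subject random slopes such as `(session.ct | subject)`. The condition labels and assumed slopes are illustrative only.

```r
set.seed(2)
# Hypothetical toy data: 40 subjects (20 per condition) x 2 sessions x 48 trials
n_subj <- 40
toy <- expand.grid(trial = 1:48, session = 1:2, subject = 1:n_subj)
toy$condition <- factor(ifelse(toy$subject <= n_subj / 2,
                               "pictures", "pictures_diacritics"))
toy$session.ct <- toy$session - mean(toy$session)  # centred: -0.5, 0.5
# Assumed true session slopes (log odds) per condition, for illustration only
slope <- ifelse(toy$condition == "pictures", 0.16, 0.30)
toy$result <- rbinom(nrow(toy), 1, plogis(0.15 + slope * toy$session.ct))

# Nested formula: a separate session.ct slope within each condition
fit <- glm(result ~ condition / session.ct, family = binomial, data = toy)
round(coef(summary(fit)), 3)
```

Each `condition...:session.ct` row then gives the session effect (estimate and SE) for that condition, which is the quantity summarised for the per-condition BF analyses.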
Planned Analyses: BF
Summary of data for each condition: mean and SE for session.ct for each level of condition, from the lme. Value to inform H1: the estimate from the pilot data above for seven-year-olds, 0.1579623.
If it appears that one condition (c1) shows a positive effect of session and the other (c2) doesn't, we can also run BF analyses for the c2 data with H1 informed by the estimate from c1 (this is a better estimate than the pilot one, since the two conditions are more closely matched to each other than either is to the pilot data).
Required sample analyses
Frequentist power
dataS = with(droplevels(child.train.7), tapply(result,list(subject,session), mean, na.rm=T))
d= cohensD(x = dataS[,1], y = dataS[,2], method = "paired")
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .9, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 28.38133
d = 0.5634754
sig.level = 0.05
power = 0.9
alternative = greater
NOTE: n is number of *pairs*
rm(dataS)
rm(d)
dataS = with(droplevels(child.train.7), tapply(result,list(subject,session), mean, na.rm=T))
d= cohensD(x = dataS[,1], y = dataS[,2], method = "paired")
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .8, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 20.89366
d = 0.5634754
sig.level = 0.05
power = 0.8
alternative = greater
NOTE: n is number of *pairs*
rm(dataS)
rm(d)
dataS = with(droplevels(child.train.7), tapply(result,list(subject,session), mean, na.rm=T))
d= cohensD(x = dataS[,1], y = dataS[,2], method = "paired")
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .7, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 16.25431
d = 0.5634754
sig.level = 0.05
power = 0.7
alternative = greater
NOTE: n is number of *pairs*
rm(dataS)
rm(d)
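The same sample sizes can be reproduced with base R's power.t.test(), which avoids the pwr dependency. This is a sketch that assumes the paired effect size computed above (d = 0.5634754); in a paired design power.t.test() treats sd as the SD of the differences, so delta = d with sd = 1 corresponds to pwr.t.test()'s d, and alternative = "one.sided" corresponds to "greater" for a positive effect.

```r
d <- 0.5634754  # paired effect size from the 7-year-old pilot training data
# Number of pairs needed for 90%, 80% and 70% power (one-sided, alpha = .05)
n_needed <- sapply(c(0.9, 0.8, 0.7), function(pw)
  power.t.test(delta = d, sd = 1, sig.level = 0.05, power = pw,
               type = "paired", alternative = "one.sided")$n)
round(n_needed, 2)
```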
Required sample for BF analyses:
As for Prediction 1, we again look at the pilot data from seven-year-olds, using an H1 informed by the estimate for 11-year-olds. In our actual analyses we will use the estimate from the pilot data to inform H1.
meanBF = summary(child.train.7.mod)$coefficients["session.ct" ,"Estimate"]
seBF = summary(child.train.7.mod)$coefficients["session.ct" ,"Std. Error"]
h1mean = summary(child.train.11.mod)$coefficients["session.ct" ,"Estimate"]
Bf(seBF, meanBF, uniform =0,meanoftheory=0,sdtheory=h1mean,tail=1)
$LikelihoodTheory
[1] 2.206111
$Likelihoodnull
[1] 0.6215463
$BayesFactor
[1] 3.549392
rm(meanBF)
rm(seBF)
On the basis of the current (pilot) sample, we have sufficient evidence to accept H1 over H0 (BF = 3.55 > 3).
Prediction 4: They will improve more in one condition than the other (no predicted direction)
Based on:
We don't have any clear evidence to base this on. However, it seems reasonable that the type of training will lead to different learning slopes. One possibility is that children will show greater learning in the condition with diacritics, because it provides more information per trial. However, it is also possible that they will show steeper learning in the other condition, because they need to pay more attention to, and remember, individual stimuli. We saw means in this latter direction in the adult pilot data, i.e. more improvement in the condition with pictures (labelled condition “i” for implicit) than in the condition with diacritics (labelled “e” for explicit), though the difference wasn't significant (and the conditions were not exactly the same; see above).
Adult means: 0.62, 0.71, 0.72, 0.79; beta (session.ct:condition.ct): -0.0321743
Planned Analyses: Frequentist
An lme model similar to the pilot model above, but with data from both conditions, looking for a fixed effect of condition.ct:session.ct. We will do this both for a model with just the first two sessions and for a model with all data.
Planned Analyses: BF
Summary of data: mean and SE for the interaction of session and condition (session.ct:condition.ct). Value to inform H1: the estimate of session from the same model (e.g. summary(model.newdata)$coefficients["session.ct", "Estimate"]).
This will provide a relatively high estimate of the likely size of an interaction (if one is present) given the actual data set, so again it is a conservative estimate.
[Note: we are checking this with a statistical consultant.]
This is exemplified below in the sample size analysis using the adult pilot data:
Required sample analyses
Frequentist power
There is nothing to base a power analysis on here (as there was no effect in the original data).
Required sample for BF analyses:
Note that this uses the adult data; it exemplifies the process we will use to conduct power analyses on the actual data.
meanBF = summary(adult.train.mod)$coefficients["session.ct:condition.ct","Estimate"]
seBF = summary(adult.train.mod)$coefficients["session.ct:condition.ct","Std. Error"]
h1mean = summary(adult.train.mod)$coefficients["session.ct","Estimate"]
Bf(seBF, meanBF, uniform =0,meanoftheory=0,sdtheory=h1mean,tail=2)
$LikelihoodTheory
[1] 0.7889355
$Likelihoodnull
[1] 2.905926
$BayesFactor
[1] 0.271492
rm(meanBF)
rm(seBF)
Note: we use tail = 2 because we don't have a predicted direction for this hypothesis.
So, with the adult pilot data, we do have evidence in favor of H0 (BF = 0.27 < 0.33).
One possibility with children is that we will not be able to obtain a large enough sample to provide evidence for or against the hypothesis that there is more learning in one condition than the other, but we may find evidence for H1 (an effect of session) in one condition and evidence for H0 (no effect of session) in the other condition. If so, we will deem this sufficient to address this prediction.
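The fallback decision rule above can be written as a small check on the two per-condition BFs, using the thresholds from the General notes (BF > 3 for H1, BF < 0.33 for H0). This is a minimal sketch; both function names are hypothetical and not part of the helper functions defined earlier.

```r
# Hypothetical helper: label a Bayes factor using the planned thresholds
classify_bf <- function(bf, upper = 3, lower = 1/3) {
  ifelse(bf > upper, "H1", ifelse(bf < lower, "H0", "inconclusive"))
}

# Hypothetical helper: TRUE when one condition gives substantial evidence for a
# session effect (H1) and the other gives substantial evidence for none (H0),
# in either order
one_learns_other_not <- function(bf_c1, bf_c2) {
  labels <- classify_bf(c(bf_c1, bf_c2))
  setequal(labels, c("H1", "H0"))
}

# Illustrative values only
one_learns_other_not(bf_c1 = 3.55, bf_c2 = 0.27)
```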
2AFC picture and 2AFC diacritic
These two tests are very similar to training, but with either just pictures or just diacritics. The tests are only given as post-tests, not as pre-tests (unlike all of the tests discussed after this one).
Children in the pictures+diacritics condition will get the 2AFC pictures test followed by the 2AFC diacritics test; children in the pictures-only condition will get just the 2AFC pictures test, but will get it twice (so that total exposure doesn't differ across conditions).
Where we look at conditions separately, for the pictures test we will use all of the data (i.e. for the pictures-only condition, data from both parts of the repeated test). Where we compare conditions, we will use only data from the first 2AFC test.
Note: unlike for all the tests that follow, there is no pre-test.
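The data-selection rule for the 2AFC tests can be sketched as two subsets. This is a minimal sketch on a hypothetical toy layout: the column names (condition, test, block) and condition labels are assumptions, with block marking whether a trial came from the first or second 2AFC test a child took.

```r
# Hypothetical toy layout of 2AFC test trials (column names are assumptions)
afc <- rbind(
  data.frame(subject = "p1", condition = "pictures_only",
             test = "pictures", block = 1, correct = c(1, 0, 1)),
  data.frame(subject = "p1", condition = "pictures_only",
             test = "pictures", block = 2, correct = c(1, 1, 0)),
  data.frame(subject = "d1", condition = "pictures_diacritics",
             test = "pictures", block = 1, correct = c(0, 1, 1)),
  data.frame(subject = "d1", condition = "pictures_diacritics",
             test = "diacritics", block = 2, correct = c(1, 1, 1))
)
# Conditions analysed separately: all pictures-test trials
# (both repetitions for the pictures-only condition)
pictures_all <- subset(afc, test == "pictures")
# Conditions compared: only the first 2AFC test each child took
first_test <- subset(afc, block == 1)
```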
run lmes on relevant pilot data
We didn't use this test in the pilot; however, the test is rather similar to the training task, so we use a model based on just the last 24 trials of training (session 2) to inform power decisions.
child.train.7.s2.24 = droplevels(subset(child.train, agegroup == "7years" & session =="session2" & order>72))
child.train.7.s2.24 = lizCenter(child.train.7.s2.24, c("tonecontrast"))
child.train.7.s2.s4.mod = glmer (result ~
1+ tonecontrast.ct
+ (1|subject),
data = child.train.7.s2.24, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(child.train.7.s2.s4.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.201 0.106 1.894 0.058
tonecontrast.ct 0.008 0.062 0.124 0.901
child.train.11.s2.24 = droplevels(subset(child.train, agegroup == "11years" & session =="session2" & order>72))
child.train.11.s2.24 = lizCenter(child.train.11.s2.24, c("tonecontrast"))
child.train.11.s2.s4.mod = glmer (result ~
1+ tonecontrast.ct
+ (1|subject),
data = child.train.11.s2.24, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(child.train.11.s2.s4.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.686 0.117 5.839 0.000
tonecontrast.ct -0.075 0.066 -1.144 0.253
adult.train.s2.24 = subset(adult.train, session =="session2" & order>264)
adult.train.s2.24 = lizCenter(adult.train.s2.24, list("tonecontrast"))
adult.train.s2.24.mod = glmer (result ~
1+ tonecontrast.ct
+ (1|subject),
data = adult.train.s2.24, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(adult.train.s2.24.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.284 0.136 9.428 0
tonecontrast.ct 0.218 0.055 3.977 0
Discrimination
This test will be given pre and post. Children will hear three frogs each produce a Mandarin word. One produces the target and the other two each produce the same foil word, which differs from the target only in tone.
Each of the 6 tone contrasts is tested. We put tonecontrast into the model, since it is likely to have an effect, but we don't have specific hypotheses of interest here. There are four talkers (3 female, 1 male). Trials in which the male talker is the odd one out are known to be easiest, and those where one of the speakers is male but is not the odd one out are hardest. There are an equal number of each trial type, and we put trial type into the model as a factor, though again it isn't a question for which we have predictions.
For the initial data set collected, we will consider both pre-test -> post-test 1 and pre-test -> post-test 2 (if equivalent results are obtained, subsequent testing will drop the final two sessions; see note above).
load up and run lmes on relevant pilot data
The pilot child data below come from the same two-session study as the pilot training data (recall that they were trained on stimuli similar to the “picture-only” condition in the current study).
The adult data below come from a different two-session study with two conditions: (1) trained on stimuli similar to the “picture-only” condition in the current study; (2) trained on pictures JUST of the diacritics. This is useful for considering possible effects of condition in the data (with the caveat that the comparison is not exact).
Note that for both adults and children in the pilot experiment we had both trained and untrained items. For this experiment we use only untrained items in this test, so we remove the trained items from the analyses below.
Note: tonecontrast is a fixed effect with 6 levels (six possible contrasts). It is expected to contribute to the model and is thus included, but it isn't of specific interest.
# stringsAsFactors = TRUE so that relevel() works in R >= 4.0
child.discrim = read.csv("kidsdiscrim_clean.csv", stringsAsFactors = TRUE)
adult.discrim = read.csv("adultsdiscrim_clean.csv", stringsAsFactors = TRUE)
child.discrim$pre_post = relevel(child.discrim$pre_post, ref="pre")
child.discrim$agegroup = as.factor(child.discrim$agegroup)
adult.discrim$pre_post = relevel(adult.discrim$pre_post, ref="pre")
adult.discrim$condition = relevel(adult.discrim$condition, ref="i")
adult.discrim<- droplevels(subset(adult.discrim, neworold == "newword"))
child.discrim<- droplevels(subset(child.discrim, neworold == "newword"))
adult.discrim <- lizCenter(adult.discrim, list("pre_post","trialtype", "condition", "tonecontrast"))
adult.discrim.mod = glmer (correct ~
+ condition.ct * pre_post.ct
+ trialtype.ct
+ tonecontrast.ct
+ (pre_post.ct|participantnumber),
data = adult.discrim, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(adult.discrim.mod)$coefficients, 3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.345 0.075 4.615 0.000
condition.ct -0.147 0.149 -0.984 0.325
pre_post.ct 0.287 0.099 2.897 0.004
trialtype.ct 0.066 0.054 1.232 0.218
tonecontrast.ct 0.041 0.013 3.251 0.001
condition.ct:pre_post.ct -0.071 0.198 -0.360 0.719
child.discrim.7 = droplevels(subset(child.discrim, agegroup == "7"))
child.discrim.7 <- lizCenter(child.discrim.7, list("pre_post","trialtype", "condition", "tonecontrast"))
child.discrim.7.mod = glmer (correct ~
+ pre_post.ct
+ trialtype.ct
+ tonecontrast.ct
+ (pre_post.ct|participantnumber),
data = child.discrim.7, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(child.discrim.7.mod)$coefficients, 3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.445 0.128 -3.472 0.001
pre_post.ct -0.365 0.228 -1.604 0.109
trialtype.ct -0.252 0.139 -1.815 0.070
tonecontrast.ct 0.115 0.033 3.504 0.000
Prediction 1: Improvement from pre-test to post-test in each condition
Based on
In the pilot data, adults showed improvement from pre-test to post-test, but 7-year-olds did not (nor did 11-year-olds).
Adults: means 0.55, 0.61; log odds (beta from lme): 0.2868916; odds ratio (exp(beta)): 1.3322798
7-year-olds (did not show this; in fact the means were reversed): means 0.44, 0.36; log odds (beta from lme): -0.3652203; odds ratio (exp(beta)): 0.6940437
Planned Analyses: Frequentist
A logistic mixed-effects model (lme) similar to that run on the pilot data above, but including data from both conditions; we will fit separate pre-post slopes for each condition.
For the initial data set collected, we will do this both to compare pre-test to post-test 1 and pre-test to post-test 2 (see notes above).
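One way to get separate pre-post slopes per condition is a nested term, condition + condition:pre_post.ct, with no pre_post.ct main effect. The sketch below uses base R's model.matrix() on a toy four-row design (hypothetical level names) to show the resulting columns. In glmer() this would correspond to something like correct ~ condition + condition:pre_post.ct + trialtype.ct + tonecontrast.ct + (pre_post.ct|participantnumber); that formula is an assumption about the final model, not a settled specification.

```r
# Toy design: two conditions x pre/post (hypothetical level names, illustrative rows)
d <- expand.grid(
  pre_post  = c("pre", "post"),
  condition = c("pictures_only", "pictures_diacritics")
)
d$condition <- factor(d$condition, levels = c("pictures_only", "pictures_diacritics"))
d$pre_post.ct <- ifelse(d$pre_post == "post", 0.5, -0.5)   # centred pre/post coding

# With no pre_post.ct main effect, R codes condition in full inside the
# interaction, giving one pre->post slope column per condition.
X <- model.matrix(~ condition + condition:pre_post.ct, data = d)
colnames(X)
```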
Planned Analyses: BF
Summary of data for each condition: mean and SE for the pre-post effect, taken from the lme. Value to inform H1: the estimate from the adult pilot data above, i.e. 0.2868916.
Note: this adult value will likely overestimate the effect for children. A better value would come from the new data from the children themselves; therefore, if they show learning in one condition we will use that estimate to inform the BF for the other condition (and if both show learning, each will inform the BF for the other).
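A sketch of the Dienes-style calculation described in the general notes, assuming the pre-post estimate is normally distributed (in log odds) with SD equal to its standard error, and that H1 is a half-normal (one-tailed) or normal (two-tailed) distribution with mean 0 and SD equal to the H1 value. `bf_normal_prior()` is a hypothetical helper built on stats::integrate(); it is not the Bf() function used in this document. With the rounded 7-year-old values it gives a BF of about 0.29.

```r
# Hypothetical helper: BF = P(data | H1) / P(data | H0) for a normal likelihood.
bf_normal_prior <- function(estimate, se, h1_sd, tail = 1) {
  if (tail == 1) {
    # Half-normal H1: density doubled on theta > 0, zero below
    prior <- function(theta) 2 * dnorm(theta, 0, h1_sd)
    lower <- 0
  } else {
    # Two-tailed H1: full normal centred on 0
    prior <- function(theta) dnorm(theta, 0, h1_sd)
    lower <- -Inf
  }
  # Likelihood of the estimate averaged over the H1 prior
  lik_h1 <- integrate(function(theta) prior(theta) * dnorm(estimate, theta, se),
                      lower = lower, upper = Inf)$value
  lik_h0 <- dnorm(estimate, 0, se)   # likelihood under the point null
  lik_h1 / lik_h0
}

# 7-year-old pre->post estimate and SE (rounded), H1 scaled by the adult estimate
bf_child <- bf_normal_prior(estimate = -0.365, se = 0.228, h1_sd = 0.2868916, tail = 1)
bf_child
```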
Required sample analyses
Frequentist Power
Note: this analysis is included for completeness but is of limited relevance, since it is based on adult data. If children do show an effect, it is likely to be smaller and to have a larger SE.
dataS=with(droplevels(adult.discrim), tapply(correct,list(participantnumber, pre_post), mean, na.rm=T))
d = cohensD(x = as.numeric(dataS[,1]),y = as.numeric(dataS[,2]), method = "paired")
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .9, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 31.24664
d = 0.5356902
sig.level = 0.05
power = 0.9
alternative = greater
NOTE: n is number of *pairs*
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .8, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 22.95973
d = 0.5356902
sig.level = 0.05
power = 0.8
alternative = greater
NOTE: n is number of *pairs*
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .7, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 17.82436
d = 0.5356902
sig.level = 0.05
power = 0.7
alternative = greater
NOTE: n is number of *pairs*
rm(d)
rm(dataS)
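As a cross-check, base R's stats::power.t.test() solves the same paired, one-sided calculation without the pwr package. For paired data, Cohen's d is the mean of the pre-post differences divided by their SD (our understanding of lsr::cohensD(method = "paired")); here we simply reuse the d printed above. Because n is the number of pairs, it should be rounded up to whole participants.

```r
# Paired-data effect size printed above
d <- 0.5356902

# Required number of pairs for 90%, 80% and 70% power (one-sided, alpha = .05)
n_pairs <- sapply(c(0.9, 0.8, 0.7), function(p)
  power.t.test(delta = d, sd = 1, sig.level = 0.05, power = p,
               type = "paired", alternative = "one.sided")$n)

n_pairs            # fractional n, as reported by pwr.t.test()
ceiling(n_pairs)   # whole participants
```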
Required sample for BF analyses from pilot data
Here we look at the data from the 7-year-olds in the pilot, using the adult pre-post estimate to inform H1.
meanBF = summary(child.discrim.7.mod)$coefficients["pre_post.ct" ,"Estimate"]
seBF = summary(child.discrim.7.mod)$coefficients["pre_post.ct" ,"Std. Error"]
h1mean = summary(adult.discrim.mod)$coefficients["pre_post.ct" ,"Estimate"]
Bf(seBF, meanBF, uniform =0,meanoftheory=0,sdtheory=h1mean,tail=1)
$LikelihoodTheory
[1] 0.1393478
$Likelihoodnull
[1] 0.4839102
$BayesFactor
[1] 0.2879621
rm(meanBF)
rm(seBF)
rm(h1mean)
We have evidence for the null from just 15 participants (BF = 0.29 < 1/3). Note, however, that we are using a relatively high estimate for H1, which may make it easier to find evidence for the null.
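The note above can be checked directly. With a normal likelihood and a half-normal H1 (scale h1_sd), the marginal likelihood under H1 has a closed form, 2 * N(estimate; 0, sqrt(se^2 + h1_sd^2)) * Phi(estimate * h1_sd / (se * sqrt(se^2 + h1_sd^2))), so the BF can be recomputed for several H1 scales. The sketch below (hypothetical helper `bf_half_normal()`, rounded 7-year-old values) shows the BF for this negative estimate falling as the H1 scale grows, i.e. a larger H1 value makes evidence for the null easier to obtain.

```r
# Hypothetical closed-form BF for a half-normal H1 (theta > 0) and normal likelihood
bf_half_normal <- function(estimate, se, h1_sd) {
  total_sd <- sqrt(se^2 + h1_sd^2)
  # 2 * marginal normal density * posterior probability that theta > 0
  lik_h1 <- 2 * dnorm(estimate, 0, total_sd) *
    pnorm(estimate * h1_sd / (se * total_sd))
  lik_h0 <- dnorm(estimate, 0, se)
  lik_h1 / lik_h0
}

# Smaller, adult-based, and larger H1 scales
h1_scales <- c(0.1, 0.2868916, 0.6)
bfs <- sapply(h1_scales, function(s) bf_half_normal(-0.365, 0.228, s))
round(bfs, 3)
```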
Prediction 2: More improvement in one condition than the other
Based on
We don’t have any specific evidence to base this on. The closest contrast comes from the adult pilot experiment (comparing picture-only training with diacritics-only training), where adults showed numerically more improvement in the implicit condition, but the difference was not significant.
means: 0.56, 0.63, 0.54, 0.59; log odds (interaction beta from lme): -0.0711115; odds ratio (exp(beta)): 0.931358
For the initial data set collected, we will do this both to compare pre-test to post-test 1 and pre-test to post-test 2 (see notes above).
Planned Analyses: Frequentist
A logistic mixed-effects model similar to that run on the pilot data above, but including data from both conditions; we will look for an interaction between pre-post and condition.
Planned Analyses: BF
Summary of data for each condition: mean and SE for pre-post*condition from the lme. Value to inform H1: the estimate for session from the current model (i.e. summary(new.data)$coefficients["pre_post.ct", "Estimate"]).
Note: we only do this if we have evidence of an effect of pre-post in at least one of the conditions. If children do not learn in either condition, this test is inappropriate.
Also note that this will be a relatively conservative estimate, since it corresponds to a large difference between conditions in pre-to-post change. (Again, we will seek statistical advice here.)
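For the two-tailed case, a normal H1 with mean 0 and SD h1_sd combined with a normal likelihood gives a marginal likelihood of N(estimate; 0, sqrt(se^2 + h1_sd^2)), so no integration is needed. A minimal sketch, using the adult pilot interaction (condition.ct:pre_post.ct) as the estimate and the adult pre-post estimate as h1_sd; with these rounded inputs it gives about 0.59. `bf_two_tailed()` is a hypothetical helper, not part of this document's code.

```r
# Hypothetical closed-form BF for a two-tailed normal H1 (mean 0, sd = h1_sd)
bf_two_tailed <- function(estimate, se, h1_sd) {
  dnorm(estimate, 0, sqrt(se^2 + h1_sd^2)) / dnorm(estimate, 0, se)
}

# Adult pilot interaction estimate and SE (rounded); H1 = adult pre->post estimate
bf_interaction <- bf_two_tailed(estimate = -0.0711115, se = 0.198, h1_sd = 0.2868916)
bf_interaction
```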
Required sample estimates
Frequentist Power
We have no data on which to base this (since there was no significant difference in the pilot data collected).
Required sample for BF analyses from pilot data
Here we use the adult pilot data.
meanBF = summary(adult.discrim.mod)$coefficients["condition.ct:pre_post.ct" ,"Estimate"]
seBF = summary(adult.discrim.mod)$coefficients["condition.ct:pre_post.ct" ,"Std. Error"]
h1mean = summary(adult.discrim.mod)$coefficients["pre_post.ct" ,"Estimate"]
Bf(seBF, meanBF, uniform =0,meanoftheory=0,sdtheory=h1mean,tail=2)
$LikelihoodTheory
[1] 1.121749
$Likelihoodnull
[1] 1.89307
$BayesFactor
[1] 0.5925559
plot(Bf_powercalc(seBF, meanBF, uniform =0,meanoftheory=0,sdtheory=h1mean, N=15, min=900, max = 50, tail=2))
abline(h=3)

Bf_powercalc(seBF, meanBF, uniform =0,meanoftheory=0,sdtheory=h1mean, N=15, min=811, max = 815, tail=2)
x y
[1,] 811 3.007075
[2,] 812 3.018251
[3,] 813 3.029471
[4,] 814 3.040735
[5,] 815 3.052043
rm(meanBF)
rm(seBF)
rm(h1mean)
General note: this analysis suggests that even for adults, we would need a sample of over 800 participants per condition to obtain evidence for this interaction (BF > 3). This is not feasible (though this may be partly because we are basing it on an over-conservative value for H1).
This analysis therefore suggests it will be difficult to gain power for this interaction. Moreover, the analyses for Prediction 1 above may show that there is no improvement on this test in EITHER condition, in which case it would be very difficult to power this interaction. One possibility is that we will not have power for the interaction, but will have power to show evidence for learning in one condition, while for the other condition there is more evidence for the null than for H1.
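Bf_powercalc() is defined elsewhere and its internals are not shown here. A common way to do this kind of calculation, and the assumption behind the sketch below, is to hold the estimate fixed, shrink its SE in proportion to 1/sqrt(N), and find the smallest N at which the BF exceeds 3. We have not checked this against Bf_powercalc(); with the rounded adult inputs it gives a crossing just over 800, close to the values printed above. `bf_two_tailed()` is a hypothetical helper.

```r
# Hypothetical closed-form BF for a two-tailed normal H1 (mean 0, sd = h1_sd)
bf_two_tailed <- function(estimate, se, h1_sd) {
  dnorm(estimate, 0, sqrt(se^2 + h1_sd^2)) / dnorm(estimate, 0, se)
}

n_pilot  <- 15
estimate <- -0.0711115   # adult condition x pre_post estimate
se_pilot <- 0.198        # its SE (rounded)
h1_sd    <- 0.2868916    # adult pre->post estimate

# Assumption: estimate stays fixed, SE scales with sqrt(n_pilot / N)
n_grid  <- 15:1500
bf_by_n <- sapply(n_grid, function(n) bf_two_tailed(estimate, se_pilot * sqrt(n_pilot / n), h1_sd))

# Smallest N at which the BF first exceeds 3
n_needed <- n_grid[which(bf_by_n > 3)[1]]
n_needed
```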
Word repetition
This test is given pre and post. Children hear a word and repeat it back. We look at whether they repeat the tone correctly: responses will be transcribed by native speakers who are blind to both condition and the target word, and the tone of the transcribed word will be compared to the tone of the target and coded as correct/incorrect.
There are equal numbers of words for each of the four tones.
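A minimal sketch of the scoring rule described above, with hypothetical column names (`target_tone`, `transcribed_tone`) and illustrative values rather than study data: a response is correct (1) when the transcribed tone matches the target tone, and table() checks that each tone has the same number of items.

```r
# Illustrative responses only (hypothetical column names, not study data)
wr <- data.frame(
  target_tone      = c(1, 2, 3, 4, 1, 2, 3, 4),
  transcribed_tone = c(1, 3, 3, 4, 2, 2, 3, 1)
)

# Correct (1) when the transcribed tone matches the target tone, else 0
wr$resulttone <- as.integer(wr$transcribed_tone == wr$target_tone)

table(wr$target_tone)   # design check: equal numbers of items per tone
mean(wr$resulttone)     # proportion of tones repeated correctly
```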
For the initial data set collected, we will consider both pre-test -> post-test 1 and pre-test -> post-test 2 (if equivalent results are obtained, subsequent testing will drop the final two sessions; see note above).
Load and run lmes on the pilot data
The child data below come from the same two-session study as the pilot training/discrimination data (i.e. where children were trained on stimuli similar to the “picture-only” condition in the current study).
The adult data below do not come from the same two-day pilot study as the other adult data used above (that student project did not include this test), but from a much longer 9-session experiment conducted by a current PhD student. That data set did not include a contrast of conditions of the type relevant to the current study.
Note that in the pilot experiments using this test, both adults and children had trained and untrained items. In the current experiment we use only untrained items in this test, so we remove the trained items from the analyses below.
child.wr = read.csv("kidswordrep_clean.csv")
adult.wr = read.csv("adultswordrep_clean.csv")
child.wr = droplevels(subset(child.wr, wordtype == "new"))
adult.wr = droplevels(subset(adult.wr, wordtype == "Untrained"))
child.wr$pre_post = relevel(child.wr$pre_post, ref = "pre")
adult.wr$pre_post = relevel(adult.wr$pre_post, ref = "pre")
child.wr.7 = droplevels(subset(child.wr, Age=="young"))
child.wr.7 <- lizCenter(child.wr.7, list("Age", "pre_post","correcttone"))
child.wr.7.mod = glmer (resulttone ~
+ pre_post.ct
+ correcttone.ct
+ (pre_post.ct|participantname),
data = child.wr.7, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(child.wr.7.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.553 0.147 3.764 0.000
pre_post.ct 0.309 0.141 2.183 0.029
correcttone.ct 0.066 0.043 1.537 0.124
adult.wr <- lizCenter(adult.wr, list( "pre_post","correcttone"))
adult.wr.mod = glmer (resulttone ~
+ pre_post.ct
+ correcttone.ct
+ (pre_post.ct|participantname),
data = adult.wr, family = binomial, control=glmerControl(optimizer = "bobyqa"))
round(summary(adult.wr.mod)$coefficients,3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.110 0.077 14.321 0
pre_post.ct 0.412 0.118 3.491 0
correcttone.ct -0.142 0.032 -4.409 0
Prediction 1: Improvement from pre-test to post-test in each condition
Based on
Both adults and children showed this in the relevant pilot studies.
Adults: means (pre, post): 0.7, 0.77; log odds (beta from lme): 0.4117959; odds ratio (exp(beta)): 1.5095264
7-year-olds: means (pre, post): 0.59, 0.65; log odds (beta from lme): 0.3085348; odds ratio (exp(beta)): 1.3614289
Planned Analyses: Frequentist
A logistic mixed-effects model similar to that run on the pilot data above, but including data from both conditions; we will fit separate pre-post slopes for each condition.
For the initial data set collected, we will do this both to compare pre-test to post-test 1 and pre-test to post-test 2 (see notes above).
Planned Analyses: BF
Summary of data for each condition: mean and SE for the pre-post effect, taken from the lme. Value to inform H1: the estimate from the child pilot data above, i.e. 0.3085348.
NOTE: If (e.g.) we find an effect for condition 1, we can also use its pre-post estimate to inform H1 when looking at data from condition 2, and vice versa. This is a better estimate, since the two conditions are more closely matched to each other than either is to the pilot data.
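The rule in the note above can be written as a small decision function. `choose_h1_scale()` and its inputs are hypothetical; the threshold of 3 follows the BF criterion in the general notes, and the fallback is the 7-year-old pilot estimate.

```r
# Hypothetical helper: pick the H1 scale for one condition's BF.
# If the other condition already shows evidence for learning (BF > threshold),
# use its pre->post estimate; otherwise fall back to the pilot estimate.
choose_h1_scale <- function(other_estimate, other_bf,
                            pilot_estimate = 0.3085348, threshold = 3) {
  if (!is.na(other_bf) && other_bf > threshold) other_estimate else pilot_estimate
}

# Hypothetical inputs for illustration
h1_a <- choose_h1_scale(other_estimate = 0.25, other_bf = 4.2)   # uses 0.25
h1_b <- choose_h1_scale(other_estimate = 0.25, other_bf = 1.1)   # falls back to pilot
c(h1_a, h1_b)
```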
Required sample analyses
Frequentist Power
Note: this analysis is included for completeness. Unlike the corresponding discrimination analysis, it is based on the 7-year-old pilot data (from training similar to the picture-only condition), not on adult data.
7-year-olds
dataS = with(droplevels(child.wr.7), tapply(resulttone,list(participantname,pre_post), mean, na.rm=T))
d= cohensD(x = dataS[,1], y = dataS[,2], method = "paired")
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .9, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 32.46102
d = 0.5250962
sig.level = 0.05
power = 0.9
alternative = greater
NOTE: n is number of *pairs*
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .8, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 23.83549
d = 0.5250962
sig.level = 0.05
power = 0.8
alternative = greater
NOTE: n is number of *pairs*
pwr.t.test(n = NULL, d = d, sig.level = 0.05, power = .7, type = c("paired"), alternative = c( "greater"))
Paired t test power calculation
n = 18.48998
d = 0.5250962
sig.level = 0.05
power = 0.7
alternative = greater
NOTE: n is number of *pairs*
rm(dataS)
rm(d)
Sample size estimation suggests N = 33 per condition for 90% power (n = 32.46 pairs, rounded up).
Required sample for BF analyses from pilot data
Note that here we use the pre-post estimate from the adult pilot data to inform H1 (whereas for the actual analyses we will use the value from the 7-year-old pilot data).
meanBF = summary(child.wr.7.mod)$coefficients["pre_post.ct" ,"Estimate"]
seBF = summary(child.wr.7.mod)$coefficients["pre_post.ct" ,"Std. Error"]
h1mean = summary(adult.wr.mod)$coefficients["pre_post.ct" ,"Estimate"]
Bf(seBF, meanBF, uniform =0,meanoftheory=0,sdtheory=h1mean,tail=1)
$LikelihoodTheory
[1] 1.398479
$Likelihoodnull
[1] 0.2603624
$BayesFactor
[1] 5.371278
rm(meanBF)
rm(seBF)
rm(h1mean)
The sample of N = 15 children already provides substantial evidence for H1 (BF = 5.37 > 3).
Prediction 2: More improvement from pre to post in one condition than the other
Based on
We have no data to base this on as we haven’t previously used this test in an experiment contrasting these (or similar) conditions.
If the presence of the diacritics boosts learning, children may improve more in the pictures+diacritics condition.
Planned Analyses: Frequentist
A logistic mixed-effects model similar to that run on the pilot data above, but including data from both conditions; we will look for an interaction between pre-post and condition.
For the initial data set collected, we will do this both to compare pre-test to post-test 1 and pre-test to post-test 2 (see notes above).
Planned Analyses: BF
Summary of data for each condition: mean and SE for pre-post*condition from the lme. Value to inform H1: the estimate for session from the model run on the new data (i.e. summary(new.data)$coefficients["pre_post.ct", "Estimate"]).
Note: we only do this if we have evidence of an effect of pre-post in at least one of the conditions. If children do not learn in either condition, this test is inappropriate.
Also note that this will be a relatively conservative estimate, since it corresponds to a large difference between conditions in pre-to-post change.
Required sample analyses
We have no relevant data on which to base this for either frequentist or BF analyses. One possibility is that we will not be able to obtain a sufficiently large sample to have power for this interaction. As for other tests, we may then have to draw whatever conclusions are possible from which of the conditions individually provides evidence for H1/H0.
