This document contains the planned analyses for data to be collected from 11-year-olds, with testing starting in the week beginning 25 Sep 2017.

Our policy for collecting the sample will be as follows:

We will inspect the data once we have approximately 20 participants. (This number is approximate because it is the number of participants we believe we can test over two weeks; if we end up with a slightly larger or smaller set, we will inspect the data at that point.)

If results for the key analyses are inconclusive, we will, resources permitting, continue to test more participants, collecting approximately 10 per condition at each step before inspecting the data again. We will continue until N = 60, resources permitting.

Note that this approach means that frequentist analyses will not be valid at the .05 level, due to the optional-stopping problem. The current plan is nevertheless to report them with that caveat; Bayes factors will be treated as the critical analyses.

NOTE: at our first inspection of the data, when N = 20, we will first check that our data meet the following baseline conditions:

  1. That participants as a group show evidence of being above chance in the test in EITHER session 1 OR session 2 for BOTH segmental and tonal trials
  2. That participants as a group are not at ceiling (> 85%) in session 1 for either segmental or tonal trials

If either of these conditions is not met, we will stop the experiment and re-design.
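To make these checks concrete, the following is a minimal sketch of how they could be screened at the first inspection. The data frame and column names (pt_N, session, seg_tone, acc) are hypothetical, and the t-test is only a quick screen; the formal above-chance analyses use the mixed models and Bayes factors described below.

# Illustrative screen of the baseline conditions at the first data inspection.
# Assumes a test-trial data frame with columns pt_N (participant), session (1/2),
# seg_tone ("seg"/"tone") and acc (1/0); these names are hypothetical.
baseline_check <- function(test.data, ceiling = 0.85) {
  out <- NULL
  for (tt in c("seg", "tone")) {
    for (s in c(1, 2)) {
      d <- subset(test.data, seg_tone == tt & session == s)
      by.pt <- tapply(d$acc, d$pt_N, mean, na.rm = TRUE)  # by-participant proportions
      out <- rbind(out, data.frame(
        trial.type  = tt,
        session     = s,
        mean.acc    = mean(by.pt),
        # quick above-chance screen: one-sample t-test against .5
        p.vs.chance = t.test(by.pt, mu = 0.5, alternative = "greater")$p.value,
        # ceiling check (> 85%) is only relevant for session 1
        at.ceiling  = (s == 1 & mean(by.pt) > ceiling)))
    }
  }
  out
}
# e.g. baseline_check(child11.test)  # child11.test is a hypothetical new-data frame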

Parents/Guardians will be asked to complete a sleep diary for the night between sessions and a sleep questionnaire that gathers information about normal sleep patterns. We will combine this information to determine whether the child has had a “normal” night of sleep on the night between sessions (we will define “normal” as a sleep duration that is within 2 hours of what is reported as the normal sleep duration for a weekday night). Children whose sleep diary suggests that they had an abnormal night of sleep on the night between sessions 1 and 2 will be replaced.
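As an illustration of this exclusion rule, a minimal sketch (the column names diary_sleep_hrs and usual_weekday_sleep_hrs are hypothetical placeholders for the diary and questionnaire measures):

# Flag participants whose sleep on the night between sessions differed from
# their usual weekday sleep duration by more than 2 hours (to be replaced).
# Column names are hypothetical placeholders.
flag_abnormal_sleep <- function(sleep.data, tolerance = 2) {
  sleep.data$abnormal_sleep <-
    abs(sleep.data$diary_sleep_hrs - sleep.data$usual_weekday_sleep_hrs) > tolerance
  sleep.data
}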

Overview of Study

Background

Previous research has shown that memory for new vocabulary items is strengthened following consolidation periods that include sleep (e.g. Dumay & Gaskell, 2007; Henderson, Weighall, Brown, & Gaskell, 2012). However, previous studies have used either rare L1 words, or pseudowords with L1 phonology. To date no study has examined whether consolidation processes contribute equally to both L1 and L2 aspects of vocabulary learning. We explore this here by teaching Mandarin words to participants with no prior experience of tonal languages. We will test learning of both tonal and segmental aspects of the words immediately after learning, and again approximately 24 hours later.

Key Predictions

  • Learning of segmental information will be strengthened following a consolidation period that includes sleep in both age groups
  • The effect of consolidation processes on non-native aspects of phonology (i.e. tonal information) has yet to be tested. Our key test compares the H1 that performance on these trials increases following a consolidation period that includes sleep against the H0 that there is no change in performance between sessions. We test the prediction that tonal trials will improve in a similar manner to segmental trials.

Participants

At this stage we start with Year 6 (10-11 year olds).

Participants will have no prior experience of Mandarin or any other tonal language, and no known language impairments, hearing impairments or learning disorders.

Stimuli

Participants will learn 24 items. An equal number of words will be taken from each of the six possible tone contrasts (1-2, 1-3, 1-4, 2-3, 2-4, 3-4).

Design

Each participant will complete two sessions separated by approximately 24 hours.

Session 1

  • English introduction: See picture, hear corresponding English word (to ensure that the correct meaning is associated with each picture)
  • Training: 2AFC with feedback
  • Distractor Task: (to minimize short-term recency effects)
  • Test Task: Either a 2AFC task (children) or a Match/Mismatch task (adults)

Session 2

  • Test Task: 2AFC task
  • Sleep Diary and Questionnaire
  • Language Questionnaire to determine whether participants have any prior experience of Mandarin or any other tonal language

Overview of Statistical Approach

Frequentist Statistics: Logistic Mixed Effect Models

  • Logistic models are used since in each case we have a dependent variable of correct/incorrect response.

  • We make comparisons to chance (Training Prediction 1; Test Predictions 1-4), and for these we compare the relevant intercepts to chance (50%, i.e. the default against which they are reported). For Test Prediction 5, we need to include session as a fixed effect, since we are interested in increases in performance after sleep.

  • We will include random effects for both participants and items. For participants we will use the full random slope structure for the predictor of interest, i.e. session (1 vs 2). For items, we will automatically include random intercepts but will only include slopes (and correlations between slopes) where they add to the model, using the backwards identification process described in Matuschek et al. (2017) (Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305-315). Note that we have counterbalanced whether tonal or segmental information is tested for each item across sessions, such that each word occurs just once during each test session.

  • The tone contrast involved in each item (1-2, 1-3, 1-4, 2-3, 2-4, 3-4) is likely to contribute to the model. We include these contrasts as fixed factors, using a coding that ensures that other fixed effects are reported as averaged across all tone contrasts. We won’t include random slopes for this factor (since it is a control factor).
  • Which talker produces the item is also likely to be important. Again we include this in the model, both for training (where there are four talkers) and test (where there are two talkers), using a coding that ensures that other fixed effects are reported as averaged across all talkers. Again we don’t include slopes for this factor (since it is a control factor).

Bayes Factors (Dienes, 2008, 2015, UCL workshop July 2017)

  • We will work in log odds space to meet assumptions of normality
  • We will use estimates and standard errors from the logistic mixed effects models (as in http://rpubs.com/ewonnacott/242454 which gives data analysis for this paper https://doi.org/10.1016/j.jml.2017.01.005)
  • Following Dienes (2008) we will model H1 as a half-normal (since our predictions are one-tailed). In cases where we have what we consider to be an estimate of the mean for H1, this will be set as the SD of the half-normal; where we have what we consider to be a maximum value for H1, we set HALF of this value as the SD of the half-normal (so that 2 SD is the maximum, but there is a bias towards smaller values, making this preferable to a uniform). Our estimates come from a combination of estimates obtained from pilot data and estimates from elsewhere in the new data collected.
  • We will say we have substantial evidence for H1 if BF > 3
  • We will say we have substantial evidence for H0 if BF < (1/3)

Load Packages and Helper Functions

Packages

rm(list=ls())
library(lattice)
suppressPackageStartupMessages(library(lme4))
library(plotrix)
library(plyr)
library(knitr)
library(ggplot2)
library(reshape2)
library(lsr)
library(pwr)

Helper functions

myCenter

This function outputs the centered values of a variable, which can be a numeric variable, a factor, or a data frame. It was taken from Florian Jaeger’s blog.

https://hlplab.wordpress.com/2009/04/27/centering-several-variables/.

From his blog:

  • If the input is a numeric variable, the output is the centered variable.
  • If the input is a factor, the output is a numeric variable with centered factor level values. That is, the factor’s levels are converted into numerical values in their inherent order (if not specified otherwise, R defaults to alphanumerical order). More specifically, this centers any binary factor so that the value below 0 will be the 1st level of the original factor, and the value above 0 will be the 2nd level.
  • If the input is a data frame or matrix, the output is a new matrix of the same dimension and with the centered values and column names that correspond to the colnames() of the input preceded by “c” (e.g. “Variable1” will be “cVariable1”).
myCenter= function(x) {
 if (is.numeric(x)) { return(x - mean(x, na.rm=T)) }
    if (is.factor(x)) {
        x= as.numeric(x)
        return(x - mean(x, na.rm=T))
    }
    if (is.data.frame(x) || is.matrix(x)) {
        m= matrix(nrow=nrow(x), ncol=ncol(x))
        colnames(m)= paste("c", colnames(x), sep="")
    
        for (i in 1:ncol(x)) {
        
            m[,i]= myCenter(x[,i])
        }
        return(as.data.frame(m))
    }
}

lizCenter

This function provides a wrapper around myCenter allowing you to center a specific list of variables from a data frame.

  • x: data frame
  • listfname: a list of the variables to be centered (e.g. list("variable1", "variable2"))

The output is a copy of the data frame with a column (always a numeric variable) added for each of the centered variables. These columns are labelled with the column’s previous name, but with “.ct” appended (e.g., “variable1” will become “variable1.ct”).

lizCenter= function(x, listfname) 
{
    for (i in 1:length(listfname)) 
    {
        fname = as.character(listfname[i])
        x[paste(fname,".ct", sep="")] = myCenter(x[fname])
    }
        
    return(x)
}

get_coeffs

This function allows us to inspect particular coefficients from the output of an lme model by putting them in a table.

  • x: the output returned when running lmer or glmer (i.e. an object of type lmerMod or glmerMod)
  • list: a list of the names of the coefficients to be extracted (e.g. c(“variable1”, “variable1:variable2”))
get_coeffs <- function(x,list){(as.data.frame(summary(x)$coefficients)[list,])}

Contrasts

lizContrasts

This function can be used to create two centered dummy variables which stand in place of a three-way factor (condition). This allows us to inspect each contrast separately, as well as their interactions with other factors. Other fixed effects in the model are then evaluated as the average effects across all levels of the factor.

The function takes a data frame (d), a factor from that data frame (condition), which must have three levels, and the name of the level of the factor which is to be used as the baseline for the contrasts (baselevel).

For example, if d is a data frame with a factor “condition” with three levels “lex_noskew”, “lex_skew” and “mixed”, then lizContrasts(d, d$condition, “lex_noskew”) returns a copy of the data frame with two (numeric) columns added, labelled “lex_noskew_VERSUS_lex_skew” and “lex_noskew_VERSUS_mixed”. Wherever you would normally use “condition” in a formula in an lme, it can be replaced by (lex_noskew_VERSUS_lex_skew + lex_noskew_VERSUS_mixed), e.g. ~ (a * condition) becomes ~ (a * (lex_noskew_VERSUS_lex_skew + lex_noskew_VERSUS_mixed)).

lizContrasts= function(d, condition, baselevel) 
{
  
    condition = factor(condition)
  condition = relevel(condition, baselevel)

    a= (contrasts(condition)-apply(contrasts(condition),2,mean))
    d$dummy1[condition== rownames(a)[1]] <- a[1] 
    d$dummy1[condition== rownames(a)[2]] <- a[2] 
    d$dummy1[condition== rownames(a)[3]] <- a[3] 
    
    d$dummy2[condition== rownames(a)[1]] <- a[4] 
    d$dummy2[condition== rownames(a)[2]] <- a[5] 
    d$dummy2[condition== rownames(a)[3]] <- a[6] 

    name1 = paste(baselevel, rownames(a)[2],sep="_VERSUS_")
    name2 = paste(baselevel, rownames(a)[3],sep="_VERSUS_")

    d[name1] = d$dummy1 
    d[name2] = d$dummy2 

    d$dummy1 <-NULL 
    d$dummy2 <-NULL 
    
    return(d)
}
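A quick illustration of the output with a toy three-level factor (the level names are just for demonstration):

# Toy demonstration of lizContrasts: the three-level factor is replaced by two
# centered dummy variables named after the baseline level.
toy <- data.frame(condition = rep(c("lex_noskew", "lex_skew", "mixed"), each = 2))
toy <- lizContrasts(toy, toy$condition, "lex_noskew")
head(toy)
# In a model formula, ~ (a * condition) would then become
# ~ (a * (lex_noskew_VERSUS_lex_skew + lex_noskew_VERSUS_mixed))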

lizContrasts4

This function is like lizContrasts but is used with a factor with four levels

lizContrasts4= function(d, condition, baselevel) 
{

  condition = factor(condition)
  condition = relevel(condition, baselevel)
  
    a= (contrasts(condition)- apply(contrasts(condition),2,mean))
    d$dummy1[condition== rownames(a)[1]] <- a[1] 
    d$dummy1[condition== rownames(a)[2]] <- a[2] 
    d$dummy1[condition== rownames(a)[3]] <- a[3] 
    d$dummy1[condition== rownames(a)[4]] <- a[4] 
    
    d$dummy2[condition== rownames(a)[1]] <- a[5] 
    d$dummy2[condition== rownames(a)[2]] <- a[6] 
    d$dummy2[condition== rownames(a)[3]] <- a[7] 
    d$dummy2[condition== rownames(a)[4]] <- a[8] 
    d$dummy3[condition== rownames(a)[1]] <- a[9] 
    d$dummy3[condition== rownames(a)[2]] <- a[10] 
    d$dummy3[condition== rownames(a)[3]] <- a[11] 
    d$dummy3[condition== rownames(a)[4]] <- a[12] 

    name1 = paste(baselevel, rownames(a)[2],sep="_VERSUS_")
    name2 = paste(baselevel, rownames(a)[3],sep="_VERSUS_")
    name3 = paste(baselevel, rownames(a)[4],sep="_VERSUS_")

    d[name1] = d$dummy1 
    d[name2] = d$dummy2 
    d[name3] = d$dummy3 

    d$dummy1 <-NULL 
    d$dummy2 <-NULL 
    d$dummy3 <-NULL 
    
    return(d)
}

lizContrasts6

This function is like lizContrasts but is used with a factor with six levels

lizContrasts6= function(d, condition, baselevel) 
{
  condition = factor(condition)
  condition = relevel(condition, baselevel)

    a= (contrasts(condition)- apply(contrasts(condition),2,mean))
    d$dummy1[condition== rownames(a)[1]] <- a[1] 
    d$dummy1[condition== rownames(a)[2]] <- a[2] 
    d$dummy1[condition== rownames(a)[3]] <- a[3] 
    d$dummy1[condition== rownames(a)[4]] <- a[4] 
    d$dummy1[condition== rownames(a)[5]] <- a[5] 
    d$dummy1[condition== rownames(a)[6]] <- a[6]    
    
    d$dummy2[condition== rownames(a)[1]] <- a[7] 
    d$dummy2[condition== rownames(a)[2]] <- a[8] 
    d$dummy2[condition== rownames(a)[3]] <- a[9] 
    d$dummy2[condition== rownames(a)[4]] <- a[10] 
    d$dummy2[condition== rownames(a)[5]] <- a[11] 
    d$dummy2[condition== rownames(a)[6]] <- a[12] 
    
    d$dummy3[condition== rownames(a)[1]] <- a[13] 
    d$dummy3[condition== rownames(a)[2]] <- a[14] 
    d$dummy3[condition== rownames(a)[3]] <- a[15] 
    d$dummy3[condition== rownames(a)[4]] <- a[16] 
    d$dummy3[condition== rownames(a)[5]] <- a[17] 
    d$dummy3[condition== rownames(a)[6]] <- a[18]
    
    d$dummy4[condition== rownames(a)[1]] <- a[19] 
    d$dummy4[condition== rownames(a)[2]] <- a[20] 
    d$dummy4[condition== rownames(a)[3]] <- a[21] 
    d$dummy4[condition== rownames(a)[4]] <- a[22] 
    d$dummy4[condition== rownames(a)[5]] <- a[23] 
    d$dummy4[condition== rownames(a)[6]] <- a[24] 
    
    d$dummy5[condition== rownames(a)[1]] <- a[25] 
    d$dummy5[condition== rownames(a)[2]] <- a[26] 
    d$dummy5[condition== rownames(a)[3]] <- a[27] 
    d$dummy5[condition== rownames(a)[4]] <- a[28] 
    d$dummy5[condition== rownames(a)[5]] <- a[29] 
    d$dummy5[condition== rownames(a)[6]] <- a[30] 

    name1 = paste(baselevel, rownames(a)[2],sep="_VERSUS_")
    name2 = paste(baselevel, rownames(a)[3],sep="_VERSUS_")
    name3 = paste(baselevel, rownames(a)[4],sep="_VERSUS_")
  name4 = paste(baselevel, rownames(a)[5],sep="_VERSUS_")
  name5 = paste(baselevel, rownames(a)[6],sep="_VERSUS_")

    d[name1] = d$dummy1 
    d[name2] = d$dummy2 
    d[name3] = d$dummy3 
  d[name4] = d$dummy4   
  d[name5] = d$dummy5   

    d$dummy1 <-NULL 
    d$dummy2 <-NULL 
    d$dummy3 <-NULL 
    d$dummy4 <-NULL 
    d$dummy5 <-NULL 

    return(d)
}

Bf

This function is equivalent to the Dienes (2008) calculator which can be found here: http://www.lifesci.sussex.ac.uk/home/Zoltan_Dienes/inference/Bayes.htm.

The code was provided by Baguley and Kaye (2010) and can be found here: http://www.academia.edu/427288/Review_of_Understanding_psychology_as_a_science_An_introduction_to_scientific_and_statistical_inference

Bf<-function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0,sdtheory=1, tail=2){
 area <- 0
 if(identical(uniform, 1)){
 theta <- lower
 range <- upper - lower
 incr <- range / 2000
 for (A in -1000:1000){
   theta <- theta + incr
   dist_theta <- 1 / range
   height <- dist_theta * dnorm(obtained, theta, sd)
   area <- area + height * incr
 }
 }else
  {theta <- meanoftheory - 5 * sdtheory
  incr <- sdtheory / 200
  for (A in -1000:1000){
   theta <- theta + incr
   dist_theta <- dnorm(theta, meanoftheory, sdtheory)
   if(identical(tail, 1)){
    if (theta <= 0){
     dist_theta <- 0
    } else {
     dist_theta <- dist_theta * 2
    }
   }
   height <- dist_theta * dnorm(obtained, theta, sd)
   area <- area + height * incr
  }
 }
 LikelihoodTheory <- area
 Likelihoodnull <- dnorm(obtained, 0, sd)
 BayesFactor <- LikelihoodTheory / Likelihoodnull
 ret <- list("LikelihoodTheory" = LikelihoodTheory,"Likelihoodnull" = Likelihoodnull, "BayesFactor" = BayesFactor)
 ret
} 

Bf_powercalc

This works with the Bf function above. It requires the same values as that function (i.e. the obtained mean and SE for the current sample, and a value for the predicted mean, which is set as sdtheory with meanoftheory = 0), together with the current number of participants N. However, rather than returning a BF for the current sample, it works out what the BF would be for a range of different sample sizes (assuming that the SE scales with sqrt(N)).

Bf_powercalc<-function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0, sdtheory=1, tail=2, N, min, max)
{
 
 x = c(0)
 y = c(0)
  # note: bear in mind the difference between N and df (for the contrast between two groups, this is 2; for contrasts where there are 4 groups this will be 3, etc.)

 for(newN in min : max)
 {
 B = as.numeric(Bf(sd = sd*sqrt(N/newN), obtained, uniform, lower, upper, meanoftheory, sdtheory, tail)[3])
 x= append(x,newN) 
 y= append(y,B)
 output = cbind(x,y)
 
 } 
 output = output[-1,] 
 return(output) 
}

Bf_range

This works with the Bf function above. It requires some of the same values as that function (i.e. the obtained mean and SE for the current sample, and a value for the predicted mean, which is set as sdtheory with meanoftheory = 0). However, rather than setting sdtheory to a single value, it is set to a vector of values (sdtheoryrange) covering the desired range to test over.

Bf_range<-function(sd, obtained, meanoftheory=0, sdtheoryrange, tail=2)
{
 
 x = c(0)
 y = c(0)

 for(sdi in sdtheoryrange)
 {
  B = as.numeric(Bf(sd, obtained, uniform = 0, meanoftheory = 0, sdtheory = sdi, tail = tail)[3])
  
  x= append(x,sdi)  
  y= append(y,B)
  output = cbind(x,y)
  
 } 
 output = output[-1,] 
 colnames(output) = c("sdtheory", "BF")
 return(output) 
}
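An illustrative call with made-up values (not from our data), showing how the BF varies over a range of plausible sds for the half-normal model of H1:

# Illustrative only: the obtained mean and SE (in log odds) are made up.
round(Bf_range(sd = 0.15, obtained = 0.40,
               sdtheoryrange = seq(0.1, 1, 0.1), tail = 1), 2)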

logodds

This function takes a percentage p and returns the logodds value

logodds <- function(p){log(p/(100-p))}  

percentage

This function does the reverse: it takes a log odds value and returns the corresponding percentage

percentage <- function(logodds)  {(exp(logodds)/(1+exp(logodds)) ) *100}
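For example:

logodds(75)              # log odds corresponding to 75% correct (approx. 1.099)
percentage(logodds(75))  # recovers 75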

Training

DV: Each training trial produces a data point (did they pick the correct picture - 1/0; note that participants get feedback on this)

Load and Run LMEs on Relevant Pilot Data

These pilot data come from an undergraduate project. This was a two-session study in which 11 year old children (N = 15) were trained on stimuli similar to those used in the current study. Here we use data from the Session 1 training task. Note that children in this study learned only 12 words (half the number we will be using in the current study), although they had slightly less exposure (96 trials compared with 144).

Note that the model contains tonecontrast (6 levels) and speaker (4 levels) as fixed effects. The coding used means that the intercept is reported as averaged over all levels of tonecontrast and speaker.

child11.train = read.csv("child11_train_pilot.csv")
child11.train = lizCenter(child11.train, list("speaker"))
child11.train = lizContrasts6(child11.train, child11.train$tonecontrast, "T12")
child11.train = lizContrasts4(child11.train, child11.train$speaker, "tv1")

child11.train.mod = glmer (acc ~
        + (T12_VERSUS_T13 + T12_VERSUS_T14 + T12_VERSUS_T23 + T12_VERSUS_T24 + T12_VERSUS_T34)
        + (tv1_VERSUS_tv2 + tv1_VERSUS_tv3 + tv1_VERSUS_tv4 )
        + (1|pt_N)
        + (1|item)
        ,
        data = child11.train, family = binomial, control=glmerControl(optimizer = "bobyqa"))

round(summary(child11.train.mod)$coefficients,3)
##                Estimate Std. Error z value Pr(>|z|)
## (Intercept)       0.391      0.093   4.192    0.000
## T12_VERSUS_T13   -0.107      0.215  -0.500    0.617
## T12_VERSUS_T14    0.200      0.218   0.921    0.357
## T12_VERSUS_T23   -0.001      0.215  -0.005    0.996
## T12_VERSUS_T24   -0.264      0.214  -1.232    0.218
## T12_VERSUS_T34   -0.194      0.214  -0.908    0.364
## tv1_VERSUS_tv2   -0.229      0.155  -1.472    0.141
## tv1_VERSUS_tv3   -0.345      0.155  -2.232    0.026
## tv1_VERSUS_tv4   -0.252      0.155  -1.625    0.104

Prediction 1: Participants will show above chance performance in the training task

Based on: Above chance performance in all of the pilot data (above)

Planned Frequentist Analyses

We will use a model identical to that above. We will compare the intercept to chance (50%, i.e. the default value against which it is reported). Note that our coding of the fixed factors (which are essentially control factors) ensures that the intercept reflects the overall average (rather than performance at the base levels of the other factors).

Planned Bayes Factor Analyses

Summary of data for each age group: mean and SE for intercept from LME. Value to inform H1:

Given that children are learning twice as many words as in the pilot data, we take performance in the pilot to be the maximum we could expect. (Note: we have increased the number of trials from 96 to 144; nevertheless, this seems unlikely to be sufficient to lead to equivalent levels of learning.)

We therefore take HALF of the estimate from the pilot data (i.e. 0.1954405) and set that as the sd of the half-normal.
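For concreteness, a sketch of the planned calculation (train_bf and child11.train.new.mod are hypothetical names introduced here; the new-data model will be fitted exactly as child11.train.mod above):

# Planned BF for Training Prediction 1 on the new data (sketch).
# H1: half-normal with sd = half the pilot intercept estimate (0.391/2).
train_bf <- function(mod, sdtheory = 0.391/2) {
  est <- summary(mod)$coefficients["(Intercept)", "Estimate"]
  se  <- summary(mod)$coefficients["(Intercept)", "Std. Error"]
  Bf(se, est, uniform = 0, meanoftheory = 0, sdtheory = sdtheory, tail = 1)
}
# e.g. train_bf(child11.train.new.mod)  # hypothetical model of the new data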

Required Sample Size

We use t-tests against chance to get an idea of what size sample is required. (Note: although LMEs will be used for the actual analyses, we conducted these power analyses as though for t-tests over by-subject proportions. The values are therefore only approximate, but since our main analyses are Bayes factors, our aim here is just to get an idea of approximately suitable sample sizes.)

For Frequentist Analyses

# 11 Year Olds
dataS11 = as.numeric(with(droplevels(child11.train), tapply(acc,list(pt_N), mean, na.rm = T)))
d11 = cohensD(x = dataS11, mu = 0.5)
pwr.t.test(n = NULL, d = d11, sig.level = 0.05, power = .9, type = c("one.sample"), alternative = c( "greater"))
## 
##      One-sample t test power calculation 
## 
##               n = 8.261684
##               d = 1.130056
##       sig.level = 0.05
##           power = 0.9
##     alternative = greater

This suggests that, with the previous study, only about 10 participants were needed to see above-chance learning (but note that the learning task here will be harder, so this is likely an underestimate).

For Bayes Factor Analyses

Here we look at the pilot data from 11 year olds, using an H1 informed by the scale: we assume a maximum of 95% correct for 11 year olds. (In our actual analyses of the new data from 11 year olds we will use an estimate from the current pilot data to inform H1, but we cannot use the same value for the data itself.)

meanBF = summary(child11.train.mod)$coefficients["(Intercept)", "Estimate"]
percentage(meanBF)
## [1] 59.64948
seBF = summary(child11.train.mod)$coefficients["(Intercept)", "Std. Error"]
h1mean = logodds(95)/2
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)
## $LikelihoodTheory
## [1] 0.5222093
## 
## $Likelihoodnull
## [1] 0.0006538978
## 
## $BayesFactor
## [1] 798.6101
round(Bf_powercalc(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1, N=15, min=2, max=50))
##        x            y
##  [1,]  2            1
##  [2,]  3            2
##  [3,]  4            2
##  [4,]  5            4
##  [5,]  6            6
##  [6,]  7           11
##  [7,]  8           18
##  [8,]  9           31
##  [9,] 10           52
## [10,] 11           89
## [11,] 12          154
## [12,] 13          266
## [13,] 14          460
## [14,] 15          799
## [15,] 16         1389
## [16,] 17         2421
## [17,] 18         4227
## [18,] 19         7391
## [19,] 20        12942
## [20,] 21        22689
## [21,] 22        39822
## [22,] 23        69965
## [23,] 24       123040
## [24,] 25       216564
## [25,] 26       381482
## [26,] 27       672482
## [27,] 28      1186271
## [28,] 29      2093937
## [29,] 30      3698286
## [30,] 31      6535487
## [31,] 32     11555285
## [32,] 33     20440655
## [33,] 34     36174936
## [34,] 35     64048364
## [35,] 36    113444927
## [36,] 37    201015346
## [37,] 38    356312897
## [38,] 39    631806168
## [39,] 40   1120672162
## [40,] 41   1988422586
## [41,] 42   3529129986
## [42,] 43   6265409032
## [43,] 44  11126238622
## [44,] 45  19763288850
## [45,] 46  35113737632
## [46,] 47  62401825983
## [47,] 48 110921472451
## [48,] 49 197209577714
## [49,] 50 350695855269

This estimates that we would have a sufficient sample to obtain substantial evidence (BF > 3) for H1 with any sample size from 15 up to our maximum sample.

2 AFC Test

DV: Did the participant select the correct word to go with the picture (1/0)?

Pilot data

Pilot data here come from a similar study with adults. In that study, participants did not show improvement for either tones or segments; however, a concern was that they were too close to ceiling for segments at s1 (above 90% correct). Here we use the pilot data (1) to demonstrate the type of frequentist analysis that will be conducted and (2) as a way to obtain maximum values that we could possibly expect from children in this task (given that we know adults generally outperform children in vocabulary learning tasks).

adult.afc = read.csv("adult_afc_pilot.csv")
adult.afc$session = as.factor(adult.afc$session)
adult.afc <- lizCenter(adult.afc, list("session", "seg_tone"))
adult.afc = lizContrasts6(adult.afc, adult.afc$tonecontrast, "T12")
adult.afc = lizCenter(adult.afc, c("speaker"))

Note in the models:

  1. session.ct is a centered coding of the 2-level factor session, which ensures that other effects are averaged across sessions
  2. In the model for tone, tonecontrast is a factor with 6 levels. As for training, we use a coding here which means that the intercept is reported averaged over all levels of tonecontrast and speaker.
  3. We automatically include a by-participant slope for session (had this not converged we would have removed the correlation between slopes, and if that did not converge, also the slope for session). We test whether to include slopes for item using the backwards procedure from Matuschek et al. (2017)
# Model for segmental trials

adult.afc.seg = droplevels(subset(adult.afc, seg_tone == "seg"))

seg.mod1 = glmer (acc ~
        + session.ct
        + speaker.ct
        + (session.ct|pt_N)
              ,
        data = adult.afc.seg, family = binomial, control=glmerControl(optimizer = "bobyqa"))

# note - this converges (if it hadn't would have attempted to remove correlation between slope for session, then slope for session)

# now work out slope structure for items

seg.mod2 = glmer (acc ~
        + session.ct
        + speaker.ct
        + (session.ct|pt_N)
        + (session.ct|item)
              
        ,
        data = adult.afc.seg, family = binomial, control=glmerControl(optimizer = "bobyqa"))


seg.mod3 = glmer (acc ~
        + session.ct
        + speaker.ct
        + (session.ct|pt_N)
        + (session.ct||item)
              
        ,
        data = adult.afc.seg, family = binomial, control=glmerControl(optimizer = "bobyqa"))

anova(seg.mod2, seg.mod3)
## Data: adult.afc.seg
## Models:
## seg.mod3: acc ~ +session.ct + speaker.ct + (session.ct | pt_N) + (session.ct || 
## seg.mod3:     item)
## seg.mod2: acc ~ +session.ct + speaker.ct + (session.ct | pt_N) + (session.ct | 
## seg.mod2:     item)
##          Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## seg.mod3  8 500.11 537.26 -242.06   484.11                         
## seg.mod2  9 502.10 543.89 -242.05   484.10 0.0156      1     0.9006
seg.mod4 = glmer (acc ~
        + session.ct
        + speaker.ct
        + (session.ct|pt_N)
        + (1|item)
              
        ,
        data = adult.afc.seg, family = binomial, control=glmerControl(optimizer = "bobyqa"))

anova(seg.mod3, seg.mod4)
## Data: adult.afc.seg
## Models:
## seg.mod4: acc ~ +session.ct + speaker.ct + (session.ct | pt_N) + (1 | item)
## seg.mod3: acc ~ +session.ct + speaker.ct + (session.ct | pt_N) + (session.ct || 
## seg.mod3:     item)
##          Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
## seg.mod4  7 498.11 530.62 -242.06   484.11                        
## seg.mod3  8 500.11 537.26 -242.06   484.11     0      1          1
seg.mod = seg.mod4
summary(seg.mod)$coefficients
##              Estimate Std. Error   z value     Pr(>|z|)
## (Intercept) 2.5728868  0.2659928 9.6727691 3.935816e-22
## session.ct  0.4732217  0.3046554 1.5533014 1.203511e-01
## speaker.ct  0.3708275  0.3739542 0.9916388 3.213738e-01

Note: model comparisons anova(seg.mod2, seg.mod3) and anova(seg.mod3, seg.mod4) both suggested the more complex model was no better (p > .2); we therefore move to the simplest model (in each case, had we seen p < .2 we would have stuck with the more complex model).

If any of models 2-4 had not converged, we would have moved to the next model without doing model comparison.

adult.afc.tone = droplevels(subset(adult.afc, seg_tone == "tone"))
adult.afc.tone = lizContrasts6(adult.afc.tone, adult.afc.tone$tonecontrast, "T12")

tone.mod1 = glmer (acc ~
        + session.ct
        + speaker.ct
        + (session.ct|pt_N)
        + (T12_VERSUS_T13 + T12_VERSUS_T14 + T12_VERSUS_T23 + T12_VERSUS_T24 + T12_VERSUS_T34)
      
        ,
        data = adult.afc.tone, family = binomial, control=glmerControl(optimizer = "bobyqa"))

# note - this converges (if it hadn't would have attempted to remove correlation between slope for session, then slope for session)

# now work out slope structure for items

tone.mod2 = glmer (acc ~
        + session.ct
        + speaker.ct
        + (session.ct|pt_N)
        + (session.ct|item)
        + (T12_VERSUS_T13 + T12_VERSUS_T14 + T12_VERSUS_T23 + T12_VERSUS_T24 + T12_VERSUS_T34)
              
        ,
        data = adult.afc.tone, family = binomial, control=glmerControl(optimizer = "bobyqa"))


tone.mod3 = glmer (acc ~
        + session.ct
        + speaker.ct
        + (session.ct|pt_N)
        + (session.ct||item)
        + (T12_VERSUS_T13 + T12_VERSUS_T14 + T12_VERSUS_T23 + T12_VERSUS_T24 + T12_VERSUS_T34)
              
        ,
        data = adult.afc.tone, family = binomial, control=glmerControl(optimizer = "bobyqa"))

anova(tone.mod2, tone.mod3)
## Data: adult.afc.tone
## Models:
## tone.mod3: acc ~ +session.ct + speaker.ct + (session.ct | pt_N) + (session.ct || 
## tone.mod3:     item) + (T12_VERSUS_T13 + T12_VERSUS_T14 + T12_VERSUS_T23 + 
## tone.mod3:     T12_VERSUS_T24 + T12_VERSUS_T34)
## tone.mod2: acc ~ +session.ct + speaker.ct + (session.ct | pt_N) + (session.ct | 
## tone.mod2:     item) + (T12_VERSUS_T13 + T12_VERSUS_T14 + T12_VERSUS_T23 + 
## tone.mod2:     T12_VERSUS_T24 + T12_VERSUS_T34)
##           Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)  
## tone.mod3 13 829.51 889.88 -401.75   803.51                           
## tone.mod2 14 827.66 892.68 -399.83   799.66 3.8429      1    0.04996 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tone.mod = tone.mod2
summary(tone.mod)$coefficients
##                    Estimate Std. Error     z value     Pr(>|z|)
## (Intercept)     1.314122863  0.1598278  8.22211665 1.999426e-16
## session.ct      0.172917093  0.2288411  0.75562087 4.498765e-01
## speaker.ct     -0.109354022  0.2808912 -0.38931091 6.970462e-01
## T12_VERSUS_T13  1.454157553  0.3889422  3.73875019 1.849374e-04
## T12_VERSUS_T14 -0.033756914  0.3282164 -0.10284955 9.180824e-01
## T12_VERSUS_T23  1.387102046  0.3958987  3.50367936 4.588775e-04
## T12_VERSUS_T24 -0.007205016  0.3299009 -0.02183994 9.825756e-01
## T12_VERSUS_T34  1.220404399  0.3877344  3.14752651 1.646582e-03

Model 2 is better than model 3, so we stick with the more complex model.

Predictions 1-4: Above Chance Performance on both tonal and segmental trials in both s1 and s2

  • Prediction 1: Participants will show above chance performance in session 1 for the segmental trials
  • Prediction 2: Participants will show above chance performance in session 2 for the segmental trials
  • Prediction 3: Participants will show above chance performance in session 1 for the tonal trials
  • Prediction 4: Participants will show above chance performance in session 2 for the tonal trials

Predictions 1 and 2 are based on previous vocabulary learning experiments with children. Predictions 3 and 4 are based on the fact that in the pilot study we saw above-chance performance in the session 1 training task, which also taps learning of tone.

Planned Frequentist Analyses

We will run two models - one for segmental trials, one for tone trials - identical to the models from the adult pilot data above (the same procedure will be used to identify the slope structure).

We will then re-run each model such that we fit separate intercepts for each trial type.

seg.modA = glmer (acc ~
        -1
        + session
        + speaker.ct
        + (session.ct|pt_N)
        + (1|item)
           
        ,
        data = adult.afc.seg, family = binomial, control=glmerControl(optimizer = "bobyqa"))

anova(seg.mod, seg.modA)
## Data: adult.afc.seg
## Models:
## seg.mod: acc ~ +session.ct + speaker.ct + (session.ct | pt_N) + (1 | item)
## seg.modA: acc ~ -1 + session + speaker.ct + (session.ct | pt_N) + (1 | 
## seg.modA:     item)
##          Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)    
## seg.mod   7 498.11 530.62 -242.06   484.11                            
## seg.modA  7 498.11 530.62 -242.06   484.11     0      0  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(seg.modA)$coefficients
##             Estimate Std. Error   z value     Pr(>|z|)
## session1   2.3362760  0.2824170 8.2724344 1.312511e-16
## session2   2.8094979  0.3288658 8.5429921 1.307904e-17
## speaker.ct 0.3708278  0.3739525 0.9916441 3.213712e-01
round(get_coeffs(seg.modA, c("session1", "session2")),3)
##          Estimate Std. Error z value Pr(>|z|)
## session1    2.336      0.282   8.272        0
## session2    2.809      0.329   8.543        0

(Note that model comparison demonstrated that this is the same model as the original)

tone.modA = glmer (acc ~
        -1
        + session
        + speaker.ct
        + (T12_VERSUS_T13 + T12_VERSUS_T14 + T12_VERSUS_T23 + T12_VERSUS_T24 + T12_VERSUS_T34)
        + (session.ct|pt_N)
        
        + (session.ct|item)
        
        ,
        data = adult.afc.tone, family = binomial, control=glmerControl(optimizer = "bobyqa"))

anova(tone.mod, tone.modA)
## Data: adult.afc.tone
## Models:
## tone.mod: acc ~ +session.ct + speaker.ct + (session.ct | pt_N) + (session.ct | 
## tone.mod:     item) + (T12_VERSUS_T13 + T12_VERSUS_T14 + T12_VERSUS_T23 + 
## tone.mod:     T12_VERSUS_T24 + T12_VERSUS_T34)
## tone.modA: acc ~ -1 + session + speaker.ct + (T12_VERSUS_T13 + T12_VERSUS_T14 + 
## tone.modA:     T12_VERSUS_T23 + T12_VERSUS_T24 + T12_VERSUS_T34) + (session.ct | 
## tone.modA:     pt_N) + (session.ct | item)
##           Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)    
## tone.mod  14 827.66 892.68 -399.83   799.66                            
## tone.modA 14 827.66 892.68 -399.83   799.66     0      0  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(tone.modA)$coefficients
##                    Estimate Std. Error    z value     Pr(>|z|)
## session1        1.227664427  0.1632314  7.5210075 5.435573e-14
## session2        1.400581543  0.2250097  6.2245383 4.829761e-10
## speaker.ct     -0.109353990  0.2808908 -0.3893114 6.970458e-01
## T12_VERSUS_T13  1.454158073  0.3889445  3.7387294 1.849526e-04
## T12_VERSUS_T14 -0.033756628  0.3282168 -0.1028486 9.180831e-01
## T12_VERSUS_T23  1.387102213  0.3959011  3.5036588 4.589129e-04
## T12_VERSUS_T24 -0.007204982  0.3299030 -0.0218397 9.825758e-01
## T12_VERSUS_T34  1.220405017  0.3877364  3.1475120 1.646663e-03
round(get_coeffs(tone.modA, c("session1", "session2")),3)
##          Estimate Std. Error z value Pr(>|z|)
## session1    1.228      0.163   7.521        0
## session2    1.401      0.225   6.225        0

Planned Bayes Factor Analyses

Summary of data: mean and SE for each intercept from the relevant logistic models for the data. Values to inform H1:

We use the values from the previous adult model to inform our H1, setting these as maximums. For consistency, we will just use the adult values from session 1, i.e. for Predictions 1 and 2 we will set the sd of the half-normal for H1 to be equal to summary(seg.modA)$coefficients["session1", "Estimate"]/2, and for Predictions 3 and 4 to be equal to summary(tone.modA)$coefficients["session1", "Estimate"]/2.

The values are as follows (we also show the equivalent percentages correct, which are easier to think about):

percentage(summary(tone.modA)$coefficients["session1", "Estimate"])
## [1] 77.34095
percentage(summary(seg.modA)$coefficients["session1", "Estimate"])
## [1] 91.18372
summary(tone.modA)$coefficients["session1", "Estimate"]/2
## [1] 0.6138322
summary(seg.modA)$coefficients["session1", "Estimate"]/2
## [1] 1.168138
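For concreteness, a sketch of the planned calls (the child.* model names are hypothetical placeholders for the models that will be fitted to the new child data):

# Planned BFs for Test Predictions 1-4 on the new child data (sketch).
# sdtheory = half the corresponding adult session-1 estimate (see values above).
test_bf <- function(mod, which.session, sdtheory) {
  est <- summary(mod)$coefficients[which.session, "Estimate"]
  se  <- summary(mod)$coefficients[which.session, "Std. Error"]
  Bf(se, est, uniform = 0, meanoftheory = 0, sdtheory = sdtheory, tail = 1)
}
# e.g. for Prediction 1 (segmental trials, session 1):
# test_bf(child.seg.modA, "session1",
#         summary(seg.modA)$coefficients["session1", "Estimate"]/2)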

Note that better values may be obtained from within the new data. So if, for example, we see above-chance performance on segments, we can use the estimate from the segments model to inform the H1 for the tone data. Similarly, if we see learning in s1, we can use that to inform analyses of s2.

Here we demonstrate using intercepts from s1 to inform hypothesis about s2 for each of tones and segments

meanBF = summary(tone.modA)$coefficients["session1", "Estimate"]
seBF = summary(tone.modA)$coefficients["session1", "Std. Error"]
h1mean = summary(tone.modA)$coefficients["session2", "Estimate"]
Bf(seBF, meanBF, uniform = 0, meanoftheory = 0, sdtheory = h1mean, tail = 1)
## $LikelihoodTheory
## [1] 0.387346
## 
## $Likelihoodnull
## [1] 1.273657e-12
## 
## $BayesFactor
## [1] 304121126584

Sample size

We know that children will find this much harder than adults, so power analyses based on those data do not make much sense here. (Note that when we re-do this with adults, we plan to deliberately use a harder type of test.)

Predictions 5-6: Improvement from s1 to s2 for both tonal and segmental trials

(Note that these predictions are key to the purpose of our study)

  • Prediction 5: Accuracy will increase between sessions for segmental trials
  • Prediction 6: Accuracy will increase between sessions for tonal trials

Prediction 5 is based largely on previous research showing a consolidation benefit for nonwords with L1 phonology (e.g. Dumay & Gaskell, 2007; Henderson et al., 2012). (Note: in the pilot data above, session was not significant, but the majority of participants (43/62) were at ceiling at pre-test (> 85% correct). Of those individuals who were not at ceiling in s1, 11/15 improved from s1 to s2 and the remaining 4/15 had the same score.)

Planned Frequentist Analyses

We will again use models equivalent to those used for the pilot data above. We will be looking for an effect of session.ct.

Planned Bayes Factor Analyses

Summary of data: mean and SE for session.ct. Values to inform H1:

Approach 1 (using pilot data to give a maximum):

For each of tone/segments:

We will calculate a maximum value using the following logic: the maximum possible difference between sessions is constrained by what participants actually achieve at s1 and the maximum we could possibly imagine them achieving at s2.

So we need two values:

  1. an estimate of actual performance in s1 (call this A)
  2. an estimate of the maximum performance we could ever expect on this test (call this B)

We will obtain these as follows:

  1. A will come from the estimates obtained from the models of our data when testing Predictions 1 and 3 above.
  2. B will come from the adult pilot data here. For consistency, we will use the same values used when modelling H1 previously, i.e.
summary(tone.modA)$coefficients["session1", "Estimate"]
## [1] 1.227664
summary(seg.modA)$coefficients["session1", "Estimate"]
## [1] 2.336276

So for our model of H1, we will set the mean to 0 and the sd to

(B - A)/2

(the division by 2 is because this difference is clearly the maximum we could expect)
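For concreteness, a sketch of this calculation (the child.* object names are hypothetical placeholders for models fitted to the new child data):

# Approach 1 sketch: sd of the half-normal for H1 = (B - A)/2, where A is the
# relevant s1 intercept from the new child data and B is the adult pilot maximum.
approach1_bf <- function(mod, s1.estimate, adult.max) {
  est <- summary(mod)$coefficients["session.ct", "Estimate"]
  se  <- summary(mod)$coefficients["session.ct", "Std. Error"]
  Bf(se, est, uniform = 0, meanoftheory = 0,
     sdtheory = (adult.max - s1.estimate)/2, tail = 1)
}
# e.g. for tone trials (child.tone.mod / child.tone.modA are hypothetical new-data models):
# approach1_bf(child.tone.mod,
#              s1.estimate = summary(child.tone.modA)$coefficients["session1", "Estimate"],
#              adult.max   = summary(tone.modA)$coefficients["session1", "Estimate"])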

Approach 2:

More appropriate values may come from within the data themselves.

In particular, it seems likely that we might find a difference between s1 and s2 for segmental trials but not tonal trials. In that case, we can use the estimate of the session effect from the segment model to inform the H1 when looking at the effect of s1 -> s2 for tone trials.
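A commented sketch of Approach 2 (again with hypothetical child.* model names):

# Approach 2 sketch: use the segmental session effect as the sd of the
# half-normal for H1 when testing the tonal session effect.
# sd.H1 <- summary(child.seg.mod)$coefficients["session.ct", "Estimate"]
# est   <- summary(child.tone.mod)$coefficients["session.ct", "Estimate"]
# se    <- summary(child.tone.mod)$coefficients["session.ct", "Std. Error"]
# Bf(se, est, uniform = 0, meanoftheory = 0, sdtheory = sd.H1, tail = 1)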