General notes

This document contains the planned analyses, as of 26/11/2017, for a data set that we will start to collect on 27/11/2017.

The study is an artificial language learning study to be conducted with Year 1 children (5-6 year-olds). The experiment compares the effect of entrenching versus preempting forms on children’s judgments of argument-structure overgeneralizations. In natural languages, restricting such overgeneralizations relies on the ability to form an abstract generalization that not all verbs may appear in both the transitive-causative and the intransitive-inchoative construction (e.g., I laughed is possible, but I laughed the clown is not), while many verbs certainly do appear in both (e.g., I rolled the ball / The ball rolled; I bounced the ball / The ball bounced). Under the entrenchment hypothesis, argument-structure overgeneralization errors for a verb (e.g., I laughed the clown) are blocked by repeated presentation of that verb in one (or more) attested constructions. Under the preemption hypothesis, only nearly-synonymous uses contribute to this inference from absence. While it is likely that both of these mechanisms contribute to speakers’ ability to form restricted verb argument-structure generalizations, little work has pulled apart the effects of preemption and entrenchment. This is partly because overall verb frequency is often very highly correlated with the frequency of that verb in the single most nearly synonymous construction. Artificial language learning experiments allow precise control over such distributional properties of the input, while holding all other factors constant.

Design

This is a two-phase (phase 1: training; phase 2: test) artificial language learning experiment. Training with the language (administered in 4 sessions; note that training was administered in a single session in the pilot detailed below) is followed by tests that assess, upon training completion, participants’ ability to produce and judge sentences in the language.

Training:

Preamble: Participants are told that “they will hear sentences in Freddie’s language in order to learn to say how things happen in these videos”.

Sentences are learnt through two types of procedures:

• “Copy-only” blocks of trials, whereby participants hear and copy Freddie’s sentences (and view accompanying animations that exemplify what each sentence means)

• “Training-with-recast” blocks of trials, whereby participants i) hear the first word in Freddie’s sentence (and view the accompanying animation), ii) have a go at producing the last word in Freddie’s sentence, and iii) hear “how Freddie would have said it”. [Note that adult participants in the pilot detailed below were told that Freddie would give them feedback on whether they said the last word correctly (or not), but this will not be the case in follow-up studies: feedback will be more implicit, in the sense that participants will not be told that a mismatch between what they say and what Freddie says means that they produced the word incorrectly.]

“Copy-only” blocks of trials are followed by “training-with-recast” blocks of trials, for a total of 16 blocks (half copy-only, i.e., without recast; half with recast). There are 12 sentences in each block, for a total of 192 training trials.

Training stimuli: The artificial sentences consist of a verb (e.g., coomo) obligatorily followed by one of two particles (e.g., gos, kem). For example, “coomo gos” is a sentence that, in one condition, corresponds to a scene showing “Bart dropping a football”.

For each participant, there are three verbs during training (e.g., coomo, chila, tombat): two that consistently appear with just one particle (e.g., coomo only occurs with gos and chila only occurs with kem) [restricted verbs] and one that appears with both particles (e.g., tombat occurs an equal number of times with kem and gos) [alternating verb]. The latter “alternating verb” serves to demonstrate that a verb can indeed appear in both types of construction.

Particles denote:

my_data = pandoc.table(table1)
## 
## --------------------------------------------------------------------------------
##   particle             preemption                       entrenchment            
## ----------- -------------------------------- -----------------------------------
##  particle1    verb in transitive causative      verb in transitive causative    
## 
##  particle2   verb in periphrastic causative    verb in intransitive inchoative  
## --------------------------------------------------------------------------------

The critical aspect of the training design is that, in the preemption condition, the two argument-structure constructions both denote direct external causation and, as such, compete for meaning, whereas in the entrenchment condition only one of the two constructions (causative-1) denotes direct external causation. Importantly, the total number of exposures to each verb type (restricted 1; restricted 2; alternating) is identical between conditions.

Tests:

Production test: Participants are told that they will now see new animations (novel scenes that feature new characters; e.g., Marge bouncing a football) and will be asked to produce sentences in Freddie’s language in order to describe them. They are given the verb (e.g., “chila”) and are asked to produce the whole sentence (i.e., verb + particle, e.g., chila gos); however, unlike in the training-with-recast blocks of trials, they do not hear what “Freddie would have said to describe the scene”. Test scenes feature all three trained verbs, as well as a novel verb, animated either causatively (preemption condition) or half of the time causatively and half of the time noncausatively (entrenchment condition).

Grammaticality judgment test: Participants are told that they will see some more animations and will have to rate (using smiley faces) how well a sentence in Freddie’s language describes what they see in the animation. As in the production task, test scenes feature all three trained verbs, as well as a novel verb (note that this is a different verb from the one used in production), animated either causatively (preemption condition) or half of the time causatively and half of the time noncausatively (entrenchment condition). Each possible scene is accompanied by a sentence consisting of the correct verb + either of the two particles.

We predict that if preempting forms have a greater negative effect than merely entrenching forms on the production and judged acceptability of argument-structure overgeneralization errors (as some previous work suggests), participants will show a stronger preference for attested over unattested constructions in the preemption condition relative to the entrenchment condition.

Note on data analyses:

Data will be collected from the training task on days 1-4 and from the production and grammaticality judgment tasks, which are administered on day 4 (as outlined above).

Frequentist statistics

Linear or logistic mixed-effects models will be used to compute frequentist statistics: (i) logistic mixed-effects models will be used when we have a binary dependent variable (e.g., correct/incorrect response); and (ii) linear mixed-effects models will be used when we have a continuous or ordinal dependent variable. Note that although mixed models allow us to include both participants and items as random effects, we do not intend to include items in the analyses since power on this dimension will be low. (NB: (i) we counterbalance verb-type assignment across participants; (ii) it is not common for by-items analyses to be included in work in this area; (iii) increasing the number of items in a learning study of this type with children is not feasible.) To avoid anti-conservative conclusions (Barr, Levy, Scheepers, & Tily, 2013), we will use a full random-effects structure in our models (provided that the models converge), including random intercepts for subjects and by-subject random slopes for all within-subject factors and their interactions. Our approach will be to inspect models for effects of, and interactions between, the experimental variables where there are clear predictions. A minimal sketch of this model specification is given below.
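The sketch below uses invented data and placeholder names (a binary outcome correct, a participant identifier pt_code, and a centered within-subject predictor predictor.ct) purely to illustrate the specification; the actual models fitted to the pilot data appear in the sections that follow.

library(lme4)

# Minimal sketch of the planned specification (simulated data; names are placeholders):
# by-subject random intercepts and slopes for the centered within-subject factor.
set.seed(1)
d <- data.frame(
  pt_code      = rep(paste0("p", 1:20), each = 8),  # 20 hypothetical participants
  predictor.ct = rep(c(-0.5, 0.5), times = 80),     # centered within-subject factor
  correct      = rbinom(160, 1, 0.8)                # binary DV, simulated at random
)
m <- glmer(correct ~ predictor.ct + (predictor.ct | pt_code),
           family = binomial, control = glmerControl(optimizer = "bobyqa"),
           data = d)
round(summary(m)$coefficients, 3)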

BF analyses

We will also compute Bayes factors for all key analyses in order to assess whether we have substantial evidence for H1 or for the null (H0). We will say we have substantial evidence for H1 if BF > 3 and for H0 if BF < 1/3. As advocated by Dienes (2008, 2015), we will model H1 using either:

  1. an estimate of a plausible maximum effect. Computing SDs using knowledge of constraints on the likely maximum value is useful in cases where it is hard to obtain an estimate of a predicted effect size, e.g., because there is no previous relevant study with children. In such cases, we will assume that the values obtained from the pilot study with adults, detailed below, are plausible maximums of the values we could expect from children, and we will compute SDs to be half of the maximum plausible effects. Note that i) we will test one-sided predictions and ii) H1s will be modelled as half-normal distributions rather than uniform distributions, since the former treat smaller effects as more likely than bigger ones (which is expected in research with children).

  2. an estimate of the mean predicted by theory, e.g., a roughly predicted effect size coming from a relevant previous study. We will test one-sided predictions, using the estimate as the SD of a half-normal. Note that this means that the maximum we might expect is twice our estimate.

See https://doi.org/10.1016/j.jml.2017.01.005 for a published paper (with an R script at http://rpubs.com/ewonnacott/242454) illustrating the approach of using estimates and standard errors which come from logistic mixed-effects models: this is useful for binary data since it means we work in log-odds space.
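The chunk below sketches how approach 1 would be applied. All values are invented placeholders, and Bf is the Dienes calculator defined in the helper-function section further down, so the call is shown here for illustration only.

# Hypothetical illustration of the half-normal H1 specification (values invented):
pilot_max <- 2.7   # adult pilot estimate, treated as a plausible maximum for children
child_b   <- 1.1   # estimate that the (future) child model would return
child_se  <- 0.5   # its standard error
Bf(sd = child_se, obtained = child_b, uniform = 0,
   meanoftheory = 0, sdtheory = pilot_max / 2, tail = 1)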

Notes on pilot data

Note 1: The models of the pilot data presented below serve to exemplify the approach to be taken with the current data. Pilot data were collected from 20 adults (native English speakers; mean age = 24.67) tested on a nearly identical version of the experiment, with the following exceptions: (i) adults were trained and tested in a single session, whereas children will be trained over 4 sessions (administered on 4 consecutive days) and tested at the end of session 4. (ii) During the training-with-recast blocks, adults in the pilot were told explicitly that “Freddie will give them feedback after they have had a go at saying the sentence”, whereas children will be given more implicit feedback: “after they have had a go, they will also hear Freddie saying it”. (iii) For adults in the preemption condition, the 4 alternating-verb trials per block were paired with animations of all 4 possible agents, such that, in each training block, agents 1 and 2 were paired with particle 1 whereas agents 3 and 4 were paired with particle 2. This means that adults were (potentially) learning that, within a given copying block, the alternating verb occurred with particle 1 for some agents and with particle 2 for other agents, and could (potentially) use this information to guess “what Freddie would have said” in each subsequent training-with-recast block. In the main child study, we will reorder alternating-verb trials so that there is no association between agents and particles either overall (as in the adult study) or at the block level: this will be achieved by using animations for only 2 (rather than 4) agents in each block, such that the 4 alternating-verb trials will feature agents 1 and 2 each appearing with both particle 1 and particle 2.

Notes on criteria for participant and trial exclusion

  1. Participant exclusion:

     a. There is a minimum requirement that all participants have succeeded in learning the three verb meanings (e.g., “chila” matches scenes whereby someone is bouncing a ball; “tombat” matches scenes whereby a football is rolling, etc.). This requirement is set to ensure that findings are not distorted by the presence of child participants who are not able to cope with the high demands of a fully artificial language learning experiment. To measure this ability, we have devised a brief (6-trial) baseline task which involves pointing to the correct animation (out of three) matching a soundfile. This is administered at the end of the experiment. Participants who perform at chance on this task will be excluded and replaced.

     b. We will also exclude and replace child participants in the entrenchment condition who do not show the expected pattern of producing semantically appropriate particles for the alternating verb (i.e., producing causative particles in response to causative scenes at test, and vice versa). This is crucial to ensure that overall performance in the entrenchment condition does not appear conservative merely because of a subset of participants who have not learnt the difference in meaning between the two argument-structure constructions and are therefore conservative (i.e., they just produce verb + particle 1 if this is what they heard). Importantly, this requirement ensures that we provide a valid comparison between the entrenchment and preemption conditions.

  2. Trial exclusion: Our main production analyses will exclude trials where children do not produce a particle that is clearly identifiable as either kem or gos (i.e., the two particles occurring in the input), for example, trials where children produce no particle or produce something irrelevant such as “chila ball”. Particle mispronunciations which are identifiable as one of the two particles (e.g., a single phoneme substitution, as in kem → ken) will not be excluded. Before exclusion, we will carry out preliminary analyses on the proportion of excluded trials as a function of all production analysis predictors. This approach is consistent with previous artificial language learning studies with child participants (see https://doi.org/10.1016/j.cogpsych.2017.02.004 for a published paper using this approach, and the R script at http://rpubs.com/AnnaSamara/248957). An illustrative sketch of this coding step is given below.
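The sketch below (invented productions; labels are placeholders) shows one way the one-phoneme tolerance could be approximated with an edit-distance threshold, using the stringdist package loaded in the Packages section; the actual coding will be done from transcriptions.

# Illustrative sketch (invented productions): keep transcriptions within one edit
# of "kem" or "gos"; everything else is coded "det other" and excluded.
library(stringdist)
produced <- c("kem", "ken", "gos", "ball", "")
dist_kem <- stringdist(produced, "kem", method = "lv")  # Levenshtein distances
dist_gos <- stringdist(produced, "gos", method = "lv")
coded <- ifelse(pmin(dist_kem, dist_gos) <= 1,
                ifelse(dist_kem <= dist_gos, "kem", "gos"),
                "det other")
# coded is c("kem", "kem", "gos", "det other", "det other")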

Notes on criteria for sample size determination

Our policy will be as follows: we will collect data from 20 children in the entrenchment condition and will run analyses to assess whether children have picked up on the difference in meaning between the two argument-structure constructions (i.e., particle 1 = verb in the causative-1 construction vs. particle 2 = verb in the noncausative construction). Note that this is required to provide a valid comparison between the preemption and entrenchment conditions. If the BF analyses suggest that our results are inconclusive regarding this ability (i.e., 1/3 < BF < 3), we will run power analyses to estimate what sample size might be expected to give BF > 3 or BF < 1/3, given that the SE scales with 1/sqrt(N). We will abort if the estimated N > 40; otherwise, we will continue to test more participants (collecting 10 at each step before inspecting the data again) until N = 40.

If we obtain substantial evidence that children are unable to learn the semantics described above (BF < 1/3), we will abort the experiment.

If we obtain substantial evidence that children have learned the semantics described above, we will begin data collection on the preemption condition (beginning with n = 20) and will run the analyses outlined below to compare performance in the entrenchment and preemption conditions. If the BF analyses suggest that our results are inconclusive regarding the comparison between entrenchment and preemption, we will follow the procedure outlined above: i) run power analyses to estimate what sample size might be expected to give BF > 3 or BF < 1/3, and ii) continue to test more participants (collecting 10 per condition at each step before inspecting the data again) until we obtain substantial evidence for/against the predicted advantage of preemption over entrenchment. A schematic of this decision rule is sketched below.
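A simplified schematic of the decision rule (projected_n stands for the sample size suggested by the Bf_powercalc helper defined below; the thresholds follow the text above):

# Simplified sketch of the sequential testing policy described above.
next_step <- function(bf, projected_n, n_max = 40) {
  if (bf > 3)              return("stop: substantial evidence for H1")
  if (bf < 1/3)            return("stop: substantial evidence for H0")
  if (projected_n > n_max) return("abort: projected N exceeds 40")
  "continue: test 10 more participants, then re-inspect"
}
next_step(bf = 1.8, projected_n = 30)  # inconclusive BF, projected N within the cap -> continue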

Load packages and helper functions

Packages

library(akima)
library(compute.es)
library(cowplot)
library(doBy)
library(ez)
library(ggplot2)
library(Hmisc)
library(knitr)
library(languageR)
library(lattice)
library(lme4)
library(multcomp)
library(nlme)
library(pastecs)
library(plotrix)
library(plyr)
library(psych)
library(Rcpp)
library(reshape2)
library(stringdist)

theme_set(theme_bw())

Helper functions

SummarySE

This function can be found on the website “Cookbook for R”.

http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions

It summarizes data, giving count, mean, standard deviation, standard error of the mean, and confidence intervals (default 95%).

data: a data frame.

measurevar: the name of a column that contains the variable to be summarized

groupvars: a vector containing names of columns that contain grouping variables

na.rm: a boolean that indicates whether to ignore NA’s

conf.interval: the percent range of the confidence interval (default is 95%)

summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                      conf.interval=.95, .drop=TRUE) {
    require(plyr)

    # New version of length which can handle NA's: if na.rm==T, don't count them
    length2 <- function (x, na.rm=FALSE) {
        if (na.rm) sum(!is.na(x))
        else       length(x)
    }

    # This does the summary. For each group's data frame, return a vector with
    # N, mean, and sd
    datac <- ddply(data, groupvars, .drop=.drop,
      .fun = function(xx, col) {
        c(N    = length2(xx[[col]], na.rm=na.rm),
          mean = mean   (xx[[col]], na.rm=na.rm),
          sd   = sd     (xx[[col]], na.rm=na.rm)
        )
      },
      measurevar
    )

    # Rename the "mean" column    
    datac <- rename(datac, c("mean" = measurevar))

    datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean

    # Confidence interval multiplier for standard error
    # Calculate t-statistic for confidence interval: 
    # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
    ciMult <- qt(conf.interval/2 + .5, datac$N-1)
    datac$ci <- datac$se * ciMult

    return(datac)
}
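A small usage example (invented data):

# Tiny example (invented data): mean accuracy per condition, with SD, SE and 95% CI.
toy <- data.frame(condition = rep(c("A", "B"), each = 4),
                  accuracy  = c(1, 0, 1, 1, 0, 0, 1, 0))
summarySE(toy, measurevar = "accuracy", groupvars = "condition")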

SummarySEwithin

This function can be found on the website “Cookbook for R”.

http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions

From the website:

It summarizes data, handling within-subjects variables by removing inter-subject variability. It will still work if there are no within-S variables. It gives count, un-normed mean, normed mean (with same between-group mean), standard deviation, standard error of the mean, and confidence intervals. If there are within-subject variables, calculate adjusted values using method from Morey (2008).

data: a data frame.

measurevar: the name of a column that contains the variable to be summarized

betweenvars: a vector containing names of columns that are between-subjects variables

withinvars: a vector containing names of columns that are within-subjects variables

idvar: the name of a column that identifies each subject (or matched subjects)

na.rm: a boolean that indicates whether to ignore NA’s

conf.interval: the percent range of the confidence interval (default is 95%)

summarySEwithin <- function(data=NULL, measurevar, betweenvars=NULL, withinvars=NULL,
                            idvar=NULL, na.rm=FALSE, conf.interval=.95, .drop=TRUE) {

  # Ensure that the betweenvars and withinvars are factors
  factorvars <- vapply(data[, c(betweenvars, withinvars), drop=FALSE],
    FUN=is.factor, FUN.VALUE=logical(1))

  if (!all(factorvars)) {
    nonfactorvars <- names(factorvars)[!factorvars]
    message("Automatically converting the following non-factors to factors: ",
            paste(nonfactorvars, collapse = ", "))
    data[nonfactorvars] <- lapply(data[nonfactorvars], factor)
  }

  # Get the means from the un-normed data
  datac <- summarySE(data, measurevar, groupvars=c(betweenvars, withinvars),
                     na.rm=na.rm, conf.interval=conf.interval, .drop=.drop)

  # Drop all the unused columns (these will be calculated with normed data)
  datac$sd <- NULL
  datac$se <- NULL
  datac$ci <- NULL

  # Norm each subject's data
  ndata <- normDataWithin(data, idvar, measurevar, betweenvars, na.rm, .drop=.drop)

  # This is the name of the new column
  measurevar_n <- paste(measurevar, "_norm", sep="")

  # Collapse the normed data - now we can treat between and within vars the same
  ndatac <- summarySE(ndata, measurevar_n, groupvars=c(betweenvars, withinvars),
                      na.rm=na.rm, conf.interval=conf.interval, .drop=.drop)

  # Apply correction from Morey (2008) to the standard error and confidence interval
  #  Get the product of the number of conditions of within-S variables
  nWithinGroups    <- prod(vapply(ndatac[,withinvars, drop=FALSE], FUN=nlevels,
                           FUN.VALUE=numeric(1)))
  correctionFactor <- sqrt( nWithinGroups / (nWithinGroups-1) )

  # Apply the correction factor
  ndatac$sd <- ndatac$sd * correctionFactor
  ndatac$se <- ndatac$se * correctionFactor
  ndatac$ci <- ndatac$ci * correctionFactor

  # Combine the un-normed means with the normed results
  merge(datac, ndatac)
}

normDataWithin

This function is used by the SummarySEWithin function above. It can be found on the website “Cookbook for R”

http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions

From that website:

Norms the data within specified groups in a data frame; it normalizes each subject (identified by idvar) so that they have the same mean, within each group specified by betweenvars.

data: a data frame

idvar: the name of a column that identifies each subject (or matched subjects)

measurevar: the name of a column that contains the variable to be summarized

betweenvars: a vector containing names of columns that are between-subjects variables

na.rm: a boolean that indicates whether to ignore NA’s

normDataWithin <- function(data=NULL, idvar, measurevar, betweenvars=NULL,
              na.rm=FALSE, .drop=TRUE) {
  #library(plyr)
  # Measure var on left, idvar + between vars on right of formula.
  data.subjMean <- ddply(data, c(idvar, betweenvars), .drop=.drop,
   .fun = function(xx, col, na.rm) {
    c(subjMean = mean(xx[,col], na.rm=na.rm))
   },
   measurevar,
   na.rm
  )
  # Put the subject means with original data
  data <- merge(data, data.subjMean)
  # Get the normalized data in a new column
  measureNormedVar <- paste(measurevar, "_norm", sep="")
  data[,measureNormedVar] <- data[,measurevar] - data[,"subjMean"] +
                mean(data[,measurevar], na.rm=na.rm)
  # Remove this subject mean column
  data$subjMean <- NULL
  return(data)
}

myCenter

This function outputs the centered values of a variable, which can be a numeric variable, a factor, or a data frame. It was taken from Florian Jaeger’s blog https://hlplab.wordpress.com/2009/04/27/centering-several-variables/.

From his blog:

-If the input is a numeric variable, the output is the centered variable.

-If the input is a factor, the output is a numeric variable with centered factor level values. That is, the factor’s levels are converted into numerical values in their inherent order (if not specified otherwise, R defaults to alphanumerical order). More specifically, this centers any binary factor so that the value below 0 will be the 1st level of the original factor, and the value above 0 will be the 2nd level.

-If the input is a data frame or matrix, the output is a new matrix of the same dimension and with the centered values and column names that correspond to the colnames() of the input preceded by “c” (e.g. “Variable1” will be “cVariable1”).

myCenter= function(x) {
  if (is.numeric(x)) { return(x - mean(x, na.rm=T)) }
    if (is.factor(x)) {
        x= as.numeric(x)
        return(x - mean(x, na.rm=T))
    }
    if (is.data.frame(x) || is.matrix(x)) {
        m= matrix(nrow=nrow(x), ncol=ncol(x))
        colnames(m)= paste("c", colnames(x), sep="")
    
        for (i in 1:ncol(x)) {
        
            m[,i]= myCenter(x[,i])
        }
        return(as.data.frame(m))
    }
}

lizCenter

This function provides a wrapper around myCenter, allowing you to center a specific list of variables from a data frame. The input is a data frame (x) and a list of the names of the variables which you wish to center (listfname). The output is a copy of the data frame with a numeric column added for each of the centered variables, each labelled with its previous name with “.ct” appended. For example, if x is a data frame with columns “a” and “b”, lizCenter(x, list(“a”, “b”)) will return a data frame with two additional columns, a.ct and b.ct, which are numeric, centered codings of the corresponding variables (see the small worked example below).

lizCenter= function(x, listfname) 
{
    for (i in 1:length(listfname)) 
    {
        fname = as.character(listfname[i])
        x[paste(fname,".ct", sep="")] = myCenter(x[fname])
    }
        
    return(x)
}
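A small worked example (invented data) showing what lizCenter adds:

# Tiny example (invented data): lizCenter appends numeric, centered ".ct" copies
# of the named columns.
toy <- data.frame(a = factor(c("x", "x", "y", "y")), b = c(1, 2, 3, 6))
lizCenter(toy, list("a", "b"))
##   a b a.ct b.ct
## 1 x 1 -0.5   -2
## 2 x 2 -0.5   -1
## 3 y 3  0.5    0
## 4 y 6  0.5    3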

Bf

This function is equivalent to the Dienes (2008) calculator which can be found here: http://www.lifesci.sussex.ac.uk/home/Zoltan_Dienes/inference/Bayes.htm.

The code was provided by Baguley and Kaye (2010) and can be found here: http://www.academia.edu/427288/Review_of_Understanding_psychology_as_a_science_An_introduction_to_scientific_and_statistical_inference

Bf<-function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0,sdtheory=1, tail=2){
 area <- 0
 if(identical(uniform, 1)){
 theta <- lower
 range <- upper - lower
 incr <- range / 2000
 for (A in -1000:1000){
   theta <- theta + incr
   dist_theta <- 1 / range
   height <- dist_theta * dnorm(obtained, theta, sd)
   area <- area + height * incr
 }
 }else
  {theta <- meanoftheory - 5 * sdtheory
  incr <- sdtheory / 200
  for (A in -1000:1000){
   theta <- theta + incr
   dist_theta <- dnorm(theta, meanoftheory, sdtheory)
   if(identical(tail, 1)){
    if (theta <= 0){
     dist_theta <- 0
    } else {
     dist_theta <- dist_theta * 2
    }
   }
   height <- dist_theta * dnorm(obtained, theta, sd)
   area <- area + height * incr
  }
 }
 LikelihoodTheory <- area
 Likelihoodnull <- dnorm(obtained, 0, sd)
 BayesFactor <- LikelihoodTheory / Likelihoodnull
 ret <- list("LikelihoodTheory" = LikelihoodTheory,"Likelihoodnull" = Likelihoodnull, "BayesFactor" = BayesFactor)
 ret
} 

Bf_powercalc

This works with the Bf function above. It requires the same values as that function (i.e., the obtained mean and SE for the current sample, and a value for the predicted effect, which is entered as sdtheory with meanoftheory = 0), plus the current number of participants N. However, rather than returning a BF for the current sample, it works out what the BF would be for a range of different sample sizes (from min to max), assuming that the SE scales with 1/sqrt(N).

Bf_powercalc<-function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0, sdtheory=1, tail=2, N, min, max)
{
 
 x = c(0)
 y = c(0)
 
 for(newN in min : max)
 {
 B = as.numeric(Bf(sd = sd*sqrt(N/newN), obtained, uniform, lower, upper, meanoftheory, sdtheory, tail)[3])
 x= append(x,newN) 
 y= append(y,B)
 output = cbind(x,y)
 
 } 
 output = output[-1,] 
 return(output) 
}

lmedrop

Given an lmer model (model) and one of its terms (term), this returns a p-value for that term using model comparison (i.e., comparing otherwise identical models with and without the term via a likelihood-ratio test). It can be used to get p-values when using lmer rather than glmer, i.e., when dealing with continuous (rather than binomial) dependent variables.

lmedrop<-function(model, term) {
  model.dropped<-update(model,eval(paste(".~.-",term)));
  anova(model.dropped,model) }

Load pilot datasets

The data frames trainingdata.df, productiondata.df, judgmentdata.df, and baseline.df contain pilot data from 20 adults’ performance on the training, production, grammaticality judgment and baseline tasks, respectively.

trainingdata.df <- read.csv("training.csv", header=TRUE)
productiondata.df <- read.csv("production.csv", header=TRUE)
judgmentdata.df <- read.csv("grammaticality.csv", header=TRUE) 
baseline.df <- read.csv("baseline.csv", header=TRUE) 

Data exclusion

As outlined above, we will exclude data from all participants who perform at chance on the baseline task. If applicable, we will also run preliminary analyses on, and subsequently exclude, production trials in which participants produce something other than particle 1 or particle 2.

Relevant pilot data:

Participants

round(with(baseline.df, tapply(accuracy, list(pt_code), mean, na.rm=T)),2)
##        a4  ad_pre_1 ad_pre_10  ad_pre_2  ad_pre_3  ad_pre_4  ad_pre_5 
##      1.00      1.00      1.00      1.00      1.00      1.00      1.00 
##  ad_pre_6  ad_pre_7  ad_pre_8  ad_pre_9       ad1      ad10       ad2 
##      1.00      1.00      1.00      1.00      1.00      1.00      1.00 
##       ad3       ad5       ad6       ad7       ad8       ad9 
##      1.00      1.00      0.67      1.00      1.00      1.00

No participants are excluded from further analyses. Note that most adults perform at ceiling on the baseline task (accuracy = 100%), which is likely to be different with child data.

Trials

#create appropriate dataset for production in the entrenchment condition
productiondata_entrenchment.df = subset(productiondata.df, condition == "entrenchment")

#code excluded (det_other/none)
productiondata_entrenchment.df$det_excluded <- 0
productiondata_entrenchment.df$det_excluded[productiondata_entrenchment.df$det_lenient_adapted == "det other"] <- 1

#code det included
productiondata_entrenchment.df$det_included <- 0
productiondata_entrenchment.df$det_included[productiondata_entrenchment.df$det_excluded==0] <- 1  

#turn into long format  
productiondata_entrenchment.long.df <- melt(productiondata_entrenchment.df, id.vars=c("pt_code", "trial_code", "verb_type_training", "verb_type_test", "old_new"),
                                       measure.vars=c("det_excluded", "det_included"), variable.name="det_produced", value.name="measurement"
)

round(with(productiondata_entrenchment.long.df, tapply(measurement, list(det_produced), mean, na.rm=T)),2)
## det_excluded det_included 
##            0            1

There are no responses where participants produce something other than particle 1 or particle 2. Note that, once again, this is likely to be different with child data.

Research question 1 (Entrenchment condition only): Have children picked up on the difference in meaning between the two argument-structure constructions?

To provide a valid comparison between the preemption and entrenchment conditions, we need to ensure that participants trained with entrenching forms have picked up on the difference in meaning between the two argument-structure constructions. To ensure this baseline has been met, we will (a) examine the semantic appropriateness of children’s particle productions in the production test and (b) look for differences in their acceptability ratings for semantically appropriate and inappropriate particles in the grammaticality judgment task. Previous work shows that participants’ productions for verbs that are restricted in the training input are sometimes conservative: that is, they may show a preference for the attested particles even if they have learnt that particle usage depends on the underlying semantics. For this reason, we will carry out all baseline analyses on the two unrestricted verbs, i.e., the novel and alternating verbs.

Relevant pilot data

Production performance for the alternating verb

The dependent variable in all subsequent models of production performance is the semantic appropriateness of the particles produced: that is, whether participants produced causative particles in response to causative scenes at test (and vice versa). There is one predictor, Scene at Test [coded as verb_type_test.ct]. Including this in the model explores whether participants are more accurate with one type of scene (e.g., causatives), which is possible, though not clearly predicted.

A significant intercept would suggest that participants produce the semantically appropriate particles with better than chance accuracy.

#Select production data for the alternating verb:
productiondata_entrenchment_alternating.df = subset(productiondata_entrenchment.df, verb_type_training == "alternating")

#Center variables of interest using the lizCenter function:
d_prod_alt = lizCenter(productiondata_entrenchment_alternating.df, list("verb_type_test"))  


#Calculate average percentage of semantically correct performance for the alternating verb:
d_prod_alt_aggregated = summarySEwithin(d_prod_alt, measurevar="semantically_correct", withinvars= "verb_type_test", idvar="pt_code", na.rm=FALSE, conf.interval=.95) 

round(mean(d_prod_alt_aggregated$semantically_correct),2)
## [1] 0.93
# Calculate average percentage of semantically correct performance for the alternating verb separately for causative and noncausative scenes:
round(tapply(d_prod_alt_aggregated$semantically_correct, d_prod_alt_aggregated$verb_type_test, mean),2)
## intransitive   transitive 
##         0.97         0.88
#Run the lme:
a = glmer(semantically_correct ~ verb_type_test.ct + (verb_type_test.ct|pt_code), family =binomial, control=glmerControl(optimizer = "bobyqa"), data = d_prod_alt)
round(summary(a)$coefficients,3)
##                   Estimate Std. Error z value Pr(>|z|)
## (Intercept)         10.309      5.997   1.719    0.086
## verb_type_test.ct   13.531     11.813   1.145    0.252

There is no evidence that adults perform differently with verbs that appear in causative vs. noncausative scenes. The intercept (93% correct) is not significantly different from chance; however, given that participants are almost at ceiling, this seems to be due to lack of power and should be resolved in the main child study, where our starting sample will be double (20 participants/condition). (Note that removing verb_type_test.ct from the model yields a significant intercept (b = 9.98, SE = 4.49, z = 2.22, p = .026) in the simplified model.)

Production performance for the novel verb

#Select production data for the novel verb:
productiondata_entrenchment_novel.df = subset(productiondata_entrenchment.df, verb_type_training == "new1")

#Center variables of interest using the lizCenter function:
d_prod_novel = lizCenter(productiondata_entrenchment_novel.df, list("verb_type_test"))

#Calculate average percentage of semantically correct performance for the novel verb:
d_prod_novel_aggregated = summarySEwithin(d_prod_novel, measurevar="semantically_correct", withinvars= "verb_type_test", idvar="pt_code", na.rm=FALSE, conf.interval=.95) 

round(mean(d_prod_novel_aggregated$semantically_correct),2)
## [1] 0.9
#Calculate average percentage of semantically correct performance for the novel verb separately for causative and noncausative scenes:

round(tapply(d_prod_novel_aggregated$semantically_correct, d_prod_novel_aggregated$verb_type_test, mean),2)
## intransitive   transitive 
##         0.89         0.91
#Run the lme:
#Note that the random slope for the within-subject factor "scene at test" needs to be dropped to achieve model convergence.

a = glmer(semantically_correct ~ verb_type_test.ct + (1|pt_code), family =binomial, control=glmerControl(optimizer = "bobyqa"), data = d_prod_novel)
round(summary(a)$coefficients,3)
##                   Estimate Std. Error z value Pr(>|z|)
## (Intercept)          8.394      3.473   2.417    0.016
## verb_type_test.ct    0.923      1.407   0.656    0.512

As in the analyses for the alternating verb above, there is no evidence that adults perform differently with verbs that appear in causative vs. noncausative scenes. The intercept (90% correct) is significantly above chance.

Grammaticality judgment performance for the alternating verb

The dependent variable in these analyses is the rating participants give to the novel and alternating verbs (min = 1, max = 5). There are two predictors in the models: Semantic Appropriateness of the particle used with a verb at test (i.e., whether a verb that is causative at test is paired with a causative particle, and vice versa), and Scene at Test (causative, noncausative). As in the previous analyses, including Scene at Test in the model explores whether participants have a bias to generalize the causative (or the noncausative) construction. A significant main effect of Semantic Appropriateness would suggest that participants have picked up on the underlying semantics.

judgmentdata_entrenchment.df = subset(judgmentdata.df, condition == "entrenchment")

#turn semantically_correct into a factor
judgmentdata_entrenchment.df$semantically_correct = factor(judgmentdata_entrenchment.df$semantically_correct)

#recode data to give sensible variable names
judgmentdata_entrenchment.df$semantically_correct <- revalue(x = judgmentdata_entrenchment.df$semantically_correct, c("1" = "semantically correct", "0" = "semantically incorrect"))

#Select judgment data for the alternating verb:
judgmentdata_entrenchment_alternating.df = subset(judgmentdata_entrenchment.df, verb_type_training == "alternating")

#Center variables of interest using the lizCenter function:
d_alternating_gr = lizCenter(judgmentdata_entrenchment_alternating.df, list("verb_type_test","semantically_correct"))

#Calculate average ratings for semantically correct and incorrect trials featuring the alternating verb:

d_alternating_gr_aggregated = summarySEwithin(d_alternating_gr, measurevar="rating_original", withinvars= c("verb_type_test", "semantically_correct"), idvar="pt_code", na.rm=FALSE, conf.interval=.95) 

round(tapply(d_alternating_gr_aggregated$rating_original,d_alternating_gr_aggregated$semantically_correct, mean),2)
## semantically incorrect   semantically correct 
##                   2.03                   4.71
#Calculate average ratings for semantically correct and incorrect trials featuring the alternating verb separately for causative and noncausative scenes:
round(tapply(d_alternating_gr_aggregated$rating_original, list(d_alternating_gr_aggregated$semantically_correct,d_alternating_gr_aggregated$verb_type_test), mean),2)
##                        intransitive transitive
## semantically incorrect         2.05       2.00
## semantically correct           4.63       4.79
#Run the lme:
a = lmer(rating_original ~ semantically_correct.ct * verb_type_test.ct + (verb_type_test.ct|pt_code), data = d_alternating_gr)
round(summary(a)$coefficients,3)
##                                           Estimate Std. Error t value
## (Intercept)                                  3.383      0.137  24.665
## semantically_correct.ct                      2.684      0.168  15.955
## verb_type_test.ct                            0.055      0.169   0.323
## semantically_correct.ct:verb_type_test.ct    0.211      0.336   0.626
lmedrop(a,"semantically_correct.ct")
## Data: d_alternating_gr
## Models:
## model.dropped: rating_original ~ verb_type_test.ct + (verb_type_test.ct | pt_code) + 
## model.dropped:     semantically_correct.ct:verb_type_test.ct
## model: rating_original ~ semantically_correct.ct * verb_type_test.ct + 
## model:     (verb_type_test.ct | pt_code)
##               Df    AIC    BIC   logLik deviance  Chisq Chi Df Pr(>Chisq)
## model.dropped  7 296.64 312.95 -141.319   282.64                         
## model          8 190.07 208.72  -87.037   174.07 108.57      1  < 2.2e-16
##                  
## model.dropped    
## model         ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Grammaticality judgment performance for the novel verb

#Select judgment data for the novel verb:
judgmentdata_entrenchment_novel.df = subset(judgmentdata_entrenchment.df, verb_type_training == "new2")

#Center variables of interest using the lizCenter function:
d_novel_gr = lizCenter(judgmentdata_entrenchment_novel.df, list("verb_type_test","semantically_correct"))

#Calculate average ratings for semantically correct and incorrect trials featuring the novel verb:
d_novel_gr_aggregated = summarySEwithin(d_novel_gr, measurevar="rating_original", withinvars= c("verb_type_test", "semantically_correct"), idvar="pt_code", na.rm=FALSE, conf.interval=.95) 

round(tapply(d_novel_gr_aggregated$rating_original, d_novel_gr_aggregated$semantically_correct, mean),2)
## semantically incorrect   semantically correct 
##                   1.82                   4.11
#Calculate average ratings for semantically correct and incorrect trials featuring the novel verb separately for causative and noncausative scenes:
round(tapply(d_novel_gr_aggregated$rating_original, list(d_novel_gr_aggregated$semantically_correct,d_novel_gr_aggregated$verb_type_test), mean),2)
##                        intransitive transitive
## semantically incorrect         1.84       1.79
## semantically correct           4.16       4.05
#Run the lme:
a = lmer(rating_original ~ semantically_correct.ct * verb_type_test.ct + (verb_type_test.ct|pt_code), data = d_novel_gr)
round(summary(a)$coefficients,3)
##                                           Estimate Std. Error t value
## (Intercept)                                  2.992      0.226  13.261
## semantically_correct.ct                      2.289      0.234   9.803
## verb_type_test.ct                           -0.080      0.234  -0.343
## semantically_correct.ct:verb_type_test.ct   -0.053      0.467  -0.113
lmedrop(a,"semantically_correct.ct")
## Data: d_novel_gr
## Models:
## model.dropped: rating_original ~ verb_type_test.ct + (verb_type_test.ct | pt_code) + 
## model.dropped:     semantically_correct.ct:verb_type_test.ct
## model: rating_original ~ semantically_correct.ct * verb_type_test.ct + 
## model:     (verb_type_test.ct | pt_code)
##               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## model.dropped  7 302.34 318.66 -144.17   288.34                         
## model          8 243.28 261.93 -113.64   227.28 61.055      1   5.55e-15
##                  
## model.dropped    
## model         ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In sum, the analyses of both the alternating and novel verbs suggest that adults prefer sentences that are semantically correct over sentences that are semantically incorrect, i.e., they have learnt the difference in meaning between particles 1 and 2.

Planned frequentist analyses for main child study

  1. Production performance for the alternating and novel verbs: We will use logistic mixed-effects models identical to those used on the pilot data above.
  2. Grammaticality judgment performance for the alternating and novel verbs: We will use linear mixed-effects models identical to those used on the pilot data above.

Planned Bayes Factor Analyses for main child study

  1. Production performance for the alternating and novel verbs:

     a. Summary of data for each verb type: mean and SE of the intercept from the logistic mixed-effects models.

     b. Value to inform H1 for each verb type: mean of theory = 0; roughly expected maximum effect sizes taken from the adult pilot data above: 10.309 and 8.394 for the alternating and novel verb, respectively. As outlined in “Note on data analyses”, SDs will be set to half of these maximum values, i.e., SD = 5.1545 and 4.197 for the alternating and novel verb, respectively.

  2. Grammaticality judgment performance for the alternating and novel verbs:

     a. Summary of data for each verb type: mean and SE of the main effect of Semantic Appropriateness from the linear mixed-effects models.

     b. Value to inform H1 for each verb type: mean of theory = 0; roughly expected maximum effect sizes taken from the adult pilot data above: 2.684 and 2.289 for the alternating and novel verb, respectively. As outlined in “Note on data analyses”, SDs will be set to half of these maximum values, i.e., SD = 1.342 and 1.1445 for the alternating and novel verb, respectively. A sketch of the corresponding Bf() calls is given below.
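The sketch below shows what these calls would look like; the child estimates and SEs are invented placeholders, while the pilot maxima come from the models above.

# Placeholders for the estimates/SEs that the child models will return (values invented):
child_b_prod_alt <- 2; child_se_prod_alt <- 1
child_b_gj_alt   <- 1; child_se_gj_alt   <- 0.4

# Production, alternating verb: half-normal with SD = half of the adult pilot intercept (10.309).
Bf(sd = child_se_prod_alt, obtained = child_b_prod_alt, uniform = 0,
   meanoftheory = 0, sdtheory = 10.309 / 2, tail = 1)

# Grammaticality judgments, alternating verb: SD = half of the adult pilot effect (2.684).
Bf(sd = child_se_gj_alt, obtained = child_b_gj_alt, uniform = 0,
   meanoftheory = 0, sdtheory = 2.684 / 2, tail = 1)

# Analogous calls for the novel verb use sdtheory = 8.394 / 2 and 2.289 / 2.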

Research question 2: Is the difference in ratings between attested and unattested sentences [for restricted verbs] bigger in the preemption relative to the entrenchment condition?

Having established that participants pick up on the difference in meaning between the two argument-structure constructions, we will ask: do child participants prefer attested over unattested sentences for verbs that were restricted to one construction in the training input more strongly in the preemption condition than in the entrenchment condition?

Relevant pilot data

Grammaticality judgment performance for restricted verbs

The dependent variable in these analyses is the rating participants give to restricted verbs: these are the two verbs that, during training, occurred with only one of the two particles (referred to as construction-1-only and construction-2-only verbs). There are two predictors: condition (i.e., entrenchment versus preemption) and a factor reflecting whether a sentence was attested with that particle during training (attested) or not (unattested): (a) attested trials: construction-1-only verbs appearing with construction-1 particles and construction-2-only verbs appearing with construction-2 particles; (b) unattested trials: construction-1-only verbs appearing with construction-2 particles and construction-2-only verbs appearing with construction-1 particles.

Note that there was no difference in performance between construction-1-only and construction-2-only verbs in the pilot data reported below, so this factor has been dropped from the analyses. Similarly, we will test for differences between these verb types in the main child analyses and drop the factor if no significant effect is found.

d_preemption_vs_entrenchment = subset(judgmentdata.df, restricted_verbs == "yes")

#Center variables of interest using the lizCenter function:
d_preemption_vs_entrenchment = lizCenter(d_preemption_vs_entrenchment, list("attested_unattested","condition"))

#Calculate average ratings for attested and unattested sentences in each condition:
d_preemption_vs_entrenchment_aggregated = summarySEwithin(d_preemption_vs_entrenchment, measurevar="rating_original", withinvars= c("attested_unattested", "condition"), idvar="pt_code", na.rm=FALSE, conf.interval=.95) 

round(tapply(d_preemption_vs_entrenchment_aggregated$rating_original, list(d_preemption_vs_entrenchment_aggregated$attested_unattested,d_preemption_vs_entrenchment_aggregated$condition), mean),2)
##            entrenchment preemption
## attested           3.49       5.00
## unattested         3.26       1.85
#Run the lme:
a = lmer(rating_original ~ condition.ct * attested_unattested.ct + (attested_unattested.ct|pt_code), data = d_preemption_vs_entrenchment)
summary(a)
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## rating_original ~ condition.ct * attested_unattested.ct + (attested_unattested.ct |  
##     pt_code)
##    Data: d_preemption_vs_entrenchment
## 
## REML criterion at convergence: 994.9
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.19966 -0.32615  0.01509  0.46572  2.24315 
## 
## Random effects:
##  Groups   Name                   Variance Std.Dev. Corr
##  pt_code  (Intercept)            0.1787   0.4228       
##           attested_unattested.ct 0.5979   0.7732   1.00
##  Residual                        1.2736   1.1286       
## Number of obs: 312, groups:  pt_code, 20
## 
## Fixed effects:
##                                     Estimate Std. Error t value
## (Intercept)                          3.39865    0.11427  29.742
## condition.ct                         0.05409    0.22853   0.237
## attested_unattested.ct              -1.72800    0.21530  -8.026
## condition.ct:attested_unattested.ct -2.91884    0.43059  -6.779
## 
## Correlation of Fixed Effects:
##             (Intr) cndtn. atts_.
## conditin.ct 0.015               
## attstd_ntt. 0.667  0.014        
## cndtn.ct:_. 0.014  0.667  0.014
lmedrop(a,"condition.ct*attested_unattested.ct")
## Data: d_preemption_vs_entrenchment
## Models:
## model.dropped: rating_original ~ (attested_unattested.ct | pt_code)
## model: rating_original ~ condition.ct * attested_unattested.ct + (attested_unattested.ct | 
## model:     pt_code)
##               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## model.dropped  5 1054.5 1073.2 -522.27  1044.54                         
## model          8 1004.9 1034.9 -494.47   988.93 55.605      3  5.099e-12
##                  
## model.dropped    
## model         ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The key prediction that there would be a greater difference between attested and unattested sentences in the preemption relative to the entrenchment condition (which suggests that preempting forms are more influential than entrenching forms in restricting argument-structure overgeneralization errors) has been confirmed. The estimate of this interaction (condition.ct:attested_unattested.ct = -2.92) will be used in the main child study as the maximum predicted difference between entrenchment and preemption performance.

We also want to assess the effect that each condition has on the difference in ratings between attested and unattested sentences:

a = lmer(rating_original ~ 1 + condition:  attested_unattested.ct + condition.ct + (attested_unattested.ct|pt_code), data = d_preemption_vs_entrenchment)
summary(a)
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## rating_original ~ 1 + condition:attested_unattested.ct + condition.ct +  
##     (attested_unattested.ct | pt_code)
##    Data: d_preemption_vs_entrenchment
## 
## REML criterion at convergence: 994.9
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.19966 -0.32615  0.01509  0.46572  2.24315 
## 
## Random effects:
##  Groups   Name                   Variance Std.Dev. Corr
##  pt_code  (Intercept)            0.1787   0.4228       
##           attested_unattested.ct 0.5979   0.7732   1.00
##  Residual                        1.2736   1.1286       
## Number of obs: 312, groups:  pt_code, 20
## 
## Fixed effects:
##                                              Estimate Std. Error t value
## (Intercept)                                   3.39865    0.11427  29.742
## condition.ct                                  0.05409    0.22853   0.237
## conditionentrenchment:attested_unattested.ct -0.23116    0.30623  -0.755
## conditionpreemption:attested_unattested.ct   -3.15000    0.30271 -10.406
## 
## Correlation of Fixed Effects:
##             (Intr) cndtn. cndtnn:_.
## conditin.ct  0.015                 
## cndtnntr:_.  0.459 -0.471          
## cndtnprm:_.  0.485  0.473  0.000

BF analyses, using an estimate from Ambridge et al.’s work with adults that a roughly expected mean difference between attested and unattested sentences would be about 1, are used to assess whether we have substantial evidence for H1 (higher ratings for attested over unattested sentences) over the null in each condition.

#entrenchment: the mean difference between attested and unattested ratings is 0.23116
Bf(0.30623, 0.23116, uniform = 0, meanoftheory = 0, sdtheory = 1, tail = 1)
## $LikelihoodTheory
## [1] 0.5674259
## 
## $Likelihoodnull
## [1] 0.9797826
## 
## $BayesFactor
## [1] 0.5791345

The BF is < 3 and > 1/3, which suggests that the data are insensitive.

Bf_powercalc(sd = 0.30623, obtained = 0.23116, uniform = 0, meanoftheory=0, sdtheory=1, tail=1, N = 10, min = 60, max = 100)
##         x        y
##  [1,]  60 1.288931
##  [2,]  61 1.317004
##  [3,]  62 1.345805
##  [4,]  63 1.375352
##  [5,]  64 1.405662
##  [6,]  65 1.436755
##  [7,]  66 1.468651
##  [8,]  67 1.501368
##  [9,]  68 1.534929
## [10,]  69 1.569353
## [11,]  70 1.604662
## [12,]  71 1.640879
## [13,]  72 1.678027
## [14,]  73 1.716128
## [15,]  74 1.755209
## [16,]  75 1.795292
## [17,]  76 1.836404
## [18,]  77 1.878572
## [19,]  78 1.921821
## [20,]  79 1.966180
## [21,]  80 2.011677
## [22,]  81 2.058342
## [23,]  82 2.106204
## [24,]  83 2.155295
## [25,]  84 2.205646
## [26,]  85 2.257289
## [27,]  86 2.310258
## [28,]  87 2.364588
## [29,]  88 2.420314
## [30,]  89 2.477471
## [31,]  90 2.536098
## [32,]  91 2.596231
## [33,]  92 2.657912
## [34,]  93 2.721179
## [35,]  94 2.786074
## [36,]  95 2.852640
## [37,]  96 2.920920
## [38,]  97 2.990960
## [39,]  98 3.062804
## [40,]  99 3.136501
## [41,] 100 3.212099

Assuming that the SE scales with 1/sqrt(N), we would need approximately 100 participants to have conclusive evidence that attested > unattested in this condition.

#preemption: the mean difference between attested and unattested ratings is 3.15000
Bf(0.30271, 3.15000, uniform = 0, meanoftheory = 0, sdtheory = 1, tail = 1)
## $LikelihoodTheory
## [1] 0.008111892
## 
## $Likelihoodnull
## [1] 4.037698e-24
## 
## $BayesFactor
## [1] 2.009039e+21

We have conclusive evidence that attested > unattested in the preemption condition.

Planned Analyses: Frequentist

We will use lme models identical to those used on the pilot data above to (i) compare the effect of preemption vs. entrenchment on children’s preference for attested over unattested verb argument constructions and (ii) assess the individual effect that each of these conditions has on children’s ratings.

Planned Bayes Factor Analyses for main child study

  1. Effect of preemption vs. entrenchment on ratings for attested vs. unattested verbs:

     a. Summary of data: mean and SE of the interaction between condition and the variable capturing whether a sentence was attested during training (attested_unattested.ct) from the lme.

     b. Value to inform H1: mean of theory = 0; roughly expected maximum effect size taken from the adult pilot data above: 2.91884 (the absolute value of the pilot interaction estimate). As outlined in “Note on data analyses”, the SD will be set to half of this maximum value, i.e., SD = 1.45942.

  2. Individual effect of each condition on participants’ ratings for attested over unattested sentences:

     a. Summary of data for each condition: mean and SE of the main effect of the “attested_unattested.ct” variable (capturing whether a sentence was attested during training) from the lme in each condition.

     b. Value to inform H1 for each condition: mean of theory = 0; roughly expected maximum difference between attested and unattested sentences, taken from the preemption pilot data with adults: 3.15000. As outlined in “Note on data analyses”, the SD will be set to half of this maximum value, i.e., SD = 1.575. A sketch of the corresponding Bf() calls is given below.
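As above, the sketch below uses invented placeholders for the child estimates and SEs; obtained values would be entered so that the predicted direction is positive.

# Placeholders for the child-model estimates/SEs (values invented):
child_b_interaction <- 1;   child_se_interaction <- 0.5
child_b_attested    <- 0.8; child_se_attested    <- 0.4

# Condition x attested/unattested interaction: SD = half of the (absolute) adult pilot estimate (2.91884).
Bf(sd = child_se_interaction, obtained = child_b_interaction, uniform = 0,
   meanoftheory = 0, sdtheory = 2.91884 / 2, tail = 1)

# Attested vs. unattested effect within each condition: SD = half of the adult preemption estimate (3.15).
Bf(sd = child_se_attested, obtained = child_b_attested, uniform = 0,
   meanoftheory = 0, sdtheory = 3.15 / 2, tail = 1)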

Other exploratory analyses

These are exploratory data analyses for which there are no clear predictions.

Training data analyses

We will analyse children’s production accuracy during each of the training-with-recast blocks, for each of the three types of training verbs, in each of the two conditions.

Relevant pilot data

#Select data for the recast training blocks:
trainingdata.df = subset(trainingdata.df, procedure == "training_feedback")

#Center variables of interest using the lizCenter function:
d_training = lizCenter(trainingdata.df, list("block","verb_type_training", "condition"))  

#Run the lme:
a = glmer(training_accuracy ~ condition.ct * verb_type_training.ct * block.ct + (verb_type_training.ct*block.ct|pt_code), family =binomial, control=glmerControl(optimizer = "bobyqa"), data = d_training)
round(summary(a)$coefficients,3)
##                                             Estimate Std. Error z value
## (Intercept)                                    3.987      0.470   8.487
## condition.ct                                  -1.430      0.762  -1.877
## verb_type_training.ct                          2.219      0.449   4.943
## block.ct                                      -0.037      0.071  -0.527
## condition.ct:verb_type_training.ct             0.797      0.688   1.159
## condition.ct:block.ct                         -0.065      0.085  -0.761
## verb_type_training.ct:block.ct                 0.021      0.075   0.276
## condition.ct:verb_type_training.ct:block.ct   -0.038      0.093  -0.408
##                                             Pr(>|z|)
## (Intercept)                                    0.000
## condition.ct                                   0.061
## verb_type_training.ct                          0.000
## block.ct                                       0.599
## condition.ct:verb_type_training.ct             0.247
## condition.ct:block.ct                          0.446
## verb_type_training.ct:block.ct                 0.783
## condition.ct:verb_type_training.ct:block.ct    0.684

Planned Analyses: Frequentist

As above

Planned Bayes Factor Analyses for main child study

No planned BF analyses

Production/grammaticality judgment performance analyses for the two restricted verbs (entrenchment condition only)

In the analyses of restricted verbs outlined above, we do not differentiate between causative-only and noncausatively-only verbs in the entrenchment condition (note that this distinction is not relevant in the pre-emption condition). However, we plan to carry out analyses (for both production and grammaticality judgment) to see if they are more/less accurate with these verb types. These analyses will be akin to those production/grammaticality judgment performance analyses described above for the novel and alternating verb in the entrenchment condition. Note that unlike the analyses for the novel and alternating verb, these may be noninformative regarding children’s knowledge of the underlying semantics. However, it would be interesting to explore whether there is a different pattern of performance in children’s productions and judgments for the two restricted verbs relative to their productions for judgments for the two nonrestricted verbs. If there is a large difference between these verb types (which we do not predict), it may be informative to repeat the comparison between the entrenchment and pre-emption condition condition separately for noncausative-only and causative-only verbs (that is, repeat the analyses with (i) causative-only verbs removed, and (ii) with noncausative-only removed).