General notes

This document is an update on the planned analyses preregistered at http://rpubs.com/AnnaSamara/333562

Design and procedure

We investigate how Year 1 children (5-6 year-olds) learn restrictions on the morpho-syntactic behaviour of certain lexical items. For example, roll can appear in both the transitive and periphrastic causative construction (e.g., The man rolled the ball / The man made the ball roll), while destroy is restricted to the former (e.g., The man destroyed the city / *The man made the city destroy) and laugh is restricted to the latter (e.g., *The man laughed the girl / The man made the girl laugh) [where * denotes an ungrammatical utterance].

We test two proposals:

  1. Entrenchment. The more often a particular item (e.g., laugh) appears, regardless of construction (e.g., including in non-causative, intransitive constructions, such as The girl laughed; That girl is always laughing), the greater learners’ inference that it cannot appear in unwitnessed constructions (e.g., *The man laughed the girl).

  2. Preemption is similar, except that ungrammatical uses (e.g., *The man laughed the girl) are probabilistically blocked not by any uses of the target item, but only by items that constitute a near paraphrase of – and hence compete semantically with – the error (e.g., The man made the girl laugh).

• As before, we test our proposals in the context of a two-phase (phase 1: training; phase 2: test) artificial language learning experiment.

Unlike the previous study:

• we use novel but onomatopoeic nouns (e.g., mouse is squeako, pig is oinko, cow is moo-o, etc.) (instead of novel verbs) and static pictures of animals (instead of video animations)

• training occurs over 2 (instead of 4) days, followed by tests administered at the end of day 2.

Training procedure

Preamble: Children are told that “they will hear sentences in Freddie’s language in order to learn to describe some pictures”.

The language is learnt through two types of procedures:

• “Copy-only” training blocks of trials, whereby participants hear and copy Freddie’s sentences (each consisting of a noun + determiner, e.g., squeako bup) and view accompanying pictures that exemplify what each sentence means (e.g., a picture of one mouse)

• “Training-with-recast” blocks of trials, whereby participants i) hear the first word in Freddie’s sentence (and view the accompanying picture), ii) have a go at producing the last word in Freddie’s sentence (i.e., the determiner), and iii) receive implicit feedback by hearing “how Freddie would have said it”.

Each “copy-only” block of trials is followed by a “training-with-recast” block of trials, for a total of 8 blocks (half without recast; half with recast). There are 14 sentences in each block, for a total of 112 training trials.

Training stimuli

(Noun type and determiner assignment is counterbalanced across participants; for clarity, we describe here one list of stimuli).

Preemption condition

• Alternating noun: Children hear an alternating noun (e.g., in one counterbalanced condition: a squeako, i.e., a mouse) in singular -bup form (8 presentations per session), plural -gos form (8 presentations per session), and plural -kem form (8 presentations per session). The main purpose of this noun is to demonstrate to learners that at least some nouns can be pluralized with either -kem or -gos, in free variation (e.g., both squeako-kem and squeako-gos mean ‘several mice’). It also constitutes a baseline for comparison with performance on the non-alternating nouns.

Restricted nouns

• Plural1-form-unwitnessed noun: Children hear a non-alternating noun (e.g., in one counterbalanced condition: a woofo, i.e., a dog) in both singular -bup form (8 presentations per session) and plural -gos form (8 presentations per session), but never in plural -kem form.

• Plural2-form-unwitnessed noun: Children hear a second non-alternating noun (e.g., in one counterbalanced condition: a moo-o, i.e., a cow) in both singular -bup form (8 presentations per session) and plural -kem form (8 presentations per session), but never in plural -gos form.

Entrenchment condition

• Alternating noun: Children hear an alternating noun, identical to that in the preemption condition.

Restricted nouns

• Plural1-form-unwitnessed noun: Children hear an (in principle) non-alternating noun (e.g., in one counterbalanced condition: a woofo, i.e., a dog) in singular -bup form only (16 presentations per session) but no plural forms at all.

• Plural2-form-unwitnessed noun: Children hear a second (in principle) non-alternating noun (e.g., in one counterbalanced condition: a moo-o, i.e., a cow) in singular -bup form only (16 presentations per session) but no plural forms at all.

Tests

(Administered upon completion of training; order of task administration is counterbalanced across participants)

  1. Production test: Children are told that they will now see new pictures (e.g., plural scenes feature 18 instead of 12 dogs) and will be asked to produce sentences in order to describe them in Freddie’s language. They are given the noun (e.g., “squeako”) and are asked to produce the whole sentence (i.e., noun + determiner, e.g., squeako gos) without feedback. Test scenes feature all three trained nouns, as well as a novel noun: Half of test stimuli show one animal and half show several animals.

  2. Grammaticality judgment test: Children are told that they will see some more pictures and will have to rate (using smiley faces) how well a sentence in Freddie’s language describes what they see in the picture. Test scenes feature all three trained nouns, as well as a novel noun (note that this is a different noun from the one used in production): Half of test stimuli show one animal and half show several animals. The accompanying sentences consist of the correct noun paired with each of the three determiners (e.g., oinko gos, oinko bup, oinko kem).

Predictions

We will assess the following predictions:

Prediction 1a (preemption condition): Effect of preemption on children’s ratings of witnessed/unwitnessed forms

• For the two non-alternating nouns, the witnessed forms are in perfect semantic competition with the unwitnessed forms. That is, children have frequently heard woofo-gos as a plural but never woofo-kem as a plural (and, similarly, they have frequently heard moo-o-kem as a plural but never moo-o-gos as a plural). Therefore, they should dislike:

o woofo-kem (relative to both woofo-gos and squeako-kem)

o moo-o-gos (relative to both moo-o-kem and squeako-gos)

Prediction 2a (entrenchment condition): Effect of entrenchment on children’s ratings of witnessed/unwitnessed forms

• Because children have frequently heard woofo and moo-o, but never in -gos or -kem form to indicate plurality, the Entrenchment account predicts that children should dislike:

o woofo-kem (relative to both woofo-gos and squeako-kem)

o moo-o-gos (relative to both moo-o-kem and squeako-gos)

o woofo-gos (relative to both squeako-gos and squeako-kem)

o moo-o-kem (relative to both squeako-gos and squeako-kem)

Prediction 1b (preemption condition): Effect of preemption on children’s test productions of witnessed/unwitnessed forms [note that this will be addressed by exploratory analyses]

• For the two non-alternating nouns, the witnessed forms are in perfect semantic competition with the unwitnessed forms. That is, children have frequently heard woofo-gos as a plural but never woofo-kem as a plural (and, similarly, they have frequently heard moo-o-kem as a plural but never moo-o-gos as a plural). Therefore, they should fail to produce:

o woofo-kem (relative to both woofo-gos and squeako-kem)

o moo-o-gos (relative to both moo-o-kem and squeako-gos)

Prediction 2b (entrenchment condition): Effect of entrenchment on children’s test productions of witnessed/unwitnessed forms [note that this will be addressed by exploratory analyses]

• Because children have frequently heard woofo and moo-o, but never in -gos or -kem form to indicate plurality, the Entrenchment account predicts that children should fail to produce:

o woofo-kem (relative to both woofo-gos and squeako-kem)

o moo-o-gos (relative to both moo-o-kem and squeako-gos)

o woofo-gos (relative to both squeako-gos and squeako-kem)

o moo-o-kem (relative to both squeako-gos and squeako-kem)

Predictions 3a & 3b: Effect of preemption versus entrenchment on children’s ratings and test productions of witnessed/unwitnessed forms

Prediction 3a (Key analyses)

• If Preemption is more powerful than Entrenchment, then the Preemption condition will show, relative to the Entrenchment condition, a greater dislike of:

o woofo-kem (relative to both woofo-gos and squeako-kem).

o moo-o-gos (relative to both moo-o-kem and squeako-gos)

This is because, in the Preemption condition, these forms are blocked by witnessed, directly-semantically-competing uses (i.e., woofo-gos and moo-o-kem), rather than by uses that do not directly compete with these target forms (Entrenchment condition: woofo-bup and moo-o-bup, which mean ‘one dog’/‘one cow’).

Prediction 3b (Exploratory analyses)

In addition to the key analyses on children’s judgment ratings, we will also assess children’s productions, although it is somewhat unclear how the different possible unwitnessed responses (e.g., producing nothing/other/the singular determiner for entrenchment vs. producing nothing/other/the unattested plural/the singular determiner for preemption) map onto the predictions of entrenchment vs. preemption. We also lack a theoretically informed estimate (or an estimate of a plausible maximum) regarding the production difference between attested and unattested sentences in children in the preemption condition. As such, we will leave this as a more exploratory analysis.

Other predictions regarding control/manipulation-check test trials

• In both the Preemption and Entrenchment conditions, children should like (judgment task) and happily produce (production task):

o singulars with the singular determiner, for all three trained nouns and a novel noun

o both plural determiners for the alternating noun and a novel noun

• Furthermore, the two conditions would not be expected to differ in performance on these trials, at least according to the Preemption and Entrenchment hypotheses; they may, however, differ as a result of learners forming different overhypotheses (Perfors et al., 2010) about the extent of (ir)regularity in the language.

Note on data analyses:

Data will be collected from the training, production, and grammaticality judgment tasks.

Frequentist statistics

We will adopt a Bayesian approach to statistical analyses using the R package brms (Bürkner, 2017) to avoid model non-convergence (a common problem with conventional frequentist mixed effects models fitted using lme4). Another advantage of adopting a Bayesian approach is that Bayesian models yield ‘p’ values (pMCMC values) and credible intervals (cf. frequentist confidence intervals) that, unlike their frequentist counterparts, can be interpreted intuitively: the pMCMC value represents the probability that the true size of the effect is zero or lower (for positive effects) or zero or higher (for negative effects). The 95% credible interval represents an interval which contains, with 95% probability, the true value of the effect in question. Logistic mixed effect models will be used when we have a binary dependent variable (e.g., semantically correct/incorrect response); linear mixed effect models will be used when we have a continuous or ordinal dependent variable.

All predictors will be scaled into standard deviation units (z-scores), allowing us to use the same relatively uninformative prior (M = 0, SD = 1) across all predictors. This prior is chosen simply on the basis that, in a normal distribution centered around zero, the majority of observations (roughly 68%) fall within one standard deviation of the mean (95% within two). We will report simultaneous models, which demonstrate the effect of each main effect (or interaction) above and beyond all the other predictors included in the model (e.g., Wurm & Fisicaro, 2014).
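To make this concrete, here is a minimal sketch of how such a model and prior could be specified in brms (the data frame ratings.df and the variables rating, predictor.ct, and pt_code are hypothetical placeholders, not objects from this study):

# Hedged sketch: Bayesian linear mixed effects model with the weakly
# informative normal(0, 1) prior on all (scaled) predictors
m <- brm(rating ~ predictor.ct + (predictor.ct | pt_code),
         data = ratings.df,
         prior = set_prior("normal(0, 1)", class = "b"))

# pMCMC for a positive effect: the proportion of posterior samples at or below zero
post <- posterior_samples(m, pars = "b_predictor.ct")
mean(post[, 1] <= 0)

# 95% credible interval for the effect
quantile(post[, 1], probs = c(0.025, 0.975))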

BF analyses

We will follow up our key frequentist analyses (i.e., the analyses assessing predictions 1a, 2a, and 3a) with BF analyses. Following Dienes (personal communication), we will also compute ranges of values over which substantial Bayes Factors hold.

We will model H1 by using either:

  1. an estimate of the mean predicted by the theory, e.g., an estimate of the roughly predicted effect size coming from a relevant previous study. We will test one-sided predictions, using these estimates as the SD of a half-normal. Note that this means that the maximum effect we might expect is twice our estimate.

OR

  2. an estimate of a plausible maximum effect. Computing SDs using knowledge of constraints on the likely maximum value is useful in cases where it is hard to obtain an estimate of a predicted effect size, e.g., because there is no relevant previous study with children. In such cases, we will use values that are plausible maximums of the effects we could expect, and we will compute the SD as half of the maximum plausible effect. Again, we will test one-sided predictions, and H1 will be modelled as a half-normal distribution rather than a uniform distribution, since the former treats smaller effects as more likely than bigger ones (which is expected in research with children).

Notes on criteria for participant and trial exclusion

  1. Participant exclusion:
  1. There is a minimum requirement that all participants have succeeded in learning the three noun meanings (e.g., “squeako” = mouse; “oinko” = pig, etc.). To measure this ability, we have devised a brief (6-trial) baseline task which involves pointing to the correct picture (out of three) matching a sound file. This is administered at the end of the experiment. Participants who perform at chance on this task will be excluded and replaced. (This is in line with what we preregistered for the verb training study, even though we anticipate that the onomatopoeic nouns will be easier to learn than the fully artificial verbs used in that study.)

  2. We will also exclude from our key analyses (and replace) child participants in the entrenchment condition who have not learnt the underlying semantics (i.e., determiner 1 = one animal vs. determiners 2/3 = several animals): specifically, we will exclude children whose test-phase production of semantically appropriate determiners for the alternating noun is less than 60% accurate. This is crucial to ensure that overall performance in the entrenchment condition does not appear conservative merely due to the presence of a subset of participants who have not learnt the semantics and are therefore conservative (i.e., they just produce the noun followed by the determiner they have heard it with). This requirement ensures that we provide a valid comparison between the entrenchment and preemption conditions.

(We will also assess the semantic appropriateness of children’s test-phase productions for the novel noun, as well as their judgment score differences in semantically appropriate vs. inappropriate trials for the alternating and novel nouns; however, we will not exclude further participants on the basis of these results. This is because a) previous work in our lab suggests that children are often ‘thrown’ by novelty and may hesitate to produce anything at all in response to scenes featuring nouns that never occurred during training; and b) there is no objective threshold of how ‘big’ a rating difference constitutes successful learning of the semantics.)

  2. Trial exclusion: Our main production analyses will exclude trials where children do not produce a determiner that is clearly identifiable as one of the three determiners occurring in the input (e.g., bup, kem, or tid, depending on the counterbalancing list). For example, these are trials where children produce no determiner, or trials where children produce something irrelevant such as “squeako cute”. Determiner mispronunciations which are identifiable as one of the determiners (e.g., a single-phoneme substitution, as in kem → ken) will not be excluded. If more than 5% of trials are to be excluded, before we remove them from any further analyses, we will carry out preliminary analyses on the proportion of excluded trials as a function of all production analysis predictors. This approach is consistent with previous artificial language learning studies with child participants (see https://doi.org/10.1016/j.cogpsych.2017.02.004 for a published paper using this approach, and the R script at http://rpubs.com/AnnaSamara/248957).

Notes on criteria for sample size determination

Our policy will be as follows: We will collect data from 20 children in the entrenchment condition and will run analyses to assess whether children have picked up on the difference in meaning between singular and plural determiners (i.e., determiner 1 = singular scene (e.g., 1 animal) vs. determiners 2/3 = plural scene). This is required to provide a valid comparison between the preemption and entrenchment conditions. If the BF analyses suggest that our results are inconclusive regarding this ability (i.e., 1/3 < BF < 3), we will run power analyses to estimate what sample size might be expected to give BF > 3 or BF < 1/3, given that the SE scales with 1/sqrt(N). We will abort if the estimated N > 40; otherwise, we will continue to test more participants (collecting 10 at each step before inspecting the data again) until N = 40.

If we obtain substantial evidence that children are unable to learn the semantics described above (BF < 1/3), we will abort the experiment.

If we obtain substantial evidence that children have learned the semantics described above, we will begin data collection on the preemption condition (beginning with n = 20) and will run the analyses outlined below to compare performance in the entrenchment and preemption conditions. If the BF analyses suggest that our results are inconclusive regarding the comparison between entrenchment and preemption, we will follow the procedure outlined above: i) run power analyses to estimate what sample size might be expected to give BF > 3 or BF < 1/3, and ii) continue to test more participants (collecting 10 per condition at each step before inspecting the data again) until we obtain substantial evidence for/against the predicted advantage of preemption over entrenchment.

Relevant pilot data

Some relevant pilot data collected from 18 6-year-old children (native English speakers; mean age = 72.71 months) tested on the same experiment (entrenchment condition) are presented below. (Note that the pilot data have been analysed using conventional frequentist mixed effects models fitted using lme4, whereas our main child data will be analysed using brms; see “Note on data analyses: Frequentist statistics” above.)

Load packages and helper functions

Packages

library(akima)
library(compute.es)
library(ggplot2)
library(cowplot)
library(doBy)
library(ez)
library(Hmisc)
library(knitr)
library(languageR)
library(lattice)
library(lme4)
library(multcomp)
library(nlme)
library(pastecs)
library(plotrix)
library(plyr)
library(psych)
library(Rcpp)
library(reshape2)
library(stringdist)
library(brms)

theme_set(theme_bw())

Helper functions

SummarySE

This function can be found on the website “Cookbook for R”.

http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions

It summarizes data, giving count, mean, standard deviation, standard error of the mean, and confidence intervals (default 95%).

data: a data frame.

measurevar: the name of a column that contains the variable to be summarized

groupvars: a vector containing names of columns that contain grouping variables

na.rm: a boolean that indicates whether to ignore NA’s

conf.interval: the percent range of the confidence interval (default is 95%)

summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                      conf.interval=.95, .drop=TRUE) {
    require(plyr)

    # New version of length which can handle NA's: if na.rm==T, don't count them
    length2 <- function (x, na.rm=FALSE) {
        if (na.rm) sum(!is.na(x))
        else       length(x)
    }

    # This does the summary. For each group's data frame, return a vector with
    # N, mean, and sd
    datac <- ddply(data, groupvars, .drop=.drop,
      .fun = function(xx, col) {
        c(N    = length2(xx[[col]], na.rm=na.rm),
          mean = mean   (xx[[col]], na.rm=na.rm),
          sd   = sd     (xx[[col]], na.rm=na.rm)
        )
      },
      measurevar
    )

    # Rename the "mean" column    
    datac <- rename(datac, c("mean" = measurevar))

    datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean

    # Confidence interval multiplier for standard error
    # Calculate t-statistic for confidence interval: 
    # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
    ciMult <- qt(conf.interval/2 + .5, datac$N-1)
    datac$ci <- datac$se * ciMult

    return(datac)
}

SummarySEwithin

This function can be found on the website “Cookbook for R”.

http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions

From the website:

It summarizes data, handling within-subjects variables by removing inter-subject variability. It will still work if there are no within-S variables. It gives count, un-normed mean, normed mean (with same between-group mean), standard deviation, standard error of the mean, and confidence intervals. If there are within-subject variables, calculate adjusted values using method from Morey (2008).

data: a data frame

measurevar: the name of a column that contains the variable to be summarized

betweenvars: a vector containing names of columns that are between-subjects variables

withinvars: a vector containing names of columns that are within-subjects variables

idvar: the name of a column that identifies each subject (or matched subjects)

na.rm: a boolean that indicates whether to ignore NA’s

conf.interval: the percent range of the confidence interval (default is 95%)

summarySEwithin <- function(data=NULL, measurevar, betweenvars=NULL, withinvars=NULL,
                            idvar=NULL, na.rm=FALSE, conf.interval=.95, .drop=TRUE) {

  # Ensure that the betweenvars and withinvars are factors
  factorvars <- vapply(data[, c(betweenvars, withinvars), drop=FALSE],
    FUN=is.factor, FUN.VALUE=logical(1))

  if (!all(factorvars)) {
    nonfactorvars <- names(factorvars)[!factorvars]
    message("Automatically converting the following non-factors to factors: ",
            paste(nonfactorvars, collapse = ", "))
    data[nonfactorvars] <- lapply(data[nonfactorvars], factor)
  }

  # Get the means from the un-normed data
  datac <- summarySE(data, measurevar, groupvars=c(betweenvars, withinvars),
                     na.rm=na.rm, conf.interval=conf.interval, .drop=.drop)

  # Drop all the unused columns (these will be calculated with normed data)
  datac$sd <- NULL
  datac$se <- NULL
  datac$ci <- NULL

  # Norm each subject's data
  ndata <- normDataWithin(data, idvar, measurevar, betweenvars, na.rm, .drop=.drop)

  # This is the name of the new column
  measurevar_n <- paste(measurevar, "_norm", sep="")

  # Collapse the normed data - now we can treat between and within vars the same
  ndatac <- summarySE(ndata, measurevar_n, groupvars=c(betweenvars, withinvars),
                      na.rm=na.rm, conf.interval=conf.interval, .drop=.drop)

  # Apply correction from Morey (2008) to the standard error and confidence interval
  #  Get the product of the number of conditions of within-S variables
  nWithinGroups    <- prod(vapply(ndatac[,withinvars, drop=FALSE], FUN=nlevels,
                           FUN.VALUE=numeric(1)))
  correctionFactor <- sqrt( nWithinGroups / (nWithinGroups-1) )

  # Apply the correction factor
  ndatac$sd <- ndatac$sd * correctionFactor
  ndatac$se <- ndatac$se * correctionFactor
  ndatac$ci <- ndatac$ci * correctionFactor

  # Combine the un-normed means with the normed results
  merge(datac, ndatac)
}

normDataWithin

This function is used by the summarySEwithin function above. It can be found on the website “Cookbook for R”.

http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions

From that website:

Norms the data within specified groups in a data frame; it normalizes each subject (identified by idvar) so that they have the same mean, within each group specified by betweenvars.

data: a data frame

idvar: the name of a column that identifies each subject (or matched subjects)

measurevar: the name of a column that contains the variable to be summarized

betweenvars: a vector containing names of columns that are between-subjects variables

na.rm: a boolean that indicates whether to ignore NA’s

normDataWithin <- function(data=NULL, idvar, measurevar, betweenvars=NULL,
              na.rm=FALSE, .drop=TRUE) {
  #library(plyr)
  # Measure var on left, idvar + between vars on right of formula.
  data.subjMean <- ddply(data, c(idvar, betweenvars), .drop=.drop,
   .fun = function(xx, col, na.rm) {
    c(subjMean = mean(xx[,col], na.rm=na.rm))
   },
   measurevar,
   na.rm
  )
  # Put the subject means with original data
  data <- merge(data, data.subjMean)
  # Get the normalized data in a new column
  measureNormedVar <- paste(measurevar, "_norm", sep="")
  data[,measureNormedVar] <- data[,measurevar] - data[,"subjMean"] +
                mean(data[,measurevar], na.rm=na.rm)
  # Remove this subject mean column
  data$subjMean <- NULL
  return(data)
}

myCenter

This function outputs the centered values of a variable, which can be a numeric variable, a factor, or a data frame. It was taken from Florian Jaeger’s blog: https://hlplab.wordpress.com/2009/04/27/centering-several-variables/.

From his blog:

-If the input is a numeric variable, the output is the centered variable.

-If the input is a factor, the output is a numeric variable with centered factor level values. That is, the factor’s levels are converted into numerical values in their inherent order (if not specified otherwise, R defaults to alphanumerical order). More specifically, this centers any binary factor so that the value below 0 will be the 1st level of the original factor, and the value above 0 will be the 2nd level.

-If the input is a data frame or matrix, the output is a new matrix of the same dimension and with the centered values and column names that correspond to the colnames() of the input preceded by “c” (e.g. “Variable1” will be “cVariable1”).

myCenter= function(x) {
  if (is.numeric(x)) { return(x - mean(x, na.rm=T)) }
    if (is.factor(x)) {
        x= as.numeric(x)
        return(x - mean(x, na.rm=T))
    }
    if (is.data.frame(x) || is.matrix(x)) {
        m= matrix(nrow=nrow(x), ncol=ncol(x))
        colnames(m)= paste("c", colnames(x), sep="")
    
        for (i in 1:ncol(x)) {
        
            m[,i]= myCenter(x[,i])
        }
        return(as.data.frame(m))
    }
}

lizCenter

This function provides a wrapper around myCenter, allowing you to center a specific list of variables from a dataframe. The input is a dataframe (x) and a list of the names of the variables which you wish to center (listfname). The output is a copy of the dataframe with a (numeric) column added for each of the centered variables, each labelled with its previous name with “.ct” appended. For example, if x is a dataframe with columns “a” and “b”, lizCenter(x, list(“a”, “b”)) will return a dataframe with two additional columns, a.ct and b.ct, which are numeric, centered codings of the corresponding variables.

lizCenter= function(x, listfname) 
{
    for (i in 1:length(listfname)) 
    {
        fname = as.character(listfname[i])
        x[paste(fname,".ct", sep="")] = myCenter(x[fname])
    }
        
    return(x)
}

Bf

This function is equivalent to the Dienes (2008) calculator which can be found here: http://www.lifesci.sussex.ac.uk/home/Zoltan_Dienes/inference/Bayes.htm.

The code was provided by Baguley and Kaye (2010) and can be found here: http://www.academia.edu/427288/Review_of_Understanding_psychology_as_a_science_An_introduction_to_scientific_and_statistical_inference

Bf<-function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0,sdtheory=1, tail=2){
 area <- 0
 if(identical(uniform, 1)){
 theta <- lower
 range <- upper - lower
 incr <- range / 2000
 for (A in -1000:1000){
   theta <- theta + incr
   dist_theta <- 1 / range
   height <- dist_theta * dnorm(obtained, theta, sd)
   area <- area + height * incr
 }
 }else
  {theta <- meanoftheory - 5 * sdtheory
  incr <- sdtheory / 200
  for (A in -1000:1000){
   theta <- theta + incr
   dist_theta <- dnorm(theta, meanoftheory, sdtheory)
   if(identical(tail, 1)){
    if (theta <= 0){
     dist_theta <- 0
    } else {
     dist_theta <- dist_theta * 2
    }
   }
   height <- dist_theta * dnorm(obtained, theta, sd)
   area <- area + height * incr
  }
 }
 LikelihoodTheory <- area
 Likelihoodnull <- dnorm(obtained, 0, sd)
 BayesFactor <- LikelihoodTheory / Likelihoodnull
 ret <- list("LikelihoodTheory" = LikelihoodTheory,"Likelihoodnull" = Likelihoodnull, "BayesFactor" = BayesFactor)
 ret
} 
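For illustration, here is a hedged example of how the function would be called when H1 is modelled as a half-normal (per the “BF analyses” section above); the obtained effect and SE are placeholder numbers, not real estimates:

# Placeholder numbers: obtained effect = 0.6, SE = 0.25; H1 is a half-normal
# with SD = 0.5 (uniform = 0 requests the normal H1; tail = 1 makes it half-normal)
Bf(sd = 0.25, obtained = 0.6, uniform = 0,
   meanoftheory = 0, sdtheory = 0.5, tail = 1)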

Bf_powercalc

This works with the Bf function above. It requires the same values as that function (i.e., the obtained mean and SE for the current sample, a value for the predicted mean, which is set to be sdtheory (with meanoftheory = 0), and the current number of participants N). However, rather than return a BF for the current sample, it works out what the BF would be for a range of different subject numbers (assuming that the SE scales with 1/sqrt(N)).

Bf_powercalc<-function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0, sdtheory=1, tail=2, N, min, max)
{
 
 x = c(0)
 y = c(0)
 
 for(newN in min : max)
 {
 B = as.numeric(Bf(sd = sd*sqrt(N/newN), obtained, uniform, lower, upper, meanoftheory, sdtheory, tail)[3])
 x= append(x,newN) 
 y= append(y,B)
 output = cbind(x,y)
 
 } 
 output = output[-1,] 
 return(output) 
}
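A hedged usage sketch, tied to the sample-size policy in “Notes on criteria for sample size determination” above; given a (placeholder) effect of 0.6 with SE = 0.25 from the first 20 children, this returns the BF expected at each N from 20 to 40:

# Placeholder numbers; the output is a matrix of candidate Ns and expected BFs
Bf_powercalc(sd = 0.25, obtained = 0.6, uniform = 0,
             meanoftheory = 0, sdtheory = 0.5, tail = 1,
             N = 20, min = 20, max = 40)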

lmedrop

Given an lmer model (model) and one of its coefficients (term), this returns a p-value for that coefficient using model comparison (i.e., comparing identical models with and without that term). It can be used to get p-values when using lmer rather than glmer, i.e., when dealing with continuous (rather than binomial) dependent variables.

lmedrop <- function(model, term) {
  # Refit the model without the given term, then compare the two models
  model.dropped <- update(model, as.formula(paste(". ~ . -", term)))
  anova(model.dropped, model)
}

Load pilot datasets

The dataframes trainingdata.df, productiondata.df, judgmentdata.df, and baseline.df contain pilot data from 18 children’s performance on the training, production, grammaticality judgment, and baseline tasks, respectively.

trainingdata.df <- read.csv("training.csv", header=TRUE)
productiondata.df <- read.csv("production.csv", header=TRUE)
judgmentdata.df <- read.csv("grammaticality.csv", header=TRUE) 
baseline.df <- read.csv("baseline.csv", header=TRUE) 

1. Data exclusion

As outlined above, we exclude data from all participants who perform at chance on the baseline task. In our key production analyses, we also exclude production trials in which participants produce something other than determiner 1, 2, or 3.

Relevant pilot data:

Participants’ performance on the baseline task

round(with(baseline.df, tapply(accuracy, list(pt_code), mean, na.rm=T)),2)
##  ch1 ch10 ch11 ch1M  ch2 ch2M  ch3 ch3M  ch4 ch4M  ch5 ch5M  ch6 ch6M ch7M 
##    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1 
##  ch8 ch8M  ch9 
##    1    1    1

All participants are at ceiling, thus, none are excluded from further analyses on the basis of their baseline task performance.

2. Trial exclusion

#code excluded (det_other/none)
productiondata.df$det_excluded <- 0
productiondata.df$det_excluded[productiondata.df$det_lenient_adapted == "other"] <- 1

productiondata.df$det_excluded[productiondata.df$det_lenient_adapted == "none"] <- 1

#code det included
productiondata.df$det_included <- 0
productiondata.df$det_included[productiondata.df$det_excluded==0] <- 1  

#turn long format  
productiondata.long.df <- melt(productiondata.df, id.vars=c("pt_code", "trial_code", "noun_type_training3", "noun_type_test"),
                                       measure.vars=c("det_excluded", "det_included"), variable.name="det_produced", value.name="measurement"
)

round(with(productiondata.long.df, tapply(measurement, list(det_produced), mean, na.rm=T)),4)
## det_excluded det_included 
##       0.0312       0.9688

Fewer than 5% of responses are trials where participants produce something other than one of the three determiners, or nothing at all. These will be excluded from our key production analyses. No further analyses are carried out on these data.

3. Baseline check: Have children picked up on the difference in meaning between determiner 1 vs. determiners 2/3?

To ensure this baseline has been met, we will examine the semantic appropriateness of children’s determiners during test-phase productions in response to scenes featuring the alternating noun. Participants who perform with less than 60% accuracy will be excluded.

Planned frequentist analyses for main child study

  1. Production performance for the alternating and novel noun (performance with the alternating noun is our baseline for participant exclusion): We will use Bayesian (logistic) mixed effect models. The dependent variable will be the semantic appropriateness of the determiners produced: that is, whether participants produced a singular or one of the two plural determiners, as required by the picture. There is one predictor, Scene at Test [coded as noun_type_test.ct]. Including this in the model explores whether participants are more accurate with one type of scene (e.g., singular scenes), which is possible, though not clearly predicted. A significant intercept would suggest that participants produce the semantically appropriate determiners with better than chance accuracy.

  2. Grammaticality judgment performance for the alternating and novel noun (no participant exclusion on the basis of these results): We will use Bayesian (linear) mixed effect models. The dependent variable will be the mean rating participants give to the novel and alternating nouns [min = 1, max = 5]. There are two predictors in the models: Semantic Appropriateness of the determiner used with a noun at test (i.e., whether a noun that is singular at test is paired with a singular determiner, and vice versa); and Scene at Test (singular, plural). As in the previous analyses, including Scene at Test in the model explores whether participants have a preference for the singular (or the plural). A significant main effect of Semantic Appropriateness would suggest that participants have picked up on the underlying semantics. (A brms sketch of both models follows below.)
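As a hedged sketch, the two models could be specified in brms roughly as follows (reusing the pilot objects created in the code below — b for the centered production data and judgmentdata1.df for the centered judgment data — together with the prior from “Note on data analyses”):

# 1. Production: Bayesian logistic model of semantic appropriateness
prod.brm <- brm(semantically_correct ~ noun_type_test.ct + (noun_type_test.ct | pt_code),
                family = bernoulli(),
                prior = set_prior("normal(0, 1)", class = "b"),
                data = b)

# 2. Judgments: Bayesian linear model of ratings
judg.brm <- brm(rating_original ~ semantically_correct.ct * noun_type_test.ct +
                  (noun_type_test.ct | pt_code),
                prior = set_prior("normal(0, 1)", class = "b"),
                data = judgmentdata1.df)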

Planned Bayes Factor Analyses for main child study

• Production performance for the alternating noun: Are participants producing the semantically appropriate determiners in test scenes featuring the alternating noun with better than chance (50%) accuracy?

  1. Summary of data: mean and SE for intercept from bayesian lmes

  2. Value to inform H1: mean of theory = 0; roughly predicted effect size from pilot data with children below: 88.89% correct

• Production performance for the novel noun (no participant exclusion on the basis of these results): Are participants producing the semantically appropriate determiners in test scenes featuring the novel noun with better than chance (50%) accuracy?

  1. Summary of data: mean and SE for intercept from bayesian lmes

  2. Value to inform H1: mean of theory = 0; roughly predicted effect size from pilot data with children below: 81.52% correct

• Grammaticality judgment performance for the alternating and novel nouns: Are participants judging semantically appropriate trials that feature the alternating/novel noun higher than semantically inappropriate trials?

  1. Summary of data for each type of noun: mean and SE for main effect of Semantic Appropriateness from bayesian lmes

  2. Value to inform H1 for each type of noun: mean of theory = 0; roughly predicted difference between ratings for semantically appropriate vs. inappropriate trials from pilot data with children below: 1.694 and 1.569 for the alternating and novel noun, respectively.

Relevant pilot data (using lmes; main analyses will use brms)

Production performance for the alternating noun

#Select production data for the alternating noun:
aggregated.means_alt = subset(productiondata.df, noun_type_training3 == "alternating")

aggregated.means_alt = subset(aggregated.means_alt, semantically_correct =="1" | semantically_correct =="0")

write.csv(aggregated.means_alt,"a.csv") 

#str(aggregated.means_alt)
#Calculate average percentage of semantically correct performance for the alternating noun separately for singular and plural test scenes
aggregated.means_alt1 = aggregate(semantically_correct ~ noun_type_test + pt_code, aggregated.means_alt, FUN=mean)

# Means separately for singular and plural test scenes
Means_alt.df<- summarySEwithin(aggregated.means_alt1, measurevar="semantically_correct", withinvars= c("noun_type_test"), na.rm=FALSE, conf.interval=.95)

round(tapply(Means_alt.df$semantically_correct, Means_alt.df$noun_type_test, mean),4)
##   plural singular 
##   1.0000   0.7778
# Means across singular and plural test scenes
round(mean(Means_alt.df$semantically_correct),4)
## [1] 0.8889
b = lizCenter(aggregated.means_alt, list("noun_type_test"))

#Run the model:
b1 = glmer(semantically_correct ~ noun_type_test.ct + (noun_type_test.ct|pt_code), family =binomial, control=glmerControl(optimizer = "bobyqa"), data = b)

#model does not converge

b2 = glmer(semantically_correct ~ noun_type_test.ct + (1|pt_code), family =binomial, control=glmerControl(optimizer = "bobyqa"), data = b)

#simplified model does not converge either; non-convergence will not be a problem with brms, which will be used for the main analyses

# Calculate performance per child to identify those at chance:

Means_alt_kids.df<- summarySEwithin(aggregated.means_alt1, measurevar="semantically_correct", withinvars= c("pt_code"), na.rm=FALSE, conf.interval=.95)

round(tapply(Means_alt_kids.df$semantically_correct, Means_alt_kids.df$pt_code, mean),4)
##   ch1  ch10  ch11  ch1M   ch2  ch2M   ch3  ch3M   ch4  ch4M   ch5  ch5M 
## 1.000 0.875 0.750 0.875 1.000 1.000 1.000 1.000 1.000 0.750 0.875 1.000 
##   ch6  ch6M  ch7M   ch8  ch8M   ch9 
## 1.000 0.875 0.625 0.500 0.875 1.000
# All but one participant (ch8) performed above the 60% threshold. ch8 will be removed from the analyses assessing predictions 2a, 2b, and 3a&b.

Production performance for the novel noun

#Select production data for the novel noun:
aggregated.means_nov = subset(productiondata.df, noun_type_training3 == "novel1")

#Calculate average percentage of semantically correct performance for the novel noun separately for singular and plural test scenes
aggregated.means_nov1 = aggregate(semantically_correct ~ noun_type_test + pt_code, aggregated.means_nov, FUN=mean)

# Means separately for singular and plural test scenes
Means_nov.df<- summarySEwithin(aggregated.means_nov1, measurevar="semantically_correct", withinvars= c("noun_type_test"), na.rm=FALSE, conf.interval=.95)

round(tapply(Means_nov.df$semantically_correct, Means_nov.df$noun_type_test, mean),4)
##   plural singular 
##   0.8269   0.8036
# Means across singular and plural test scenes
round(mean(Means_nov.df$semantically_correct),4)
## [1] 0.8152
b = lizCenter(aggregated.means_nov, list("noun_type_test"))

#Run the model:
b1 = glmer(semantically_correct ~ noun_type_test.ct + (noun_type_test.ct|pt_code), family =binomial, control=glmerControl(optimizer = "bobyqa"), data = b)

round(summary(b1)$coefficients,3)
##                   Estimate Std. Error z value Pr(>|z|)
## (Intercept)          2.576      0.928   2.775    0.006
## noun_type_test.ct    0.544      1.620   0.335    0.737
#There is no evidence that children perform differently with singular vs. plural scenes. The intercept (81.52% correct) is significantly above chance.

Grammaticality judgment performance for the alternating noun

#Select judgment data for the alternating noun:
judgmentdata1.df = subset(judgmentdata.df, noun_type_training3 == "alternating")

#Calculate average ratings for semantically correct and incorrect trials featuring the alternating noun:
judgmentdata1.df_aggregated = summarySEwithin(judgmentdata1.df, measurevar="rating_original", withinvars= c("noun_type_test", "semantically_correct"), idvar="pt_code", na.rm=FALSE, conf.interval=.95) 

# Means for semantically correct vs. incorrect trials, across singular and plural test scenes
round(tapply(judgmentdata1.df_aggregated$rating_original, judgmentdata1.df_aggregated$semantically_correct, mean),2)
##   no  yes 
## 2.58 4.28
# Means separately for singular and plural test scenes:
round(tapply(judgmentdata1.df_aggregated$rating_original, list(judgmentdata1.df_aggregated$semantically_correct,judgmentdata1.df_aggregated$noun_type_test), mean),2)
##     plural singular
## no    2.50     2.67
## yes   4.11     4.44
#Center variables of interest using the lizCenter function:
judgmentdata1.df = lizCenter(judgmentdata1.df, list("noun_type_test","semantically_correct"))

#Run the lme:
c = lmer(rating_original ~ semantically_correct.ct * noun_type_test.ct + (noun_type_test.ct|pt_code), data = judgmentdata1.df)

round(summary(c)$coefficients,3)
##                                           Estimate Std. Error t value
## (Intercept)                                  3.431      0.234  14.660
## semantically_correct.ct                      1.694      0.219   7.745
## noun_type_test.ct                            0.250      0.221   1.131
## semantically_correct.ct:noun_type_test.ct    0.167      0.438   0.381
lmedrop(c,"semantically_correct.ct")
## Data: judgmentdata1.df
## Models:
## model.dropped: rating_original ~ noun_type_test.ct + (noun_type_test.ct | pt_code) + 
## model.dropped:     semantically_correct.ct:noun_type_test.ct
## model: rating_original ~ semantically_correct.ct * noun_type_test.ct + 
## model:     (noun_type_test.ct | pt_code)
##               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## model.dropped  7 407.72 426.49 -196.86   393.72                         
## model          8 362.52 383.97 -173.26   346.52 47.202      1  6.405e-12
##                  
## model.dropped    
## model         ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# evidence consistent with production that children have learnt the semantics

lmedrop(c, "semantically_correct.ct:noun_type_test.ct")
## Data: judgmentdata1.df
## Models:
## model.dropped: rating_original ~ semantically_correct.ct + noun_type_test.ct + 
## model.dropped:     (noun_type_test.ct | pt_code)
## model: rating_original ~ semantically_correct.ct * noun_type_test.ct + 
## model:     (noun_type_test.ct | pt_code)
##               Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
## model.dropped  7 360.67 379.44 -173.33   346.67                        
## model          8 362.52 383.97 -173.26   346.52  0.15      1     0.6986
# no evidence of an interaction with type of test scenes (singular/plural) 

Grammaticality judgment performance for the novel noun

#Select judgment data for the novel noun:
judgmentdata_novel.df = subset(judgmentdata.df, noun_type_training3 == "novel2")

#Calculate average ratings for semantically correct and incorrect trials featuring the novel noun:
judgmentdata_novel.df_aggregated = summarySEwithin(judgmentdata_novel.df, measurevar="rating_original", withinvars= c("noun_type_test", "semantically_correct"), idvar="pt_code", na.rm=FALSE, conf.interval=.95) 

# Means for semantically correct vs. incorrect trials, across singular and plural test scenes
round(tapply(judgmentdata_novel.df_aggregated$rating_original, judgmentdata_novel.df_aggregated$semantically_correct, mean),2)
##   no  yes 
## 2.81 4.38
# Means separately for singular and plural test scenes:
round(tapply(judgmentdata_novel.df_aggregated$rating_original, list(judgmentdata_novel.df_aggregated$semantically_correct,judgmentdata_novel.df_aggregated$noun_type_test), mean),2)
##     plural singular
## no    2.94     2.67
## yes   4.19     4.56
#Center variables of interest using the lizCenter function:
judgmentdata_novel.df = lizCenter(judgmentdata_novel.df, list("noun_type_test","semantically_correct"))

#Run the lme:
d = lmer(rating_original ~ semantically_correct.ct * noun_type_test.ct + (noun_type_test.ct|pt_code), data = judgmentdata_novel.df)

round(summary(d)$coefficients,3)
##                                           Estimate Std. Error t value
## (Intercept)                                  3.590      0.221  16.252
## semantically_correct.ct                      1.569      0.232   6.764
## noun_type_test.ct                            0.042      0.249   0.167
## semantically_correct.ct:noun_type_test.ct    0.639      0.464   1.377
lmedrop(d,"semantically_correct.ct")
## Data: judgmentdata_novel.df
## Models:
## model.dropped: rating_original ~ noun_type_test.ct + (noun_type_test.ct | pt_code) + 
## model.dropped:     semantically_correct.ct:noun_type_test.ct
## model: rating_original ~ semantically_correct.ct * noun_type_test.ct + 
## model:     (noun_type_test.ct | pt_code)
##               Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
## model.dropped  7 408.62 427.39 -197.31   394.62                        
## model          8 373.35 394.81 -178.68   357.35 37.27      1  1.029e-09
##                  
## model.dropped    
## model         ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# evidence consistent with production and with the grammaticality judgments for the alternating noun that children have learnt the semantics

lmedrop(d, "semantically_correct.ct:noun_type_test.ct")
## Data: judgmentdata_novel.df
## Models:
## model.dropped: rating_original ~ semantically_correct.ct + noun_type_test.ct + 
## model.dropped:     (noun_type_test.ct | pt_code)
## model: rating_original ~ semantically_correct.ct * noun_type_test.ct + 
## model:     (noun_type_test.ct | pt_code)
##               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## model.dropped  7 373.27 392.05 -179.64   359.27                         
## model          8 373.35 394.81 -178.68   357.35 1.9239      1     0.1654
# no evidence of an interaction with type of test scenes (singular/plural) 

4. Prediction 1a (preemption condition): Effect of preemption on children’s ratings of witnessed/unwitnessed forms

Planned frequentist analyses for main child study

We will use Bayesian (linear) mixed effect models. The dependent variable in these analyses is the mean rating participants give to the restricted nouns. There are two predictors: a factor reflecting whether a trial’s noun + determiner combination was witnessed during training (attested) or not (unattested); and the control factor noun type (plural1-only noun vs. plural2-only noun).

Planned Bayes Factor Analyses for main child study

Do children prefer attested over unattested sentences (for nouns that were restricted to one determiner during training) in the preemption condition?

To assess the individual effect of preemption on participants’ ratings for attested over unattested sentences we need:

  1. Summary of data for each condition: mean and SE for main effect of the “attested_unattested.ct” variable (capturing if a sentence has been attested during training) from bayesian lmes in this condition.

  2. Value to inform H1 for each condition: mean of theory = 0; roughly expected maximum rating difference between attested and unattested sentences: 1.00 [this maximum reflects previous work suggesting that, when adults rate novel verbs, the biggest difference obtained between “grammatical” and “ungrammatical” forms is about 1 point on the five-point scale]. As outlined in “Note on data analyses”, the SD will be set to half of this max value, i.e., SD = 0.5.
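To illustrate the planned computation with the Bf helper defined above (the obtained effect and SE are placeholders; the real values will be the posterior mean and SE of the attestedness effect from the preemption-condition brms model):

Bf(sd = 0.3, obtained = 0.8, uniform = 0,
   meanoftheory = 0, sdtheory = 0.5, tail = 1)  # half-normal H1, SD = half the 1-point maximum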

5. Prediction 2a (entrenchment condition): Effect of entrenchment on children’s ratings of witnessed/unwitnessed forms

Planned frequentist analyses for main child study

We will use Bayesian (linear) mixed effect models. The dependent variable in these analyses is the mean rating participants give to the restricted nouns. There are two predictors: a factor reflecting whether a trial’s noun + determiner combination was witnessed during training (attested) or not (unattested); and the control factor noun type (plural1-only noun vs. plural2-only noun).

Planned Bayes Factor Analyses for main child study

Do children prefer attested over unattested sentences (for nouns that were restricted to one determiner during training) in the entrenchment condition?

To assess the individual effect of each condition on participants’ ratings for attested over unattested sentences we need:

  1. Summary of data for each condition: mean and SE for the main effect of the “attested_unattested.ct” variable (capturing whether a sentence has been attested during training) from bayesian lmes in this condition.

  2. Value to inform H1 for each condition: mean of theory = 0; roughly expected rating difference between attested and unattested sentences from the entrenchment pilot data with children below: 0.5

Relevant pilot data (using lmes; main analyses will use brms)

#firstly, we need to exclude the participant who did not meet the criterion of learning the semantics
attested_vs_unattested = subset(judgmentdata.df, pt_code != "ch8")

# select the two non-alternating (restricted) nouns
attested_vs_unattested = subset(attested_vs_unattested, restricted_nouns == "yes")

#Center variables of interest using the lizCenter function:
d_attested_unattested = lizCenter(attested_vs_unattested, list("noun_type_training","unattested_attested"))

a = lmer(rating_original ~ noun_type_training.ct * unattested_attested.ct + (noun_type_training.ct * unattested_attested.ct|pt_code), data = d_attested_unattested)

round(summary(a)$coefficients,3)
##                                              Estimate Std. Error t value
## (Intercept)                                     3.250      0.168  19.362
## noun_type_training.ct                          -0.034      0.221  -0.153
## unattested_attested.ct                         -0.569      0.235  -2.420
## noun_type_training.ct:unattested_attested.ct   -0.261      0.471  -0.554
lmedrop(a,"unattested_attested.ct")
## Data: d_attested_unattested
## Models:
## model.dropped: rating_original ~ noun_type_training.ct + (noun_type_training.ct * 
## model.dropped:     unattested_attested.ct | pt_code) + noun_type_training.ct:unattested_attested.ct
## model: rating_original ~ noun_type_training.ct * unattested_attested.ct + 
## model:     (noun_type_training.ct * unattested_attested.ct | pt_code)
##               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)  
## model.dropped 14 798.63 844.95 -385.32   770.63                           
## model         15 794.80 844.42 -382.40   764.80 5.8345      1    0.01571 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#there is a significant effect of attested > unattested in the entrenchment condition

lmedrop(a, "noun_type_training.ct:unattested_attested.ct")
## Data: d_attested_unattested
## Models:
## model.dropped: rating_original ~ noun_type_training.ct + unattested_attested.ct + 
## model.dropped:     (noun_type_training.ct * unattested_attested.ct | pt_code)
## model: rating_original ~ noun_type_training.ct * unattested_attested.ct + 
## model:     (noun_type_training.ct * unattested_attested.ct | pt_code)
##               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## model.dropped 14 793.11 839.43 -382.56   765.11                         
## model         15 794.80 844.42 -382.40   764.80 0.3119      1     0.5765
# No interaction with the different type of nouns
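Purely to illustrate the mechanics of the planned BF computation (and not as an inferential result — the pilot itself informed the H1 estimate of 0.5, so this calculation is circular), the pilot effect above (0.569, SE = 0.235) would be entered into the Bf helper as:

Bf(sd = 0.235, obtained = 0.569, uniform = 0,
   meanoftheory = 0, sdtheory = 0.5, tail = 1)  # half-normal H1 with SD = 0.5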

6. Prediction 1b (preemption condition): Effect of preemption on children’s test productions of witnessed/unwitnessed forms [exploratory analyses]

The question addressed here is: Are children producing witnessed over unwitnessed sentences more frequently than expected by chance (33%) in the preemption condition? These analyses will be treated as exploratory since we lack a theoretically informed estimate (or an estimate of a plausible maximum) regarding the production difference between attested and unattested sentences in children in this condition. Thus, we will not follow up our Bayesian mixed effect analyses with BF analyses.

Planned frequentist analyses for main child study

We will use Bayesian (logistic) mixed effect models. The dependent variable in these analyses is the proportion of time children produce witnessed (i.e., attested) over unwitnessed (i.e., unattested) sentences in response to the two restricted nouns: these are the two nouns that, during training, occurred with only one of the two plural determiners (referred to as the plural1-only and plural2-only nouns). There is one predictor: noun_type_training (i.e., plural1-only noun vs. plural2-only noun).

Planned Bayes Factor Analyses for main child study

No planned Bayes Factor analyses

7. Prediction 2b (entrenchment condition): Effect of entrenchment on children’s test productions of witnessed/unwitnessed forms

The question addressed here is: Are children producing witnessed over unwitnessed sentences more frequently than expected by chance (33%) in the entrenchment condition? These analyses will be treated as exploratory since it is somewhat unclear which unwitnessed forms children may produce here (e.g., the singular determiner vs. nothing vs. something else). Thus, we will not follow up our Bayesian mixed effect analyses with BF analyses.

Planned frequentist analyses for main child study

We will use Bayesian (logistic) mixed effect models. The dependent variable in these analyses is the proportion of time children produce attested over unattested sentences in response to the two restricted nouns. There is one predictor: noun_type_training (i.e., plural1-only-in-principle noun vs. plural2-only-in-principle noun)

Planned Bayes Factor Analyses for main child study

No planned Bayes Factor analyses

Relevant pilot data (using lmes; main analyses will use brms)

# Select data
prod_data.df <- read.csv("production.csv", header=TRUE)

#firstly, we need to exclude the participant who did not meet the criterion of learning the semantics
attested_vs_unattested_prod = subset(prod_data.df, pt_code != "ch8")

# select data for the two restricted (non-alternating) nouns
attested_vs_unattested_prod = subset(attested_vs_unattested_prod, restricted_nouns == "yes")

#Calculate proportion of witnessed (attested)/unwitnessed (unattested) sentences produced for each of the two restricted nouns (unwitnessed plural1_only; unwitnessed plural2_only)

aggregated.means_attested_unattested = aggregate(attested_unattested ~ noun_type_training + pt_code, attested_vs_unattested_prod, FUN=mean)

Means_attested_unattested.df<- summarySEwithin(aggregated.means_attested_unattested, measurevar="attested_unattested", withinvars= c("noun_type_training"), na.rm=FALSE, conf.interval=.95)

round(tapply(Means_attested_unattested.df$attested_unattested, Means_attested_unattested.df$noun_type_training, mean),4)
##  alternating       novel1 plural1_only plural2_only 
##           NA           NA       0.5757       0.5956
round(mean(Means_attested_unattested.df$attested_unattested),4)
## [1] 0.5857
#children produce attested trials 58.57% of the time

#Center variables of interest using the lizCenter function:
d_attested_unattested_prod = lizCenter(attested_vs_unattested_prod, list("noun_type_training"))

a = glmer(attested_unattested ~ noun_type_training.ct +( noun_type_training.ct|pt_code), family =binomial, control=glmerControl(optimizer = "bobyqa"), data = d_attested_unattested_prod)

round(summary(a)$coefficients,3)
##                       Estimate Std. Error z value Pr(>|z|)
## (Intercept)              0.326      0.118   2.767    0.006
## noun_type_training.ct    0.110      0.237   0.466    0.641
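For the main study, a rough brms analogue of this model would look as follows (a sketch only, using the bernoulli family and the normal(0, 1) prior described in “Note on data analyses”):

a.brm <- brm(attested_unattested ~ noun_type_training.ct + (noun_type_training.ct | pt_code),
             family = bernoulli(),
             prior = set_prior("normal(0, 1)", class = "b"),
             data = d_attested_unattested_prod)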

8. Predictions 3a & 3b: Effect of preemption vs. entrenchment on children’s test productions and ratings of witnessed/unwitnessed forms

Planned frequentist analyses for main child study

  • Judgment ratings for restricted nouns [key analyses]: We will use Bayesian (linear) mixed effect models. The dependent variable in these analyses is the mean rating participants give to the restricted nouns. There are three predictors: condition (i.e., entrenchment versus preemption); a factor reflecting whether a trial has been attested with that determiner during training (attested) or not (unattested); and the control factor noun type training (plural1-only vs. plural2-only).

  • Production performance for restricted nouns [exploratory analyses]: We will use Bayesian (logistic) mixed effect models. The dependent variable in these analyses is the proportion of time children produce attested over unattested sentences in response to the two restricted nouns. There are two predictors: condition (i.e., entrenchment versus preemption); and the control factor noun type training (plural1-only vs. plural2-only).

Planned Bayes Factor analyses for main child study

Judgment ratings: Do children prefer attested over unattested sentences for nouns that were restricted to one determiner in the training input more strongly in the preemption condition relative to the entrenchment condition?

To assess the effect of preemption vs. entrenchment on ratings for attested vs. unattested nouns, we need:

  1. Summary of data: mean and SE for the interaction between condition and the variable capturing if a sentence has been attested during training (attested_unattested.ct) from bayesian lmes.

  2. Value to inform H1: mean of theory = 0; roughly expected maximum rating difference between attested and unattested nouns (given the effect sizes specified under Prediction 1a and Prediction 2a for preemption and entrenchment, respectively): 1 - 0.5 = 0.5. As outlined in “Note on data analyses”, the SD will be set to half of this max value, i.e., SD = 0.25.
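For illustration, the planned call to the Bf helper (the obtained value and SE are placeholders for the posterior mean and SE of the condition × attestedness interaction from the brms model):

Bf(sd = 0.15, obtained = 0.3, uniform = 0,
   meanoftheory = 0, sdtheory = 0.25, tail = 1)  # half-normal H1, SD = half the 0.5 maximum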

Production data: In addition to the key analyses on children’s judgment ratings, we will also assess children’s productions, although it is somewhat unclear how the different possible unwitnessed responses (e.g., producing nothing/other/the singular determiner for entrenchment vs. producing nothing/other/the unattested plural/the singular determiner for preemption) map onto the predictions of entrenchment vs. preemption. We also lack a theoretically informed estimate (or an estimate of a plausible maximum) regarding the production difference between attested and unattested sentences in children in the preemption condition. Given the exploratory nature of these analyses, we will not carry out BF analyses.

9. Other predictions regarding control/manipulation-check test trials

#exclude participant 8 who did not learn the semantics
productiondata1.df <- read.csv("production.csv", header=TRUE)
productiondata1.df <- subset(productiondata1.df, pt_code != "ch8")

#code excluded (det_other/none)

productiondata1.df$det_excluded <- 0
productiondata1.df$det_excluded[productiondata1.df$det_lenient_adapted == "other"] <- 1

productiondata1.df$det_excluded[productiondata1.df$det_lenient_adapted == "none"] <- 1

#code det_singular
productiondata1.df$det_singular <- 0
productiondata1.df$det_singular[productiondata1.df$det_lenient_adapted == "det singular"] <- 1

#code det_plural1
productiondata1.df$det_plural1 <- 0
productiondata1.df$det_plural1[productiondata1.df$det_lenient_adapted == "det plural1"] <- 1                                      

#code det_plural2
productiondata1.df$det_plural2 <- 0
productiondata1.df$det_plural2[productiondata1.df$det_lenient_adapted == "det plural2"] <- 1                                     

#turn long format  
productiondata1.long.df <- melt(productiondata1.df, id.vars=c("pt_code", "trial_code", "noun_type_training3", "noun_type_test"),
                                       measure.vars=c("det_excluded", "det_singular","det_plural1","det_plural2"), variable.name="det_produced", value.name="measurement")

aggregated.production.LONG.df = aggregate(measurement ~ det_produced + noun_type_training3 + noun_type_test, productiondata1.long.df, FUN=mean)

round(with(productiondata1.long.df, tapply(measurement, list(det_produced, noun_type_test, noun_type_training3), mean, na.rm=T)),4)
## , , alternating
## 
##              plural singular
## det_excluded 0.0147   0.0000
## det_singular 0.0000   0.8235
## det_plural1  0.5441   0.0882
## det_plural2  0.4412   0.0882
## 
## , , novel1
## 
##              plural singular
## det_excluded 0.0000   0.0577
## det_singular 0.1731   0.8077
## det_plural1  0.3462   0.0385
## det_plural2  0.4808   0.0962
## 
## , , plural_form_unwittnessed
## 
##              plural singular
## det_excluded 0.0329   0.0197
## det_singular 0.1579   0.9737
## det_plural1  0.5066   0.0000
## det_plural2  0.3026   0.0066

The descriptives above suggest that children happily produce the singular determiner with singular scenes for all three trained nouns and the novel noun; they also produce both plural determiners for the alternating noun and the novel noun (as well as the restricted nouns), although det plural1 and det plural2 productions are somewhat unbalanced. This difference is probably spurious, given that we have two possible assignments for det plural1 and det plural2:

list 1: det plural 1 = tid; det plural 2 = kem

list 2: det plural 1 = bup; det plural 2 = tid

Therefore, these differences are unlikely to reflect one determiner being phonologically easier than the other.