General notes on update

This is an update on the child artificial grammar learning study preregistered at http://rpubs.com/AnnaSamara/429816. The study sought to compare the preemption and entrenchment accounts of morphosyntactic generalization in children and was inspired by an earlier experiment with adults (https://rpubs.com/AnnaSamara/458000).

A key difference between the planned experiment with children and the adult study concerns the language design: the adult experiment featured restrictions between verbs and particles denoting different argument structure constructions (e.g., transitive vs. intransitive inchoative), whereas the child-appropriate experiment featured onomatopoeic nouns (e.g., mouse = squeako) paired with singular and plural noun determiners (i.e., squeako kem = one mouse vs. squeako gos = 12/13 mice). (The rationale for this modification is outlined at http://rpubs.com/AnnaSamara/429816).

Beyond this critical design difference, some further (exploratory) changes were also introduced to the adapted child language. For example, unlike in the adult study, a subset of singular test trials (not contributing to analysis) was included in the preemption condition, and there were two possible markers of plurality.

Analyses over preliminary data collected in 2018-2019 suggest that these additional modifications reduced item-based power in our tests. We have therefore decided to abort the study and to preregister a new onomatopoeic noun study that is appropriate for use with child participants but more closely replicates the design of the experiment conducted with adults, preregistered at https://rpubs.com/AnnaSamara/458000.

Changes will be implemented in future child studies as of 10/10/2019.

Design of new onomatopoeic noun study

We will run a design closely matched to the adult verb study preregistered at https://rpubs.com/AnnaSamara/458000.

All aspects of training will be identical except that:

The verbs chila, coomo, panjol, roosa, and tombat will be replaced with the onomatopoeic nouns squeako, oinko, moo-o, purro, and woofo.
“Kem” (denoting a verb in the causative construction in one counterbalanced version of the verb experiment) will now indicate singularity, and “gos” (denoting a verb in the intransitive inchoative construction in one counterbalanced version of the verb experiment) will indicate plurality.

All test phase aspects of the design will be also identical to https://rpubs.com/AnnaSamara/458000 except that:

Grammaticality judgments in entrenchment will only feature semantically correct trials (n=32). That is, we will remove and replace trials where participants see multiple animals and hear singular determiners (and vice versa), which were previously in the trial set but were not contributing to the analyses. This change increases item-based power and balances out the number of trials contributing to test analysis in the preemption and entrenchment conditions.

Frequentist data analyses

We will take the same approach as detailed at http://rpubs.com/AnnaSamara/429816

BF analyses

We will take the same approach as detailed at http://rpubs.com/AnnaSamara/429816. Priors for particular hypotheses are exemplified in the pilot analyses below.

Notes on pilot data

The models of the pilot data presented below serve to exemplify the approach to be taken with the current data. Pilot data were collected from 17 Year 1 children (native English speakers) tested on the same version of the experiment.

Notes on criteria for participant and trial exclusion

Identical to http://rpubs.com/AnnaSamara/429816

Notes on criteria for sample size determination

Identical to http://rpubs.com/AnnaSamara/429816

Load packages and helper functions

Packages

library(akima)
library(brms)
library(compute.es)
library(cowplot)
library(doBy)
library(ggplot2)
library(knitr)
library(languageR)
library(lattice)
library(lme4)
library(multcomp)
library(nlme)
library(pastecs)
library(plotrix)
library(plyr)
library(psych)
library(Rcpp)
library(reshape2)

theme_set(theme_bw())

Helper functions

SummarySE

This function can be found on the website “Cookbook for R”.

http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions

It summarizes data, giving count, mean, standard deviation, standard error of the mean, and confidence intervals (default 95%).

data: a data frame.

measurevar: the name of a column that contains the variable to be summarized

groupvars: a vector containing names of columns that contain grouping variables

na.rm: a boolean that indicates whether to ignore NA’s

conf.interval: the percent range of the confidence interval (default is 95%)

summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                      conf.interval=.95, .drop=TRUE) {
    require(plyr)

    # New version of length which can handle NA's: if na.rm==T, don't count them
    length2 <- function (x, na.rm=FALSE) {
        if (na.rm) sum(!is.na(x))
        else       length(x)
    }

    # This does the summary. For each group's data frame, return a vector with
    # N, mean, and sd
    datac <- ddply(data, groupvars, .drop=.drop,
      .fun = function(xx, col) {
        c(N    = length2(xx[[col]], na.rm=na.rm),
          mean = mean   (xx[[col]], na.rm=na.rm),
          sd   = sd     (xx[[col]], na.rm=na.rm)
        )
      },
      measurevar
    )

    # Rename the "mean" column    
    datac <- rename(datac, c("mean" = measurevar))

    datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean

    # Confidence interval multiplier for standard error
    # Calculate t-statistic for confidence interval: 
    # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
    ciMult <- qt(conf.interval/2 + .5, datac$N-1)
    datac$ci <- datac$se * ciMult

    return(datac)
}
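
As a rough illustration of what summarySE computes for a single group, the following base-R sketch reproduces the same quantities by hand. The vector `scores` holds made-up toy values (not pilot data), and the variable names are chosen only for this example.

```r
# Toy data (hypothetical values, for illustration only)
scores <- c(4, 5, 3, 5, 4, 2, 5, 4)

n  <- length(scores)   # count
m  <- mean(scores)     # mean
s  <- sd(scores)       # standard deviation
se <- s / sqrt(n)      # standard error of the mean

# Half-width of the two-sided 95% CI: t quantile at .975 with n - 1 df,
# mirroring qt(conf.interval/2 + .5, N - 1) in summarySE
ci <- se * qt(0.95 / 2 + 0.5, df = n - 1)

round(c(N = n, mean = m, sd = s, se = se, ci = ci), 3)
```

The `ci` column returned by summarySE is therefore a half-width: the interval is the mean plus or minus `ci`.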

SummarySEwithin

This function can be found on the website “Cookbook for R”.

http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions

From the website:

It summarizes data, handling within-subjects variables by removing inter-subject variability. It will still work if there are no within-S variables. It gives count, un-normed mean, normed mean (with same between-group mean), standard deviation, standard error of the mean, and confidence intervals. If there are within-subject variables, calculate adjusted values using method from Morey (2008).

data: a data frame.

measurevar: the name of a column that contains the variable to be summarized

betweenvars: a vector containing names of columns that are between-subjects variables

withinvars: a vector containing names of columns that are within-subjects variables

idvar: the name of a column that identifies each subject (or matched subjects)

na.rm: a boolean that indicates whether to ignore NA’s

conf.interval: the percent range of the confidence interval (default is 95%)

summarySEwithin <- function(data=NULL, measurevar, betweenvars=NULL, withinvars=NULL,
                            idvar=NULL, na.rm=FALSE, conf.interval=.95, .drop=TRUE) {

  # Ensure that the betweenvars and withinvars are factors
  factorvars <- vapply(data[, c(betweenvars, withinvars), drop=FALSE],
    FUN=is.factor, FUN.VALUE=logical(1))

  if (!all(factorvars)) {
    nonfactorvars <- names(factorvars)[!factorvars]
    message("Automatically converting the following non-factors to factors: ",
            paste(nonfactorvars, collapse = ", "))
    data[nonfactorvars] <- lapply(data[nonfactorvars], factor)
  }

  # Get the means from the un-normed data
  datac <- summarySE(data, measurevar, groupvars=c(betweenvars, withinvars),
                     na.rm=na.rm, conf.interval=conf.interval, .drop=.drop)

  # Drop all the unused columns (these will be calculated with normed data)
  datac$sd <- NULL
  datac$se <- NULL
  datac$ci <- NULL

  # Norm each subject's data
  ndata <- normDataWithin(data, idvar, measurevar, betweenvars, na.rm, .drop=.drop)

  # This is the name of the new column
  measurevar_n <- paste(measurevar, "_norm", sep="")

  # Collapse the normed data - now we can treat between and within vars the same
  ndatac <- summarySE(ndata, measurevar_n, groupvars=c(betweenvars, withinvars),
                      na.rm=na.rm, conf.interval=conf.interval, .drop=.drop)

  # Apply correction from Morey (2008) to the standard error and confidence interval
  #  Get the product of the number of conditions of within-S variables
  nWithinGroups    <- prod(vapply(ndatac[,withinvars, drop=FALSE], FUN=nlevels,
                           FUN.VALUE=numeric(1)))
  correctionFactor <- sqrt( nWithinGroups / (nWithinGroups-1) )

  # Apply the correction factor
  ndatac$sd <- ndatac$sd * correctionFactor
  ndatac$se <- ndatac$se * correctionFactor
  ndatac$ci <- ndatac$ci * correctionFactor

  # Combine the un-normed means with the normed results
  merge(datac, ndatac)
}
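
The Morey (2008) correction applied above multiplies the normed sd, se, and ci by sqrt(k / (k - 1)), where k is the product of the numbers of levels of the within-subject variables. A minimal sketch of that arithmetic follows; the level counts and the normed SE are assumed values for a hypothetical design, not those of the present study.

```r
# Hypothetical design: two within-subject factors with 2 and 3 levels
n_levels <- c(2, 3)
k <- prod(n_levels)                     # number of within-subject cells

# Correction factor, as computed in summarySEwithin
correction_factor <- sqrt(k / (k - 1))

normed_se    <- 0.20                    # assumed SE from the normed data
corrected_se <- normed_se * correction_factor

round(c(k = k, factor = correction_factor, corrected_se = corrected_se), 3)
```

The factor is always greater than 1 and shrinks towards 1 as the number of within-subject cells grows, so the correction matters most for designs with few conditions.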

normDataWithin

This function is used by the SummarySEWithin function above. It can be found on the website “Cookbook for R”.

http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions

From that website:

Norms the data within specified groups in a data frame; it normalizes each subject (identified by idvar) so that they have the same mean, within each group specified by betweenvars.

data: a data frame

idvar: the name of a column that identifies each subject (or matched subjects)

measurevar: the name of a column that contains the variable to be summarized

betweenvars: a vector containing names of columns that are between-subjects variables

na.rm: a boolean that indicates whether to ignore NA’s

normDataWithin <- function(data=NULL, idvar, measurevar, betweenvars=NULL,
              na.rm=FALSE, .drop=TRUE) {
  #library(plyr)
  # Measure var on left, idvar + between vars on right of formula.
  data.subjMean <- ddply(data, c(idvar, betweenvars), .drop=.drop,
   .fun = function(xx, col, na.rm) {
    c(subjMean = mean(xx[,col], na.rm=na.rm))
   },
   measurevar,
   na.rm
  )
  # Put the subject means with original data
  data <- merge(data, data.subjMean)
  # Get the normalized data in a new column
  measureNormedVar <- paste(measurevar, "_norm", sep="")
  data[,measureNormedVar] <- data[,measurevar] - data[,"subjMean"] +
                mean(data[,measurevar], na.rm=na.rm)
  # Remove this subject mean column
  data$subjMean <- NULL
  return(data)
}
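
The norming step can be sketched in base R without plyr. The example below assumes a design with no between-subjects variables; the data frame `toy` and its column names are hypothetical and exist only for this illustration.

```r
# Toy long-format data: 3 subjects x 2 conditions (hypothetical values)
toy <- data.frame(
  id     = rep(c("s1", "s2", "s3"), each = 2),
  cond   = rep(c("a", "b"), times = 3),
  rating = c(2, 4, 3, 5, 5, 7)
)

# Subject means, aligned with the rows of toy (ave() defaults to mean)
subj_mean  <- ave(toy$rating, toy$id)
grand_mean <- mean(toy$rating)

# Remove between-subject variability but keep the overall level
toy$rating_norm <- toy$rating - subj_mean + grand_mean

# After norming, every subject has the same mean (the grand mean)
tapply(toy$rating_norm, toy$id, mean)
```

Differences between conditions within a subject are untouched by this step; only each subject's overall offset is removed.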

myCenter

This function outputs the centered values of a variable, which can be a numeric variable, a factor, or a data frame. It was taken from Florian Jaeger's blog https://hlplab.wordpress.com/2009/04/27/centering-several-variables/.

From his blog:

-If the input is a numeric variable, the output is the centered variable.

-If the input is a factor, the output is a numeric variable with centered factor level values. That is, the factor’s levels are converted into numerical values in their inherent order (if not specified otherwise, R defaults to alphanumerical order). More specifically, this centers any binary factor so that the value below 0 will be the 1st level of the original factor, and the value above 0 will be the 2nd level.

-If the input is a data frame or matrix, the output is a new matrix of the same dimension and with the centered values and column names that correspond to the colnames() of the input preceded by “c” (e.g. “Variable1” will be “cVariable1”).

myCenter= function(x) {
  if (is.numeric(x)) { return(x - mean(x, na.rm=T)) }
    if (is.factor(x)) {
        x= as.numeric(x)
        return(x - mean(x, na.rm=T))
    }
    if (is.data.frame(x) || is.matrix(x)) {
        m= matrix(nrow=nrow(x), ncol=ncol(x))
        colnames(m)= paste("c", colnames(x), sep="")
    
        for (i in 1:ncol(x)) {
        
            m[,i]= myCenter(x[,i])
        }
        return(as.data.frame(m))
    }
}
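
To make the factor case concrete, here is a small base-R sketch of the same computation on a balanced two-level factor. The vector `number` is toy data, and the sketch repeats the arithmetic inline rather than calling myCenter.

```r
# Balanced binary factor: levels are ordered alphabetically by default,
# so "plural" is level 1 and "singular" is level 2
number <- factor(c("plural", "singular", "plural", "singular"))

# Convert levels to 1/2, then subtract the mean (1.5 for balanced data)
number_ct <- as.numeric(number) - mean(as.numeric(number))
number_ct
# -0.5  0.5 -0.5  0.5
```

With unbalanced data the two values are no longer exactly -0.5 and 0.5, but the centered variable still has a mean of zero.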

lizCenter

This function provides a wrapper around myCenter allowing you to center a specific list of variables from a dataframe. The input is a dataframe (x) and a list of the names of the variables which you wish to center (listfname). The output is a copy of the dataframe with a column (numeric) added for each of the centered variables, each labelled with its previous name with “.ct” appended. For example, if x is a dataframe with columns “a” and “b”, lizCenter(x, list(“a”, “b”)) will return a dataframe with two additional columns, a.ct and b.ct, which are numeric, centered codings of the corresponding variables.

lizCenter= function(x, listfname) 
{
    for (i in 1:length(listfname)) 
    {
        fname = as.character(listfname[i])
        x[paste(fname,".ct", sep="")] = myCenter(x[fname])
    }
        
    return(x)
}

Bf

This function is equivalent to the Dienes (2008) calculator which can be found here: http://www.lifesci.sussex.ac.uk/home/Zoltan_Dienes/inference/Bayes.htm.

The code was provided by Baguley and Kaye (2010) and can be found here: http://www.academia.edu/427288/Review_of_Understanding_psychology_as_a_science_An_introduction_to_scientific_and_statistical_inference

Bf<-function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0,sdtheory=1, tail=2){
 area <- 0
 if(identical(uniform, 1)){
 theta <- lower
 range <- upper - lower
 incr <- range / 2000
 for (A in -1000:1000){
   theta <- theta + incr
   dist_theta <- 1 / range
   height <- dist_theta * dnorm(obtained, theta, sd)
   area <- area + height * incr
 }
 }else
  {theta <- meanoftheory - 5 * sdtheory
  incr <- sdtheory / 200
  for (A in -1000:1000){
   theta <- theta + incr
   dist_theta <- dnorm(theta, meanoftheory, sdtheory)
   if(identical(tail, 1)){
    if (theta <= 0){
     dist_theta <- 0
    } else {
     dist_theta <- dist_theta * 2
    }
   }
   height <- dist_theta * dnorm(obtained, theta, sd)
   area <- area + height * incr
  }
 }
 LikelihoodTheory <- area
 Likelihoodnull <- dnorm(obtained, 0, sd)
 BayesFactor <- LikelihoodTheory / Likelihoodnull
 ret <- list("LikelihoodTheory" = LikelihoodTheory,"Likelihoodnull" = Likelihoodnull, "BayesFactor" = BayesFactor)
 ret
}
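
The quantity that Bf approximates on a grid can also be written as a single numerical integral. The sketch below does this for the half-normal case only (uniform = 0, tail = 1, meanoftheory = 0), using base R's integrate(). The helper name `bf_halfnormal` and its argument names are hypothetical, introduced for this illustration; it is not a replacement for the Bf code above.

```r
# Half-normal H1 (mode 0, scale sd_theory) vs. a point null at 0.
# bf_halfnormal is a hypothetical helper, not part of the original code.
bf_halfnormal <- function(se, obtained, sd_theory) {
  # Prior density on theta > 0 (normal folded at zero) times the likelihood
  integrand <- function(theta) {
    2 * dnorm(theta, 0, sd_theory) * dnorm(obtained, theta, se)
  }
  # Integrate up to 5 prior SDs, roughly the range covered by the grid in Bf
  likelihood_theory <- integrate(integrand, lower = 0,
                                 upper = 5 * sd_theory)$value
  likelihood_null <- dnorm(obtained, 0, se)
  likelihood_theory / likelihood_null
}

# Values from the first planned Bayes factor analysis in this document
bf <- bf_halfnormal(se = 0.188, obtained = 0.881, sd_theory = 0.346 / 2)
bf
```

With these inputs the result should be close to what Bf returns for the same values (about 226), up to small differences between the two integration schemes.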

Bf_powercalc

This works with the Bf function above. It requires the same values as that function (i.e. the obtained mean and SE for the current sample, a value for the predicted mean, which is set to be sdtheory (with meanoftheory=0), and the current number of participants N). However, rather than return a BF for the current sample, it works out what the BF would be for a range of different subject numbers (assuming that the SE scales with sqrt(N)).

Bf_powercalc<-function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0, sdtheory=1, tail=2, N, min, max)
{
 
 x = c(0)
 y = c(0)
 
 for(newN in min : max)
 {
 B = as.numeric(Bf(sd = sd*sqrt(N/newN), obtained, uniform, lower, upper, meanoftheory, sdtheory, tail)[3])
 x= append(x,newN) 
 y= append(y,B)
 output = cbind(x,y)
 
 } 
 output = output[-1,] 
 return(output) 
}
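
The scaling assumption behind Bf_powercalc can be checked by hand. The sketch below applies the same formula, sd * sqrt(N / newN), to the SE and N used in the power calculation in this document; the candidate sample sizes in `n_new` are illustrative choices.

```r
se_pilot <- 0.14807                # SE used in the pilot power calculation
n_pilot  <- 8                      # N passed to Bf_powercalc in this document
n_new    <- c(50, 100, 150, 200)   # candidate sample sizes (illustrative)

# SE is assumed to shrink with the square root of the sample size
se_new <- se_pilot * sqrt(n_pilot / n_new)

data.frame(n_new, se_new = round(se_new, 4))
```

Because the obtained effect is held fixed while the SE shrinks, the projected Bayes factor moves away from 1 as N grows; the projection is only as good as the pilot estimate it starts from.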

lmedrop

Given an lmer model (model) and one of the coefficients, this returns the p value for that coefficient using model comparison (i.e., comparing identical models with and without that term). It can be used to get p-values when using lmer rather than glmer and dealing with continuous (rather than binomial) dependent variables.

lmedrop<-function(model, term) {
  model.dropped<-update(model,eval(paste(".~.-",term)));
  anova(model.dropped,model) }

Planned Data analysis exemplified with pilot data

The pilot data below are used to exemplify the statistical approach to be taken for three key data analyses. Other exploratory analyses (for example, as detailed at http://rpubs.com/AnnaSamara/429816) may also be carried out.

#The dataframes productiondata.df, judgmentdata.df, and baseline.df contain pilot data from 17 children's performance on the production, grammaticality judgment and baseline tasks, respectively. 

productiondata.df <- read.csv("production.csv", header=TRUE)
judgmentdata.df <- read.csv("grammaticality.csv", header=TRUE) 
baseline.df <- read.csv("baseline.csv", header=TRUE)

Data exclusion

Participants’ performance on the baseline task

round(with(baseline.df, tapply(Correct, list(pt_code), mean, na.rm=T)),2)

##  r2_ch1 r2_ch11 r2_ch1P  r2_ch2 r2_ch2P  r2_ch3 r2_ch3P  r2_ch4 r2_ch4P 
##    0.83    1.00    1.00    0.67    1.00    1.00    1.00    1.00    0.67 
##  r2_ch5 r2_ch5P  r2_ch6 r2_ch7P  r2_ch8 r2_ch8P  r2_ch9 r2_ch9P 
##    1.00    1.00    1.00    1.00    0.83    1.00    1.00    0.83

No participants are excluded from further analyses in either condition

Trial exclusion (entrenchment condition)

#create appropriate dataset for production in the entrenchment condition
productiondata_entrenchment.df = subset(productiondata.df, condition == "entrenchment")

#code excluded (det_other/none)
productiondata_entrenchment.df$det_excluded <- 0
productiondata_entrenchment.df$det_excluded[productiondata_entrenchment.df$det_lenient_adapted == "other"] <- 1
productiondata_entrenchment.df$det_excluded[productiondata_entrenchment.df$det_lenient_adapted == "none"] <- 1

#code det included
productiondata_entrenchment.df$det_included <- 0
productiondata_entrenchment.df$det_included[productiondata_entrenchment.df$det_excluded==0] <- 1  

#turn long format  
productiondata_entrenchment.long.df <- melt(productiondata_entrenchment.df, id.vars=c("pt_code", "trial_code", "noun_type_training", "noun_type_test", "old_new"),
                                       measure.vars=c("det_excluded", "det_included"), variable.name="det_produced", value.name="measurement"
)

round(with(productiondata_entrenchment.long.df, tapply(measurement, list(det_produced), mean, na.rm=T)),2)

## det_excluded det_included 
##         0.01         0.99

1% of data excluded for entrenchment

Trial exclusion (preemption condition)

#create appropriate dataset for production in the preemption condition
productiondata_preemption.df = subset(productiondata.df, condition == "preemption")

#code excluded (det_other/none)
productiondata_preemption.df$det_excluded <- 0
productiondata_preemption.df$det_excluded[productiondata_preemption.df$det_lenient_adapted == "other"] <- 1
productiondata_preemption.df$det_excluded[productiondata_preemption.df$det_lenient_adapted == "none"] <- 1

#code det included
productiondata_preemption.df$det_included <- 0
productiondata_preemption.df$det_included[productiondata_preemption.df$det_excluded==0] <- 1  

#turn long format  
productiondata_preemption.long.df <- melt(productiondata_preemption.df, id.vars=c("pt_code", "trial_code", "noun_type_training", "noun_type_test", "old_new"),
                                       measure.vars=c("det_excluded", "det_included"), variable.name="det_produced", value.name="measurement"
)

round(with(productiondata_preemption.long.df, tapply(measurement, list(det_produced), mean, na.rm=T)),2)

## det_excluded det_included 
##         0.02         0.98

2% of data excluded for preemption

1. Baseline check: Have children in entrenchment picked up on the difference in meaning between the singular and plural determiner?

Exclusion of individual participants who perform below 60% in producing semantically appropriate responses for the alternating noun

(Note: In the main analyses, we will also assess the semantic appropriateness of children’s test-phase productions for the novel noun as well as their judgment score differences in semantically appropriate vs. inappropriate trials for the alternating and novel noun; however, we will not exclude further participants on the basis of these results. See http://rpubs.com/AnnaSamara/429816)

#create appropriate dataset for production in the entrenchment condition
productiondata_entrenchment_alternating.df = subset(productiondata_entrenchment.df, noun_type_training == "alternating")

productiondata_entrenchment_alternating.df = subset(productiondata_entrenchment_alternating.df, det_included == 1)

summarySEwithin(productiondata_entrenchment_alternating.df, measurevar="semantically_correct", betweenvars="pt_code", withinvars="noun_type_training", idvar="pt_code", na.rm=FALSE, conf.interval=.95)

##   pt_code noun_type_training N semantically_correct
## 1  r2_ch1        alternating 8                1.000
## 2 r2_ch11        alternating 8                1.000
## 3  r2_ch2        alternating 8                0.375
## 4  r2_ch3        alternating 8                1.000
## 5  r2_ch4        alternating 8                1.000
## 6  r2_ch5        alternating 8                1.000
## 7  r2_ch6        alternating 8                0.375
## 8  r2_ch8        alternating 8                1.000
## 9  r2_ch9        alternating 8                1.000
##   semantically_correct_norm        sd        se        ci
## 1                 0.8611111 0.0000000 0.0000000 0.0000000
## 2                 0.8611111 0.0000000 0.0000000 0.0000000
## 3                 0.8611111 0.5669467 0.2004459 0.4739793
## 4                 0.8611111 0.0000000 0.0000000 0.0000000
## 5                 0.8611111 0.0000000 0.0000000 0.0000000
## 6                 0.8611111 0.0000000 0.0000000 0.0000000
## 7                 0.8611111 0.5669467 0.2004459 0.4739793
## 8                 0.8611111 0.0000000 0.0000000 0.0000000
## 9                 0.8611111 0.0000000 0.0000000 0.0000000

Participants r2_ch2 and r2_ch6 will be excluded from further analysis
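
The 60% criterion can also be applied programmatically rather than by reading the table. The following base-R sketch shows one way to do so on toy data; the participant codes and responses are hypothetical, and only the column names follow the pilot data frame.

```r
# Toy trial-level data (hypothetical participants and responses)
toy <- data.frame(
  pt_code = rep(c("p1", "p2", "p3"), each = 8),
  semantically_correct = c(rep(1, 8),                # p1: 8/8 correct
                           1, 1, 1, 0, 0, 0, 0, 0,   # p2: 3/8 correct
                           1, 1, 1, 1, 1, 1, 0, 0)   # p3: 6/8 correct
)

# Proportion of semantically correct productions per participant
accuracy <- tapply(toy$semantically_correct, toy$pt_code, mean)

# Flag participants below the 60% criterion
to_exclude <- names(accuracy)[accuracy < 0.60]
to_exclude
# "p2"

# Keep only the remaining participants
toy_kept <- subset(toy, !(pt_code %in% to_exclude))
```

Deriving the exclusion list from the data in this way avoids hard-coding participant codes in later subsetting steps.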

Planned frequentist analyses

#Exclude participants who are at chance in semantics:
productiondata_entrenchment_alternating.df = subset(productiondata_entrenchment_alternating.df, pt_code != "r2_ch2")

productiondata_entrenchment_alternating.df = subset(productiondata_entrenchment_alternating.df, pt_code != "r2_ch6")

#Center variables of interest using the lizCenter function:
d_prod_alt = lizCenter(productiondata_entrenchment_alternating.df, list("noun_type_test"))

#Calculate average percentage of semantically correct performance for the alternating noun:
d_prod_alt_aggregated = summarySEwithin(d_prod_alt, measurevar="semantically_correct", withinvars= "noun_type_test", idvar="pt_code", na.rm=FALSE, conf.interval=.95) 

round(mean(d_prod_alt_aggregated$semantically_correct),2)

## [1] 1

#Calculate average percentage of semantically correct performance for the alternating noun separately for singular and plural scenes:
round(tapply(d_prod_alt_aggregated$semantically_correct, d_prod_alt_aggregated$noun_type_test, mean),2)

##   plural singular 
##        1        1

No difference in producing singulars vs. plurals: participants contributing to the analyses are 100% semantically correct in the pilot data.

2 Prediction 2a (entrenchment condition): Effect of entrenchment on children’s ratings of witnessed/unwitnessed forms

Planned frequentist analyses

We will carry out 2 sets of analysis to investigate the effect of entrenchment on children’s ratings of witnessed/unwitnessed forms.

(1) The first set of analyses is identical to those reported at http://rpubs.com/AnnaSamara/429816. The dependent variable in these analyses is the mean rating participants give to scenes featuring the restricted nouns. There are two predictors: a factor reflecting whether a form was witnessed with that determiner during training (attested) or not (unattested); and the control factor noun type at test (singular vs. plural).
(2) [New KEY analyses] A stricter test of the effect of entrenchment will involve comparing children’s ratings of unwitnessed forms for restricted nouns (as above) against unwitnessed novel forms. There are two predictors: a factor reflecting whether a trial is unwitnessed-restricted vs. unwitnessed-novel; and the control factor noun type at test (singular vs. plural).

# (1)
attested_unattested_entrenchment1.df = subset(judgmentdata.df, condition == "entrenchment")

attested_unattested_entrenchment1.df = subset(attested_unattested_entrenchment1.df, pt_code != "r2_ch2")

attested_unattested_entrenchment1.df = subset(attested_unattested_entrenchment1.df, pt_code != "r2_ch6")

attested_unattested_entrenchment.df = subset(attested_unattested_entrenchment1.df, restricted_nouns == "yes")

#Center variables of interest using the lizCenter function:
d_attested_unattested_ent = lizCenter(attested_unattested_entrenchment.df , list("noun_type_test","attested_unattested"))

a = lmer(rating_original ~ noun_type_test.ct * attested_unattested.ct + (1|pt_code), data = d_attested_unattested_ent)

round(summary(a)$coefficients,3)

##                                          Estimate Std. Error t value
## (Intercept)                                 4.512      0.313  14.411
## noun_type_test.ct                           0.071      0.188   0.379
## attested_unattested.ct                     -0.881      0.188  -4.676
## noun_type_test.ct:attested_unattested.ct   -0.048      0.377  -0.126

lmedrop(a,"attested_unattested.ct")

## Data: d_attested_unattested_ent
## Models:
## model.dropped: rating_original ~ noun_type_test.ct + (1 | pt_code) + noun_type_test.ct:attested_unattested.ct
## model: rating_original ~ noun_type_test.ct * attested_unattested.ct + 
## model:     (1 | pt_code)
##               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## model.dropped  5 256.32 268.47 -123.16   246.32                         
## model          6 238.38 252.97 -113.19   226.38 19.931      1   8.03e-06
##                  
## model.dropped    
## model         ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note 1: Main data will be analysed using Bayesian (linear) mixed effect models.

Note 2: Random slopes for attested_unattested.ct and noun_type_test.ct have been removed to achieve convergence. The maximal random effect structure will be used in the main data analyses.

# (2)

unattested_novel_entrenchment.df = subset(attested_unattested_entrenchment1.df, noun_type_training3 != "alternating")

unattested_novel_entrenchment.df = subset(unattested_novel_entrenchment.df, attested_unattested == "unattested" | noun_type_training3 == "novel2")

round(tapply(unattested_novel_entrenchment.df$rating_original, unattested_novel_entrenchment.df$restricted_nouns, mean),3)

##    no   yes 
## 4.143 4.071

#no - this is the novel: 4.143
#yes - this is the unattested: 4.071

#Center variables of interest using the lizCenter function:
unattested_novel_entrenchment.df = lizCenter(unattested_novel_entrenchment.df, list("restricted_nouns", "noun_type_test"))

a = lmer(rating_original ~ restricted_nouns.ct * noun_type_test.ct + (1|pt_code), data = unattested_novel_entrenchment.df)
summary(a)

## Linear mixed model fit by REML ['lmerMod']
## Formula: rating_original ~ restricted_nouns.ct * noun_type_test.ct + (1 |  
##     pt_code)
##    Data: unattested_novel_entrenchment.df
## 
## REML criterion at convergence: 200.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5853 -0.1673  0.1134  0.2568  2.1029 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  pt_code  (Intercept) 1.9262   1.3879  
##  Residual             0.4604   0.6785  
## Number of obs: 84, groups:  pt_code, 7
## 
## Fixed effects:
##                                       Estimate Std. Error t value
## (Intercept)                            4.10714    0.52977   7.753
## restricted_nouns.ct                   -0.07143    0.14807  -0.482
## noun_type_test.ct                     -0.07143    0.14807  -0.482
## restricted_nouns.ct:noun_type_test.ct  0.23810    0.29614   0.804
## 
## Correlation of Fixed Effects:
##             (Intr) rstr_. nn_t_.
## rstrctd_nn. 0.000               
## nn_typ_tst. 0.000  0.000        
## rstrc_.:__. 0.000  0.000  0.000

Note 1: Main data will be analysed using Bayesian (linear) mixed effect models.

Note 2: Random slopes for restricted_nouns.ct and noun_type_test.ct have been removed to achieve convergence. The maximal random effect structure will be used in the main data analyses.

Planned Bayes Factor analyses for main analyses

Analysis 1

Summary of data for entrenchment condition: mean and SE for the main effect of the “attested_unattested.ct” variable (capturing whether a sentence was attested during training) from Bayesian LMEs in this condition.
Value to inform H1 for entrenchment condition: mean of theory = 0; roughly expected maximum rating difference between attested and unattested sentences from our previous study with adults: 0.346 (n = 39; data collection is ongoing for n = 40). As outlined in “Note on data analyses” at http://rpubs.com/AnnaSamara/429816, the SD will be set to half of this maximum value, i.e., SD = 0.346/2.

[New KEY analyses]

Summary of data for entrenchment condition: mean and SE for the main effect of the variable capturing whether a sentence was “unwitnessed restricted” or “unwitnessed novel” from Bayesian LMEs in this condition.
Value to inform H1 for entrenchment condition: mean of theory = 0; roughly expected maximum rating difference: analyses over the adult data suggest that we do not have evidence for H1 in this stricter analysis; thus, we will use the difference between attested and unattested sentences from our previous study with adults (0.346) as a roughly expected maximum rating difference in the analyses here. The SD will be set to half of this maximum value, i.e., SD = 0.346/2.

#(1)
Bf(0.188, 0.881, uniform = 0, meanoftheory = 0, sdtheory = 0.346/2, tail = 1)

## $LikelihoodTheory
## [1] 0.008167542
## 
## $Likelihoodnull
## [1] 3.615408e-05
## 
## $BayesFactor
## [1] 225.9093

# substantial evidence for effect of entrenchment

#(2)
Bf(0.14807, 0.071, uniform = 0, meanoftheory = 0, sdtheory = 0.346/2, tail = 1)

## $LikelihoodTheory
## [1] 2.148154
## 
## $Likelihoodnull
## [1] 2.401684
## 
## $BayesFactor
## [1] 0.8944367

# inconclusive evidence for effect of entrenchment in the stricter (key) analysis

Bf_powercalc(sd = 0.14807, obtained = 0.071, uniform = 0, meanoftheory=0, sdtheory=0.346/2, tail=1, N = 8, min = 100, max = 200)

##          x        y
##   [1,] 100 1.739890
##   [2,] 101 1.758331
##   [3,] 102 1.777013
##   [4,] 103 1.795940
##   [5,] 104 1.815114
##   [6,] 105 1.834539
##   [7,] 106 1.854216
##   [8,] 107 1.874149
##   [9,] 108 1.894342
##  [10,] 109 1.914797
##  [11,] 110 1.935518
##  [12,] 111 1.956507
##  [13,] 112 1.977769
##  [14,] 113 1.999305
##  [15,] 114 2.021121
##  [16,] 115 2.043218
##  [17,] 116 2.065601
##  [18,] 117 2.088273
##  [19,] 118 2.111238
##  [20,] 119 2.134499
##  [21,] 120 2.158060
##  [22,] 121 2.181924
##  [23,] 122 2.206096
##  [24,] 123 2.230579
##  [25,] 124 2.255378
##  [26,] 125 2.280495
##  [27,] 126 2.305935
##  [28,] 127 2.331703
##  [29,] 128 2.357802
##  [30,] 129 2.384236
##  [31,] 130 2.411009
##  [32,] 131 2.438127
##  [33,] 132 2.465593
##  [34,] 133 2.493411
##  [35,] 134 2.521587
##  [36,] 135 2.550124
##  [37,] 136 2.579028
##  [38,] 137 2.608302
##  [39,] 138 2.637952
##  [40,] 139 2.667983
##  [41,] 140 2.698398
##  [42,] 141 2.729204
##  [43,] 142 2.760405
##  [44,] 143 2.792006
##  [45,] 144 2.824012
##  [46,] 145 2.856429
##  [47,] 146 2.889262
##  [48,] 147 2.922516
##  [49,] 148 2.956196
##  [50,] 149 2.990308
##  [51,] 150 3.024858
##  [52,] 151 3.059851
##  [53,] 152 3.095293
##  [54,] 153 3.131189
##  [55,] 154 3.167546
##  [56,] 155 3.204370
##  [57,] 156 3.241666
##  [58,] 157 3.279440
##  [59,] 158 3.317700
##  [60,] 159 3.356450
##  [61,] 160 3.395698
##  [62,] 161 3.435450
##  [63,] 162 3.475712
##  [64,] 163 3.516492
##  [65,] 164 3.557795
##  [66,] 165 3.599628
##  [67,] 166 3.641999
##  [68,] 167 3.684914
##  [69,] 168 3.728381
##  [70,] 169 3.772407
##  [71,] 170 3.816998
##  [72,] 171 3.862163
##  [73,] 172 3.907909
##  [74,] 173 3.954243
##  [75,] 174 4.001173
##  [76,] 175 4.048707
##  [77,] 176 4.096852
##  [78,] 177 4.145617
##  [79,] 178 4.195010
##  [80,] 179 4.245039
##  [81,] 180 4.295712
##  [82,] 181 4.347038
##  [83,] 182 4.399025
##  [84,] 183 4.451682
##  [85,] 184 4.505018
##  [86,] 185 4.559041
##  [87,] 186 4.613761
##  [88,] 187 4.669186
##  [89,] 188 4.725327
##  [90,] 189 4.782192
##  [91,] 190 4.839790
##  [92,] 191 4.898133
##  [93,] 192 4.957228
##  [94,] 193 5.017087
##  [95,] 194 5.077718
##  [96,] 195 5.139133
##  [97,] 196 5.201342
##  [98,] 197 5.264354
##  [99,] 198 5.328182
## [100,] 199 5.392834
## [101,] 200 5.458323

# Power analysis suggests that a difference of 0.071 would give substantial evidence against the null (BF > 3) with 150 participants

2 Prediction 2b (preemption condition): Effect of preemption on children’s ratings of witnessed/unwitnessed forms

# (1)
attested_unattested_preemption1.df = subset(judgmentdata.df, condition == "preemption")

attested_unattested_preemption.df = subset(attested_unattested_preemption1.df, restricted_nouns == "yes")

#Center variables of interest using the lizCenter function:
d_attested_unattested_pre = lizCenter(attested_unattested_preemption.df, list("noun_type_test","attested_unattested"))

a = lmer(rating_original ~ noun_type_test.ct * attested_unattested.ct + (1|pt_code), data = d_attested_unattested_pre)

round(summary(a)$coefficients,3)

##                        Estimate Std. Error t value
## (Intercept)               3.667      0.205  17.857
## attested_unattested.ct   -2.542      0.214 -11.856

lmedrop(a,"attested_unattested.ct")

## Data: d_attested_unattested_pre
## Models:
## model.dropped: rating_original ~ noun_type_test.ct + (1 | pt_code) + noun_type_test.ct:attested_unattested.ct
## model: rating_original ~ noun_type_test.ct * attested_unattested.ct + 
## model:     (1 | pt_code)
##               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## model.dropped  3 380.79 388.48 -187.39   374.79                         
## model          4 298.17 308.43 -145.09   290.17 84.615      1  < 2.2e-16
##                  
## model.dropped    
## model         ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note 1: Main data will be analysed using Bayesian (linear) mixed effect models.

Note 2: Random slopes for attested_unattested.ct and noun_type_test.ct have been removed to achieve convergence. The maximal random effects structure will be used in the main data analyses.

# (2)
unattested_novel_preemption.df = subset(attested_unattested_preemption1.df, noun_type_training3 != "alternating")

unattested_novel_preemption.df = subset(unattested_novel_preemption.df, attested_unattested == "unattested" | noun_type_training3 == "novel2")

round(tapply(unattested_novel_preemption.df$rating_original, unattested_novel_preemption.df$restricted_nouns, mean),3)

##    no   yes 
## 3.438 2.396

#no- this is the novel: 3.438
#yes-this is the unattested: 2.396
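
A minimal sketch of why these two means matter, using simulated ratings rather than the pilot data: when a two-level predictor is coded numerically and centered, and both levels have the same number of observations, its regression slope equals the difference between the two level means. The `center_binary` helper is a hypothetical stand-in written for illustration (it is not the `lizCenter` function used in this document), the 1 to 5 rating scale is an assumption, and a plain `lm` replaces the mixed model to keep the example free of dependencies.

```r
# Hypothetical helper: code a two-level factor as 0/1 and subtract the mean,
# so that a balanced factor becomes -0.5 / +0.5.
center_binary <- function(x) {
  x_num <- as.numeric(factor(x)) - 1
  x_num - mean(x_num)
}

set.seed(1)
# Simulated (made-up) ratings: 48 "no" (novel) and 48 "yes" (unattested) trials.
toy <- data.frame(
  restricted_nouns = rep(c("no", "yes"), each = 48),
  rating = c(sample(1:5, 48, replace = TRUE), sample(1:5, 48, replace = TRUE))
)
toy$restricted_nouns.ct <- center_binary(toy$restricted_nouns)

group_means <- tapply(toy$rating, toy$restricted_nouns, mean)
fit <- lm(rating ~ restricted_nouns.ct, data = toy)

# With a centered, balanced predictor the slope is mean("yes") - mean("no")
# and the intercept is the grand mean.
slope <- unname(coef(fit)["restricted_nouns.ct"])
mean_difference <- unname(group_means["yes"] - group_means["no"])
```

Applied to the pilot means printed above, the same identity gives 2.396 - 3.438 = -1.042.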

#Center variables of interest using the lizCenter function:
unattested_novel_preemption.df = lizCenter(unattested_novel_preemption.df, list("restricted_nouns", "noun_type_test"))

a = lmer(rating_original ~ restricted_nouns.ct * noun_type_test.ct + (1|pt_code), data = unattested_novel_preemption.df)
summary(a)

## Linear mixed model fit by REML ['lmerMod']
## Formula: rating_original ~ restricted_nouns.ct * noun_type_test.ct + (1 |  
##     pt_code)
##    Data: unattested_novel_preemption.df
## 
## REML criterion at convergence: 346.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.5024 -0.7454 -0.2459  0.6921  2.1008 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  pt_code  (Intercept) 0.7679   0.8763  
##  Residual             1.8807   1.3714  
## Number of obs: 96, groups:  pt_code, 8
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)           2.9167     0.3400   8.579
## restricted_nouns.ct  -1.0417     0.2799  -3.721
## 
## Correlation of Fixed Effects:
##             (Intr)
## rstrctd_nn. 0.000 
## fit warnings:
## fixed-effect model matrix is rank deficient so dropping 2 columns / coefficients

Note 1: Main data will be analysed using Bayesian (linear) mixed effect models.

Note 2: Random slopes for restricted_nouns.ct and noun_type_test.ct have been removed to achieve convergence. Maximal random effect structure will be used in main data analyses.
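
The `fit warnings` line in the output above reports a rank-deficient fixed-effect model matrix. One way this can arise (an assumption about the cause, not something confirmed for the pilot data) is that two predictors become perfectly collinear once the data have been subset. The sketch below uses a small made-up data frame and base R only to show how such a case can be detected before fitting; the column names `a.ct` and `b.ct` are placeholders.

```r
# Made-up design: within this subset, b.ct takes exactly the same values as
# a.ct, so b.ct duplicates a.ct and the a.ct:b.ct product is constant.
toy <- data.frame(
  a.ct = rep(c(-0.5, 0.5), each = 4),
  b.ct = rep(c(-0.5, 0.5), each = 4)
)

# Fixed-effect design matrix for a model with both main effects and their
# interaction: intercept, a.ct, b.ct, a.ct:b.ct.
X <- model.matrix(~ a.ct * b.ct, data = toy)

n_columns   <- ncol(X)
matrix_rank <- qr(X)$rank
n_dropped   <- n_columns - matrix_rank  # columns a fitting routine would drop

# Cross-tabulating the two predictors shows the empty cells directly.
cell_counts <- table(toy$a.ct, toy$b.ct)
```

In this toy case two of the four columns are redundant, which mirrors a warning about dropping two coefficients.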

Planned Bayes Factor analyses for main analyses

analysis 1

Summary of data for preemption condition: mean and SE for the main effect of the “attested_unattested.ct” variable (capturing whether a sentence was attested during training) from Bayesian LMEs in this condition.
Value to inform H1 for preemption condition: mean of theory = 0; roughly expected maximum rating difference between attested and unattested sentences from our previous study with adults: 2.516 (n = 38; data collection is ongoing for n = 40). As outlined in “Note on data analyses” at http://rpubs.com/AnnaSamara/429816, the SD will be set to half of this maximum value, i.e., SD = 2.516/2

[New KEY analyses]

Summary of data for preemption condition: mean and SE for the main effect of the variable capturing whether a sentence was unwitnessed restricted or unwitnessed novel, from Bayesian LMEs in this condition.
Value to inform H1 for preemption condition: mean of theory = 0; roughly expected maximum rating difference from our previous study with adults: 0.599 (n = 38; data collection is ongoing for n = 40). As outlined in “Note on data analyses” at http://rpubs.com/AnnaSamara/429816, the SD will be set to half of this maximum value, i.e., SD = 0.599/2
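
For reference, the quantity computed by the `Bf` calls in this section can be written out as follows. This is a sketch that assumes `tail = 1` with `meanoftheory = 0` specifies a half-normal prior on the effect under H1, and that the first two arguments of `Bf` are the standard error and the estimate, as their values in the calls suggest. The symbols $\hat{\beta}$ (estimate), $s$ (its standard error) and $\tau$ (`sdtheory`, i.e. half of the expected maximum difference) are introduced here for illustration only.

```latex
B_{H_1/H_0}
  = \frac{\displaystyle\int_{0}^{\infty}
      \mathcal{N}\!\left(\hat{\beta} \mid \theta,\, s^{2}\right)\,
      2\,\mathcal{N}\!\left(\theta \mid 0,\, \tau^{2}\right)\, d\theta}
         {\mathcal{N}\!\left(\hat{\beta} \mid 0,\, s^{2}\right)},
\qquad
\int_{0}^{\infty}
  \mathcal{N}\!\left(\hat{\beta} \mid \theta,\, s^{2}\right)\,
  2\,\mathcal{N}\!\left(\theta \mid 0,\, \tau^{2}\right)\, d\theta
  = 2\,\mathcal{N}\!\left(\hat{\beta} \mid 0,\, s^{2}+\tau^{2}\right)\,
    \Phi\!\left(\frac{\hat{\beta}\,\tau}{s\sqrt{s^{2}+\tau^{2}}}\right)
```

The numerator is the likelihood of the data averaged over the half-normal prior (reported as `$LikelihoodTheory`), and the denominator is the likelihood under a point null of zero (reported as `$Likelihoodnull`).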

#(1)      
Bf(0.214, 2.542 , uniform = 0, meanoftheory = 0, sdtheory = 2.516/2, tail = 1)

## $LikelihoodTheory
## [1] 0.08597398
## 
## $Likelihoodnull
## [1] 4.278241e-31
## 
## $BayesFactor
## [1] 2.009564e+29

# substantial evidence for effect of preemption

#(2)
Bf(0.2799, 1.0417, uniform = 0, meanoftheory = 0, sdtheory = 0.599/2, tail = 1)

## $LikelihoodTheory
## [1] 0.07683664
## 
## $Likelihoodnull
## [1] 0.00140027
## 
## $BayesFactor
## [1] 54.87273

# substantial evidence for an effect of preemption in the stricter (key) analysis
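
As a cross-check on the two `Bf` calls above, the same Bayes factors can be obtained in closed form, assuming that `tail = 1` with `meanoftheory = 0` corresponds to a half-normal prior under H1 and that the first two arguments are the standard error and the estimate. `half_normal_bf` is a hypothetical name chosen for this sketch; it is not the `Bf` helper used in this document, which may differ in its numerical details.

```r
# Closed-form Bayes factor for a half-normal H1 (mode 0, SD = sd_theory)
# against a point null of 0, given an estimate and its standard error.
# Hypothetical helper for illustration only.
half_normal_bf <- function(se, estimate, sd_theory) {
  total_sd <- sqrt(se^2 + sd_theory^2)
  # Likelihood averaged over the half-normal prior on the effect.
  likelihood_theory <- 2 * dnorm(estimate, mean = 0, sd = total_sd) *
    pnorm(estimate * sd_theory / (se * total_sd))
  # Likelihood of the estimate if the true effect is exactly 0.
  likelihood_null <- dnorm(estimate, mean = 0, sd = se)
  list(
    LikelihoodTheory = likelihood_theory,
    Likelihoodnull   = likelihood_null,
    BayesFactor      = likelihood_theory / likelihood_null
  )
}

# Same inputs as analyses (1) and (2) above.
bf_attested <- half_normal_bf(0.214, 2.542, sd_theory = 2.516 / 2)
bf_key      <- half_normal_bf(0.2799, 1.0417, sd_theory = 0.599 / 2)
```

Small discrepancies from the printed values would be expected if `Bf` evaluates the integral numerically rather than analytically.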

3 Prediction 3: Effect of preemption vs. entrenchment on children’s test ratings of witnessed/unwitnessed forms

Planned frequentist analyses

As above, with the additional predictor: condition (i.e., entrenchment versus preemption)

# (1)

attested_unattested_conditions.df = subset(judgmentdata.df, pt_code != "r2_ch2")

attested_unattested_conditions.df = subset(attested_unattested_conditions.df, pt_code != "r2_ch6")

attested_unattested_conditions.df = subset(attested_unattested_conditions.df, restricted_nouns == "yes")

#Center variables of interest using the lizCenter function:
d_attested_unattested_cond = lizCenter(attested_unattested_conditions.df, list("condition","attested_unattested"))

a = lmer(rating_original ~ condition.ct * attested_unattested.ct + (1|pt_code), data = d_attested_unattested_cond)

round(summary(a)$coefficients,3)

##                                     Estimate Std. Error t value
## (Intercept)                            4.061      0.182  22.281
## condition.ct                          -0.845      0.365  -2.314
## attested_unattested.ct                -1.767      0.144 -12.303
## condition.ct:attested_unattested.ct   -1.661      0.288  -5.770

lmedrop(a,"condition.ct:attested_unattested.ct")

## Data: d_attested_unattested_cond
## Models:
## model.dropped: rating_original ~ condition.ct + attested_unattested.ct + (1 | 
## model.dropped:     pt_code)
## model: rating_original ~ condition.ct * attested_unattested.ct + (1 | 
## model:     pt_code)
##               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## model.dropped  5 561.80 577.76 -275.90   551.80                         
## model          6 533.13 552.29 -260.57   521.13 30.664      1  3.068e-08
##                  
## model.dropped    
## model         ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#(2)

unattested_novel_cond.df = subset(judgmentdata.df, pt_code != "r2_ch2")

unattested_novel_cond.df = subset(unattested_novel_cond.df, pt_code != "r2_ch6")

unattested_novel_cond.df = subset(unattested_novel_cond.df, noun_type_training3 != "alternating")

unattested_novel_cond.df = subset(unattested_novel_cond.df, attested_unattested == "unattested" | noun_type_training3 == "novel2")


#Center variables of interest using the lizCenter function:
unattested_novel_cond.df = lizCenter(unattested_novel_cond.df, list("restricted_nouns", "condition"))

a = lmer(rating_original ~ restricted_nouns.ct * condition.ct + (1|pt_code), data = unattested_novel_cond.df)
summary(a)

## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## rating_original ~ restricted_nouns.ct * condition.ct + (1 | pt_code)
##    Data: unattested_novel_cond.df
## 
## REML criterion at convergence: 583.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2744 -0.6630  0.0260  0.4211  2.6428 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  pt_code  (Intercept) 1.303    1.142   
##  Residual             1.215    1.102   
## Number of obs: 180, groups:  pt_code, 15
## 
## Fixed effects:
##                                  Estimate Std. Error t value
## (Intercept)                        3.4722     0.3060  11.347
## restricted_nouns.ct               -0.5889     0.1643  -3.583
## condition.ct                      -1.1905     0.6134  -1.941
## restricted_nouns.ct:condition.ct  -0.9702     0.3294  -2.945
## 
## Correlation of Fixed Effects:
##             (Intr) rstr_. cndtn.
## rstrctd_nn. 0.000               
## conditin.ct 0.000  0.000        
## rstrctd_.:. 0.000  0.000  0.000

lmedrop(a,"condition.ct:restricted_nouns.ct")

## Data: unattested_novel_cond.df
## Models:
## model.dropped: rating_original ~ restricted_nouns.ct + condition.ct + (1 | pt_code)
## model: rating_original ~ restricted_nouns.ct * condition.ct + (1 | pt_code)
##               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
## model.dropped  5 599.76 615.73 -294.88   589.76                         
## model          6 593.21 612.37 -290.60   581.21 8.5558      1   0.003444
##                 
## model.dropped   
## model         **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Planned Bayes Factor analyses

analysis 1

Summary of data for condition comparison: mean and SE for the interaction between the “attested_unattested.ct” variable and condition from Bayesian LMEs.
Value to inform H1 for condition comparison: mean of theory = 0; roughly expected maximum difference between attested and unattested sentences in entrenchment vs. preemption from our previous study with adults: 2.17 (n = 76; data collection is ongoing for n = 80). As outlined in “Note on data analyses” at http://rpubs.com/AnnaSamara/429816, the SD will be set to half of this maximum value, i.e., SD = 2.17/2

[New KEY analyses]

Summary of data for condition comparison: mean and SE for the interaction between condition and the variable capturing whether a sentence is unwitnessed restricted or unwitnessed novel, from Bayesian LMEs.
Value to inform H1 for condition comparison: mean of theory = 0; roughly expected maximum rating difference between conditions from our previous study with adults: 1.047 (n = 76; data collection is ongoing for n = 80). As outlined in “Note on data analyses” at http://rpubs.com/AnnaSamara/429816, the SD will be set to half of this maximum value, i.e., SD = 1.047/2

Bf(0.288, 1.661, uniform = 0, meanoftheory = 0, sdtheory = 2.17/2, tail = 1)

## $LikelihoodTheory
## [1] 0.237859
## 
## $Likelihoodnull
## [1] 8.292155e-08
## 
## $BayesFactor
## [1] 2868482

# substantial evidence for a greater effect of preemption than entrenchment

Bf(0.3294, 0.9702, uniform = 0, meanoftheory = 0, sdtheory = 1.047/2, tail = 1)

## $LikelihoodTheory
## [1] 0.3745419
## 
## $Likelihoodnull
## [1] 0.01582737
## 
## $BayesFactor
## [1] 23.66419

# substantial evidence for a greater effect of preemption than entrenchment in the stricter (key) analysis
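
The comments in this section label Bayes factors verbally. A minimal sketch of such a labelling rule follows, assuming the commonly used cut-offs of 3 (substantial evidence for H1) and 1/3 (substantial evidence for H0), with values in between treated as insensitive. Both the thresholds and the `label_bf` helper are assumptions made for illustration; the exact cut-offs are not restated in this section.

```r
# Hypothetical helper: map Bayes factors onto verbal evidence categories.
label_bf <- function(bf, threshold = 3) {
  ifelse(bf > threshold, "substantial evidence for H1",
         ifelse(bf < 1 / threshold, "substantial evidence for H0",
                "insensitive"))
}

# Bayes factors reported in the preemption and condition-comparison analyses.
reported <- c(2.009564e+29, 54.87273, 2868482, 23.66419)
labels_reported <- label_bf(reported)

# Two made-up values, included only to show the other categories.
labels_contrast <- label_bf(c(1.2, 0.2))
```

Under these assumed cut-offs, all four reported Bayes factors fall in the "substantial evidence for H1" category.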

Update on rpubs.com/AnnaSamara/429816

Anna Samara

10 October 2019
