This document contains the planned analyses, as of 26/11/2017, for a data set that we will begin to collect on 27/11/2017.
The study is an artificial language learning study to be conducted with Year 1 children (5-6 year-olds). The experiment compares the effect of entrenching versus preempting forms on children’s judgments of argument-structure overgeneralizations. In natural languages, this relies on the ability to form an abstract generalization that, e.g., not all verbs may appear in both the transitive-causative and the intransitive-inchoative construction (e.g., I laughed; I laughed the clown), while several certainly do (e.g., I rolled the ball / The ball rolled; I bounced the ball / The ball bounced). Under the entrenchment hypothesis, argument-structure overgeneralization errors for a verb (e.g., I laughed the clown) are blocked by the repeated presentation of that verb in one (or more) attested constructions. Under the preemption hypothesis, only nearly-synonymous uses contribute to this inference from absence. While it is likely that both hypotheses explain speakers’ ability to form restricted verb argument-structure generalizations, little work has pulled apart the effects of preemption and entrenchment, partly because overall verb frequency is often very highly correlated with the frequency of that verb in the single most nearly synonymous construction. Artificial language learning experiments allow precise control over such distributional properties of the input, while holding all other factors constant.
This is a two-phase (phase 1: training; phase 2: test) artificial language learning experiment. Training with the language (administered in 4 sessions; note that training was administered in a single session in the pilot detailed below) is followed by tests that assess, upon training completion, participants’ ability to produce and judge sentences in the language.
Training:
Preamble: Participants are told that “they will hear sentences in Freddie’s language in order to learn to say how things happen in these videos”.
Sentences are learnt through two types of procedures:
• “Copy-only” blocks of trials, whereby participants hear and copy Freddie’s sentences (and view accompanying animations that exemplify what each sentence means)
• “Training-with-recast” blocks of trials, whereby participants (i) hear the first word in Freddie’s sentence (and view the accompanying animation), (ii) have a go at producing the last word in Freddie’s sentence, and (iii) hear “how Freddie would have said it”. [Note that adult participants in the pilot detailed below were told that Freddie would give them feedback as to whether they said the last word correctly (or not); this will not be the case in follow-up studies, where feedback will be more implicit: participants will not be told that a mismatch between what they say and what Freddie says means that they produced the word incorrectly.]
“Copy-only” blocks of trials are followed by “training-with-recast” blocks of trials, for a total of 16 blocks (half without recast; half with recast). There are 12 sentences in each block, for a total of 192 training trials.
Training stimuli: The artificial sentences consist of a verb (e.g., coomo) obligatorily followed by one of two particles (e.g., gos, kem). For example, “coomo gos” is a sentence that, in one condition, corresponds to a scene showing “Bart dropping a football”.
For each participant, there are three verbs during training (e.g., coomo, chila, tombat): two that consistently appear with just one particle (e.g., coomo only occurs with gos and chila only occurs with kem) [restricted verbs] and one that appears with both particles (e.g., tombat occurs an equal number of times with kem and gos) [alternating verb]. The latter “alternating verb” serves to demonstrate that verbs can indeed appear in both types of construction.
Particles denote:
                 preemption                        entrenchment
particle 1       verb in transitive causative      verb in transitive causative
particle 2       verb in periphrastic causative    verb in intransitive inchoative
The critical aspect of the training design is that, in the preemption condition, the two argument-structure constructions denote direct external causation and, as such, compete for meaning, whereas in the entrenchment condition only one of the two constructions (causative-1) denotes direct external causation. Importantly, the total number of exposures to each verb type (restricted 1; restricted 2; alternating) is identical across conditions.
Tests:
Production test: Participants are told that they will now see new animations (novel scenes featuring new characters; e.g., Marge bouncing a football) and will be asked to produce sentences in Freddie’s language in order to describe them. They are given the verb (e.g., “chila”) and are asked to produce the whole sentence (i.e., verb + particle, e.g., chila gos); however, unlike in the training-with-recast blocks of trials, they do not hear what “Freddie would have said to describe the scene”. Test scenes feature all three trained verbs, as well as a novel verb, animated either causatively (preemption condition) or half of the time causatively and half of the time noncausatively (entrenchment condition).
Grammaticality judgment test: Participants are told that they will see some more animations and will have to rate (using smiley faces) how well a sentence in Freddie’s language describes what they see in the animation. As in the production task, test scenes feature all three trained verbs, as well as a novel verb (note that this is a different verb from the one used in production), animated either causatively (preemption condition) or half of the time causatively and half of the time noncausatively (entrenchment condition). Each scene is accompanied by a sentence consisting of the correct verb paired with each of the two particles (on different trials).
We predict that, if preempting forms have a greater negative effect than merely entrenching forms on the production and judged acceptability of argument-structure overgeneralization errors (as some previous work suggests), participants will show a stronger preference for attested over unattested constructions in the preemption condition than in the entrenchment condition.
Data will be collected from the training task on days 1-4 and from the production and grammaticality judgment tasks, which are administered on day 4 (as outlined above).
Linear or generalized linear (logistic) mixed-effects models will be used to compute frequentist statistics: (i) logistic mixed-effects models will be used when we have a binary dependent variable (e.g., correct/incorrect response); and (ii) linear mixed-effects models will be used when we have a continuous or ordinal dependent variable. Note that although mixed models allow us to include both participants and items as random effects, we do not intend to include items in the analyses, since power on this dimension will be low. (NB: (i) we counterbalance verb-type assignment across participants; (ii) it is not common for by-items analyses to be included in work in this area; (iii) increasing the number of items in a learning study of this type with children is not feasible.) To avoid anti-conservative conclusions (Barr, Levy, Scheepers, & Tily, 2013), we will use a full random-effects structure in our models (provided that the models converge), including by-subject intercepts and by-subject random slopes for all within-subject factors and their interactions. Our approach will be to inspect models for effects of, and interactions between, the experimental variables where there are clear predictions.
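For concreteness, the following is a schematic of these model specifications, using variable names that appear in the pilot analyses below; the data frame d is a placeholder, and the actual fitted models are reported in the pilot section.
#Schematic logistic mixed-effects model for a binary dependent variable
#(e.g., semantic appropriateness of the produced particle):
m_binary = glmer(semantically_correct ~ verb_type_test.ct + (verb_type_test.ct|pt_code),
                 family = binomial, control = glmerControl(optimizer = "bobyqa"), data = d)
#Schematic linear mixed-effects model for a continuous/ordinal dependent variable
#(e.g., judgment ratings):
m_rating = lmer(rating_original ~ condition.ct * attested_unattested.ct +
                  (attested_unattested.ct|pt_code), data = d)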
We will also compute Bayes factors for all key analyses in order to assess whether we have substantial evidence for H1 or for the null (H0). We will say we have substantial evidence for H1 if BF > 3 and for H0 if BF < 1/3. As advocated by Dienes (2008, 2015), we will model H1 using an estimate of a plausible maximum effect. Computing SDs from knowledge of constraints on the likely maximum value is useful in cases where it is hard to obtain an estimate of a predicted effect size, e.g., because there is no previous relevant study with children. In such cases, we will assume that the values obtained from the pilot study with adults, detailed below, are plausible maximums of the values we could expect from children, and we will set the SD to half of the maximum plausible effect. Note that (i) we will test one-sided predictions and (ii) H1 will be modelled as a half-normal distribution rather than a uniform distribution, since the former treats smaller effects as more likely than bigger ones (which is expected in research with children).
See https://doi.org/10.1016/j.jml.2017.01.005 for a published paper (with an accompanying R script at http://rpubs.com/ewonnacott/242454) illustrating the approach of using estimates and standard errors taken from logistic mixed-effects models: this is useful for binary data since it means we work in log-odds space.
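A minimal sketch of this approach as applied here (the model object m, the coefficient name, and the plausible maximum of 2 log odds are placeholders for illustration only): take a fixed-effect estimate and its standard error (both in log odds) from the glmer summary and pass them to the Bf function defined below, with the SD of the half-normal set to half of the plausible maximum effect.
#Sketch only: m, the coefficient name, and max_plausible are hypothetical
coefs = summary(m)$coefficients
est = coefs["verb_type_test.ct", "Estimate"]
se = coefs["verb_type_test.ct", "Std. Error"]
max_plausible = 2  #assumed plausible maximum effect (log odds), for illustration only
Bf(sd = se, obtained = est, uniform = 0, meanoftheory = 0, sdtheory = max_plausible/2, tail = 1)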
Note 1: The models of the pilot data presented below serve to exemplify the approach to be taken with the current data. Pilot data were collected from 20 adults (native English speakers; mean age = 24.67) tested on a nearly identical version of the experiment, with the following exceptions: (i) adults were trained and tested in a single session, whereas children will be trained over 4 sessions (administered on 4 consecutive days) and tested at the end of session 4; (ii) during the training-with-recast blocks, adults in the pilot were told explicitly that “Freddie will give them feedback after they have had a go at saying the sentence”, whereas children will be given more implicit feedback, namely that “after they have had a go, they will also hear Freddie saying it”; (iii) for adults in the preemption condition, the 4 alternating-verb trials per block were paired with animations of all 4 possible agents, such that, in each training block, agents 1 and 2 appeared with particle 1 whereas agents 3 and 4 appeared with particle 2. This means that adults were (potentially) learning that the alternating verb occurred with particle 1 for some agents and with particle 2 for other agents in a given copying block, and could (potentially) use this information to guess “what Freddie would have said” in each subsequent training-with-recast block. In the main child study, we will reorder alternating-verb trials so that there is no association between agents and particles, either overall (as in the adult study) or at the block level: this will be achieved by using animations for only 2 (rather than 4) agents in each block, such that the 4 alternating-verb trials will feature agents 1 and 2 each appearing with both particle 1 and particle 2.
There is a minimum requirement that all participants have succeeded in learning the three verb meanings (e.g., “chila” matches scenes in which someone is bouncing a ball; “tombat” matches scenes in which a football is rolling, etc.). This requirement is set to ensure that findings are not distorted by the presence of child participants who are not able to cope with the high demands of a fully artificial language learning experiment. To measure this ability, we have devised a brief (6-trial) baseline task, which involves pointing to the correct animation (out of three) matching a sound file. This is administered at the end of the experiment. Participants who perform at chance on this task will be excluded and replaced.
We will also exclude and replace child participants in the entrenchment condition who do not show the expected pattern of producing semantically appropriate particles for the alternating verb (i.e., producing causative particles in response to causative scenes at test, and vice versa). This is crucial to ensure that overall performance in the entrenchment condition does not come across as conservative merely due to the presence of a subset of participants who have not learnt the difference in meaning between the two argument-structure constructions and are therefore conservative (i.e., they just produce verb + particle 1 if this is what they heard). Importantly, this requirement ensures that we provide a valid comparison between the entrenchment and preemption conditions.
Our policy will be as follows: we will collect data from 20 children in the entrenchment condition and will run analyses to assess whether children have picked up on the difference in meaning between the two argument-structure constructions (i.e., particle 1 = verb in the causative-1 construction vs. particle 2 = verb in the noncausative construction). Note that this is required to provide a valid comparison between the preemption and entrenchment conditions. If the BF analyses suggest that our results are inconclusive regarding this ability (i.e., 1/3 < BF < 3), we will run power analyses to estimate what sample size might be expected to give BF > 3 or BF < 1/3, given that the SE scales with 1/sqrt(N). We will abort if the required N > 40; otherwise, we will continue to test more participants (collecting 10 at each step before inspecting the data again) until N = 40.
If we obtain substantial evidence that children are unable to learn the semantics described above (BF < 1/3), we will abort the experiment.
If we obtain substantial evidence that children have learned the semantics described above, we will begin data collection in the preemption condition (beginning with n = 20) and will run the analyses outlined below to compare performance in the entrenchment and preemption conditions. If the BF analyses suggest that our results are inconclusive regarding the comparison between entrenchment and preemption, we will follow the procedure outlined above: (i) run power analyses to estimate what sample size might be expected to give BF > 3 or BF < 1/3, and (ii) continue to test more participants (collecting 10 per condition at each step before inspecting the data again) until we obtain substantial evidence for/against the predicted advantage of preemption over entrenchment.
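As an illustration of this stopping rule (the estimate and standard error below are hypothetical; the Bf_powercalc function is defined later in this document), the BF would be projected across candidate sample sizes under the assumption that the SE shrinks with the square root of N:
#Hypothetical interim estimate/SE from a model fit with N = 20 children;
#project the BF for samples of 20 to 40 to decide whether to continue testing.
Bf_powercalc(sd = 0.45, obtained = 0.60, uniform = 0, meanoftheory = 0,
             sdtheory = 1, tail = 1, N = 20, min = 20, max = 40)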
library(akima)
library(compute.es)
library(cowplot)
library(doBy)
library(ez)
library(ggplot2)
library(Hmisc)
library(knitr)
library(languageR)
library(lattice)
library(lme4)
library(multcomp)
library(nlme)
library(pastecs)
library(plotrix)
library(plyr)
library(psych)
library(Rcpp)
library(reshape2)
library(stringdist)
theme_set(theme_bw())
This function can be found on the website “Cookbook for R”.
http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions
It summarizes data, giving count, mean, standard deviation, standard error of the mean, and confidence intervals (default 95%).
data: a data frame.
measurevar: the name of a column that contains the variable to be summarized
groupvars: a vector containing names of columns that contain grouping variables
na.rm: a boolean that indicates whether to ignore NA’s
conf.interval: the percent range of the confidence interval (default is 95%)
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
conf.interval=.95, .drop=TRUE) {
require(plyr)
# New version of length which can handle NA's: if na.rm==T, don't count them
length2 <- function (x, na.rm=FALSE) {
if (na.rm) sum(!is.na(x))
else length(x)
}
# This does the summary. For each group's data frame, return a vector with
# N, mean, and sd
datac <- ddply(data, groupvars, .drop=.drop,
.fun = function(xx, col) {
c(N = length2(xx[[col]], na.rm=na.rm),
mean = mean (xx[[col]], na.rm=na.rm),
sd = sd (xx[[col]], na.rm=na.rm)
)
},
measurevar
)
# Rename the "mean" column
datac <- rename(datac, c("mean" = measurevar))
datac$se <- datac$sd / sqrt(datac$N) # Calculate standard error of the mean
# Confidence interval multiplier for standard error
# Calculate t-statistic for confidence interval:
# e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
ciMult <- qt(conf.interval/2 + .5, datac$N-1)
datac$ci <- datac$se * ciMult
return(datac)
}
This function can be found on the website “Cookbook for R”.
http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions
From the website:
It summarizes data, handling within-subjects variables by removing inter-subject variability. It will still work if there are no within-S variables. It gives count, un-normed mean, normed mean (with same between-group mean), standard deviation, standard error of the mean, and confidence intervals. If there are within-subject variables, calculate adjusted values using method from Morey (2008).
data: a data frame
measurevar: the name of a column that contains the variable to be summarized
betweenvars: a vector containing names of columns that are between-subjects variables
withinvars: a vector containing names of columns that are within-subjects variables
idvar: the name of a column that identifies each subject (or matched subjects)
na.rm: a boolean that indicates whether to ignore NA’s
conf.interval: the percent range of the confidence interval (default is 95%)
summarySEwithin <- function(data=NULL, measurevar, betweenvars=NULL, withinvars=NULL,
idvar=NULL, na.rm=FALSE, conf.interval=.95, .drop=TRUE) {
# Ensure that the betweenvars and withinvars are factors
factorvars <- vapply(data[, c(betweenvars, withinvars), drop=FALSE],
FUN=is.factor, FUN.VALUE=logical(1))
if (!all(factorvars)) {
nonfactorvars <- names(factorvars)[!factorvars]
message("Automatically converting the following non-factors to factors: ",
paste(nonfactorvars, collapse = ", "))
data[nonfactorvars] <- lapply(data[nonfactorvars], factor)
}
# Get the means from the un-normed data
datac <- summarySE(data, measurevar, groupvars=c(betweenvars, withinvars),
na.rm=na.rm, conf.interval=conf.interval, .drop=.drop)
# Drop all the unused columns (these will be calculated with normed data)
datac$sd <- NULL
datac$se <- NULL
datac$ci <- NULL
# Norm each subject's data
ndata <- normDataWithin(data, idvar, measurevar, betweenvars, na.rm, .drop=.drop)
# This is the name of the new column
measurevar_n <- paste(measurevar, "_norm", sep="")
# Collapse the normed data - now we can treat between and within vars the same
ndatac <- summarySE(ndata, measurevar_n, groupvars=c(betweenvars, withinvars),
na.rm=na.rm, conf.interval=conf.interval, .drop=.drop)
# Apply correction from Morey (2008) to the standard error and confidence interval
# Get the product of the number of conditions of within-S variables
nWithinGroups <- prod(vapply(ndatac[,withinvars, drop=FALSE], FUN=nlevels,
FUN.VALUE=numeric(1)))
correctionFactor <- sqrt( nWithinGroups / (nWithinGroups-1) )
# Apply the correction factor
ndatac$sd <- ndatac$sd * correctionFactor
ndatac$se <- ndatac$se * correctionFactor
ndatac$ci <- ndatac$ci * correctionFactor
# Combine the un-normed means with the normed results
merge(datac, ndatac)
}
This function is used by the summarySEwithin function above. It can be found on the website “Cookbook for R”:
http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/#Helper functions
From that website:
Norms the data within specified groups in a data frame; it normalizes each subject (identified by idvar) so that they have the same mean, within each group specified by betweenvars.
data: a data frame
idvar: the name of a column that identifies each subject (or matched subjects)
measurevar: the name of a column that contains the variable to be summarized
betweenvars: a vector containing names of columns that are between-subjects variables
na.rm: a boolean that indicates whether to ignore NA’s
normDataWithin <- function(data=NULL, idvar, measurevar, betweenvars=NULL,
na.rm=FALSE, .drop=TRUE) {
#library(plyr)
# Measure var on left, idvar + between vars on right of formula.
data.subjMean <- ddply(data, c(idvar, betweenvars), .drop=.drop,
.fun = function(xx, col, na.rm) {
c(subjMean = mean(xx[,col], na.rm=na.rm))
},
measurevar,
na.rm
)
# Put the subject means with original data
data <- merge(data, data.subjMean)
# Get the normalized data in a new column
measureNormedVar <- paste(measurevar, "_norm", sep="")
data[,measureNormedVar] <- data[,measurevar] - data[,"subjMean"] +
mean(data[,measurevar], na.rm=na.rm)
# Remove this subject mean column
data$subjMean <- NULL
return(data)
}
This function outputs the centered values of a variable, which can be a numeric variable, a factor, or a data frame. It was taken from Florian Jaeger’s blog: https://hlplab.wordpress.com/2009/04/27/centering-several-variables/.
From his blog:
-If the input is a numeric variable, the output is the centered variable.
-If the input is a factor, the output is a numeric variable with centered factor level values. That is, the factor’s levels are converted into numerical values in their inherent order (if not specified otherwise, R defaults to alphanumerical order). More specifically, this centers any binary factor so that the value below 0 will be the 1st level of the original factor, and the value above 0 will be the 2nd level.
-If the input is a data frame or matrix, the output is a new matrix of the same dimension and with the centered values and column names that correspond to the colnames() of the input preceded by “c” (e.g. “Variable1” will be “cVariable1”).
myCenter= function(x) {
if (is.numeric(x)) { return(x - mean(x, na.rm=T)) }
if (is.factor(x)) {
x= as.numeric(x)
return(x - mean(x, na.rm=T))
}
if (is.data.frame(x) || is.matrix(x)) {
m= matrix(nrow=nrow(x), ncol=ncol(x))
colnames(m)= paste("c", colnames(x), sep="")
for (i in 1:ncol(x)) {
m[,i]= myCenter(x[,i])
}
return(as.data.frame(m))
}
}
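A minimal usage example (not part of the original script), illustrating the binary-factor behaviour described above:
#A balanced two-level factor is converted to numeric codes and centred to -0.5/+0.5
myCenter(factor(c("a", "b", "b", "a")))
## [1] -0.5  0.5  0.5 -0.5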
This function provides a wrapper around myCenter, allowing you to center a specific list of variables from a data frame. The input is a data frame (x) and a list of the names of the variables to be centered (listfname). The output is a copy of the data frame with a numeric column added for each of the centered variables, each labelled with its previous name with “.ct” appended. For example, if x is a data frame with columns “a” and “b”, lizCenter(x, list("a", "b")) will return a data frame with two additional columns, a.ct and b.ct, which are numeric, centered codings of the corresponding variables.
lizCenter= function(x, listfname)
{
for (i in 1:length(listfname))
{
fname = as.character(listfname[i])
x[paste(fname,".ct", sep="")] = myCenter(x[fname])
}
return(x)
}
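A minimal usage example with a hypothetical data frame (not part of the original script):
#Adds numeric centred columns a.ct and b.ct to the data frame
x = data.frame(a = c(1, 2, 3, 4), b = factor(c("p", "q", "p", "q")))
lizCenter(x, list("a", "b"))
##   a b a.ct b.ct
## 1 1 p -1.5 -0.5
## 2 2 q -0.5  0.5
## 3 3 p  0.5 -0.5
## 4 4 q  1.5  0.5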
This function is equivalent to the Dienes (2008) calculator, which can be found here: http://www.lifesci.sussex.ac.uk/home/Zoltan_Dienes/inference/Bayes.htm.
The code was provided by Baguley and Kaye (2010) and can be found here: http://www.academia.edu/427288/Review_of_Understanding_psychology_as_a_science_An_introduction_to_scientific_and_statistical_inference
Bf<-function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0,sdtheory=1, tail=2){
area <- 0
if(identical(uniform, 1)){
theta <- lower
range <- upper - lower
incr <- range / 2000
for (A in -1000:1000){
theta <- theta + incr
dist_theta <- 1 / range
height <- dist_theta * dnorm(obtained, theta, sd)
area <- area + height * incr
}
}else
{theta <- meanoftheory - 5 * sdtheory
incr <- sdtheory / 200
for (A in -1000:1000){
theta <- theta + incr
dist_theta <- dnorm(theta, meanoftheory, sdtheory)
if(identical(tail, 1)){
if (theta <= 0){
dist_theta <- 0
} else {
dist_theta <- dist_theta * 2
}
}
height <- dist_theta * dnorm(obtained, theta, sd)
area <- area + height * incr
}
}
LikelihoodTheory <- area
Likelihoodnull <- dnorm(obtained, 0, sd)
BayesFactor <- LikelihoodTheory / Likelihoodnull
ret <- list("LikelihoodTheory" = LikelihoodTheory,"Likelihoodnull" = Likelihoodnull, "BayesFactor" = BayesFactor)
ret
}
This works with the Bf function above. It requires the same values as that function (i.e., the obtained mean and SE for the current sample, and a value for the predicted mean, which is set via sdtheory (with meanoftheory = 0)), plus the current number of participants (N). However, rather than returning a BF for the current sample, it works out what the BF would be for a range of different subject numbers (assuming that the SE scales with 1/sqrt(N)).
Bf_powercalc<-function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0, sdtheory=1, tail=2, N, min, max)
{
x = c(0)
y = c(0)
for(newN in min : max)
{
B = as.numeric(Bf(sd = sd*sqrt(N/newN), obtained, uniform, lower, upper, meanoftheory, sdtheory, tail)[3])
x= append(x,newN)
y= append(y,B)
output = cbind(x,y)
}
output = output[-1,]
return(output)
}
Given an lmer model (model) and one of its coefficients (term), this returns a p-value for that coefficient using model comparison (i.e., comparing identical models with and without that term). It can be used to get p-values when using lmer rather than glmer, i.e., when dealing with continuous (rather than binomial) dependent variables.
lmedrop <- function(model, term) {
  # Compare the full model against an identical model with the specified term dropped
  model.dropped <- update(model, eval(paste(".~.-", term)))
  anova(model.dropped, model)
}
The data frames trainingdata.df, productiondata.df, judgmentdata.df, and baseline.df contain pilot data from 20 adults’ performance on the training, production, grammaticality judgment, and baseline tasks, respectively.
trainingdata.df <- read.csv("training.csv", header=TRUE)
productiondata.df <- read.csv("production.csv", header=TRUE)
judgmentdata.df <- read.csv("grammaticality.csv", header=TRUE)
baseline.df <- read.csv("baseline.csv", header=TRUE)
As outlined above, we will exclude data from all participants who perform at chance on the baseline task. If applicable, we will also examine, and subsequently exclude, production trials on which participants produced something other than particle 1 or particle 2.
round(with(baseline.df, tapply(accuracy, list(pt_code), mean, na.rm=T)),2)
## a4 ad_pre_1 ad_pre_10 ad_pre_2 ad_pre_3 ad_pre_4 ad_pre_5
## 1.00 1.00 1.00 1.00 1.00 1.00 1.00
## ad_pre_6 ad_pre_7 ad_pre_8 ad_pre_9 ad1 ad10 ad2
## 1.00 1.00 1.00 1.00 1.00 1.00 1.00
## ad3 ad5 ad6 ad7 ad8 ad9
## 1.00 1.00 0.67 1.00 1.00 1.00
No participants are excluded from further analyses. Note that most adults perform at ceiling on the baseline task (accuracy = 100%), which is likely to be different with child data.
#create appropriate dataset for production in the entrenchment condition
productiondata_entrenchment.df = subset(productiondata.df, condition == "entrenchment")
#code excluded (det_other/none)
productiondata_entrenchment.df$det_excluded <- 0
productiondata_entrenchment.df$det_excluded[productiondata_entrenchment.df$det_lenient_adapted == "det other"] <- 1
#code det included
productiondata_entrenchment.df$det_included <- 0
productiondata_entrenchment.df$det_included[productiondata_entrenchment.df$det_excluded==0] <- 1
#turn long format
productiondata_entrenchment.long.df <- melt(productiondata_entrenchment.df, id.vars=c("pt_code", "trial_code", "verb_type_training", "verb_type_test", "old_new"),
measure.vars=c("det_excluded", "det_included"), variable.name="det_produced", value.name="measurement"
)
round(with(productiondata_entrenchment.long.df, tapply(measurement, list(det_produced), mean, na.rm=T)),2)
## det_excluded det_included
## 0 1
There are no responses where participants produced something other than particle 1 or particle 2. Note that, once again, this is likely to be different with child data.
To provide a valid comparison between the preemption and entrenchment conditions, we need to ensure that participants trained with entrenching forms have picked up on the difference in meaning between the two argument-structure constructions. To ensure this baseline has been met, we will (a) examine the semantic appropriateness of children’s particle productions in the production test and (b) look for differences in their acceptability ratings for semantically appropriate and inappropriate particles in the grammaticality judgment task. Previous work shows that participants’ productions for verbs that are restricted in the training input are sometimes conservative: that is, they may show a preference for the attested particles even if they have learnt that particle usage depends on the underlying semantics. For this reason, we will carry out all baseline analyses on the two unrestricted verbs, i.e., the novel and alternating verbs.
The dependent variable in all subsequent models of production performance is the semantic appropriateness of the particles produced: that is, whether participants produced causative particles in response to causative scenes at test (and vice versa). There is one predictor, Scene at Test [coded as verb_type_test.ct]. Including this in the model explores whether participants are more accurate with one type of scene (e.g., causative scenes), which is possible, though not clearly predicted.
A significant intercept would suggest that participants produce the semantically appropriate particles with better-than-chance accuracy.
#Select production data for the alternating verb:
productiondata_entrenchment_alternating.df = subset(productiondata_entrenchment.df, verb_type_training == "alternating")
#Center variables of interest using the lizCenter function:
d_prod_alt = lizCenter(productiondata_entrenchment_alternating.df, list("verb_type_test"))
#Calculate average percentage of semantically correct performance for the alternating verb:
d_prod_alt_aggregated = summarySEwithin(d_prod_alt, measurevar="semantically_correct", withinvars= "verb_type_test", idvar="pt_code", na.rm=FALSE, conf.interval=.95)
round(mean(d_prod_alt_aggregated$semantically_correct),2)
## [1] 0.93
# Calculate average percentage of semantically correct performance for the alternating verb separately for causative and noncausative scenes:
round(tapply(d_prod_alt_aggregated$semantically_correct, d_prod_alt_aggregated$verb_type_test, mean),2)
## intransitive transitive
## 0.97 0.88
#Run the lme:
a = glmer(semantically_correct ~ verb_type_test.ct + (verb_type_test.ct|pt_code), family =binomial, control=glmerControl(optimizer = "bobyqa"), data = d_prod_alt)
round(summary(a)$coefficients,3)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 10.309 5.997 1.719 0.086
## verb_type_test.ct 13.531 11.813 1.145 0.252
There is no evidence that adults perform differently with verbs appearing in causative vs. noncausative scenes. The intercept (93% correct) is not significantly different from chance; however, given that participants are almost at ceiling, this seems to be due to a lack of power and should be resolved in the main child study, where our starting sample will be double (20 participants/condition). (Note that removing verb_type_test.ct from the model yields a significant intercept in the simplified model (b = 9.98, SE = 4.49, z = 2.22, p = .026).)
#Select production data for the novel verb:
productiondata_entrenchment_novel.df = subset(productiondata_entrenchment.df, verb_type_training == "new1")
#Center variables of interest using the lizCenter function:
d_prod_novel = lizCenter(productiondata_entrenchment_novel.df, list("verb_type_test"))
#Calculate average percentage of semantically correct performance for the novel verb:
d_prod_novel_aggregated = summarySEwithin(d_prod_novel, measurevar="semantically_correct", withinvars= "verb_type_test", idvar="pt_code", na.rm=FALSE, conf.interval=.95)
round(mean(d_prod_novel_aggregated$semantically_correct),2)
## [1] 0.9
#Calculate average percentage of semantically correct performance for the novel verb separately for causative and noncausative scenes:
round(tapply(d_prod_novel_aggregated$semantically_correct, d_prod_novel_aggregated$verb_type_test, mean),2)
## intransitive transitive
## 0.89 0.91
#Run the lme:
#Note that the random slope for the within-subject factor "scene at test" needs to be dropped to achieve model convergence.
a = glmer(semantically_correct ~ verb_type_test.ct + (1|pt_code), family =binomial, control=glmerControl(optimizer = "bobyqa"), data = d_prod_novel)
round(summary(a)$coefficients,3)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 8.394 3.473 2.417 0.016
## verb_type_test.ct 0.923 1.407 0.656 0.512
As in the analyses of the alternating verb above, there is no evidence that adults perform differently with verbs appearing in causative vs. noncausative scenes. The intercept (90% correct) is significantly above chance.
The dependent variable in these analyses is the rating participants give to the novel and alternating verbs [min = 1, max = 5]. There are two predictors in the models: Semantic Appropriateness of the particle used with a verb at test (i.e., whether a verb that is causative at test is paired with a causative particle, and vice versa), and Scene at Test (causative, noncausative). As in the previous analyses, including Scene at Test in the model explores whether participants have a bias to generalize the causative (or the noncausative) construction. A significant main effect of Semantic Appropriateness would suggest that participants have picked up on the underlying semantics.
judgmentdata_entrenchment.df = subset(judgmentdata.df, condition == "entrenchment")
#turn semantically_correct into a factor
judgmentdata_entrenchment.df$semantically_correct = factor(judgmentdata_entrenchment.df$semantically_correct)
#recode data to give sensible variable names
judgmentdata_entrenchment.df$semantically_correct <- revalue(x = judgmentdata_entrenchment.df$semantically_correct, c("1" = "semantically correct", "0" = "semantically incorrect"))
#Select judgment data for the alternating verb:
judgmentdata_entrenchment_alternating.df = subset(judgmentdata_entrenchment.df, verb_type_training == "alternating")
#Center variables of interest using the lizCenter function:
d_alternating_gr = lizCenter(judgmentdata_entrenchment_alternating.df, list("verb_type_test","semantically_correct"))
#Calculate average ratings for semantically correct and incorrect trials featuring the alternating verb:
d_alternating_gr_aggregated = summarySEwithin(d_alternating_gr, measurevar="rating_original", withinvars= c("verb_type_test", "semantically_correct"), idvar="pt_code", na.rm=FALSE, conf.interval=.95)
round(tapply(d_alternating_gr_aggregated$rating_original,d_alternating_gr_aggregated$semantically_correct, mean),2)
## semantically incorrect semantically correct
## 2.03 4.71
#Calculate average ratings for semantically correct and incorrect trials featuring the alternating verb separately for causative and noncausative scenes:
round(tapply(d_alternating_gr_aggregated$rating_original, list(d_alternating_gr_aggregated$semantically_correct,d_alternating_gr_aggregated$verb_type_test), mean),2)
## intransitive transitive
## semantically incorrect 2.05 2.00
## semantically correct 4.63 4.79
#Run the lme:
a = lmer(rating_original ~ semantically_correct.ct * verb_type_test.ct + (verb_type_test.ct|pt_code), data = d_alternating_gr)
round(summary(a)$coefficients,3)
## Estimate Std. Error t value
## (Intercept) 3.383 0.137 24.665
## semantically_correct.ct 2.684 0.168 15.955
## verb_type_test.ct 0.055 0.169 0.323
## semantically_correct.ct:verb_type_test.ct 0.211 0.336 0.626
lmedrop(a,"semantically_correct.ct")
## Data: d_alternating_gr
## Models:
## model.dropped: rating_original ~ verb_type_test.ct + (verb_type_test.ct | pt_code) +
## model.dropped: semantically_correct.ct:verb_type_test.ct
## model: rating_original ~ semantically_correct.ct * verb_type_test.ct +
## model: (verb_type_test.ct | pt_code)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## model.dropped 7 296.64 312.95 -141.319 282.64
## model 8 190.07 208.72 -87.037 174.07 108.57 1 < 2.2e-16
##
## model.dropped
## model ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Select judgment data for the novel verb:
judgmentdata_entrenchment_novel.df = subset(judgmentdata_entrenchment.df, verb_type_training == "new2")
#Center variables of interest using the lizCenter function:
d_novel_gr = lizCenter(judgmentdata_entrenchment_novel.df, list("verb_type_test","semantically_correct"))
#Calculate average ratings for semantically correct and incorrect trials featuring the novel verb:
d_novel_gr_aggregated = summarySEwithin(d_novel_gr, measurevar="rating_original", withinvars= c("verb_type_test", "semantically_correct"), idvar="pt_code", na.rm=FALSE, conf.interval=.95)
round(tapply(d_novel_gr_aggregated$rating_original, d_novel_gr_aggregated$semantically_correct, mean),2)
## semantically incorrect semantically correct
## 1.82 4.11
#Calculate average ratings for semantically correct and incorrect trials featuring the novel verb separately for causative and noncausative scenes:
round(tapply(d_novel_gr_aggregated$rating_original, list(d_novel_gr_aggregated$semantically_correct,d_novel_gr_aggregated$verb_type_test), mean),2)
## intransitive transitive
## semantically incorrect 1.84 1.79
## semantically correct 4.16 4.05
#Run the lme:
a = lmer(rating_original ~ semantically_correct.ct * verb_type_test.ct + (verb_type_test.ct|pt_code), data = d_novel_gr)
round(summary(a)$coefficients,3)
## Estimate Std. Error t value
## (Intercept) 2.992 0.226 13.261
## semantically_correct.ct 2.289 0.234 9.803
## verb_type_test.ct -0.080 0.234 -0.343
## semantically_correct.ct:verb_type_test.ct -0.053 0.467 -0.113
lmedrop(a,"semantically_correct.ct")
## Data: d_novel_gr
## Models:
## model.dropped: rating_original ~ verb_type_test.ct + (verb_type_test.ct | pt_code) +
## model.dropped: semantically_correct.ct:verb_type_test.ct
## model: rating_original ~ semantically_correct.ct * verb_type_test.ct +
## model: (verb_type_test.ct | pt_code)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## model.dropped 7 302.34 318.66 -144.17 288.34
## model 8 243.28 261.93 -113.64 227.28 61.055 1 5.55e-15
##
## model.dropped
## model ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In sum, the analyses of both the alternating and the novel verb suggest that adults prefer sentences that are semantically correct for a scene over sentences that are semantically incorrect, i.e., they have learnt the difference in meaning between particle 1 and particle 2.
Having established that participants pick up on the difference in meaning between the two argument-structure constructions, we will ask: do child participants prefer attested over unattested sentences for verbs that were restricted to one construction in the training input more strongly in the preemption condition than in the entrenchment condition?
The dependent variable in these analyses is the rating participants give to restricted verbs: these are the two verbs that, during training, occurred with only one of the two particles (referred to as construction-1-only and construction-2-only verbs). There are two predictors: condition (i.e., entrenchment versus preemption) and a factor reflecting whether a sentence was attested with that particle during training (attested) or not (unattested): (a) attested trials: construction-1-only verbs appearing with construction-1 particles, and construction-2-only verbs appearing with construction-2 particles; (b) unattested trials: construction-1-only verbs appearing with construction-2 particles, and construction-2-only verbs appearing with construction-1 particles.
Note that there was no difference in performance between construction-1-only and construction-2-only verbs in the pilot data reported below; thus, this factor was dropped from the analyses. Similarly, we will test for differences between these verb types in the main child analyses and drop the factor if no significant effect is found.
d_preemption_vs_entrenchment = subset(judgmentdata.df, restricted_verbs == "yes")
#Center variables of interest using the lizCenter function:
d_preemption_vs_entrenchment = lizCenter(d_preemption_vs_entrenchment, list("attested_unattested","condition"))
#Calculate average ratings for attested and unattested sentences in each condition:
d_preemption_vs_entrenchment_aggregated = summarySEwithin(d_preemption_vs_entrenchment, measurevar="rating_original", withinvars= c("attested_unattested", "condition"), idvar="pt_code", na.rm=FALSE, conf.interval=.95)
round(tapply(d_preemption_vs_entrenchment_aggregated$rating_original, list(d_preemption_vs_entrenchment_aggregated$attested_unattested,d_preemption_vs_entrenchment_aggregated$condition), mean),2)
## entrenchment preemption
## attested 3.49 5.00
## unattested 3.26 1.85
#Run the lme:
a = lmer(rating_original ~ condition.ct * attested_unattested.ct + (attested_unattested.ct|pt_code), data = d_preemption_vs_entrenchment)
summary(a)
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## rating_original ~ condition.ct * attested_unattested.ct + (attested_unattested.ct |
## pt_code)
## Data: d_preemption_vs_entrenchment
##
## REML criterion at convergence: 994.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.19966 -0.32615 0.01509 0.46572 2.24315
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pt_code (Intercept) 0.1787 0.4228
## attested_unattested.ct 0.5979 0.7732 1.00
## Residual 1.2736 1.1286
## Number of obs: 312, groups: pt_code, 20
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.39865 0.11427 29.742
## condition.ct 0.05409 0.22853 0.237
## attested_unattested.ct -1.72800 0.21530 -8.026
## condition.ct:attested_unattested.ct -2.91884 0.43059 -6.779
##
## Correlation of Fixed Effects:
## (Intr) cndtn. atts_.
## conditin.ct 0.015
## attstd_ntt. 0.667 0.014
## cndtn.ct:_. 0.014 0.667 0.014
lmedrop(a,"condition.ct*attested_unattested.ct")
## Data: d_preemption_vs_entrenchment
## Models:
## model.dropped: rating_original ~ (attested_unattested.ct | pt_code)
## model: rating_original ~ condition.ct * attested_unattested.ct + (attested_unattested.ct |
## model: pt_code)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## model.dropped 5 1054.5 1073.2 -522.27 1044.54
## model 8 1004.9 1034.9 -494.47 988.93 55.605 3 5.099e-12
##
## model.dropped
## model ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The key prediction, that there would be a greater difference between attested and unattested sentences in the preemption condition relative to the entrenchment condition (suggesting that preempting forms are more influential than entrenching forms in restricting argument-structure overgeneralization errors), is confirmed. The interaction estimate (condition.ct:attested_unattested.ct) will be used in the main child study as the maximum predicted difference between entrenchment and preemption performance.
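A minimal sketch of how this BF would be computed for the child data (the child estimate and SE below are hypothetical placeholders; following the H1 specification above, the SD of the half-normal is set to half of the adult pilot interaction estimate of 2.92):
#Hypothetical child-study estimate and SE for the condition-by-attestedness interaction
Bf(sd = 0.50, obtained = 1.20, uniform = 0, meanoftheory = 0, sdtheory = 2.92/2, tail = 1)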
We also want to assess the effect that each condition has on the difference in ratings between attested and unattested sentences:
a = lmer(rating_original ~ 1 + condition: attested_unattested.ct + condition.ct + (attested_unattested.ct|pt_code), data = d_preemption_vs_entrenchment)
summary(a)
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## rating_original ~ 1 + condition:attested_unattested.ct + condition.ct +
## (attested_unattested.ct | pt_code)
## Data: d_preemption_vs_entrenchment
##
## REML criterion at convergence: 994.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.19966 -0.32615 0.01509 0.46572 2.24315
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## pt_code (Intercept) 0.1787 0.4228
## attested_unattested.ct 0.5979 0.7732 1.00
## Residual 1.2736 1.1286
## Number of obs: 312, groups: pt_code, 20
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.39865 0.11427 29.742
## condition.ct 0.05409 0.22853 0.237
## conditionentrenchment:attested_unattested.ct -0.23116 0.30623 -0.755
## conditionpreemption:attested_unattested.ct -3.15000 0.30271 -10.406
##
## Correlation of Fixed Effects:
## (Intr) cndtn. cndtnn:_.
## conditin.ct 0.015
## cndtnntr:_. 0.459 -0.471
## cndtnprm:_. 0.485 0.473 0.000
BF analyses are used to assess whether we have substantial evidence for H1 (higher ratings for attested than unattested sentences) over the null in each condition, using an estimate from Ambridge et al.’s work with adults that a roughly expected mean difference between attested and unattested sentences would be about 1.
#entrenchment: the mean difference between attested and unattested ratings is 0.23116
Bf(0.30623, 0.23116, uniform = 0, meanoftheory = 0, sdtheory = 1, tail = 1)
## $LikelihoodTheory
## [1] 0.5674259
##
## $Likelihoodnull
## [1] 0.9797826
##
## $BayesFactor
## [1] 0.5791345
The BF is < 3 and > 1/3, which suggests that the data are insensitive.
Bf_powercalc(sd = 0.30623, obtained = 0.23116, uniform = 0, meanoftheory=0, sdtheory=1, tail=1, N = 10, min = 60, max = 100)
## x y
## [1,] 60 1.288931
## [2,] 61 1.317004
## [3,] 62 1.345805
## [4,] 63 1.375352
## [5,] 64 1.405662
## [6,] 65 1.436755
## [7,] 66 1.468651
## [8,] 67 1.501368
## [9,] 68 1.534929
## [10,] 69 1.569353
## [11,] 70 1.604662
## [12,] 71 1.640879
## [13,] 72 1.678027
## [14,] 73 1.716128
## [15,] 74 1.755209
## [16,] 75 1.795292
## [17,] 76 1.836404
## [18,] 77 1.878572
## [19,] 78 1.921821
## [20,] 79 1.966180
## [21,] 80 2.011677
## [22,] 81 2.058342
## [23,] 82 2.106204
## [24,] 83 2.155295
## [25,] 84 2.205646
## [26,] 85 2.257289
## [27,] 86 2.310258
## [28,] 87 2.364588
## [29,] 88 2.420314
## [30,] 89 2.477471
## [31,] 90 2.536098
## [32,] 91 2.596231
## [33,] 92 2.657912
## [34,] 93 2.721179
## [35,] 94 2.786074
## [36,] 95 2.852640
## [37,] 96 2.920920
## [38,] 97 2.990960
## [39,] 98 3.062804
## [40,] 99 3.136501
## [41,] 100 3.212099
Assuming the SE scales with 1/sqrt(N), we would need approximately 100 participants to have conclusive evidence (BF > 3) that attested > unattested in this condition.
#preemption: the mean difference between attested and unattested ratings is 3.15000
Bf(0.30271, 3.15000, uniform = 0, meanoftheory = 0, sdtheory = 1, tail = 1)
## $LikelihoodTheory
## [1] 0.008111892
##
## $Likelihoodnull
## [1] 4.037698e-24
##
## $BayesFactor
## [1] 2.009039e+21
We have conclusive evidence that attested > unattested in the preemption condition.
We will use mixed-effects models identical to those used on the pilot data above to (i) compare the effect of preemption vs. entrenchment on children’s preference for attested over unattested verb argument-structure constructions and (ii) assess the individual effect that each of these conditions has on children’s ratings.
These are exploratory data analyses with no clear predictions.
We will analyse children’s production accuracy during each of the training-with-recast blocks, for each of the three types of training verbs, in each of the two conditions.
#Select data for the recast training blocks:
trainingdata.df = subset(trainingdata.df, procedure == "training_feedback")
#Center variables of interest using the lizCenter function:
d_training = lizCenter(trainingdata.df, list("block","verb_type_training", "condition"))
#Run the lme:
a = glmer(training_accuracy ~ condition.ct * verb_type_training.ct * block.ct + (verb_type_training.ct*block.ct|pt_code), family =binomial, control=glmerControl(optimizer = "bobyqa"), data = d_training)
round(summary(a)$coefficients,3)
## Estimate Std. Error z value
## (Intercept) 3.987 0.470 8.487
## condition.ct -1.430 0.762 -1.877
## verb_type_training.ct 2.219 0.449 4.943
## block.ct -0.037 0.071 -0.527
## condition.ct:verb_type_training.ct 0.797 0.688 1.159
## condition.ct:block.ct -0.065 0.085 -0.761
## verb_type_training.ct:block.ct 0.021 0.075 0.276
## condition.ct:verb_type_training.ct:block.ct -0.038 0.093 -0.408
## Pr(>|z|)
## (Intercept) 0.000
## condition.ct 0.061
## verb_type_training.ct 0.000
## block.ct 0.599
## condition.ct:verb_type_training.ct 0.247
## condition.ct:block.ct 0.446
## verb_type_training.ct:block.ct 0.783
## condition.ct:verb_type_training.ct:block.ct 0.684
As above
No planned BF analyses
In the analyses of restricted verbs outlined above, we do not differentiate between causative-only and noncausative-only verbs in the entrenchment condition (note that this distinction is not relevant in the preemption condition). However, we plan to carry out analyses (for both production and grammaticality judgment) to see whether children are more/less accurate with these verb types. These analyses will be akin to the production/grammaticality-judgment analyses described above for the novel and alternating verbs in the entrenchment condition. Note that, unlike the analyses for the novel and alternating verbs, these may be uninformative regarding children’s knowledge of the underlying semantics. However, it would be interesting to explore whether there is a different pattern of performance in children’s productions and judgments for the two restricted verbs relative to their productions and judgments for the two nonrestricted verbs. If there is a large difference between these verb types (which we do not predict), it may be informative to repeat the comparison between the entrenchment and preemption conditions separately for noncausative-only and causative-only verbs (that is, repeat the analyses with (i) causative-only verbs removed and (ii) noncausative-only verbs removed), as sketched below.
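A sketch of one such follow-up analysis (the level name "causative_only" for verb_type_training is an assumption for illustration, not checked against the pilot data files; the model mirrors the key comparison above):
#Repeat the preemption vs. entrenchment comparison with the causative-only restricted verbs removed
d_sub = subset(d_preemption_vs_entrenchment, verb_type_training != "causative_only")
d_sub = lizCenter(d_sub, list("attested_unattested", "condition"))
a_sub = lmer(rating_original ~ condition.ct * attested_unattested.ct +
               (attested_unattested.ct|pt_code), data = d_sub)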