This document is an update on the planned analyses preregistered at http://rpubs.com/AnnaSamara/333562
We had originally planned to carry out an artificial language learning study with Year 1 children comparing the effect of entrenching versus preempting forms on children’s judgments of argument-structure overgeneralizations. Pilot data from adults were summarized at http://rpubs.com/AnnaSamara/333562
As detailed at http://rpubs.com/AnnaSamara/429816, the child study was aborted following substantial evidence that children had not learned the semantics of the two argument-structure constructions in the entrenchment condition; such learning was a prerequisite for a valid comparison between the preemption and entrenchment conditions. We are now collecting child data using a modified version of the experiment that features onomatopoeic nouns.
Here, we preregister an adult study based on the original experiment. The data summarized at http://rpubs.com/AnnaSamara/333562 will be treated as pilot data.
All aspects of the design and procedure will be identical to the original preregistered experiment, except that:
• Data will be collected online, using the Gorilla platform (https://gorilla.sc/)
• The experiment will take place in a single session (instead of 4 sessions) (nb: this is consistent with what we did in the adult pilot study)
• (Where appropriate) participants will receive automated auditory feedback from the computer instead of the experimenter
• As it is currently not possible to record audio in Gorilla, participants will be asked to type their responses in the training and production tasks.
• During “copy-only” training blocks (blocks 1, 3, 5, 7, 9, 11, 13, 15), the written forms of Freddie’s sentences (e.g., “chila kem”, “chila gos”) will appear on the screen alongside the spoken audio. This is to ensure that participants have a reference for producing ‘accurate’ spellings of the made-up words.
• During “training-with-recast” blocks (blocks 2, 4, 6, 8, 10, 12, 14, 16), the written form of Freddie’s partial sentences (e.g., “chila…”) will also appear on the screen alongside the spoken audio. After participants ‘have a go at finishing Freddie’s sentence’, they will receive both visual and auditory implicit feedback on “how Freddie would have said it”.
• In the adult pilot of the preemption condition, the 4 alternating-verb trials per block were paired with animations of all 4 possible agents, such that, in each training block, agents 1 and 2 produced particle 1 whereas agents 3 and 4 produced particle 2. This means that pilot participants could (potentially) learn that the alternating verb occurred with particle 1 for some agents and with particle 2 for other agents in a given copying block, and could (potentially) use this information to guess “what Freddie would have said” in each subsequent training-with-recast block. In the study preregistered here, we have reordered the alternating-verb trials so that there is no association between agents and particles, either overall or at the block level. This has been achieved by using animations for only 2 (rather than 4) agents in each block, such that the 4 alternating-verb trials feature agents 1 and 2 producing either particle 1 or particle 2.
We will adopt a Bayesian approach to the statistical analyses, using the R package brms (Bürkner, 2017), to avoid model non-convergence (a common problem with conventional frequentist mixed-effects models fitted using lme4). Another advantage of adopting a Bayesian approach is that Bayesian models yield ‘p’ values (pMCMC values) and credible intervals (cf. frequentist confidence intervals) that, unlike their frequentist counterparts, can be interpreted intuitively: the pMCMC value represents the probability that the true size of the effect is zero or lower (for positive effects) or zero or higher (for negative effects). The 95% credible interval is an interval that contains, with 95% probability, the true value of the effect in question. Logistic mixed-effects models will be used when we have a binary dependent variable (e.g., semantically correct/incorrect response); linear mixed-effects models will be used when we have a continuous or ordinal dependent variable.
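As a minimal sketch of what such a model and its summaries could look like (the data frame `d`, its column names, and the intercept-only random-effects structure are illustrative placeholders, not final analysis choices):

```r
library(brms)

# Logistic mixed-effects model for a binary DV (1 = semantically correct);
# `d`, `correct`, `verb_type_test.ct`, `participant`, and `item` are
# hypothetical names used for illustration only
fit <- brm(correct ~ verb_type_test.ct + (1 | participant) + (1 | item),
           data = d, family = bernoulli())

# pMCMC for a positive effect: the posterior probability that the true
# effect is zero or lower
draws <- as_draws_df(fit)
mean(draws$b_verb_type_test.ct <= 0)

# fixed-effect estimates with 95% credible intervals (Q2.5, Q97.5)
fixef(fit)
```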
All predictors will be scaled into standard deviation units (z-scores), allowing us to use the same relatively uninformative prior (M = 0, SD = 1) across all predictors. This prior is chosen simply on the basis that, in a normal distribution centered around zero, the majority of observations (roughly 68%) fall within one standard deviation of the mean (and 95% within two). We will report simultaneous models, which estimate each main effect (or interaction) above and beyond all the other predictors included in the model (e.g., Wurm & Fisicaro, 2014).
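For instance (a sketch; the column names are again placeholders):

```r
# z-score a predictor; as.numeric() drops the matrix attributes scale() adds
d$verb_type_test.ct <- as.numeric(scale(d$verb_type_test))

# the same weakly informative Normal(0, 1) prior on all fixed-effect slopes
# (in brms, class "b" covers every predictor in the model)
weak_prior <- set_prior("normal(0, 1)", class = "b")
# passed to the model via brm(..., prior = weak_prior)
```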
We will follow up our primary (key) analyses using Bayes factor (BF) analyses. Following Dienes (personal communication), we will also compute the ranges of values over which substantial Bayes factors hold; a sketch of this calculation is given below.
We will model H1 by using either:
OR
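For concreteness, whichever H1 specification is adopted, here is a minimal sketch of a Dienes-style BF calculation under one such model of H1: a half-normal centred on zero with its SD informed by the predicted effect size (the function name and this particular specification are our illustrative assumptions):

```r
# Bayes factor (BF10) comparing H1 (half-normal on the true effect: mode 0,
# SD = h1_sd) against H0 (true effect = 0), given the observed effect
# estimate and its standard error from the fitted model
bf_halfnormal <- function(obs_mean, obs_se, h1_sd) {
  # likelihood of the observed estimate under H0
  lik_h0 <- dnorm(obs_mean, mean = 0, sd = obs_se)
  # marginal likelihood under H1: average the likelihood over the half-normal
  # prior on the true effect (positive effects only, hence the factor of 2)
  integrand <- function(theta) {
    dnorm(obs_mean, mean = theta, sd = obs_se) * 2 * dnorm(theta, 0, h1_sd)
  }
  lik_h1 <- integrate(integrand, lower = 0, upper = Inf)$value
  lik_h1 / lik_h0  # > 3: substantial evidence for H1; < 1/3: for H0
}
```

The ranges over which substantial Bayes factors hold can then be obtained by re-running this calculation over a grid of `h1_sd` values.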
As detailed at http://rpubs.com/AnnaSamara/333562
With regard to criterion 1b, we will exclude adults whose test-phase production of semantically appropriate particles for the alternating verb is less than 60% accurate. We will also assess the semantic appropriateness of their test-phase productions for the novel verb, as well as the difference in their judgment scores for semantically appropriate vs. inappropriate trials featuring the alternating and novel verbs; however, we will not exclude further participants on the basis of these results.
As detailed at http://rpubs.com/AnnaSamara/333562
Our policy will be as follows: We will collect data from 20 participants in the entrenchment condition and will run analyses to assess whether they have picked up on the difference in meaning between the two argument-structure constructions. This is required for a valid comparison between the preemption and entrenchment conditions. If the BF analyses suggest that our results are inconclusive regarding this ability (i.e., 1/3 < BF < 3), we will run power analyses to estimate what sample size might be expected to give BF > 3 or BF < 1/3, given that the SE shrinks in proportion to the square root of N. We will abort if the required N > 40; otherwise, we will continue to test more participants (collecting 10 at each step before inspecting the data again) until N = 40.
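A sketch of this projection, building on `bf_halfnormal` above (holding the observed mean fixed while shrinking the SE in proportion to 1/sqrt(N) is the stated approximation; the numbers below are placeholders):

```r
# project the BF at a larger sample size
projected_bf <- function(obs_mean, obs_se, n_now, n_new, h1_sd) {
  bf_halfnormal(obs_mean, obs_se * sqrt(n_now / n_new), h1_sd)
}

# e.g., would N = 30 or N = 40 be expected to give BF > 3 or BF < 1/3?
sapply(c(30, 40), function(n)
  projected_bf(obs_mean = 0.5, obs_se = 0.3, n_now = 20, n_new = n,
               h1_sd = 0.5))
```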
If we obtain substantial evidence that adults are unable to learn the semantics described above (BF < 1/3), we will abort the experiment.
If we obtain substantial evidence that they have learned the semantics described above, we will begin data collection on the preemption condition (beginning with n = 20) and will run the analyses outlined below to compare performance in the entrenchment and preemption conditions. If the BF analyses suggest that our results are inconclusive regarding the comparison between entrenchment and preemption, we will follow the procedure outlined above: (i) run power analyses to estimate what sample size might be expected to give BF > 3 or BF < 1/3, and (ii) continue to test more participants (collecting 10 per condition at each step before inspecting the data again) until we obtain substantial evidence for/against the predicted advantage of preemption over entrenchment.
As detailed at http://rpubs.com/AnnaSamara/333562 (note that the pilot data were analyzed using conventional frequentist mixed-effects models fitted using lme4, whereas our main data will be analyzed using brms; see the “Note on data analyses” section above)
To ensure this baseline has been met, we will examine the semantic appropriateness of the particles participants produce during the test phase in response to scenes featuring the alternating verb. Participants who perform with less than 60% accuracy will be excluded.
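A sketch of this exclusion rule (dplyr-based; the data frame and column names are placeholders):

```r
library(dplyr)

# per-participant accuracy on alternating-verb test scenes (sem_appropriate
# is assumed to be coded 0/1); anyone below the 60% cut-off is flagged
excluded <- prod_data %>%
  filter(verb == "alternating") %>%
  group_by(participant) %>%
  summarise(accuracy = mean(sem_appropriate)) %>%
  filter(accuracy < 0.60) %>%
  pull(participant)
```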
Production performance for the alternating and novel verbs (performance with the alternating verb is our baseline for participant exclusion): We will use Bayesian (logistic) mixed-effects models. The dependent variable will be the semantic appropriateness of the particle produced on each trial: that is, whether participants produced causative particles in response to causative scenes at test (and vice versa). There is one predictor, Scene at Test [coded as verb_type_test.ct]. Including this in the model explores whether participants are more accurate with one type of scene (e.g., causative), which is possible, though not clearly predicted.
Grammaticality judgment performance for the alternating and novel verbs (no participant exclusion on the basis of these results): We will use Bayesian (linear) mixed-effects models. The dependent variable will be the mean rating participants give to the novel and alternating verbs [min = 1, max = 5]. There are two predictors in the models: Semantic Appropriateness of the particle used with a verb (i.e., whether a verb that is causative at test is paired with a causative particle, and vice versa); and Scene at Test (causative, noncausative). As in the previous analyses, including Scene at Test in the model explores whether participants have a bias to generalize the causative (or the noncausative). A significant main effect of Semantic Appropriateness would suggest that participants have picked up on the underlying semantics.
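In brms terms, the two models might look as follows (a sketch: the data frames, the centred predictor names, and the intercept-only random-effects structure are illustrative, not final specifications):

```r
# production: trial-level semantic appropriateness (0/1), logistic
m_production <- brm(sem_appropriate ~ verb_type_test.ct +
                      (1 | participant) + (1 | item),
                    data = prod_data, family = bernoulli(),
                    prior = set_prior("normal(0, 1)", class = "b"))

# judgments: 1-5 ratings, linear, both predictors entered simultaneously
m_judgment <- brm(rating ~ sem_appropriate.ct + verb_type_test.ct +
                    (1 | participant) + (1 | item),
                  data = judg_data, family = gaussian(),
                  prior = set_prior("normal(0, 1)", class = "b"))
```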
• Production performance for the alternating verb: Are participants producing the semantically appropriate particles in test scenes featuring the alternating verb with better than chance (50%) accuracy?
Summary of data: mean and SE for the intercept from the Bayesian LMEs
Value to inform H1: mean of theory = 0; roughly predicted effect size from pilot adult data: 93% correct
• Production performance for the novel verb (no participant exclusion on the basis of these results): Are participants producing the semantically appropriate particles in test scenes featuring the novel verb with better than chance (50%) accuracy?
Summary of data: mean and SE for the intercept from the Bayesian LMEs
Value to inform H1: mean of theory = 0; roughly predicted effect size from pilot adult data: 90% correct
• Grammaticality judgment performance for the alternating and novel verbs: Are participants judging semantically appropriate trials that feature the alternating/novel verb higher than semantically inappropriate trials?
Summary of data for each type of verb: mean and SE for the main effect of Semantic Appropriateness from the Bayesian LMEs
Value to inform H1 for each type of verb: mean of theory = 0; roughly predicted difference between ratings for semantically appropriate vs. inappropriate trials from pilot adult data: 2.68 and 2.29 for alternating and novel verb, respectively.
We will use Bayesian (linear) mixed-effects models. The dependent variable in these analyses is the mean rating participants give to the restricted verbs. There are two predictors: a factor reflecting whether the verb was witnessed (attested) with that particle during training or not (unattested); and the control factor verb type (construction-1-only vs. construction-2-only verb).
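A sketch of this model (placeholder names; the same specification, fitted to the entrenchment data, serves the parallel entrenchment analysis further below):

```r
# judgments of restricted verbs within a single condition
m_restricted <- brm(rating ~ attested_unattested.ct + verb_type.ct +
                      (1 | participant) + (1 | item),
                    data = judg_restricted, family = gaussian(),
                    prior = set_prior("normal(0, 1)", class = "b"))
```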
Do participants prefer attested over unattested sentences (for verbs that were restricted to one particle during training) in the preemption condition?
To assess the individual effect of preemption on participants’ ratings for attested over unattested sentences we need:
Summary of data for each condition: mean and SE for the main effect of the “attested_unattested.ct” variable (capturing whether a sentence has been attested during training) from the Bayesian LMEs in this condition.
Value to inform H1 for each condition: mean of theory = 0; roughly expected rating difference between attested and unattested sentences from the adult pilot: 3.15
We will use Bayesian (linear) mixed-effects models. The dependent variable in these analyses is the mean rating participants give to the restricted verbs. There are two predictors: a factor reflecting whether the verb was witnessed (attested) with that particle during training or not (unattested); and the control factor verb type (construction-1-only vs. construction-2-only verb).
Do participants prefer attested over unattested sentences (for verbs that were restricted to one particle during training) in the entrenchment condition?
To assess the individual effect of this condition on participants’ ratings for attested over unattested sentences we need:
Summary of data for the entrenchment condition: mean and SE for the main effect of the “attested_unattested.ct” variable (capturing whether a sentence has been attested during training) from the Bayesian LMEs in this condition.
Value to inform H1 for each condition: mean of theory = 0; roughly expected maximum rating difference between attested and unattested sentences: 1.00 [this maximum reflects previous work suggesting that, when adults rate novel verbs, the biggest difference obtained between “grammatical” and “ungrammatical” forms is about 1 point on the five-point scale]. As outlined in “Note on data analyses”, the SD will be set to half of this maximum value, i.e., SD = 0.5.
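With the `bf_halfnormal` sketch above, this amounts to fixing `h1_sd = 0.5` (the posterior mean and SE below are placeholders):

```r
# H1 SD fixed at half the 1-point maximum rating difference
bf_halfnormal(obs_mean = 0.6, obs_se = 0.2, h1_sd = 0.5)
```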
The question addressed here is: Are participants producing witnessed over unwitnessed sentences more frequently than expected by chance (50%) in the preemption condition? These analyses will be treated as exploratory analyses, secondary to the judgment analyses (given that rating the unattested form as significantly worse than the attested form, even though it conveys the right meaning, is a more stringent test of the effect of interest than simply producing the attested form over the unattested one).
We will use Bayesian (logistic) mixed-effects models. The dependent variable in these analyses is whether adults produce the witnessed (i.e., attested) or the unwitnessed (i.e., unattested) sentence in response to the two restricted verbs. There is one predictor: verb type (construction-1-only vs. construction-2-only verb).
No planned BF analyses
The question addressed here is: Are participants producing witnessed over unwitnessed sentences more frequently than expected by chance (50%) in the entrenchment condition? These analyses will be treated as secondary (exploratory) analyses.
We will use Bayesian (logistic) mixed-effects models. The dependent variable in these analyses is whether participants produce the attested or the unattested sentence in response to the two restricted verbs. There is one predictor: verb type (construction-1-only vs. construction-2-only verb).
No planned BF analyses
Key analyses: Judgment ratings for restricted verbs: We will use Bayesian (linear) mixed-effects models. The dependent variable in these analyses is the mean rating participants give to the restricted verbs. There are three predictors: condition (entrenchment versus preemption); a factor reflecting whether a trial has been attested with that particle during training (attested) or not (unattested); and the control factor verb type (construction-1-only vs. construction-2-only). The effect of interest is the interaction between condition and attestedness (a sketch of this model follows the secondary analyses below).
Secondary analyses: Production performance for restricted verbs [exploratory analyses]: We will use Bayesian (logistic) mixed-effects models. The dependent variable in these analyses is whether participants produce the attested or the unattested sentence in response to the two restricted verbs. There are two predictors: condition (i.e., entrenchment versus preemption); and the control factor verb type (construction-1-only vs. construction-2-only).
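A sketch of the key judgment model (placeholder names; random-effects structure illustrative), in which the condition-by-attestedness interaction carries the prediction:

```r
# ratings of restricted verbs across both conditions; the interaction term
# condition.ct:attested_unattested.ct compares preemption with entrenchment
m_key <- brm(rating ~ condition.ct * attested_unattested.ct + verb_type.ct +
               (1 | participant) + (1 | item),
             data = judg_all, family = gaussian(),
             prior = set_prior("normal(0, 1)", class = "b"))
```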
Judgment ratings: Do participants prefer attested over unattested sentences for the restricted verbs more strongly in the preemption condition relative to the entrenchment condition?
To assess the effect of preemption vs. entrenchment on ratings for attested vs. unattested verbs, we need:
Summary of data: mean and SE for the interaction between condition and the variable capturing whether a sentence has been attested during training (attested_unattested.ct) from the Bayesian LMEs.
Value to inform H1: mean of theory = 0; roughly predicted effect size from adult pilot study: 2.91
Production data: No planned BF analyses