This document was created from an R markdown file. The repository for the project can be found here. The data reported in the paper can be explored interactively at the Metalab website.

1 Details of calculating effect size

To standardize the effect size calculation, we converted some reported raw results to the proportion of correct responses. For looking-time studies in which the paper reported only raw looking times in seconds, we calculated the proportion of correct responses by dividing the mean looking time toward the matching scene by the sum of the looking times toward the matching and non-matching scenes (i.e., look-away time was excluded from the denominator). The raw standard deviations were rescaled in the same way, by dividing them by that sum.

Below is a step-by-step example calculation using data from the experiment reported in Table 1 of Yuan & Fisher (2009, p. 622). The table reproduces those raw data: values are mean looking times in seconds, with standard errors (SE) in parentheses.

Dialogue Type Sample Size Two-participant Event One-participant Event
Transitive 8 4.82 (0.43) 2.87 (0.51)
Intransitive 8 3.33 (0.24) 4.12 (0.40)

When a paper provided only raw looking-time data, we converted the data into the proportion of looking time to the correct scene, and rescaled the variability measures, using the formulae below. For children hearing transitive sentences, the correct scene was the two-participant event; for children hearing intransitive sentences, it was the one-participant event. Standard deviations were obtained by first scaling the raw SE of the correct scene by the total looking time and then multiplying by the square root of the number of participants.

\[\begin{aligned} Mean_{Proportion} &= \frac{Time_{correct}}{Time_{correct} + Time_{incorrect}} \\ SD_{Proportion} &= \frac{SE_{Raw}}{Time_{correct} + Time_{incorrect}} \times \sqrt{N} \end{aligned}\]

\[\begin{aligned} Mean_{transitive} &= \frac{Time_{correct}}{Time_{correct} + Time_{incorrect}} \\ &= \frac{4.82}{4.82 + 2.87} \\ &\approx 0.627 \\ SD_{transitive} &= \frac{SE_{Raw}}{Time_{correct} + Time_{incorrect}} \times \sqrt{N} \\ &= \frac{0.43}{4.82 + 2.87} \times \sqrt{8} \\ &\approx 0.158 \end{aligned}\]

\[\begin{aligned} Mean_{intransitive} &= \frac{Time_{correct}}{Time_{correct} + Time_{incorrect}} \\ &= \frac{4.12}{4.12 + 3.33} \\ &\approx 0.553 \\ SD_{intransitive} &= \frac{SE_{Raw}}{Time_{correct} + Time_{incorrect}} \times \sqrt{N} \\ &= \frac{0.40}{4.12 + 3.33} \times \sqrt{8} \\ &\approx 0.152 \end{aligned}\]
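The conversion above can be sketched in Python as a minimal illustration that mirrors the formulas. It is not the project's own script, and the function name `to_proportion` is hypothetical. Applied to the Yuan & Fisher (2009) values, it reproduces the standardized means and SDs.

```python
import math

def to_proportion(time_correct, time_incorrect, se_correct, n):
    """Convert raw looking times (s) to a proportion-correct mean and SD.

    Look-away time is excluded: the denominator is only the time spent on
    the correct (matching) plus incorrect (non-matching) scenes.
    """
    total = time_correct + time_incorrect
    mean_prop = time_correct / total
    # Rescale the SE of the correct scene to the proportion scale,
    # then turn it into an SD by multiplying by sqrt(N).
    sd_prop = se_correct / total * math.sqrt(n)
    return mean_prop, sd_prop

# Yuan & Fisher (2009), Table 1, N = 8 per condition
transitive = to_proportion(4.82, 2.87, 0.43, 8)    # correct scene: two-participant event
intransitive = to_proportion(4.12, 3.33, 0.40, 8)  # correct scene: one-participant event
print([round(x, 3) for x in transitive])    # [0.627, 0.158]
print([round(x, 3) for x in intransitive])  # [0.553, 0.152]
```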

Using the standardized data presented in the table below, we calculated Cohen's d and its variance as follows (the implementation of the script can be found at XXX):

Dialogue Type Sample Size Mean Proportion Standard Deviation
Transitive 8 0.627 0.158
Intransitive 8 0.553 0.152
\[\begin{aligned} d_{transitive} &= \frac{M_1 - M_2}{\sigma_{pooled}} \\ &= \frac{M_{correct} - M_{chance}}{\sigma_{correct}} \\ &= \frac{0.627 - 0.5}{0.158} \\ &\approx 0.80 \\ d_{intransitive} &= \frac{M_1 - M_2}{\sigma_{pooled}} \\ &= \frac{M_{correct} - M_{chance}}{\sigma_{correct}} \\ &= \frac{0.553 - 0.5}{0.152} \\ &\approx 0.35 \end{aligned}\]

\[\begin{aligned} var(d_{transitive}) &= \frac{1}{N} + \frac{d^2}{2 \times N} \\ &= \frac{1}{8} + \frac{0.80^2}{2 \times 8} \\ &= 0.165 \\ var(d_{intransitive}) &= \frac{1}{N} + \frac{d^2}{2 \times N} \\ &= \frac{1}{8} + \frac{0.35^2}{2 \times 8} \\ &\approx 0.133 \end{aligned}\]
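The effect size step can be sketched in Python as a one-sample Cohen's d against chance (0.5) plus its sampling variance, applied to the standardized table values. This illustrates the formulas rather than reproducing the project's script, and the function name `one_sample_d` is hypothetical. Because it starts from the rounded table values, the last digit may differ slightly from a calculation on unrounded proportions.

```python
def one_sample_d(mean_prop, sd_prop, n, chance=0.5):
    """Cohen's d of the proportion-correct mean against chance, and its variance."""
    d = (mean_prop - chance) / sd_prop
    var_d = 1 / n + d ** 2 / (2 * n)
    return d, var_d

for label, m, sd in [("transitive", 0.627, 0.158), ("intransitive", 0.553, 0.152)]:
    d, var_d = one_sample_d(m, sd, 8)
    print(f"{label}: d = {d:.2f}, var(d) = {var_d:.3f}")
# transitive: d = 0.80, var(d) = 0.165
# intransitive: d = 0.35, var(d) = 0.133
```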

2 Sensitivity analysis

The plot below shows a modified funnel plot, or "significance funnel", in which significant studies are shown in orange and non-significant studies in grey. The x-axis shows the effect size estimates, and the y-axis shows the estimated standard error of each estimate. Studies lying on the grey line have a p-value of exactly .05. The black diamond shows the meta-analytic effect size estimate for all studies; the grey diamond shows the meta-analytic effect size estimate for the non-significant studies only (the "worst-case" publication-bias scenario). Note that the worst-case scenario appreciably attenuates the effect size estimate but does not reduce the point estimate to 0, although its confidence interval includes 0 (worst-case estimate: 0.08 [-0.1, 0.26]).
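The logic behind the significance funnel can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used for the plot. It pools studies with a fixed-effect (inverse-variance) mean, whereas the reported estimates may come from a more complex model such as a random-effects one. The effect sizes and standard errors are hypothetical.

A study is treated as affirmative when it is positive and significant at two-tailed α = .05, that is, when its estimate exceeds about 1.96 times its standard error (the grey line). The worst-case estimate pools only the non-affirmative studies, which is why it is pulled toward zero relative to the pooled estimate for all studies.

```python
import math

def two_sided_p(z):
    # Two-sided p-value for a standard normal statistic: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))

def pooled_fixed_effect(effects, ses):
    # Inverse-variance weighted mean and its standard error.
    weights = [1 / se ** 2 for se in ses]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1 / sum(weights))

def significance_funnel_summary(effects, ses, alpha=0.05):
    """Flag affirmative studies (positive and significant) and pool two sets:
    all studies, and only the non-affirmative studies (worst case)."""
    affirmative = [y > 0 and two_sided_p(y / se) < alpha for y, se in zip(effects, ses)]
    rest = [(y, se) for y, se, a in zip(effects, ses, affirmative) if not a]
    overall = pooled_fixed_effect(effects, ses)
    worst_case = (pooled_fixed_effect([y for y, _ in rest], [se for _, se in rest])
                  if rest else None)
    return affirmative, overall, worst_case

# Hypothetical effect sizes (d) and standard errors
effects = [0.8, 0.35, 0.1, -0.2, 0.6]
ses = [0.4, 0.36, 0.3, 0.35, 0.25]
flags, overall, worst_case = significance_funnel_summary(effects, ses)
print(flags)  # [True, False, False, False, True]
print(f"all studies: {overall[0]:.2f}, worst case: {worst_case[0]:.2f}")
```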

3 Main model results

The tables below show the estimates for the single-moderator models reported in the main text. Among these models, predicate type is the only significant moderator: hearing transitive sentences is associated with larger effect sizes. Median productive vocabulary size is a marginally significant moderator.

Throughout the supplemental tables, we report the point estimate for each parameter followed by its 95% confidence interval in square brackets (i.e., [lower bound, upper bound]). P-values that reach the .05 significance threshold are marked with an asterisk (*). For categorical moderators, the reported coefficient is for the first level listed in the parentheses, relative to the second level.
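The table conventions can be illustrated with a small helper that turns an estimate and its standard error into a row. The sketch rests on three assumptions, none of which describes the authors' code: Wald-type 95% intervals (estimate ± 1.96 × SE), two-sided normal p-values, and an asterisk for p < .05. The function name `table_row` is hypothetical. In the usage example, the standard error is back-calculated from the predicate type row in Section 3.3 as estimate / z, and the helper recovers that row.

```python
import math

def table_row(name, estimate, se, alpha=0.05):
    """Format a row as: name, estimate [95% CI], z value, p value (with * if p < alpha)."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    lower, upper = estimate - 1.96 * se, estimate + 1.96 * se  # Wald-type 95% CI
    star = "*" if p < alpha else ""
    return f"{name} {estimate:.2f} [{lower:.2f}, {upper:.2f}] {z:.2f} {p:.2f}{star}"

# SE back-calculated from the Section 3.3 row as estimate / z (an approximation).
row = table_row("Predicate type", 0.24, 0.24 / 2.13)
print(row)  # Predicate type 0.24 [0.02, 0.46] 2.13 0.03*
```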

3.1 Mean age

Parameter Estimate z value p value
Intercept 0.62 [0.1, 1.13] 2.34 0.02*
Mean Age (months) -0.02 [-0.03, <.001] -1.64 0.1

3.2 Median productive vocabulary size

Parameter Estimate z value p value
Intercept 0.59 [0, 1.17] 1.97 0.05*
Median productive vocabulary size -0.01 [-0.02, <.001] -1.93 0.05

3.3 Predicate Type

Parameter Estimate z value p value
Intercept 0.08 [-0.17, 0.33] 0.65 0.52
Predicate type (Transitive / Intransitive) 0.24 [0.02, 0.46] 2.13 0.03*

3.4 Noun phrase type

Parameter Estimate z value p value
Intercept 0.17 [-0.09, 0.43] 1.25 0.21
Noun phrase type (Pronoun / Noun) 0.14 [-0.26, 0.53] 0.69 0.49

3.5 Character identification phase

Parameter Estimate z value p value
Intercept 0.16 [-0.09, 0.42] 1.25 0.21
Character identification phase (Yes / No) 0.2 [-0.27, 0.67] 0.84 0.4

3.6 Practice phase

Parameter Estimate z value p value
Intercept 0.35 [0.08, 0.62] 2.55 0.01*
Practice phase (Yes / No) -0.23 [-0.53, 0.06] -1.54 0.12

3.7 Synchronicity

Parameter Estimate z value p value
Intercept 0.17 [-0.08, 0.42] 1.31 0.19
Synchronicity (Simultaneous / Asynchronous) 0.13 [-0.18, 0.43] 0.83 0.41

3.8 Testing structure

Parameter Estimate z value p value
Intercept 0.13 [-0.1, 0.36] 1.1 0.27
Testing Procedure Structure (Mass / Distributed) 0.38 [-0.09, 0.85] 1.6 0.11

3.9 Number of sentence repetitions

Parameter Estimate z value p value
Intercept 0.17 [-0.14, 0.47] 1.08 0.28
Number of sentence repetitions 0.01 [-0.02, 0.03] 0.51 0.61

4 Main model results using the dataset without imputed values

As mentioned in the Methods section, for studies missing relevant statistics, we imputed values from studies with similar designs (e.g., Hirsh-Pasek, Golinkoff, & Naigles, 1996). The tables below report the results of fitting exactly the same models to the dataset excluding the imputed study. There was no significant difference between the outcomes from the two datasets.

4.1 Mean age

Parameter Estimate z value p value
Intercept 0.61 [0.09, 1.14] 2.3 0.02*
Mean Age (months) -0.02 [-0.03, <.001] -1.64 0.1

4.2 Median productive vocabulary size

Parameter Estimate z value p value
Intercept 0.59 [0, 1.17] 1.97 0.05*
Median productive vocabulary size -0.01 [-0.02, <.001] -1.93 0.05

4.3 Predicate Type

Parameter Estimate z value p value
Intercept 0.08 [-0.18, 0.34] 0.61 0.54
Predicate type (Transitive / Intransitive) 0.24 [0.02, 0.46] 2.1 0.04*

4.4 Noun phrase type

Parameter Estimate z value p value
Intercept 0.15 [-0.12, 0.43] 1.08 0.28
Noun phrase type (Pronoun / Noun) 0.15 [-0.25, 0.56] 0.73 0.46

4.5 Character identification phase

Parameter Estimate z value p value
Intercept 0.15 [-0.12, 0.42] 1.08 0.28
Character identification phase (Yes / No) 0.22 [-0.27, 0.7] 0.88 0.38

4.6 Practice phase

Parameter Estimate z value p value
Intercept 0.35 [0.06, 0.63] 2.4 0.02*
Practice phase (Yes / No) -0.23 [-0.54, 0.07] -1.51 0.13

4.7 Synchronicity

Parameter Estimate z value p value
Intercept 0.16 [-0.09, 0.42] 1.26 0.21
Synchronicity (Simultaneous / Asynchronous) 0.12 [-0.19, 0.44] 0.77 0.44

4.8 Testing structure

Parameter Estimate z value p value
Intercept 0.11 [-0.13, 0.35] 0.93 0.35
Testing Procedure Structure (Mass / Distributed) 0.4 [-0.08, 0.87] 1.63 0.1

4.9 Number of sentence repetitions

Parameter Estimate z value p value
Intercept 0.15 [-0.16, 0.47] 0.95 0.34
Number of sentence repetitions 0.01 [-0.02, 0.03] 0.54 0.59

5 Models with Methodological Moderators and Theoretical Moderators

Syntactic bootstrapping studies differ in their implementation details. Here we examine to what extent the influences of the theoretical moderators can be accounted for by methodological factors. The tables below present the results of models that include all the key methodological moderators plus one theoretical moderator. The patterns were consistent with the single-moderator theoretical models: predicate type remained a significant predictor of the effect size.

In each table below, the theoretical moderator is listed in the last row.
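To make the structure of these multi-moderator models concrete, here is a minimal sketch of an inverse-variance weighted (fixed-effect) meta-regression with several moderators, in plain Python. It is a deliberate simplification, and several choices are assumptions rather than the reported specification:

- It ignores between-study heterogeneity and any nesting of effect sizes within papers that the reported models may account for.
- It codes each categorical moderator as a 0/1 dummy for the first level in parentheses.
- It uses hypothetical data and hypothetical helper names.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def meta_regression(effects, variances, moderators):
    """Inverse-variance weighted (fixed-effect) meta-regression.

    moderators: one row of predictor values per effect size; an intercept
    column is prepended. Returns (coefficients, standard errors).
    """
    X = [[1.0] + list(row) for row in moderators]
    w = [1.0 / v for v in variances]
    k = len(X[0])
    # Normal equations: (X'WX) beta = X'Wy
    xtwx = [[sum(wi * xi[j] * xi[l] for wi, xi in zip(w, X)) for l in range(k)] for j in range(k)]
    xtwy = [sum(wi * xi[j] * yi for wi, xi, yi in zip(w, X, effects)) for j in range(k)]
    beta = solve(xtwx, xtwy)
    # SE_j is the square root of the j-th diagonal element of (X'WX)^-1,
    # obtained by solving for the j-th column of the inverse.
    ses = [math.sqrt(solve(xtwx, [1.0 if l == j else 0.0 for l in range(k)])[j]) for j in range(k)]
    return beta, ses

# Hypothetical data: two 0/1 moderators per effect size.
effects = [0.1, 0.4, -0.1, 0.2, 0.4]
variances = [0.1, 0.2, 0.15, 0.1, 0.3]
moderators = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 0]]
beta, ses = meta_regression(effects, variances, moderators)
for name, b, se in zip(["Intercept", "Moderator A", "Moderator B"], beta, ses):
    print(f"{name}: {b:.2f} (SE {se:.2f}, z = {b / se:.2f})")
```

Adding a theoretical moderator to the methodological ones simply appends one more column to each row of `moderators`.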

5.1 With age

Parameter Estimate z value p value
Intercept -0.02 [-0.93, 0.89] -0.04 0.97
Character identification phase (Yes / No) 0.24 [-0.31, 0.78] 0.85 0.4
Practice phase (Yes / No) -0.11 [-0.47, 0.24] -0.63 0.53
Stimuli synchronicity (Simultaneous / Asynchronous) 0.29 [-0.22, 0.81] 1.11 0.27
Testing structure (Mass / Distributed) 0.46 [-0.04, 0.97] 1.82 0.07
Number of sentence repetitions 0.02 [-0.02, 0.06] 0.94 0.35
Mean age (months) -0.01 [-0.03, 0.02] -0.55 0.58

5.2 With productive vocabulary size

Parameter Estimate z value p value
Intercept -0.24 [-2.19, 1.72] -0.24 0.81
Character identification phase (Yes / No) 0.33 [-0.76, 1.43] 0.60 0.55
Practice phase (Yes / No) 0.01 [-1.05, 1.07] 0.02 0.98
Stimuli synchronicity (Simultaneous / Asynchronous) 0.17 [-1.14, 1.48] 0.25 0.8
Testing structure (Mass / Distributed) 0.95 [0.23, 1.67] 2.59 0.01*
Number of sentence repetitions 0.01 [-0.1, 0.12] 0.14 0.89
Median productive vocabulary size -0.01 [-0.03, 0.01] -0.74 0.46

5.3 With predicate type

Parameter Estimate z value p value
Intercept -0.4 [-1.02, 0.21] -1.30 0.19
Character identification phase (Yes / No) 0.26 [-0.27, 0.78] 0.96 0.34
Practice phase (Yes / No) -0.09 [-0.39, 0.21] -0.58 0.56
Stimuli synchronicity (Simultaneous / Asynchronous) 0.32 [-0.16, 0.79] 1.30 0.19
Testing structure (Mass / Distributed) 0.53 [0.03, 1.03] 2.06 0.04*
Number of sentence repetitions 0.02 [-0.01, 0.06] 1.27 0.2
Predicate type (Transitive / Intransitive) 0.24 [>-.001, 0.47] 1.99 0.05*

5.4 With Noun phrase type

Parameter Estimate z value p value
Intercept -0.56 [-1.23, 0.11] -1.63 0.1
Character identification phase (Yes / No) 0.27 [-0.22, 0.75] 1.09 0.28
Practice phase (Yes / No) -0.2 [-0.49, 0.1] -1.31 0.19
Stimuli synchronicity (Simultaneous / Asynchronous) 0.57 [0.02, 1.13] 2.01 0.04*
Testing structure (Mass / Distributed) 0.12 [-0.47, 0.72] 0.41 0.69
Number of sentence repetitions 0.04 [>-.001, 0.07] 1.94 0.05*
Noun phrase type (Pronoun / Noun) 0.49 [-0.1, 1.08] 1.62 0.11

6 Additional Moderators

6.1 Relationship between moderators

We coded additional moderators describing the visual stimuli: their modality, their actors, and the types of events they depicted. Stimulus modality has two levels, videos and animations; we coded it from the details provided in the methods sections of the papers. Stimulus actors have two levels, human actors and non-human actors; studies whose stimuli showed human actors wearing animal suits were coded as using non-human actors.

To capture the event types in the visual stimuli, we coded the transitive-action and intransitive-action stimuli separately. The transitive event type has two levels: direct caused action and indirect caused action. An event was coded as direct caused action if the agent acted directly upon the patient, and as indirect caused action if the agent caused the patient to move via another medium (for example, the agent pulls a band around the patient's waist, causing her to move). Likewise, the intransitive event type has two levels, one action versus parallel actions, coded by the number of actors presented on the screen. An intransitive event was coded as "one action" if and only if a single actor was presented on the screen. If the intransitive event involved more than one actor (e.g., two actors performing parallel actions, or one actor plus a bystander), it was coded as "parallel actions".
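The coding rules for stimulus actors and event types can be summarized as a small function. The `Stimulus` record and its field names are hypothetical stand-ins for a coder's observations. Stimulus modality is omitted because it was read directly from the methods sections. The output labels match those used in the model tables in Section 6.2.

```python
from dataclasses import dataclass

@dataclass
class Stimulus:
    # Hypothetical coder observations for one study's visual stimuli.
    human_actors: bool             # real people appear as the characters
    in_animal_suits: bool          # those people wear animal suits
    agent_contacts_patient: bool   # transitive scene: agent acts directly on the patient
    intransitive_actor_count: int  # actors on screen in the intransitive scene

def code_stimulus(s: Stimulus) -> dict:
    # Human actors in animal suits are coded as non-human actors.
    actor = "Person" if s.human_actors and not s.in_animal_suits else "Non-person"
    # Direct: agent acts on the patient; indirect: agent moves the patient via another medium.
    transitive = "Direct caused action" if s.agent_contacts_patient else "Indirect caused action"
    # "One action" if and only if a single actor is on screen; any additional actor
    # (a parallel actor or a bystander) makes it "Parallel actions".
    intransitive = "One action" if s.intransitive_actor_count == 1 else "Parallel actions"
    return {"actor": actor, "transitive_event": transitive, "intransitive_event": intransitive}

# Example: people in animal suits, a band-pulling (indirect) transitive scene,
# and an intransitive scene with one actor plus a bystander.
print(code_stimulus(Stimulus(True, True, False, 2)))
```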

These additional moderators were not included in the main analyses because they are closely related to one another and to the main moderators. The heatmaps below show the overlap between moderators. Each cell corresponds to the co-occurrence of two moderator levels: brighter colors indicate more frequent co-occurrence, and darker colors indicate less frequent co-occurrence.
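The co-occurrence counts behind the heatmaps can be sketched as follows. Several details are assumptions:

- The input format is one dictionary of moderator levels per study.
- The unit of counting is the study; the actual heatmaps may count effect sizes instead.
- The function name is hypothetical.
- The heatmaps' row ordering is not reproduced here.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(studies):
    """Count how often each pair of moderator levels appears in the same study.

    studies: list of dicts mapping moderator name -> coded level.
    Returns a Counter keyed by alphabetically sorted pairs of "moderator: level" labels.
    """
    counts = Counter()
    for study in studies:
        labels = sorted(f"{mod}: {level}" for mod, level in study.items())
        for pair in combinations(labels, 2):
            counts[pair] += 1
    return counts

# Hypothetical coded studies
studies = [
    {"modality": "Video", "actor": "Person"},
    {"modality": "Video", "actor": "Person"},
    {"modality": "Animation", "actor": "Non-person"},
]
counts = cooccurrence(studies)
print(counts[("actor: Person", "modality: Video")])  # 2
```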

6.1.1 Ordered by Row Average

6.1.2 Ordered by groups

6.2 Model results

The tables in this section present results for exploratory moderators. The coefficient reported in each table is for the first level listed in the parentheses.

6.2.1 Patient argument type for transitive sentence

In the main analysis, we presented the results of the model relating effect size to the agent argument type, and found that using nouns or pronouns in the agent argument did not significantly predict effect size. Here, we present a similar analysis of the influence of the patient argument type. Because English intransitive sentences by definition have no patient argument, we focused on the subset of studies that used transitive sentences (\(N\) = 30).

Parameter Estimate z value p value
Intercept 0.28 [0.01, 0.54] 2.06 0.04*
Patient Argument Type (Pronoun / Noun) -0.05 [-0.48, 0.38] -0.22 0.83

6.2.2 Stimuli Modality

We found that the presentation modality of the stimuli was not a significant predictor of effect size. In other words, studies that presented young children with animation clips had effect sizes similar to those of studies using video clips. The model statistics are shown below. Note that stimulus modality and stimulus actors overlapped heavily across studies, so this result should be interpreted with caution.

Parameter Estimate z value p value
Intercept 0.58 [0.11, 1.05] 2.4 0.02*
Stimuli Modality (Video / Animation) -0.38 [-0.83, 0.07] -1.65 0.1

6.2.3 Stimuli actors

There is a marginal effect of stimulus actor. Studies with human actors as the protagonists of the events had somewhat smaller effect sizes than studies using puppets, human actors in animal suits, or animated geometric shapes. This might be due to the relatively higher visual complexity of stimuli with real human actors.

Parameter Estimate z value p value
Intercept 0.42 [0.12, 0.71] 2.79 0.01*
Stimuli Actor (Person / Non-person) -0.31 [-0.64, 0.02] -1.85 0.06

6.2.4 Type of event

Studies differed in the types of transitive and intransitive events they presented. Previous studies have shown that young children's looking behavior in the Intermodal Preferential Looking Paradigm is very sensitive to subtle perceptual differences in the visual stimuli (Delle Luche, Durrant, Poltrock, & Floccia, 2015; Fernald, Zangl, Portillo, & Marchman, 2008). Therefore, we coded the types of events presented in the visual stimuli.

There were two types of transitive events: direct caused action and indirect caused action. The former involved the agent acting directly on the patient and causing the patient to move. The latter involved a means-end sequence leading to the caused action of the patient; for example, the agent may pull a band around the patient's waist, causing it to move. There were also two types of intransitive events in the literature. One involved a single actor acting, such as jumping up and down. The other involved two actors presented without any causal action between them.

Our models suggested that neither event type variable was predictive of effect size.

6.2.4.1 Transitive Event type

Parameter Estimate z value p value
Intercept 0.22 [-0.01, 0.44] 1.89 0.06
Transitive Event Type (Indirect caused action / Direct caused action) 0.03 [-0.34, 0.4] 0.16 0.87

6.2.4.2 Intransitive event type

Parameter Estimate z value p value
Intercept 0.25 [-0.11, 0.62] 1.36 0.17
Intransitive Event Type (Parallel actions / One action) -0.04 [-0.38, 0.31] -0.2 0.84

7 Variability in visual stimuli as a function of age

There was some evidence that researchers adapted the visual complexity of their stimuli to children's age. We collected the available visual stimuli from the papers and their supporting materials. Schematic illustrations of the visual stimuli were used when actual screenshots were not provided, and screenshots of the text descriptions of the events were used when the visual stimuli were unavailable. Because some publishers converted the visual stimuli to black-and-white, we converted all visual stimuli to grayscale for easier comparison.

The plot shows that studies with particularly young children tended to use noticeably simpler visual stimuli. This adaptation might be partly responsible for the lack of an age effect in our sample.

References

Delle Luche, C., Durrant, S., Poltrock, S., & Floccia, C. (2015). A methodological investigation of the Intermodal Preferential Looking paradigm: Methods of analyses, picture selection and data rejection criteria. Infant Behavior and Development, 40, 151-172.

Fernald, A., Zangl, R., Portillo, A. L., & Marchman, V. A. (2008). Looking while listening: Using eye movements to monitor spoken language. Developmental psycholinguistics: On-line methods in children’s language processing, 44, 97.