library(tidyverse)  # read_csv, dplyr verbs, stringr
library(here)
library(knitr)      # kable, include_graphics

ME_DATA_PATH <- here("data/0_metaanalysis_data.csv")

ma_raw <- read_csv(ME_DATA_PATH) %>%
  select(1:4, 6:18, 19, 21, 27:29, 31:34, 48:50, 59) %>%
  mutate_if(is.character, as.factor)

AVG_MONTH <- 30.43688  # average number of days in a month

ma_c <- ma_raw %>%
  filter(!is.na(d_calc)) %>%
  mutate(mean_age = mean_age_1/AVG_MONTH,  # convert mean age from days to months
         year = as.numeric(str_sub(short_cite, -5, -2)),  # publication year from the citation string
         condition_type = as.factor(ifelse(infant_type == "typical" & ME_trial_type == "FN", "TFN",
                                 ifelse(infant_type == "typical" & ME_trial_type == "NN", "TNN",
                                        as.character(infant_type))))) %>%
  filter(mean_age < 144)  # keep samples with a mean age under 144 months (12 years)

Bedford et al. (2013)

“Bedford et al. (2013) - I think the SEs reported in the paper, rather than SDs, were used to calculate the d’s in metalab. So instead of d = 4 and 3, it should be d = .7 and .64, respectively”

Thank you, we agree. There were a few instances where errors had been corrected in the data used in the paper but not updated in Metalab, and this was one of them. We apologize for the confusion.

ma_c %>%
  filter(study_ID == "bedford2013") %>%
  select(study_ID, short_cite, x_1, x_2, SD_1, t, d_calc) %>%
  kable()
| study_ID    | short_cite          | x_1 | x_2 | SD_1 | t    | d_calc    |
|-------------|---------------------|-----|-----|------|------|-----------|
| bedford2013 | Beford et al (2013) | NA  | NA  | NA   | 3.88 | 0.6968686 |
| bedford2013 | Beford et al (2013) | NA  | NA  | NA   | 3.41 | 0.6124541 |

Beverly and Estis (2003)

“Beverly & Estis (2003) – The Ms and SDs reported in the paper itself do not jive with those that can be computed from the plots of the actual data, which are also in the paper. Instead of d = 2.58, 4.5, and 4.88, I got d = 1.34, 4.94, and 2.25, respectively.”

We’re not sure of the source of the discrepancy here. The effect sizes are calculated from the data in Table 2 (reproduced below), and they appear to be correct.

ma_c %>%
  filter(study_ID == "beverly2003") %>%
  select(study_ID, short_cite, infant_type, x_1, x_2, SD_1, t, d_calc) %>%
  kable()
| study_ID    | short_cite             | infant_type | x_1  | x_2 | SD_1  | t  | d_calc   |
|-------------|------------------------|-------------|------|-----|-------|----|----------|
| beverly2003 | Beverly & Estis (2003) | NT          | 0.65 | 0.5 | 0.058 | NA | 2.586207 |
| beverly2003 | Beverly & Estis (2003) | typical     | 0.95 | 0.5 | 0.100 | NA | 4.500000 |
| beverly2003 | Beverly & Estis (2003) | typical     | 0.90 | 0.5 | 0.082 | NA | 4.878049 |

Table from Beverly & Estis (2003):

include_graphics('paper_pics/beverly2003.png', dpi = NA)

Davidson et al. (1997)

“The effect sizes for the tests of the bilingual children were included in the meta-data set, but those for the tests of monolingual children were excluded. (Davidson et al. did not report SDs for any of the groups in their paper. I assume they supplied the SDs for the bilingual children upon request.)”

The effect sizes for the bilingual groups were calculated by estimating the SD from the t-test reported in the paper for the 3- to 4-year-old Greek-English bilingual group. The same estimate of SD was used for all the bilingual groups. We did not previously include the monolinguals because we did not have an estimate of the SD for any of the monolingual groups. We have now imputed SDs for all conditions where an estimate was available for another condition in the paper, and so the monolinguals are now included in the dataset.
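For concreteness, here is a minimal sketch of that calculation (the helper name is ours, purely for illustration). For a one-sample t-test against chance, t = (x_1 - x_2) / (SD / sqrt(n)), so the SD can be recovered as (x_1 - x_2) * sqrt(n) / t, and d = t / sqrt(n):

estimate_sd_from_t <- function(x_1, x_2, n, t) {
  # solve t = (x_1 - x_2) / (SD / sqrt(n)) for SD
  (x_1 - x_2) * sqrt(n) / t
}

# 3- to 4-year-old Greek-English bilingual group: x_1 = .60, chance = .50, n = 16, t = 1.46
estimate_sd_from_t(x_1 = 0.60, x_2 = 0.5, n = 16, t = 1.46)  # ~0.274, the SD used below
1.46 / sqrt(16)                                              # d ~0.365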

ma_c %>%
  filter(study_ID == "davidson1997") %>%
  select(study_ID, short_cite,infant_type, n_1, x_1, x_2, SD_1, t, d_calc) %>%
  kable()
| study_ID     | short_cite            | infant_type  | n_1 | x_1  | x_2 | SD_1      | t    | d_calc |
|--------------|-----------------------|--------------|-----|------|-----|-----------|------|--------|
| davidson1997 | Davidson et al (1997) | multilingual | 16  | 0.60 | 0.5 | NA        | 1.46 | 0.3650 |
| davidson1997 | Davidson et al (1997) | multilingual | 16  | 0.65 | 0.5 | 0.2739726 | NA   | 0.5475 |
| davidson1997 | Davidson et al (1997) | multilingual | 16  | 0.71 | 0.5 | 0.2739726 | NA   | 0.7665 |
| davidson1997 | Davidson et al (1997) | multilingual | 16  | 0.67 | 0.5 | 0.2739726 | NA   | 0.6205 |
| davidson1997 | Davidson et al (1997) | typical      | 16  | 0.69 | 0.5 | 0.2739726 | NA   | 0.6935 |
| davidson1997 | Davidson et al (1997) | typical      | 16  | 0.92 | 0.5 | 0.2739726 | NA   | 1.5330 |

Gollek & Doherty (2016)

“Six conditions from this paper are listed in the meta-data set. The first two Ms and the last two Ms were below chance, not above it. So the corresponding d’s should be negative, not positive. However, as I already argued, these effect sizes should probably be excluded because these were the conditions that presented incongruent cues.”

Thank you, we agree that the “pragmatic” conditions are incongruent and have now excluded them from our primary analyses.

ma_c %>%
  filter(study_ID == "gollek2016") %>%
  select(study_ID, short_cite, expt_num, infant_type,group_name_1, mean_age_1, n_1, x_1, x_2, SD_1, t, d_calc) %>%
  kable()
| study_ID   | short_cite              | expt_num | infant_type | group_name_1             | mean_age_1 | n_1 | x_1   | x_2 | SD_1  | t     | d_calc   |
|------------|-------------------------|----------|-------------|--------------------------|------------|-----|-------|-----|-------|-------|----------|
| gollek2016 | Gollek & Doherty (2016) | 2a       | typical     | disambiguation - younger | 1126.000   | 21  | NA    | NA  | NA    | 16.41 | 3.580956 |
| gollek2016 | Gollek & Doherty (2016) | 2a       | typical     | disambiguation - older   | 1461.000   | 22  | NA    | NA  | NA    | 26.14 | 5.573067 |
| gollek2016 | Gollek & Doherty (2016) | 3        | typical     | disambiguation - younger | 1278.349   | 15  | 0.986 | 0.5 | 0.052 | NA    | 9.346154 |
| gollek2016 | Gollek & Doherty (2016) | 3        | typical     | disambiguation - older   | 1674.028   | 13  | 0.876 | 0.5 | 0.252 | NA    | 1.492063 |

Graham et al. (2009)

“Three effect sizes are listed in the meta-data, but there were only two tests of the disambiguation effect in the paper. These were conducted in Exps. 1A and 1B. It looks like one of the effect sizes in meta-analysis is for the condition in Exp. 2 in which three novel objects were presented and the experimenter gazed at one of them while asking for the referent of a novel label. This is not a test of the disambiguation effect.”

We included two conditions from Experiment 1A (“no cues/gaze” and “consistent”). Experiment 2 data was not included.

ma_c %>%
  filter(study_ID == "graham2010") %>%
  select(study_ID, short_cite, expt_num, infant_type,group_name_1, mean_age_1, n_1, x_1, x_2, SD_1, t, d_calc, data_source) %>%
  kable()
| study_ID   | short_cite           | expt_num | infant_type | group_name_1                  | mean_age_1 | n_1 | x_1  | x_2  | SD_1 | t  | d_calc    | data_source |
|------------|----------------------|----------|-------------|-------------------------------|------------|-----|------|------|------|----|-----------|-------------|
| graham2010 | Graham et al. (2009) | 1        | typical     | experimental - B (no gaze)    | 776.7492   | 21  | 0.54 | 0.33 | 0.22 | NA | 0.9545455 | paper       |
| graham2010 | Graham et al. (2009) | 1        | typical     | experimental - A (no gaze)    | 712.8300   | 22  | 0.61 | 0.33 | 0.25 | NA | 1.1200000 | figure      |
| graham2010 | Graham et al. (2009) | 1        | typical     | experimental - A (consistent) | 736.2681   | 22  | 0.89 | 0.33 | 0.18 | NA | 3.1111111 | figure      |

Grassmann et al. (2015)

“The effect sizes used from this study were based on the Ms and SDs when data from the High Familiarity condition were averaged with those from the Low Familiarity condition. However, many of the trials in the Low Familiarity condition involved familiar objects that the children were unable to produce a label for. Perhaps it would be better to just use the Ms and SDs from the High Familiarity cond. For 2s: M= .791 (.393) [d= .74]; 3s: M= .840 (.572) [d=.594]; 4s: M = .98 (.01) [d = astronomical]. (4s’ effect size here has the ceiling effect problem I described earlier.]”

Thank you, this is a reasonable suggestion. However, as elsewhere, we’ve opted to report the data in a way that reflects the analytical decisions of the original researchers in the interest of not introducing selection bias into the analysis.

ma_c %>%
  filter(study_ID == "grassmann2015") %>%
  select(study_ID, short_cite, expt_num, infant_type,group_name_1, mean_age_1, n_1, x_1, x_2, SD_1, t, d_calc, data_source) %>%
  kable()
| study_ID      | short_cite              | expt_num | infant_type | group_name_1 | mean_age_1 | n_1 | x_1   | x_2 | SD_1  | t     | d_calc    | data_source |
|---------------|-------------------------|----------|-------------|--------------|------------|-----|-------|-----|-------|-------|-----------|-------------|
| grassmann2015 | Grassmann et al. (2015) | 1        | typical     | experimental | 1161       | 22  | 0.714 | 0.5 | 0.235 | 4.267 | 0.9106383 | paper       |
| grassmann2015 | Grassmann et al. (2015) | 1        | typical     | experimental | 1644       | 21  | 0.759 | 0.5 | 0.272 | 4.367 | 0.9522059 | paper       |
| grassmann2015 | Grassmann et al. (2015) | 1        | typical     | experimental | 790        | 23  | 0.682 | 0.5 | 0.188 | 4.653 | 0.9680851 | paper       |

Horst & Samuelson (2008)

“An effect size could not be calculated for two conditions in which the test of the disambiguation effect had only one trial. Would it be kosher to consider the proportion who chose correctly the M correct and then impute an SD from other studies that obtained a similar M correct in the same paradigm? For example, in one condition, .69 answered correctly in a three-choice procedure. There are a good number of studies in the meta-data that used a three-choice procedure, got a mean close to ..69, and report an SD. Use the average SD in these studies? If so, other one-trial tests of the disambiguation effect could be added to the meta-analysis (e.g., Au & Glusman, 1990).”

Thank you for this suggestion; we have added these two conditions. We did not include Au and Glusman (1990) because the procedure was more complicated than the basic ME procedure (i.e., multiple experimenters and a lag between N1 and N2).
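Because each child in these one-trial conditions contributes a single binary response, the SD can be computed directly from the reconstructed 0/1 responses (a sketch, assuming x_1 is the proportion of the n_1 children who chose correctly; the results are consistent with the SD_1 values in the table below):

# Experiment 1B: 0.6875 * 32 = 22 of 32 children chose correctly
sd(c(rep(1, 22), rep(0, 10)))  # ~0.47
# Experiment 1C: 0.60 * 20 = 12 of 20 children chose correctly
sd(c(rep(1, 12), rep(0, 8)))   # ~0.50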

ma_c %>%
  filter(study_ID == "horst2008") %>%
  select(study_ID, short_cite, expt_num, infant_type,group_name_1, num_trials, mean_age_1, n_1, x_1, x_2, SD_1, d, t, d_calc, data_source) %>%
  kable()
| study_ID  | short_cite               | expt_num | infant_type | group_name_1                | num_trials | mean_age_1 | n_1 | x_1    | x_2  | SD_1 | d        | t  | d_calc    | data_source |
|-----------|--------------------------|----------|-------------|-----------------------------|------------|------------|-----|--------|------|------|----------|----|-----------|-------------|
| horst2008 | Horst & Samuelson (2008) | 1        | typical     | experimental - 1A           | 8          | 752.5600   | 16  | NA     | NA   | NA   | 1.860000 | NA | 1.8600000 | paper       |
| horst2008 | Horst & Samuelson (2008) | 2        | typical     | experimental - 2, followin  | 8          | 744.5600   | 16  | 0.7000 | 0.33 | 0.20 | 1.850000 | NA | 1.8500000 | figure      |
| horst2008 | Horst & Samuelson (2008) | 2        | typical     | experimental - 2, ostensive | 8          | 744.5600   | 16  | 0.6400 | 0.33 | 0.15 | 2.066667 | NA | 2.0666667 | figure      |
| horst2008 | Horst & Samuelson (2008) | 1        | typical     | experimental - 1B           | 1          | 746.4851   | 32  | 0.6875 | 0.33 | 0.47 | NA       | NA | 0.7606383 | paper       |
| horst2008 | Horst & Samuelson (2008) | 1        | typical     | experimental - 1C           | 1          | 748.4851   | 20  | 0.6000 | 0.33 | 0.50 | NA       | NA | 0.5400000 | paper       |

Lederberg et al. (2000)

“There is a second paper by Lederberg on the disambiguation effect (Lederberg & Spencer, 2008 J Deaf Studies & Deaf Educ) that should be included in the meta-analysis. Like the 2000 study it examined the disambiguation effect in deaf/hard of hearing children using the Novel-Novel paradigm. It involved a much larger and more diverse sample than the 2000 paper.”

Thank you for pointing us to this paper. We emailed the author and obtained the raw data for the Lederberg and Spencer (2008) study.

Markman, Wasow, & Hansen (2003)

“These seem like incongruent cue conditions. In Study 1, the familiar object was in front of the child, but the novel object was in an opaque bucket. (The children knew that the opaque bucket could contain either a familiar or a novel object.) In Study 2, the familiar object was in the child’s hands and the novel object was in a bucket that was out of reach. In the meta-data, one way in which the effect size was calculated was as the proportion of novel label trials in which the child searched for another object rather than immediately selecting the familiar object. By this metric, the youngest children’s effect sizes were .06 (NS) and -.78 (p < .05) in Studies 1 and 2, respectively. Based on this, one would conclude that this age group does not show the disambiguation effect (and shows a reverse disambiguation effect when the bucket is out of reach). However, Markman et al. calculated the effect differently. They compared the proportion of searches on novel label trials to the proportion of searches on no label trials (when simply asked to “find one.’) By this metric, the effect size was .70 (p < .05) and -.26 (NS) in Studies 1 and 2, respectively. From this, one would conclude that this age group does show a disambiguation effect when the bucket is within reach. I would recommend either dropping these conditions because they involve incongruent cues or using Markman et al.’s measure of effect size. (In the meta-data, another way in which the effect size was calculated was as 1-minus-the proportion of novel label trials in which children selected the familiar object. [Sometimes a child searched a bit on a novel label trial, but ultimately selected the familiar object.] Markman et al. also examined the difference between novel label and no label trials on this dependent variable. The latter again yields the higher estimates of effect size. Regardless of whether Markman et al.’s way of calculating effect size is adopted, I would not make two ways of calculating effect size separate entries in the meta-data. The two are not independent and are likely highly correlated.)”

Thank you. Upon reconsideration, we agree that the additional cues present in these two studies are incongruent, and we have excluded them.

Mather & Plunkett (2009)

“This study used proportion of time looking at the novel object in a two-choice procedure as the measure of disambiguation. In the paper, this proportion was analyzed separately for the first time the novel labels were tested and the second time they were tested (repeat trials). The disambiguation effect was stronger on the repeat trials. The meta-data has separate entries for the two d’s. (So these were not independent conditions.) The SDs I derived from the paper are about one-tenth the size of those listed in the meta-data, and so the d’s I calculated are about ten times larger. I believe there is a decimal error in the meta-data. In the entire meta-data set, no other SDs come close to the ones listed for this report. Calculating d for this study is a bit complicated because not only are initial trials separated from repeat trials in the report, but proportion of looking at the novel object is presented separately for the 1.5 sec that immediately followed presentation and processing of the label and for the 1.5 sec after that. I recommend just including an estimation of the disambiguation effect for the first 1.5 sec because it was stronger in this period (and on trials in which familiar labels were tested, proportion of looking to the familiar object was also stronger during this period). I believe that this is what the authors of the meta-analysis did. Based on the figures presented in the paper, I got: for the 19-month-olds, M = .49, SD =.226, d = - .044 for initial trials and M = .51, SD =.226, d = .044 for repeat trials; for the 22-month-olds, M = .53, SD =.198, d = .152 for initial trials and M = .58, SD =.177, d = .452 for repeat trials.”

Thank you for catching this. There was indeed an error here; the SDs have now been corrected. The multi-level modeling approach we have now adopted addresses the issue of non-independence between conditions.
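For concreteness, a minimal sketch of what that specification could look like with the metafor package (assuming an effect-size sampling-variance column, here called d_var_calc; short_cite and same_infant are the paper and participant-group identifiers shown in the tables):

library(metafor)

# effect sizes nested within participant groups (same_infant) within papers (short_cite),
# so conditions computed from the same sample are not treated as independent
ma_model <- rma.mv(yi = d_calc, V = d_var_calc,
                   random = ~ 1 | short_cite / same_infant,
                   data = ma_c)
summary(ma_model)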

ma_c %>%
  filter(study_ID == "mather2009") %>%
  select(study_ID, short_cite, expt_num, infant_type, same_infant, group_name_1, mean_age_1, n_1, x_1, x_2, SD_1, t, d_calc, data_source) %>%
  kable()
| study_ID   | short_cite               | expt_num | infant_type | same_infant      | group_name_1                  | mean_age_1 | n_1 | x_1  | x_2 | SD_1  | t  | d_calc    | data_source |
|------------|--------------------------|----------|-------------|------------------|-------------------------------|------------|-----|------|-----|-------|----|-----------|-------------|
| mather2009 | Mather & Plunkett (2009) | 1        | typical     | mather2009_young | experimental                  | 608.40     | 35  | 0.50 | 0.5 | 0.184 | NA | 0.0000000 | figure      |
| mather2009 | Mather & Plunkett (2009) | 1        | typical     | mather2009_young | experimental - young, repeats | 608.40     | 35  | 0.53 | 0.5 | 0.208 | NA | 0.1442308 | figure      |
| mather2009 | Mather & Plunkett (2009) | 1        | typical     | mather2009_old   | experimental - old, new       | 699.66     | 32  | 0.51 | 0.5 | 0.199 | NA | 0.0502513 | figure      |
| mather2009 | Mather & Plunkett (2009) | 1        | typical     | mather2009_old   | experimental - old, repeats   | 699.66     | 32  | 0.58 | 0.5 | 0.176 | NA | 0.4545455 | figure      |

Merriman et al. (1989)

“At some point in the citation life of my monograph with Bowman, people started attaching MacWhinney’s name as the third author. Brian was not an author. He did write a brief commentary at the end of the monograph, which is a common practice for SRCD monographs. Google Scholar now lists it as Merriman, Bowman, & MacWhinney (1989). Arghh! Please list the correct citation in the manuscript as well as in metalab.”

Apologies, this has now been corrected.

Merriman, Marazita, & Jarvis (1993)

“When the first author contacted me for information about SDs for my studies, this was the only one that I could not locate in my files. I have since found it. If one averages trials in which token novelty opposed the disambiguation effect with those in which it did not, M = .85, SD = .18, and so d = 1.94. If one just uses the trials in which no cue opposed the disambiguation effect, M = .87, SD = .16, and so d = 2.31.”

Thank you for these data. We have now included this condition in the database.

ma_c %>%
  filter(study_ID == "merriman1993") %>%
  select(study_ID, short_cite, expt_num, infant_type,group_name_1, mean_age_1, n_1, x_1, x_2, SD_1, t, data_source, d_calc) %>%
  kable()
| study_ID     | short_cite                         | expt_num | infant_type | group_name_1 | mean_age_1 | n_1 | x_1  | x_2 | SD_1 | t  | data_source  | d_calc |
|--------------|------------------------------------|----------|-------------|--------------|------------|-----|------|-----|------|----|--------------|--------|
| merriman1993 | Merriman, Marazita & Jarvis (1993) | 2        | typical     | experimental | 1491.407   | 32  | 0.87 | 0.5 | 0.16 | NA | paper/author | 2.3125 |

Mervis & Bertrand (1995)

“There are two other papers by Mervis that should be added to the meta-analysis. Mervis & Bertrand (1994 CD Study 1) used a 5-choice version of the Novel-Novel paradigm with 15- to 20-month-olds. They did not report overall Ms and SDs because groups were bimodal. Sixteen children got .90 correct (SD = .13) and 16 got .11 correct (SD = .13). So the overall M was .555, with chance = .20. If you can’t get the SD for the overall data from Mervis, you could probably get a reasonable estimate of it. Create two normally distributed sets of 16 random cases, where one set has M = .90 and SD = .13 and the other M = .11 and SD = .13, then calculate the overall SD of the combined sets. The other paper is Mervis & Bertrand (1995 JCL Vol 22 461-468). It reports a longitudinal study of 3 late talkers that includes assessment of the disambiguation effect with the Novel-Novel paradigm.”

Thank you for this suggestion. We have simulated the distribution for Study 1 of Mervis and Bertrand (1994) and included this condition in the database (estimated SD = .42). Unfortunately, Mervis & Bertrand (1995) does not include descriptive statistics for the three children’s performance on the disambiguation task.
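A minimal sketch of that simulation (the seed is arbitrary; the published estimate reflects a single draw):

set.seed(123)
combined <- c(rnorm(16, mean = 0.90, sd = 0.13),  # the 16 children with M = .90
              rnorm(16, mean = 0.11, sd = 0.13))  # the 16 children with M = .11
sd(combined)  # close to the estimated SD of .42 used below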

ma_c %>%
  filter(study_ID == "mervis1994b") %>%
  select(study_ID, short_cite, expt_num, infant_type,group_name_1, mean_age_1, n_1, x_1, x_2, SD_1, t, data_source, d_calc) %>%
  kable()
| study_ID    | short_cite                           | expt_num | infant_type | group_name_1 | mean_age_1 | n_1 | x_1  | x_2 | SD_1 | t  | data_source               | d_calc    |
|-------------|--------------------------------------|----------|-------------|--------------|------------|-----|------|-----|------|----|---------------------------|-----------|
| mervis1994b | Mervis, C. B., & Bertrand, J. (1994) | 1        | typical     | experimental | 524.1231   | 32  | 0.55 | 0.2 | 0.42 | NA | paper - some calculations | 0.8333333 |

Momen & Merriman (2002)

“The meta-analysis included the results of the disambiguation test reported in Study 2, but not the one from Study 3. For the latter, M age = 26 months, n = 20, M correct = .78, SD = .29, d = .966. Both studies used the 2-choice Familiar-Novel paradigm.”

Thank you, we have now included the Study 3 data in the database.

ma_c %>%
  filter(study_ID == "momen2002") %>%
  select(study_ID, short_cite, expt_num, infant_type, group_name_1,
         mean_age_1, n_1, x_1, x_2, SD_1, t, data_source, d_calc) %>%
  kable()
| study_ID  | short_cite              | expt_num | infant_type | group_name_1 | mean_age_1 | n_1 | x_1       | x_2 | SD_1      | t  | data_source  | d_calc    |
|-----------|-------------------------|----------|-------------|--------------|------------|-----|-----------|-----|-----------|----|--------------|-----------|
| momen2002 | Momen & Merriman (2002) | 2        | typical     | experimental | 760.9219   | 16  | 0.8333333 | 0.5 | 0.2966667 | NA | paper        | 1.1235953 |
| momen2002 | Momen & Merriman (2002) | 3        | typical     | experimental | 791.3589   | 20  | 0.7800000 | 0.5 | 0.2900000 | NA | paper/author | 0.9655172 |

Preissler & Carey (2005)

“The meta-analysis included the effect size for the autistic children, but not the effect size for the typically developing children. I assume that Preissler and/or Carey supplied the SD for the autistic children. If they cannot supply it for the typically developing, could it be imputed by the method discussed earlier?”

The SD for the group with autism was estimated based on the raw counts reported in the text. No such raw counts were reported for the typically developing group, but we have now imputed a value, as elsewhere, and this group is included.
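As an illustration of how an SD can be recovered from raw counts (the numbers here are hypothetical, not the actual counts from Preissler & Carey, 2005): if the text reports how many children obtained each number of correct choices, the per-child proportions can be reconstructed and their SD computed directly.

# hypothetical: 6 children with 4/4 correct, 3 with 3/4, 3 with 2/4
n_correct    <- rep(c(4, 3, 2), times = c(6, 3, 3))
prop_correct <- n_correct / 4  # assuming 4 test trials per child
mean(prop_correct)
sd(prop_correct)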

Scofield & Williams (2009)

“The paper is listed at metalab, but no effect size is listed. Perhaps the authors did not supply the SD? Could you impute? The effect can’t be big because M correct (.56) was very close to chance (.50).”

This effect size was missing because we did not have an SD value. We obtained this information from the authors via email.

Spiegel & Halberda (2011)

“This paper is listed at metalab, but is not listed in the References section of the manuscript. Was it included in the meta-analysis?”

This paper was included in the meta-analysis, and we have now added it to the citation list.

Sugimura & Sato (1996)

“No d is listed for this study in the meta-data because there was no variance. Apparently, all 48 children chose the novel object on every trial in a four-choice paradigm. This result is a case of the ceiling problem in extremis. It does not seem like a good idea to exclude the study, however. The mean age was among the highest in the meta-data and so dropping it may reduce the model’s estimate of the the effect size at that age. Perhaps it would be kosher to assign an SD of .10 to every study that has an SD of .10 or less. Or replace every effect size that is more than 3 SDs greater than the mean of the effect sizes in the meta-data to a value that equals 3 SDs greater than this mean. Perhaps some of the ideas in the chapter in Tabachnick and Fidell’s multivariate statistics text on handling outliers in ordinary data would be applicable here.”

We have made the analytical decision to use only SDs that can be derived from data reported within a paper.

Vincent-Smith et al. (1974)

“The d’s for the two age groups in this report were not included in the meta-data. I assume that it was not possible to get the SDs from the authors of this old report. Perhaps these could be imputed using the method I described earlier.”

As above, we have made the analytical decision to use only SDs that can be derived from data reported within a paper. The SDs for this paper would be particularly difficult to estimate because it includes many more trials per participant (50) than other conditions in our dataset.

Williams (2009)

“The citation for this dissertation was omitted from the References section of the manuscript. One weird thing about this investigation is that the two groups of typically- developing children overlapped. The group that matched the ASD (autism spec disord) group on age had 16 children and the group that matched the ASD group on language ability had 16 children. However, 6 children belonged to both groups. There were only 28 typically-developing children in all. Not sure how to handle this problem.”

We have added this missing citation. The overlap in children across groups is now handled by the participant-group random effect structure in the multi-level model.

Kalashnikova et al. (2016 First Lang)

“Metalab lists the M in this study as .04 with SD = .054. It computes d as .04/.054 = .738. This is unusual because in the other two-choice eyetracking studies in metalab, M is the proportion of time looking at the novel object (which is usually greater than .50), chance is .50, and d is calculated as the difference between these two proportions/SD. If this method were adopted here, M = .58, SD = .27, and so d = .08/.27 = .296. (The .04 that metalab lists as the M might represent the change in looking at the novel object from baseline to test, but no SD is reported in the article for this difference score.)”

There are actually 10 2-AFC eyetracking studies in the database. As noted, there are two ways one could compute an effect size from eye-tracking data: (1) proportion of looks to the target (analogous to the pointing task), or (2) change in looking relative to baseline. We indicated which of these two approaches we took in coding each effect size in the “dependent measure” column (“target_selection” vs. “looking_time_change”). Seven were coded using the second approach; four, including the Kalashnikova (2016b) paper, were coded using the first approach.

We agree that the first approach is more similar to the measure used in the pointing task. On the other hand, there are demands particular to the eyetracking task that make it important to take baseline differences in the salience of the items into account. Regardless of these motivations, in most cases information was reported for only one type of measure. In the few cases where data were available for either measure, we used the measure originally reported by the authors.
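To make the contrast concrete, here are the two computations for this study using the values quoted above (illustrative only; the second line follows the reviewer’s interpretation of the .04 value as a baseline-to-test change):

# (1) proportion of looks to the novel object vs. chance ("target_selection")
(0.58 - 0.50) / 0.27  # d of roughly 0.30
# (2) change in looking to the novel object from baseline to test ("looking_time_change")
0.04 / 0.054          # d of roughly 0.74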

Kalashnikova et al. (2016 Lang Learn Dev)

“Metalab lists this as an eyetracking study, but it is a pointing study. Metalab combines the results for two experiments. The rule for combining experiments is not clear. The mean ages of the groups in each experiment are very close, and the methods differed in only one respect – whether the videos just presented objects and an audio, or whether they also included the face of a speaker (synch’d with the audio). Results were nearly identical in the two experiments. There are three other papers by Kalashnikova and colleagues that should be added to the meta-analysis: [1] Kalashnikova, M., Mattock, K., & Monaghan, P. (2014). Disambiguation of novel labels and referential facts: A developmental perspective. First Language, 34(2), 125-135. [2] Kalashnikova, M., Mattock, K., & Monaghan, P. (2015). The effects of linguistic experience on the flexible use of mutual exclusivity in word learning. Bilingualism: Language and Cognition, 18(4), 626-638. [3] Kalashnikova, M., Escudero, P., & Kidd, E. (2018). The development of fast‐mapping and novel word retention strategies in monolingual and bilingual infants. Developmental science, 21(6), e12674.”

Thank you - we have corrected the response_model coding for this study. Our general rule was to keep conditions and studies separate where possible. However, in this case, descriptive statistics (means and standard deviations) were reported only for the combined data from Experiments 1 and 2, and so we calculated a single effect size from these data.

We have added Kalashnikova et al. (2014). We have not included Kalashnikova et al. (2018) because we restricted our study to papers published up to November 2017 (in addition to this paper, a handful of other studies have been published since we completed our meta-analysis). We also did not include Kalashnikova et al. (2015) because it did not use the standard paradigm (FFNN trials and multiple selections per trial).

Wall, Merriman, & Scofield (2015)

“The first two lines in the list for this article in metalab represent the results for the two age groups in Exp. 1. I already explained why I think this experiment involved incongruent cue conditions (where the cue was “discovery status”). The next lines represent the results for two age groups in Exp. 2. Actually, each age group in that experiment was in one of two conditions, but the results are pooled in metalab. Again, it is not clear what the rule for combining conditions is. Finally, metalab omitted the results of Exp. 3 from the article. The stats for Exp. 3 were M age = 54 mos., n = 24, M correct = .915, SD = .24, d = 1.73. If Exp. 1 is not considered an incongruent cue condition, then the meta-analysis ought to also include the results of the experiments reported in Scofield, J., Hernandez-Reif, M., & Keith, A. B. (2009). Preschool children’s multimodal word learning. Journal of Cognition and Development, 10(3), 06–333. Wall et al. (2015) was a follow-up to this article. Recently, Scofield, Wall, and I published experiments analogous to those in Wall et al. (2015), but involving the opposite direction of cross-modal transfer. We trained a label for an object the child could touch, but not see, then asked the child to pick the referent of a novel label from a pair of objects they could see, but not touch. One choice was the cross-modal match and the other was a novel object. The citation is: Scofield, J., Merriman, W. E., & Wall, J. L. (2018). The effect of a tactile-to-visual shift on young children’s tendency to map novel labels onto novel objects. Journal of Experimental Child Psychology, 172, 1-12."

We have now omitted this study because it used an atypical learning task (i.e., children did not have visual access to the objects).

Law & Edwards (2014 Lang Learn & Dev)

“This study is not listed in the meta-data. Although its primary focus was children’s referent selections for mispronounced familiar labels, it did include novel word conditions. Their analyses are quite unique – fitting to growth curves and examining the Akaike Information Criterion and the Bayesian Information Criterion. However, they do report a t test of the slope of the decline in proportion of looking at the familiar object (rather than the novel object) after onset and processing of the novel word. Could an estimate of d be derived from this t value? Alternatively, a figure in the paper shows that proportion of looking at the novel object reached an asymptote of .25 or so. Could an SD for this value be imputed from the other eyetracking studies?”

Thank you for pointing us to this paper. Unfortunately, it is not clear how to extract an effect size from these data in a way that would be meaningfully comparable to the AFC measures used in all other conditions in the meta-analysis. As elsewhere, we have not included papers with no reported estimate of SD.