1 The Replication Debate and Developmental Science
2 The ManyBabies project.
- 2.1 What is it for? Burning developmental psychology to the ground?
- 2.2 Improving standards through collaboration on measurement and analysis.
3 Making our science more cumulative through meta-analysis
- 3.1 A case study of infant rule learning
- 3.2 Can attending to cumulativity help?

1 The Replication Debate and Developmental Science

The core ManyBabies team

The modal study in infant cognition research:

A looking time study, comparing gaze behavior at different ages
16 participants per age group
A maximum of 4 trials per condition
An emphasis on between-subjects conditions, particularly for comparing across age groups.
A high rate of data exclusion, particularly subjective data exclusion – this baby was fussy, that baby was sleepy.
Unstandardized testing setups:
- Basic differences between labs.
- Large differences in subject populations.
- Differences in experimenter “quality”.
- Huge differences in how stimuli are presented.
- “Subjective” stimulus presentation.
- Unstandardized measures, whose validity and reliability are very hard to assess.
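To make the sample-size point concrete, here is a rough sketch of the power of a between-subjects comparison with 16 participants per group. It uses a normal approximation to the two-sample test, which slightly overstates power relative to a t-test. The effect sizes in the loop are illustrative assumptions, not values from the studies discussed here, and the function name `two_group_power` is ours.

```python
from statistics import NormalDist

def two_group_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation).

    d is the assumed true standardised mean difference between groups.
    """
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true difference is d.
    shift = d * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return (1 - std_normal.cdf(crit - shift)) + std_normal.cdf(-crit - shift)

# Illustrative (assumed) effect sizes, with the modal 16 infants per group.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: approximate power = {two_group_power(d, 16):.2f}")
```

Under these assumptions, a medium-sized group difference would be detected well under half the time.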

The setup for Onishi & Baillargeon (2005)

The setup for the Baillargeon lab (mid-90s)

Xu & Spelke (2000) method

Xu & Spelke (2000): discrimination (8 vs. 16)

Xu & Spelke (2000): no discrimination (8 vs. 12)

2 The ManyBabies project.

2.1 What is it for? Burning developmental psychology to the ground?

No.
We can’t replicate the Reproducibility Project in infant research. In fact, there is reason to think that doing so would not be a productive use of time and money.

A p-curve of infant cognition findings, by Christina Bergmann and MetaLab

It is better to try to understand variability: in our measurement tools, in our labs, and between our participants.
To do that, the only plausible route is to focus on a deep investigation of a single topic (with further investigations in the future).

2.2 Improving standards through collaboration on measurement and analysis.

Increasing participant pool size and diversity
Understanding the relationship between different infant paradigms
Ensuring common standards and best practices across labs
How should we exclude participants?
How should we pre-process data?
How should we vary paradigms for infants of different ages?

IDS (infant-directed speech) meta-analysis

3 Making our science more cumulative through meta-analysis

3.1 A case study of infant rule learning

## 
## Multivariate Meta-Analysis Model (k = 56; method: REML)
## 
## Variance Components: 
## 
##             estim    sqrt  nlvls  fixed     factor
## sigma^2.1  0.0000  0.0000     10     no        lab
## sigma^2.2  0.1137  0.3372     17     no  lab/study
## 
## Test for Heterogeneity: 
## Q(df = 55) = 239.3490, p-val < .0001
## 
## Model Results:
## 
## estimate       se     zval     pval    ci.lb    ci.ub          
##   0.3164   0.0884   3.5785   0.0003   0.1431   0.4897      *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
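The "Model Results" row can be checked by hand. Under the usual Wald (normal) approximation, the z value is the estimate divided by its standard error, and the 95% interval is the estimate plus or minus about 1.96 standard errors. The sketch below uses only the estimate and standard error printed above. The function name `wald_summary` is ours, not part of the original analysis, and small discrepancies in the last digits are expected because the printed inputs are rounded.

```python
from statistics import NormalDist

def wald_summary(estimate, se, level=0.95):
    """Wald z test and confidence interval for a single coefficient."""
    std_normal = NormalDist()
    z = estimate / se
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - std_normal.cdf(abs(z)))
    # Critical value for the requested confidence level (about 1.96 for 95%).
    crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    return z, p, estimate - crit * se, estimate + crit * se

# Estimate and standard error copied from the model output above.
z, p, ci_lb, ci_ub = wald_summary(0.3164, 0.0884)
print(f"z = {z:.2f}, p = {p:.4f}, 95% CI [{ci_lb:.3f}, {ci_ub:.3f}]")
```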

3.1.1 Basic random effects regression

We plot forest and funnel plots for this first random effects regression.

3.1.2 Moderated random effects regression

## 
## Multivariate Meta-Analysis Model (k = 56; method: ML)
## 
## Variance Components: 
## 
##             estim    sqrt  nlvls  fixed     factor
## sigma^2.1  0.0022  0.0472     10     no        lab
## sigma^2.2  0.1125  0.3354     17     no  lab/study
## 
## Test for Residual Heterogeneity: 
## QE(df = 53) = 210.8762, p-val < .0001
## 
## Test of Moderators (coefficient(s) 2,3): 
## QM(df = 2) = 24.9332, p-val < .0001
## 
## Model Results:
## 
##                 estimate      se     zval    pval    ci.lb    ci.ub     
## intrcpt           0.0696  0.1032   0.6752  0.4996  -0.1325   0.2718     
## scale(age)       -0.1536  0.0565  -2.7216  0.0065  -0.2643  -0.0430   **
## modalityspeech    0.4938  0.1071   4.6092  <.0001   0.2838   0.7037  ***
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
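One way to read these coefficients is to turn them into predicted effect sizes. The sketch below does so under two assumptions that the output suggests but does not state: `scale(age)` is age standardised to mean 0 and SD 1, and `modalityspeech` is a 0/1 dummy with non-speech stimuli as the reference level (R's default treatment coding). The function name `predicted_effect` is ours.

```python
# Coefficients copied from the moderated model output above.
COEF = {"intrcpt": 0.0696, "scale(age)": -0.1536, "modalityspeech": 0.4938}

def predicted_effect(age_z, speech):
    """Predicted effect size at standardised age `age_z`.

    `speech` is True for speech stimuli, False for the reference modality.
    """
    return (COEF["intrcpt"]
            + COEF["scale(age)"] * age_z
            + COEF["modalityspeech"] * (1 if speech else 0))

cases = [
    ("non-speech, mean age", 0.0, False),
    ("speech, mean age", 0.0, True),
    ("speech, one SD above mean age", 1.0, True),
]
for label, age_z, speech in cases:
    print(f"{label}: {predicted_effect(age_z, speech):.4f}")
```

On this reading, the predicted effect at mean age is close to zero for non-speech stimuli and about 0.56 for speech stimuli, and it shrinks as age increases.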

Dogs stimuli

## 
## Multivariate Meta-Analysis Model (k = 56; method: ML)
## 
## Variance Components: 
## 
##             estim    sqrt  nlvls  fixed     factor
## sigma^2.1  0.0000  0.0000     10     no        lab
## sigma^2.2  0.1148  0.3388     17     no  lab/study
## 
## Test for Residual Heterogeneity: 
## QE(df = 52) = 202.9058, p-val < .0001
## 
## Test of Moderators (coefficient(s) 2,3,4): 
## QM(df = 3) = 30.1833, p-val < .0001
## 
## Model Results:
## 
##                       estimate      se     zval    pval    ci.lb    ci.ub
## intrcpt                 0.2885  0.1404   2.0546  0.0399   0.0133   0.5637  *
## scale(age)             -0.1366  0.0570  -2.3960  0.0166  -0.2483  -0.0249  *
## modalityspeech          0.2357  0.1550   1.5202  0.1285  -0.0682   0.5395
## semanticsmeaningless   -0.3595  0.1563  -2.2997  0.0215  -0.6658  -0.0531  *
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

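Finally, we test whether the model that includes semantics fits better than the model without it. It does: the likelihood ratio test is significant (LRT = 5.2848, p = 0.0215), and AIC is lower for the full model.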

##         df      AIC      BIC     AICc   logLik    LRT   pval       QE
## Full     6 116.9516 129.1037 118.6659 -52.4758               202.9058
## Reduced  5 120.2364 130.3632 121.4364 -55.1182 5.2848 0.0215 210.8762
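The comparison table can be reproduced from the two log-likelihoods alone. The likelihood ratio statistic is twice the difference in log-likelihood, referred to a chi-square distribution with degrees of freedom equal to the number of extra parameters (here 6 - 5 = 1), and AIC is -2 logLik plus twice the parameter count. A minimal sketch using the printed values; the function names are ours, and `lrt_one_df` only handles a one-parameter difference.

```python
import math

def lrt_one_df(loglik_full, loglik_reduced):
    """Likelihood ratio test for nested models differing by ONE parameter."""
    stat = 2 * (loglik_full - loglik_reduced)
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

def aic(loglik, n_params):
    """Akaike information criterion."""
    return -2 * loglik + 2 * n_params

# Log-likelihoods and parameter counts copied from the table above.
stat, p = lrt_one_df(-52.4758, -55.1182)
print(f"LRT = {stat:.4f}, p = {p:.4f}")
print(f"AIC full = {aic(-52.4758, 6):.4f}, AIC reduced = {aic(-55.1182, 5):.4f}")
```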

3.1.3 p-curve analysis

We can also fit a p-curve: we plot the density of all p-values below 0.05, and then run the Fisher-style test suggested by Simonsohn et al. (2014).
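The logic of the Fisher-style test can be sketched as follows. If there were no true effect, significant p-values would be uniformly distributed between 0 and 0.05, so each p divided by 0.05 (a "pp-value") would be uniform between 0 and 1. Fisher's method combines k such values into -2 times the sum of their logs, which is chi-square distributed with 2k degrees of freedom under that null. A right-skewed p-curve, with many very small p-values, gives a large statistic. The code below is a simplified illustration of that idea, not the analysis code behind these slides, and the p-values in the usage line are hypothetical.

```python
import math

def pcurve_right_skew_test(p_values, alpha=0.05):
    """Fisher-style test for right skew among significant p-values.

    Returns (chi-square statistic, degrees of freedom, combined p-value).
    """
    significant = [p for p in p_values if 0 < p < alpha]
    k = len(significant)
    if k == 0:
        raise ValueError("no significant p-values to test")
    # pp-values: uniform on (0, 1) if there is no true effect.
    pp = [p / alpha for p in significant]
    chi2 = -2 * sum(math.log(v) for v in pp)
    # Chi-square survival function for even df = 2k (closed form):
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = chi2 / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return chi2, 2 * k, math.exp(-half) * total

# Hypothetical p-values; the non-significant one (0.2) is ignored.
chi2, df, p = pcurve_right_skew_test([0.001, 0.004, 0.012, 0.03, 0.2])
print(f"chi2({df}) = {chi2:.2f}, p = {p:.4f}")
```

A small combined p-value indicates more right skew than expected if there were no true effect.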

## Warning: Removed 21 rows containing non-finite values (stat_bin).

## Warning: Removed 2 rows containing missing values (geom_path).

## Warning: Removed 21 rows containing non-finite values (stat_bin).

## Warning: Removed 4 rows containing missing values (geom_path).

3.2 Can attending to cumulativity help?

Rules results

Rules Edinburgh Princeton Comparison

Replication in Developmental Science

Hugh Rabagliati

24/03/2017