This document shows an analysis of MB1 pilot data for purposes of procedural and analytic decision-making. It includes analysis of demographics and looking times, as well as condition differences.

Please forgive errors, omissions, lack of clearly labeled axes, etc. This document was produced quickly to provide an initial guide for decision-making.

Data formatting: Data upload script plus templates are here. They are designed to make merging demographic data easy. Please take a look.

Exclusion criteria: Are listed in the manuscript:

Participants must be: - monolingual (>90%) - full term (37+ weeks) - looking on at least one pair of test trials (a matched IDS/ADS pair)

Trials must be: - longer than 2s (to allow viewing of the stimulus)

The goal of this document is to provide a fully implemented draft of the analysis, conforming to the details of the specification in the manuscript. We will pre-register this document along with the manuscript, but the manuscript is definitive. For example, some aspects of the analysis were not possible using the pilot data and have not been implemented here. We expect there to be some minor but significant changes to the analysis once we have data from the full study (for example, inclusion of item effects).

1 Analytic Preliminaries

options(dplyr.width = Inf)
knitr::opts_chunk$set(message = FALSE, warning = FALSE, cache = FALSE)
library(tidyverse)

## Warning: package 'ggplot2' was built under R version 3.3.2

library(knitr)
library(langcog)
library(metafor)
library(lme4)

theme_set(theme_bw())

Loading data that were initially parsed in munge_pilot_data.Rmd. Add language predictor by hand (this will of course be added programmatically in the final dataset, from the labs sheet).

d <- suppressWarnings(read_csv("processed_data/pilot_data.csv", col_types = cols()))

# add NAE by hand for now
d$language <- "NAE"
d$language[d$lab %in% c("brookes","plymouth","lpp-paris")] <- "non-NAE"
d$language <- factor(d$language, levels = c("NAE","non-NAE"))

How many participants in each group?

d %>%
  group_by(lab, age_days, subid) %>%
  distinct %>%
  group_by(lab) %>%
  summarise(n = n(), 
            age_months = mean(age_days, na.rm=TRUE)/30.3) %>%
  kable(digits = 1)

lab	n	age_months
brookes	5	12.8
lpp-paris	3	8.5
plymouth	9	10.0
stanford	14	13.9
ubc	34	5.2

1.1 Exclusions

No demographic exclusions are implemented in this section.

We exlude trials with looking time below 2s.

We also (later) will exclude children who do not have one pair of included trials (one matched set of IDS/ADS)

lt_totals <- d %>%
  group_by(lab, subid) %>%
  summarise(total_looking_time = sum(looking_time, na.rm=TRUE))

d <- d %>%
  left_join(lt_totals) %>%
  filter(total_looking_time != 0, 
         !is.na(total_looking_time), 
         trial_type %in% c("IDS","ADS")) %>%
  mutate(looking_time = ifelse(looking_time < 2, NA, looking_time))
  
total_trials <- d %>%
  group_by(lab, subid) %>%
  summarise(n_trials = sum(!is.na(looking_time))) 

qplot(n_trials, fill = lab, data= total_trials)

1.2 Demographics

What’s our participant age distribution?

subs <- d %>%
  group_by(lab, subid, age_days) %>%
  distinct

qplot(age_days/30.3, binwidth = 1, fill = lab, data=subs) + 
  xlab("Age (months)")

1.3 Looking time dynamics

First, the overall distribution of looking times. Note, Brookes data goes to 30s due to an error in coding the experiment.

qplot(looking_time, fill = lab, facets = ~ lab, binwidth = 1, data = d) + 
  xlim(0,30) + 
  xlab("Looking time (s)")

Next, age analysis.

qplot(age_days, looking_time, col = lab, data = d) + 
  geom_smooth(aes(group=1), method = "lm")

1.4 Child outcomes

Next, are children making it through the experiment? Once exclusions are computed, we see that many kids are habituating and are not making it throughout the study.

d %>%
  group_by(lab, subid) %>%
  summarize(max_trial = max(trial_num[!is.na(looking_time)])) %>%
  summarise(prop_finishing = mean(max_trial == 8)) %>%
  kable(digits = 2)

lab	prop_finishing
brookes	0.60
lpp-paris	1.00
plymouth	0.67
stanford	0.45
ubc	0.76

Now, histogram of looking time by trial number. Looks like looking times are declining across trials, but not tremendously.

ms <- d %>%
  group_by(trial_num) %>%
  summarise(looking_time = mean(looking_time, na.rm=TRUE))

ggplot(d, aes(x = looking_time)) + 
  geom_histogram(binwidth = 1, aes(fill = lab)) + 
  geom_vline(data = ms, aes(xintercept = looking_time), lty = 2) + 
  facet_wrap(~trial_num) + 
  xlim(c(0,30))

Plot means. Note that this graph has survivorship bias – namely, those observations later in the graph represent kids that had more trials.

ms <- d %>%
  group_by(lab, trial_num) %>%
  multi_boot_standard(col = "looking_time", na.rm=TRUE)

ggplot(ms, aes(x = trial_num, y = mean, col = lab)) + 
  geom_line() + 
  geom_linerange(aes(ymin = ci_lower, ymax = ci_upper), 
                 position = position_dodge(width = .1))

2 IDS-ADS condition differences

We will be pursuing a within-subjects analysis approach, so almost all of our analysis will happen over pairs of trials. The downside of this approach is that if you are missing data from one of the trials in a pair, you are missing the difference score for that pair.

diffs <-  d %>%
  select(-total_looking_time) %>%
  spread(trial_type, looking_time) %>%
  mutate(diff = IDS - ADS)

3 Distribution

What’s the distributional form of these difference score data?

qplot(diff, binwidth = 1, 
      data = filter(diffs, !is.na(diff))) + 
  geom_vline(xintercept = mean(diffs$diff), col = "red", lty = 2)

Interestingly, it’s not skewed, but it does have very overdispersed shape with a big strong peak in the middle and then long tails).

Note spike near 0 is not due to low-looking time Stanford kids because LTs < 2s have been removed. This is legitimate data.

But: 0.36 of LTs have missing data. That’s problematic.

diffs %>%
  group_by(lab) %>%
  summarise(missing = mean(is.na(diff))) %>%
              kable(digits = 2)

lab	missing
brookes	0.60
lpp-paris	0.69
plymouth	0.28
stanford	0.89
ubc	0.15

Stanford data are almost all missing! In hindsight, kids were old, the eye-tracker didn’t pick them up, looks were very short, etc. LPP-Paris’s HPP procedure also resulted in significant missing data but this is only a very small number of kids.

4 IDS-ADS difference patterns

How does the IDS-ADS difference change with trials?

ms_diff <- diffs %>%
  group_by(lab, language, trial_num) %>%
  multi_boot_standard(col = "diff", na.rm=TRUE)

ggplot(ms_diff, aes(x = trial_num, y = mean)) +
         geom_smooth(se = FALSE, span = 2) + 
  facet_wrap(~lab) + 
  geom_pointrange(aes(ymin = ci_lower, ymax = ci_upper), 
                 position = position_dodge(width= .1)) +
  ylab("IDS preference (s)") + 
  geom_hline(yintercept = 0, lty = 2)

Brookes and Stanford both have a number of trials where there is essentially no data to estimate. In contrast, if anything, UBC shows hints of a preference by the end of the study, with longer looks all throughout (and much younger babies).

How does difference change with age? (by subject)

mss_diffs <- diffs %>%
  group_by(lab, language, subid) %>%
  summarise(age_days = mean(age_days), 
            diff = mean(diff, na.rm=TRUE))

qplot(age_days, diff, col = lab, group = 1, data = mss_diffs) + 
  geom_smooth(method = "lm") + 
  geom_hline(yintercept = 0, lty = 2) + 
  ylab("IDS preference (s)")

5 Meta-analytic approach

Following suggestions by Alex Cristia, who argued that this is a more straightforward approach and also has been followed in ManyLabs and the RRRs previously. In addition, it doesn’t require knowing the full form of the required mixed-effects model (e.g., trial order effects, age x trial order interactions, quadratic habituation, etc.).

Compute effect size for each lab. This analysis follows the recommendation in Jake Westfall’s blogpost, which says that “classic” Cohen’s \(d\) is the difference of the means divided by the pooled standard deviation across conditions. We also compute \(d_z\), the more standard within-subjects analysis, which we expect is more like what we recover from the previous meta-analytic work.

We will be using \(d_z\) in the final meta-analysis.

source("ma_helper.R")

ages <- d %>%
  group_by(lab, method, subid) %>%
  summarise(age_days = mean(age_days)) %>%
  summarise(age_days = mean(age_days))
  
ds_classic <- diffs %>%
  group_by(lab) %>%
  summarise(d_classic = mean(IDS - ADS, na.rm=TRUE) / 
              sqrt(mean(diag(var(cbind(IDS, ADS), na.rm=TRUE)))), 
            n = length(unique(subid)), 
            d_classic_var = d_var_calc(n, d_classic)) %>%
  left_join(ages)

ds <- diffs %>%
  group_by(lab, method, subid) %>%
  summarise(d = mean(diff,na.rm = TRUE))%>%
  group_by(lab) %>%
  summarise(d_z = mean(d, na.rm=TRUE) / sd(d, na.rm = TRUE), 
            n = length(unique(subid)), 
            d_z_var = d_var_calc(n, d_z)) %>%
  left_join(ages)

Compare the two effect size measures.

ds_comp <- left_join(ds_classic, ds)

ggplot(ds_comp, aes(x = d_classic, y = d_z)) + 
  geom_point(aes(size = n)) + 
  geom_linerange(aes(ymin = d_z - d_z_var, ymax = d_z + d_z_var)) + 
  geom_errorbarh(aes(xmin = d_classic - d_classic_var, xmax = d_classic + d_classic_var), height = 0) + 
  geom_smooth(method = "lm", se=FALSE) +
  geom_abline(lty = 2, slope = 1, intercept = 0) + 
  ylim(0,2) + 
  xlim(-.2,2)

Adopting \(d_z\), plot by age and method.

ggplot(ds, aes(x = age_days, y = d_z)) + 
  geom_point(aes(size = n, col = method)) + 
  geom_linerange(aes(ymin = d_z - 1.96 * sqrt(d_z_var), 
                     ymax = d_z + 1.96 * sqrt(d_z_var), col = method)) + 
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey") +
  geom_smooth(method = "lm") + 
  scale_colour_solarized() +
  scale_size_continuous(guide = FALSE) +
  xlab("Mean Subject Age (Days)") +
  ylab("Effect Size") + 
  theme(legend.position= "bottom")

Model with no age moderation.

mod <- metafor::rma(d_z ~ 1, vi = d_z_var, slab = lab, data = ds, method = "REML") 
summary(mod)

## 
## Random-Effects Model (k = 5; tau^2 estimator: REML)
## 
##   logLik  deviance       AIC       BIC      AICc  
##  -4.6080    9.2161   13.2161   11.9887   25.2161  
## 
## tau^2 (estimated amount of total heterogeneity): 0.1913 (SE = 0.3151)
## tau (square root of estimated tau^2 value):      0.4374
## I^2 (total heterogeneity / total variability):   44.57%
## H^2 (total variability / sampling variability):  1.80
## 
## Test for Heterogeneity: 
## Q(df = 4) = 7.4554, p-val = 0.1137
## 
## Model Results:
## 
## estimate       se     zval     pval    ci.lb    ci.ub          
##   0.7851   0.3041   2.5818   0.0098   0.1891   1.3812       ** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

f <- fitted(mod)
p <- predict(mod)

alpha <- .05

forest_data <- data.frame(effects = as.numeric(mod$yi.f),
                          variances = mod$vi.f) %>%
  mutate(effects.cil = effects -
           qnorm(alpha / 2, lower.tail = FALSE) * sqrt(variances),
         effects.cih = effects +
           qnorm(alpha / 2, lower.tail = FALSE) * sqrt(variances),
         estimate = as.numeric(f),
         lab = names(f),
         estimate.cil = p$ci.lb,
         estimate.cih = p$ci.ub,
         inverse_vars = 1/variances,
         identity = 1) %>%
  left_join(ds) 

qplot(lab, effects, ymin = effects.cil, ymax = effects.cih,
        geom = "linerange",
        data = forest_data) +
  geom_point(aes(y = effects, size = inverse_vars, col = method)) +
  geom_linerange(aes(ymin = effects - sqrt(variances)*1.96, 
                     ymax = effects + sqrt(variances)*1.96)) +
  # geom_pointrange(aes(x = lab, y = estimate,
  #                            ymin = estimate.cil, ymax = estimate.cih),
  #                 pch = 17) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "grey") +
  coord_flip() +
  scale_size_continuous(guide = FALSE) +
  scale_colour_solarized() +
    xlab("Lab") +
    ylab("Effect Size") + 
  theme(legend.position= "bottom")

Model with age moderation.

mod <- metafor::rma(d_z ~ scale(age_days, scale=FALSE), 
                    vi = d_z_var, slab = lab, data = ds, method = "REML") 
summary(mod)

## 
## Mixed-Effects Model (k = 5; tau^2 estimator: REML)
## 
##   logLik  deviance       AIC       BIC      AICc  
##  -3.0411    6.0821   12.0821    9.3780   36.0821  
## 
## tau^2 (estimated amount of residual heterogeneity):     0.0000 (SE = 0.2518)
## tau (square root of estimated tau^2 value):             0.0020
## I^2 (residual heterogeneity / unaccounted variability): 0.00%
## H^2 (unaccounted variability / sampling variability):   1.00
## R^2 (amount of heterogeneity accounted for):            100.00%
## 
## Test for Residual Heterogeneity: 
## QE(df = 3) = 2.8237, p-val = 0.4196
## 
## Test of Moderators (coefficient(s) 2): 
## QM(df = 1) = 4.6315, p-val = 0.0314
## 
## Model Results:
## 
##                                 estimate      se    zval    pval   ci.lb
## intrcpt                           0.8344  0.2196  3.7996  0.0001  0.4040
## scale(age_days, scale = FALSE)    0.0037  0.0017  2.1521  0.0314  0.0003
##                                  ci.ub     
## intrcpt                         1.2648  ***
## scale(age_days, scale = FALSE)  0.0071    *
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model with language effect.

mod <- metafor::rma(d_z ~ language, vi = d_z_var, 
                    slab = lab, data = ds, method = "REML") 
summary(mod)

Model with method moderation.

ds$method <- factor(ds$method, levels = c("single-screen","HPP","eye-tracking"))
mod <- metafor::rma(d_z ~ method, vi = d_z_var, slab = lab, data = ds, method = "REML") 
summary(mod)

## 
## Mixed-Effects Model (k = 5; tau^2 estimator: REML)
## 
##   logLik  deviance       AIC       BIC      AICc  
##  -2.5301    5.0602   13.0602    7.8328   53.0602  
## 
## tau^2 (estimated amount of residual heterogeneity):     0.0000 (SE = 0.3412)
## tau (square root of estimated tau^2 value):             0.0014
## I^2 (residual heterogeneity / unaccounted variability): 0.00%
## H^2 (unaccounted variability / sampling variability):   1.00
## R^2 (amount of heterogeneity accounted for):            100.00%
## 
## Test for Residual Heterogeneity: 
## QE(df = 2) = 2.7474, p-val = 0.2532
## 
## Test of Moderators (coefficient(s) 2,3): 
## QM(df = 2) = 4.7079, p-val = 0.0950
## 
## Model Results:
## 
##                     estimate      se    zval    pval    ci.lb   ci.ub   
## intrcpt               0.3474  0.2287  1.5193  0.1287  -0.1008  0.7956   
## methodHPP             0.3992  0.5063  0.7884  0.4305  -0.5932  1.3916   
## methodeye-tracking    1.1389  0.5332  2.1359  0.0327   0.0938  2.1839  *
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6 Mixed effects approach

This approach is based on modeling all LTs.

6.1 Selected model

d_lmer <- d %>%
  mutate(log_lt = log(looking_time), 
         age_days = scale(age_days, scale=FALSE), 
         trial_num = scale(trial_num, scale=FALSE), 
         language = ifelse(lab %in% c("brookes","plymouth"), "non-NAE", "NAE")) %>%
  filter(!is.na(log_lt), !is.infinite(log_lt))

mod <- lmer(log_lt ~ trial_type * method + age_days * trial_num + trial_type * age_days * language + 
               + (trial_num | subid) 
             + (age_days | lab), data = d_lmer)

kable(summary(mod)$coefficients, digits = 2)

	Estimate	Std. Error	t value
(Intercept)	1.92	0.48	3.98
trial_typeIDS	0.02	0.38	0.04
methodHPP	-0.01	0.59	-0.02
methodsingle-screen	-0.14	0.66	-0.22
age_days	0.00	0.00	-1.02
trial_num	-0.04	0.02	-2.85
languagenon-NAE	-0.51	0.38	-1.35
trial_typeIDS:methodHPP	0.00	0.48	0.01
trial_typeIDS:methodsingle-screen	0.27	0.51	0.53
age_days:trial_num	0.00	0.00	-0.54
trial_typeIDS:age_days	0.00	0.00	1.18
trial_typeIDS:languagenon-NAE	0.43	0.30	1.45
age_days:languagenon-NAE	0.01	0.00	1.62
trial_typeIDS:age_days:languagenon-NAE	-0.01	0.00	-2.09

Corresponding scatter plot.

qplot(age_days, diff, col = language, group = language, data = mss_diffs) + 
  geom_smooth(method="lm") + 
  geom_hline(yintercept = 0, lty = 2) + 
  xlab("Age (days)") + 
  ylab("IDS preference (s)") + 
  scale_color_solarized() +
  theme(legend.position= "bottom")

6.2 Other (simple) models

This is the most basic model:

d_lmer <- diffs %>% 
  filter(!is.na(diff)) %>%
  mutate(age_centered = scale(age_days))

summary(lmer(diff ~ 1 + 
               (1 | lab) + 
               (1 | subid) , data = d_lmer))

## Linear mixed model fit by REML ['lmerMod']
## Formula: diff ~ 1 + (1 | lab) + (1 | subid)
##    Data: d_lmer
## 
## REML criterion at convergence: 2012.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0138 -0.4532 -0.0727  0.5995  3.2664 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subid    (Intercept)  4.1890  2.0467  
##  lab      (Intercept)  0.4178  0.6464  
##  Residual             40.6505  6.3758  
## Number of obs: 304, groups:  subid, 55; lab, 5
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   1.7348     0.6465   2.683

and age effects.

summary(lmer(diff ~ age_centered + 
               (age_centered | lab) + 
               (1 | subid) , data = d_lmer))

## Linear mixed model fit by REML ['lmerMod']
## Formula: diff ~ age_centered + (age_centered | lab) + (1 | subid)
##    Data: d_lmer
## 
## REML criterion at convergence: 2010.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0963 -0.4356 -0.0875  0.5371  3.1946 
## 
## Random effects:
##  Groups   Name         Variance  Std.Dev.  Corr 
##  subid    (Intercept)  4.428e+00 2.104e+00      
##  lab      (Intercept)  1.210e-12 1.100e-06      
##           age_centered 2.715e-13 5.211e-07 -1.00
##  Residual              4.051e+01 6.365e+00      
## Number of obs: 304, groups:  subid, 55; lab, 5
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)    1.4043     0.4790   2.932
## age_centered   0.5963     0.4495   1.327
## 
## Correlation of Fixed Effects:
##             (Intr)
## age_centerd -0.059

Neither of these models take into account the shape of the habituation curve or how preference might manifest across it.

6.3 More complete model

Add trial number X age interaction.

summary(lmer(diff ~ age_centered * trial_num + 
               (age_centered * trial_num | lab) + 
               (trial_num | subid) , data = d_lmer))

## Linear mixed model fit by REML ['lmerMod']
## Formula: diff ~ age_centered * trial_num + (age_centered * trial_num |  
##     lab) + (trial_num | subid)
##    Data: d_lmer
## 
## REML criterion at convergence: 2002.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2432 -0.4533 -0.0175  0.5196  3.2363 
## 
## Random effects:
##  Groups   Name                   Variance Std.Dev. Corr             
##  subid    (Intercept)             0.0000  0.0000                    
##           trial_num               0.2536  0.5035    NaN             
##  lab      (Intercept)            97.2932  9.8637                    
##           age_centered           12.5953  3.5490   -0.89            
##           trial_num               5.1449  2.2682   -0.99  0.84      
##           age_centered:trial_num  0.6130  0.7829    0.97 -0.97 -0.94
##  Residual                        37.6597  6.1368                    
## Number of obs: 304, groups:  subid, 55; lab, 5
## 
## Fixed effects:
##                        Estimate Std. Error t value
## (Intercept)              7.7948     6.0875   1.280
## age_centered            -1.7869     2.6970  -0.663
## trial_num               -1.4043     1.3911  -1.009
## age_centered:trial_num   0.4519     0.6085   0.743
## 
## Correlation of Fixed Effects:
##             (Intr) ag_cnt trl_nm
## age_centerd -0.878              
## trial_num   -0.985  0.850       
## ag_cntrd:t_  0.882 -0.963 -0.883
## convergence code: 0
## unable to evaluate scaled gradient
## Model failed to converge: degenerate  Hessian with 1 negative eigenvalues

and the quadratic version of that analysis.

summary(lmer(diff ~ age_centered * trial_num + 
               (age_centered * trial_num | lab) + 
               (trial_num | subid) , data = d_lmer))

## Linear mixed model fit by REML ['lmerMod']
## Formula: diff ~ age_centered * trial_num + (age_centered * trial_num |  
##     lab) + (trial_num | subid)
##    Data: d_lmer
## 
## REML criterion at convergence: 2002.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2432 -0.4533 -0.0175  0.5196  3.2363 
## 
## Random effects:
##  Groups   Name                   Variance Std.Dev. Corr             
##  subid    (Intercept)             0.0000  0.0000                    
##           trial_num               0.2536  0.5035    NaN             
##  lab      (Intercept)            97.2932  9.8637                    
##           age_centered           12.5953  3.5490   -0.89            
##           trial_num               5.1449  2.2682   -0.99  0.84      
##           age_centered:trial_num  0.6130  0.7829    0.97 -0.97 -0.94
##  Residual                        37.6597  6.1368                    
## Number of obs: 304, groups:  subid, 55; lab, 5
## 
## Fixed effects:
##                        Estimate Std. Error t value
## (Intercept)              7.7948     6.0875   1.280
## age_centered            -1.7869     2.6970  -0.663
## trial_num               -1.4043     1.3911  -1.009
## age_centered:trial_num   0.4519     0.6085   0.743
## 
## Correlation of Fixed Effects:
##             (Intr) ag_cnt trl_nm
## age_centerd -0.878              
## trial_num   -0.985  0.850       
## ag_cntrd:t_  0.882 -0.963 -0.883
## convergence code: 0
## unable to evaluate scaled gradient
## Model failed to converge: degenerate  Hessian with 1 negative eigenvalues

ManyBabies 1: Pilot Data Analysis

ManyBabies Analysis Group (inc. Mike Frank, Hugh Rabagliati, Melissa Kline)