First, we read in the data and compute both the novelty preference (total looking time to novel stimuli minus total looking time to familiar stimuli) and the novelty preference corrected for total looking time, i.e., the “novelty preference score” described in the summary from 10 July 2019.
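
For completeness, here are the packages this section relies on (assuming they were not already loaded in an earlier setup chunk):

#load packages (if not loaded earlier)
library(tidyverse) #read_csv, mutate, gather, ggplot
library(lme4) #lmer
library(car) #Anova, vif
library(broom) #tidy
library(knitr) #kable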

#read in data
d <- read_csv("data/flip_updated.csv")

#novelty preference measures
d <- d %>%
  mutate(
    novel_pref=novel-familiar,
    novel_pref_corr=(novel-familiar)/(familiar+novel),
    avg_looking_time=(novel+familiar)/2)

Compare distributions of Syntax and Saffran & Wilson

The first thing that makes sense to do, I think, is simply to look at the distributions of (average) looking time for familiar and novel stimuli in the two studies. This is a useful check before combining the data from the two studies.

##convert data to long format
#also perform some centering transformations that will be useful in fitting models
d_long <- gather(d,"looking.type","looking.time",familiar,novel)  %>%
  mutate(
    looking.type.c=case_when(
      looking.type == "familiar" ~ -0.5,
      looking.type == "novel" ~ 0.5
      ),
    hpp.c=hpp-mean(hpp),
    total_studies.c=total_studies-mean(total_studies)
  )

ggplot(d_long,aes(looking.type,looking.time,color=looking.type))+
  geom_violin(draw_quantiles = c(0.25, 0.5, 0.75))+
  geom_jitter(width=0.01, height=0.01)+
  geom_line(aes(group=baby),color="black",alpha=0.2)+
  facet_wrap(~study+location)

Looking at the average looking times, the distributions appear fairly similar across studies on visual inspection. If anything, looking times for Syntax in Wisconsin are somewhat shorter, mainly because looking times to familiar items appear substantially shorter in Wisconsin than in Barcelona.

In general, nothing here looks dramatic to me; no distribution appears radically different from the others in shape or in range of values.

Linear (Correlational) Analyses

As a first step, I’m just going to look at the linear relationship between visit number and headturn preference, without accounting for non-independence due to other factors (e.g., study). This helps give a sense of the simple effect and how robust it is to different sources of non-independence and analytical decisions. I’ll also look at how much the effect relies on high-leverage points (children with higher visit numbers) and visualize the effect for each study and location.

Visit number predicts magnitude of novelty/familiarity preference

Uncorrected novelty preference

ggplot(d, aes(hpp, novel_pref,color=as.factor(hpp)))+
  geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75))+
  geom_jitter(width=0.01, height=0.01)+
  geom_smooth(aes(color=NULL),method="lm")+
  geom_hline(yintercept=0)+
  scale_color_brewer(palette="Set1")+
  theme(legend.position="none")+
  xlab("Number of HPP visits")+
  ylab("Novelty Preference\n(not corrected for total looking time")

Fit a linear model predicting novelty preference from number of HPP visits (equivalent to a correlation).

#novelty preference
m <- lm(novel_pref~hpp,data=d)
#modelSummary(m)
m %>%
  summary() %>%
  tidy() %>%
  kable(digits=5)
term         estimate     std.error  statistic  p.value
(Intercept)  -1494.2235   464.7362   -3.21521   0.00182
hpp            665.5578   233.3424    2.85228   0.00541

Corrected novelty preference

ggplot(d, aes(hpp, novel_pref_corr,color=as.factor(hpp)))+
  geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75))+
  geom_jitter(width=0.01, height=0.01)+
  geom_smooth(aes(color=NULL),method="lm")+
  geom_hline(yintercept=0)+
  scale_color_brewer(palette="Set1")+
  theme(legend.position="none")+
  xlab("Number of HPP visits")+
  ylab("Novelty Preference\n(corrected for total looking time")

Fit a linear model predicting novelty preference from number of HPP visits (equivalent to a correlation).

#novelty preference
m <- lm(novel_pref_corr~hpp,data=d)
#modelSummary(m)
m %>%
  summary() %>%
  tidy() %>%
  kable(digits=5)
term         estimate  std.error  statistic  p.value
(Intercept)  -0.10221  0.03162    -3.23188   0.00173
hpp           0.04714  0.01588     2.96857   0.00385

Split the prediction by study/location

Split by study (Santolin vs. Saffran & Wilson)

#split by study
ggplot(d, aes(hpp, novel_pref_corr,color=study))+
  geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75))+
  geom_jitter(width=0.01, height=0.01)+
  geom_smooth(aes(color=NULL),method="lm")+
  geom_hline(yintercept=0)+
  scale_color_brewer(palette="Set1")+
  theme(legend.position="none")+
  xlab("Number of HPP visits")+
  ylab("Novelty Preference\n(corrected for total looking time")+
  facet_wrap(~study)

Split by study and location (Wisconsin vs. Barcelona)

ggplot(d, aes(hpp, novel_pref_corr,color=study))+
  geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75))+
  geom_jitter(width=0.01, height=0.01)+
  geom_smooth(aes(color=NULL),method="lm")+
  geom_hline(yintercept=0)+
  scale_color_brewer(palette="Set1")+
  theme(legend.position="none")+
  xlab("Number of HPP visits")+
  ylab("Novelty Preference\n(corrected for total looking time")+
  facet_wrap(~study+location)

How robust is this analysis to reduced numbers of visits?

1 - 5 visits

ggplot(d, aes(hpp, novel_pref_corr,color=study))+
  geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75),color="black")+
  geom_jitter(width=0.05, height=0.01)+
  geom_smooth(aes(color=NULL),method="lm")+
  geom_hline(yintercept=0)+
  scale_color_brewer(palette="Set1")+
  theme(legend.position=c(0.65,0.3))+
  xlab("Number of HPP visits")+
  ylab("Novelty Preference\n(corrected for total looking time")

Fit a linear model predicting novelty preference from number of HPP visits (equivalent to a correlation).

#novelty preference
m <- lm(novel_pref_corr~hpp,data=d)
#modelSummary(m)
m %>%
  summary() %>%
  tidy() %>%
  kable(digits=5)
term         estimate  std.error  statistic  p.value
(Intercept)  -0.10221  0.03162    -3.23188   0.00173
hpp           0.04714  0.01588     2.96857   0.00385

1 - 4 visits

ggplot(filter(d,hpp<5), aes(hpp, novel_pref_corr,color=study))+
  geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75),color="black")+
  geom_jitter(width=0.05, height=0.01)+
  geom_smooth(aes(color=NULL),method="lm")+
  geom_hline(yintercept=0)+
  scale_color_brewer(palette="Set1")+
  theme(legend.position=c(0.7,0.3))+
  xlab("Number of HPP visits")+
  ylab("Novelty Preference\n(corrected for total looking time")

Fit a linear model predicting novelty preference from number of HPP visits (equivalent to a correlation).

#novelty preference
m <- lm(novel_pref_corr~hpp,data=filter(d,hpp<5))
#modelSummary(m)
m %>%
  summary() %>%
  tidy() %>%
  kable(digits=5)
term         estimate  std.error  statistic  p.value
(Intercept)  -0.10706  0.03319    -3.22569   0.00177
hpp           0.05036  0.01718     2.93068   0.00432

1 - 3 visits

ggplot(filter(d,hpp<4), aes(hpp, novel_pref_corr,color=study))+
  geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75),color="black")+
  geom_jitter(width=0.05, height=0.01)+
  geom_smooth(aes(color=NULL),method="lm")+
  geom_hline(yintercept=0)+
  scale_color_brewer(palette="Set1")+
  theme(legend.position=c(0.7,0.2))+
  xlab("Number of HPP visits")+
  ylab("Novelty Preference\n(corrected for total looking time")

Fit a linear model predicting novelty preference from number of HPP visits (equivalent to a correlation).

#novelty preference
m <- lm(novel_pref_corr~hpp,data=filter(d,hpp<4))
#modelSummary(m)
m %>%
  summary() %>%
  tidy() %>%
  kable(digits=5)
term         estimate  std.error  statistic  p.value
(Intercept)  -0.09269  0.03763    -2.46339   0.01583
hpp           0.04014  0.02119     1.89424   0.06168

1 - 2 visits

ggplot(filter(d,hpp<3), aes(hpp, novel_pref_corr,color=study))+
  geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75),color="black")+
  geom_jitter(width=0.05, height=0.01)+
  geom_smooth(aes(color=NULL),method="lm")+
  geom_hline(yintercept=0)+
  scale_color_brewer(palette="Set1")+
  theme(legend.position=c(0.35,0.8))+
  xlab("Number of HPP visits")+
  ylab("Novelty Preference\n(corrected for total looking time")+
  scale_x_continuous(breaks=c(1,2))

Fit a linear model predicting novelty preference from number of HPP visits (equivalent to a correlation).

#novelty preference
m <- lm(novel_pref_corr~hpp,data=filter(d,hpp<3))
#modelSummary(m)
m %>%
  summary() %>%
  tidy() %>%
  kable(digits=5)
term         estimate  std.error  statistic  p.value
(Intercept)  -0.11678  0.05177    -2.25563   0.02718
hpp           0.05945  0.03497     1.69991   0.09352

Linear mixed-effects analyses

Reproducing analysis with summarized difference score

This essentially reproduces the model reported in the shared analysis script from 10 July 2019, with a few slight technical differences. For instance, I simply fit the model with the maximal random-effects structure here as the “default” and use Wald chi-squared tests to get p-values (not my preferred method, which is the Kenward-Roger approximation, but that method has some odd issues in this instance; see below).

#center hpp
d <- d %>%
  mutate(
    hpp.c=hpp - mean(hpp)
  )
m <- lmer(novel_pref_corr~hpp.c+(1+hpp.c|study),data=d)
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: novel_pref_corr ~ hpp.c + (1 + hpp.c | study)
##    Data: d
## 
## REML criterion at convergence: -94.3
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.11103 -0.64813 -0.00189  0.62100  2.72568 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr
##  study    (Intercept) 0.0032189 0.05673      
##           hpp.c       0.0003376 0.01837  1.00
##  Residual             0.0177138 0.13309      
## Number of obs: 90, groups:  study, 2
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) -0.01896    0.04250  -0.446
## hpp.c        0.06516    0.02112   3.085
## 
## Correlation of Fixed Effects:
##       (Intr)
## hpp.c 0.581 
## convergence code: 0
## boundary (singular) fit: see ?isSingular
Anova(m,type="III")
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: novel_pref_corr
##              Chisq Df Pr(>Chisq)   
## (Intercept) 0.1990  1   0.655505   
## hpp.c       9.5202  1   0.002032 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approach using long-format data & lme4

The best (most powerful) analysis would fit a linear mixed-effects model to the trial-by-trial looking data. That model would look something like this:

lmer(looking.time ~ looking.type * hpp +
       (1 + looking.type | baby) + (1 | item) +
       (1 + looking.type * hpp | study)) #schematic; requires the trial-level data

For now, let’s consider a mixed-effects model fit to the average looking time data.

I think the model we want would test an interaction between item type (familiar vs. novel) and number of HPP visits in predicting looking time: that is, whether the effect of familiar vs. novel items on looking behavior depends on the number of headturn visits an infant has had.

lmer(looking.time ~ looking.type * hpp + (1 | baby) + (1 + hpp | study))

We include a by-participant random intercept, since we have multiple observations per baby. Because each baby contributes only one (average) observation per item type, we fit the model without a by-participant random slope for item type (the slope would overdetermine the model; my understanding is that it’s OK to force the model to accept the random slope anyway, but it shouldn’t much matter for the estimates; a sketch of that forced-slope model appears below, after the model output).

m <- lmer(looking.time~looking.type.c*hpp.c+(1|baby)+(1+hpp.c|study),data=d_long)
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## looking.time ~ looking.type.c * hpp.c + (1 | baby) + (1 + hpp.c |  
##     study)
##    Data: d_long
## 
## REML criterion at convergence: 3210.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.75477 -0.50941 -0.04304  0.32523  2.49321 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  baby     (Intercept) 3618131  1902.14       
##  study    (Intercept)  442540   665.24       
##           hpp.c          1307    36.15  -1.00
##  Residual             2071836  1439.39       
## Number of obs: 180, groups:  baby, 90; study, 2
## 
## Fixed effects:
##                      Estimate Std. Error t value
## (Intercept)            7071.9      523.2  13.516
## looking.type.c         -318.4      214.6  -1.484
## hpp.c                  -304.4      262.7  -1.159
## looking.type.c:hpp.c    665.6      233.3   2.852
## 
## Correlation of Fixed Effects:
##             (Intr) lkng.. hpp.c 
## lokng.typ.c  0.000              
## hpp.c       -0.105  0.000       
## lkng.typ.:.  0.000  0.000  0.000
## convergence code: 0
## boundary (singular) fit: see ?isSingular
Anova(m,type="III")
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: looking.time
##                         Chisq Df Pr(>Chisq)    
## (Intercept)          182.6721  1  < 2.2e-16 ***
## looking.type.c         2.2020  1   0.137832    
## hpp.c                  1.3422  1   0.246643    
## looking.type.c:hpp.c   8.1352  1   0.004341 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The significant interaction term suggests that the number of visits changes the relationship between item type and looking time, i.e. the nature of the familiarity/novelty preference.
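
As an aside, forcing the by-participant random slope mentioned above would look like the sketch below; with only one (average) observation per item type per baby, expect lme4 to flag the fit as singular:

#force the by-participant random slope for item type (sketch)
m_slope <- lmer(looking.time~looking.type.c*hpp.c+(1+looking.type.c|baby)+(1+hpp.c|study),data=d_long)
isSingular(m_slope) #likely TRUE, but the fixed-effect estimates should barely change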

How best to get p-values from lmer models is, of course, a matter of debate. Here, we used a Wald chi-squared test. The method I typically prefer is the Kenward-Roger approximation, which estimates the “correct” degrees of freedom. However, something odd happens with that method when the hpp by-study random slope is included, probably because study has only two levels. The chi-squared method should be fine, but this is something we could do a deeper dive on down the road (it might also be another argument against including the hpp by-study random slope).
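
For reference, one way to get Kenward-Roger p-values would be via the pbkrtest package, comparing the full (REML) model against a reduced model without the term of interest. A minimal sketch, assuming pbkrtest is installed:

#Kenward-Roger test of the interaction (sketch)
library(pbkrtest)
m_full <- lmer(looking.time~looking.type.c*hpp.c+(1|baby)+(1+hpp.c|study),data=d_long)
m_red <- update(m_full, . ~ . - looking.type.c:hpp.c) #drop only the interaction
KRmodcomp(m_full, m_red) #F test with Kenward-Roger degrees of freedom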

Is the effect robust when we remove babies with high numbers of visits (4 or 5)?

Checking here that the effect still holds when we remove the high-leverage points: the handful of babies who had 4 or 5 visits.

m <- lmer(looking.time~looking.type.c*hpp.c+(1|baby)+(1+hpp.c|study),data=filter(d_long,hpp<4))
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## looking.time ~ looking.type.c * hpp.c + (1 | baby) + (1 + hpp.c |  
##     study)
##    Data: filter(d_long, hpp < 4)
## 
## REML criterion at convergence: 3035
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.7095 -0.5202 -0.0460  0.3702  2.4355 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  baby     (Intercept) 3745711  1935.4        
##  study    (Intercept)  338533   581.8        
##           hpp.c         16487   128.4   -1.00
##  Residual             2175697  1475.0        
## Number of obs: 170, groups:  baby, 85; study, 2
## 
## Fixed effects:
##                      Estimate Std. Error t value
## (Intercept)            7110.8      478.6  14.856
## looking.type.c         -326.8      230.7  -1.416
## hpp.c                  -178.2      362.2  -0.492
## looking.type.c:hpp.c    657.1      314.6   2.088
## 
## Correlation of Fixed Effects:
##             (Intr) lkng.. hpp.c 
## lokng.typ.c  0.000              
## hpp.c       -0.142  0.000       
## lkng.typ.:.  0.000  0.195  0.000
## convergence code: 0
## boundary (singular) fit: see ?isSingular
Anova(m,type="III")
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: looking.time
##                         Chisq Df Pr(>Chisq)    
## (Intercept)          220.7050  1    < 2e-16 ***
## looking.type.c         2.0062  1    0.15666    
## hpp.c                  0.2421  1    0.62273    
## looking.type.c:hpp.c   4.3612  1    0.03677 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total number of visits and hpp

What happens when we attempt to predict preferential looking from both total number of studies and hpp?

In principle, this works, but the issue is that total number of studies and number of HPP studies are very highly correlated, to the point of being almost indistinguishable.

ggplot(d,aes(hpp,total_studies))+
  geom_jitter(size=1.5, height=0.1,width=0.1)+
  #geom_smooth(method="lm")+
  stat_function(fun=function(x)x, geom="line", size=1.5,color="darkgreen")

Almost all participants have completed only HPP studies. At face value, it doesn’t make much sense to me to fit a model in which the two predictors are virtually identical.
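
A quick way to quantify the overlap between the two predictors (a sketch):

#cross-tabulate the two visit counts and check their raw correlation
with(d, table(hpp, total_studies))
cor(d$hpp, d$total_studies)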

A likely consequence is that a model including both predictors will be unable to estimate their effects simultaneously and will yield unstable estimates. The standard way to check this is the variance inflation factor (VIF). We check the VIF of the main linear model and the main linear mixed-effects model below.
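
One note before the models: vif() from car handles lm objects, but the vif.mer() used below for the lmer fits is not part of car or lme4 and needs to be defined (or sourced) first. A minimal sketch, adapted from a widely circulated helper for merMod objects:

#VIFs for the fixed effects of a merMod fit,
#computed from the fixed-effects covariance matrix
vif.mer <- function(fit) {
  v <- as.matrix(vcov(fit))
  nam <- names(fixef(fit))
  #drop the intercept before computing VIFs
  ns <- sum(nam == "(Intercept)")
  if (ns > 0) {
    v <- v[-(1:ns), -(1:ns), drop=FALSE]
    nam <- nam[-(1:ns)]
  }
  sds <- diag(v)^0.5
  v <- diag(solve(v/(sds %o% sds)))
  names(v) <- nam
  v
}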

Linear model

m <- lm(novel_pref_corr~hpp+total_studies,data=d)
summary(m)
## 
## Call:
## lm(formula = novel_pref_corr ~ hpp + total_studies, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.29883 -0.08020 -0.00823  0.06970  0.37445 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   -0.09825    0.03373  -2.913  0.00455 **
## hpp            0.05774    0.03425   1.686  0.09538 . 
## total_studies -0.01180    0.03373  -0.350  0.72726   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1392 on 87 degrees of freedom
## Multiple R-squared:  0.0923, Adjusted R-squared:  0.07144 
## F-statistic: 4.423 on 2 and 87 DF,  p-value: 0.01481
vif(m)
##           hpp total_studies 
##      4.605184      4.605184

The VIF is quite high for the two predictors, ~4.6, which means the SEs are inflated by a factor of about 2.15 (the square root of the VIF). A typical cut-off is a VIF of 5 or more, so we are right at that border. Bottom line: I would not trust these estimates.
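
The SE inflation can be read off directly:

sqrt(vif(m)) #SEs are inflated by the square root of the VIF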

Linear mixed-effects model

Summarized difference score

#center total_studies
d <- d %>%
  mutate(
    total_studies.c=total_studies - mean(total_studies)
  )
#most complex model without convergence issues
m <- lmer(novel_pref_corr~hpp.c+total_studies.c+(1+total_studies.c|study),data=d)
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## novel_pref_corr ~ hpp.c + total_studies.c + (1 + total_studies.c |  
##     study)
##    Data: d
## 
## REML criterion at convergence: -90.2
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.19416 -0.64327 -0.02821  0.60975  2.64341 
## 
## Random effects:
##  Groups   Name            Variance  Std.Dev. Corr
##  study    (Intercept)     0.0042079 0.06487      
##           total_studies.c 0.0002331 0.01527  1.00
##  Residual                 0.0177075 0.13307      
## Number of obs: 90, groups:  study, 2
## 
## Fixed effects:
##                 Estimate Std. Error t value
## (Intercept)     -0.02168    0.04797  -0.452
## hpp.c            0.10511    0.03667   2.866
## total_studies.c -0.04199    0.03559  -1.180
## 
## Correlation of Fixed Effects:
##             (Intr) hpp.c 
## hpp.c       -0.009       
## totl_stds.c  0.296 -0.849
Anova(m,type="III")
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: novel_pref_corr
##                  Chisq Df Pr(>Chisq)   
## (Intercept)     0.2042  1   0.651321   
## hpp.c           8.2161  1   0.004152 **
## total_studies.c 1.3919  1   0.238078   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
vif.mer(m) #check for multicollinearity
##           hpp.c total_studies.c 
##        3.577092        3.577092

The news might be slightly better here. The VIF is still pretty high (~3.6), but perhaps acceptable? If we take these values as somewhat interpretable, then HPP predicts preferential looking, controlling for total number of studies. The one caveat is that the model with the maximal random-effects structure has some convergence issues (as well as very high VIF), so here we simplified the model by removing the by-study random slope for hpp.
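
For the record, the maximal random-effects structure alluded to here would be something like the following sketch, which (per the caveat above) runs into convergence issues with only two study levels:

#maximal model (sketch; expect convergence/singularity warnings)
m_max <- lmer(novel_pref_corr~hpp.c+total_studies.c+(1+hpp.c+total_studies.c|study),data=d)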

Long format

m <- lmer(looking.time~looking.type.c*hpp.c+looking.type.c*total_studies.c+(1|baby)+(1|study),data=d_long)
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## looking.time ~ looking.type.c * hpp.c + looking.type.c * total_studies.c +  
##     (1 | baby) + (1 | study)
##    Data: d_long
## 
## REML criterion at convergence: 3180.7
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.77110 -0.51821 -0.05031  0.34720  2.49047 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  baby     (Intercept) 3609800  1899.9  
##  study    (Intercept)  306719   553.8  
##  Residual             2094645  1447.3  
## Number of obs: 180, groups:  baby, 90; study, 2
## 
## Fixed effects:
##                                Estimate Std. Error t value
## (Intercept)                     7054.29     453.51  15.555
## looking.type.c                  -318.40     215.75  -1.476
## hpp.c                            270.40     564.33   0.479
## total_studies.c                 -605.85     535.80  -1.131
## looking.type.c:hpp.c             752.87     503.50   1.495
## looking.type.c:total_studies.c   -97.19     495.89  -0.196
## 
## Correlation of Fixed Effects:
##             (Intr) lkng.. hpp.c  ttl_s. lk..:.
## lokng.typ.c  0.000                            
## hpp.c       -0.018  0.000                     
## totl_stds.c  0.011  0.000 -0.886              
## lkng.typ.:.  0.000  0.000  0.000  0.000       
## lkng.ty.:_.  0.000  0.000  0.000  0.000 -0.885
vif.mer(m)
##                 looking.type.c                          hpp.c 
##                       1.000000                       4.664620 
##                total_studies.c           looking.type.c:hpp.c 
##                       4.664620                       4.605184 
## looking.type.c:total_studies.c 
##                       4.605184
Anova(m,type="III")
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: looking.time
##                                   Chisq Df Pr(>Chisq)    
## (Intercept)                    241.9559  1     <2e-16 ***
## looking.type.c                   2.1780  1     0.1400    
## hpp.c                            0.2296  1     0.6318    
## total_studies.c                  1.2786  1     0.2582    
## looking.type.c:hpp.c             2.2358  1     0.1348    
## looking.type.c:total_studies.c   0.0384  1     0.8446    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A similar situation to the linear model: the VIF (~4.6) is right at the threshold beyond which the model becomes too collinear to interpret. We again simplified the random-effects structure here to avoid convergence warnings.

The rough summary, to me, would be:

  1. Total number of studies and total HPP studies are virtually identical in our dataset. Basically, we don’t have the right dataset to disentangle these two factors.

  2. Models including both total study number and total HPP study number are, in principle, right on the border of being uninterpretable, because these predictors are virtually identical.

  3. If one really wants to look at these (probably uninterpretable) models, then HPP looks, if anything, like the better predictor. But I would take any conclusion along those lines with a MASSIVE grain of salt.