First, we read in the data and compute both the novelty preference (total looking time to the novel stimuli minus total looking time to the familiar stimuli) and the novelty preference corrected for total looking time, i.e. the “novelty preference score” described in the summary from 10 July 2019.
#load packages (assumed from their use below: tidyverse for read_csv/dplyr/tidyr/ggplot2, broom for tidy, knitr for kable, lme4 for lmer, car for Anova/vif)
library(tidyverse)
library(broom)
library(knitr)
library(lme4)
library(car)
#read in data
d <- read_csv("data/flip_updated.csv")
#novelty preference measures
d <- d %>%
  mutate(
    novel_pref = novel - familiar,
    novel_pref_corr = (novel - familiar) / (familiar + novel),
    avg_looking_time = (novel + familiar) / 2
  )
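As a quick sanity check (a step not in the original script), we can inspect the structure of the data; the columns named here are assumed from their use in the rest of the analysis.
#inspect the data: we assume columns baby, study, location, hpp, total_studies, familiar, novel
glimpse(d)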
Before combining the data from the two studies, a sensible first step is simply to look at the distribution of (average) looking times to familiar and novel stimuli in each study.
#convert data to long format
#also perform some centering transformations that will be useful in fitting models
d_long <- gather(d, "looking.type", "looking.time", familiar, novel) %>%
  mutate(
    looking.type.c = case_when(
      looking.type == "familiar" ~ -0.5,
      looking.type == "novel" ~ 0.5
    ),
    hpp.c = hpp - mean(hpp),
    total_studies.c = total_studies - mean(total_studies)
  )
ggplot(d_long,aes(looking.type,looking.time,color=looking.type))+
geom_violin(draw_quantiles = c(0.25, 0.5, 0.75))+
geom_jitter(width=0.01, height=0.01)+
geom_line(aes(group=baby),color="black",alpha=0.2)+
facet_wrap(~study+location)
From visual inspection, the distributions of average looking times appear broadly similar. Looking times for Syntax in Wisconsin may be somewhat shorter, mainly because looking times to familiar items appear substantially shorter in Wisconsin than in Barcelona.
In general, nothing here suggests that any one distribution differs radically from the others, either in shape or in the range of values.
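As a numeric complement to the visual check (not part of the original script), we can tabulate the cell means and standard deviations directly:
#mean and SD of looking time by study, location, and item type
d_long %>%
  group_by(study, location, looking.type) %>%
  summarize(
    mean_lt = mean(looking.time),
    sd_lt = sd(looking.time),
    n = n()
  )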
As a first step, I’m just going to look at the linear relationship between visit number and headturn preference, without accounting for non-independence due to other factors (e.g., study). This helps to get a sense of the simple effect and of how robust it is to different sources of non-independence and to analytical decisions. I’ll also look at how much the effect relies on high-leverage points (children with higher visit numbers) and visualize the effect for each study and location.
ggplot(d, aes(hpp, novel_pref,color=as.factor(hpp)))+
geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75))+
geom_jitter(width=0.01, height=0.01)+
geom_smooth(aes(color=NULL),method="lm")+
geom_hline(yintercept=0)+
scale_color_brewer(palette="Set1")+
theme(legend.position="none")+
xlab("Number of HPP visits")+
ylab("Novelty Preference\n(not corrected for total looking time")
Fit a linear model predicting novelty preference from number of HPP visits (equivalent to a correlation).
#novelty preference
m <- lm(novel_pref~hpp,data=d)
m %>%
summary() %>%
tidy() %>%
kable(digits=5)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -1494.2235 | 464.7362 | -3.21521 | 0.00182 |
| hpp | 665.5578 | 233.3424 | 2.85228 | 0.00541 |
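Since a simple regression on one predictor is equivalent to a correlation, the same t statistic and p-value can also be obtained from a Pearson correlation test (a quick check, not in the original script):
#equivalent correlation test
cor.test(d$hpp, d$novel_pref)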
ggplot(d, aes(hpp, novel_pref_corr,color=as.factor(hpp)))+
geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75))+
geom_jitter(width=0.01, height=0.01)+
geom_smooth(aes(color=NULL),method="lm")+
geom_hline(yintercept=0)+
scale_color_brewer(palette="Set1")+
theme(legend.position="none")+
xlab("Number of HPP visits")+
ylab("Novelty Preference\n(corrected for total looking time")
Fit a linear model predicting the corrected novelty preference score from number of HPP visits (equivalent to a correlation).
#novelty preference
m <- lm(novel_pref_corr~hpp,data=d)
m %>%
summary() %>%
tidy() %>%
kable(digits=5)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -0.10221 | 0.03162 | -3.23188 | 0.00173 |
| hpp | 0.04714 | 0.01588 | 2.96857 | 0.00385 |
#split by study
ggplot(d, aes(hpp, novel_pref_corr,color=study))+
geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75))+
geom_jitter(width=0.01, height=0.01)+
geom_smooth(aes(color=NULL),method="lm")+
geom_hline(yintercept=0)+
scale_color_brewer(palette="Set1")+
theme(legend.position="none")+
xlab("Number of HPP visits")+
ylab("Novelty Preference\n(corrected for total looking time")+
facet_wrap(~study)
ggplot(d, aes(hpp, novel_pref_corr,color=study))+
geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75))+
geom_jitter(width=0.01, height=0.01)+
geom_smooth(aes(color=NULL),method="lm")+
geom_hline(yintercept=0)+
scale_color_brewer(palette="Set1")+
theme(legend.position="none")+
xlab("Number of HPP visits")+
ylab("Novelty Preference\n(corrected for total looking time")+
facet_wrap(~study+location)
ggplot(d, aes(hpp, novel_pref_corr,color=study))+
geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75),color="black")+
geom_jitter(width=0.05, height=0.01)+
geom_smooth(aes(color=NULL),method="lm")+
geom_hline(yintercept=0)+
scale_color_brewer(palette="Set1")+
theme(legend.position=c(0.65,0.3))+
xlab("Number of HPP visits")+
ylab("Novelty Preference\n(corrected for total looking time")
Next, check how much the effect relies on high-leverage points by refitting the model while progressively excluding the children with the highest visit numbers.
ggplot(filter(d,hpp<5), aes(hpp, novel_pref_corr,color=study))+
geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75),color="black")+
geom_jitter(width=0.05, height=0.01)+
geom_smooth(aes(color=NULL),method="lm")+
geom_hline(yintercept=0)+
scale_color_brewer(palette="Set1")+
theme(legend.position=c(0.7,0.3))+
xlab("Number of HPP visits")+
ylab("Novelty Preference\n(corrected for total looking time")
Fit the same linear model, excluding children with 5 HPP visits (hpp < 5).
#novelty preference
m <- lm(novel_pref_corr~hpp,data=filter(d,hpp<5))
m %>%
summary() %>%
tidy() %>%
kable(digits=5)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -0.10706 | 0.03319 | -3.22569 | 0.00177 |
| hpp | 0.05036 | 0.01718 | 2.93068 | 0.00432 |
ggplot(filter(d,hpp<4), aes(hpp, novel_pref_corr,color=study))+
geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75),color="black")+
geom_jitter(width=0.05, height=0.01)+
geom_smooth(aes(color=NULL),method="lm")+
geom_hline(yintercept=0)+
scale_color_brewer(palette="Set1")+
theme(legend.position=c(0.7,0.2))+
xlab("Number of HPP visits")+
ylab("Novelty Preference\n(corrected for total looking time")
Fit the same linear model, excluding children with 4 or 5 HPP visits (hpp < 4).
#novelty preference
m <- lm(novel_pref_corr~hpp,data=filter(d,hpp<4))
m %>%
summary() %>%
tidy() %>%
kable(digits=5)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -0.09269 | 0.03763 | -2.46339 | 0.01583 |
| hpp | 0.04014 | 0.02119 | 1.89424 | 0.06168 |
ggplot(filter(d,hpp<3), aes(hpp, novel_pref_corr,color=study))+
geom_violin(aes(group=hpp),draw_quantiles = c(0.25, 0.5, 0.75),color="black")+
geom_jitter(width=0.05, height=0.01)+
geom_smooth(aes(color=NULL),method="lm")+
geom_hline(yintercept=0)+
scale_color_brewer(palette="Set1")+
theme(legend.position=c(0.35,0.8))+
xlab("Number of HPP visits")+
ylab("Novelty Preference\n(corrected for total looking time")+
scale_x_continuous(breaks=c(1,2))
Fit the same linear model, keeping only children with 1 or 2 HPP visits (hpp < 3).
#novelty preference
m <- lm(novel_pref_corr~hpp,data=filter(d,hpp<3))
m %>%
summary() %>%
tidy() %>%
kable(digits=5)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -0.11678 | 0.05177 | -2.25563 | 0.02718 |
| hpp | 0.05945 | 0.03497 | 1.69991 | 0.09352 |
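The cutoff analyses above can also be summarized compactly by refitting the simple model at every cutoff in one pass. This is a sketch, assuming purrr (loaded with the tidyverse) and broom; Inf stands in for the full dataset:
#hpp slope at each visit-number cutoff
map_dfr(c(Inf, 5, 4, 3), function(cutoff) {
  lm(novel_pref_corr ~ hpp, data = filter(d, hpp < cutoff)) %>%
    tidy() %>%
    filter(term == "hpp") %>%
    mutate(cutoff = cutoff)
})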
This essentially reproduces the model reported in the shared analysis script from 10 July 2019, with a few slight technical differences. For instance, I simply fit the model with the maximal random-effects structure here as the “default” and use Wald chi-squared tests to get p-values (not my preferred method, which is the Kenward-Roger approximation, but that method has some odd issues in this instance; see below).
#center hpp
d <- d%>%
mutate(
hpp.c=hpp - mean(hpp)
)
m <- lmer(novel_pref_corr~hpp.c+(1+hpp.c|study),data=d)
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: novel_pref_corr ~ hpp.c + (1 + hpp.c | study)
## Data: d
##
## REML criterion at convergence: -94.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.11103 -0.64813 -0.00189 0.62100 2.72568
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## study (Intercept) 0.0032189 0.05673
## hpp.c 0.0003376 0.01837 1.00
## Residual 0.0177138 0.13309
## Number of obs: 90, groups: study, 2
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.01896 0.04250 -0.446
## hpp.c 0.06516 0.02112 3.085
##
## Correlation of Fixed Effects:
## (Intr)
## hpp.c 0.581
## convergence code: 0
## boundary (singular) fit: see ?isSingular
Anova(m,type="III")
## Analysis of Deviance Table (Type III Wald chisquare tests)
##
## Response: novel_pref_corr
## Chisq Df Pr(>Chisq)
## (Intercept) 0.1990 1 0.655505
## hpp.c 9.5202 1 0.002032 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The best (most powerful) analysis would involve fitting a linear mixed-effects model to the trial-by-trial looking data. Fit on the trial-level looking data, this model would look something like:
lmer(looking.time ~ looking.type * hpp + (1 + looking.type | baby) + (1 | item) + (1 + looking.type * hpp | study))
For now, let’s consider a mixed-effects model fit to the average looking time data.
I think the model we want tests an interaction between item type (familiar vs. novel) and number of HPP visits in predicting looking time: that is, whether the effect of familiar vs. novel items on looking behavior depends on the number of headturn visits an infant has had.
lmer(looking.time ~ looking.type * hpp + (1 | baby) + (1 + hpp | study))
We include a by-participant random intercept, since we have multiple observations per baby. Because each baby contributes only one (average) observation per item type, we fit the model without a by-participant random slope for item type, which would not be identifiable here (my understanding is that it’s OK to still force the model to accept the random slope, but it shouldn’t much matter for the estimates).
m <- lmer(looking.time~looking.type.c*hpp.c+(1|baby)+(1+hpp.c|study),data=d_long)
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## looking.time ~ looking.type.c * hpp.c + (1 | baby) + (1 + hpp.c |
## study)
## Data: d_long
##
## REML criterion at convergence: 3210.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.75477 -0.50941 -0.04304 0.32523 2.49321
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## baby (Intercept) 3618131 1902.14
## study (Intercept) 442540 665.24
## hpp.c 1307 36.15 -1.00
## Residual 2071836 1439.39
## Number of obs: 180, groups: baby, 90; study, 2
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 7071.9 523.2 13.516
## looking.type.c -318.4 214.6 -1.484
## hpp.c -304.4 262.7 -1.159
## looking.type.c:hpp.c 665.6 233.3 2.852
##
## Correlation of Fixed Effects:
## (Intr) lkng.. hpp.c
## lokng.typ.c 0.000
## hpp.c -0.105 0.000
## lkng.typ.:. 0.000 0.000 0.000
## convergence code: 0
## boundary (singular) fit: see ?isSingular
Anova(m,type="III")
## Analysis of Deviance Table (Type III Wald chisquare tests)
##
## Response: looking.time
## Chisq Df Pr(>Chisq)
## (Intercept) 182.6721 1 < 2.2e-16 ***
## looking.type.c 2.2020 1 0.137832
## hpp.c 1.3422 1 0.246643
## looking.type.c:hpp.c 8.1352 1 0.004341 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The significant interaction term suggests that the number of visits changes the relationship between item type and looking time, i.e. the nature of the familiarity/novelty preference.
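One way to unpack the interaction (a follow-up not in the original script, assuming the emmeans package is available) is to estimate the hpp slope separately at each level of item type:
#slope of hpp.c at each (centered) level of item type: familiar = -0.5, novel = 0.5
library(emmeans)
emtrends(m, ~ looking.type.c, var = "hpp.c", at = list(looking.type.c = c(-0.5, 0.5)))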
The right way to get p-values from lmer models is of course a matter of debate. Here, we used a Wald chi-squared test. The method I typically prefer is the Kenward-Roger approximation, which estimates the “correct” degrees of freedom. However, something a bit odd happens with that method when adding the by-study random slope for hpp, probably because study has only two levels. The chi-squared method should be fine, but this is also something we could do a deeper dive on down the road (it might also be just another argument against including the by-study random slope for hpp).
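For reference, this is how Kenward-Roger p-values could be obtained, e.g. via the lmerTest package (a sketch only; not run here because of the issue just described):
#refit with lmerTest so its anova method applies, then request Kenward-Roger df
library(lmerTest)
m_kr <- lmer(looking.time ~ looking.type.c * hpp.c + (1 | baby) + (1 + hpp.c | study), data = d_long)
anova(m_kr, ddf = "Kenward-Roger")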
Here we check that the effect still holds when we remove high-leverage points: the handful of babies who had 4 or 5 visits.
m <- lmer(looking.time~looking.type.c*hpp.c+(1|baby)+(1+hpp.c|study),data=filter(d_long,hpp<4))
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## looking.time ~ looking.type.c * hpp.c + (1 | baby) + (1 + hpp.c |
## study)
## Data: filter(d_long, hpp < 4)
##
## REML criterion at convergence: 3035
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.7095 -0.5202 -0.0460 0.3702 2.4355
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## baby (Intercept) 3745711 1935.4
## study (Intercept) 338533 581.8
## hpp.c 16487 128.4 -1.00
## Residual 2175697 1475.0
## Number of obs: 170, groups: baby, 85; study, 2
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 7110.8 478.6 14.856
## looking.type.c -326.8 230.7 -1.416
## hpp.c -178.2 362.2 -0.492
## looking.type.c:hpp.c 657.1 314.6 2.088
##
## Correlation of Fixed Effects:
## (Intr) lkng.. hpp.c
## lokng.typ.c 0.000
## hpp.c -0.142 0.000
## lkng.typ.:. 0.000 0.195 0.000
## convergence code: 0
## boundary (singular) fit: see ?isSingular
Anova(m,type="III")
## Analysis of Deviance Table (Type III Wald chisquare tests)
##
## Response: looking.time
## Chisq Df Pr(>Chisq)
## (Intercept) 220.7050 1 < 2e-16 ***
## looking.type.c 2.0062 1 0.15666
## hpp.c 0.2421 1 0.62273
## looking.type.c:hpp.c 4.3612 1 0.03677 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What happens when we attempt to predict preferential looking from both total number of studies and hpp?
In principle, this works, but the issue is that total number of studies and number of HPP studies are very highly correlated, to the point of being almost indistinguishable.
ggplot(d, aes(hpp, total_studies)) +
  geom_jitter(size = 1.5, height = 0.1, width = 0.1) +
  #reference line y = x: children whose studies were all HPP fall on this line
  stat_function(fun = function(x) x, geom = "line", size = 1.5, color = "darkgreen")
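The raw correlation confirms the visual impression (a quick check, not in the original script):
#correlation between number of HPP visits and total number of studies
cor(d$hpp, d$total_studies)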
Almost all participants have completed only HPP studies. On face value alone, it doesn’t make much sense to me to fit a model in which the two predictors are virtually identical.
A likely consequence is that a model including both predictors will not be able to estimate their effects simultaneously and will yield unstable estimates. The standard way to check this is the variance inflation factor (VIF). We check the VIF of the main linear model and of the main linear mixed-effects model below.
m <- lm(novel_pref_corr~hpp+total_studies,data=d)
summary(m)
##
## Call:
## lm(formula = novel_pref_corr ~ hpp + total_studies, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29883 -0.08020 -0.00823 0.06970 0.37445
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.09825 0.03373 -2.913 0.00455 **
## hpp 0.05774 0.03425 1.686 0.09538 .
## total_studies -0.01180 0.03373 -0.350 0.72726
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1392 on 87 degrees of freedom
## Multiple R-squared: 0.0923, Adjusted R-squared: 0.07144
## F-statistic: 4.423 on 2 and 87 DF, p-value: 0.01481
vif(m)
## hpp total_studies
## 4.605184 4.605184
The VIF is quite high for the two predictors, ~4.6, which means that the SEs are inflated by a factor of ~2.15 (the square root of the VIF). A typical cut-off is a VIF of 5 or more, so we are right at the border of that cutoff. Bottom line: I would not trust these estimates.
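The SE inflation factor mentioned above can be computed directly (not in the original script):
#SE inflation factor = square root of the VIF
sqrt(vif(m))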
#center total number of studies
d <- d %>%
  mutate(
    total_studies.c = total_studies - mean(total_studies)
  )
#most complex model without convergence issues
m <- lmer(novel_pref_corr~hpp.c+total_studies.c+(1+total_studies.c|study),data=d)
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## novel_pref_corr ~ hpp.c + total_studies.c + (1 + total_studies.c |
## study)
## Data: d
##
## REML criterion at convergence: -90.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.19416 -0.64327 -0.02821 0.60975 2.64341
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## study (Intercept) 0.0042079 0.06487
## total_studies.c 0.0002331 0.01527 1.00
## Residual 0.0177075 0.13307
## Number of obs: 90, groups: study, 2
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.02168 0.04797 -0.452
## hpp.c 0.10511 0.03667 2.866
## total_studies.c -0.04199 0.03559 -1.180
##
## Correlation of Fixed Effects:
## (Intr) hpp.c
## hpp.c -0.009
## totl_stds.c 0.296 -0.849
Anova(m,type="III")
## Analysis of Deviance Table (Type III Wald chisquare tests)
##
## Response: novel_pref_corr
## Chisq Df Pr(>Chisq)
## (Intercept) 0.2042 1 0.651321
## hpp.c 8.2161 1 0.004152 **
## total_studies.c 1.3919 1 0.238078
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
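One note: vif.mer() is not defined in this document or provided by the loaded packages. The definition below is a commonly circulated helper for computing VIFs from merMod objects, included here as an assumed sketch of what the script relies on:
#assumed helper: VIF for merMod objects (adapted from a widely shared snippet)
vif.mer <- function(fit) {
  v <- as.matrix(vcov(fit))
  nam <- names(fixef(fit))
  ns <- sum(nam == "(Intercept)") #drop the intercept
  if (ns > 0) {
    v <- v[-(1:ns), -(1:ns), drop = FALSE]
    nam <- nam[-(1:ns)]
  }
  se <- diag(v)^0.5 #coefficient SEs
  v <- diag(solve(v / (se %o% se))) #diagonal of the inverse correlation matrix
  names(v) <- nam
  v
}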
vif.mer(m) #check for multicollinearity
## hpp.c total_studies.c
## 3.577092 3.577092
The news might be slightly better here. The VIF is still pretty high (~3.6), but perhaps acceptable? If we take these values as somewhat interpretable, then HPP predicts preferential looking, controlling for total number of studies. The one caveat is that the model with the maximal random-effects structure has convergence issues (as well as very high VIF), so the model above simplifies things by removing the by-study random slope for hpp.
m <- lmer(looking.time~looking.type.c*hpp.c+looking.type.c*total_studies.c+(1|baby)+(1|study),data=d_long)
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## looking.time ~ looking.type.c * hpp.c + looking.type.c * total_studies.c +
## (1 | baby) + (1 | study)
## Data: d_long
##
## REML criterion at convergence: 3180.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.77110 -0.51821 -0.05031 0.34720 2.49047
##
## Random effects:
## Groups Name Variance Std.Dev.
## baby (Intercept) 3609800 1899.9
## study (Intercept) 306719 553.8
## Residual 2094645 1447.3
## Number of obs: 180, groups: baby, 90; study, 2
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 7054.29 453.51 15.555
## looking.type.c -318.40 215.75 -1.476
## hpp.c 270.40 564.33 0.479
## total_studies.c -605.85 535.80 -1.131
## looking.type.c:hpp.c 752.87 503.50 1.495
## looking.type.c:total_studies.c -97.19 495.89 -0.196
##
## Correlation of Fixed Effects:
## (Intr) lkng.. hpp.c ttl_s. lk..:.
## lokng.typ.c 0.000
## hpp.c -0.018 0.000
## totl_stds.c 0.011 0.000 -0.886
## lkng.typ.:. 0.000 0.000 0.000 0.000
## lkng.ty.:_. 0.000 0.000 0.000 0.000 -0.885
vif.mer(m)
## looking.type.c hpp.c
## 1.000000 4.664620
## total_studies.c looking.type.c:hpp.c
## 4.664620 4.605184
## looking.type.c:total_studies.c
## 4.605184
Anova(m,type="III")
## Analysis of Deviance Table (Type III Wald chisquare tests)
##
## Response: looking.time
## Chisq Df Pr(>Chisq)
## (Intercept) 241.9559 1 <2e-16 ***
## looking.type.c 2.1780 1 0.1400
## hpp.c 0.2296 1 0.6318
## total_studies.c 1.2786 1 0.2582
## looking.type.c:hpp.c 2.2358 1 0.1348
## looking.type.c:total_studies.c 0.0384 1 0.8446
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The situation is similar to the linear model: the VIF (~4.6) is right at the threshold beyond which the model is really too collinear to interpret. We again simplify the random-effects structure here to avoid convergence warnings.
The rough summary, to me, would be:
Total number of studies and total number of HPP studies are virtually identical in our dataset. Basically, we don’t have the right dataset to disentangle these two factors.
Models including both total study number and total HPP study number are, in principle, right at the border of being uninterpretable, because these predictors are virtually identical.
If one really wants to look at these (probably uninterpretable) models, HPP looks, if anything, like the better predictor. But I would take any conclusion along those lines with a massive grain of salt.