1 Document Overview:

In this document, using infant-level data, we:

estimate the size of perceptual novelty (PN) and violation of expectation (VOE) to see how the size of these effects compare to each other in experimental conditions
find the moderators for PN and VOE (in experimental conditions) separately. This includes:

for significant moderators, estimating both effects within the same additive and interaction models
comparing additive and interaction models to see which better fits the data

find the effect of all moderators on both effects (in both experimental and control conditions) separately for each effect. This includes:

for exp_vs_control, estimating both effects within the same additive and interaction models
comparing additive and interaction models to see which better fits the data

determine what element of habituation makes infants have a larger PN effect. This includes:

seeing whether number of habituation trials predicts each effect
determining whether infants who meet habituation criteria in familiarization experiments have a larger PN effect than those who don’t

2 Imports

options(scipen = 1, digits = 3)
library(pacman)

pacman::p_load(plyr,
               tidyverse,
               metafor,
               lme4,
               lmerTest,
               cowplot,
               easystats,
               sjPlot,
               sjtable2df,
               kableExtra,
               easystats,
               MASS,
               performance,
               effects,
               lsmeans,
               multcomp,
               influence.ME,
               patchwork,
               simr, 
               pwr,
               datawizard)

sessionInfo()

## R version 4.3.2 (2023-10-31)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.6.7
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] pwr_1.3-0           simr_1.0.7          patchwork_1.1.3    
##  [4] influence.ME_0.9-9  multcomp_1.4-25     TH.data_1.1-2      
##  [7] survival_3.5-7      mvtnorm_1.2-3       lsmeans_2.30-0     
## [10] emmeans_1.8.9       effects_4.2-2       carData_3.0-5      
## [13] MASS_7.3-60         kableExtra_1.3.4    sjtable2df_0.0.3   
## [16] sjPlot_2.8.15       see_0.8.0           report_0.5.7       
## [19] parameters_0.21.2   performance_0.10.8  modelbased_0.8.6   
## [22] insight_0.19.6      effectsize_0.8.6    datawizard_0.9.0   
## [25] correlation_0.8.4   bayestestR_0.13.1   easystats_0.6.0    
## [28] cowplot_1.1.1       lmerTest_3.1-3      lme4_1.1-34        
## [31] metafor_4.4-0       numDeriv_2016.8-1.1 metadat_1.2-0      
## [34] Matrix_1.6-1.1      lubridate_1.9.3     forcats_1.0.0      
## [37] stringr_1.5.0       dplyr_1.1.3         purrr_1.0.2        
## [40] readr_2.1.4         tidyr_1.3.0         tibble_3.2.1       
## [43] ggplot2_3.4.4       tidyverse_2.0.0     plyr_1.8.9         
## [46] pacman_0.5.1       
## 
## loaded via a namespace (and not attached):
##  [1] mathjaxr_1.6-0     rstudioapi_0.15.0  jsonlite_1.8.7     magrittr_2.0.3    
##  [5] estimability_1.4.1 nloptr_2.0.3       rmarkdown_2.25     vctrs_0.6.4       
##  [9] minqa_1.2.6        webshot_0.5.5      htmltools_0.5.6.1  binom_1.1-1.1     
## [13] plotrix_3.8-2      survey_4.2-1       broom_1.0.5        sjmisc_2.8.9      
## [17] sass_0.4.7         bslib_0.5.1        RLRsim_3.1-8       pbkrtest_0.5.2    
## [21] sandwich_3.0-2     zoo_1.8-12         cachem_1.0.8       lifecycle_1.0.3   
## [25] iterators_1.0.14   pkgconfig_2.0.3    sjlabelled_1.2.0   R6_2.5.1          
## [29] fastmap_1.1.1      digest_0.6.33      colorspace_2.1-0   fansi_1.0.5       
## [33] timechange_0.2.0   httr_1.4.7         abind_1.4-5        mgcv_1.9-0        
## [37] compiler_4.3.2     withr_2.5.2        backports_1.4.1    DBI_1.1.3         
## [41] sjstats_0.18.2     tools_4.3.2        nnet_7.3-19        glue_1.6.2        
## [45] nlme_3.1-163       grid_4.3.2         generics_0.1.3     gtable_0.3.4      
## [49] tzdb_0.4.0         data.table_1.14.8  hms_1.1.3          xml2_1.3.5        
## [53] car_3.1-2          utf8_1.2.4         pillar_1.9.0       mitools_2.4       
## [57] splines_4.3.2      lattice_0.21-9     tidyselect_1.2.0   knitr_1.45        
## [61] svglite_2.1.2      xfun_0.40          stringi_1.7.12     yaml_2.3.7        
## [65] boot_1.3-28.1      evaluate_0.22      codetools_0.2-19   cli_3.6.1         
## [69] systemfonts_1.0.5  munsell_0.5.0      jquerylib_0.1.4    modelr_0.1.11     
## [73] Rcpp_1.0.11        ggeffects_1.3.2    parallel_4.3.2     viridisLite_0.4.2 
## [77] scales_1.2.1       rlang_1.1.1        rvest_1.0.3

3 Read Mega-analysis data, and prepare data

mega_analysis_data <- read_csv('processed_data/mega-analysis_data.csv')

## Rows: 2078 Columns: 43
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): paper, long_cite, short_cite, doi, expt_cond, subj, specific_subje...
## dbl (25): expt_num, agedays, train1, train2, train3, train4, train5, train6,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#Exclude infants without individual age or order of test trials
mega_analysis_age_order <- mega_analysis_data %>%
  filter(!is.na(agedays) & !is.na(order)) %>%
  mutate(cond_ID = paste(paper, expt_num, expt_cond, sep = ".")) %>%
  mutate(expt_ID = paste(paper, expt_num, sep = "."))

#Number of conditions (both experimental and control)
num_cond_w_age_order <- mega_analysis_age_order %>% group_by(cond_ID) %>% summarize(count = n())

num_cond_exp_total <- mega_analysis_data %>% filter(exp_or_control == "experimental")%>% mutate(cond_ID = paste(paper, expt_num, expt_cond, sep = ".")) %>% group_by(cond_ID) %>% summarize(count = n())

#Check whether data fits a normal or log distribution better
mega_analysis_long <- mega_analysis_age_order %>%
  pivot_longer(cols = c("last_train_looking", "expected1", "unexpected1"), names_to = "trial_type", values_to = "looking_time") %>%
  filter(!is.na(looking_time))

looknormal <- fitdistr(mega_analysis_long$looking_time, "normal")$loglik
looklognormal <- fitdistr(mega_analysis_long$looking_time, "log-normal")$loglik

First we ask whether the looking time data are more likely drawn from a normal or log-normal distribution. It turns out that the log-normal distribution provides a better fit (log-likelihood = -21562.277 vs -24369.897). Thus we proceed with the analysis using log-transformed looking times.

#Log-transform looking times
mega_analysis_log <- mega_analysis_data %>%
  filter(!is.na(agedays) & !is.na(order)) %>%
  mutate(last_train_looking = log(last_train_looking)) %>%
  mutate(expected1 = log(expected1)) %>%
  mutate(unexpected1 = log(unexpected1)) %>%
  mutate(PN = expected1 - last_train_looking) %>%
  mutate(VOE = unexpected1 - expected1) %>%
  mutate(cond_ID = paste(paper, expt_num, expt_cond, sep = ".")) %>%
  mutate(expt_ID = paste(paper, expt_num, sep = "."))

4 lmer object to dataframe

gen.m <- function(model) {
  df <- data.frame(coef(summary(model)))
  names(df) <- c("B", "SE", "df", "t", "p")
  return(df)
}

5 Estimating size of effects in experimental conditions

#Prepare data and contrast
mega_analysis_log_long <- mega_analysis_log %>%
  filter(exp_or_control == "experimental") %>%
  pivot_longer(cols = c("last_train_looking", "expected1", "unexpected1"), names_to = "trial_type", values_to = "looking_time") %>%
  filter(!is.na(looking_time)) %>%
  mutate(age_z = datawizard::standardize(agedays))

mega_analysis_log_long$trial_type <- as.factor(mega_analysis_log_long$trial_type)
contrasts(mega_analysis_log_long$trial_type) = contr.treatment(3)

#Model
estimate_effect_sizes <- lmer(formula = looking_time ~ trial_type + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = mega_analysis_log_long)
summary(estimate_effect_sizes)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: looking_time ~ trial_type + (1 | specific_subject_id) + (1 |  
##     cond_ID) + (1 | expt_ID)
##    Data: mega_analysis_log_long
## 
## REML criterion at convergence: 9925
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -4.507 -0.621  0.005  0.633  3.152 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  specific_subject_id (Intercept) 0.15327  0.3915  
##  cond_ID             (Intercept) 0.00267  0.0517  
##  expt_ID             (Intercept) 0.32153  0.5670  
##  Residual                        0.45154  0.6720  
## Number of obs: 4288, groups:  
## specific_subject_id, 1453; cond_ID, 59; expt_ID, 44
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)    2.4348     0.0889   45.3429   27.37  < 2e-16 ***
## trial_type2   -0.2178     0.0251 2870.7795   -8.67  < 2e-16 ***
## trial_type3    0.1775     0.0252 2868.1638    7.04  2.5e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl_t2
## trial_type2 -0.142       
## trial_type3 -0.141  0.500

check_model(estimate_effect_sizes)

#Found outliers, remove and rerun model
outliers_list_effect_sizes <- check_outliers(estimate_effect_sizes)[1:4288]
outliers_effect_sizes <- insight::get_data(estimate_effect_sizes)[outliers_list_effect_sizes, ]
effect_size_data_no_outliers <- anti_join(mega_analysis_log_long, outliers_effect_sizes, by = c("specific_subject_id", "trial_type"))
estimate_effect_sizes_no_outliers <- lmer(formula = looking_time ~ trial_type + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = effect_size_data_no_outliers)
summary(estimate_effect_sizes_no_outliers) #same results

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: looking_time ~ trial_type + (1 | specific_subject_id) + (1 |  
##     cond_ID) + (1 | expt_ID)
##    Data: effect_size_data_no_outliers
## 
## REML criterion at convergence: 9309
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.5791 -0.6450  0.0058  0.6459  2.6928 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  specific_subject_id (Intercept) 0.16153  0.4019  
##  cond_ID             (Intercept) 0.00403  0.0635  
##  expt_ID             (Intercept) 0.31880  0.5646  
##  Residual                        0.38649  0.6217  
## Number of obs: 4235, groups:  
## specific_subject_id, 1452; cond_ID, 59; expt_ID, 44
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)    2.4366     0.0885   45.0360   27.53  < 2e-16 ***
## trial_type2   -0.1872     0.0234 2830.1263   -7.98  2.0e-15 ***
## trial_type3    0.1818     0.0235 2822.7615    7.74  1.4e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl_t2
## trial_type2 -0.132       
## trial_type3 -0.132  0.498

#Table
estimate_effect_sizes_results <- mtab2df(tab_model(estimate_effect_sizes), 1, "data.frame")

#Beta value
effect_size_std_coef <- standardize_parameters(estimate_effect_sizes) %>%
    as.data.frame() %>%
    dplyr::select(Std_Coefficient)

effect_size_lmer_table <- cbind(gen.m(estimate_effect_sizes), effect_size_std_coef, estimate_effect_sizes_results$CI[1:3])[, c(1, 2, 7, 6, 3, 4, 5)] %>% mutate(p = ifelse(p<.001, "<.001", p))
colnames(effect_size_lmer_table) <- c("Estimate", "Standard Error", "CI", "Beta", "df", "t", "p")

effect_size_lmer_table %>%
  kable("html",align = 'clc',table.attr="id=effectsizestable")%>%
  kable_styling("striped")

	Estimate	Standard Error	CI	Beta	df	t	p
(Intercept)	2.435	0.089	2.26 – 2.61	0.161	45.3	27.37	<.001
trial_type2	-0.218	0.025	-0.27 – -0.17	-0.233	2870.8	-8.67	<.001
trial_type3	0.178	0.025	0.13 – 0.23	0.190	2868.2	7.04	<.001

#tests whether the two differences are significantly different from each other
K <- matrix(c(0, -1,-1), 1)
diff_pn_voe <- glht(estimate_effect_sizes, linfct = K)
summary(diff_pn_voe)

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lmer(formula = looking_time ~ trial_type + (1 | specific_subject_id) + 
##     (1 | cond_ID) + (1 | expt_ID), data = mega_analysis_log_long)
## 
## Linear Hypotheses:
##        Estimate Std. Error z value Pr(>|z|)
## 1 == 0   0.0403     0.0436    0.92     0.36
## (Adjusted p values reported -- single-step method)

Parallel to the mega-analysis, we found that both the perceptual novelty and the violation of expectation effect were significantly greater than 0 (PN: Mean difference = 0.218 log seconds, [negated and flipped (-0.27 – -0.17)], beta = 0.233, t(df) = 8.668(2870.779), p <.001; VOE: 0.178 log seconds, [0.13 – 0.23], beta = 0.19, t(df) = 7.035(2868.164), p <.001). These two effect sizes were not significantly different from each other (p = 0.47, null hypothesis that PN - VOE = 0).

6 Moderators of PN and VOE for only experimental conditions

mega_analysis_experimental <- mega_analysis_log %>%
  filter(exp_or_control == "experimental") %>%
  mutate(age_z = datawizard::standardize(agedays))

#Number of conditions and papers: just experimental conditions
num_cond_exp <- mega_analysis_experimental %>% group_by(cond_ID) %>% summarize(count = n())
num_papers_exp <- mega_analysis_experimental %>% group_by(paper) %>% summarize(count = n())

#Prepare contrasts
mega_analysis_experimental$equal_per_nov <- as.factor(mega_analysis_experimental$equal_per_nov)
mega_analysis_experimental$exposure_phase <- as.factor(mega_analysis_experimental$exposure_phase)
mega_analysis_experimental$domain <- as.factor(mega_analysis_experimental$domain)
mega_analysis_experimental$stim_loop <- as.factor(mega_analysis_experimental$stim_loop)
mega_analysis_experimental$order <- as.factor(mega_analysis_experimental$order)

mega_analysis_experimental$equal_per_nov <- relevel(mega_analysis_experimental$equal_per_nov, "yes")
mega_analysis_experimental$exposure_phase <- relevel(mega_analysis_experimental$exposure_phase, "habituation")
mega_analysis_experimental$domain <- relevel(mega_analysis_experimental$domain, "psychology")
mega_analysis_experimental$stim_loop <- relevel(mega_analysis_experimental$stim_loop, "yes")
mega_analysis_experimental$order <- relevel(mega_analysis_experimental$order, "exp")

contrasts(mega_analysis_experimental$equal_per_nov) = contr.sum(2)
contrasts(mega_analysis_experimental$exposure_phase) = contr.sum(2)
contrasts(mega_analysis_experimental$domain) = contr.sum(2)
contrasts(mega_analysis_experimental$stim_loop) = contr.sum(2)
contrasts(mega_analysis_experimental$order) = contr.sum(2)

6.1 Modeling each effect separately

6.1.1 PN

#PN model
PN_moderators <- lmer(formula = PN ~ age_z + order + equal_per_nov + exposure_phase + domain + stim_loop + (1|cond_ID) + (1|expt_ID), data = mega_analysis_experimental)
summary(PN_moderators)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: PN ~ age_z + order + equal_per_nov + exposure_phase + domain +  
##     stim_loop + (1 | cond_ID) + (1 | expt_ID)
##    Data: mega_analysis_experimental
## 
## REML criterion at convergence: 3835
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.688 -0.625 -0.040  0.605  3.597 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  cond_ID  (Intercept) 0.0354   0.188   
##  expt_ID  (Intercept) 0.0913   0.302   
##  Residual             0.8168   0.904   
## Number of obs: 1417, groups:  cond_ID, 59; expt_ID, 44
## 
## Fixed effects:
##                  Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)        0.4749     0.0814   37.3760    5.83 0.000001 ***
## age_z              0.1529     0.0592   65.3048    2.58  0.01203 *  
## order1             0.0914     0.0240 1362.1301    3.80  0.00015 ***
## equal_per_nov1    -0.0378     0.0635   37.2545   -0.59  0.55559    
## exposure_phase1    0.2923     0.0666   41.8541    4.39 0.000076 ***
## domain1           -0.1840     0.0960   37.2878   -1.92  0.06304 .  
## stim_loop1         0.0463     0.0847   36.8927    0.55  0.58804    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) age_z  order1 eql__1 exps_1 doman1
## age_z        0.370                                   
## order1      -0.007 -0.019                            
## equl_pr_nv1 -0.098 -0.013 -0.008                     
## exposr_phs1  0.181  0.245 -0.011  0.107              
## domain1     -0.551 -0.180 -0.001  0.299  0.014       
## stim_loop1  -0.091 -0.192  0.002 -0.255  0.200  0.442

check_collinearity(PN_moderators)

check_model(PN_moderators)

#PN table
PN_results <- mtab2df(tab_model(PN_moderators), 1, "data.frame")

#Beta value
PN_mod_std_coef <- standardize_parameters(PN_moderators) %>%
    as.data.frame() %>%
    dplyr::select(Std_Coefficient)

PN_mod_lmer_table <- cbind(gen.m(PN_moderators), PN_mod_std_coef, PN_results$CI[1:7])[, c(1, 7, 6, 3, 4, 5)] %>% mutate(p = ifelse(p<.001, "<.001", p))
colnames(PN_mod_lmer_table) <- c("Estimate", "CI", "Beta", "df", "t", "p")

PN_mod_lmer_table %>%
  kable("html",align = 'clc',table.attr="id=PNresultstable")%>%
  kable_styling("striped")

	Estimate	CI	Beta	df	t	p
(Intercept)	0.475	0.32 – 0.63	0.261	37.4	5.835	<.001
age_z	0.153	0.04 – 0.27	0.155	65.3	2.583	0.0120344252530589
order1	0.091	0.04 – 0.14	0.092	1362.1	3.802	<.001
equal_per_nov1	-0.038	-0.16 – 0.09	-0.038	37.3	-0.595	0.555590552114518
exposure_phase1	0.292	0.16 – 0.42	0.295	41.9	4.387	<.001
domain1	-0.184	-0.37 – 0.00	-0.186	37.3	-1.916	0.0630354430272978
stim_loop1	0.046	-0.12 – 0.21	0.047	36.9	0.546	0.588039880687281

#PN visualize results
PN_plot_model <- plot_model(PN_moderators, ci.lvl = 0.95, show.values = TRUE, show.p = TRUE, colors = "bw")
PN_plot_model

plot(allEffects(PN_moderators))

6.1.2 VOE

#VOE model
VOE_moderators <- lmer(formula = VOE ~ age_z + order + equal_per_nov + exposure_phase + domain + stim_loop + (1|cond_ID) + (1|expt_ID) , data = mega_analysis_experimental)
summary(VOE_moderators)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: VOE ~ age_z + order + equal_per_nov + exposure_phase + domain +  
##     stim_loop + (1 | cond_ID) + (1 | expt_ID)
##    Data: mega_analysis_experimental
## 
## REML criterion at convergence: 3466
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.740 -0.580 -0.009  0.629  4.931 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  cond_ID  (Intercept) 0.022282 0.149   
##  expt_ID  (Intercept) 0.000625 0.025   
##  Residual             0.661128 0.813   
## Number of obs: 1404, groups:  cond_ID, 59; expt_ID, 44
## 
## Fixed effects:
##                   Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        0.15066    0.04154   26.24560    3.63   0.0012 ** 
## age_z             -0.08956    0.03217   21.07184   -2.78   0.0111 *  
## order1            -0.17820    0.02173 1352.78255   -8.20  5.6e-16 ***
## equal_per_nov1    -0.00814    0.03989   41.84302   -0.20   0.8394    
## exposure_phase1   -0.03806    0.03559   24.29665   -1.07   0.2954    
## domain1            0.06185    0.04980   29.83750    1.24   0.2239    
## stim_loop1         0.00270    0.04246   18.13183    0.06   0.9499    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) age_z  order1 eql__1 exps_1 doman1
## age_z        0.246                                   
## order1       0.002 -0.012                            
## equl_pr_nv1 -0.131 -0.078 -0.013                     
## exposr_phs1  0.186  0.297 -0.021  0.245              
## domain1     -0.590 -0.226 -0.013  0.297  0.023       
## stim_loop1   0.017 -0.066 -0.005 -0.300  0.176  0.379

check_collinearity(VOE_moderators)

check_model(VOE_moderators)

#VOE table
VOE_results <- mtab2df(tab_model(VOE_moderators), 1, "data.frame")

#Beta value
VOE_mod_std_coef <- standardize_parameters(VOE_moderators) %>%
    as.data.frame() %>%
    dplyr::select(Std_Coefficient)

VOE_mod_lmer_table <- cbind(gen.m(VOE_moderators), VOE_mod_std_coef, VOE_results$CI[1:7])[, c(1, 7, 6, 3, 4, 5)] %>% mutate(p = ifelse(p<.001, "<.001", p))
colnames(VOE_mod_lmer_table) <- c("Estimate", "CI", "Beta", "df", "t", "p")

VOE_mod_lmer_table %>%
  kable("html",align = 'clc',table.attr="id=VOEresultstable")%>%
  kable_styling("striped")

	Estimate	CI	Beta	df	t	p
(Intercept)	0.151	0.07 – 0.23	-0.036	26.2	3.627	0.00121321061990907
age_z	-0.090	-0.15 – -0.03	-0.106	21.1	-2.784	0.0110940716856255
order1	-0.178	-0.22 – -0.14	-0.210	1352.8	-8.199	<.001
equal_per_nov1	-0.008	-0.09 – 0.07	-0.010	41.8	-0.204	0.839368459760285
exposure_phase1	-0.038	-0.11 – 0.03	-0.045	24.3	-1.069	0.295379701892301
domain1	0.062	-0.04 – 0.16	0.073	29.8	1.242	0.223903162155107
stim_loop1	0.003	-0.08 – 0.09	0.003	18.1	0.064	0.949909903771322

#VOE visualize results
VOE_plot_model <- plot_model(VOE_moderators, ci.lvl = 0.95, show.values = TRUE, show.p = TRUE, colors = "bw")
VOE_plot_model

plot(allEffects(VOE_moderators))

6.1.3 Both effects

#Both effects visualize results
PN_plot_model + VOE_plot_model

Next, we repeated the moderator analyses using data from individual infants, including individual infants’ ages and also the order of test events. The results from these participant-level data largely converged with the findings from the meta-analysis.

First, we found that the conditions using habituation reported greater perceptual novelty effects than conditions using familiarization (estimate = 0.292, [0.16 – 0.42], beta = 0.295, t(df) = 4.387(41.854), p <.001); habituation versus familiarization did not significantly predict the size of the violation of expectation effect (estimate = -0.038, [-0.11 – 0.03], beta = -0.045, t(df) = -1.069(24.297), p = 0.295379701892301).

Second, older infants showed a smaller VOE effect than younger infants (estimate = -0.09, [-0.15 – -0.03], beta = -0.106, t(df) = -2.784(21.072), p = 0.0110940716856255), and older infants showed a bigger PN effect than younger infants (estimate = 0.153, [0.04 – 0.27], beta = 0.155, t(df) = 2.583(65.305), p = 0.0120344252530589).

6.2 Estimating both effects within the same model to address statistical non-independence

6.2.1 Exposure Phase

#Prepare contrasts
mega_analysis_log_long$equal_per_nov <- as.factor(mega_analysis_log_long$equal_per_nov)
mega_analysis_log_long$exposure_phase <- as.factor(mega_analysis_log_long$exposure_phase)
mega_analysis_log_long$domain <- as.factor(mega_analysis_log_long$domain)
mega_analysis_log_long$stim_loop <- as.factor(mega_analysis_log_long$stim_loop)
mega_analysis_log_long$order <- as.factor(mega_analysis_log_long$order)

mega_analysis_log_long$equal_per_nov <- relevel(mega_analysis_log_long$equal_per_nov, "yes")
mega_analysis_log_long$exposure_phase <- relevel(mega_analysis_log_long$exposure_phase, "habituation")
mega_analysis_log_long$domain <- relevel(mega_analysis_log_long$domain, "psychology")
mega_analysis_log_long$stim_loop <- relevel(mega_analysis_log_long$stim_loop, "yes")
mega_analysis_log_long$order <- relevel(mega_analysis_log_long$order, "exp")

contrasts(mega_analysis_log_long$equal_per_nov) = contr.sum(2)
contrasts(mega_analysis_log_long$exposure_phase) = contr.sum(2)
contrasts(mega_analysis_log_long$domain) = contr.sum(2)
contrasts(mega_analysis_log_long$stim_loop) = contr.sum(2)
contrasts(mega_analysis_log_long$order) = contr.sum(2)

#Change exposure phase to treatment contrast
contrasts(mega_analysis_log_long$exposure_phase) = contr.treatment(2)

#Additive model
results_exposure_phase_add <- lmer(formula = looking_time ~ trial_type + exposure_phase + equal_per_nov + age_z + domain + stim_loop + order + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = mega_analysis_log_long)
check_collinearity(results_exposure_phase_add)

check_model(results_exposure_phase_add)

#Interaction model
results_exposure_phase_inter <- lmer(formula = looking_time ~ trial_type * exposure_phase + equal_per_nov + age_z + domain + stim_loop + order + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = mega_analysis_log_long)
summary(results_exposure_phase_inter)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: looking_time ~ trial_type * exposure_phase + equal_per_nov +  
##     age_z + domain + stim_loop + order + (1 | specific_subject_id) +  
##     (1 | cond_ID) + (1 | expt_ID)
##    Data: mega_analysis_log_long
## 
## REML criterion at convergence: 9834
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -4.584 -0.602 -0.007  0.648  3.413 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  specific_subject_id (Intercept) 0.15871  0.3984  
##  cond_ID             (Intercept) 0.00433  0.0658  
##  expt_ID             (Intercept) 0.15927  0.3991  
##  Residual                        0.43705  0.6611  
## Number of obs: 4288, groups:  
## specific_subject_id, 1453; cond_ID, 59; expt_ID, 44
## 
## Fixed effects:
##                               Estimate Std. Error         df t value Pr(>|t|)
## (Intercept)                    2.32051    0.11771   43.89033   19.71  < 2e-16
## trial_type2                   -0.44855    0.03779 2856.21439  -11.87  < 2e-16
## trial_type3                    0.19990    0.03781 2852.79607    5.29  1.3e-07
## exposure_phase2                0.37659    0.13970   41.50051    2.70   0.0101
## equal_per_nov1                -0.00864    0.03746    9.57809   -0.23   0.8225
## age_z                         -0.04272    0.04746  150.75690   -0.90   0.3695
## domain1                       -0.03154    0.09898   37.03317   -0.32   0.7518
## stim_loop1                     0.26602    0.08766   37.14012    3.03   0.0044
## order1                         0.00632    0.01456 1404.97425    0.43   0.6644
## trial_type2:exposure_phase2    0.40273    0.04996 2864.26115    8.06  1.1e-15
## trial_type3:exposure_phase2   -0.04003    0.05013 2861.45964   -0.80   0.4247
##                                
## (Intercept)                 ***
## trial_type2                 ***
## trial_type3                 ***
## exposure_phase2             *  
## equal_per_nov1                 
## age_z                          
## domain1                        
## stim_loop1                  ** 
## order1                         
## trial_type2:exposure_phase2 ***
## trial_type3:exposure_phase2    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl_t2 trl_t3 exps_2 eql__1 age_z  doman1 stm_l1 order1
## trial_type2 -0.161                                                        
## trial_type3 -0.161  0.500                                                 
## exposr_phs2 -0.700  0.136  0.135                                          
## equl_pr_nv1 -0.006  0.000  0.000 -0.053                                   
## age_z        0.327  0.001  0.000 -0.176  0.008                            
## domain1     -0.371  0.000  0.000 -0.019  0.179 -0.138                     
## stim_loop1   0.064 -0.001  0.000 -0.226 -0.147 -0.171  0.533              
## order1      -0.009  0.002  0.000  0.007 -0.012 -0.027  0.000  0.005       
## trl_typ2:_2  0.122 -0.756 -0.378 -0.180 -0.001  0.000  0.000  0.000 -0.001
## trl_typ3:_2  0.121 -0.377 -0.754 -0.179 -0.001 -0.001  0.000  0.001 -0.001
##             t_2:_2
## trial_type2       
## trial_type3       
## exposr_phs2       
## equl_pr_nv1       
## age_z             
## domain1           
## stim_loop1        
## order1            
## trl_typ2:_2       
## trl_typ3:_2  0.500

check_collinearity(results_exposure_phase_inter)

check_model(results_exposure_phase_inter)

plot(allEffects(results_exposure_phase_inter))

#Does the interaction better fit the data?
anova(results_exposure_phase_inter, results_exposure_phase_add, refit = TRUE)

## refitting model(s) with ML (instead of REML)

#Expected values of each of the effects facetted by exposure phase
exposure_phase_lsmeans <- lsmeans(results_exposure_phase_inter, pairwise ~ trial_type|exposure_phase)

## Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
## To enable adjustments, add the argument 'pbkrtest.limit = 4288' (or larger)
## [or, globally, 'set emm_options(pbkrtest.limit = 4288)' or larger];
## but be warned that this may result in large computation time and memory use.

## Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
## To enable adjustments, add the argument 'lmerTest.limit = 4288' (or larger)
## [or, globally, 'set emm_options(lmerTest.limit = 4288)' or larger];
## but be warned that this may result in large computation time and memory use.

exposure_phase_table <- as.data.frame(exposure_phase_lsmeans$contrasts) %>%
  filter(contrast == "expected1 - last_train_looking" | contrast == "expected1 - unexpected1") %>%
  subset(select = c("exposure_phase", "contrast", "estimate", "SE", "z.ratio", "p.value")) %>%
  mutate(p.value = ifelse(p.value <= .001, "<.001", round(p.value, 3))) %>%
  mutate(estimate = ifelse(contrast == "expected1 - unexpected1", -1* estimate, estimate)) %>%
  mutate(z.ratio = ifelse(contrast == "expected1 - unexpected1", -1* z.ratio, z.ratio)) %>% mutate(contrast = ifelse(contrast == "expected1 - unexpected1", "unexpected - expected", "expected - last_train"))

colnames(exposure_phase_table) <- c("Exposure Phase", "Contrast", "Estimate", "Standard Error", "z", "p")

exposure_phase_table %>%  
  kable("html",align = 'clc',table.attr="id=exposurephasetable")%>%
  kable_styling("striped", row_label_position = 'c')

Exposure Phase	Contrast	Estimate	Standard Error	z	p
habituation	expected - last_train	0.449	0.038	11.87	<.001
habituation	unexpected - expected	0.200	0.038	5.29	<.001
familiarization	expected - last_train	0.046	0.033	1.40	0.34
familiarization	unexpected - expected	0.160	0.033	4.86	<.001

#Beta value
exp_phase_std_coef <- standardize_parameters(results_exposure_phase_inter) %>%
    as.data.frame() %>%
    dplyr::select(Std_Coefficient)

exp_phase_lmer_table <- cbind(gen.m(results_exposure_phase_inter), exp_phase_std_coef, mtab2df(tab_model(results_exposure_phase_inter), 1, "data.frame")$CI[1:11])[, c(1, 2, 7, 6, 3, 4, 5)] %>% mutate(p = ifelse(p<.001, "<.001", p))
colnames(exp_phase_lmer_table) <- c("Estimate", "SE", "CI", "Beta", "df", "t", "p")

Exposure phase predicted the PN and VOE effects to different degrees (trial type by exposure phase interaction - effect of exposure phase for PN: estimate = -0.403, [swap and negate(0.30 – 0.50)], beta = -0.43, t(df) = -8.061(2864.261), p <.001; effect of exposure phase for VOE: -0.04, [-0.14 – 0.06], beta = -0.043, t(df) = -0.798(2861.46), p = 0.424679456172315)

6.2.2 Addressing statistical non-independence: Age

#Changing contrast for exposure phase to summed contrast
contrasts(mega_analysis_log_long$exposure_phase) = contr.sum(2)

#Additive model
results_age_add <- lmer(formula = looking_time ~ trial_type + age_z + exposure_phase + equal_per_nov + domain + stim_loop + order + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = mega_analysis_log_long)
check_collinearity(results_age_add)

check_model(results_age_add)

#Interaction model
results_age_inter <- lmer(formula = looking_time ~ trial_type * age_z + exposure_phase + equal_per_nov + domain + stim_loop + order + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = mega_analysis_log_long)
summary(results_age_inter)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: looking_time ~ trial_type * age_z + exposure_phase + equal_per_nov +  
##     domain + stim_loop + order + (1 | specific_subject_id) +  
##     (1 | cond_ID) + (1 | expt_ID)
##    Data: mega_analysis_log_long
## 
## REML criterion at convergence: 9912
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -4.556 -0.619  0.010  0.638  3.245 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  specific_subject_id (Intercept) 0.1549   0.3936  
##  cond_ID             (Intercept) 0.0045   0.0671  
##  expt_ID             (Intercept) 0.1584   0.3980  
##  Residual                        0.4486   0.6698  
## Number of obs: 4288, groups:  
## specific_subject_id, 1453; cond_ID, 59; expt_ID, 44
## 
## Fixed effects:
##                     Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)          2.49953    0.08483   42.09715   29.46  < 2e-16 ***
## trial_type2         -0.21788    0.02504 2867.89594   -8.70  < 2e-16 ***
## trial_type3          0.17762    0.02515 2865.25065    7.06    2e-12 ***
## age_z               -0.03181    0.04969  179.67856   -0.64  0.52288    
## exposure_phase1     -0.24957    0.06821   38.04813   -3.66  0.00076 ***
## equal_per_nov1      -0.00789    0.03769    9.72543   -0.21  0.83848    
## domain1             -0.03059    0.09881   37.08078   -0.31  0.75861    
## stim_loop1           0.26666    0.08750   37.17990    3.05  0.00423 ** 
## order1               0.00647    0.01456 1405.23634    0.44  0.65694    
## trial_type2:age_z    0.03718    0.02506 2861.31946    1.48  0.13802    
## trial_type3:age_z   -0.07310    0.02511 2858.16937   -2.91  0.00363 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl_t2 trl_t3 age_z  exps_1 eql__1 doman1 stm_l1 order1
## trial_type2 -0.149                                                        
## trial_type3 -0.148  0.500                                                 
## age_z        0.295  0.002  0.000                                          
## exposr_phs1  0.147  0.000  0.000  0.172                                   
## equl_pr_nv1 -0.053 -0.001 -0.001  0.008  0.055                            
## domain1     -0.530 -0.001  0.000 -0.132  0.020  0.181                     
## stim_loop1  -0.098 -0.001  0.001 -0.164  0.231 -0.148  0.532              
## order1      -0.007  0.001 -0.001 -0.026 -0.007 -0.012  0.000  0.005       
## trl_typ2:g_ -0.001  0.000 -0.001 -0.256  0.000 -0.001  0.000  0.000  0.002
## trl_typ3:g_  0.000 -0.001 -0.001 -0.254  0.000 -0.002  0.000  0.000  0.001
##             tr_2:_
## trial_type2       
## trial_type3       
## age_z             
## exposr_phs1       
## equl_pr_nv1       
## domain1           
## stim_loop1        
## order1            
## trl_typ2:g_       
## trl_typ3:g_  0.500

check_collinearity(results_age_inter)

check_model(results_age_inter)

plot(allEffects(results_age_inter))

#Age interaction model table
#Beta value
age_std_coef <- standardize_parameters(results_age_inter) %>%
    as.data.frame() %>%
    dplyr::select(Std_Coefficient)

age_lmer_table <- cbind(gen.m(results_age_inter), age_std_coef, mtab2df(tab_model(results_age_inter), 1, "data.frame")$CI[1:11])[, c(1, 7, 6, 3, 4, 5)] %>% mutate(p = ifelse(p<.001, "<.001", p))
colnames(age_lmer_table) <- c("Estimate", "CI", "Beta", "df", "t", "p")

age_lmer_table%>%
  kable("html",align = 'clc',table.attr="id=ageresultstable")%>%
  kable_styling("striped")

	Estimate	CI	Beta	df	t	p
(Intercept)	2.500	2.33 – 2.67	0.230	42.10	29.464	<.001
trial_type2	-0.218	-0.27 – -0.17	-0.233	2867.90	-8.700	<.001
trial_type3	0.178	0.13 – 0.23	0.190	2865.25	7.062	<.001
age_z	-0.032	-0.13 – 0.07	-0.034	179.68	-0.640	0.522884495529615
exposure_phase1	-0.250	-0.38 – -0.12	-0.267	38.05	-3.659	<.001
equal_per_nov1	-0.008	-0.08 – 0.07	-0.008	9.72	-0.209	0.838477615393106
domain1	-0.031	-0.22 – 0.16	-0.033	37.08	-0.310	0.758614053974961
stim_loop1	0.267	0.10 – 0.44	0.285	37.18	3.047	0.00423228295186812
order1	0.006	-0.02 – 0.04	0.007	1405.24	0.444	0.656938359693127
trial_type2:age_z	0.037	-0.01 – 0.09	0.040	2861.32	1.484	0.138016585854861
trial_type3:age_z	-0.073	-0.12 – -0.02	-0.078	2858.17	-2.911	0.00363465565256069

Age predicted the PN and VOE effects to different degrees (trial type by age interaction - effect of age for PN: estimate = -0.037, [swap and negate(-0.01 – 0.09)], beta = -0.04, t(df) = -1.484(2861.319), p = 0.138016585854861; effect of age for VOE: -0.073, [-0.12 – -0.02], beta = -0.078, t(df) = -2.911(2858.169), p 0.00363465565256069)

#Does the interaction model better explain the data?
anova(results_age_inter, results_age_add, refit = TRUE)

## refitting model(s) with ML (instead of REML)

6.2.3 Addressing statistical non-independence: Order

#Changing contrast for order to treatment contrast
contrasts(mega_analysis_log_long$order) = contr.treatment(2)

#Additive model
results_order_add <- lmer(formula = looking_time ~ trial_type + order + age_z + exposure_phase + equal_per_nov + domain + stim_loop + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = mega_analysis_log_long)
check_collinearity(results_order_add)

check_model(results_order_add)

#Interaction model
results_order_inter <- lmer(formula = looking_time ~ trial_type * order + age_z + exposure_phase + equal_per_nov + domain + stim_loop + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = mega_analysis_log_long)
summary(results_order_inter)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: looking_time ~ trial_type * order + age_z + exposure_phase +  
##     equal_per_nov + domain + stim_loop + (1 | specific_subject_id) +  
##     (1 | cond_ID) + (1 | expt_ID)
##    Data: mega_analysis_log_long
## 
## REML criterion at convergence: 9880
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -4.522 -0.611  0.013  0.631  3.283 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  specific_subject_id (Intercept) 0.15628  0.3953  
##  cond_ID             (Intercept) 0.00459  0.0677  
##  expt_ID             (Intercept) 0.15819  0.3977  
##  Residual                        0.44419  0.6665  
## Number of obs: 4288, groups:  
## specific_subject_id, 1453; cond_ID, 59; expt_ID, 44
## 
## Fixed effects:
##                      Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)           2.59478    0.08713   47.05476   29.78  < 2e-16 ***
## trial_type2          -0.31062    0.03532 2864.62661   -8.79  < 2e-16 ***
## trial_type3           0.00224    0.03543 2863.62905    0.06  0.94962    
## order2               -0.19063    0.04102 3706.49730   -4.65  3.5e-06 ***
## age_z                -0.04401    0.04750  150.83120   -0.93  0.35556    
## exposure_phase1      -0.24982    0.06818   38.06604   -3.66  0.00075 ***
## equal_per_nov1       -0.00727    0.03781    9.81699   -0.19  0.85140    
## domain1              -0.02998    0.09878   37.10160   -0.30  0.76318    
## stim_loop1            0.26661    0.08747   37.19909    3.05  0.00423 ** 
## trial_type2:order2    0.18468    0.04984 2867.80287    3.71  0.00021 ***
## trial_type3:order2    0.34988    0.05005 2865.27194    6.99  3.4e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl_t2 trl_t3 order2 age_z  exps_1 eql__1 doman1 stm_l1
## trial_type2 -0.203                                                        
## trial_type3 -0.203  0.500                                                 
## order2      -0.230  0.433  0.431                                          
## age_z        0.296  0.003  0.000  0.020                                   
## exposr_phs1  0.141  0.001  0.000  0.006  0.181                            
## equl_pr_nv1 -0.053 -0.002 -0.002  0.007  0.008  0.055                     
## domain1     -0.515 -0.001 -0.001 -0.001 -0.138  0.020  0.181              
## stim_loop1  -0.094 -0.001  0.000 -0.003 -0.171  0.231 -0.149  0.532       
## trl_typ2:r2  0.143 -0.709 -0.355 -0.612 -0.002 -0.001  0.002  0.001  0.000
## trl_typ3:r2  0.143 -0.354 -0.708 -0.608 -0.001  0.000  0.003  0.001  0.000
##             tr_2:2
## trial_type2       
## trial_type3       
## order2            
## age_z             
## exposr_phs1       
## equl_pr_nv1       
## domain1           
## stim_loop1        
## trl_typ2:r2       
## trl_typ3:r2  0.500

check_collinearity(results_order_inter)

check_model(results_order_inter)

plot(allEffects(results_order_inter))

#Does interaction model better explain the data?
results_order <- anova(results_order_inter, results_order_add, refit = TRUE)

## refitting model(s) with ML (instead of REML)

results_order

#Expected values of each of the effects facetted by order
order_lsmeans <- lsmeans(results_order_inter, pairwise ~ trial_type|order)

## Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
## To enable adjustments, add the argument 'pbkrtest.limit = 4288' (or larger)
## [or, globally, 'set emm_options(pbkrtest.limit = 4288)' or larger];
## but be warned that this may result in large computation time and memory use.

## Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
## To enable adjustments, add the argument 'lmerTest.limit = 4288' (or larger)
## [or, globally, 'set emm_options(lmerTest.limit = 4288)' or larger];
## but be warned that this may result in large computation time and memory use.

order_table <- as.data.frame(order_lsmeans$contrasts) %>%
  filter(contrast == "expected1 - last_train_looking" | contrast == "expected1 - unexpected1") %>%
  subset(select = c("order", "contrast", "estimate", "SE", "z.ratio", "p.value")) %>%
  mutate(p.value = ifelse(p.value <= .001, "<.001", round(p.value, 3))) %>%
  mutate(estimate = ifelse(contrast == "expected1 - unexpected1", -1* estimate, estimate)) %>%
  mutate(z.ratio = ifelse(contrast == "expected1 - unexpected1", -1* z.ratio, z.ratio)) %>% mutate(contrast = ifelse(contrast == "expected1 - unexpected1", "unexpected - expected", "expected - last_train"))

colnames(order_table) <- c("Order", "Contrast", "Estimate", "Standard Error", "z", "p")

order_table %>%  
  kable("html",align = 'clc',table.attr="id=ordertable")%>%
  kable_styling("striped", row_label_position = 'c')

Order	Contrast	Estimate	Standard Error	z	p
exp	expected - last_train	0.311	0.035	8.794	<.001
exp	unexpected - expected	0.002	0.035	0.063	0.998
unexp	expected - last_train	0.126	0.035	3.581	<.001
unexp	unexpected - expected	0.352	0.035	9.960	<.001

To ask whether the effect of order was different across the VOE and PN effects, we compared the fit of the model with this interaction to a simpler model including the same predictors as two main effects, using the anova() function. The model that contained an interaction between trial_type and order better explained the data (Likelihood Ratio test, p <.001, and also provided the best compromise between parsimony and fit (AIC = 9861.404 vs 9905.967, BIC = 9956.857 vs 9988.693).

7 Effect of all moderators on both effects in both experimental and control conditions

7.1 Separate models for PN and VOE

#Prepare data and contrasts
mega_analysis_exp_control <- mega_analysis_log %>%
  mutate(age_z = datawizard::standardize(agedays))

mega_analysis_exp_control$equal_per_nov <- as.factor(mega_analysis_exp_control$equal_per_nov)
mega_analysis_exp_control$exposure_phase <- as.factor(mega_analysis_exp_control$exposure_phase)
mega_analysis_exp_control$domain <- as.factor(mega_analysis_exp_control$domain)
mega_analysis_exp_control$stim_loop <- as.factor(mega_analysis_exp_control$stim_loop)
mega_analysis_exp_control$order <- as.factor(mega_analysis_exp_control$order)
mega_analysis_exp_control$exp_or_control <- as.factor(mega_analysis_exp_control$exp_or_control)

mega_analysis_exp_control$equal_per_nov <- relevel(mega_analysis_exp_control$equal_per_nov, "yes")
mega_analysis_exp_control$exposure_phase <- relevel(mega_analysis_exp_control$exposure_phase, "habituation")
mega_analysis_exp_control$domain <- relevel(mega_analysis_exp_control$domain, "psychology")
mega_analysis_exp_control$stim_loop <- relevel(mega_analysis_exp_control$stim_loop, "yes")
mega_analysis_exp_control$order <- relevel(mega_analysis_exp_control$order, "exp")
mega_analysis_exp_control$exp_or_control <- relevel(mega_analysis_exp_control$exp_or_control, "experimental")

contrasts(mega_analysis_exp_control$equal_per_nov) = contr.sum(2)
contrasts(mega_analysis_exp_control$exposure_phase) = contr.sum(2)
contrasts(mega_analysis_exp_control$domain) = contr.sum(2)
contrasts(mega_analysis_exp_control$stim_loop) = contr.sum(2)
contrasts(mega_analysis_exp_control$order) = contr.sum(2)
contrasts(mega_analysis_exp_control$exp_or_control) = contr.sum(2)

PN_exp_control <- lmer(formula = PN ~ age_z + exp_or_control + order + equal_per_nov + exposure_phase + domain + stim_loop  + (1|cond_ID), data = mega_analysis_exp_control)
summary(PN_exp_control)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## PN ~ age_z + exp_or_control + order + equal_per_nov + exposure_phase +  
##     domain + stim_loop + (1 | cond_ID)
##    Data: mega_analysis_exp_control
## 
## REML criterion at convergence: 5367
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.627 -0.618 -0.042  0.577  4.222 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  cond_ID  (Intercept) 0.123    0.350   
##  Residual             0.849    0.921   
## Number of obs: 1952, groups:  cond_ID, 87
## 
## Fixed effects:
##                   Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        0.37446    0.06968   79.42102    5.37  7.5e-07 ***
## age_z              0.00495    0.04619  101.54793    0.11  0.91487    
## exp_or_control1   -0.02422    0.04984   81.40994   -0.49  0.62838    
## order1             0.07389    0.02088 1865.41921    3.54  0.00041 ***
## equal_per_nov1     0.00751    0.06647   70.17186    0.11  0.91032    
## exposure_phase1    0.13330    0.04864   85.00876    2.74  0.00747 ** 
## domain1           -0.14180    0.08425   70.99749   -1.68  0.09673 .  
## stim_loop1         0.00829    0.06669   68.88572    0.12  0.90143    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) age_z  exp__1 order1 eql__1 exps_1 doman1
## age_z        0.279                                          
## exp_r_cntr1 -0.379 -0.173                                   
## order1      -0.003 -0.014  0.002                            
## equl_pr_nv1 -0.012 -0.130 -0.162 -0.003                     
## exposr_phs1  0.148  0.156 -0.014 -0.008  0.260              
## domain1     -0.569 -0.238  0.016  0.001  0.346  0.004       
## stim_loop1  -0.018 -0.075 -0.033  0.001 -0.286  0.086  0.414

check_collinearity(PN_exp_control)

check_model(PN_exp_control)

VOE_exp_control <- lmer(formula = VOE ~ age_z + exp_or_control + order + equal_per_nov + exposure_phase + domain + stim_loop + (1|cond_ID) , data = mega_analysis_exp_control)
summary(VOE_exp_control)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## VOE ~ age_z + exp_or_control + order + equal_per_nov + exposure_phase +  
##     domain + stim_loop + (1 | cond_ID)
##    Data: mega_analysis_exp_control
## 
## REML criterion at convergence: 4884
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -6.058 -0.544  0.022  0.607  4.871 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  cond_ID  (Intercept) 0.0237   0.154   
##  Residual             0.6971   0.835   
## Number of obs: 1939, groups:  cond_ID, 87
## 
## Fixed effects:
##                   Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        0.05038    0.04121   79.84730    1.22     0.23    
## age_z             -0.02128    0.02837   93.42339   -0.75     0.46    
## exp_or_control1    0.12677    0.02983   88.16666    4.25 0.000053 ***
## order1            -0.15778    0.01898 1864.29937   -8.31  < 2e-16 ***
## equal_per_nov1     0.00308    0.03809   65.08441    0.08     0.94    
## exposure_phase1   -0.00464    0.02957   98.39557   -0.16     0.88    
## domain1            0.04358    0.04819   63.44538    0.90     0.37    
## stim_loop1         0.00384    0.03793   61.79443    0.10     0.92    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) age_z  exp__1 order1 eql__1 exps_1 doman1
## age_z        0.257                                          
## exp_r_cntr1 -0.398 -0.194                                   
## order1       0.003 -0.008  0.000                            
## equl_pr_nv1 -0.034 -0.150 -0.142 -0.011                     
## exposr_phs1  0.142  0.212 -0.011 -0.015  0.288              
## domain1     -0.576 -0.238  0.032 -0.009  0.341 -0.001       
## stim_loop1   0.022 -0.050 -0.038 -0.001 -0.268  0.091  0.389

check_collinearity(VOE_exp_control)

check_model(VOE_exp_control)

#PN table
#Beta value
PN_exp_control_std_coef <- standardize_parameters(PN_exp_control) %>%
    as.data.frame() %>%
    dplyr::select(Std_Coefficient)

PN_exp_control_lmer_table <- cbind(gen.m(PN_exp_control), PN_exp_control_std_coef, mtab2df(tab_model(PN_exp_control), 1, "data.frame")$CI[1:8])[, c(1, 7, 6, 3, 4, 5)] %>% mutate(p = ifelse(p<.001, "<.001", p))
colnames(PN_exp_control_lmer_table) <- c("Estimate", "CI", "Beta", "df", "t", "p")

PN_exp_control_lmer_table %>%
  kable("html",align = 'clc',table.attr="id=PNexpcontroltable")%>%
  kable_styling("striped")

	Estimate	CI	Beta	df	t	p
(Intercept)	0.374	0.24 – 0.51	0.148	79.4	5.374	<.001
age_z	0.005	-0.09 – 0.10	0.005	101.5	0.107	0.914865380419804
exp_or_control1	-0.024	-0.12 – 0.07	-0.024	81.4	-0.486	0.628379678191041
order1	0.074	0.03 – 0.11	0.074	1865.4	3.540	<.001
equal_per_nov1	0.008	-0.12 – 0.14	0.008	70.2	0.113	0.91032046996414
exposure_phase1	0.133	0.04 – 0.23	0.134	85.0	2.741	0.00747447015792999
domain1	-0.142	-0.31 – 0.02	-0.143	71.0	-1.683	0.0967287357488424
stim_loop1	0.008	-0.12 – 0.14	0.008	68.9	0.124	0.901425516879573

#VOE table
#Beta value
VOE_exp_control_std_coef <- standardize_parameters(VOE_exp_control) %>%
    as.data.frame() %>%
    dplyr::select(Std_Coefficient)

VOE_exp_control_lmer_table <- cbind(gen.m(VOE_exp_control), VOE_exp_control_std_coef, mtab2df(tab_model(VOE_exp_control), 1, "data.frame")$CI[1:8])[, c(1, 7, 6, 3, 4, 5)] %>% mutate(p = ifelse(p<.001, "<.001", p))
colnames(VOE_exp_control_lmer_table) <- c("Estimate", "CI", "Beta", "df", "t", "p")

VOE_exp_control_lmer_table %>%
  kable("html",align = 'clc',table.attr="id=VOEexpcontroltable")%>%
  kable_styling("striped")

	Estimate	CI	Beta	df	t	p
(Intercept)	0.050	-0.03 – 0.13	-0.084	79.8	1.222	0.225119970046862
age_z	-0.021	-0.08 – 0.03	-0.025	93.4	-0.750	0.455028686582808
exp_or_control1	0.127	0.07 – 0.19	0.146	88.2	4.250	<.001
order1	-0.158	-0.20 – -0.12	-0.182	1864.3	-8.313	<.001
equal_per_nov1	0.003	-0.07 – 0.08	0.004	65.1	0.081	0.93584149727526
exposure_phase1	-0.005	-0.06 – 0.05	-0.005	98.4	-0.157	0.875703991654413
domain1	0.044	-0.05 – 0.14	0.050	63.4	0.904	0.369203467861906
stim_loop1	0.004	-0.07 – 0.08	0.004	61.8	0.101	0.919775309640435

7.2 Addressing statistical non-independence: Experimental or Control

#Prepare data and contrasts
mega_analysis_log_long_all <- mega_analysis_log %>%
  pivot_longer(cols = c("last_train_looking", "expected1", "unexpected1"), names_to = "trial_type", values_to = "looking_time") %>%
  filter(!is.na(looking_time)) %>%
  mutate(age_z = datawizard::standardize(agedays))

mega_analysis_log_long_all$equal_per_nov <- as.factor(mega_analysis_log_long_all$equal_per_nov)
mega_analysis_log_long_all$exposure_phase <- as.factor(mega_analysis_log_long_all$exposure_phase)
mega_analysis_log_long_all$domain <- as.factor(mega_analysis_log_long_all$domain)
mega_analysis_log_long_all$stim_loop <- as.factor(mega_analysis_log_long_all$stim_loop)
mega_analysis_log_long_all$order <- as.factor(mega_analysis_log_long_all$order)
mega_analysis_log_long_all$exp_or_control <- as.factor(mega_analysis_log_long_all$exp_or_control)

mega_analysis_log_long_all$equal_per_nov <- relevel(mega_analysis_log_long_all$equal_per_nov, "yes")
mega_analysis_log_long_all$exposure_phase <- relevel(mega_analysis_log_long_all$exposure_phase, "habituation")
mega_analysis_log_long_all$domain <- relevel(mega_analysis_log_long_all$domain, "psychology")
mega_analysis_log_long_all$stim_loop <- relevel(mega_analysis_log_long_all$stim_loop, "yes")
mega_analysis_log_long_all$order <- relevel(mega_analysis_log_long_all$order, "exp")
mega_analysis_log_long_all$exp_or_control <- relevel(mega_analysis_log_long_all$exp_or_control, "control")

contrasts(mega_analysis_log_long_all$equal_per_nov) = contr.sum(2)
contrasts(mega_analysis_log_long_all$exposure_phase) = contr.sum(2)
contrasts(mega_analysis_log_long_all$domain) = contr.sum(2)
contrasts(mega_analysis_log_long_all$stim_loop) = contr.sum(2)
contrasts(mega_analysis_log_long_all$order) = contr.sum(2)

mega_analysis_log_long_all$trial_type <- as.factor(mega_analysis_log_long_all$trial_type)
contrasts(mega_analysis_log_long_all$trial_type) = contr.treatment(3)
contrasts(mega_analysis_log_long_all$exp_or_control) = contr.treatment(2)

#Additive model
results_exp_or_control_add <- lmer(formula = looking_time ~ trial_type + exp_or_control + exposure_phase + equal_per_nov + age_z + domain + stim_loop + order + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = mega_analysis_log_long_all)
summary(results_exp_or_control_add)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: looking_time ~ trial_type + exp_or_control + exposure_phase +  
##     equal_per_nov + age_z + domain + stim_loop + order + (1 |  
##     specific_subject_id) + (1 | cond_ID) + (1 | expt_ID)
##    Data: mega_analysis_log_long_all
## 
## REML criterion at convergence: 13757
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -5.376 -0.609  0.009  0.632  3.068 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  specific_subject_id (Intercept) 0.16983  0.4121  
##  cond_ID             (Intercept) 0.00565  0.0752  
##  expt_ID             (Intercept) 0.13307  0.3648  
##  Residual                        0.45552  0.6749  
## Number of obs: 5895, groups:  
## specific_subject_id, 1990; cond_ID, 87; expt_ID, 53
## 
## Fixed effects:
##                   Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        2.44430    0.08183   68.24197   29.87  < 2e-16 ***
## trial_type2       -0.22840    0.02153 3942.28740  -10.61  < 2e-16 ***
## trial_type3        0.12165    0.02160 3937.96455    5.63  1.9e-08 ***
## exp_or_control2    0.10487    0.04576   38.50644    2.29  0.02747 *  
## exposure_phase1   -0.22441    0.05771   47.81873   -3.89  0.00031 ***
## equal_per_nov1    -0.00391    0.03767   19.54282   -0.10  0.91842    
## age_z             -0.08656    0.04424  164.62894   -1.96  0.05208 .  
## domain1           -0.02632    0.08816   49.45132   -0.30  0.76654    
## stim_loop1         0.25192    0.07159   47.76721    3.52  0.00096 ***
## order1             0.01055    0.01277 1917.07893    0.83  0.40890    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl_t2 trl_t3 exp__2 exps_1 eql__1 age_z  doman1 stm_l1
## trial_type2 -0.133                                                        
## trial_type3 -0.132  0.500                                                 
## exp_r_cntr2 -0.423  0.000  0.000                                          
## exposr_phs1  0.043  0.000  0.000  0.047                                   
## equl_pr_nv1 -0.061 -0.001  0.000  0.048  0.106                            
## age_z        0.196  0.001  0.000  0.014  0.181 -0.001                     
## domain1     -0.553 -0.001  0.000  0.045 -0.059  0.237 -0.126              
## stim_loop1  -0.070 -0.002  0.000 -0.044  0.169 -0.138 -0.152  0.474       
## order1      -0.003  0.000  0.000 -0.001 -0.007 -0.010 -0.023  0.000  0.004

check_collinearity(results_exp_or_control_add)

check_model(results_exp_or_control_add)

#Found outliers, remove and rerun model
outliers_list_exp_or_control_add <- check_outliers(results_exp_or_control_add)[1:5895]
outliers_exp_or_control_add <- insight::get_data(results_exp_or_control_add)[outliers_list_exp_or_control_add, ]
exp_or_control_add_data_no_outliers <- anti_join(mega_analysis_log_long_all, outliers_exp_or_control_add, by = c("specific_subject_id", "trial_type"))
exp_or_control_add_no_outliers <- lmer(formula = looking_time ~ trial_type + exp_or_control + exposure_phase + equal_per_nov + age_z + domain + stim_loop + order + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = exp_or_control_add_data_no_outliers)
summary(exp_or_control_add_no_outliers) #same results

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: looking_time ~ trial_type + exp_or_control + exposure_phase +  
##     equal_per_nov + age_z + domain + stim_loop + order + (1 |  
##     specific_subject_id) + (1 | cond_ID) + (1 | expt_ID)
##    Data: exp_or_control_add_data_no_outliers
## 
## REML criterion at convergence: 13721
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -4.494 -0.610  0.006  0.630  3.072 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  specific_subject_id (Intercept) 0.17041  0.4128  
##  cond_ID             (Intercept) 0.00567  0.0753  
##  expt_ID             (Intercept) 0.13221  0.3636  
##  Residual                        0.45198  0.6723  
## Number of obs: 5894, groups:  
## specific_subject_id, 1990; cond_ID, 87; expt_ID, 53
## 
## Fixed effects:
##                   Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        2.44547    0.08163   68.25606   29.96  < 2e-16 ***
## trial_type2       -0.22839    0.02144 3941.29458  -10.65  < 2e-16 ***
## trial_type3        0.12392    0.02152 3937.41931    5.76  9.1e-09 ***
## exp_or_control2    0.10240    0.04574   38.40623    2.24  0.03104 *  
## exposure_phase1   -0.22435    0.05755   47.78913   -3.90  0.00030 ***
## equal_per_nov1    -0.00397    0.03766   19.53871   -0.11  0.91708    
## age_z             -0.08704    0.04416  164.07783   -1.97  0.05039 .  
## domain1           -0.02633    0.08792   49.43254   -0.30  0.76584    
## stim_loop1         0.25202    0.07139   47.74441    3.53  0.00093 ***
## order1             0.00981    0.01276 1917.08105    0.77  0.44211    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl_t2 trl_t3 exp__2 exps_1 eql__1 age_z  doman1 stm_l1
## trial_type2 -0.133                                                        
## trial_type3 -0.132  0.500                                                 
## exp_r_cntr2 -0.424  0.000  0.000                                          
## exposr_phs1  0.043  0.000  0.000  0.047                                   
## equl_pr_nv1 -0.061 -0.001  0.000  0.048  0.106                            
## age_z        0.196  0.001  0.000  0.013  0.181 -0.001                     
## domain1     -0.553 -0.001  0.000  0.045 -0.059  0.237 -0.126              
## stim_loop1  -0.070 -0.002  0.000 -0.044  0.168 -0.139 -0.152  0.474       
## order1      -0.003  0.000 -0.001 -0.001 -0.007 -0.010 -0.023  0.000  0.004

#Interaction model
results_exp_or_control_inter <- lmer(formula = looking_time ~ trial_type * exp_or_control + exposure_phase + equal_per_nov + age_z + domain + stim_loop + order + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = mega_analysis_log_long_all)
summary(results_exp_or_control_inter)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: looking_time ~ trial_type * exp_or_control + exposure_phase +  
##     equal_per_nov + age_z + domain + stim_loop + order + (1 |  
##     specific_subject_id) + (1 | cond_ID) + (1 | expt_ID)
##    Data: mega_analysis_log_long_all
## 
## REML criterion at convergence: 13746
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -5.252 -0.606  0.010  0.630  3.103 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  specific_subject_id (Intercept) 0.17049  0.4129  
##  cond_ID             (Intercept) 0.00564  0.0751  
##  expt_ID             (Intercept) 0.13302  0.3647  
##  Residual                        0.45344  0.6734  
## Number of obs: 5895, groups:  
## specific_subject_id, 1990; cond_ID, 87; expt_ID, 53
## 
## Fixed effects:
##                              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)                    2.5030     0.0843   76.8344   29.70  < 2e-16 ***
## trial_type2                   -0.2561     0.0411 3922.9658   -6.22  5.3e-10 ***
## trial_type3                   -0.0265     0.0412 3917.5003   -0.64  0.51990    
## exp_or_control2                0.0241     0.0536   72.3015    0.45  0.65384    
## exposure_phase1               -0.2245     0.0577   47.8248   -3.89  0.00031 ***
## equal_per_nov1                -0.0039     0.0377   19.5262   -0.10  0.91851    
## age_z                         -0.0867     0.0442  164.6350   -1.96  0.05178 .  
## domain1                       -0.0263     0.0882   49.4542   -0.30  0.76668    
## stim_loop1                     0.2520     0.0716   47.7713    3.52  0.00096 ***
## order1                         0.0105     0.0128 1917.1730    0.82  0.41085    
## trial_type2:exp_or_control2    0.0382     0.0482 3929.4270    0.79  0.42839    
## trial_type3:exp_or_control2    0.2041     0.0483 3924.4350    4.22  2.5e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl_t2 trl_t3 exp__2 exps_1 eql__1 age_z  doman1 stm_l1
## trial_type2 -0.244                                                        
## trial_type3 -0.244  0.500                                                 
## exp_r_cntr2 -0.476  0.384  0.384                                          
## exposr_phs1  0.042  0.000  0.000  0.040                                   
## equl_pr_nv1 -0.060  0.000  0.000  0.042  0.106                            
## age_z        0.191  0.000  0.000  0.012  0.181 -0.001                     
## domain1     -0.537  0.000  0.000  0.038 -0.059  0.237 -0.126              
## stim_loop1  -0.068 -0.001  0.000 -0.038  0.169 -0.138 -0.152  0.474       
## order1      -0.003 -0.001  0.000 -0.001 -0.007 -0.010 -0.023  0.000  0.004
## trl_ty2:__2  0.208 -0.853 -0.427 -0.451  0.000 -0.001  0.000  0.000  0.000
## trl_ty3:__2  0.208 -0.426 -0.852 -0.451  0.000  0.000  0.000  0.000  0.000
##             order1 t_2:__
## trial_type2              
## trial_type3              
## exp_r_cntr2              
## exposr_phs1              
## equl_pr_nv1              
## age_z                    
## domain1                  
## stim_loop1               
## order1                   
## trl_ty2:__2  0.001       
## trl_ty3:__2  0.000  0.500

check_collinearity(results_exp_or_control_inter)

## Model has interaction terms. VIFs might be inflated.
##   You may check multicollinearity among predictors of a model without
##   interaction terms.

check_model(results_exp_or_control_inter)

#Does the interaction model better explain the data
anova(results_exp_or_control_inter, results_exp_or_control_add, refit = TRUE)

## refitting model(s) with ML (instead of REML)

#Expected values of each of the effects for experimental and control conditions
exp_control_lsmeans <- lsmeans(results_exp_or_control_inter, pairwise ~ trial_type|exp_or_control)

## Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
## To enable adjustments, add the argument 'pbkrtest.limit = 5895' (or larger)
## [or, globally, 'set emm_options(pbkrtest.limit = 5895)' or larger];
## but be warned that this may result in large computation time and memory use.

## Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
## To enable adjustments, add the argument 'lmerTest.limit = 5895' (or larger)
## [or, globally, 'set emm_options(lmerTest.limit = 5895)' or larger];
## but be warned that this may result in large computation time and memory use.

exp_control_table <- as.data.frame(exp_control_lsmeans$contrasts) %>%
  filter(contrast == "expected1 - last_train_looking" | contrast == "expected1 - unexpected1") %>%
  subset(select = c("exp_or_control", "contrast", "estimate", "SE", "z.ratio", "p.value")) %>%
  mutate(p.value = ifelse(p.value <= .001, "<.001", round(p.value, 3))) %>%
  mutate(estimate = ifelse(contrast == "expected1 - unexpected1", -1* estimate, estimate)) %>%
  mutate(z.ratio = ifelse(contrast == "expected1 - unexpected1", -1* z.ratio, z.ratio)) %>% mutate(contrast = ifelse(contrast == "expected1 - unexpected1", "unexpected - expected", "expected - last_train"))

colnames(exp_control_table) <- c("Condition", "Contrast", "Estimate", "Standard Error", "z", "p")

exp_control_table %>%  
  kable("html",align = 'clc',table.attr="id=expcontroltable")%>%
  kable_styling("striped", row_label_position = 'c')

Condition	Contrast	Estimate	Standard Error	z	p
control	expected - last_train	0.256	0.041	6.225	<.001
control	unexpected - expected	-0.026	0.041	-0.644	0.796
experimental	expected - last_train	0.218	0.025	8.654	<.001
experimental	unexpected - expected	0.178	0.025	7.023	<.001

7.3 Visualization

#PN
PN_exp_control <- ggplot(data = mega_analysis_exp_control, aes(x = "PN", y = PN)) +
  geom_boxplot() +
  geom_point(aes(colour = exp_or_control, alpha = 0.001)) +
  ggtitle("PN") +
  ylab("Raw difference of Log Looking Times") +
  xlab("Condition") +
  ylim(c(-5,5))+
  theme(plot.title = element_text(hjust = 0.5)) +
  facet_wrap(~exp_or_control)

#VOE
VOE_exp_control <- ggplot(data = mega_analysis_exp_control, aes(x = "VOE", y = VOE)) +
  geom_boxplot() +
  geom_point(aes(colour = exp_or_control, alpha = 0.001)) + 
  ggtitle("VOE") +
  ylab("Raw difference of Log Looking Times") +
  xlab("Condition") +
  ylim(c(-5,5))+
  theme(plot.title = element_text(hjust = 0.5)) +
  facet_wrap(~exp_or_control)

#Both PN and VOE
PN_exp_control + VOE_exp_control

## Warning: Removed 43 rows containing non-finite values (`stat_boxplot()`).

## Warning: Removed 43 rows containing missing values (`geom_point()`).

## Warning: Removed 57 rows containing non-finite values (`stat_boxplot()`).

## Warning: Removed 57 rows containing missing values (`geom_point()`).

8 Investigating why PN is affected by exposure phase

8.1 Size of PN and VOE based on number of habituation trials

#experimental conditions
mega_analysis_log_hab_exp <- mega_analysis_log %>%
  filter(exp_or_control == "experimental", exposure_phase == "habituation", !is.na(last_train_trial))

#PN
pn_hab_trials <- ggplot(data = mega_analysis_log_hab_exp, aes(x = last_train_trial, y = PN)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm") + 
  xlab("# habituation trials") +
  ylab("log seconds") +
  ylim(c(-2.5, 3)) + 
  ggtitle("PN") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_x_continuous(breaks = c(5,6,7,8,9,10,11,12,13,14))

#VOE
voe_hab_trials <- ggplot(data = mega_analysis_log_hab_exp, aes(x = last_train_trial, y = VOE)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm") + 
  xlab("# habituation trials") +
  ylab("log seconds") +
  ylim(c(-2.5, 3)) +
  ggtitle("VOE") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_x_continuous(breaks = c(5,6,7,8,9,10,11,12,13,14))

#Both
pn_hab_trials + voe_hab_trials

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 10 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 10 rows containing missing values (`geom_point()`).

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 7 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 7 rows containing missing values (`geom_point()`).

mega_analysis_log_hab_exp_long <- mega_analysis_log_hab_exp %>%
  pivot_longer(cols = c("last_train_looking", "expected1", "unexpected1"), names_to = "trial_type", values_to = "looking_time")

hab_num_trials <- lmer(formula = looking_time ~ trial_type * last_train_trial + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = mega_analysis_log_hab_exp_long)
summary(hab_num_trials)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## looking_time ~ trial_type * last_train_trial + (1 | specific_subject_id) +  
##     (1 | cond_ID) + (1 | expt_ID)
##    Data: mega_analysis_log_hab_exp_long
## 
## REML criterion at convergence: 3314
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -4.447 -0.589 -0.003  0.613  3.028 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  specific_subject_id (Intercept) 0.1707   0.413   
##  cond_ID             (Intercept) 0.0164   0.128   
##  expt_ID             (Intercept) 0.0784   0.280   
##  Residual                        0.3931   0.627   
## Number of obs: 1488, groups:  
## specific_subject_id, 498; cond_ID, 22; expt_ID, 16
## 
## Fixed effects:
##                                                 Estimate Std. Error         df
## (Intercept)                                      2.25137    0.15182  144.79427
## trial_typelast_train_looking                    -0.93874    0.14939  992.94919
## trial_typeunexpected1                            0.24794    0.14967  993.37165
## last_train_trial                                -0.00731    0.01465 1179.75918
## trial_typelast_train_looking:last_train_trial    0.06077    0.01659  993.36168
## trial_typeunexpected1:last_train_trial          -0.00540    0.01662  993.95779
##                                               t value Pr(>|t|)    
## (Intercept)                                     14.83  < 2e-16 ***
## trial_typelast_train_looking                    -6.28  4.9e-10 ***
## trial_typeunexpected1                            1.66  0.09792 .  
## last_train_trial                                -0.50  0.61811    
## trial_typelast_train_looking:last_train_trial    3.66  0.00026 ***
## trial_typeunexpected1:last_train_trial          -0.32  0.74546    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl___ trl_t1 lst_t_ t___:_
## trl_typls__ -0.495                            
## trl_typnxp1 -0.494  0.502                     
## lst_trn_trl -0.830  0.550  0.549              
## trl_ty__:__  0.478 -0.964 -0.484 -0.571       
## trl_typ1:__  0.477 -0.484 -0.964 -0.570  0.503

check_model(hab_num_trials)

#Found outliers, remove and rerun model
outliers_list_hab_num_trials <- check_outliers(hab_num_trials)[1:1488]
outliers_hab_num_trials <- insight::get_data(hab_num_trials)[outliers_list_hab_num_trials, ]
hab_num_trials_data_no_outliers <- anti_join(mega_analysis_log_hab_exp_long, outliers_hab_num_trials, by = c("specific_subject_id", "trial_type"))
hab_num_trials_no_outliers <- lmer(formula = looking_time ~ trial_type * last_train_trial + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = hab_num_trials_data_no_outliers)
summary(hab_num_trials_no_outliers) #same results

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## looking_time ~ trial_type * last_train_trial + (1 | specific_subject_id) +  
##     (1 | cond_ID) + (1 | expt_ID)
##    Data: hab_num_trials_data_no_outliers
## 
## REML criterion at convergence: 3269
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -3.734 -0.599 -0.006  0.614  3.056 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  specific_subject_id (Intercept) 0.1710   0.414   
##  cond_ID             (Intercept) 0.0167   0.129   
##  expt_ID             (Intercept) 0.0734   0.271   
##  Residual                        0.3799   0.616   
## Number of obs: 1486, groups:  
## specific_subject_id, 498; cond_ID, 22; expt_ID, 16
## 
## Fixed effects:
##                                                 Estimate Std. Error         df
## (Intercept)                                      2.24787    0.14951  148.49064
## trial_typelast_train_looking                    -0.91989    0.14694  991.14154
## trial_typeunexpected1                            0.24788    0.14713  991.14740
## last_train_trial                                -0.00663    0.01448 1169.13656
## trial_typelast_train_looking:last_train_trial    0.06009    0.01632  991.48386
## trial_typeunexpected1:last_train_trial          -0.00539    0.01634  991.72509
##                                               t value Pr(>|t|)    
## (Intercept)                                     15.03  < 2e-16 ***
## trial_typelast_train_looking                    -6.26  5.7e-10 ***
## trial_typeunexpected1                            1.68  0.09235 .  
## last_train_trial                                -0.46  0.64729    
## trial_typelast_train_looking:last_train_trial    3.68  0.00024 ***
## trial_typeunexpected1:last_train_trial          -0.33  0.74149    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl___ trl_t1 lst_t_ t___:_
## trl_typls__ -0.494                            
## trl_typnxp1 -0.493  0.501                     
## lst_trn_trl -0.833  0.547  0.546              
## trl_ty__:__  0.477 -0.964 -0.484 -0.568       
## trl_typ1:__  0.476 -0.484 -0.964 -0.567  0.502

hab_num_trials_add <- lmer(formula = looking_time ~ trial_type + last_train_trial + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = mega_analysis_log_hab_exp_long)
summary(hab_num_trials_add)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## looking_time ~ trial_type + last_train_trial + (1 | specific_subject_id) +  
##     (1 | cond_ID) + (1 | expt_ID)
##    Data: mega_analysis_log_hab_exp_long
## 
## REML criterion at convergence: 3321
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -4.529 -0.570 -0.023  0.608  2.963 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  specific_subject_id (Intercept) 0.1683   0.410   
##  cond_ID             (Intercept) 0.0164   0.128   
##  expt_ID             (Intercept) 0.0782   0.280   
##  Residual                        0.4002   0.633   
## Number of obs: 1488, groups:  
## specific_subject_id, 498; cond_ID, 22; expt_ID, 16
## 
## Fixed effects:
##                              Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)                    2.0895     0.1267  71.6179   16.49  < 2e-16 ***
## trial_typelast_train_looking  -0.4113     0.0402 995.3185  -10.24  < 2e-16 ***
## trial_typeunexpected1          0.2007     0.0403 995.1637    4.98  7.3e-07 ***
## last_train_trial               0.0114     0.0110 491.7469    1.03      0.3    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl___ trl_t1
## trl_typls__ -0.158              
## trl_typnxp1 -0.157  0.501       
## lst_trn_trl -0.744 -0.002 -0.002

check_model(hab_num_trials_add)

#Found outliers, remove and rerun model
outliers_list_hab_num_trials_add <- check_outliers(hab_num_trials_add)[1:1488]
outliers_hab_num_trials_add <- insight::get_data(hab_num_trials_add)[outliers_list_hab_num_trials_add, ]
hab_num_trials_add_data_no_outliers <- anti_join(mega_analysis_log_hab_exp_long, outliers_hab_num_trials_add, by = c("specific_subject_id", "trial_type"))
hab_num_trials_add_no_outliers <- lmer(formula = looking_time ~ trial_type + last_train_trial + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = hab_num_trials_add_data_no_outliers)
summary(hab_num_trials_add_no_outliers) #same results

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## looking_time ~ trial_type + last_train_trial + (1 | specific_subject_id) +  
##     (1 | cond_ID) + (1 | expt_ID)
##    Data: hab_num_trials_add_data_no_outliers
## 
## REML criterion at convergence: 3190
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1020 -0.6013 -0.0225  0.6163  3.0113 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  specific_subject_id (Intercept) 0.1749   0.418   
##  cond_ID             (Intercept) 0.0163   0.128   
##  expt_ID             (Intercept) 0.0617   0.248   
##  Residual                        0.3605   0.600   
## Number of obs: 1481, groups:  
## specific_subject_id, 498; cond_ID, 22; expt_ID, 16
## 
## Fixed effects:
##                              Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)                    2.0981     0.1213  81.8673   17.29  < 2e-16 ***
## trial_typelast_train_looking  -0.3703     0.0383 990.8112   -9.67  < 2e-16 ***
## trial_typeunexpected1          0.2007     0.0382 987.7456    5.25  1.8e-07 ***
## last_train_trial               0.0110     0.0109 491.3394    1.01     0.31    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl___ trl_t1
## trl_typls__ -0.155              
## trl_typnxp1 -0.155  0.498       
## lst_trn_trl -0.768 -0.003 -0.002

hab_anova_results <- anova(hab_num_trials, hab_num_trials_add, refit = TRUE)

## refitting model(s) with ML (instead of REML)

#Num hab trials interaction model table
#Beta value
hab_num_std_coef <- standardize_parameters(hab_num_trials) %>%
    as.data.frame() %>%
    dplyr::select(Std_Coefficient)

hab_num_lmer_table <- cbind(gen.m(hab_num_trials), hab_num_std_coef, mtab2df(tab_model(hab_num_trials), 1, "data.frame")$CI[1:6])[, c(1, 7, 6, 3, 4, 5)] %>% mutate(p = ifelse(p<.001, "<.001", round(p, 3)))
colnames(hab_num_lmer_table) <- c("Estimate", "CI", "Beta", "df", "t", "p")

hab_num_lmer_table%>%
  kable("html",align = 'clc',table.attr="id=habnumresultstable")%>%
  kable_styling("striped")

	Estimate	CI	Beta	df	t	p
(Intercept)	2.251	1.95 – 2.55	0.110	145	14.830	<.001
trial_typelast_train_looking	-0.939	-1.23 – -0.65	-0.492	993	-6.284	<.001
trial_typeunexpected1	0.248	-0.05 – 0.54	0.241	993	1.657	0.098
last_train_trial	-0.007	-0.04 – 0.02	-0.021	1180	-0.499	0.618
trial_typelast_train_looking:last_train_trial	0.061	0.03 – 0.09	0.175	993	3.662	<.001
trial_typeunexpected1:last_train_trial	-0.005	-0.04 – 0.03	-0.016	994	-0.325	0.745

Previously, we found that habituation predicts a larger PN effect than familiarization but no difference in the size of the VOE effect. What about habituation makes infants recover more attention to the first expected trial? One hypothesis is since habituated infants are on average seeing more training trials before test, the finding that habituated infants show a larger PN effect is due to seeing more training trials. However, within experimental conditions, when predicting the size of the PN and VOE effect with the number of training trials each infant saw, we found that the PN effect was larger for infants that saw less habituation trials, which means they habituated more quickly (estimate = -0.061), negate and flip: [0.03 – 0.09], beta = -0.175, t(df) = -3.662(993.362), p <.001), but no such effect for VOE (estimate = -0.005, [-0.04 – 0.03], beta = -0.016, t(df) = -0.325(993.958),p = 0.745). This provides evidence against the hypothesis that a larger PN effect for infants who are habituated is due to seeing more training trials before test.

Additionally, we found that the model that contained the interaction between trial type and number of habituation trials better explained the data (Likelihood Ratio Test, <.001), and also provided the best compromise between parsimony and fit (AIC = 3301.292 vs 3316.858, BIC = 3354.344 vs 3359.299).

8.2 Habituators vs Non-habituators in familiarization experiments

#For familiarized infants, habituators vs non-habituators, comparing size of effects

#Determining which infants follow habituation criteria
all_train_long <- mega_analysis_age_order %>%
  pivot_longer(cols = c("train1", "train2", "train3", "train4", "train5", "train6", "train7", "train8", "train9", "train10", "train11", "train12", "train13", "train14"), names_to = "training_trial", values_to = "looking_time") %>%
  filter(!is.na(looking_time))

last_three_train <- all_train_long %>%
  group_by(specific_subject_id) %>%
  slice(tail(row_number(), 3)) %>%
  mutate(last_three_mean = mean(looking_time)) %>%
  dplyr::select(c(specific_subject_id, last_three_mean)) %>%
  distinct() %>%
  ungroup()

first_three_train <- all_train_long %>%
  group_by(specific_subject_id) %>%
  slice(head(row_number(), 3)) %>%
  mutate(first_three_mean = mean(looking_time)) %>%
  dplyr::select(c(specific_subject_id, first_three_mean)) %>%
  distinct() %>%
  ungroup()

determining_hab_0 <- left_join(mega_analysis_age_order, first_three_train, by = c("specific_subject_id" = "specific_subject_id"))

determining_hab <- left_join(determining_hab_0, last_three_train, by = c("specific_subject_id" = "specific_subject_id")) %>%
  mutate(habituated = ifelse(first_three_mean >= 2 * last_three_mean, "yes", "no")) %>%
  filter(exposure_phase =="familiarization", last_train_trial >= 6) %>%
  mutate(last_train_looking = log(last_train_looking)) %>%
  mutate(expected1 = log(expected1)) %>%
  mutate(unexpected1 = log(unexpected1)) %>%
  mutate(PN = expected1 - last_train_looking) %>%
  mutate(VOE = unexpected1 - expected1) %>%
  filter(exp_or_control == "experimental")

#Visualizations
PN_hab_plot <- ggplot(determining_hab, aes(x = habituated, y = PN)) +
  geom_boxplot() +
  ylim(c(-3, 3)) +
  ylab("log seconds") +
  ggtitle("PN") +
  theme(plot.title = element_text(hjust = 0.5))

VOE_hab_plot <- ggplot(determining_hab, aes(x = habituated, y = VOE)) +
  geom_boxplot() +
  ylim(c(-3, 3)) +
  ylab("log seconds") + 
  ggtitle("VOE") +
  theme(plot.title = element_text(hjust = 0.5))

PN_hab_plot + VOE_hab_plot

## Warning: Removed 10 rows containing non-finite values (`stat_boxplot()`).

## Warning: Removed 19 rows containing non-finite values (`stat_boxplot()`).

#Analyses
determining_hab_long <- determining_hab %>%
  pivot_longer(cols = c("last_train_looking", "expected1", "unexpected1"), names_to = "trial_type", values_to = "looking_time") %>%
  mutate(habituated = as.factor(habituated))

habituator_effect <- lmer(formula = looking_time ~ trial_type * habituated + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = determining_hab_long)

## boundary (singular) fit: see help('isSingular')

summary(habituator_effect)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: looking_time ~ trial_type * habituated + (1 | specific_subject_id) +  
##     (1 | cond_ID) + (1 | expt_ID)
##    Data: determining_hab_long
## 
## REML criterion at convergence: 3674
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.8098 -0.6638  0.0281  0.6653  2.5230 
## 
## Random effects:
##  Groups              Name        Variance Std.Dev. 
##  specific_subject_id (Intercept) 1.25e-01 0.3539948
##  cond_ID             (Intercept) 5.47e-11 0.0000074
##  expt_ID             (Intercept) 2.81e-01 0.5301364
##  Residual                        3.85e-01 0.6208621
## Number of obs: 1710, groups:  
## specific_subject_id, 577; cond_ID, 20; expt_ID, 13
## 
## Fixed effects:
##                                              Estimate Std. Error         df
## (Intercept)                                   2.92118    0.15326   13.18149
## trial_typelast_train_looking                  0.12185    0.04733 1133.84828
## trial_typeunexpected1                         0.10975    0.04755 1134.59002
## habituatedyes                                -0.21861    0.06185 1485.82914
## trial_typelast_train_looking:habituatedyes   -0.52973    0.07493 1139.28401
## trial_typeunexpected1:habituatedyes           0.00921    0.07558 1136.61862
##                                            t value Pr(>|t|)    
## (Intercept)                                  19.06  5.6e-11 ***
## trial_typelast_train_looking                  2.57  0.01017 *  
## trial_typeunexpected1                         2.31  0.02117 *  
## habituatedyes                                -3.53  0.00042 ***
## trial_typelast_train_looking:habituatedyes   -7.07  2.7e-12 ***
## trial_typeunexpected1:habituatedyes           0.12  0.90302    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trl___ trl_t1 hbttdy tr___:
## trl_typls__ -0.155                            
## trl_typnxp1 -0.154  0.500                     
## habituatdys -0.159  0.385  0.382              
## trl_typl__:  0.098 -0.632 -0.316 -0.613       
## trl_typnx1:  0.097 -0.314 -0.629 -0.606  0.500
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

habituator_effect_add <- lmer(formula = looking_time ~ trial_type + habituated + (1|specific_subject_id) + (1|cond_ID) + (1|expt_ID), data = determining_hab_long)

## boundary (singular) fit: see help('isSingular')

habituator_anova <- anova(habituator_effect, habituator_effect_add, refit = TRUE)

## refitting model(s) with ML (instead of REML)

#Habituator vs Non-habitutator interaction model table
#Beta value
habituated_std_coef <- standardize_parameters(habituator_effect) %>%
    as.data.frame() %>%
    dplyr::select(Std_Coefficient)

## boundary (singular) fit: see help('isSingular')

habituated_lmer_table <- cbind(gen.m(habituator_effect), habituated_std_coef, mtab2df(tab_model(habituator_effect), 1, "data.frame")$CI[1:6])[, c(1, 7, 6, 3, 4, 5)] %>% mutate(p = ifelse(p<.001, "<.001", round(p, 3)))
colnames(habituated_lmer_table) <- c("Estimate", "CI", "Beta", "df", "t", "p")

habituated_lmer_table%>%
  kable("html",align = 'clc',table.attr="id=habituatedresultstable")%>%
  kable_styling("striped")

	Estimate	CI	Beta	df	t	p
(Intercept)	2.921	2.62 – 3.22	0.467	13.2	19.061	<.001
trial_typelast_train_looking	0.122	0.03 – 0.21	0.133	1133.8	2.574	0.01
trial_typeunexpected1	0.110	0.02 – 0.20	0.120	1134.6	2.308	0.021
habituatedyes	-0.219	-0.34 – -0.10	-0.239	1485.8	-3.534	<.001
trial_typelast_train_looking:habituatedyes	-0.530	-0.68 – -0.38	-0.580	1139.3	-7.070	<.001
trial_typeunexpected1:habituatedyes	0.009	-0.14 – 0.16	0.010	1136.6	0.122	0.903

The finding that infants who habituate in less trials have a larger PN effect suggests that infants who habituate quickly are more sensitive to low level changes in the stimuli. If so, we expect that within infants who are familiarized, those who pass the habituation criteria at the set amount of trials (are habituators) show a larger PN effect than those who fail to pass the habituation criteria after the same amount of trials (non-habituators). We found that the PN effect was larger for habituators than for non-habituators (estimate = 0.53, negate and flip: [-0.68 – -0.38], beta = 0.58, t(df) = 7.07(1139.284), p <.001) but there was no such effect for VOE (estimate = 0.009, [-0.14 – 0.16], beta = 0.01, t(df) = 0.122(1136.619), p = 0.903). This suggests that infants who more steeply habituate recover more attention to perceptually novel stimuli.

Additionally, we found that the model that contained the interaction between trial type and whether infants habituated better explained the data (Likelihood Ratio Test, <.001), and also provided the best compromise between parsimony and fit (AIC = 3670.398 vs 3732.549, BIC = 3724.84 vs 3776.103)

Data from individual infants allowed us to further explore the differential effects of habituation on these two effects. Is the result that habituation predicts PN, but not VOE, effect genuinely driven by the process of habituation, or could it plausibly be attributed to systematic differences between habituation and familiarization studies? We found that in habituation studies, infants who habituated more steeply (i.e. underwent fewer habituation trials) showed a bigger PN effect (estimate = -0.061), negate and flip: [0.03 – 0.09], beta = -0.175, t(df) = -3.662(993.362), p <.001), but this factor did not affect the size of the VOE effect (estimate = -0.005, [-0.04 – 0.03], beta = -0.016, t(df) = -0.325(993.958),p = 0.745). We also found that in familiarization studies with 6 or more familiarization trials, infants who would have met a standard habituation criterion (looking for a summed duration on the last three trials that is 50% or less than the summed duration on the first three trials) showed a bigger PN effect than infants who did not (estimate = 0.53, negate and flip: [-0.68 – -0.38], beta = 0.58, t(df) = 7.07(1139.284), p <.001); this factor did not moderate the size of the VOE effect (estimate = 0.009, [-0.14 – 0.16], beta = 0.01, t(df) = 0.122(1136.619), p = 0.903). Thus, habituation genuinely moderated the two effects to different degrees, across all experiments, and this effect was driven by individual differences in habituation across infants: infants who habituated faster (during habituation studies), or more (during familiarization studies), recovered more attention to visually novel events, but not unexpected events.

9 Visualizing age z-scored within condition

z_score_age_w_cond <- mega_analysis_log %>%
  filter(exp_or_control == "experimental") %>%
  group_by(cond_ID) %>%
  mutate(age_z_w_cond = scale(agedays))

ggplot(z_score_age_w_cond, aes(x = age_z_w_cond, y = VOE)) +
  geom_point() +
  geom_smooth(method = "lm")

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 54 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 54 rows containing missing values (`geom_point()`).

ggplot(z_score_age_w_cond, aes(x = age_z_w_cond, y = PN)) +
  geom_point() +
  geom_smooth(method = "lm")

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 41 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 41 rows containing missing values (`geom_point()`).

10 Scatterplot VOE vs PN

ggplot(data = mega_analysis_log, aes(x = VOE, y = PN)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm") +
  xlab("VOE (log seconds)") +
  ylab("PN (log seconds)") +
  xlim(c(-5,5)) + 
  ylim(c(-5,5)) + 
  theme(axis.text.x = element_text(size = 12), axis.text.y = element_text(size = 12), axis.title.x = element_text(size = 13), axis.title.y = element_text(size = 13))

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 65 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 65 rows containing missing values (`geom_point()`).

3) Mega-analysis analogs of scripts 1. and 2.