This markdown is an exploration of the MN small sets project pilot data. The goal is to elucidate design and analysis choices prior to the submission of the registered report.
Preprocessing and loading
library(here)
here() starts at /Users/mcfrank/Projects/manynumbers-smallsets
library(tidyverse)
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
library(lme4)
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
library(GLMMadaptive)
Attaching package: 'GLMMadaptive'
The following object is masked from 'package:lme4':
negative.binomial
In the literature, the standard approach is to compute a difference score between search times on match and mismatch trials.
ms_box <- d_box |>
  group_by(age, age_group, overall_sub_id, trial_num, block) |>
  # make sure I have both trial types for each block / trial_num combo
  filter(any(mismatch), any(!mismatch)) |>
  summarise(n = n(),
            diff_score = search_time[mismatch] - mean(search_time[!mismatch])) |>
  filter(!is.na(diff_score))
`summarise()` has grouped output by 'age', 'age_group', 'overall_sub_id',
'trial_num'. You can override using the `.groups` argument.
We could treat 1 of 1, 2 of 2, 3 of 3, and 4 of 4 as having the same search probability. This is an empirical question, but it's supported by the model above: there doesn't look to be much difference among them.
If we make this assumption, we have fewer parameters. Let’s try it.
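A sketch of what that recoding could look like, using the existing mismatch and out_of_in columns (the labels here are assumptions, not necessarily how the actual trial_type variable was built):

# collapse all matched trial types (1 of 1, 2 of 2, etc.) into a single
# reference level, keeping the mismatch types distinct
d_box <- d_box |>
  mutate(trial_type = if_else(mismatch, as.character(out_of_in), "match"),
         trial_type = fct_relevel(trial_type, "match"))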
mod_tt <- glmer(searched ~ trial_type + (1 | overall_sub_id),
                family = "binomial", data = d_box)
summary(mod_tt)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: searched ~ trial_type + (1 | overall_sub_id)
Data: d_box
AIC BIC logLik deviance df.resid
595.9 616.8 -293.0 585.9 474
Scaled residuals:
Min 1Q Median 3Q Max
-2.8984 -0.6819 0.2571 0.6749 1.9486
Random effects:
Groups Name Variance Std.Dev.
overall_sub_id (Intercept) 1.378 1.174
Number of obs: 479, groups: overall_sub_id, 68
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.3216 0.1978 -1.626 0.10398
trial_type1of4 1.9231 0.6485 2.966 0.00302 **
trial_type2of3 1.5540 0.2872 5.411 6.26e-08 ***
trial_type2of4 1.5920 0.4066 3.915 9.04e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) trl_14 trl_23
tril_typ1f4 -0.110
tril_typ2f3 -0.300 0.106
tril_typ2f4 -0.203 0.047 0.181
mod_tt_age <- glmer(searched ~ trial_type * age + (1 | overall_sub_id),
                    family = "binomial", data = d_box)
summary(mod_tt_age)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: searched ~ trial_type * age + (1 | overall_sub_id)
Data: d_box
AIC BIC logLik deviance df.resid
600.8 638.3 -291.4 582.8 470
Scaled residuals:
Min 1Q Median 3Q Max
-3.2332 -0.6917 0.2477 0.6832 1.9879
Random effects:
Groups Name Variance Std.Dev.
overall_sub_id (Intercept) 1.397 1.182
Number of obs: 479, groups: overall_sub_id, 68
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.57329 1.00346 0.571 0.568
trial_type1of4 -0.46179 3.05308 -0.151 0.880
trial_type2of3 0.72843 1.47073 0.495 0.620
trial_type2of4 -1.98535 2.50528 -0.792 0.428
age -0.03616 0.03960 -0.913 0.361
trial_type1of4:age 0.11722 0.15386 0.762 0.446
trial_type2of3:age 0.03376 0.05848 0.577 0.564
trial_type2of4:age 0.13430 0.09328 1.440 0.150
Correlation of Fixed Effects:
(Intr) trl_14 trl_23 trl_24 age tr_14: tr_23:
tril_typ1f4 -0.117
tril_typ2f3 -0.287 0.096
tril_typ2f4 -0.180 0.026 0.126
age -0.980 0.115 0.283 0.180
trl_typ1f4: 0.090 -0.977 -0.075 -0.021 -0.093
trl_typ2f3: 0.282 -0.095 -0.981 -0.129 -0.290 0.079
trl_typ2f4: 0.190 -0.028 -0.136 -0.986 -0.197 0.024 0.145
Search time
Search time feels like it should be a more sensitive measure, but the trouble is that it is a composite measure that includes a bunch of data that are missing not at random (because the kids who DON'T search are doing so systematically across conditions). So if we look at just the non-zero search times, we get a problematic measure; the alternative is to average in a bunch of zeros. This averaging is not great because 1) the zeros arguably represent a different process altogether and 2) having a lot of zeros keeps us from using log search time (which might be more appropriate since the times are clearly log-normally distributed). That makes search times a somewhat tricky measure and hence leads me to favor the search probabilities.
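A quick way to see the issue (a sketch; it assumes the same search_time and trial_type coding as in the models above):

# proportion of zero (no-search) trials by trial type; if this differs across
# conditions, the zeros are not missing at random and averaging them into
# mean search times mixes two different processes
d_box |>
  group_by(trial_type) |>
  summarise(n = n(), prop_no_search = mean(search_time == 0))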
Plots
First look at the distribution of search times.
ggplot(d_box, aes(x = search_time, fill = search_time == 0)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
No logs, average all trials.
NB: all of these error bars are somewhat wrong because of non-independence of the two repeated trials.
ggplot(d_box, aes(x = out_of_in, y = search_time)) +
  geom_jitter(alpha = .3, width = .1, height = 0) +
  stat_summary(fun.data = "mean_cl_boot", col = "red") +
  facet_wrap(~block, scales = "free_x") +
  ylim(0, 25) +
  ylab("Search Time (s)") +
  xlab("Trial Type")
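One way around the non-independence noted above (a sketch, not what was run here): average the two repeated trials within each child first, so the bootstrapped CIs are computed over child means.

d_box |>
  group_by(overall_sub_id, out_of_in, block) |>
  summarise(search_time = mean(search_time), .groups = "drop") |>
  ggplot(aes(x = out_of_in, y = search_time)) +
  geom_jitter(alpha = .3, width = .1, height = 0) +
  stat_summary(fun.data = "mean_cl_boot", col = "red") +
  facet_wrap(~block, scales = "free_x")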
It might be nice to see whether we can detect any information about condition differences in the magnitude of the search times themselves, once the search decisions are removed (i.e., filtering out the zeros).
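The model below fits log search time on the non-zero trials; the call would look roughly like this (reconstructed from the output that follows; the object name mod_st is an assumption):

mod_st <- lmer(log(search_time) ~ trial_type + (1 | overall_sub_id),
               data = filter(d_box, search_time > 0))
summary(mod_st)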
Linear mixed model fit by REML ['lmerMod']
Formula: log(search_time) ~ trial_type + (1 | overall_sub_id)
Data: filter(d_box, search_time > 0)
REML criterion at convergence: 567.1
Scaled residuals:
Min 1Q Median 3Q Max
-2.68420 -0.62802 0.01096 0.63494 3.11406
Random effects:
Groups Name Variance Std.Dev.
overall_sub_id (Intercept) 0.2324 0.4821
Residual 0.4236 0.6508
Number of obs: 249, groups: overall_sub_id, 66
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.04480 0.08774 11.909
trial_type1of4 0.38783 0.19642 1.974
trial_type2of3 0.12872 0.10145 1.269
trial_type2of4 0.29818 0.14711 2.027
Correlation of Fixed Effects:
(Intr) trl_14 trl_23
tril_typ1f4 -0.199
tril_typ2f3 -0.448 0.163
tril_typ2f4 -0.334 0.078 0.277
We conclude that there probably is some information about condition in the search times. From these data, it looks less clear than the search choice, but still informative. Further, it looks in these data like the log normal is a better choice. So maybe we need a hurdle-type model that jointly fits both the choice and the search time.
The nice thing about these models is that they model both processes. We should probably learn more about them and confirm that the covariance structures are reasonable etc.
mod_hurdle <- mixed_model(search_time ~ trial_type,
                          random = ~ 1 | overall_sub_id,
                          data = d_box,
                          family = hurdle.lognormal(),
                          n_phis = 1,
                          zi_fixed = ~ trial_type,
                          zi_random = ~ 1 | overall_sub_id)
summary(mod_hurdle)
These models require a bit more study but seem like a promising alternative.
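One useful feature for that further study (a sketch, assuming GLMMadaptive's accessors for MixMod objects): the two parts of the hurdle fit can be inspected separately.

fixef(mod_hurdle)                            # lognormal part: log search time on searched trials
fixef(mod_hurdle, sub_model = "zero_part")   # hurdle part: probability of not searching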
Conclusions
Analysis comments
We have two different approaches:

1. Difference scores: calculate a difference score for each block and then model these scores across children.
2. Trial-level modeling: model the probability of search (and perhaps search times as well) directly across all trials, rather than putting these two different pieces of the measure together.
The prior literature has favored approach #1, but there are several reasons from the statistical literature to worry about this approach:
As argued above, although there may be condition-related signal in both the search probabilities and the search times, averaging them may conflate these in sub-optimal ways. For example, we saw above that log search times showed more condition signal than raw search times, but we can’t log this measure if it includes zeros (non-search trials).
Difference scores limit the ability of models to remove covariates from the raw scores. For example, if search times vary systematically by age as a main effect, it might be more helpful to remove this trend by fitting it at the trial level rather than trying to remove it from each trial via the difference score (a sketch of this contrast appears after these points).
More generally, modeling trial-by-trial differences makes maximal use of the mixed effects framework, which can be helpful in A) dealing with asymmetric amounts of data for each child, and B) removing child-to-child variability through child-level random effects. The second is likely to be especially useful in modeling search times, where we expect there could be systematic between-child differences.
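As a sketch of the covariate contrast above (the model and object names here are assumptions, not analyses that were run):

# (a) difference-score approach: age can only be adjusted after the
#     match/mismatch information has already been collapsed within block
mod_diff <- lmer(diff_score ~ age + (1 | overall_sub_id), data = ms_box)

# (b) trial-level approach: the age trend is removed from the raw (log) search
#     times directly, and the condition effect is carried by the mismatch term
mod_trial <- lmer(log(search_time) ~ mismatch * age + (1 | overall_sub_id),
                  data = filter(d_box, search_time > 0))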
A second choice point is whether to include all of the measures into a single statistical model (e.g. a mixed effects model) vs. whether to conduct separate tests of individual contrasts. Here the literature is very clear: a model allows for variation due to other aspects of the data to be better estimated and removed from individual condition estimates. Performing, for example, individual age-group t-tests vs. chance is likely to give at best noisy estimates, with the classic flaw that \(p>.05\) does not allow acceptance of the null hypothesis of no difference.
Design comments
One main question of interest for the study is whether 2v4 develops later than 1v4. But right now, the design doesn't show 2v4 to the youngest children and doesn't show 1v4 to the older children; instead, the goal is to find the appropriate age range in which each contrast emerges. This strategy has a couple of issues. First, there is some risk that the age ranges chosen will be incorrect, e.g., none of the younger children will succeed on 1v4 and the point of emergence will not be found. Second, few or no children will complete both trial types, making within-child comparisons of the two contrasts (which are actually the thing you want to study) impossible.
An alternative design would show all trial types to all children. This straightforward design would allow estimation of the developmental trajectory for each contrast. If there is a concern about task length, one possibility would be to do a between-subjects manipulation for the younger children (1v4 for some and 2v4 for others). Another (probably better) possibility would be to do only a single trial of each block type for younger children. Again, the more you can estimate the difference between trial types within child, the stronger the claim will be that these trial types develop on different timelines.
The tradeoff here is between each child getting A) two trials for a subset of trial types vs. B) one trial for every trial type. The question is which we think contributes more variance: between-child variability or between-trial variability. For infants, these are approximately equal (e.g., DeBolt et al.), but for older children and adults, between-subject variance is typically much higher, and so the recommendation is to use within-subject designs (see Experimentology Chapter 9; also this blogpost by Lakens).