Our goal here is to model trial-level variation in accuracy for familiar word recognition across all the experiments in Peekbank.

Let’s move forward with ALL THE DATA (tm). Window will be something like 500 – 4000 based on the above analysis (which is incomplete, but which suggested that using all the data was better than not).

df <- d_trial |>
    filter(t_norm > 500, t_norm < 4000) |>
    group_by(dataset_name, dataset_id, administration_id, 
             age, stimulus_id, target_label, distractor_label) |>
    summarise(accuracy = mean(correct[t_norm > 0], na.rm=TRUE),
              prop_data = mean(!is.na(correct[t_norm > 0])), 
              target = sum(correct[t_norm > 0], na.rm=TRUE),
              distractor = sum(!correct[t_norm > 0], na.rm=TRUE), 
              elogit = log( (target + .5) / (distractor + .5) )) 
df <- df[complete.cases(df),]

Descriptives

How much data is there?

df %>%
  group_by(dataset_name) %>%
  count() %>%
  arrange(desc(n)) %>%
  knitr::kable()
dataset_name n
adams_marchman_2018 5973
fmw_2013 1790
reflook_socword 1505
attword_processed 1059
xsectional_2007 1007
newman_genderdistractor 711
mahr_coartic 621
potter_canine 515
perry_cowpig 504
pomper_saffran_2016 471
frank_tablet_2016 461
swingley_aslin_2002 420
garrison_bergelson_2020 395
bacon_gendercues 380
ronfard_2021 296
reflook_v4 277
pomper_yumme 255
pomper_salientme 248
newman_sinewave_2015 150
potter_remix 106
df %>%
  group_by(target_label) %>%
  count() %>%
  arrange(desc(n)) %>%
  DT::datatable()

The dependent variable

How are we going to model accuracies? The trouble is that we have a very odd dependent variable.

ggplot(df, aes(x = accuracy)) + 
  geom_histogram()

ggplot(df, aes(x = log(accuracy))) + 
  geom_histogram()

What about going back to the raw proportions?

ggplot(df, aes(x = elogit)) + 
  geom_histogram()

And consider filtering 0/1 observations just to see if we can get a decent distribution.

ggplot(filter(df, accuracy > 0, accuracy < 1), 
       aes(x = elogit)) + 
  geom_histogram()

That looks lovely.

df_clean <- filter(df, accuracy > 0, accuracy < 1)

Target for modeling

Plot all trials by all participants.

ggplot(df_clean, aes(x = age/12, y = elogit)) + 
  geom_point(alpha = .01) + 
  geom_smooth() + 
  geom_hline(lty = 2, yintercept = 0) + 
  ggthemes::theme_few()

Try breaking this down by word.

words <- df_clean |>
  group_by(target_label) |> 
  count() 
  
hf_words = words |>
  filter(n > 200)

comparison_plot <- ggplot(filter(df_clean, target_label %in% hf_words$target_label),
       aes(x = age/12, y = elogit, col = target_label)) + 
  geom_jitter(alpha = .05, width = .02, height = 0) + 
  geom_smooth(se=FALSE) + 
  geom_hline(lty = 2, yintercept = 0)+ 
  ggthemes::theme_few()
comparison_plot

Model comparison

What follows is an iterative model building/comparison exercise. The goal is really to build up something that is interpretable as a model of variation in the LWL task. You might think this could be done programmatically as opposed to with cut-and-paste as I did it here. In my defense, I spent a lot of time with tidymodels and workflows and found that there were some bugs in the predict workflow that were insurmountable for lme4. In particular, we couldn’t predict with different random effect structures, meaning that one critical function that we were trying to fulfill here was not doable under tidymodels. Fail. Hence the copy/paste.

lmm_fit <- lmer(elogit ~ age + 
                  (1 | administration_id), 
                data = df_clean)

preds <- expand_grid(
  age = seq(min(df_clean$age),max(df_clean$age),1),
  target_label = hf_words$target_label)

preds <- preds |>
  mutate(.pred = predict(lmm_fit, 
                         type = "response",
                         re.form = ~  0, 
                         newdata = preds))

ggplot(df_clean,
       aes(x = age, y = elogit)) + 
  geom_point(alpha = .1) + 
  geom_smooth() + 
  geom_line(data = preds, 
            aes(x = age, y = .pred, col = target_label)) + 
  geom_hline(lty = 2, yintercept = 0) + 
  xlab("Age (months)") + 
  ggthemes::theme_few() 

Now let’s put in some random effects.

lmm_fit <- lmer(elogit ~ age + 
                  (1 | administration_id) + 
                  (1 | dataset_name) + 
                  (1 | target_label), 
                data = df_clean)

preds <- expand_grid(
  age = seq(min(df_clean$age),max(df_clean$age),1),
  target_label = hf_words$target_label)

preds <- preds |>
  mutate(.pred = predict(lmm_fit, 
                         type = "response",
                         re.form = ~  (1|target_label), 
                         newdata = preds))

ggplot(df_clean,
       aes(x = age, y = elogit)) + 
  geom_point(alpha = .1) + 
  geom_line(data = preds, 
            aes(x = age, y = .pred, col = target_label)) + 
  geom_hline(lty = 2, yintercept = 0) + 
  xlab("Age (months)") + 
  ggthemes::theme_few() +
  geom_smooth()

Random slopes by age.

lmm_fit <- lmer(elogit ~ age + 
                  (1 | administration_id) + 
                  (1 | dataset_name) + 
                  (age | target_label), 
                data = df_clean)

preds <- expand_grid(
  age = seq(min(df_clean$age),max(df_clean$age),1),
  target_label = hf_words$target_label)

preds <- preds |>
  mutate(.pred = predict(lmm_fit, 
                         type = "response",
                         re.form = ~  (age|target_label), 
                         newdata = preds))

main_plot <- ggplot(df_clean,
       aes(x = age, y = elogit)) + 
  geom_point(alpha = .1) + 
  geom_line(data = preds, 
            aes(x = age, y = .pred, col = target_label)) + 
  geom_hline(lty = 2, yintercept = 0) + 
  xlab("Age (months)") + 
  ggthemes::theme_few() 

plot_grid(main_plot + scale_color_discrete(guide = "none"), 
          comparison_plot + scale_color_discrete(guide = "none"))

Seems like we should try to accommodate the curvature in the age slopes. One possibility is to consider log(age) as a predictor. We will center age to increase convergence and swap to bobyqa optimizer.

df_clean$log_age <- log(df_clean$age)
df_clean$log_age_centered <- df_clean$log_age - mean(df_clean$log_age)

lmm_fit <- lmer(elogit ~ log_age_centered + 
                  (1 | administration_id) + 
                  (1 | dataset_name) + 
                  (log_age_centered | target_label), 
                data = df_clean, 
                control = lmerControl(optimizer="bobyqa"))

preds <- expand_grid(
  log_age_centered = seq(min(df_clean$log_age_centered),
                max(df_clean$log_age_centered), .1),
  target_label = hf_words$target_label)

preds <- preds |>
  mutate(.pred = predict(lmm_fit, 
                         type = "response",
                         re.form = ~  (log_age_centered | target_label), 
                         newdata = preds), 
         age = exp(log_age_centered + mean(df_clean$log_age)))

main_plot <- ggplot(df_clean,
       aes(x = age, y = elogit)) + 
  geom_point(alpha = .1) + 
  geom_line(data = preds, 
            aes(x = age, y = .pred, col = target_label)) + 
  geom_hline(lty = 2, yintercept = 0) + 
  xlab("Age (months)") + 
  ggthemes::theme_few()

plot_grid(main_plot + scale_color_discrete(guide = "none"), 
          comparison_plot + scale_color_discrete(guide = "none"))

This ends up looking quite good in terms of the shape of the curves and their distribution. Let’s look at the random effects.

tibble(ranef(lmm_fit)$target_label) |>
  mutate(target_label = rownames(ranef(lmm_fit)$target_label)) |>
  left_join(words) |>
  rename(intercept = `(Intercept)`) |>
  select(target_label, intercept, log_age_centered, n) |>
  arrange(desc(n)) |>
  filter(n>100) |>
  knitr::kable(digits = 2)
target_label intercept log_age_centered n
dog -0.03 -0.04 1798
book 0.20 0.11 1470
ball -0.09 -0.46 1245
car 0.23 0.36 1129
cat -0.05 -0.11 1075
shoe -0.03 0.07 1063
bird -0.15 0.16 990
baby 0.25 0.12 921
banana -0.16 -0.06 400
apple -0.23 0.16 384
frog 0.34 -0.06 369
cookie 0.28 0.05 230
sock -0.25 -0.22 177
block 0.18 -0.06 149
cup -0.13 -0.17 149
horse 0.06 -0.12 135
duck 0.11 -0.03 132
keys 0.13 0.19 113

I’m worried because these don’t look at all like they reflect acquisition ordering. For example, ball has a very low slope and intercept. Let’s take a look at the top ten of these in the empirical data.

vhf_words = words |>
  filter(n > 800)

ggplot(filter(df_clean, target_label %in% vhf_words$target_label),
       aes(x = age, y = elogit)) + 
  geom_jitter(width = 1, height = 0, alpha = .01) + 
  facet_wrap(~target_label) + 
  geom_smooth(se=TRUE) + 
  scale_color_solarized() +
  geom_hline(lty = 2, yintercept = 0)+ 
  ggthemes::theme_few()

Let’s zoom in on ball.

df_clean |>
  ungroup() |>
  filter(target_label == "ball") |>
  group_by(target_label, distractor_label, dataset_name) |>
  summarise(age = mean(age)/12) |>
  arrange(age) |>
  knitr::kable(digits = 1)
target_label distractor_label dataset_name age
ball spoon garrison_bergelson_2020 1.1
ball block garrison_bergelson_2020 1.2
ball flower garrison_bergelson_2020 1.2
ball vacuum garrison_bergelson_2020 1.2
ball car garrison_bergelson_2020 1.2
ball phone garrison_bergelson_2020 1.2
ball sock garrison_bergelson_2020 1.2
ball apple swingley_aslin_2002 1.3
ball hat garrison_bergelson_2020 1.3
ball highchair garrison_bergelson_2020 1.3
ball shoe garrison_bergelson_2020 1.4
ball cup garrison_bergelson_2020 1.4
ball shoe adams_marchman_2018 1.4
ball book ronfard_2021 1.7
ball shoe fmw_2013 1.7
ball duck mahr_coartic 1.7
ball truck newman_genderdistractor 1.9
ball shoe xsectional_2007 2.0
ball cookie xsectional_2007 2.0
ball cookie potter_canine 2.0
ball car newman_sinewave_2015 2.2

We see garrison_bergelson_2020 has lots and lots of different targets. Across datasets, trends definitely go up, but there is confounding across datasets.

ggplot(filter(df_clean, target_label == "ball"),
       aes(x = age, y = elogit, col = dataset_name)) + 
  geom_jitter(width = 1, height = 0, alpha = .1) + 
  geom_smooth(method = "lm", se=FALSE) + 
  geom_hline(lty = 2, yintercept = 0)+ 
  ggthemes::theme_few()

First, we should probably model within-dataset age effects. Second, we should probably model the distractors.

Our model of distractors right now is purely linear. Some distractors are harder – such that if having car as the distractor makes you less likely to look at the target, then it’s going to have a negative coefficient. Similarly, maybe ball is a boring distractor and so you’re more likely to look. We can’t fit in age-slopes though.

lmm_fit <- lmer(elogit ~ log_age_centered + 
                  (1 | administration_id) + 
                  (log_age_centered | dataset_name) + 
                  (log_age_centered | target_label) + 
                  (1 | distractor_label), 
                data = df_clean, 
                control = lmerControl(optimizer="bobyqa"))

preds <- expand_grid(
  log_age_centered = seq(min(df_clean$log_age_centered),
                max(df_clean$log_age_centered), .1),
  target_label = hf_words$target_label)

preds <- preds |>
  mutate(.pred = predict(lmm_fit, 
                         type = "response",
                         re.form = ~  (log_age_centered | target_label), 
                         newdata = preds), 
         age = exp(log_age_centered + mean(df_clean$log_age)))

main_plot <- ggplot(df_clean,
       aes(x = age, y = elogit)) + 
  geom_point(alpha = .1) + 
  geom_line(data = preds, 
            aes(x = age, y = .pred, col = target_label)) + 
  geom_hline(lty = 2, yintercept = 0) + 
  xlab("Age (months)") + 
  ggthemes::theme_few()

plot_grid(main_plot + scale_color_discrete(guide = "none"), 
          comparison_plot + scale_color_discrete(guide = "none"))

Let’s look at the random effects one more time and see if this helps.

target_ranef <-  tibble(ranef(lmm_fit)$target_label) |>
  mutate(target_label = rownames(ranef(lmm_fit)$target_label)) |>
  left_join(words) |>
  rename(intercept = `(Intercept)`) |>
  select(target_label, intercept, log_age_centered, n) 

target_ranef |>
  arrange(desc(n)) |>
  filter(n>100) |>
  knitr::kable(digits = 2)
target_label intercept log_age_centered n
dog -0.05 -0.28 1798
book 0.28 0.12 1470
ball -0.19 -0.35 1245
car 0.19 0.39 1129
cat 0.03 -0.08 1075
shoe -0.03 0.02 1063
bird -0.06 0.12 990
baby 0.48 0.08 921
banana -0.21 -0.07 400
apple -0.21 0.01 384
frog 0.29 -0.21 369
cookie 0.22 0.14 230
sock -0.26 -0.18 177
block 0.16 -0.03 149
cup -0.13 -0.15 149
horse 0.06 -0.03 135
duck 0.07 -0.03 132
keys 0.13 0.21 113

It looks like the coefficients for ball are regularized more toward zero though probably not completely so. That makes me think that we are probably indexing a bunch of salience effects as well as true word recognition.

Just for kicks, let’s look at the distractors as well.

distractors <- df_clean |>
  group_by(distractor_label) |>
  count() 

distractor_ranef <- tibble(ranef(lmm_fit)$distractor_label) |>
  mutate(distractor_label = rownames(ranef(lmm_fit)$distractor_label)) |>
  left_join(distractors) |>
  rename(intercept = `(Intercept)`) |>
  select(distractor_label, intercept,  n) 

distractor_ranef |>
  arrange(desc(n)) |>
  filter(n>100) |>
  knitr::kable(digits = 2)
distractor_label intercept n
dog -0.21 1714
book 0.12 1495
ball 0.04 1275
car 0.03 1225
cat -0.05 1079
shoe 0.22 1046
baby -0.03 1001
bird -0.02 995
apple 0.08 390
frog -0.04 386
banana 0.04 348
cookie -0.13 215
sock -0.03 152
block -0.05 151
duck -0.02 144
cup 0.02 142
keys -0.01 112

This looks like animates are negative (harder) and inanimates are positive (easier) for the most part.

Let’s add a fixed effect for animacy of target and distractor to try and soak up these preference effects systematically.

sort(unique(c(unique(df_clean$target_label),
              unique(df_clean$distractor_label))))
##   [1] "animal"      "apple"       "baby"        "ball"        "balloon"    
##   [6] "banana"      "bathtub"     "bear"        "bed"         "belt"       
##  [11] "bib"         "bike"        "bird"        "blanket"     "block"      
##  [16] "blueberries" "boat"        "book"        "boot"        "bottle"     
##  [21] "bowl"        "box"         "boy"         "broom"       "brush"      
##  [26] "bucket"      "bunny"       "bus"         "cake"        "can"        
##  [31] "car"         "carrot"      "cat"         "chair"       "chalk"      
##  [36] "chicken"     "clock"       "coat"        "comb"        "cookie"     
##  [41] "cow"         "crib"        "cup"         "diaper"      "dog"        
##  [46] "donut"       "door"        "dress"       "drink"       "drum"       
##  [51] "duck"        "elephant"    "fan"         "fish"        "flag"       
##  [56] "flower"      "food"        "foot"        "frog"        "giraffe"    
##  [61] "glasses"     "glove"       "grapes"      "guitar"      "hammer"     
##  [66] "hat"         "highchair"   "horse"       "hose"        "juice"      
##  [71] "kangaroo"    "keys"        "knife"       "ladder"      "lamp"       
##  [76] "lion"        "milk"        "monkey"      "mouth"       "muffin"     
##  [81] "necklace"    "opal"        "owl"         "pacifier"    "pajamas"    
##  [86] "peas"        "phone"       "piano"       "pig"         "plate"      
##  [91] "pudding"     "puppy"       "purse"       "puzzle"      "remote"     
##  [96] "sandbox"     "scissors"    "shawl"       "sheep"       "shirt"      
## [101] "shoe"        "shorts"      "shovel"      "sippy"       "slide"      
## [106] "slipper"     "sock"        "spoon"       "strawberry"  "stroller"   
## [111] "sunglasses"  "table"       "tablet"      "tape"        "teddy"      
## [116] "tiger"       "toothbrush"  "toy"         "train"       "truck"      
## [121] "vacuum"      "wagon"       "water"       "waterbottle" "whale"      
## [126] "zebra"       "zipper"
animates <- c("anima","baby","dog","bear","bird","boy", 
              "bunny","cat","chicken","cow","dog","duck","elephant",
              "fish","frog","giraffe","horse","kangaroo","lion","monkey",
              "owl","pig","puppy","sheep","teddy","tiger","whale","zebra")

df_clean$animate_target <- df_clean$target_label %in% animates
df_clean$animate_distractor <- df_clean$distractor_label %in% animates

Now let’s add animacy of targets and distractor to the model as well as interactions with age.

lmm_fit <- lmer(elogit ~ log_age_centered * animate_target + 
                  log_age_centered * animate_distractor +
                  (1 | administration_id) + 
                  (log_age_centered | dataset_name) + 
                  (log_age_centered | target_label) + 
                  (1 | distractor_label), 
                data = df_clean, 
                control = lmerControl(optimizer="bobyqa"))

preds <- expand_grid(
  animate_target = c(TRUE, FALSE), 
  animate_distractor = c(TRUE, FALSE), 
  log_age_centered = seq(min(df_clean$log_age_centered),
                max(df_clean$log_age_centered), .1),
  target_label = hf_words$target_label)

preds <- preds |>
  mutate(.pred = predict(lmm_fit, 
                         type = "response",
                         re.form = ~  (log_age_centered | target_label), 
                         newdata = preds), 
         age = exp(log_age_centered + mean(df_clean$log_age)))

main_plot <- ggplot(df_clean,
       aes(x = age, y = elogit)) + 
  geom_point(alpha = .1) + 
  geom_line(data = filter(preds, !animate_target, !animate_distractor), 
            aes(x = age, y = .pred, col = target_label)) + 
  geom_hline(lty = 2, yintercept = 0) + 
  xlab("Age (months)") + 
  ggthemes::theme_few()

plot_grid(main_plot + scale_color_discrete(guide = "none"), 
          comparison_plot + scale_color_discrete(guide = "none"))

Now when we make predictions, we need to make the predictions with or without these animacy effects. It’s kind of funny to look at a “dog” prediction with animate_target=FALSE but that’s really like saying “what’s the predicted curve for dog knowledge independent of perceptual biases.” Let’s look at the model fits first, and then the random effects.

knitr::kable(summary(lmm_fit)[[10]], digits = 2)
Estimate Std. Error t value
(Intercept) 0.66 0.07 9.65
log_age_centered 1.05 0.16 6.73
animate_targetTRUE 0.20 0.09 2.27
animate_distractorTRUE -0.21 0.06 -3.58
log_age_centered:animate_targetTRUE -0.25 0.12 -2.06
log_age_centered:animate_distractorTRUE 0.24 0.10 2.48

We can see that there is a pretty sizable effect for each, such that:

  1. with an animate target, there is a positive effect on accuracy (animate bias)
  2. with an animate distractor, there is a negative effect on accuracy (animate bias again)
  3. the animate target bias decreases with log age, so it’s bigger for younger kids
  4. the animate distractor bias increases (becomes less negative) with log age, so it’s also bigger for younger kids

So in sum, there are big animate biases, especially for younger kids. That seems really important.

Tried fitting an interaction (animate target X animate distractor) to see if they canceled out but the effect was negligible.

Let’s see how the word-level random effects look now. Really they are quite different!

target_ranef <-  tibble(ranef(lmm_fit)$target_label) |>
  mutate(target_label = rownames(ranef(lmm_fit)$target_label)) |>
  left_join(words) |>
  rename(intercept = `(Intercept)`) |>
  select(target_label, intercept, log_age_centered, n) 

target_ranef |>
  arrange(desc(intercept)) |>
  filter(n>100) |>
  knitr::kable(digits = 2)
target_label intercept log_age_centered n
baby 0.40 0.16 921
book 0.33 0.05 1470
car 0.26 0.28 1129
cookie 0.25 0.12 230
frog 0.17 -0.01 369
block 0.15 0.03 149
keys 0.13 0.15 113
shoe 0.00 0.03 1063
horse -0.01 0.00 135
duck -0.02 -0.03 132
cat -0.04 -0.03 1075
dog -0.10 -0.10 1798
bird -0.11 0.07 990
cup -0.14 -0.12 149
apple -0.14 -0.07 384
ball -0.14 -0.25 1245
banana -0.16 -0.07 400
sock -0.26 -0.17 177

Let’s save the resulting model and carry it forward.

save(file = here("data","df_clean.Rds"), df_clean)
save(file = here("data","lmm_fit.Rds"), lmm_fit)