Goal 2: Compare to Wordbank AoAs

Load AoAs - this is from aoa_prediction.R, which I assume I wrote at some point but can’t remember.

# packages used throughout this section (probably already loaded earlier in the document)
library(tidyverse)
library(here)

# all English (American) CDI items, across both forms
items <- wordbankr::get_item_data(language = "English (American)") 

# by-administration responses for Words & Sentences and Words & Gestures
ws_data <- wordbankr::get_instrument_data(language = "English (American)", 
                                          form = "WS", 
                                          administrations = TRUE, 
                                          items = items$item_id[items$form == "WS"]) %>%
  right_join(items)
wg_data <- wordbankr::get_instrument_data(language = "English (American)", 
                                          form = "WG", 
                                          administrations = TRUE, 
                                          items = items$item_id[items$form == "WG"]) %>%
  right_join(items)

wordbank_data <- bind_rows(ws_data, wg_data) %>%
  mutate(produces = value == "produces", 
         form = "both", 
         num_item_id = definition, # stupid stuff to make fit_aoa work on joint data: key items by definition so WS and WG versions of a word line up
         item_id = definition) %>%
  filter(type == "word")

aoas <- wordbankr::fit_aoa(wordbank_data, 
                           measure = "produces", 
                           method = "glmrob",
                           age_min = 8, 
                           age_max = 36)
saveRDS(aoas, here("explorations","data","aoas.rds"))

Load these from the cache, since fitting is time-consuming.

aoas <- readRDS(here("explorations", "data", "aoas.rds"))
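
For reference, a standard guard pattern (a sketch, using the same path and the fit_aoa call above) avoids refitting when the cache already exists:

aoa_path <- here("explorations", "data", "aoas.rds")
if (file.exists(aoa_path)) {
  aoas <- readRDS(aoa_path)   # use the cached fit
} else {
  # refit and cache (slow)
  aoas <- wordbankr::fit_aoa(wordbank_data, 
                             measure = "produces", 
                             method = "glmrob",
                             age_min = 8, 
                             age_max = 36)
  saveRDS(aoas, aoa_path)
}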

Now let’s look at the relationship between AoAs and accuracies.

Naive AoA-accuracy

What’s the right metric to compare? Here are some ideas:

  1. average accuracy (confounded with age and study)
  2. accuracy random intercepts
  3. predicted point at which accuracy is greater than XYZ%

Let’s try each. First let’s do average accuracy.

# average accuracy per target word, joined with Wordbank AoAs
mdf <- df_clean |>
  group_by(target_label) |>
  summarise(avg_accuracy = mean(accuracy),
            n_trials = n()) |>
  inner_join(aoas |>
              ungroup() |>
              select(definition, aoa) |>
              rename(target_label = definition))

ggplot(mdf, aes(x = aoa, y = avg_accuracy)) + 
  geom_point(alpha = .5, aes(size = n_trials)) + 
  geom_smooth(method = "lm") + 
  ggpmisc::stat_correlation() + 
  ggrepel::geom_label_repel(aes(label = target_label))

There’s a correlation, but it goes in the WRONG direction!

ggplot(filter(mdf, n_trials > 200), 
       aes(x = aoa, y = avg_accuracy)) + 
  geom_point(alpha = .5, aes(size = n_trials)) + 
  geom_smooth(method = "lm") + 
  ggpmisc::stat_correlation() + 
  ggrepel::geom_label_repel(aes(label = target_label))

The correlation only gets stronger when we subset to items for which we have a lot of data. It looks like there is serious confounding: the lower-accuracy items are actually easier items that were used in studies with younger kids, who perform worse overall.
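
One quick check on that story (a sketch; it relies on df_clean's log_age column, which is used later in this section): compute the mean testing age per word and correlate it with Wordbank AoA. If easier, earlier-acquired words were indeed tested with younger kids, this correlation should be positive.

item_ages <- df_clean |>
  group_by(target_label) |>
  summarise(mean_age = mean(exp(log_age)))  # back-transform log age to months

item_ages |>
  inner_join(aoas |>
               ungroup() |>
               select(definition, aoa) |>
               rename(target_label = definition)) |>
  with(cor.test(mean_age, aoa))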

Let’s try methods 2 and 3.

# per-word random intercepts and slopes from the main model
ranefs <- as_tibble(ranef(lmm_fit)$target_label, rownames = "target_label") |>
  rename(intercept = `(Intercept)`,
         slope = log_age_centered)

mdf <- left_join(mdf, ranefs)
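
As an aside, the same random-effects table can be built with broom.mixed (an assumption; that package isn't used elsewhere here):

ranefs_tidy <- broom.mixed::tidy(lmm_fit, effects = "ran_vals") |>
  filter(group == "target_label") |>
  select(target_label = level, term, estimate) |>
  pivot_wider(names_from = term, values_from = estimate) |>
  rename(intercept = `(Intercept)`, slope = log_age_centered)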

Extract “AoAs” from the predicted curves. We take the point at which a word’s curve crosses the elogit = 1 line as “the age of acquisition” - approximately the point where proportion accuracy hits 73%. We’ll also look at the elogit value at age 36 months as a different proxy. (These points are marked by the two red lines below.)
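
As a quick sanity check on that threshold, the inverse logistic in base R:

plogis(1)  # ~0.73, so the elogit = 1 crossing is roughly the 73% accuracy point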

words <- df_clean |>
  group_by(target_label) |> 
  count() 

# "high-frequency" words: those with more than 200 trials
hf_words <- words |>
  filter(n > 200)

preds <- expand_grid(
  animate_target = FALSE, 
  animate_distractor = FALSE, 
  log_age_centered = seq(min(df_clean$log_age_centered),
                         max(df_clean$log_age_centered), .1),
  target_label = unique(df_clean$target_label))

preds <- preds |>
  mutate(.pred = predict(lmm_fit, 
                         type = "response",
                         re.form = ~ (log_age_centered | target_label), # condition only on the by-word random effects
                         newdata = preds), 
         age = exp(log_age_centered + mean(df_clean$log_age))) # back-transform to age in months

ggplot(filter(preds, target_label %in% hf_words$target_label),
       aes(x = age, y = .pred, col = target_label)) + 
  geom_line() + 
  geom_hline(lty = 2, yintercept = 0) + # chance performance (elogit 0 = 50%)
  geom_hline(lty = 3, yintercept = 1, col = "red") + # elogit = 1 "AoA" criterion
  geom_vline(lty = 3, xintercept = 36, col = "red") + # 36-month criterion
  xlab("Age (months)") + 
  ylab("Predicted accuracy (elogit)") + 
  ggthemes::theme_few()

Now join in these curves and plot.

mdf <- left_join(mdf, 
                 preds |>
                   group_by(target_label) |>
                   arrange(age, .by_group = TRUE) |> # make sure ages are ascending within word
                   summarise(mod_aoa = age[.pred > 1][1], # first age at which the curve crosses elogit = 1
                             mod_elogit_36_mo = .pred[age > 36][1])) # predicted elogit just past 36 months

mdf_long <- mdf |> 
  pivot_longer(cols = -c("target_label", "aoa", "n_trials"), 
               names_to = "predictor", 
               values_to = "value")

ggplot(mdf_long, 
       aes(x = aoa, y = value)) + 
  geom_point(alpha = .5, aes(size = n_trials)) + 
  geom_smooth(method = "lm") + 
  ggpmisc::stat_correlation() + 
  # ggrepel::geom_label_repel(aes(label = target_label)) + 
  facet_wrap(~predictor, scales="free_y")

Ugh. Surprising that these are not more related. Let’s filter down to the words for which we have a lot of data.

ggplot(filter(mdf_long, n_trials > 200),
       aes(x = aoa, y = value)) + 
  geom_point(alpha = .5, aes(size = n_trials)) + 
  geom_smooth(method = "lm") + 
  facet_wrap(~predictor, scales="free_y") + 
  ggpmisc::stat_correlation() + 
  # ggrepel::geom_label_repel(aes(label = target_label)) + 
  xlab("Wordbank AoA") + 
  ylab("Peekbank model measure")

So going through these:

  • avg_accuracy goes in the opposite direction of what we wanted: accuracy is higher for later-acquired words, probably due to the confounding discussed above.
  • intercept and slope: these are the target_label random effects from the model. The intercept should be higher for earlier-acquired words. These also appear to go in the opposite direction from what we’d predict.
  • mod_aoa: this should track the Wordbank AoA, i.e., a positive relationship. Again, it appears negative.
  • mod_elogit_36_mo: this is another accuracy measure, so it should be higher for low-AoA words. Again, it goes in the wrong direction.

In sum, ALL of these go in the wrong direction. What gives? Let’s zoom in on model AoA.

ggplot(filter(mdf_long, predictor == "mod_aoa", n_trials > 200),
       aes(x = aoa, y = value)) + 
  geom_point(alpha = .5, aes(size = n_trials)) + 
  geom_smooth(method = "lm") + 
  ggpmisc::stat_correlation() + 
  ggrepel::geom_label_repel(aes(label = target_label)) +
  xlab("Wordbank AoA") + 
  ylab("Peekbank model measure")

What we see is that ball is consistently bad, as we observed throughout. dog also comes out surprisingly bad! We do see that book, baby, and car are more easily recognized, however.

In sum, we’re not seeing ANY successes in linking Peekbank to Wordbank.