library(knitr)

opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE,
               error = FALSE, cache = FALSE, tidy = FALSE)

library(tidyverse)
library(langcog)
library(feather)

theme_set(theme_classic(base_size = 20))

The goal of this analysis is to test the hypothesis that children learn words that are subordinates of words they knew at the previous timepoint. To do this, I analyzed each kid's vocabulary at each timepoint using word embeddings trained on English Wikipedia.
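For reference, here is a minimal sketch of a closeness measure on the embeddings, assuming cosine similarity and a hypothetical embeddings data frame with a word column followed by numeric dimension columns (the real embedding object and the exact distance measure may differ).

# Sketch: cosine similarity between two words in a hypothetical `embeddings`
# data frame (one row per word: a `word` column plus numeric dimensions).
cosine_sim <- function(w1, w2, embeddings) {
  v1 <- embeddings %>% filter(word == w1) %>% select(-word) %>% unlist()
  v2 <- embeddings %>% filter(word == w2) %>% select(-word) %>% unlist()
  sum(v1 * v2) / (sqrt(sum(v1 ^ 2)) * sqrt(sum(v2 ^ 2)))
}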

Here the baseline is a sample of all words that the kid could have known at the current hypernym level (previously, the baseline was a sample of all other words that the kid actually knows).
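Roughly, the baseline sampling looks like the sketch below. The column names `hypernyms` and `uni_lemma` match the item data loaded further down; the function name and sample size are assumptions, not the actual pipeline code.

# Sketch: for one kid/timepoint, sample baseline words from all candidates
# at the anchor's hypernym level that the kid does not already know.
sample_baseline <- function(known_words, hyp_level, hypernym_df, n = 100) {
  hypernym_df %>%
    filter(hypernyms == hyp_level,            # candidates at this hypernym level
           !uni_lemma %in% known_words) %>%   # ...that the kid does not know yet
    sample_n(size = min(n, nrow(.))) %>%
    pull(uni_lemma)
}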

“Inclusive” means the x-axis refers to the hypernym level of the anchor words known at the previous timepoint and all smaller hypernym values.

“Exclusive” means the x-axis refers to the exact hypernym level of the anchor words at the previous timepoint.
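In code, the two definitions are just different filters on the candidate hypernym levels. A sketch, with assumed argument names:

# Sketch of the two x-axis definitions for a given anchor hypernym level.
candidate_levels <- function(hypernym_df, anchor_level, inclusive = TRUE) {
  if (inclusive) {
    filter(hypernym_df, hypernyms <= anchor_level)  # anchor level and all smaller levels
  } else {
    filter(hypernym_df, hypernyms == anchor_level)  # exactly the anchor level
  }
}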

ITEM_KEY <- "data/item_key.csv"
item_key <- read_csv(ITEM_KEY)
HYP <- "data/wordbank_hypernyms.csv"
hypernyms <- read_csv(HYP) %>%
  filter(uni_lemma != "feet") %>%  # drop "feet"
  left_join(item_key %>% select(uni_lemma, num_item_id),
            by = "uni_lemma")

hyp_counts <- hypernyms %>%
  count(hypernyms)  # number of items at each hypernym level

Closeness of learned words to hypernym at previous timepoint

These plots show actual - random closeness as a function of hypernym level at the previous timepoint.
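For orientation, the plotted quantity looks roughly like the sketch below. The summary data frame `closeness_summary` and its columns (`prev_hyp_level`, `actual`, `random`) are hypothetical placeholders, not the real object names.

# Sketch: actual minus random closeness by hypernym level at the previous timepoint.
closeness_summary %>%
  mutate(diff = actual - random) %>%
  ggplot(aes(x = prev_hyp_level, y = diff)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Hypernym level at previous timepoint",
       y = "Actual - random closeness")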

N words known close to hypernym at previous timepoint

This analysis asks how many words a kid learned that are close to the hypernyms known at the previous timepoint. These plots show actual - random, averaged across all the anchor hypernyms.
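A sketch of the counting step, reusing the `cosine_sim` sketch above. The similarity threshold and the definition of "close" are assumptions here; the same count on the random baseline sample would then be subtracted to give actual - random.

# Sketch: count newly learned words whose similarity to an anchor hypernym
# exceeds a (hypothetical) threshold.
n_close <- function(learned_words, anchor, embeddings, threshold = 0.3) {
  sims <- map_dbl(learned_words, ~ cosine_sim(.x, anchor, embeddings))
  sum(sims > threshold)
}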