We study how the way parents use words when talking to their children changes over development. Understanding this change is interesting because:
-It can reflect developmental change: Children develop both mentally (social-cognitive abilities) and physically (motor skills). This development can change the way parents interact verbally with their children. For example, it becomes possible for parents to engage in more complex discussions and to talk about new aspects of the children’s growing experience.
-It can drive developmental change: It is possible that parents adapt their speech to the cognitive abilities of their children as they develop. By talking about new concepts and meanings in a simplified way, they can make learning easier (e.g., Elman XX).
Here we focus on broad properties of change in word usage. For example: do words belonging to certain syntactic/semantic categories change more than others? What factors predict change? Finally, we ask whether the way parents use words can drive meaning change in children’s own word representations.
This is a super-brief summary (Hang, can you add more details later?):
We used a corpus made of all English-language transcripts from CHILDES. We split this corpus into “epochs” of 2 million tokens each. The epochs do not necessarily span equal amounts of time: for example, the first epoch may cover the first two years, whereas the last epoch may cover only one year or less. We did this to control for (irrelevant) variation that might arise from differences in training-set size.
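A minimal sketch of this binning step, assuming utterances are already ordered by child age (the function name `split_into_epochs` and the `(age, tokens)` structure are illustrative, not the actual pipeline):

```python
# Sketch: split an age-ordered list of utterances into epochs of ~2M tokens.
# `utterances` is assumed to be a list of (age, [token, ...]) pairs sorted by age.
EPOCH_TOKENS = 2_000_000

def split_into_epochs(utterances, epoch_tokens=EPOCH_TOKENS):
    epochs, current, n_tokens = [], [], 0
    for _age, tokens in utterances:
        current.append(tokens)
        n_tokens += len(tokens)
        if n_tokens >= epoch_tokens:
            epochs.append(current)
            current, n_tokens = [], 0
    if current:  # any remainder becomes the final (smaller) epoch
        epochs.append(current)
    return epochs
```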
We trained Word2vec on each epoch and aligned the resulting spaces using orthogonal Procrustes (see the method outlined in Hamilton et al., 2016).
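A simplified sketch of the per-epoch training and alignment steps, assuming gensim and numpy (the hyperparameters are placeholders, and the full Hamilton et al. recipe includes preprocessing not shown here):

```python
import numpy as np
from gensim.models import Word2Vec

def train_epoch(sentences):
    # Train one embedding space per epoch (hyperparameters are illustrative).
    return Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)

def procrustes_align(base_vecs, other_vecs):
    # Orthogonal Procrustes: rotate `other_vecs` onto `base_vecs`
    # (rows = shared vocabulary, in the same order in both matrices).
    m = other_vecs.T @ base_vecs
    u, _, vt = np.linalg.svd(m)
    return other_vecs @ (u @ vt)
```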
We use a measure of change that quantifies the extent to which words have been displaced between the first and last epochs.
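Concretely, the displacement could be computed as the cosine distance between a word’s vectors in the two aligned spaces; this is a sketch, and the exact metric used in the analysis may differ:

```python
import numpy as np

def displacement(vec_first, vec_last):
    # Cosine distance between a word's vector in the first and last epochs,
    # computed after the spaces have been Procrustes-aligned.
    cos_sim = vec_first @ vec_last / (np.linalg.norm(vec_first) * np.linalg.norm(vec_last))
    return 1.0 - cos_sim
```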
IMPORTANT: We add a control corpus made of exactly the same utterances as the “real” corpus, but with the utterances shuffled across time. Words in this shuffled corpus are therefore NOT supposed to change over time. The “shuffled” corpus allows us to control for spurious change; for example, previous work has shown that less frequent words may appear to change more simply because they have noisier representations (Dubossarsky et al., 2017).
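A sketch of how such a control could be built, reusing the `split_into_epochs` helper sketched above (the fixed seed and the function name are illustrative assumptions):

```python
import random

def make_shuffled_control(utterances, epoch_tokens=2_000_000, seed=0):
    # Same utterances, but their temporal order is destroyed before binning,
    # so any apparent "change" across epochs reflects noise, not development.
    shuffled = list(utterances)
    random.Random(seed).shuffle(shuffled)
    return split_into_epochs(shuffled, epoch_tokens)
```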
We use the syntactic and semantic categories from CDI.
The figure below shows change broken down by syntactic category. Function words change the most; nouns change the least.
In the real corpus there is more change than in the control; the residual can be attributed to genuine change in the patterns of word usage.
Further, in a linear model we found an interaction between syntactic_class (function_words vs. nouns) and condition (real vs. shuffled), showing that at least some of the disparity between word classes is due to genuine change. Function words express grammatical relations and are not tied to a specific semantic context, so their meaning per se is not expected to change as much as that of content words; their large displacement may instead reflect change in the patterns in which they are used.
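A sketch of how this interaction could be tested with statsmodels (the data frame and the column names `change`, `lexical_class`, and `corpus` are assumptions about the analysis table, and the file name is hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per word per condition: its displacement score, its syntactic class,
# and whether it comes from the real or the shuffled corpus.
df = pd.read_csv("change_by_word.csv")  # hypothetical file name

interaction_model = smf.ols("change ~ C(lexical_class) * C(corpus)", data=df).fit()
print(interaction_model.summary())
```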
Figure below shows change broken down by semantic category.
We observe within-syntactic-category variation which depends on finer-grained (semantic) classifications.
For example, within the function words, quantifiers undergo more change than connecting words. Within the nouns, the categories of clothes and food change less than the categories of animals and people do.
Similar to the above, the difference between semantic categories cannot be entirely captured by the control; at least part of this variability is due to genuine change. We found an interaction between “semantic_category” and “condition” for many pairs of semantic categories (e.g., animals vs. number).
The figure below shows how change is predicted by frequency.
There is an effect of frequency (an interaction between “freq” and “condition”).
More frequent words have more stable meanings over development.
Polysemous words are a possible locus of meaning change. However, we did not find an effect of polysemy (Figure below).
Investigation of a few cases suggests that parents use a diversity of strategies when introducing these words, only one of which (the first) involves change:
-Use only one sense early and add more senses later. E.g., “fish” and “chicken” have the animal sense early and acquire the double animal-food sense later; “bat” is first related to “ball” and “kick” and later acquires the additional animal-related sense.
-Use both senses from the start. E.g., “orange”.
-Use only one sense throughout. E.g., “can”.
So far, we have explored how change can be predicted by the properties of individual words.
Is there also an effect of the way words are organized into semantic categories?
For example, categories that undergo the most change may be made of words that are more loosely tied together, i.e., that have lower semantic density.
The figure below shows how the density of a semantic category (characterized as the average pairwise similarity between its member words) predicts change at the word level.
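A sketch of the density measure, assuming `vectors` maps words to their (aligned) embedding vectors:

```python
import numpy as np
from itertools import combinations

def category_density(words, vectors):
    # Average pairwise cosine similarity among the words of one semantic category.
    sims = []
    for w1, w2 in combinations(words, 2):
        v1, v2 = vectors[w1], vectors[w2]
        sims.append(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    return float(np.mean(sims))
```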
We found an effect of density (an interaction between “density” and “condition”).
For illustration, I show how the density of a category predicts its change (the average change over the words belonging to it) at the category level. For example, we see that the categories of clothes and food are denser and change less than the categories of animals and people.
When we put all the predictors in the same mixed-effects model, we found an effect of both “frequency” and “density”, as well as a three-way interaction between “frequency”, “density” and “condition”.
Note: when using “freq” as a random slope, the model tends to overfit (see the “singular fit” warning at the end). To be sure, I will run a Bayesian regression later.
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: value ~ freq * density * corpus + (freq | lexical_class)
## Data: data_for_model
##
## REML criterion at convergence: 2368
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6615 -0.5472 -0.0991 0.3375 5.2251
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## lexical_class (Intercept) 0.041348 0.20334
## freq 0.008637 0.09294 -1.00
## Residual 0.445047 0.66712
## Number of obs: 1154, groups: lexical_class, 5
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 1.9968 0.1396 17.0061 14.307
## freq -1.3776 0.1131 89.6349 -12.179
## density -4.1876 0.3006 1041.6517 -13.930
## corpusshuffled -1.6884 0.1332 1141.8325 -12.672
## freq:density 2.3023 0.2862 949.6254 8.045
## freq:corpusshuffled 0.7576 0.1415 1141.8782 5.354
## density:corpusshuffled 1.8765 0.3715 1141.8520 5.051
## freq:density:corpusshuffled -1.2732 0.3780 1141.9048 -3.368
## Pr(>|t|)
## (Intercept) 6.51e-11 ***
## freq < 2e-16 ***
## density < 2e-16 ***
## corpusshuffled < 2e-16 ***
## freq:density 2.56e-15 ***
## freq:corpusshuffled 1.04e-07 ***
## density:corpusshuffled 5.10e-07 ***
## freq:density:corpusshuffled 0.000781 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) freq densty crpssh frq:dn frq:cr dnsty:
## freq -0.328
## density -0.722 0.103
## corpsshffld -0.489 0.029 0.613
## freq:densty 0.086 -0.889 -0.119 -0.027
## frq:crpsshf 0.022 -0.657 -0.027 -0.022 0.682
## dnsty:crpss 0.473 -0.024 -0.649 -0.955 0.023 0.011
## frq:dnsty:c -0.019 0.643 0.023 0.012 -0.721 -0.960 -0.001
## convergence code: 0
## boundary (singular) fit: see ?isSingular
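A possible sketch of that Bayesian follow-up in Python using the bambi library (the formula mirrors the lmer model above; `data_for_model` is assumed to be available as a pandas data frame, and the sampler settings are illustrative):

```python
import arviz as az
import bambi as bmb

# Same lme4-style formula as the frequentist model; bambi accepts this syntax.
bayes_model = bmb.Model("value ~ freq * density * corpus + (freq | lexical_class)",
                        data_for_model)
bayes_fit = bayes_model.fit(draws=2000, chains=4)  # returns an ArviZ InferenceData
print(az.summary(bayes_fit))
```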
This study only deals with change in the caregivers’ input. But we can also examine whether this input may scaffold meaning change in children’s own word representations.
Developmental researchers have found systematic changes in children’s word meanings in at least three domains: number, color, and time (for a review, see Wagner, Tillman, & Barner, 2016).
Acquiring meaning in these domains is understood to follow, roughly speaking, two steps (see Carey, 2009 for more details):
Children acquire a placeholder structure made of a class of lexical alternatives. For example, in the number domain, children first acquire the sequence (“one”, “two”, “three”, “four”, …, “ten”) but only as a pattern; that is, they learn that “eight” precedes “nine” in the sequence, but not necessarily that “nine” = “eight” + 1. They similarly acquire a class of lexical alternatives related to time (e.g., “second”, “minute”, “hour”, …) and to colors (“red”, “blue”, “yellow”, …) before they understand what these terms exactly mean.
Children “discover” the meaning of a subset of the placeholder system, e.g., that “two” = “one” + 1 and “three” = “two” + 1, and generalize to the rest of the system, e.g., the fact that “eight” precedes “nine” in the sequence actually means that “nine” = “eight” + 1.
So, in order to arrive at an adult-like meaning in each of these domains, children have to first learn a class of lexical alternatives. Do parents talk in a way that simplifies the learning of such a class?
We test this hypothesis as follows: for each word in these three domains, we list its nearest neighbors and count how many of these neighbors are themselves members of the same domain.
For example, if the category of “number” has 20 words in total, we take, say, the word “two” and we list its 20 nearest neighbors. Then we determine the percentage of other number terms in this list. We repeat the same procedure for the rest of the words in the number category.
For each category, we derive a measure of “purity” as the average of this percentage over words in this category.
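A sketch of the purity computation with gensim, assuming `model` is the Word2Vec model for one epoch and `category_words` is the set of CDI words for one domain:

```python
import numpy as np

def category_purity(category_words, model):
    # For each in-vocabulary word of the category, take its k nearest neighbors
    # (k = category size) and compute the fraction that also belong to the
    # category; the purity score is the average of this fraction over words.
    category_words = set(category_words)
    k = len(category_words)
    scores = []
    for word in category_words:
        if word not in model.wv:
            continue
        neighbors = [w for w, _ in model.wv.most_similar(word, topn=k)]
        scores.append(sum(n in category_words for n in neighbors) / k)
    return float(np.mean(scores))
```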
If purity is high, it shows that, in principle, it is possible to learn a system of lexical alternatives from the input.
The Figure below shows the purity score in the first and last epochs of parental input.
Finding 1: While the average purity is around 0.3, it is around 0.75 in “number” and “color”, showing that parents use terms from each of these domains in highly predictive contexts, thus simplifying their acquisition as a class of contrastive alternatives.
The score for “time” was not as high as in the other domains, but this is probably because, unlike “number” and “color”, the time words (as classified in the CDI) do not form a single system of contrasting alternatives. Instead, we find several sub-systems such as (“yesterday”, “today”, “tonight”, “tomorrow”), (“before”, “now”, “after”, “later”), and (“day”, “morning”, “night”). Using one of these sub-systems alone would be too noisy because of their small size, but I should show some examples …
Finding 2: There is almost no change in purity between the first and last epochs, offering a rather stable learning cue.
This work follows the method used in Hamilton et al. (2016) as well as others (Kulkarni et al., 2015; Kim et al., 2014).
This method is still widely used to study meaning change. It consists of grouping the data into time bins and training the embeddings separately on each bin.
However, this method does not allow us to study fine-grained temporal change: as the bins become smaller, the models become noisier.
In the future, we may want to use some form of dynamic word embeddings (e.g., Rudolph and Blei, 2018).