Explorations in bilingual language exposure

Author

Alvin W.M. Tan

Published

February 6, 2024

Here we explore the relationship between language exposure and vocabulary size in bilinguals. We ask two specific questions:

  1. Is the effect of language exposure non-linear?
  2. Are there well-defined cut-off points (e.g., 20%) for someone to be considered bilingual, as has often been claimed?

We pull data from the bilingual datasets in Wordbank, including longitudinal data. We control for age by transforming expressive vocabulary sizes into percentiles based on GAMLSS fits to monolingual data. This is not ideal, but it eliminates variance that might otherwise arise from non-uniform or non-representative sampling across language exposures. We plot LOESS fits on the data as an exploratory visualisation, since we had no a priori prediction about the shape of the relationship. Broadly, the effect of language exposure appears to be mostly linear, with no particular change in shape close to the extremes. This suggests a negative answer to both questions posed above: we see little sign of non-linearity or of a well-defined cut-off point.
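The two steps described above (an age-referenced percentile transform, then a local regression smooth over exposure) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the analysis code behind the plots. The names `vocab_percentile`, `loess`, `ref_mean` and `ref_sd` are hypothetical, and the reference distribution is assumed to be normal purely for simplicity, which stands in for (and is much cruder than) a GAMLSS fit. The example values are synthetic and are not Wordbank data.

```python
import math
from statistics import NormalDist


def vocab_percentile(vocab_size, ref_mean, ref_sd):
    """Percentile (0-100) of a vocabulary size within a reference distribution.

    Assumption: the monolingual reference at the child's age is summarised by
    a normal distribution with the given mean and SD. A GAMLSS fit would
    normally use a more flexible family with age-varying parameters.
    """
    return 100.0 * NormalDist(mu=ref_mean, sigma=ref_sd).cdf(vocab_size)


def loess(xs, ys, x0, span=0.75):
    """Local linear (degree-1) estimate at x0 using tricube weights.

    `span` is the fraction of points that fall inside the local window.
    """
    n = len(xs)
    k = max(2, math.ceil(span * n))
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: distance to the k-th nearest point, nudged up slightly so
    # that the weights can never all be exactly zero.
    h = max(dists[k - 1], 1e-12) * (1 + 1e-9)
    ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(ws)
    # Weighted least-squares line through the local neighbourhood.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# Synthetic, purely illustrative values: exposure (%) and vocabulary percentile.
exposure = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
percentile = [5 + 0.9 * e for e in exposure]  # exactly linear by construction
curve = [loess(exposure, percentile, e) for e in exposure]
```

If the underlying relationship is linear, the smoothed `curve` should track a straight line. A threshold effect would instead show up as a visible bend near the proposed cut-off.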

English–French data

English plot

French plot

English–French percentile scatterplot

English–Spanish data

English plot

Spanish plot

English–Spanish percentile scatterplot

English–Hebrew data

This dataset is very small (\(N=40\)) and has very little spread over exposure values; feel free to ignore.

English plot

Hebrew plot