Explorations in bilingual language exposure
Here we explore the relationship between language exposure in bilinguals and vocabulary size. There are two specific questions to ask:
- Is the effect of language exposure non-linear?
- Are there well-defined cut-off points (e.g., 20% exposure), as has often been claimed, for someone to be considered bilingual?
We pull data from bilingual datasets in Wordbank, including longitudinal data. We control for age by transforming expressive vocabulary sizes into percentiles based on GAMLSS fits of monolingual data (this is not ideal, but it eliminates potential variance that may arise from non-uniform or non-representative sampling across language exposures). We plot LOESS fits on the data as an exploratory visualisation, as we had no a priori prediction about the shape of the relationship. Broadly, the effect of language exposure appears to be mostly linear, with no particular change in shape close to the extremes, which is evidence against both a non-linear effect and well-defined cut-off points.
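As a rough illustration of the two steps described above, the following Python sketch converts a raw vocabulary score into a percentile and then smooths percentiles against exposure. It rests on several assumptions. A simple empirical percentile against same-age monolingual scores stands in for the GAMLSS-based norms, which are considerably more involved. The LOESS is a hand-rolled local linear fit with tricube weights, not necessarily the implementation behind the plots. The function names `empirical_percentile` and `loess` are hypothetical, and all values are synthetic.

```python
from bisect import bisect_right


def empirical_percentile(score, reference_scores):
    """Percentage of same-age monolingual reference scores at or below `score`.

    Assumption: a simplified stand-in for percentiles read off a GAMLSS fit.
    """
    ref = sorted(reference_scores)
    return 100.0 * bisect_right(ref, score) / len(ref)


def loess(xs, ys, x0, frac=0.75):
    """Local linear fit at x0, tricube-weighted over the nearest `frac` of points."""
    n = len(xs)
    k = max(2, int(round(frac * n)))
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: distance to the k-th nearest neighbour, inflated very slightly
    # so that a point sitting exactly on the boundary keeps a non-zero weight.
    h = dists[k - 1] * (1 + 1e-6) or 1e-12
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# Synthetic values purely for illustration (not Wordbank data).
monolingual_scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(empirical_percentile(50, monolingual_scores))  # 50.0

# A perfectly linear exposure-percentile relationship, again made up.
exposure = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
percentile = [0.8 * x + 10 for x in exposure]
smoothed = [loess(exposure, percentile, x) for x in exposure]
print([round(v, 3) for v in smoothed])
```

If the underlying relationship is linear, the smoothed curve stays close to a straight line across the whole exposure range, including near 0% and 100%. A cut-off effect would instead show up as a visible bend or step in the curve around the proposed threshold.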
English–French data
English plot
French plot
English–French percentile scatterplot
English–Spanish data
English plot
Spanish plot
English–Spanish percentile scatterplot
English–Hebrew data
Very little data (\(N=40\)), with very little spread across exposure values; feel free to ignore.