knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
options(dplyr.summarise.inform = FALSE)
# Results
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.1.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(widyr)
library(ggthemes)
library(devtools)
## Loading required package: usethis
#install_github("langcog/langcog")
library(ggstance)
##
## Attaching package: 'ggstance'
## The following objects are masked from 'package:ggplot2':
##
## geom_errorbarh, GeomErrorbarh
library(langcog)
##
## Attaching package: 'langcog'
## The following objects are masked from 'package:ggthemes':
##
## scale_color_solarized, scale_colour_solarized, scale_fill_solarized
## The following object is masked from 'package:base':
##
## scale
library(ggdendro)
library(ggplot2)
library(glue)
# `pattern` is a regular expression, not a glob, so escape the dot to match files ending in ".R"
walk(list.files("scripts", pattern = "\\.R$", full.names = TRUE), source)
## New names:
## * form -> form...5
## * `unilemma coverage (proportion of tokens that are mapped to a unilemma)` -> `unilemma coverage (proportion of tokens that are mapped to a unilemma)...6`
## * form -> form...7
## * `unilemma coverage (proportion of tokens that are mapped to a unilemma)` -> `unilemma coverage (proportion of tokens that are mapped to a unilemma)...8`
## * form -> form...9
## * ...
## Rows: 80 Columns: 28
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): iso_name, wordbank, form...5, form...7, form...9, form...11, child...
## dbl (10): wordbank kids, wordbank kids / 1000, unilemma coverage (proportion...
## lgl (4): wordbank and childes, in aoa-pred, current targets, new
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Each predictor plotted against age of acquisition (AoA) for all languages, within each lexical category. The pattern for `n_types` is unexpected, possibly because it relies on matching definitions to CHILDES words, and that matching is not precise enough.
## # A tibble: 50 × 7
## language uni_lemma tokens n_types freq_last freq_first freq_solo
## <chr> <chr> <list> <int> <dbl> <dbl> <dbl>
## 1 English (American) a <chr [5]> 5 3.20 0.792 2.91
## 2 English (American) a <chr [5]> 5 3.20 0.792 2.91
## 3 English (American) a lot <chr [8]> 8 0.563 0.0908 1.09
## 4 English (American) about <chr [2]> 2 1.28 1.50 3.23
## 5 English (American) above <chr [1]> 1 1.25 -0.895 0.148
## 6 English (American) after <chr [2]> 2 1.52 -0.535 1.35
## 7 English (American) airplane <chr [13… 13 2.77 0.793 2.61
## 8 English (American) all <chr [7]> 7 1.52 0.111 3.00
## 9 English (American) all gone <chr [2]> 2 -0.257 -0.358 -2.13
## 10 English (American) alligator <chr [5]> 5 -0.340 -0.166 -1.31
## # … with 40 more rows
To illustrate the structure of our analysis, we first describe the results for the English data, shown as the main effect coefficients in predicting words’ developmental trajectories for English comprehension and production. Coefficients larger in magnitude indicate a greater effect of the predictor on acquisition: positive main effects indicate that words with higher values of the predictor tend to be understood/produced by more children, while negative main effects indicate that words with lower values of the predictor tend to be understood/produced by more children. Line ranges indicate 95% confidence intervals; filled-in points indicate coefficients for which \(p < 0.05\).
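As a rough illustration of how an interval and a filled/unfilled point could be derived from a coefficient table, here is a minimal sketch. It assumes a normal approximation; the `coefs` data frame, its column names, and all of its values are hypothetical and are not taken from the fitted models.

```r
# Hypothetical coefficient table: estimates and standard errors are made up
# for illustration and do not come from the models in this analysis.
coefs <- data.frame(
  term      = c("frequency", "concreteness", "length"),
  estimate  = c(0.25, 0.10, -0.02),
  std_error = c(0.05, 0.04, 0.03)
)

# Critical value for a two-sided 95% interval (about 1.96)
z <- qnorm(0.975)

# Normal-approximation 95% confidence interval around each estimate
coefs$ci_lower <- coefs$estimate - z * coefs$std_error
coefs$ci_upper <- coefs$estimate + z * coefs$std_error

# Two-sided p-value under the same normal approximation
coefs$p_value <- 2 * pnorm(-abs(coefs$estimate / coefs$std_error))

# Flag that a plot could map to filled vs. unfilled points
coefs$significant <- coefs$p_value < 0.05

coefs
```

Under this approximation, a coefficient is flagged exactly when its 95% interval excludes zero, so the line ranges and the point fill carry the same information.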
Estimates of coefficients in predicting words’ developmental trajectories for all languages and measures. Each point represents a predictor’s coefficient in one language, with the bar showing the mean across languages. Filled-in points indicate coefficients for which \(p < 0.05\).
Correlations of coefficient estimates between languages. Each point represents the mean correlation between one language’s coefficients and those of each other language, with the vertical line indicating the overall mean across languages. The shaded region and line show a bootstrapped 95% confidence interval for a randomized baseline in which predictor coefficients are shuffled within language.
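The comparison against a shuffled baseline can be sketched roughly as follows. This is an assumption-laden illustration, not the analysis code: `coef_mat`, the language labels, and the simulated values are hypothetical, and the baseline here is a simple permutation distribution summarized with a percentile interval, which may differ from the bootstrap procedure actually used.

```r
set.seed(1)

# Hypothetical predictors-by-languages coefficient matrix (simulated values):
# each language shares a common profile plus language-specific noise.
n_pred <- 8
langs <- c("lang_a", "lang_b", "lang_c", "lang_d")
shared <- rnorm(n_pred)
coef_mat <- sapply(langs, function(l) shared + rnorm(n_pred, sd = 0.3))
rownames(coef_mat) <- paste0("predictor_", seq_len(n_pred))

# Mean correlation of each language's coefficients with every other language's
mean_cor_per_language <- function(m) {
  r <- cor(m)        # language-by-language correlation matrix
  diag(r) <- NA      # drop each language's correlation with itself
  colMeans(r, na.rm = TRUE)
}

observed <- mean_cor_per_language(coef_mat)

# Randomized baseline: shuffle predictor coefficients within each language
# (each column permuted independently), then recompute the overall mean.
baseline <- replicate(1000, {
  shuffled <- apply(coef_mat, 2, sample)
  mean(mean_cor_per_language(shuffled))
})

# Percentile 95% interval of the shuffled baseline
baseline_ci <- quantile(baseline, c(0.025, 0.975))

observed
mean(observed)
baseline_ci
```

Shuffling within a language keeps that language's distribution of coefficient values but breaks the link between a coefficient and its predictor, so the baseline reflects the agreement expected if predictors did not line up across languages.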
Dendrograms of the similarity structure among languages’ coefficients.
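One way such a similarity structure could be obtained is hierarchical clustering over the languages' coefficient vectors. The sketch below uses base R's `dist()` and `hclust()` on a small hypothetical matrix; the language labels, coefficient values, distance metric (Euclidean), and linkage method (complete) are all assumptions made for illustration.

```r
# Hypothetical languages-by-predictors coefficient matrix (values are made up)
coef_mat <- rbind(
  lang_a = c(0.30, 0.10, -0.05),
  lang_b = c(0.28, 0.12, -0.04),
  lang_c = c(0.05, 0.40,  0.20),
  lang_d = c(0.07, 0.38,  0.22)
)
colnames(coef_mat) <- c("frequency", "concreteness", "length")

# Pairwise Euclidean distances between languages' coefficient vectors
lang_dist <- dist(coef_mat)

# Agglomerative clustering; complete linkage is an assumed choice
lang_tree <- hclust(lang_dist, method = "complete")

# Cut the tree into two groups to inspect which languages cluster together
clusters <- cutree(lang_tree, k = 2)
clusters

# The tree can be drawn with plot(as.dendrogram(lang_tree)),
# or with ggdendro::ggdendrogram(lang_tree) for a ggplot2-based figure.
```

In this toy example, `lang_a` and `lang_b` fall into one cluster and `lang_c` and `lang_d` into the other, because their coefficient profiles are nearly identical within each pair.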
Lexical category: Estimates of effects in predicting words’ developmental trajectories for each language, measure, and lexical category (main effect of predictor + main effect of lexical category + interaction between predictor and lexical category). Each point represents a predictor’s effect in one language, with the bar showing the mean across languages.
Cross validation for American English
Cross validation for American English and Mandarin Taiwanese