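```r
# Hide code, messages, and warnings in the rendered report
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
# Silence dplyr's "summarise() has grouped output" notes
options(dplyr.summarise.inform = FALSE)
```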

# Results

```r
# Data wrangling and plotting (attaches ggplot2, dplyr, tidyr, purrr, readr, etc.)
library(tidyverse)
library(widyr)
library(ggthemes)
library(devtools)
# langcog is installed from GitHub; run once if it is missing:
# install_github("langcog/langcog")
library(ggstance)   # masks ggplot2::geom_errorbarh
library(langcog)    # masks ggthemes' solarized scales and base::scale
library(ggdendro)
library(glue)

# Source every helper script in scripts/ whose name ends in ".R"
walk(list.files("scripts", pattern = "\\.R$", full.names = TRUE), source)
```


Each predictor plotted against age of acquisition (AoA) for all languages, within each lexical category. The pattern for n_types is unexpected, possibly because this predictor relies on matching definitions to CHILDES words, and that matching may not be precise enough.

```
## # A tibble: 50 × 7
##    language           uni_lemma tokens    n_types freq_last freq_first freq_solo
##    <chr>              <chr>     <list>      <int>     <dbl>      <dbl>     <dbl>
##  1 English (American) a         <chr [5]>       5     3.20      0.792      2.91 
##  2 English (American) a         <chr [5]>       5     3.20      0.792      2.91 
##  3 English (American) a lot     <chr [8]>       8     0.563     0.0908     1.09 
##  4 English (American) about     <chr [2]>       2     1.28      1.50       3.23 
##  5 English (American) above     <chr [1]>       1     1.25     -0.895      0.148
##  6 English (American) after     <chr [2]>       2     1.52     -0.535      1.35 
##  7 English (American) airplane  <chr [13…      13     2.77      0.793      2.61 
##  8 English (American) all       <chr [7]>       7     1.52      0.111      3.00 
##  9 English (American) all gone  <chr [2]>       2    -0.257    -0.358     -2.13 
## 10 English (American) alligator <chr [5]>       5    -0.340    -0.166     -1.31 
## # … with 40 more rows
```
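In the table above, `n_types` matches the length of each `tokens` list (e.g., 13 forms for *airplane*), so it depends directly on which CHILDES forms get matched to a definition. The sketch below, in base R, illustrates how a looser matcher can inflate that count. The word forms and both matchers (`match_exact`, `match_prefix`) are hypothetical and are not the matching procedure used in the analysis.

```r
# Hypothetical CHILDES word forms and definitions (illustrative only, not real data)
childes_forms <- c("all", "allgone", "alligator", "alligators", "above")
definitions   <- c("all", "alligator", "above")

# Two hypothetical matchers: exact string match vs. looser prefix match
match_exact  <- function(def) childes_forms[childes_forms == def]
match_prefix <- function(def) childes_forms[startsWith(childes_forms, def)]

# n_types = number of distinct forms matched to each definition
n_types_exact  <- sapply(definitions, function(d) length(unique(match_exact(d))))
n_types_prefix <- sapply(definitions, function(d) length(unique(match_prefix(d))))

print(rbind(exact = n_types_exact, prefix = n_types_prefix))
```

With prefix matching, *all* picks up *allgone*, *alligator*, and *alligators*, so its count rises from 1 to 4 even though none of those forms are instances of *all*.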


To illustrate the structure of our analysis, we first describe the results for English: the main-effect coefficients for predicting words’ developmental trajectories in English comprehension and production data. Coefficients of larger magnitude indicate a greater effect of the predictor on acquisition. Positive main effects indicate that words with higher values of the predictor tend to be understood or produced by more children, while negative main effects indicate that words with lower values of the predictor tend to be understood or produced by more children. Line ranges indicate 95% confidence intervals; filled-in points indicate coefficients for which $p < 0.05$.
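To show how the sign of a main effect is read, here is a minimal base-R sketch on simulated data. It assumes a simple logistic regression of whether a child produces a word on age plus two hypothetical standardized predictors (`predictor_a`, `predictor_b`). This is not the model fitted in the analysis, whose full specification is not given in this section. The Wald intervals stand in for the "line ranges" and the $p < 0.05$ flag for the filled-in points.

```r
set.seed(1)
# Simulated data (hypothetical): 200 words, 50 children per word
n_words <- 200
words <- data.frame(predictor_a = rnorm(n_words),   # true positive effect
                    predictor_b = rnorm(n_words))   # true negative effect
kids <- words[rep(seq_len(n_words), each = 50), ]
kids$age <- runif(nrow(kids), 16, 30)               # age in months

# Log-odds that a child produces the word rise with age and predictor_a,
# and fall with predictor_b
eta <- -6 + 0.3 * kids$age + 0.8 * kids$predictor_a - 0.5 * kids$predictor_b
kids$produces <- rbinom(nrow(kids), 1, plogis(eta))

fit <- glm(produces ~ age + predictor_a + predictor_b,
           family = binomial, data = kids)
tab <- coef(summary(fit))[c("predictor_a", "predictor_b"), ]

# Wald 95% intervals ("line ranges") and significance ("filled-in points")
ci_low  <- tab[, "Estimate"] - 1.96 * tab[, "Std. Error"]
ci_high <- tab[, "Estimate"] + 1.96 * tab[, "Std. Error"]
filled  <- tab[, "Pr(>|z|)"] < 0.05
print(cbind(estimate = tab[, "Estimate"], ci_low, ci_high, filled))
```

The fitted coefficient for `predictor_a` comes out positive, so words with higher values are produced by more children at a given age. The coefficient for `predictor_b` comes out negative, so words with lower values are produced by more children.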

Estimates of coefficients in predicting words’ developmental trajectories for all languages and measures. Each point represents a predictor’s coefficient in one language, with the bar showing the mean across languages. Filled-in points indicate coefficients for which $p < 0.05$.
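The quantities plotted in this figure can be sketched in base R: a per-language flag for the filled-in points and a per-predictor mean for the bars. The coefficient table below is hypothetical, with made-up languages, predictors, and values that are not results from this analysis.

```r
# Hypothetical coefficient table: one row per language x predictor (made-up values)
coefs <- data.frame(
  language  = rep(c("lang_1", "lang_2", "lang_3"), each = 2),
  measure   = "production",
  predictor = rep(c("predictor_a", "predictor_b"), times = 3),
  estimate  = c(0.6, -0.3, 0.4, -0.5, 0.8, -0.1),
  p_value   = c(0.001, 0.04, 0.02, 0.2, 0.0001, 0.6)
)

# Filled-in points: coefficients with p < 0.05
coefs$filled <- coefs$p_value < 0.05

# Bars: mean coefficient across languages for each measure and predictor
bar_means <- aggregate(estimate ~ measure + predictor, data = coefs, FUN = mean)
print(bar_means)
```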

English predictor effects

Cross-linguistic predictor effects


Correlations of coefficient estimates between languages. Each point represents the mean correlation between one language’s coefficients and those of each other language, with the vertical line indicating the overall mean across languages. The shaded region and line show a bootstrapped 95% confidence interval for a randomized baseline in which predictor coefficients are shuffled within each language.
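The comparison in this figure can be sketched in base R on a hypothetical coefficient matrix, with rows as predictors and columns as languages. Each language gets the mean of its correlations with every other language. The baseline is approximated by repeatedly shuffling coefficients within each language and taking the 2.5% and 97.5% quantiles of the resulting overall mean. The exact resampling scheme used in the analysis may differ from this sketch.

```r
set.seed(42)
n_pred <- 10; n_lang <- 6
# Hypothetical coefficients: a shared predictor profile plus language-specific noise
shared   <- rnorm(n_pred)
coef_mat <- sapply(seq_len(n_lang), function(l) shared + rnorm(n_pred, sd = 0.3))
colnames(coef_mat) <- paste0("lang_", seq_len(n_lang))   # rows = predictors

# For each language: mean correlation of its coefficients with every other language's
mean_cross_cor <- function(m) {
  r <- cor(m)          # language x language correlation matrix
  diag(r) <- NA        # drop self-correlations
  rowMeans(r, na.rm = TRUE)
}
lang_means   <- mean_cross_cor(coef_mat)
overall_mean <- mean(lang_means)

# Randomized baseline: shuffle coefficients within each language (column)
baseline    <- replicate(1000, mean(mean_cross_cor(apply(coef_mat, 2, sample))))
baseline_ci <- quantile(baseline, c(0.025, 0.975))
print(round(c(overall = overall_mean, baseline_ci), 3))
```

Shuffling within a language keeps that language's set of coefficient values but breaks their link to specific predictors. The baseline therefore shows how much cross-language correlation would appear by chance if languages shared no predictor structure.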

Dendrograms of the similarity structure among languages’ coefficients.
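A dendrogram of this kind can be produced with base R's hierarchical clustering. In the sketch below, the coefficient matrix is hypothetical. The choices of 1 minus correlation as the distance and of average linkage are assumptions, not necessarily the settings used for the figure.

```r
set.seed(7)
# Hypothetical coefficients: rows = languages, columns = predictors
coef_by_lang <- matrix(rnorm(6 * 5), nrow = 6,
                       dimnames = list(paste0("lang_", 1:6),
                                       paste0("predictor_", 1:5)))

# Dissimilarity between languages: 1 - correlation of their coefficient vectors
d <- as.dist(1 - cor(t(coef_by_lang)))

# Agglomerative clustering with average linkage
tree <- hclust(d, method = "average")

# Base-R dendrogram; ggdendro::ggdendrogram(tree) gives a ggplot version
if (interactive()) plot(tree)
```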

Lexical category: estimates of effects in predicting words’ developmental trajectories for each language, measure, and lexical category (main effect of predictor + main effect of lexical category + interaction between predictor and lexical category). Each point represents a predictor’s effect in one language, with the bar showing the mean across languages.
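Written out, the quantity plotted for predictor $p$ in lexical category $c$ is the sum given in the caption. The symbols below are our own notation: $\beta_p$ is the main effect of the predictor, $\gamma_c$ the main effect of the lexical category, and $\delta_{p,c}$ their interaction.

$$
\text{effect}_{p,c} = \beta_p + \gamma_c + \delta_{p,c}
$$

If lexical categories are treatment-coded (an assumption), $\gamma_c$ and $\delta_{p,c}$ are zero for the reference category, so its plotted effect reduces to $\beta_p$.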

Cross-validation for American English

Cross-validation across languages: English and Mandarin

Cross-validation for American English and Mandarin Taiwanese

Cross-validation across all languages
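The captions above do not spell out the cross-validation scheme. As a sketch only, assume words are split into $k$ folds, a linear model of AoA on the predictors is fit to the other folds, and performance is scored by held-out squared error against a mean-only baseline. The data and predictor names below are hypothetical.

```r
set.seed(3)
# Hypothetical word-level data: AoA (months) and two standardized predictors
n <- 300
words <- data.frame(predictor_a = rnorm(n), predictor_b = rnorm(n))
words$aoa <- 24 - 2 * words$predictor_a + words$predictor_b + rnorm(n, sd = 2)

k    <- 10
fold <- sample(rep(seq_len(k), length.out = n))   # random fold label per word

# Mean held-out squared error of a linear model across the k folds
cv_mse <- function(formula) {
  mean(sapply(seq_len(k), function(i) {
    fit      <- lm(formula, data = words[fold != i, ])
    held_out <- words[fold == i, ]
    mean((held_out$aoa - predict(fit, newdata = held_out))^2)
  }))
}
mse_model <- cv_mse(aoa ~ predictor_a + predictor_b)

# Baseline: predict every held-out word's AoA with the training-fold mean
mse_mean <- mean(sapply(seq_len(k), function(i) {
  mean((words$aoa[fold == i] - mean(words$aoa[fold != i]))^2)
}))
print(c(model = mse_model, mean_only = mse_mean))
```

For the cross-language panels, the same loop would instead fit the model to one language's words and evaluate it on another language's words.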
