The “mod_2pl” files (one per combination of Spanish/English and production/comprehension) each contain three objects: coefs_2pl, a data frame of the item parameters (in mirt’s slope-intercept form); mod_2pl, the fitted mirt model object; and fscores_2pl, the estimated ability parameters of the Wordbank participants.
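For reference, mirt's slope-intercept form of the 2PL models the probability of a positive response as the logistic of a1 * theta + d, so the conventional difficulty is b = -d / a1. The following is a minimal base-R sketch with made-up parameter values; the helper names `p_2pl` and `difficulty_2pl` and the `toy_items` data frame are hypothetical and not part of mirt or of the files described above.

```r
# 2PL response probability in slope-intercept form: P = logistic(a1 * theta + d)
p_2pl <- function(theta, a1, d) {
  plogis(a1 * theta + d)
}

# Convert slope-intercept parameters to the traditional difficulty b = -d / a1
difficulty_2pl <- function(a1, d) {
  -d / a1
}

# Illustrative (made-up) parameters for two items
toy_items <- data.frame(definition = c("item_A", "item_B"),
                        a1 = c(2.0, 1.5),
                        d  = c(1.0, -3.0))
toy_items$b <- difficulty_2pl(toy_items$a1, toy_items$d)

# At theta equal to an item's difficulty, the response probability is 0.5
p_at_b <- p_2pl(toy_items$b, toy_items$a1, toy_items$d)
```

With the real data, the same conversion could presumably be applied to the a1 and d columns of coefs_2pl.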
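library(tidyverse) # %>%, mutate(), bind_rows(), pivot_wider()
library(modelr)    # add_residuals()
library(mirtCAT)   # mirtCAT(), used in the final section
it <- list() # item parameters
ab <- list() # child ability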
load("production/eng_ws_wg_mod_2pl_nobad.Rds")
it$en_prod <- coefs_2pl # 680 WS items
ab$en_prod <- fscores_2pl
load("production/preferredCAT_eng.Rds")
Productive CDI vocabulary data from 7633 English-speaking children (ages 12-36 months) were used to fit a 2PL model with 679 items (after pruning).
load("production/sp_ws_wg_mod_2pl_nobad.Rds")
it$sp_prod <- coefs_2pl
ab$sp_prod <- fscores_2pl
load("production/preferredCAT_sp.Rds")
Productive CDI vocabulary data from 1610 Spanish-speaking children (ages 12-30 months) were used to fit a 2PL model with 679 items (after pruning).
load("comprehension/eng_wg_mod_2pl.Rds")
it$en_comp <- coefs_2pl
ab$en_comp <- fscores_2pl
Receptive CDI:WG vocabulary data from 2394 English-speaking children were used to fit a 2PL model with 396 items (after pruning).
load("comprehension/sp_wg_mod_2pl.Rds")
it$sp_comp <- coefs_2pl
ab$sp_comp <- fscores_2pl
Receptive CDI:WG vocabulary data from 759 Spanish-speaking children were used to fit a 2PL model with 428 items (after pruning).
How correlated are parameters for comprehension and production?
en_comp_prod_match <- intersect(it$en_prod$definition, it$en_comp$definition) # 394
setdiff(it$en_comp$definition, it$en_prod$definition) # daddy*, in, and inside
## [1] "daddy*" "in" "inside"
sp_comp_prod_match <- intersect(it$sp_comp$definition, it$sp_prod$definition) # 389
setdiff(it$sp_comp$definition, it$sp_prod$definition)
## [1] "¡salud!" "acabar(se)" "alla/allí"
## [4] "bolsa" "brazos" "buenas día"
## [7] "cabra" "calcetines" "camión de bomberos"
## [10] "contento" "cuidado" "dedos"
## [13] "dónde está" "el (articles)" "escalera"
## [16] "hacer la meme" "llaves" "mamá/mami"
## [19] "manos (body_parts)" "manos (games_routines)" "miedo"
## [22] "mojar(se)" "no" "orejas"
## [25] "papá/papi" "pipí (coche)" "poco"
## [28] "qué" "secar(se)" "sed"
## [31] "shh" "subir" "sueño"
## [34] "también" "te" "temprano"
## [37] "trabajar" "uno dos tres" "vaso"
## [40] "ver"
# 40 items - can match some of these (e.g., escalera/s, brazo/s, oreja/s...)
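One way to recover some of these near-matches is to normalize the item labels before intersecting them. The sketch below is only a rough illustration: `normalize_definition` is a hypothetical helper, the production-side labels are made up, and the naive rules (dropping a trailing parenthetical and a single trailing "s") would also alter labels such as "acabar(se)" or "tres", so any matches should be checked by hand.

```r
# Hypothetical helper: crude normalization of item labels before matching
normalize_definition <- function(x) {
  x <- tolower(x)
  x <- sub("[[:space:]]*\\(.*\\)$", "", x)  # drop a trailing parenthetical, e.g. " (body_parts)"
  x <- sub("s$", "", x)                     # drop one trailing "s" (naive plural handling)
  x
}

# A few comprehension-only labels from the output above
comp_only <- c("brazos", "orejas", "escalera", "manos (body_parts)")
# Made-up production-form labels, for illustration only
prod_labels <- c("brazo", "oreja", "escaleras", "mano")

# Labels that agree after normalization
matched <- intersect(normalize_definition(comp_only),
                     normalize_definition(prod_labels))
```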
all <- it$en_prod %>%
  mutate(Language = "EN", Task = "Production") %>%
  bind_rows(it$en_comp %>% mutate(Language = "EN", Task = "Comprehension")) %>%
  bind_rows(it$sp_comp %>% mutate(Language = "SP", Task = "Comprehension")) %>%
  bind_rows(it$sp_prod %>% mutate(Language = "SP", Task = "Production"))
all_wide <- all %>% group_by(Language) %>% select(-g, -u) %>%
  pivot_wider(names_from = Task, values_from = c(a1, d))
m_d <- lm(d_Comprehension ~ d_Production*Language, data = all_wide)
m_a1 <- lm(a1_Comprehension ~ a1_Production*Language, data = all_wide)
all_wide <- all_wide %>% add_residuals(m_d) %>%
  rename(d_resid = resid) %>%
  add_residuals(m_a1) %>%
  rename(a1_resid = resid)
#corrr::correlate(all_wide %>% select(d_Comprehension, d_Production))
Strong correlation between comprehension and production item difficulties. Outliers with easier comprehension than production seem to be mostly verbs.
Children’s estimated comprehension and production abilities are strongly correlated. (Open question: is this correlation the same as, or stronger than, the correlation between comprehension and production sum scores?)
Fit a linear model to predict vocabulary size from IRT ability, age, and sex.
en_comp <- lm(comprehension ~ age + sex + ability,
data=demo$en_comp)
summary(en_comp)
##
## Call:
## lm(formula = comprehension ~ age + sex + ability, data = demo$en_comp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -89.948 -20.459 -7.788 17.232 157.691
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 127.7731 4.1488 30.798 <2e-16 ***
## age 0.3017 0.3008 1.003 0.316
## sexMale -0.8908 1.0995 -0.810 0.418
## ability 85.6924 0.6674 128.401 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26.72 on 2388 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.9204, Adjusted R-squared: 0.9203
## F-statistic: 9202 on 3 and 2388 DF, p-value: < 2.2e-16
en_prod <- lm(production ~ age + sex + ability,
data=demo$en_prod)
summary(en_prod)
##
## Call:
## lm(formula = production ~ age + sex + ability, data = demo$en_prod)
##
## Residuals:
## Min 1Q Median 3Q Max
## -180.759 -47.870 -1.077 45.629 297.779
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 109.1249 6.2784 17.381 <2e-16 ***
## age 4.9944 0.2908 17.176 <2e-16 ***
## sexMale -4.7793 1.9039 -2.510 0.0121 *
## sexOther 5.3828 43.0742 0.125 0.9006
## ability 171.3591 1.3063 131.177 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 60.79 on 4157 degrees of freedom
## (3835 observations deleted due to missingness)
## Multiple R-squared: 0.9244, Adjusted R-squared: 0.9243
## F-statistic: 1.271e+04 on 4 and 4157 DF, p-value: < 2.2e-16
# cor(demo$en_comp$comprehension, demo$en_comp$ability) # .96
# cor(demo$en_comp$age, demo$en_comp$ability) # .62
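A complementary view of the link between ability and vocabulary size: under the 2PL, the model-implied expected sum score at a given ability is the sum of the item response probabilities (the test characteristic curve). The sketch below uses made-up parameters for a four-item toy instrument; `expected_score` is a hypothetical helper, and with the real data one would presumably pass the a1 and d columns of coefs_2pl.

```r
# Expected number of items known at ability theta:
# the sum of the 2PL item probabilities, evaluated for each theta
expected_score <- function(theta, a1, d) {
  vapply(theta, function(t) sum(plogis(a1 * t + d)), numeric(1))
}

# Made-up slope-intercept parameters for a 4-item toy instrument
a1 <- c(1, 1, 2, 2)
d  <- c(0, 0, 0, 0)

# Expected scores at very low, average, and very high ability
scores <- expected_score(c(-10, 0, 10), a1, d)
```

Because this curve is bounded by 0 and the number of items, it is nonlinear in ability, which is one thing to keep in mind when reading the linear fits above.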
The item parameters above can be used to run a computerized adaptive test (CAT), for example with the mirtCAT package. Based on real-data simulations, we recommend a minimum of 25 items and a maximum of 50, with termination once the standard error of the ability estimate reaches SE = .15, and ML scoring. The call below selects the maximally informative (MI) start item; an age-based starting item (chosen from the mean theta for each age) can be supplied instead if the child's age is available.
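# cl is assumed to be a parallel cluster created elsewhere (e.g., with parallel::makeCluster());
# uncomment local_pattern to run off-line simulations on a response matrix dat.
mirtCAT(mo = mod_2pl, criteria = 'MI', start_item = 'MI',
        method = 'ML', cl = cl, #local_pattern = dat,
        design = list(min_items = 25,
                      max_items = 50,
                      min_SEM = 0.15))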
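To make the age-based start and the stopping rule concrete, here is a minimal base-R sketch. It is an illustration of the logic described above, not mirtCAT's internal implementation: the helpers `item_info_2pl`, `pick_start_item`, and `should_stop` are hypothetical, the item parameters and theta values are made up, and treating the SE threshold as "less than or equal to" is an assumption.

```r
# Fisher information of a 2PL item at ability theta: a1^2 * P * (1 - P)
item_info_2pl <- function(theta, a1, d) {
  p <- plogis(a1 * theta + d)
  a1^2 * p * (1 - p)
}

# Hypothetical helper: index of the most informative item at a given theta,
# e.g. the mean theta of children of the test-taker's age
pick_start_item <- function(theta, a1, d) {
  which.max(item_info_2pl(theta, a1, d))
}

# Hypothetical helper mirroring the recommended design: stop once at least
# min_items have been given and SE <= min_sem, or when max_items is reached
should_stop <- function(n_items, se, min_items = 25, max_items = 50,
                        min_sem = 0.15) {
  n_items >= max_items || (n_items >= min_items && se <= min_sem)
}

# Made-up parameters for three items with difficulties -1, 0, and 1
a1 <- c(1.5, 1.5, 1.5)
d  <- c(1.5, 0, -1.5)

# Start items for a hypothetical younger (theta = -1) and older (theta = 1) child
start_young <- pick_start_item(theta = -1, a1, d)
start_old   <- pick_start_item(theta =  1, a1, d)
```

An index chosen this way could then be passed as a numeric start_item in place of 'MI'.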