The goals are 1) to create a word list that is informative about both English and Spanish vocabulary size and 2) to ensure that there are sufficient doublets to estimate lexical overlap. On an IRT view, we can’t perfectly assess goal 2 (at least not without better bilingual CDI data), but we can assess goal 1: that is, we can look at whether the reduced word list is a good sub-test for the full CDI in each language. Our original concern was that the current DLL-ES test might not perform well for older or high-ability children due to the lack of abstract words, and we can test this formally.
The DLL lists are meant to be used together with the original English and Spanish MCDI short forms.
Now we need to match the DLL words to the wordbank items for which we have IRT parameters.
# use with EN WG short form (12-18 mos)
#dll1ENshort <- read_csv(here("DLL/DLL-ES1-short-English.csv")) # 79 items (81 after splitting defs)
dll1short <- dll1short_raw %>% # this is now 168 items...it has the WG short form included
mutate(english = tolower(english)) %>%
mutate(english = case_when(english == 'i' ~ 'I',
english == 'tv (television)' ~ 'TV',
english == 'water' ~ 'water (beverage)',
english == 'grandma (or word used in your family)' ~ 'grandma*',
english == 'mommy (or word used in your family)' ~ 'mommy*',
english == 'choo choo (train sound)' ~ 'choo choo',
english == 'patty cake' ~ 'pattycake',
english == 'bye or bye bye' ~ 'bye',
english == 'teddy bear' ~ 'teddybear',
english == 'his/her' ~ 'his', # hers ? (and should DLL say 'his/hers' ?)
english == 'to have' ~ 'have',
english == 'shh' ~ 'shh/shush/hush',
english == 'to sit' ~ 'sit',
english == 'to be' ~ 'be',
english == 'put, put on' ~ 'put',
english == 'to write' ~ 'write',
english == 'arms' ~ 'arm',
english == 'church (or word used in your family)' ~ 'church*',
english == 'want' ~ 'wanna/want to',
english == 'in' ~ 'inside/in', # into ?
TRUE ~ english))
dll1ENshort_num_matching = length(intersect(dll1short$english, coefs$en$definition))
# 75/81 (GK version)
# 160/168 (Sandy)
#setdiff(dll1short$english, coefs$en$definition)
# are these items also on the short form CDIs?
#length(intersect(dll1short$english, wg_short_en$word)) # all of the WG
#length(intersect(dll1short$english, ws_short_enA$word)) # 36 of the WS
# use with SP WG short form (12-18 mos)
#dll1SPshort <- read_csv(here("DLL/DLL-ES1-short-Spanish.csv")) # 67 items
dll1short <- dll1short %>% # this is now 168 items...it has the WG short form included
mutate(spanish = tolower(spanish)) %>%
separate(col = spanish, into = c("spanish", NA), sep=" \\(") %>%
mutate(spanish = case_when(spanish == 'mamá/mami' ~ 'mamá',
spanish == 'calcetines' ~ 'calcetín',
spanish == 'tomar baño / bañarse' ~ 'baño', # verb -> noun.. tomar(se) ?
spanish == 'espera' ~ 'esperar(se)', # close enough ?
spanish == 'acabar(se)' ~ 'acabar',
spanish == 'no hay más' ~ 'no hay', # or "más" ?
spanish == 'rapido' ~ 'rápido (descriptive)', # or rápido (quantifiers)
spanish == 'lastimado' ~ 'lastimar(se)', # close enough ?
spanish == 'otro' ~ 'otro/otra vez', # close enough ?
spanish == 'quiquiriqui' ~ 'quiquiriquí', # DLL missing accent
spanish == 'brazos' ~ 'brazo',
spanish == 'manos' ~ 'mano',
spanish == 'vaso' ~ 'vasos',
spanish == 'llaves' ~ 'llave',
spanish == 'adiós/byebye' ~ 'adíos/byebye', # wordbank accent is incorrect
spanish == 'uno, dos, tres' ~ 'uno dos tres...',
spanish == 'shh' ~ 'shhh',
spanish == 'ver' ~ 'ver(se)',
spanish == '¿dónde está?' ~ 'dónde', # close enough ?
TRUE ~ spanish))
dll1SPshort_num_matching = length(intersect(dll1short$spanish, coefs$sp$definition)) # was 59 on GK version
# 158 / 168
#setdiff(dll1short$spanish, coefs$sp$definition)
# no match: tostada, alimentar, sonreír, algunos, también
# are these items also on the short form CDIs?
#length(intersect(dll1short$spanish, wg_short_sp$word)) # all of the WG
#length(intersect(dll1short$spanish, ws_short_sp$word)) # 63 of the WS
# DLL short Level 2 English / Spanish
dll2short <- dll2short_raw %>% mutate(english = tolower(english)) %>% filter(!is.na(english)) %>%
mutate(english = case_when(english == 'church (or word used in your family)' ~ 'church*',
english == 'tv' ~ 'TV',
english == 'to be' ~ 'be',
english == 'to be, there is' ~ 'there', # DLL typo?
english == 'mommy' ~ 'mommy*',
english == 'in the morning' ~ 'morning',
english == 'swing' ~ 'swing (object)', # or swing (action) ?
english == 'dress' ~ 'dress (object)',
TRUE ~ english))
# not on CDI: snake, drum, rice, skirt, mustache, pot, matches, newspaper, bell, godmother,
# let's go (but "go"), know, turn on (but "on"), win
#length(intersect(dll2short$english, coefs$en$definition)) # 162 / 177
#setdiff(dll2short$english, coefs$en$definition)
dll2short <- dll2short %>%
mutate(spanish = tolower(spanish)) %>%
separate(col = spanish, into = c("spanish", NA), sep=" \\(") %>%
mutate(spanish = case_when(spanish == 'zapatos' ~ 'zapato',
spanish == 'adiós/bye bye' ~ 'adíos/byebye', # wordbank typo
spanish == 'banco' ~ 'banco (outside)', # or banco (places)
spanish == 'coca' ~ 'soda/refresco', # close enough?
spanish == 'acabar/terminar' ~ 'acabar', # or terminar
spanish == 'quiquiriqui' ~ 'quiquiriquí', # DLL missing accent
#spanish == 'debajo' ~ 'abajo', # down vs. below...close enough?
spanish == 'en la noche/esta noche' ~ 'en la noche',
spanish == 'haber' ~ 'haber (hay)',
spanish == 'ir de compras' ~ 'comprar', # close enough? or 'ir(se)'
spanish == 'llevar' ~ 'llevar(se)',
#spanish == 'no más' ~ 'más', # close enough? or 'no'..or 'no hay'
spanish == 'puede' ~ 'poder', # infinitive--close enough?
spanish == 'que' ~ 'que (connection word)',
spanish == 'rapido' ~ 'rápido (descriptive)', # or rápido (quantifiers)
TRUE ~ spanish))
length(intersect(dll2short$spanish, coefs$sp$definition)) # 153 / 177
## [1] 153
sort(setdiff(dll2short$spanish, coefs$sp$definition) )
## [1] "abrazar" "agitar" "al lado de"
## [4] "bandeja" "debajo" "desear"
## [7] "el puré de manzana" "fingir" "juegos"
## [10] "la galleta salada" "mentón" "minúsculo"
## [13] "necesitar" "no más" "perseguir"
## [16] "probar" "rápido" "rasgar"
## [19] "tierno" "trapeador" "x"
#length(intersect(ws_short_enA$word, dll2short$english)) # 100
#length(intersect(ws_short_enB$word, dll2short$english)) # ..
dll1long <- dll1long_raw %>% mutate(english = tolower(english)) %>%
mutate(english = case_when(english == 'daddy (or word used in your family)' ~ 'daddy*',
english == 'toy' ~ 'toy (object)',
english == 'swing' ~ 'swing (object)', # or swing (action) ?
english == 'dress' ~ 'dress (object)',
TRUE ~ english))
dll1long_EN_num_matching = length(intersect(dll1long$english, coefs$en$definition)) # 74 / 74 match
dll1long <- dll1long %>% mutate(spanish = tolower(spanish)) %>%
separate(col = spanish, into = c("spanish", NA), sep=" \\(") %>%
mutate(spanish = case_when(spanish == 'pipi' ~ 'pipí', # is this a match?
spanish == 'orejas' ~ 'oreja',
spanish == 'dedos' ~ 'dedo',
spanish == 'escalera' ~ 'escaleras',
spanish == 'bolsa' ~ 'bolsa (clothing)', # or bolsa (household) ?
spanish == 'papá/papi' ~ 'papá',
spanish == 'cosquillita' ~ 'cosquillitas',
spanish == 'hacer la meme' ~ 'siesta', # close enough ? or hacer?
TRUE ~ spanish))
## Warning: Expected 2 pieces. Additional pieces discarded in 2 rows [3, 4].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [14].
dll1long_SP_num_matching = length(intersect(dll1long$spanish, coefs$sp$definition)) # 74 / 74 match
dll2long <- dll2long_raw %>% mutate(english = tolower(english)) %>%
mutate(english = case_when(english == 'daddy (or name/word used in your family)' ~ 'daddy*',
english == 'teddy bear' ~ 'teddybear',
english == 'patty cake' ~ 'pattycake',
english == 'dress' ~ 'dress (object)',
english == 'i' ~ 'I',
english == 'penis (or word used in your family)' ~ 'penis*',
english == 'water' ~ 'water (beverage)',
english == 'orange' ~ 'orange (food)',
english == 'clock/watch' ~ 'clock', # or watch (object)
english == 'drink' ~ 'drink (action)',
english == 'feet' ~ 'foot',
english == 'picture (\"or photo\")' ~ 'picture',
english == 'buttocks/bottom (or word used in your family)' ~ 'buttocks/bottom*',
TRUE ~ english))
dll2long <- dll2long %>% mutate(spanish = tolower(spanish)) %>%
separate(col = spanish, into = c("spanish", NA), sep=" \\(") %>%
mutate(spanish = case_when(spanish == 'pipi' ~ 'pipí',
spanish == 'shh' ~ 'shhh',
spanish == 'bolsa' ~ 'bolsa (clothing)', # or bolsa (household) ?
spanish == 'qué' ~ 'qué (question_words)',
TRUE ~ spanish))
## Warning: Expected 2 pieces. Additional pieces discarded in 5 rows [2, 20, 25,
## 50, 61].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 3 rows [80, 87,
## 121].
#length(intersect(dll2long$spanish, coefs$sp$definition))
#setdiff(dll2long$spanish, coefs$sp$definition)
English DLL items not in our wordbank IRT model: one, two, three, family, drum, good morning, also, and many. Spanish DLL items not in our wordbank IRT model: tostada, algunos, alimentar, sonreír, no hay más (although we have no and no hay, as well as más), and lastimado (but we have lastimar(se)).
DLL2-Spanish has “esta noche” translated as “at night”, but it should be “tonight”. The DLL also has “puede”, a conjugated form of the verb poder (whose infinitive is in Wordbank). Wordbank and the DLL also often disagree about number: e.g., escalera and pierna on the DLL correspond to escaleras and piernas in Wordbank. (See also brazos, manos, vasos, and llaves.)
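The matching checks above all follow the same `intersect()` / `setdiff()` pattern, so they could be wrapped in a small helper. The sketch below is only an illustration: `report_matches()` is a hypothetical name that does not appear in the analysis, and the two word vectors are toy stand-ins, not the real DLL or Wordbank item lists.

```r
# Hypothetical helper (not part of the original analysis): summarize how many
# items on a form have a counterpart in the item bank with IRT parameters.
report_matches <- function(form_items, bank_items) {
  list(
    n_matched = length(intersect(form_items, bank_items)),
    n_total   = length(unique(form_items)),
    unmatched = sort(setdiff(form_items, bank_items))
  )
}

# Toy vectors for illustration only.
form_items <- c("dog", "TV", "water (beverage)", "drum", "family")
bank_items <- c("dog", "TV", "water (beverage)", "cat", "ball")

res <- report_matches(form_items, bank_items)
res$n_matched   # number of form items found in the bank
res$unmatched   # form items still needing a manual recode
```

In the analysis itself, `form_items` would be a column such as `dll1short$english` and `bank_items` would be `coefs$en$definition`.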
Using data from 3717 English-speaking children 12-18 months of age from Wordbank, we test how well sumscores from the DLL-ES1 short English form + CDI:WG short form predict children’s production scores from the full CDI (WG/WS). The left panel shows full CDI scores vs. the DLL-ES1 short + CDI:WG short score, and the right panel shows the full CDI scores vs. just the CDI:WG short form score.
Overall, the correlation of children’s CDI:WG short + DLL scores and their full CDI production scores is quite high (\(r=0.98\)), but as shown above, for small vocabulary sizes the DLL score overestimates full CDI:WG production scores, while for higher full CDI:WG scores the DLL underestimates vocab size (dotted line has slope \(=160 / 395\)). However, the CDI:WG short form alone (right panel) shows a similar (and more extreme) overestimation for small vocabulary sizes.
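As a rough illustration of this comparison, the sketch below simulates item responses, computes sumscores on a full form and on a short form made of the easiest items, and compares them against a reference line whose slope is the ratio of form lengths (our reading of the dotted line, e.g. 160 / 395). Everything here is simulated under assumed 2PL parameters; none of it is Wordbank data, and the variable names are hypothetical.

```r
set.seed(1)
n_kids  <- 500
n_items <- 100

theta <- rnorm(n_kids)                       # simulated child abilities
d     <- rnorm(n_items, mean = -1, sd = 2)   # simulated item easiness
a1    <- rep(2, n_items)                     # simulated item discrimination

# 2PL response probabilities in slope-intercept form: plogis(a1 * theta + d)
p    <- plogis(outer(theta, a1) + matrix(d, n_kids, n_items, byrow = TRUE))
resp <- matrix(rbinom(n_kids * n_items, 1, p), n_kids, n_items)

# A "short form" deliberately built from the 25 easiest items
short_idx   <- order(d, decreasing = TRUE)[1:25]
full_score  <- rowSums(resp)
short_score <- rowSums(resp[, short_idx])

r     <- cor(short_score, full_score)
slope <- length(short_idx) / n_items         # reference line: short = slope * full

# Extrapolated full score minus actual full score; positive = overestimate
over <- short_score / slope - full_score
r
mean(over)
```

Because the simulated short form is skewed towards easy items, the extrapolated score tends to exceed the actual full score, which is the same pattern described for the DLL and CDI short forms.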
Now we do the same for comprehension (receptive vocabulary) using Wordbank’s CDI:WG English data.
The correlation of children’s CDI:WG short + DLL scores and their full CDI comprehension scores is quite high (\(r=0.98\)), and extrapolating from DLL to full CDI:WG scores shows very little overestimation (less than when using the CDI:WG short form alone).
Using data from 6411 English-speaking children 16-30 months of age from Wordbank, we test how well sumscores from the DLL-ES2 short English form + CDI:WS short form (A) predict children’s full production scores from the CDI:WS. The left panel shows full CDI scores vs. the DLL-ES2 short + CDI:WS short form (A) score, and the right panel shows the full CDI scores vs. just the CDI:WS short form (A) score.
Overall, the correlation of children’s CDI:WS short + DLL2 scores and their full CDI production scores is quite high (\(r=0.98\)), but as shown above, the DLL2 again mostly overestimates production scores on the full CDI (dotted line has slope \(=162 / 680\)). In comparison, the CDI:WS short form (A) score only overestimates full CDI scores for smaller vocabulary sizes (<400).
Now we look at overestimation for Spanish DLLs + CDI short forms.
Using Wordbank data from 731 Spanish-speaking children aged 12-18 months, we test how well sumscores from the DLL-ES1 short Spanish form correlate with children’s full CDI:WG production scores.
As for English, the correlation of Spanish-speaking children’s DLL scores and their full CDI:WG production scores is quite high (\(r=0.99\)), but as shown above, their DLL score overestimates the production score on the full CDI at smaller vocabulary sizes (dotted line has slope \(=161 / 428\)). Do note that few children in this dataset have large productive vocabularies.
Now we do the same for comprehension (receptive vocabulary) using Wordbank’s CDI:WG Spanish data.
The correlation of children’s CDI:WG short + DLL scores and their full CDI comprehension scores is quite high (\(r=0.98\)), and extrapolating from DLL to full CDI:WG scores shows very little overestimation, similar to the level shown by using the CDI:WG alone.
Overall, it seems that many of the items on the DLL are somewhat easier than average, and thus these forms tend to overestimate children’s full CDI scores (indeed, for items on the DLL1 English short form, the average easiness is -0.95, while the mean easiness of items not on the DLL is -2.57). This is also true of the CDI:WG short English form: the average easiness is -0.64 and the average ease of items not on the WG short form is -2.42. The CDI:WS short English form (A) is less biased towards easy items: average easiness is -1.73 vs. -2.27 for items not on the short WS. The histograms below show the distribution of easiness parameters for English (left) and Spanish (right) CDI words. Solid lines show the average ease of DLL items (DLL 1 = red, DLL 2 = orange), and dashed lines show the average of non-DLL items.
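The on-form vs. off-form comparison amounts to a grouped mean of the easiness parameter. A minimal sketch, assuming a data frame with one row per CDI item and an easiness column `d` (the column names mirror the tables in this report, but the values below are made up for illustration):

```r
# Made-up easiness values for illustration; not estimates from Wordbank.
items <- data.frame(
  definition = c("w1", "w2", "w3", "w4", "w5", "w6"),
  d          = c(1.0, 0.5, 1.5, -3.0, -2.0, -4.0),
  stringsAsFactors = FALSE
)
on_form <- c("w1", "w2", "w3")   # hypothetical short-form items

# Mean easiness for items on the form vs. the rest of the full CDI
group <- ifelse(items$definition %in% on_form, "on_form", "off_form")
ease  <- tapply(items$d, group, mean)
ease
```

A large positive gap between `ease[["on_form"]]` and `ease[["off_form"]]` indicates a form skewed towards easy items.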
Spanish DLL1 items have an average ease of -0.81, while other items on the full CDI have a mean ease of -2.27.
The DLL2 forms are less biased than the DLL1 forms. Spanish DLL2 items have an average ease of -1.29, while other CDI items have a mean of -2.35. English DLL2 items have an average ease of -1.67, while other items on the full CDI have a mean ease of -2.35.
We recommend bringing the overall mean estimated IRT difficulty of the words selected for the DLLs closer to the mean difficulty of the words on the rest of the CDI.
To start, we examine IRT easiness parameters for the doublets on the existing DLL lists, looking for items with large mismatch between their English and Spanish ease.
We want to assess whether doublet items have similar difficulty (operationalized by their IRT parameters) in English and in Spanish. For example, if “perro” were for some reason much more difficult than “dog”, you would not want to include the pair, because it would not be a good item for estimating vocabulary overlap.
Below are shown the parameters for items from the DLL Level 1 short form (en_d = English easiness, sp_d = Spanish easiness), ordered from most to least discrepant (difficulty difference squared). The _a1 columns show item discriminations (slopes), and sp_en_d_diff shows the difference between the Spanish and English easiness parameters.
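To give a feel for what a gap in easiness means, the sketch below converts parameters into response probabilities. It assumes the parameters are in the slope-intercept 2PL form, \(P = \mathrm{logistic}(a_1 \theta + d)\), which the column names `a1` and `d` suggest but which this section does not state outright. `p_produce()` is a hypothetical helper, and the parameter values are those reported for book / libro.

```r
# Assumed slope-intercept 2PL: P(produces word) = plogis(a1 * theta + d)
p_produce <- function(theta, a1, d) plogis(a1 * theta + d)

# book / libro parameters as reported for the DLL short forms
p_en <- p_produce(theta = 0, a1 = 2.88, d = 2.55)   # English "book"
p_sp <- p_produce(theta = 0, a1 = 3.80, d = -1.55)  # Spanish "libro"

round(c(english = p_en, spanish = p_sp), 2)
```

Under this assumption, a child of average ability (theta = 0) would be expected to say “book” with probability of about 0.93 but “libro” with probability of about 0.18, so the pair would behave very differently in the two languages.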
For the short form, we merely want to identify items that have very different difficulty in English and Spanish, and recommend that researchers interested in estimating conceptual overlap in bilinguals not include these items in their calculations due to the bias. Below is shown the distribution of the squared difference in difficulty for doublets from the DLL Level 1 short form.
The mean squared difference in difficulty of doublets on the DLL Level 1 short form is 2.16 (SD=3.33), and as shown above this distribution is highly skewed: most doublets are fairly well-matched (median d_diff_sq=0.99), but a few items are extremely mismatched. We recommend that the 13 items (shown below) with a squared difficulty difference of 7.17 (mean + 1.5 * SD) or more be excluded from calculations of conceptual overlap.
english | spanish | sp_a1 | sp_d | en_a1 | en_d | sp_en_d_diff | d_diff_sq |
---|---|---|---|---|---|---|---|
street | calle | 3.24 | 0.78 | 4.39 | -3.38 | 4.16 | 17.35 |
book | libro | 3.80 | -1.55 | 2.88 | 2.55 | -4.10 | 16.83 |
on | encima | 2.96 | -4.38 | 2.93 | -0.45 | -3.93 | 15.46 |
none | no hay | 2.71 | -0.29 | 2.37 | -3.85 | 3.55 | 12.63 |
hat | sombrero | 3.28 | -2.14 | 3.15 | 1.37 | -3.51 | 12.33 |
don’t | no | 1.50 | 1.81 | 1.95 | -1.61 | 3.42 | 11.70 |
water (beverage) | agua | 2.58 | 4.38 | 2.80 | 0.99 | 3.38 | 11.46 |
other | otro/otra vez | 2.34 | -0.84 | 3.09 | -4.05 | 3.22 | 10.34 |
dish | plato | 4.57 | 0.17 | 2.99 | -2.90 | 3.06 | 9.39 |
babysitter | nana | 0.97 | -1.36 | 2.41 | -4.31 | 2.94 | 8.67 |
bread | pan | 2.59 | 2.50 | 3.33 | -0.35 | 2.85 | 8.10 |
finish | acabar | 2.51 | -0.93 | 3.13 | -3.68 | 2.75 | 7.55 |
today | hoy | 2.04 | -2.29 | 3.79 | -5.01 | 2.72 | 7.40 |
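The exclusion rule (squared easiness difference at or above mean + 1.5 * SD) can be sketched as follows. The data frame below is a toy example with made-up easiness values, and the column names simply mirror those in the table above; it is not the code that produced the reported cutoff.

```r
# Made-up doublet parameters for illustration only.
d_diff <- c(4.2, -4.1, 1.0, -0.9, 0.8, 0.5, -0.4, 0.3, -0.2, 0.1, 0.0, -1.1)
en_d   <- c(-3.0, 2.0, -1.0, 0.5, -2.0, 1.0, 0.0, -0.5, 1.5, -1.5, 0.2, -2.5)
doublets <- data.frame(
  pair = paste0("pair", seq_along(d_diff)),
  en_d = en_d,
  sp_d = en_d + d_diff,
  stringsAsFactors = FALSE
)

# Spanish minus English easiness, and its square
doublets$sp_en_d_diff <- doublets$sp_d - doublets$en_d
doublets$d_diff_sq    <- doublets$sp_en_d_diff^2

# Flag doublets at or above mean + 1.5 SD of the squared difference
cutoff  <- mean(doublets$d_diff_sq) + 1.5 * sd(doublets$d_diff_sq)
flagged <- doublets[doublets$d_diff_sq >= cutoff, ]
flagged[order(-flagged$d_diff_sq), c("pair", "sp_en_d_diff", "d_diff_sq")]
```

Because the distribution of squared differences is highly skewed, a rule of this kind flags only the few extreme pairs and leaves the bulk of the doublets in place.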
The mean squared difference in difficulty of doublets on the DLL Level 2 short form is 2.63 (SD=4.19), and once again while most doublets are fairly well-matched, several items are extremely mismatched. We recommend that the 10 items (shown below) with a squared difficulty difference of 8.22 (mean + 1.5 * SD) or more be excluded from calculations of conceptual overlap.
english | spanish | sp_a1 | sp_d | en_a1 | en_d | sp_en_d_diff | d_diff_sq |
---|---|---|---|---|---|---|---|
if | que (connection word) | 1.50 | -0.82 | 3.12 | -6.53 | 5.71 | 32.63 |
street | calle | 3.24 | 0.78 | 4.39 | -3.38 | 4.16 | 17.35 |
book | libro | 3.80 | -1.55 | 2.88 | 2.55 | -4.10 | 16.83 |
to | a | 1.00 | -0.15 | 3.43 | -3.80 | 3.65 | 13.32 |
none | no hay | 2.71 | -0.29 | 2.37 | -3.85 | 3.55 | 12.63 |
hat | sombrero | 3.28 | -2.14 | 3.15 | 1.37 | -3.51 | 12.33 |
hear | oír | 1.83 | -0.94 | 4.49 | -4.35 | 3.41 | 11.61 |
bench | banco (outside) | 2.94 | -2.18 | 3.59 | -5.43 | 3.25 | 10.57 |
much | mucho | 3.33 | -1.80 | 3.10 | -5.04 | 3.24 | 10.48 |
potato | papas | 2.54 | 1.11 | 2.90 | -2.07 | 3.18 | 10.12 |
Now we examine the doublets on the long supplemental DLL forms, and make recommendations of mismatched items to swap out.
Below are shown the parameters for items from the DLL Level 1 long supplement. Clearly some of these items have quite different difficulty levels in English and Spanish, and we should try to find items that are more equivalent and swap them onto the long supplement.
english | spanish | sp_a1 | sp_d | en_a1 | en_d | sp_en_d_diff | d_diff_sq |
---|---|---|---|---|---|---|---|
give | dar | 2.25 | -0.59 | 3.88 | -3.13 | 2.54 | 6.45 |
ear | oreja | 4.68 | -0.46 | 3.68 | 1.69 | -2.16 | 4.65 |
cheese | queso | 3.93 | -0.63 | 2.89 | 1.45 | -2.08 | 4.33 |
crib | cuna | 2.92 | -0.51 | 3.09 | -2.42 | 1.90 | 3.62 |
banana | plátano/banana | 2.95 | 0.01 | 2.26 | 1.89 | -1.88 | 3.54 |
shirt | playera | 3.47 | -2.44 | 4.14 | -0.58 | -1.85 | 3.43 |
chicken (food) | pollo | 3.94 | 0.72 | 3.46 | -1.01 | 1.73 | 2.99 |
moo | muu | 1.22 | 0.01 | 1.87 | 1.70 | -1.69 | 2.85 |
nap | siesta | 1.98 | -2.71 | 3.79 | -1.05 | -1.67 | 2.78 |
apple | manzana | 3.76 | -0.07 | 3.22 | 1.55 | -1.62 | 2.62 |
stop | parar(se) | 3.76 | -2.43 | 2.80 | -0.81 | -1.62 | 2.62 |
drawer | cajón | 4.17 | -2.32 | 3.67 | -3.85 | 1.53 | 2.35 |
face | cara | 4.25 | -0.36 | 3.92 | -1.83 | 1.48 | 2.18 |
swing (object) | columpio | 3.80 | -1.96 | 3.21 | -0.49 | -1.47 | 2.16 |
belly button | ombligo | 3.33 | -1.23 | 2.68 | 0.22 | -1.45 | 2.11 |
quack quack | cuacuá | 1.26 | -0.28 | 1.80 | 1.10 | -1.38 | 1.89 |
picture | fotos | 4.46 | -0.97 | 4.50 | -2.34 | 1.37 | 1.87 |
go | ir(se) | 2.36 | -0.71 | 2.43 | 0.60 | -1.32 | 1.73 |
bunny | conejo | 3.41 | -1.06 | 2.75 | 0.23 | -1.29 | 1.67 |
trash | basura | 4.54 | -0.03 | 2.20 | -1.28 | 1.25 | 1.56 |
pen | lápiz | 3.71 | -0.10 | 2.79 | -1.34 | 1.24 | 1.54 |
bite | morder | 3.72 | -1.77 | 2.82 | -0.54 | -1.23 | 1.51 |
comb | peine | 3.94 | -0.68 | 2.92 | -1.90 | 1.23 | 1.50 |
baa baa | bee/mee | 0.83 | -0.30 | 1.17 | 0.90 | -1.20 | 1.45 |
paper | papel | 4.18 | -0.02 | 4.19 | -1.22 | 1.19 | 1.43 |
window | ventana | 4.28 | -1.11 | 3.45 | -2.29 | 1.18 | 1.39 |
store | tienda/mercado | 3.66 | -0.95 | 4.33 | -2.06 | 1.10 | 1.22 |
dance | bailar | 3.18 | -0.40 | 3.38 | -1.39 | 0.99 | 0.98 |
tree | árbol | 3.78 | -0.53 | 3.18 | 0.45 | -0.98 | 0.96 |
hit | pegar(se) | 3.48 | -1.39 | 3.89 | -2.35 | 0.97 | 0.93 |
finger | dedo | 4.43 | 0.22 | 4.32 | -0.73 | 0.96 | 0.91 |
purse | bolsa (clothing) | 3.93 | -0.83 | 2.72 | -1.75 | 0.93 | 0.86 |
daddy* | papá | 1.48 | 2.96 | 1.70 | 3.87 | -0.91 | 0.84 |
bear | oso | 2.96 | -0.09 | 2.48 | 0.81 | -0.91 | 0.83 |
touch | tocar | 3.18 | -2.17 | 3.75 | -3.03 | 0.86 | 0.73 |
door | puerta | 4.68 | -0.28 | 3.37 | 0.52 | -0.80 | 0.64 |
clock | reloj | 3.80 | -1.51 | 2.40 | -0.73 | -0.78 | 0.61 |
tongue | lengua | 3.87 | -0.50 | 3.16 | -1.27 | 0.78 | 0.60 |
open | abrir | 3.41 | 0.22 | 3.57 | -0.55 | 0.78 | 0.60 |
balloon | globo/bomba | 2.35 | 1.12 | 2.56 | 1.78 | -0.65 | 0.43 |
medicine | medicina | 4.31 | -1.38 | 3.55 | -1.98 | 0.60 | 0.35 |
cry | llorar | 3.81 | -1.08 | 4.08 | -1.67 | 0.59 | 0.35 |
bathtub | tina | 3.07 | -1.03 | 2.79 | -0.45 | -0.58 | 0.33 |
vroom | pipí | 1.76 | 1.19 | 0.93 | 0.61 | 0.58 | 0.33 |
box | caja | 4.25 | -1.28 | 3.30 | -0.71 | -0.57 | 0.33 |
pajamas | pijama | 3.51 | -2.06 | 3.82 | -1.50 | -0.57 | 0.32 |
foot | pies | 3.78 | 0.65 | 3.86 | 0.10 | 0.55 | 0.30 |
bicycle | bicicleta | 2.93 | -0.90 | 3.02 | -0.36 | -0.54 | 0.29 |
towel | toalla | 4.89 | -2.02 | 4.12 | -1.51 | -0.52 | 0.27 |
refrigerator | refrigerador | 3.96 | -2.52 | 3.75 | -3.02 | 0.50 | 0.25 |
dress (object) | vestido | 3.29 | -1.69 | 2.96 | -2.17 | 0.48 | 0.23 |
button | botón | 2.97 | -0.96 | 2.69 | -0.49 | -0.47 | 0.22 |
cake | pastel | 3.64 | -0.35 | 3.53 | -0.72 | 0.37 | 0.13 |
throw | tirar | 3.82 | -2.11 | 4.16 | -2.46 | 0.34 | 0.12 |
horse | caballo | 2.99 | 0.65 | 3.34 | 0.32 | 0.34 | 0.11 |
stairs | escaleras | 4.20 | -1.84 | 3.28 | -2.12 | 0.28 | 0.08 |
wash | lavar(se) | 4.22 | -1.68 | 4.15 | -1.96 | 0.28 | 0.08 |
man | señor | 3.96 | -1.76 | 2.79 | -2.04 | 0.27 | 0.07 |
glasses | lentes | 3.93 | -1.45 | 3.20 | -1.20 | -0.25 | 0.06 |
bib | babero | 3.01 | -1.16 | 2.19 | -0.92 | -0.24 | 0.06 |
close | cerrar | 4.06 | -1.68 | 3.38 | -1.92 | 0.24 | 0.06 |
airplane | avión | 2.81 | 0.24 | 2.97 | 0.48 | -0.24 | 0.06 |
say | decir | 3.84 | -3.28 | 3.46 | -3.50 | 0.22 | 0.05 |
telephone | teléfono | 3.18 | -0.25 | 3.47 | -0.04 | -0.21 | 0.04 |
brush | cepillo | 4.80 | -0.24 | 3.13 | -0.44 | 0.20 | 0.04 |
thank you | gracias | 2.27 | 1.35 | 2.04 | 1.17 | 0.18 | 0.03 |
pillow | almohada | 3.89 | -1.14 | 4.09 | -0.96 | -0.18 | 0.03 |
light | luz | 3.38 | 0.50 | 2.72 | 0.38 | 0.12 | 0.01 |
diaper | pañal | 3.79 | 1.00 | 3.00 | 0.89 | 0.10 | 0.01 |
hair | pelo | 5.15 | 0.85 | 4.09 | 0.95 | -0.10 | 0.01 |
toy (object) | juguete | 3.58 | -0.39 | 3.57 | -0.33 | -0.07 | 0.00 |
tickle | cosquillitas | 2.06 | -0.92 | 2.47 | -0.87 | -0.05 | 0.00 |
walk | caminar | 3.26 | -1.10 | 3.59 | -1.07 | -0.03 | 0.00 |
blow | soplar | 2.67 | -1.58 | 3.18 | -1.59 | 0.01 | 0.00 |

Below are shown the parameters for items from the DLL Level 2 long supplement.
english | spanish | sp_a1 | sp_d | en_a1 | en_d | sp_en_d_diff | d_diff_sq |
---|---|---|---|---|---|---|---|
bubbles | burbujas | 3.02 | -2.63 | 2.61 | 1.30 | -3.94 | 15.50 |
water (beverage) | agua | 2.58 | 4.38 | 2.80 | 0.99 | 3.38 | 11.46 |
soup | sopa | 2.41 | 1.36 | 2.98 | -1.98 | 3.34 | 11.15 |
bread | pan | 2.59 | 2.50 | 3.33 | -0.35 | 2.85 | 8.10 |
table | mesa | 4.61 | 0.34 | 5.01 | -2.22 | 2.56 | 6.55 |
coat | abrigo | 2.63 | -3.09 | 2.70 | -0.80 | -2.29 | 5.22 |
pencil | lápiz | 3.71 | -0.10 | 3.23 | -2.31 | 2.21 | 4.90 |
bee | abeja | 2.62 | -2.19 | 2.53 | -0.08 | -2.10 | 4.43 |
chocolate | chocolate | 3.35 | -0.04 | 3.11 | -2.12 | 2.09 | 4.35 |
cheese | queso | 3.93 | -0.63 | 2.89 | 1.45 | -2.08 | 4.33 |
crib | cuna | 2.92 | -0.51 | 3.09 | -2.42 | 1.90 | 3.62 |
motorcycle | moto | 2.75 | -0.24 | 2.85 | -2.12 | 1.89 | 3.56 |
girl | niña | 2.35 | 0.45 | 2.79 | -1.40 | 1.85 | 3.42 |
soda/pop | soda/refresco | 2.13 | 0.31 | 1.59 | -1.51 | 1.82 | 3.31 |
chicken (food) | pollo | 3.94 | 0.72 | 3.46 | -1.01 | 1.73 | 2.99 |
nose | nariz | 4.20 | 0.45 | 3.42 | 2.18 | -1.73 | 2.99 |
drink (action) | tomar(se) | 2.87 | -1.83 | 3.05 | -0.15 | -1.68 | 2.82 |
cup | taza | 3.53 | -0.84 | 3.39 | 0.83 | -1.67 | 2.79 |
knee | rodilla | 4.52 | -2.51 | 3.06 | -0.85 | -1.65 | 2.74 |
apple | manzana | 3.76 | -0.07 | 3.22 | 1.55 | -1.62 | 2.62 |
sing | cantar | 3.07 | -0.79 | 4.05 | -2.37 | 1.58 | 2.51 |
bug | bicho | 1.58 | -1.80 | 2.74 | -0.22 | -1.58 | 2.50 |
house | casa | 4.22 | 0.55 | 3.31 | -1.03 | 1.58 | 2.49 |
blanket | cobija | 4.49 | -1.29 | 3.26 | 0.29 | -1.57 | 2.48 |
face | cara | 4.25 | -0.36 | 3.92 | -1.83 | 1.48 | 2.18 |
belly button | ombligo | 3.33 | -1.23 | 2.68 | 0.22 | -1.45 | 2.11 |
stick | palo | 3.25 | -0.50 | 3.30 | -1.90 | 1.40 | 1.96 |
quack quack | cuacuá | 1.26 | -0.28 | 1.80 | 1.10 | -1.38 | 1.89 |
picture | fotos | 4.46 | -0.97 | 4.50 | -2.34 | 1.37 | 1.87 |
I | yo | 2.25 | 0.23 | 2.11 | -1.11 | 1.34 | 1.79 |
ice | hielo | 3.19 | -2.07 | 2.48 | -0.73 | -1.33 | 1.77 |
go | ir(se) | 2.36 | -0.71 | 2.43 | 0.60 | -1.32 | 1.73 |
park | parque | 3.57 | -2.44 | 2.98 | -1.14 | -1.30 | 1.70 |
bunny | conejo | 3.41 | -1.06 | 2.75 | 0.23 | -1.29 | 1.67 |
potty | bacinica | 2.12 | -1.43 | 3.20 | -0.18 | -1.25 | 1.57 |
help | ayudar | 4.23 | -2.32 | 3.36 | -1.07 | -1.25 | 1.55 |
eye | ojos | 4.42 | 1.03 | 3.31 | 2.26 | -1.23 | 1.52 |
head | cabeza | 4.55 | 0.77 | 4.14 | -0.43 | 1.19 | 1.43 |
paper | papel | 4.18 | -0.02 | 4.19 | -1.22 | 1.19 | 1.43 |
you | tú | 2.00 | -0.22 | 2.65 | -1.41 | 1.19 | 1.41 |
window | ventana | 4.28 | -1.11 | 3.45 | -2.29 | 1.18 | 1.39 |
room | cuarto | 4.28 | -1.48 | 4.19 | -2.64 | 1.17 | 1.36 |
money | dinero | 3.67 | -0.12 | 3.04 | -1.27 | 1.14 | 1.31 |
store | tienda/mercado | 3.66 | -0.95 | 4.33 | -2.06 | 1.10 | 1.22 |
cereal | cereal | 2.37 | -1.67 | 3.13 | -0.57 | -1.10 | 1.21 |
run | correr | 3.97 | -1.25 | 4.78 | -2.26 | 1.01 | 1.02 |
mouth | boca | 4.58 | 1.45 | 3.63 | 0.45 | 0.99 | 0.99 |
dance | bailar | 3.18 | -0.40 | 3.38 | -1.39 | 0.99 | 0.98 |
tree | árbol | 3.78 | -0.53 | 3.18 | 0.45 | -0.98 | 0.96 |
hit | pegar(se) | 3.48 | -1.39 | 3.89 | -2.35 | 0.97 | 0.93 |
finger | dedo | 4.43 | 0.22 | 4.32 | -0.73 | 0.96 | 0.91 |
that | eso | 2.35 | -1.52 | 1.48 | -0.57 | -0.95 | 0.90 |
rock | piedra | 4.04 | -1.49 | 3.16 | -0.55 | -0.94 | 0.89 |
purse | bolsa (clothing) | 3.93 | -0.83 | 2.72 | -1.75 | 0.93 | 0.86 |
butterfly | mariposa | 3.67 | -2.08 | 3.21 | -1.15 | -0.92 | 0.85 |
daddy* | papá | 1.48 | 2.96 | 1.70 | 3.87 | -0.91 | 0.84 |
touch | tocar | 3.18 | -2.17 | 3.75 | -3.03 | 0.86 | 0.73 |
ice cream | helado/nieve | 3.16 | -1.45 | 3.64 | -0.61 | -0.84 | 0.71 |
monkey | mono | 2.28 | -1.16 | 3.59 | -0.33 | -0.83 | 0.69 |
chair | silla | 4.92 | 0.91 | 4.39 | 0.11 | 0.81 | 0.65 |
clock | reloj | 3.80 | -1.51 | 2.40 | -0.73 | -0.78 | 0.61 |
tongue | lengua | 3.87 | -0.50 | 3.16 | -1.27 | 0.78 | 0.60 |
open | abrir | 3.41 | 0.22 | 3.57 | -0.55 | 0.78 | 0.60 |
raisin | pasas | 2.79 | -2.29 | 2.86 | -1.53 | -0.76 | 0.58 |
kick | patear | 3.52 | -2.69 | 3.35 | -1.93 | -0.76 | 0.57 |
tummy | panza | 4.03 | 0.40 | 3.65 | -0.34 | 0.74 | 0.55 |
beans | frijoles | 3.19 | -0.75 | 2.46 | -1.47 | 0.72 | 0.52 |
kitchen | cocina | 5.10 | -1.85 | 4.93 | -2.54 | 0.70 | 0.48 |
buttocks/bottom* | nalgas | 3.15 | -1.33 | 3.07 | -0.63 | -0.70 | 0.48 |
lady | señora | 4.01 | -2.39 | 2.73 | -3.08 | 0.69 | 0.47 |
orange (food) | naranja | 3.76 | -0.43 | 3.57 | -1.10 | 0.67 | 0.45 |
balloon | globo/bomba | 2.35 | 1.12 | 2.56 | 1.78 | -0.65 | 0.43 |
coffee | café | 3.18 | -1.17 | 2.72 | -1.82 | 0.65 | 0.42 |
spaghetti | espagueti | 2.30 | -2.46 | 3.21 | -1.84 | -0.62 | 0.38 |
pattycake | tortillitas | 1.81 | -0.78 | 1.94 | -1.40 | 0.62 | 0.38 |
medicine | medicina | 4.31 | -1.38 | 3.55 | -1.98 | 0.60 | 0.35 |
cry | llorar | 3.81 | -1.08 | 4.08 | -1.67 | 0.59 | 0.35 |
plant | planta | 3.89 | -2.22 | 3.12 | -2.80 | 0.58 | 0.34 |
bathtub | tina | 3.07 | -1.03 | 2.79 | -0.45 | -0.58 | 0.33 |
vroom | pipí | 1.76 | 1.19 | 0.93 | 0.61 | 0.58 | 0.33 |
box | caja | 4.25 | -1.28 | 3.30 | -0.71 | -0.57 | 0.33 |
foot | pies | 3.78 | 0.65 | 3.86 | 0.10 | 0.55 | 0.30 |
bicycle | bicicleta | 2.93 | -0.90 | 3.02 | -0.36 | -0.54 | 0.29 |
popcorn | palomitas | 3.05 | -1.80 | 2.52 | -1.28 | -0.52 | 0.27 |
see | ver(se) | 2.58 | -1.30 | 2.55 | -0.79 | -0.51 | 0.26 |
spoon | cuchara | 4.17 | 0.04 | 3.42 | 0.54 | -0.50 | 0.25 |
teddybear | osito | 3.02 | -0.22 | 2.22 | -0.70 | 0.48 | 0.23 |
dress (object) | vestido | 3.29 | -1.69 | 2.96 | -2.17 | 0.48 | 0.23 |
button | botón | 2.97 | -0.96 | 2.69 | -0.49 | -0.47 | 0.22 |
grass | pasto | 4.08 | -2.13 | 3.95 | -1.66 | -0.46 | 0.22 |
hungry | hambre | 3.16 | -1.50 | 3.50 | -1.96 | 0.46 | 0.21 |
yucky | fuchi | 1.66 | 0.16 | 1.87 | -0.30 | 0.46 | 0.21 |
ant | hormiga | 3.26 | -1.18 | 2.54 | -1.63 | 0.45 | 0.21 |
what | qué (question_words) | 1.90 | -0.41 | 1.74 | -0.82 | 0.41 | 0.17 |
good | bueno | 2.48 | -1.64 | 2.63 | -1.25 | -0.39 | 0.15 |
cake | pastel | 3.64 | -0.35 | 3.53 | -0.72 | 0.37 | 0.13 |
cookie | galleta | 2.95 | 1.21 | 2.92 | 1.56 | -0.36 | 0.13 |
keys | llave | 4.14 | -0.26 | 1.73 | 0.08 | -0.34 | 0.11 |
cloud | nube | 3.67 | -2.06 | 3.23 | -2.39 | 0.33 | 0.11 |
cheek | cachete | 3.71 | -1.64 | 3.18 | -1.33 | -0.32 | 0.10 |
french fries | papitas | 2.61 | -1.21 | 2.75 | -0.90 | -0.31 | 0.10 |
tooth | dientes | 4.67 | 0.01 | 2.82 | -0.29 | 0.31 | 0.09 |
wash | lavar(se) | 4.22 | -1.68 | 4.15 | -1.96 | 0.28 | 0.08 |
strawberry | fresa | 3.12 | -1.93 | 3.15 | -1.66 | -0.28 | 0.08 |
moon | luna | 3.02 | -0.19 | 2.43 | 0.09 | -0.27 | 0.08 |
man | señor | 3.96 | -1.76 | 2.79 | -2.04 | 0.27 | 0.07 |
glasses | lentes | 3.93 | -1.45 | 3.20 | -1.20 | -0.25 | 0.06 |
bib | babero | 3.01 | -1.16 | 2.19 | -0.92 | -0.24 | 0.06 |
close | cerrar | 4.06 | -1.68 | 3.38 | -1.92 | 0.24 | 0.06 |
eat | comer(se) | 3.10 | 0.21 | 3.47 | 0.44 | -0.22 | 0.05 |
telephone | teléfono | 3.18 | -0.25 | 3.47 | -0.04 | -0.21 | 0.04 |
brush | cepillo | 4.80 | -0.24 | 3.13 | -0.44 | 0.20 | 0.04 |
baby | bebé | 2.25 | 2.15 | 2.44 | 2.34 | -0.19 | 0.04 |
pillow | almohada | 3.89 | -1.14 | 4.09 | -0.96 | -0.18 | 0.03 |
shh/shush/hush | shhh | 1.18 | 0.44 | 1.73 | 0.61 | -0.16 | 0.03 |
lion | león | 3.09 | -1.28 | 3.32 | -1.12 | -0.16 | 0.03 |
penis* | pene | 1.75 | -1.51 | 1.30 | -1.35 | -0.15 | 0.02 |
couch | sillón | 3.76 | -2.07 | 3.78 | -2.22 | 0.15 | 0.02 |
yum yum | ¡am! | 0.45 | 0.69 | 1.00 | 0.81 | -0.12 | 0.02 |
grapes | uvas | 2.49 | -0.74 | 3.36 | -0.62 | -0.11 | 0.01 |
diaper | pañal | 3.79 | 1.00 | 3.00 | 0.89 | 0.10 | 0.01 |
hair | pelo | 5.15 | 0.85 | 4.09 | 0.95 | -0.10 | 0.01 |
sweater | suéter | 3.86 | -1.20 | 2.14 | -1.29 | 0.08 | 0.01 |
corn | elote | 3.69 | -1.72 | 3.49 | -1.78 | 0.06 | 0.00 |
stove | estufa | 4.61 | -2.63 | 2.90 | -2.57 | -0.06 | 0.00 |
train | tren | 2.62 | -0.16 | 3.51 | -0.10 | -0.06 | 0.00 |
tickle | cosquillitas | 2.06 | -0.92 | 2.47 | -0.87 | -0.05 | 0.00 |
yogurt | yoghurt | 2.59 | -1.03 | 2.29 | -1.00 | -0.03 | 0.00 |
walk | caminar | 3.26 | -1.10 | 3.59 | -1.07 | -0.03 | 0.00 |
blow | soplar | 2.67 | -1.58 | 3.18 | -1.59 | 0.01 | 0.00 |
doll | muñeca | 2.59 | -0.30 | 2.11 | -0.30 | 0.00 | 0.00 |
We will use Wordbank’s unilemmas to find translation-equivalent pairs that have smaller d_diff_sq values than the current DLL extended items. We first get the English / Spanish unilemmas from Wordbank (both WS and WG), and below simply show the Spanish vs. English easiness parameters.
Working with 183 unilemmas that match both our English and Spanish IRT parameters and that are not already on the DLL short lists, for each DLL list we swap the N=25 items with the largest easiness difference for items with minimal easiness difference. We also attempt to find replacement items of the same lexical class. We report the original DLL list’s easiness SSE, as well as the improvement in easiness SSE after each swap is made.
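The swap procedure can be illustrated with a much simplified sketch. This is not the `improve_DLL_list()` function called below, whose implementation is not shown here: `swap_mismatched()` is a hypothetical stand-in that ignores lexical class, treats the easiness SSE as the sum of `d_diff_sq` over the list (our reading of the output), and runs on made-up toy values.

```r
# Hypothetical simplified swap; NOT the report's improve_DLL_list().
swap_mismatched <- function(form, pool, n_swaps) {
  form <- form[order(-form$d_diff_sq), ]   # worst-matched doublets first
  pool <- pool[order(pool$d_diff_sq), ]    # best-matched candidates first
  n_swaps    <- min(n_swaps, nrow(form), nrow(pool))
  sse_before <- sum(form$d_diff_sq)
  for (i in seq_len(n_swaps)) {
    # swap only when the candidate pair is better matched than the original
    if (pool$d_diff_sq[i] < form$d_diff_sq[i]) {
      form[i, ] <- pool[i, ]
    }
  }
  list(new_list = form, sse_before = sse_before,
       sse_after = sum(form$d_diff_sq))
}

# Toy lists with made-up squared easiness differences.
form <- data.frame(english = c("a", "b", "c", "d"),
                   d_diff_sq = c(6.45, 0.10, 4.65, 0.02),
                   stringsAsFactors = FALSE)
pool <- data.frame(english = c("x", "y", "z"),
                   d_diff_sq = c(0.05, 3.00, 0.01),
                   stringsAsFactors = FALSE)

res <- swap_mismatched(form, pool, n_swaps = 2)
res$new_list$english
c(before = res$sse_before, after = res$sse_after)
```

A fuller version would also restrict each replacement to candidates of the same lexical class, as described above, which is why the actual swaps below are not simply the globally best-matched pairs.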
new_dll1long <- improve_DLL_list(dll1long, dict, Nswaps=Nswaps)$new_list
## [1] "Original list item easiness SSE: 78.12"
## [1] "Mean Spanish item easiness: -0.8"
## [1] "Mean English item easiness: -0.71"
## [1] "Selecting from 128 words on both Eng/Sp CDIs that are not on the DLL."
## [1] "Replacing 'shirt' with 'sweater' (SSE improvement = 3.42)"
## [1] "Replacing 'banana' with 'stove' (SSE improvement = 3.54)"
## [1] "Replacing 'crib' with 'grapes' (SSE improvement = 3.61)"
## [1] "Replacing 'cheese' with 'watch (object)' (SSE improvement = 4.32)"
## [1] "Replacing 'picture' with 'penis' (SSE improvement = 3.28)"
## [1] "Replacing 'ear' with 'animal' (SSE improvement = 4.52)"
## [1] "Replacing 'give' with 'draw' (SSE improvement = 6.45)"
## [1] "Replacing 'quack quack' with 'grandpa' (SSE improvement = 1.7)"
## [1] "Replacing 'belly button' with 'scissors' (SSE improvement = 1.95)"
## [1] "Replacing 'swing (object)' with 'elephant' (SSE improvement = 2.03)"
## [1] "Replacing 'go' with 'hurry' (SSE improvement = 1.56)"
## [1] "Replacing 'face' with 'vagina' (SSE improvement = 2.07)"
## [1] "Replacing 'drawer' with 'cloud' (SSE improvement = 2.24)"
## [1] "Replacing 'stop' with 'run' (SSE improvement = 1.6)"
## [1] "Replacing 'bunny' with 'helicopter' (SSE improvement = 1.49)"
## [1] "Replacing 'apple' with 'cheek' (SSE improvement = 2.52)"
## [1] "Replacing 'bite' with 'paint (action)' (SSE improvement = 0.75)"
## [1] "Replacing 'trash' with 'lamb' (SSE improvement = 1.12)"
## [1] "Replacing 'moo' with 'brother' (SSE improvement = 2.57)"
## [1] "Replacing 'baa baa' with 'child's own name' (SSE improvement = 1.27)"
## [1] "Replacing 'comb (object)' with 'coffee' (SSE improvement = 1.08)"
## [1] "Replacing 'chicken (food)' with 'tights' (SSE improvement = 2.67)"
## [1] "Replacing 'purse' with 'shorts' (SSE improvement = 0.56)"
## [1] "Replacing 'finger' with 'orange (food)' (SSE improvement = 0.47)"
## [1] "Replacing 'daddy' with 'sister' (SSE improvement = 0.39)"
## [1] "New list item easiness SSE: 20.96"
## [1] "Mean Spanish item easiness: -1.13"
## [1] "Mean English item easiness: -1.26"
write.csv(new_dll1long, file="DLL/new_DLL-ES1-long.csv")
Swapping only 25 items reduced the total easiness SSE from 78.12 to 20.96, a reduction of about 73%. Although it was not optimized for, the mean item easiness also decreased in both languages (Spanish: -0.80 to -1.13; English: -0.71 to -1.26), coming closer to the mean. We now do the same for the other DLL forms before determining whether the DLL overestimation has decreased.
new_dll2long <- improve_DLL_list(dll2long, dict, Nswaps=Nswaps)$new_list
## [1] "Original list item easiness SSE: 137.64"
## [1] "Mean Spanish item easiness: -0.58"
## [1] "Mean English item easiness: -0.73"
## [1] "Selecting from 107 words on both Eng/Sp CDIs that are not on the DLL."
## [1] "Replacing 'bread' with 'toy (object)' (SSE improvement = 8.1)"
## [1] "Replacing 'soup' with 'watch (object)' (SSE improvement = 11.15)"
## [1] "Replacing 'table' with 'scissors' (SSE improvement = 6.39)"
## [1] "Replacing 'water (beverage)' with 'animal' (SSE improvement = 11.33)"
## [1] "Replacing 'apple' with 'helicopter' (SSE improvement = 2.44)"
## [1] "Replacing 'knee' with 'elephant' (SSE improvement = 2.62)"
## [1] "Replacing 'cup' with 'vagina' (SSE improvement = 2.67)"
## [1] "Replacing 'sing' with 'draw' (SSE improvement = 2.51)"
## [1] "Replacing 'nose' with 'lamb' (SSE improvement = 2.55)"
## [1] "Replacing 'chicken (food)' with 'tights' (SSE improvement = 2.67)"
## [1] "Replacing 'picture' with 'pajamas' (SSE improvement = 2.98)"
## [1] "Replacing 'soda' with 'shorts' (SSE improvement = 3.01)"
## [1] "Replacing 'house' with 'toothbrush' (SSE improvement = 2.04)"
## [1] "Replacing 'girl' with 'grandpa' (SSE improvement = 3.22)"
## [1] "Replacing 'motorcycle' with 'refrigerator' (SSE improvement = 3.31)"
## [1] "Replacing 'blanket' with 'firetruck' (SSE improvement = 1.42)"
## [1] "Replacing 'crib' with 'squirrel' (SSE improvement = 2.71)"
## [1] "Replacing 'cheese' with 'boots' (SSE improvement = 3.24)"
## [1] "Replacing 'belly button' with 'knife' (SSE improvement = 1.22)"
## [1] "Replacing 'couch' with 'shower' (SSE improvement = 1.28)"
## [1] "Replacing 'face' with 'closet' (SSE improvement = 1.4)"
## [1] "Replacing 'quack quack' with 'brother' (SSE improvement = 1.61)"
## [1] "Replacing 'bee' with 'snow' (SSE improvement = 3.72)"
## [1] "No better word found!"
## [1] "No better word found!"
## [1] "New list item easiness SSE: 54.06"
## [1] "Mean Spanish item easiness: -0.94"
## [1] "Mean English item easiness: -0.99"
write.csv(new_dll2long, file="DLL/new_DLL-ES2-long.csv")
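Once again, the total SSE decreased substantially (from 137.64 to 54.06, a reduction of about 61%). The mean item easiness went down in both languages, and the two means moved closer to each other (-0.58 vs. -0.73 before; -0.94 vs. -0.99 after). Note that only 23 of the 25 attempted swaps were made for this list, because no better replacement was found for the last two items.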
For each of the DLL lists, swapping up to 25 of the items with the largest discrepancy between English and Spanish easiness for items of minimal discrepancy within the same lexical class substantially reduced the total easiness SSE. It also lowered the mean item easiness in both languages, bringing it closer to the means of each language, which should make the lists generally better for extrapolating to children’s full CDI scores. The English and Spanish means also became more similar to each other for the second list (a gap of 0.15 shrinking to 0.05), though not for the first (0.09 to 0.13). Because these swaps are chosen algorithmically, we recommend that the DLL team review each of the swaps above and determine whether any key words have been removed or any undesirable words have been added.
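The size of these improvements can be checked directly from the values printed above. The snippet below is a small worked computation; the data frame layout and column names are ours, while the numbers are copied from the reported output.

```r
# SSE and mean easiness values copied from the printed output above.
summary_tbl <- data.frame(
  list       = c("DLL-ES1 long", "DLL-ES2 long"),
  sse_before = c(78.12, 137.64),
  sse_after  = c(20.96, 54.06),
  es_before  = c(-0.80, -0.58),
  en_before  = c(-0.71, -0.73),
  es_after   = c(-1.13, -0.94),
  en_after   = c(-1.26, -0.99),
  stringsAsFactors = FALSE
)

# Percent reduction in total easiness SSE.
summary_tbl$pct_reduction <-
  round(100 * (1 - summary_tbl$sse_after / summary_tbl$sse_before), 1)

# Absolute gap between the Spanish and English mean easiness, before and after.
summary_tbl$gap_before <- round(abs(summary_tbl$es_before - summary_tbl$en_before), 2)
summary_tbl$gap_after  <- round(abs(summary_tbl$es_after - summary_tbl$en_after), 2)

print(summary_tbl[, c("list", "pct_reduction", "gap_before", "gap_after")])
```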