Multi-dimensional IRT Models

Here we attempt to determine whether the latent space of the data is multidimensional via exploratory multi-factor models. We fit exploratory 2- through 8-factor 2PL models, and then compare them.

Below is a table showing sequential model comparisons from the ordinary 2PL through the 8-factor exploratory model. Comparing the first two rows, we can see that the exploratory 2-factor 2PL model has better (lower) AIC and BIC than the ordinary 2PL model, suggesting that the items load on multiple latent dimensions. Comparing subsequent rows (e.g., 2-factor vs. 3-factor) shows that the log-likelihood and AIC improve with every added factor, while BIC improves through the 7-factor model and then worsens for the 8-factor model. Before we run models with even more factors, we first attempt to understand the simpler ones, starting with the 2-factor model.

Comparisons of 2PL 1- through 8-factor exploratory models.
Model	AIC	BIC	logLik	df
2PL	2475742	2485181	-1236511	NaN
2-factor	2416366	2430517	-1206144	679
3-factor	2386205	2405062	-1190385	678
4-factor	2369061	2392616	-1181136	677
5-factor	2354910	2383157	-1173385	676
6-factor	2346837	2379768	-1168673	675
7-factor	2339597	2377206	-1164379	674
8-factor	2337965	2380245	-1162891	673
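
As a quick check on the comparison table, the sketch below picks out the model with the lowest AIC and the lowest BIC, and backs out the number of free parameters each row implies. It assumes the standard definition AIC = 2k - 2 logLik; the function name `implied_params` is ours, and because the table values are rounded, the implied counts are approximate.

```python
# Rows copied from the comparison table: (model, AIC, BIC, logLik).
rows = [
    ("2PL",      2475742, 2485181, -1236511),
    ("2-factor", 2416366, 2430517, -1206144),
    ("3-factor", 2386205, 2405062, -1190385),
    ("4-factor", 2369061, 2392616, -1181136),
    ("5-factor", 2354910, 2383157, -1173385),
    ("6-factor", 2346837, 2379768, -1168673),
    ("7-factor", 2339597, 2377206, -1164379),
    ("8-factor", 2337965, 2380245, -1162891),
]


def implied_params(aic, loglik):
    """Free-parameter count k implied by AIC = 2k - 2*logLik (assumed definition)."""
    return (aic + 2 * loglik) / 2


# Lower is better for both criteria.
best_aic = min(rows, key=lambda r: r[1])[0]
best_bic = min(rows, key=lambda r: r[2])[0]

for name, aic, bic, loglik in rows:
    print(f"{name:9s} implied k ~ {implied_params(aic, loglik):7.1f}")
print("lowest AIC:", best_aic)
print("lowest BIC:", best_bic)
```

Half-integer parameter counts in the output are a side effect of rounding in the reported AIC and log-likelihood, not a property of the models.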

First we look at the structure of the loadings: how much variance is accounted for by each dimension?

Proportion of variance per factor in exploratory multidimensional models.
Model	F1	F2	F3	F4	F5	F6	F7	F8	Total
1-d									0.00
2-d	0.83	0.02							0.84
3-d	0.80	0.03	0.01						0.84
4-d	0.64	0.09	0.04	0.03					0.81
5-d	0.66	0.07	0.05	0.01	0.01				0.81
6-d	0.57	0.14	0.04	0.02	0.02	0.01			0.79
7-d	0.75	0.02	0.01	0.01	0.01	0.01	0.01		0.81
8-d	0.64	0.04	0.03	0.02	0.02	0.02	0.01	0.01	0.79
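
To see how much any factor other than the first contributes, the variance table can be scanned directly. The sketch below copies the rounded shares from the table (the empty 1-d row is omitted); the helper name `max_share_after` is ours.

```python
# Proportion of variance per factor, copied from the table above.
shares = {
    "2-d": [0.83, 0.02],
    "3-d": [0.80, 0.03, 0.01],
    "4-d": [0.64, 0.09, 0.04, 0.03],
    "5-d": [0.66, 0.07, 0.05, 0.01, 0.01],
    "6-d": [0.57, 0.14, 0.04, 0.02, 0.02, 0.01],
    "7-d": [0.75, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01],
    "8-d": [0.64, 0.04, 0.03, 0.02, 0.02, 0.02, 0.01, 0.01],
}


def max_share_after(shares, n_skip):
    """Largest share among factors after the first n_skip, as (share, model)."""
    return max(
        (value, model)
        for model, values in shares.items()
        for value in values[n_skip:]
    )


print("largest share beyond F1:", max_share_after(shares, 1))
print("largest share beyond F2:", max_share_after(shares, 2))
```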

Although BIC prefers the 7-factor model (and AIC continues to improve through 8 factors), such a model may not be worth the added complexity of attempting to explain the additional factors. In every solution the first factor accounts for most of the variance (.57 to .83), no other factor accounts for more than .14, and no factor beyond the second accounts for more than .05. Now we turn to the 2- and 3-factor models to try to understand the structure of these factors in terms of the CDI categories.

What CDI categories do the factors load on? We inspect the average factor loading for each category for the 2-dimensional and 3-dimensional models. In the 2-dimensional model all mean loadings are negative, so we compare magnitudes. F1 loads most strongly on nouns (vehicles, animals, outside, toys, clothing, household, etc.), while F2 loads most strongly on more complex grammatical items (connecting words, helping verbs, pronouns, quantifiers, question words, time words, locations, action words).

2-factor Model

Mean loadings on CDI category in 2-factor model
category	F1	F2
vehicles	-0.76	-0.51
animals	-0.75	-0.51
outside	-0.73	-0.57
toys	-0.73	-0.56
clothing	-0.72	-0.56
household	-0.72	-0.58
food_drink	-0.72	-0.55
body_parts	-0.71	-0.59
furniture_rooms	-0.71	-0.61
places	-0.67	-0.63
descriptive_words	-0.62	-0.69
action_words	-0.61	-0.72
sounds	-0.61	-0.42
people	-0.61	-0.62
games_routines	-0.59	-0.65
time_words	-0.55	-0.75
locations	-0.54	-0.75
quantifiers	-0.49	-0.77
pronouns	-0.48	-0.77
helping_verbs	-0.47	-0.80
question_words	-0.46	-0.77
connecting_words	-0.43	-0.82
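
Because every mean loading in this table is negative, it is easiest to compare factors by magnitude. The sketch below ranks the categories by absolute mean loading on each factor and lists the categories where F1 dominates F2. The values are copied from the table; the helper name `rank_by_magnitude` is ours.

```python
# Mean loadings (F1, F2) per CDI category, copied from the 2-factor table.
loadings = {
    "vehicles": (-0.76, -0.51), "animals": (-0.75, -0.51),
    "outside": (-0.73, -0.57), "toys": (-0.73, -0.56),
    "clothing": (-0.72, -0.56), "household": (-0.72, -0.58),
    "food_drink": (-0.72, -0.55), "body_parts": (-0.71, -0.59),
    "furniture_rooms": (-0.71, -0.61), "places": (-0.67, -0.63),
    "descriptive_words": (-0.62, -0.69), "action_words": (-0.61, -0.72),
    "sounds": (-0.61, -0.42), "people": (-0.61, -0.62),
    "games_routines": (-0.59, -0.65), "time_words": (-0.55, -0.75),
    "locations": (-0.54, -0.75), "quantifiers": (-0.49, -0.77),
    "pronouns": (-0.48, -0.77), "helping_verbs": (-0.47, -0.80),
    "question_words": (-0.46, -0.77), "connecting_words": (-0.43, -0.82),
}


def rank_by_magnitude(loadings, factor_index):
    """Categories sorted from largest to smallest absolute mean loading."""
    return sorted(
        loadings, key=lambda c: abs(loadings[c][factor_index]), reverse=True
    )


f1_rank = rank_by_magnitude(loadings, 0)
f2_rank = rank_by_magnitude(loadings, 1)
stronger_on_f1 = [c for c, (f1, f2) in loadings.items() if abs(f1) > abs(f2)]

print("strongest on F1:", f1_rank[0])
print("strongest on F2:", f2_rank[0])
print("categories where |F1| > |F2|:", stronger_on_f1)
```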

Let’s plot F1 vs. F2 for the 2-factor model and label the extremes.

3-factor Model

In the 3-factor model, F2 loads strongly on nouns (household, body parts, food/drink, toys, and clothing), while F1 loads most strongly (and negatively) on connecting words, helping verbs, question words, pronouns, quantifiers, locations, and time words, and somewhat less on action words and descriptive words. F3 loadings are small throughout, and are largest in magnitude for sounds, animals, and vehicles.

Mean loadings on CDI category in 3-factor model
category	F1	F2	F3
connecting_words	-0.81	0.39	-0.15
helping_verbs	-0.80	0.44	-0.16
question_words	-0.77	0.43	-0.15
pronouns	-0.77	0.45	-0.19
quantifiers	-0.76	0.45	-0.22
locations	-0.74	0.50	-0.22
time_words	-0.73	0.54	-0.18
action_words	-0.70	0.61	-0.17
descriptive_words	-0.67	0.59	-0.23
games_routines	-0.62	0.59	-0.14
places	-0.59	0.66	-0.20
people	-0.59	0.61	-0.16
furniture_rooms	-0.56	0.73	-0.15
outside	-0.54	0.71	-0.24
body_parts	-0.53	0.72	-0.18
household	-0.52	0.75	-0.15
toys	-0.51	0.72	-0.22
clothing	-0.50	0.72	-0.20
food_drink	-0.50	0.72	-0.18
animals	-0.47	0.69	-0.34
vehicles	-0.47	0.72	-0.29
sounds	-0.44	0.49	-0.37

Let’s plot F2 vs. F3 for the 3-factor model and label the extremes.

Next we will attempt to understand the 7-factor exploratory model (the 8-factor model did not have superior BIC) through clustering analyses.

Clustering the Items

We attempt to understand the factors by clustering items’ factor loadings, and then looking at acquisition curves (or item difficulties) for each cluster. We’ll first fit mclust’s Gaussian finite mixture model and use t-SNE to plot the solution, and then move on to k-means and hierarchical clustering.

cluster	a1	a2	a3	a4	a5	a6	a7	d	N
2	-3.82	0.46	-0.45	-0.07	-0.17	-0.50	0.44	-1.37	27
4	-3.75	0.20	-0.08	-0.31	0.30	-0.41	0.00	-2.57	228
1	-3.53	-0.28	-0.08	-0.06	0.65	0.37	0.09	-4.44	110
3	-3.46	0.48	-0.35	0.16	-0.24	-0.39	-0.13	-2.00	315

Mclust finds four clusters. Below is the number of words from each CDI category in each cluster (1-4).

category	1	2	3	4
descriptive_words	1	8	.	54
connecting_words	6	.	.	.
question_words	7	.	.	.
time_words	9	.	.	3
quantifiers	17	.	.	.
helping_verbs	21	.	.	.
locations	24	.	.	2
pronouns	25	.	.	.
action_words	.	1	1	101
body_parts	.	8	17	2
clothing	.	2	25	1
furniture_rooms	.	2	29	2
household	.	2	46	2
people	.	2	8	19
places	.	1	11	10
vehicles	.	1	13	.
animals	.	.	43	.
food_drink	.	.	66	2
outside	.	.	29	2
sounds	.	.	9	3
toys	.	.	18	.
games_routines	.	.	.	25

Cluster 1 has grammatically complex items (pronouns, quantifiers, helping verbs), and cluster 2 has a smattering of nouns (early-learned?). Cluster 3 has the bulk of the nouns, and cluster 4 has most of the verbs and adjectives. Do the words in these clusters vary systematically in their difficulty? Shown below, the items in cluster 1 are much more difficult, followed by clusters 4 and 3. Cluster 2 words tend to be the easiest, but the bootstrapped CIs are large.
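
The difficulty ordering can also be read off the cluster summary table by sorting clusters on their mean intercept `d`. This sketch assumes the usual slope-intercept parameterization, in which a larger `d` corresponds to an easier item; the values are copied from the table, and the variable names are ours.

```python
# Mean intercept d and item count N per mclust cluster, from the summary table.
cluster_d = {2: -1.37, 4: -2.57, 1: -4.44, 3: -2.00}
cluster_n = {2: 27, 4: 228, 1: 110, 3: 315}

# Assumption: lower d means a harder item, so ascending d runs hardest to easiest.
hardest_to_easiest = sorted(cluster_d, key=cluster_d.get)

for cluster in hardest_to_easiest:
    print(f"cluster {cluster}: mean d = {cluster_d[cluster]:5.2f}, "
          f"N = {cluster_n[cluster]}")
print("total items:", sum(cluster_n.values()))
```

The easiest cluster is also the smallest one, which is consistent with its wide bootstrapped intervals.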

Clustered Items Compared to CDI Categories

We take the 7-factor loadings of each item and conduct a k-means clustering with $k = 22$, the same as the number of CDI categories (e.g., quantifiers, locations, toys, clothing, etc.). We use the adjusted Rand Index to compare the clusters to the items' category assignments (0 = chance agreement, 1 = perfect agreement), and find a value of 0.195. Next we will use gap statistics to choose the best $k$, plot the solution, and again examine the adjusted Rand Index vs. the CDI categories.
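
For readers unfamiliar with the adjusted Rand Index, here is a minimal pure-Python sketch of the standard pair-counting formula. It is an illustration only, not the code used for the analysis, and the toy labels at the bottom are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")

    # Pairs of items that share a cell of the contingency table,
    # a cluster in labeling A, or a cluster in labeling B.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2        # agreement if the partitions matched
    if max_index == expected:              # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy example: six items, CDI-like categories vs. cluster ids.
categories = ["animals", "animals", "toys", "toys", "pronouns", "pronouns"]
clusters = [1, 1, 2, 2, 3, 3]
print(adjusted_rand_index(categories, clusters))
```

The index depends only on how items are grouped, not on the label names, so category strings and integer cluster ids can be compared directly.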

##   cluster size ave.sil.width
## 1       1   55          0.31
## 2       2   85          0.15
## 3       3   66          0.16
## 4       4  133          0.28
## 5       5  119          0.13
## 6       6   88          0.13
## 7       7    8          0.74
## 8       8   71          0.18
## 9       9   55          0.25

According to the gap statistics, the optimal number of clusters is $k = 9$. The adjusted Rand Index of this clustering solution compared to the 22 CDI categories is 0.303, somewhat higher than the 0.195 obtained for the $k = 22$ solution.
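
The silhouette summary above can be condensed into a single size-weighted average, which gives a rough sense of how well separated the nine clusters are overall. The sketch assumes `ave.sil.width` is the mean silhouette width within each cluster; the sizes and widths are copied from the output, so the result inherits their rounding.

```python
# (cluster, size, average silhouette width), copied from the output above.
clusters = [
    (1, 55, 0.31), (2, 85, 0.15), (3, 66, 0.16),
    (4, 133, 0.28), (5, 119, 0.13), (6, 88, 0.13),
    (7, 8, 0.74), (8, 71, 0.18), (9, 55, 0.25),
]

total_items = sum(size for _, size, _ in clusters)
# Weight each cluster's mean width by its size to get the overall mean.
overall_width = sum(size * width for _, size, width in clusters) / total_items

print("items clustered:", total_items)
print(f"size-weighted mean silhouette width: {overall_width:.3f}")
```

Note that the one tight cluster (cluster 7) has only 8 items, so it contributes little to the overall average.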

Hierarchical Clustering

Examine correlations between factors and lexical norms

Compare factors to a large set of lexical norms, including frequency, concreteness, etc., to better interpret the factors.

Bifactor Models

Since the factors load differently on different lexical classes and CDI categories, and more than 6 factors are justified, let’s try two bifactor models: one with a specific factor for each lexical class (nouns, verbs, adjectives, function words, other), and one with a specific factor for each CDI category (22 levels, e.g. quantifiers, locations, animals, people, sounds, etc.).

Comparing the two bifactor models, the category model is preferred by both AIC and BIC. It also fits roughly as well as the 4-factor exploratory model (compare log-likelihoods: -1180784 vs. -1181136) with far fewer parameters (2040 vs. 3394), so AIC and BIC both prefer the category bifactor model over the 4-factor exploratory model.

Comparison of lexical class and category bifactor models.
Model	AIC	BIC	logLik	df
Lexical Class	2385846	2400004	-1190883	.
Category	2365648	2379806	-1180784	0
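
The comparison between the category bifactor model and the 4-factor exploratory model can be made explicit with the reported values. This sketch assumes the standard definition AIC = 2k - 2 logLik to back out parameter counts; the dictionary keys are our own labels, and the 4-factor row is copied from the earlier exploratory comparison table.

```python
# Reported fit statistics for the three models being compared.
models = {
    "lexical_class_bifactor": {"AIC": 2385846, "BIC": 2400004, "logLik": -1190883},
    "category_bifactor":      {"AIC": 2365648, "BIC": 2379806, "logLik": -1180784},
    "4-factor_exploratory":   {"AIC": 2369061, "BIC": 2392616, "logLik": -1181136},
}

# Implied free-parameter count under AIC = 2k - 2*logLik (approximate: rounded inputs).
implied_k = {
    name: (m["AIC"] + 2 * m["logLik"]) / 2 for name, m in models.items()
}

cat, four = models["category_bifactor"], models["4-factor_exploratory"]
delta_aic = four["AIC"] - cat["AIC"]           # positive favours the category model
delta_bic = four["BIC"] - cat["BIC"]           # positive favours the category model
delta_loglik = cat["logLik"] - four["logLik"]  # positive: category model fits better

print(implied_k)
print("AIC difference:", delta_aic)
print("BIC difference:", delta_bic)
print("logLik difference:", delta_loglik)
```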

Further analysis is needed to understand the multidimensional structure of the CDI data, but it is intriguing that nouns, which represent the bulk of the items on the CDI, seem to hang together, and separately from other parts of speech.

Understanding the Multidimensionality of CDI Data

George

2021-01-12