library(wordbankr) # WB data
library(tidyverse) # tidy
library(mirt) # IRT models
library(ltm) # more IRT functions
#library(difR) # some differential item functioning tools.
library(psych) # some psychometric stuff (tests of dimensionality)
library(Gifi)# some more psychometric stuff (tests of dimensionality)
library(knitr) # some formatting, tables, etc
library(patchwork) # combining plots.
library(GGally) # More plotting options.
Item response theory (IRT) is a class of models for estimating the difficulty of items and the proficiency of participants. Imagine we have a data set of binary items, indexed by i, and participants, indexed by v. If we wanted to predict how well Person 1 did on Item 1, we could just take the difference between Person 1's latent ability and the difficulty of Item 1:
\[ \theta_{v = 1} - \beta_{i = 1} \] If Person 1's ability (\(\theta\)) is greater than the difficulty (\(\beta\)) of Item 1, then they're likely to get that item correct. If their ability is less than the difficulty of Item 1, then they're unlikely to get it right. If we take the inverse logit of the difference between these values, we now have the probability that Person 1 gets Item 1 correct. These two sets of values (participant abilities and item difficulties) and the inverse logit function are the components of the most basic IRT model, the Rasch model:
\[ P\left(X_{v i}=1\right)=\frac{\exp \left(\theta_{v}-\beta_{i}\right)}{1+\exp \left(\theta_{v}-\beta_{i}\right)} \]
For intuition, if a person's ability equals the item's difficulty, the difference is 0, and the equation above produces a probability of .5. When \(\theta\) is larger than \(\beta\), the probability will be greater than .5, and when \(\theta\) is smaller than \(\beta\), the probability will be smaller than .5. (Most programs swap the sign on \(\beta\) for interpretability.)
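For concreteness, here's a minimal sketch of that computation in R, with made-up values for the ability and difficulty (plogis is base R's inverse logit):
theta <- 1.2 # hypothetical ability for Person 1
beta <- 0.5 # hypothetical difficulty for Item 1
plogis(theta - beta) # inverse logit of the difference
## [1] 0.6681878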
The Rasch model makes a strict assumption: items vary in their difficulty, but not in how well they discriminate between participants of varying ability levels. This is easier to wrap your head around with an example. I've fit a Rasch model to some data and plotted item characteristic curves for a few items. These tell us how estimated ability relates to the probability of a correct response.
library(MPsychoR)
data("RWDQ")
rasch_fit <- rasch(RWDQ[, -1]) # fit the Rasch model (avoids masking ltm::rasch)
plot(rasch_fit, item = 2:6, legend = TRUE)
The thing to notice here is that these curves are all parallel. The items vary in the latent ability necessary to get an item correct 50% of the time (an intercept, basically), but not in how well they discriminate between participants of different ability levels. We can modify the Rasch model above by adding a discrimination parameter \(\alpha\). This gives us the two-parameter logistic (2PL) model below, which is what I use in the subsequent analyses:
\[ P\left(X_{v i}=1\right)=\frac{\exp \left(\alpha_{i}\left(\theta_{v}-\beta_{i}\right)\right)}{1+\exp \left(\alpha_{i}\left(\theta_{v}-\beta_{i}\right)\right)} \] Adding \(\alpha\) allows the item response curves to have different slopes.
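Here's a quick base-R sketch, with made-up parameter values, of how \(\alpha\) changes the slope of the curve:
ability <- seq(-4, 4, length.out = 200)
plot(ability, plogis(2 * (ability - 0)), type = "l",
     xlab = "Ability", ylab = "P(correct)") # steep item: alpha = 2, beta = 0
lines(ability, plogis(0.5 * (ability - 0)), lty = 2) # shallow item: alpha = 0.5
Now let's fit a 2PL to the same questionnaire data: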
l2p <- ltm(RWDQ[, -1] ~ z1) # two-parameter logistic model
plot(l2p, item=2:6, legend=TRUE)
Now the item response functions can cross. This means that some items don't discriminate as well as others. So wdq_25 is strongly affected by changes in ability from -2 to 0 but levels off after 0, whereas wdq_24 is affected by changes in ability across the whole range.
To sum up this introduction: the two-parameter model allows us to estimate abilities, which are characteristics of participants, and difficulty and discrimination parameters, which are properties of items.
OK, now on to the Wordbank data.
Download the data from Wordbank:
Inst <- get_instrument_data(language="English (American)", form="WS")
Admin <- get_administration_data(language="English (American)", form="WS")
Item <- get_item_data(language="English (American)", form = "WS")
Combine data sets. Create binary variable for accuracy.
Complex <- Admin %>%
full_join(.,Inst, by="data_id") %>%
full_join(., Item, by="num_item_id") %>%
filter(longitudinal==FALSE) %>%
filter(type == "complexity") %>%
mutate(
out = ifelse(value=="complex", yes=1, no=0)
)
Let’s see how many missing values there are.
Complex %>%
filter(is.na(out)) %>%
group_by(definition) %>%
count() # 1426 missing values per item
Looks like 1426 participants don't have item-level complexity scores, so let's drop those rows.
Complex <- filter(Complex, !is.na(out))
Let’s look at means and SDs for each item:
Complex %>%
group_by(definition) %>%
summarise(
mean=mean(out),
sd=sd(out),
category = first(complexity_category)
) %>%
arrange(category) %>%
kable(caption="Means and SDs for Each Item (Arranged by Category)")
| definition | mean | sd | category |
|---|---|---|---|
| baby blanket / baby’s blanket | 0.3873745 | 0.4872378 | morphology |
| daddy car / daddy’s car | 0.4451220 | 0.4970684 | morphology |
| daddy pick me up / daddy picked me up | 0.1743185 | 0.3794512 | morphology |
| doggie kiss me / doggie kissed me | 0.1879484 | 0.3907410 | morphology |
| I fall down / I fell down | 0.1807747 | 0.3849005 | morphology |
| I make tower / I making tower | 0.2360115 | 0.4247055 | morphology |
| kitty go away / kitty went away | 0.1280488 | 0.3342041 | morphology |
| kitty sleep / kitty sleeping | 0.4192970 | 0.4935326 | morphology |
| more cookie / more cookies | 0.3758967 | 0.4844404 | morphology |
| these my tooth / these my teeth | 0.4533716 | 0.4979103 | morphology |
| two foot / two feet | 0.4214491 | 0.4938798 | morphology |
| two shoe / two shoes | 0.4422525 | 0.4967431 | morphology |
| baby crying / baby crying cuz she’s sad | 0.1599713 | 0.3666452 | syntax |
| baby crying / baby is crying | 0.1624821 | 0.3689586 | syntax |
| baby want eat / baby want to eat | 0.2650646 | 0.4414468 | syntax |
| coffee hot / that coffee hot | 0.1929699 | 0.3947004 | syntax |
| cookie mommy / cookie for mommy | 0.2413917 | 0.4280040 | syntax |
| doggie table / doggie on table | 0.3511478 | 0.4774147 | syntax |
| don’t read book / don’t want you read that book | 0.1818508 | 0.3857907 | syntax |
| go bye-bye / wanna go bye-bye | 0.2679340 | 0.4429625 | syntax |
| I like read stories / I like to read stories | 0.1779053 | 0.3825016 | syntax |
| I no do it / I can’t do it | 0.2482066 | 0.4320498 | syntax |
| I sing song / I sing song for you | 0.1449067 | 0.3520699 | syntax |
| I want that / I want that one you got | 0.1137016 | 0.3175054 | syntax |
| lookit / lookit what I got | 0.2410330 | 0.4277870 | syntax |
| lookit me / lookit me dancing | 0.2596844 | 0.4385400 | syntax |
| no wash dolly / don’t wash dolly | 0.2116212 | 0.4085310 | syntax |
| read me story Mommy / read me a story Mommy | 0.1635581 | 0.3699405 | syntax |
| that my truck / that’s my truck | 0.2822812 | 0.4501902 | syntax |
| there a kitty / there’s a kitty | 0.3210187 | 0.4669517 | syntax |
| turn on light / turn on light so I can see | 0.0961263 | 0.2948172 | syntax |
| want cookies / want cookies and milk | 0.2048063 | 0.4036324 | syntax |
| want more juice / want juice in there | 0.2051650 | 0.4038946 | syntax |
| we made this / me and Paul made this | 0.1603300 | 0.3669776 | syntax |
| where’s my dolly / where’s my dolly name Sam | 0.1093974 | 0.3121932 | syntax |
| where mommy go / where did mommy go | 0.1822095 | 0.3860863 | syntax |
| you fix it / can you fix it | 0.1025825 | 0.3034672 | syntax |
Let's also check the distribution of means for each category:
Complex %>%
group_by(definition) %>%
summarise(
mean=mean(out),
category = first(complexity_category)
) %>%
ggplot(aes(x=category, y= mean)) + geom_boxplot()
Prepare the data set for IRT modeling. I'm going to rename each item to its category (morphology or syntax) plus its item number, to make some graphs easier to read.
Complex_short_with_ids <- Complex %>%
dplyr::select(data_id, value, complexity_category, num_item_id) %>%
mutate(
out = ifelse(value=="complex", yes=1, no=0),
label = str_c(complexity_category, num_item_id)
) %>%
dplyr::select(-value) %>%
pivot_wider(id_cols=data_id, names_from = "label", values_from="out") # dataset for IRT can't have IDs
Complex_short <- Complex_short_with_ids %>%
dplyr::select(-data_id)
Prior to fitting the IRT model, it’s worth looking at the correlation structure of the data to get a sense of the possible dimensionality.
Create a tetrachoric correlation matrix (appropriate for binary data). I also create a vector of short labels for the correlation plot.
Complex_poly <- tetrachoric(Complex_short)
rho <- Complex_poly$rho
lab <- as.character(1:37) # short labels for corPlot; the full definitions take up too much space
Check out the correlations
corPlot(rho, labels=lab)
They look really high. Let's examine the dimensionality using princals, which is basically a version of PCA that is appropriate for categorical data.
pc <- princals(rho)
plot(pc)
Interesting. Seems like maybe there are 2 factors. An alternative approach is a scree plot with confidence intervals from simulated data:
fa.parallel(rho, fa="fa", cor="tet", n.obs = 2788)
## Parallel analysis suggests that the number of factors = 5 and the number of components = NA
The formal criterion suggests 5 factors, but the scree plot shows one very large factor and a small second one. A third metric is very simple structure (VSS). This tells us the best solution assuming different values of k, where k is the maximum number of latent variables an observed variable loads on.
vss(rho, cor="tet", n.obs = 2788)
##
## Very Simple Structure
## Call: vss(x = rho, n.obs = 2788, cor = "tet")
## VSS complexity 1 achieves a maximimum of 1 with 1 factors
## VSS complexity 2 achieves a maximimum of 1 with 2 factors
##
## The Velicer MAP achieves a minimum of 0.02 with 4 factors
## BIC achieves a minimum of 17127.14 with 8 factors
## Sample Size adjusted BIC achieves a minimum of 18391.72 with 8 factors
##
## Statistics by number of factors
## vss1 vss2 map dof chisq prob sqresid fit RMSEA BIC SABIC complex eChisq
## 1 1.00 0.00 0.026 629 41190 0 2.18 1 0.15 36200 38199 1.0 2778
## 2 0.47 1.00 0.018 593 33101 0 1.37 1 0.14 28397 30281 1.8 1248
## 3 0.37 0.81 0.017 558 29519 0 1.08 1 0.14 25092 26865 2.5 826
## 4 0.36 0.74 0.017 524 26605 0 0.93 1 0.13 22448 24113 2.9 630
## 5 0.38 0.76 0.017 491 24433 0 0.81 1 0.13 20538 22098 3.1 489
## 6 0.37 0.71 0.018 459 22595 0 0.72 1 0.13 18954 20412 3.3 393
## 7 0.37 0.72 0.019 428 21444 0 0.64 1 0.13 18048 19408 3.3 331
## 8 0.39 0.71 0.020 398 20285 0 0.57 1 0.13 17127 18392 3.4 282
## SRMR eCRMS eBIC
## 1 0.0274 0.028 -2212
## 2 0.0183 0.019 -3457
## 3 0.0149 0.016 -3600
## 4 0.0130 0.015 -3527
## 5 0.0115 0.013 -3407
## 6 0.0103 0.012 -3248
## 7 0.0094 0.012 -3064
## 8 0.0087 0.011 -2875
Very simple structure also suggests one factor if we assume each variable loads on only one factor; if we allow each variable to load on two factors, a two-factor solution is best. These tests seem to suggest that a single factor is probably the best approach, though it's possible a two-factor solution would be better.
Let's first fit the 2PL model and examine the output.
m1 <- mirt(Complex_short, 1, itemtype="2PL", verbose=FALSE)
m1
##
## Call:
## mirt(data = Complex_short, model = 1, itemtype = "2PL", verbose = FALSE)
##
## Full-information item factor analysis with 1 factor(s).
## Converged within 1e-04 tolerance after 197 EM iterations.
## mirt version: 1.33.2
## M-step optimizer: BFGS
## EM acceleration: Ramsay
## Number of rectangular quadrature: 61
## Latent density type: Gaussian
##
## Log-likelihood = -25839.34
## Estimated parameters: 74
## AIC = 51826.68; AICc = 51830.77
## BIC = 52265.72; SABIC = 52030.6
## G2 (1e+10) = 24307.36, p = 1
## RMSEA = 0, CFI = NaN, TLI = NaN
summary(m1)
## F1 h2
## morphology761 0.930 0.866
## morphology762 0.899 0.808
## morphology763 0.904 0.818
## morphology764 0.923 0.852
## morphology765 0.937 0.879
## morphology766 0.881 0.776
## morphology767 0.904 0.817
## morphology768 0.917 0.840
## morphology769 0.915 0.838
## morphology770 0.937 0.878
## morphology771 0.952 0.906
## morphology772 0.927 0.859
## syntax773 0.958 0.918
## syntax774 0.938 0.879
## syntax775 0.948 0.899
## syntax776 0.952 0.907
## syntax777 0.953 0.908
## syntax778 0.942 0.887
## syntax779 0.897 0.805
## syntax780 0.953 0.908
## syntax781 0.933 0.870
## syntax782 0.934 0.872
## syntax783 0.956 0.913
## syntax784 0.949 0.901
## syntax785 0.969 0.938
## syntax786 0.967 0.934
## syntax787 0.960 0.922
## syntax788 0.940 0.884
## syntax789 0.937 0.878
## syntax790 0.969 0.939
## syntax791 0.973 0.947
## syntax792 0.947 0.896
## syntax793 0.966 0.934
## syntax794 0.928 0.862
## syntax795 0.912 0.832
## syntax796 0.967 0.934
## syntax797 0.953 0.908
##
## SS loadings: 32.613
## Proportion Var: 0.881
##
## Factor correlations:
##
## F1
## F1 1
Let’s check the overall fit of the model. The chi square statistic here isn’t super meaningful given the sample size.
M2(m1)
RMSEA and CFI are good; SRMSR is a bit high but pretty reasonable. We can also check whether individual items misfit, i.e., whether the observed cell counts differ from the expected cell counts.
itemfit(m1)
A few items significantly misfit. Let's take everything with a p value less than .10 and look at item GAM plots for those, which plot the item response probability against the latent variable using a GAM, to look for deviations from the logistic item-response function.
misfit <- itemfit(m1) %>% # Get labels of mis-fitting items.
filter(p.S_X2 <= .10) %>%
dplyr::select(item) %>%
as.vector()
items_good <- dplyr::select(Complex_short, -all_of(misfit$item)) # Well fitting items
items_bad <- dplyr::select(Complex_short, all_of(misfit$item)) # Poorly fitting items
mod_fit <- mirt(items_good, 1, "2PL", verbose=FALSE) # Calculate factor scores using only the well fitting items.
Theta <- fscores(mod_fit)
# Plot non-linear item response theory curves with the factor scores
IG762 <- itemGAM(items_bad$morphology762,Theta)
IG764 <- itemGAM(items_bad$morphology764,Theta)
IG769<- itemGAM(items_bad$morphology769,Theta)
IG771 <- itemGAM(items_bad$morphology771,Theta)
IG783 <- itemGAM(items_bad$syntax783,Theta)
IG792 <- itemGAM(items_bad$syntax792,Theta)
Here are the morphology items:
plot(IG762); plot(IG764); plot(IG769); plot(IG771)
And the syntax items:
plot(IG783); plot(IG792)
These don't look like massive deviations. They are all monotonic, certainly, but they seem to level off sooner than the logistic function would.
Next let's look at the relationship between the estimated true scores (the factor scores) and the raw scores.
true_raw <- Complex_short %>%
mutate(
Raw = rowSums(.[,1:37]),
Theta = fscores(m1, method="MAP")
) %>%
ggplot(aes(x=Theta, y=Raw)) + geom_point() + stat_smooth(method="loess") + theme_minimal()
raw_score <- Complex_short %>%
mutate(
Raw = rowSums(.[,1:37])
) %>%
ggplot(aes(x=Raw)) + geom_histogram() + theme_minimal()
true_score <- Complex_short %>%
mutate(
Theta = fscores(m1, method="MAP")
) %>%
ggplot(aes(x=Theta)) + geom_histogram() + theme_minimal()
library(patchwork)
true_raw/(true_score + raw_score)
## `geom_smooth()` using formula 'y ~ x'
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Two things suggest this is a more precise measurement: the relationship between the true and raw scores is sigmoid-ish, and there are multiple values of the true score at every level of the observed score. Also, note the pile of 0s on the raw score and the really high scores on the true score.
Next let's take a look at the difficulty and discrimination parameters for each item.
coefs_2pl <- coef(m1, as.data.frame = TRUE) %>%
  t() %>%
  as_tibble() %>%
  dplyr::select(-c(149:150)) %>% # drop the trailing group-parameter columns
  pivot_longer(everything()) %>%
  separate(col = name, into = c("item", "parameter"), sep = "([.])") %>%
  pivot_wider(id_cols = item, names_from = parameter, values_from = value) %>%
  dplyr::select(item, a1, d) %>%
  mutate(
    category = gsub('[[:digit:]]+', '', item)
  )
ggplot(coefs_2pl,
aes(x = a1, y = d)) +
geom_point(alpha = .3) +
ggrepel::geom_text_repel(data = coefs_2pl,
aes(label = item), size = 3) +
xlab("Discrimination") +
ylab("Difficulty") + theme_minimal()
ggplot(coefs_2pl, aes(x=a1, fill=category)) + geom_histogram() + theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(coefs_2pl, aes(x=d, fill=category)) + geom_histogram() + theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
There's a strong correlation between the difficulty and discrimination parameters. Also note that, in general, the morphology items are less difficult than the syntax ones.
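As a quick numeric check on that claim (keeping in mind that mirt's d is an intercept, so its sign runs opposite to difficulty):
cor(coefs_2pl$a1, coefs_2pl$d)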
It's important to remember that these latent variables are only interpretable if the following conditions hold:
1. The assumption of unidimensionality is appropriate.
2. The assumed distributional form of the latent variable is correct.
3. Measurement invariance holds: the items behave the same way across different groups of participants.
I can test 1 and 2 on my own. 3 is a bit of a stretch for me right now, though I sketch a possible approach below.
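For reference, here's a hedged sketch (not run) of how assumption 3 could be probed with mirt's multipleGroup, assuming we had a grouping variable, say sex from the administration data, aligned with the rows of Complex_short:
# grp <- factor(sex_per_row) # hypothetical grouping vector, one value per row
# mg_free <- multipleGroup(Complex_short, 1, itemtype = "2PL", group = grp)
# mg_equal <- multipleGroup(Complex_short, 1, itemtype = "2PL", group = grp,
#                           invariance = c("slopes", "intercepts", "free_means", "free_var"))
# anova(mg_equal, mg_free) # a big drop in fit would flag non-invariance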
Let's look at the distribution of the latent variable first. mirt allows us to relax the assumption of a normally distributed latent variable, at some cost in parsimony and stability. I'll look at a model that makes no parametric assumptions about the form of the latent variable (note to self: check this later), and compare results to those above.
m1_nonpar <- mirt(Complex_short, 1, itemtype="2PL", dentype = "EH", verbose=FALSE)
M2(m1_nonpar)
RMSEA is a bit lower than in the model above. The standardized residuals are quite large, though.
Complex_short %>%
mutate(
Theta_EH = fscores(m1_nonpar, method="MAP"),
Theta_gauss = fscores(m1, method="MAP")
) %>%
ggplot(aes(x=Theta_EH, y=Theta_gauss)) + geom_point() + stat_smooth(method="loess") + theme_minimal()
## Warning: The following factor score estimates failed to converge successfully:
## 1907
## `geom_smooth()` using formula 'y ~ x'
These are quite similar, though a bit different in the tails. mirt offers another non-parametric density type (dentype = "EHW").
m1_nonpar2 <- mirt(Complex_short, 1, itemtype="2PL", dentype = "EHW", verbose=FALSE)
M2(m1_nonpar2)
Complex_short %>%
mutate(
Theta_EH2 = fscores(m1_nonpar2, method="MAP"),
Theta_gauss = fscores(m1, method="MAP")
) %>%
ggplot(aes(x=Theta_EH2, y=Theta_gauss)) + geom_point() + stat_smooth(method="loess") + theme_minimal()
## Warning: The following factor score estimates failed to converge successfully:
## 641
## `geom_smooth()` using formula 'y ~ x'
These are also very, very close. I think it's sensible to stick with the Gaussian model: the fit is comparable, it's simpler, and the factor scores are really, really similar across the approaches.
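To put a number on "really similar", we can correlate the two sets of factor scores directly:
cor(fscores(m1, method = "MAP"), fscores(m1_nonpar2, method = "MAP"))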
The next thing we can do is check the dimensionality, since I got mixed results before; let's check the one-dimensional model against a two-dimensional one. First, a formal test of dimensionality, which requires fitting the model with ltm. unidimTest takes forever to run and produces enormous output, so it's commented out.
#m1_ltm <- ltm(Complex_short ~z1)
#unidimTest(m1_ltm, Complex_short)
When I ran it, the test rejected uni-dimensionality. Let's try explicitly modeling the multidimensionality with an exploratory two-dimensional IRT model.
m2 <- mirt(Complex_short, 2, itemtype="2PL", verbose=FALSE)
M2(m2)
anova(m1, m2)
##
## Model 1: mirt(data = Complex_short, model = 1, itemtype = "2PL", verbose = FALSE)
## Model 2: mirt(data = Complex_short, model = 2, itemtype = "2PL", verbose = FALSE)
summary(m2, "oblimin", suppress=.20)
##
## Rotation: oblimin
##
## Rotated factor loadings:
##
## F1 F2 h2
## morphology761 -0.898 -0.229 0.892
## morphology762 -0.873 NA 0.820
## morphology763 -0.856 -0.334 0.890
## morphology764 -0.902 NA 0.855
## morphology765 -0.925 NA 0.869
## morphology766 -0.863 NA 0.774
## morphology767 -0.878 NA 0.827
## morphology768 -0.901 NA 0.839
## morphology769 -0.876 -0.292 0.894
## morphology770 -0.922 NA 0.887
## morphology771 -0.937 NA 0.921
## morphology772 -0.911 NA 0.869
## syntax773 -0.950 NA 0.911
## syntax774 -0.920 NA 0.888
## syntax775 -0.940 NA 0.886
## syntax776 -0.948 NA 0.896
## syntax777 -0.946 NA 0.896
## syntax778 -0.937 NA 0.874
## syntax779 -0.890 NA 0.787
## syntax780 -0.938 NA 0.911
## syntax781 -0.932 NA 0.864
## syntax782 -0.924 NA 0.860
## syntax783 -0.955 NA 0.905
## syntax784 -0.947 NA 0.892
## syntax785 -0.966 NA 0.929
## syntax786 -0.968 NA 0.933
## syntax787 -0.964 NA 0.936
## syntax788 -0.943 0.217 0.902
## syntax789 -0.936 NA 0.873
## syntax790 -0.971 NA 0.937
## syntax791 -0.973 NA 0.942
## syntax792 -0.947 NA 0.891
## syntax793 -0.968 NA 0.931
## syntax794 -0.927 NA 0.856
## syntax795 -0.910 NA 0.830
## syntax796 -0.967 NA 0.932
## syntax797 -0.946 0.216 0.907
##
## Rotated SS loadings: 31.937 0.727
##
## Factor correlations:
##
## F1 F2
## F1 1.000 0.082
## F2 0.082 1.000
This model fits better, but it doesn't look like the second factor is adding much of anything. Also, the correlation between factors is really small, which seems unlikely given the way language works.
Plot the discrimination parameters for each dimension against one another.
coefs_2plmulti <- coef(m2, as.data.frame = TRUE) %>%
  t() %>%
  as_tibble() %>%
  dplyr::select(-c(185:190)) %>% # drop the trailing group-parameter columns
  pivot_longer(everything()) %>%
  separate(col = name, into = c("item", "parameter"), sep = "([.])") %>%
  pivot_wider(id_cols = item, names_from = parameter, values_from = value) %>%
  dplyr::select(item, a1, a2, d) %>%
  mutate(
    category = gsub('[[:digit:]]+', '', item)
  )
ggplot(coefs_2plmulti ,
aes(x = a1, y = a2)) +
geom_point(alpha = .3) +
ggrepel::geom_text_repel(data =coefs_2plmulti,
aes(label = item), size = 3) +
xlab("Discrimination1") +
ylab("Discrimination2") + theme_minimal()
Now there are two discrimination parameters per item, and they're both negative. I wonder whether these are parameterized a little differently, such that they mean something different in the two-dimensional context.
Plot the latent variables against the raw scores and against one another.
Complex_short %>%
mutate(
Raw = rowSums(.[,1:37]),
Theta1 = fscores(m2, method="MAP")[,1],
Theta2 = fscores(m2, method="MAP")[,2]
) %>%
dplyr::select(Raw, Theta1, Theta2) %>%
ggpairs()
## Warning: The following factor score estimates failed to converge successfully:
## 1263,1273
## Warning: The following factor score estimates failed to converge successfully:
## 1263,1273
The correlation between the raw scores and Theta1 is extremely high, whereas the correlation between Theta2 and the raw score is minimal; Theta2 also has a much smaller range. There's also a weak correlation between the two sets of factor scores.
Another way to think about this problem is that perhaps the morphology and syntax items are measuring different proficiencies. We can fit a confirmatory IRT model, with one morphology factor and one syntax factor.
model.1 <- mirt.model('
F1 = 1 - 12
F2 = 13 - 37
COV=F1*F2')
m3 <- mirt(Complex_short, model.1, "2PL", verbose=FALSE)
M2(m3)
summary(m3)
## F1 F2 h2
## morphology761 0.933 0.000 0.870
## morphology762 0.891 0.000 0.795
## morphology763 0.901 0.000 0.812
## morphology764 0.913 0.000 0.834
## morphology765 0.919 0.000 0.844
## morphology766 0.863 0.000 0.744
## morphology767 0.892 0.000 0.796
## morphology768 0.904 0.000 0.818
## morphology769 0.913 0.000 0.833
## morphology770 0.934 0.000 0.872
## morphology771 0.958 0.000 0.917
## morphology772 0.922 0.000 0.851
## syntax773 0.000 0.939 0.882
## syntax774 0.000 0.907 0.824
## syntax775 0.000 0.925 0.856
## syntax776 0.000 0.931 0.868
## syntax777 0.000 0.930 0.865
## syntax778 0.000 0.916 0.840
## syntax779 0.000 0.862 0.744
## syntax780 0.000 0.931 0.866
## syntax781 0.000 0.911 0.829
## syntax782 0.000 0.907 0.823
## syntax783 0.000 0.936 0.876
## syntax784 0.000 0.926 0.857
## syntax785 0.000 0.951 0.904
## syntax786 0.000 0.952 0.907
## syntax787 0.000 0.947 0.897
## syntax788 0.000 0.922 0.850
## syntax789 0.000 0.912 0.832
## syntax790 0.000 0.956 0.915
## syntax791 0.000 0.962 0.925
## syntax792 0.000 0.927 0.859
## syntax793 0.000 0.952 0.905
## syntax794 0.000 0.904 0.817
## syntax795 0.000 0.883 0.780
## syntax796 0.000 0.950 0.902
## syntax797 0.000 0.932 0.869
##
## SS loadings: 9.985 21.49
## Proportion Var: 0.27 0.581
##
## Factor correlations:
##
## F1 F2
## F1 1.000 0.959
## F2 0.959 1.000
The fit is pretty good, but the correlation between the latent variables is very high: .959.
Let's compare the fit to the one-dimensional model and the two-dimensional exploratory model.
anova(m1, m3)
##
## Model 1: mirt(data = Complex_short, model = 1, itemtype = "2PL", verbose = FALSE)
## Model 2: mirt(data = Complex_short, model = model.1, itemtype = "2PL",
## verbose = FALSE)
anova(m2, m3)
##
## Model 1: mirt(data = Complex_short, model = model.1, itemtype = "2PL",
## verbose = FALSE)
## Model 2: mirt(data = Complex_short, model = 2, itemtype = "2PL", verbose = FALSE)
It fits better than the one-dimensional model, but worse than the two-dimensional exploratory model. Let's try plotting each of the proficiencies against the raw data.
Complex_short %>%
mutate(
Raw = rowSums(.[,1:37]),
Theta1 = fscores(m3, method="MAP")[,1],
Theta2 = fscores(m3, method="MAP")[,2]
) %>%
dplyr::select(Raw, Theta1, Theta2) %>%
ggpairs()
## Warning: The following factor score estimates failed to converge successfully:
## 32,55,89,186,197,382,413,418,516,522,545,571,588,633,653,677,691,718,739,774,889,917,1060,1100,1190,1328,1697,1732,1881,1928,1955,2010,2098,2121,2163,2276,2294,2303,2331,2359,2490,2517,2532,2580,2610,2710,2721,2752,2760
## Warning: The following factor score estimates failed to converge successfully:
## 32,55,89,186,197,382,413,418,516,522,545,571,588,633,653,677,691,718,739,774,889,917,1060,1100,1190,1328,1697,1732,1881,1928,1955,2010,2098,2121,2163,2276,2294,2303,2331,2359,2490,2517,2532,2580,2610,2710,2721,2752,2760
The correlation between the morphology latent variable and the syntax latent variable is about .99. They show the same results, more or less.
by_Participant <-Complex_short_with_ids %>%
mutate(
Theta = fscores(m1, method="MAP")[,1],
Raw = rowSums(.[2:38])
) %>%
dplyr::select(data_id, Raw, Theta) %>%
full_join(Complex) %>%
group_by(data_id) %>%
slice(1)
## Joining, by = "data_id"
theta_by_age <- by_Participant %>%
ggplot(aes(x=age, y=Theta)) + geom_point() + stat_smooth(method="loess")
raw_by_age <- by_Participant %>%
ggplot(aes(x=age, y=Raw)) + geom_point() + stat_smooth(method="loess")
theta_by_age | raw_by_age
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
So, that's really interesting. Once I've accounted for the measurement error, the effect of age on grammatical complexity is pretty linear, whereas with the raw scores it's clearly non-linear.
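One hedged way to check that impression, assuming the mgcv package is available: fit a penalized smooth alongside the linear model and see whether the smooth buys anything.
library(mgcv) # assuming mgcv is installed
lin <- lm(Theta ~ age, data = by_Participant)
smo <- gam(Theta ~ s(age), data = by_Participant)
AIC(lin, smo) # similar AICs suggest the smooth adds little over a line
summary(smo)$edf # effective df near 1 also indicates near-linearity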
theta_by_prod <- by_Participant %>%
ggplot(aes(x=production, y=Theta)) + geom_point() + stat_smooth(method="loess")
raw_by_prod <- by_Participant %>%
ggplot(aes(x=production, y=Raw)) + geom_point() + stat_smooth(method="loess")
theta_by_prod | raw_by_prod
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
summary(lm(Theta ~ production, data=by_Participant))
##
## Call:
## lm(formula = Theta ~ production, data = by_Participant)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.82923 -0.20034 -0.00348 0.20719 1.44997
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.216e-01 1.203e-02 -68.31 <2e-16 ***
## production 3.336e-03 3.483e-05 95.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3756 on 2786 degrees of freedom
## Multiple R-squared: 0.7671, Adjusted R-squared: 0.7671
## F-statistic: 9179 on 1 and 2786 DF, p-value: < 2.2e-16
Sys.Date()
## [1] "2021-06-29"
R.Version()
## $platform
## [1] "x86_64-w64-mingw32"
##
## $arch
## [1] "x86_64"
##
## $os
## [1] "mingw32"
##
## $system
## [1] "x86_64, mingw32"
##
## $status
## [1] ""
##
## $major
## [1] "4"
##
## $minor
## [1] "0.2"
##
## $year
## [1] "2020"
##
## $month
## [1] "06"
##
## $day
## [1] "22"
##
## $`svn rev`
## [1] "78730"
##
## $language
## [1] "R"
##
## $version.string
## [1] "R version 4.0.2 (2020-06-22)"
##
## $nickname
## [1] "Taking Off Again"