This is a scratch pad. I’m playing around with the mirt package to try to identify alternative models to the 2PL. Jay pointed out that the correlation between discrimination and difficulty parameters in the 2PL model may be evidence of a misfitting model. The form of the link function will strongly affect estimation of extreme values of the latent variable. Here, I try some alternative models.
I’ve excluded all the preliminary data processing steps from this file, and will move on to the models.
The model I have been fitting to the syntactic complexity data is the 2PL model.
m1 <- mirt(Complex_short_grammatical, 1, itemtype="2PL", verbose=FALSE)
However, the pattern of discrimination and difficulty parameters here is a little worrisome.
coefs_2pl <- coef(m1, as.data.frame = TRUE) %>%
t() %>%
as_tibble() %>%
dplyr::select(-c(149:150)) %>% # drop the group mean and variance columns
pivot_longer(everything()) %>%
tidyr::separate(col=name, into=c("item", "parameter"), sep="([.])") %>%
pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
dplyr::select(item, a1, d) %>%
mutate(
category = gsub('[[:digit:]]+', '', item)
)
ggplot(coefs_2pl,
aes(x = a1, y = -d)) +
geom_point(alpha = .3) +
ggrepel::geom_text_repel(data = coefs_2pl,
aes(label = item), size = 3) +
xlab("Discrimination") +
ylab("Difficulty") + theme_minimal()
Two things stand out. First, the discrimination parameters are extremely large. According to de Ayala (2009), good discrimination parameters fall between .8 and 2.5. The parameters here are much larger, suggesting the items will only discriminate between people within a relatively narrow range of the syntactic proficiency scale. This can be seen in the item response functions below (shown for four items):
itemplot(m1, 1)
itemplot(m1, 10)
itemplot(m1, 17)
itemplot(m1, 30)
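Another way to see this: for the 2PL, the item information function peaks at the item’s difficulty and, when discrimination is large, falls off quickly on either side. A quick sketch of where item 1 is most informative, using mirt’s extract.item() and iteminfo():
Theta <- matrix(seq(-4, 4, by = .1))
info1 <- iteminfo(extract.item(m1, 1), Theta) # information curve for item 1
Theta[which.max(info1)] # theta at which item 1 is most informative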
Second, there is a correlation between the difficulty and discrimination parameters, which could indicate an estimation issue.
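As a quick numeric check on that (using the coefs_2pl tibble from above):
cor(coefs_2pl$a1, -coefs_2pl$d) # correlation between the plotted discrimination and difficulty values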
The way I see it, two problems could be producing the patterns above: the IRF could be incorrect, or the assumed distribution of the latent variable could be incorrect. Below I compare the fit of this model to alternative models.
One alternative is the complementary log-log (cloglog) link function, \(F\left(\eta_{i}\right)=1-e^{-e^{\eta_{i}}}\), where \(\eta_{i}\) is the linear predictor.
Let’s write a function to compute it so we can compare it to the logistic link.
cloglog <- function(a, b, th){ # I broke it into three parts, linear prediction, inner exp and outer exp
lin = a*(th-b) # linear predictor
inner = -exp(lin) # exp inside
outer = -exp(inner) # exp outside
pi = 1+outer # combine together
return(pi)
}
cloglog(5, 2, 2)
## [1] 0.6321206
And now the logistic link function
logit <- function(a, b, th){ # same three parts: linear predictor, numerator, denominator
lin = a*(th - b) # linear predictor
top = exp(lin) # numerator
bottom = 1 + top # denominator
pi = top/bottom # combine together
return(pi)
}
logit(5, 2, 2)
## [1] 0.5
Let’s plot the two against each other
tibble(
theta = rnorm(1000, 0, 1)
) %>%
mutate(
pi_clog = cloglog(1, .5, theta),
pi_logit = logit(1, .5, theta)
) %>%
pivot_longer(cols=starts_with("pi"), names_to="link", names_prefix="pi_", values_to="pi") %>%
ggplot(aes(x = theta, y=pi, colour=link)) + geom_point()
So the two make similar predictions at the lowest levels of theta, but not at higher values: the logit link function is symmetric, but the cloglog is not.
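A quick numeric check of that asymmetry, using the two functions above:
logit(1, 0, c(-2, 2)) # symmetric around theta = b: the two probabilities sum to 1
cloglog(1, 0, c(-2, 2)) # asymmetric: approaches 1 much faster than it approaches 0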
We can use the cloglog link in mirt via the createItem function. I’ve had a lot of trouble with this, however: the model converges in 1-3 iterations, sometimes returning the starting values and other times returning an estimated model that fits poorly. I wonder if I’ve used the function incorrectly; I asked a question about it on the mirt Google Group. I found that with the starting values below the model converges after a couple of iterations, so I’m using it to compare theta estimates and IRFs to those from the other models, but I’m not confident in this approach.
name <- 'cloglog'
par <- c(a = .5, b = 2)
est <- c(TRUE, TRUE)
P.cloglog <- function(par, Theta, ncat){ # probability function for the custom cloglog item
a <- par[1] # discrimination
b <- par[2] # difficulty
P1 <- 1 - exp(-exp(a*(Theta - b))) # P(correct) under the cloglog link
cbind(1 - P1, P1) # category probabilities: incorrect, correct
}
clog <- createItem(name, par=par, est=est, P=P.cloglog)
m2 <- mirt(Complex_short_grammatical, 1, "cloglog", customItems=list(cloglog=clog), verbose=FALSE)
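To make the convergence problem concrete, the iteration count and convergence flag can be pulled from the fitted object (assuming I’m remembering the extract.mirt extractors correctly):
extract.mirt(m2, 'iterations') # number of EM cycles actually run
extract.mirt(m2, 'converged') # whether the optimizer reported convergence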
coefs_2pl <- coef(m2, as.data.frame = TRUE) %>%
t() %>%
as_tibble() %>%
dplyr::select(-c(75:76)) %>% # drop the group mean and variance columns
pivot_longer(everything()) %>%
tidyr::separate(col=name, into=c("item", "parameter"), sep="([.])") %>%
pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
dplyr::select(item, a, b) %>%
mutate(
category = gsub('[[:digit:]]+', '', item)
)
ggplot(coefs_2pl,
aes(x = a, y = b)) +
geom_point(alpha = .3) +
ggrepel::geom_text_repel(data = coefs_2pl,
aes(label = item), size = 3) +
xlab("Discrimination") +
ylab("Difficulty") + theme_minimal()
Given that this model ran through only 2 iterations, I think it’s giving me the starting values back. The discrimination parameters are smaller, but I suspect that’s because the model never left its starting values. There’s also still a bit of correlation between the two parameters. And many, many of the IRFs significantly misfit the data, which was not the case with the 2PL model.
itemfit(m2)
Since the cloglog model won’t estimate properly, I’m going to try a couple of semi-parametric models. These can serve as a useful baseline against which to compare the 2PL and cloglog models, since they will fit the data quite well at the cost of interpretable parameters.
The spline models start with 4 knots. I’ve increased NCYCLES to 4000 and the model still doesn’t quite reach the convergence criterion, but the remaining deviation is quite small.
m3 <- mirt(Complex_short_grammatical, 1, "spline", technical=list(NCYCLES=4000), verbose=FALSE)
## EM cycles terminated after 4000 iterations.
I’ve also tried a monopoly (monotonic polynomial) model with more knots than the one above. This takes a long time to run, so I’ve saved the fitted model and re-opened it.
#m4 <- mirt(Complex_short_grammatical, 1, "monopoly", monopoly.k=3)
#saveRDS(m4, "test_models/m4.rds")
m4 <- readRDS("test_models/m4.rds")
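For reference, a slightly more robust version of that caching pattern, so the fit only runs when the cached file is missing (a sketch using the same path and settings as above):
if (!file.exists("test_models/m4.rds")) {
m4 <- mirt(Complex_short_grammatical, 1, "monopoly", monopoly.k=3) # slow fit, run once
saveRDS(m4, "test_models/m4.rds")
}
m4 <- readRDS("test_models/m4.rds")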
Now we can compare these models to the 2PL and cloglog models. I’ll start by looking at the correlations between factor scores, comparing theta (and the raw scores) across the four models.
thetas <- Complex_short_grammatical %>%
mutate(
Raw = rowSums(.[,1:37]),
Theta_logit = fscores(m1, method="EAP")[,1],
Theta_clog = fscores(m2, method="EAP")[,1],
Theta_spline = fscores(m3, method="EAP")[,1],
Theta_mono = fscores(m4, method="EAP")[,1]
)
ggpairs(thetas[,38:42])
The factor scores from the 2PL model correlate at .997 with those from both of the semi-parametric models, suggesting the extra flexibility in those models’ IRFs isn’t adding very much. Also note the really high correlation between the cloglog scores and the raw scores, suggesting the cloglog model probably isn’t estimating the latent variable at all.
Next we can look at the IRFs for each of these models, to see if the steepness of the IRFs for the 2PL is due to a misfitting link function.
mods <- list(twoPL = m1, clog = m2, spline = m3, mono=m4)
itemplot(mods,1)
itemplot(mods,6)
itemplot(mods,11)
itemplot(mods,16)
itemplot(mods,21)
itemplot(mods,26)
itemplot(mods,31)
itemplot(mods,36)
For all of these items, the two semi-parametric models produce IRFs that are very similar to the 2PL’s, apart from some semi-parametric silliness (spikes at the extreme values, etc.).
Another possibility is that the Gaussian latent variable is inappropriate. We can estimate a non-normal distribution using three methods available in mirt.
First, the empirical histogram.
m5 <- mirt(Complex_short_grammatical, 1, "2PL", dentype="EH", verbose=FALSE)
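It’s also worth looking at the latent density this model actually estimated. If I’m remembering the mirt plotting API correctly, models fit with dentype="EH" support an empirical-histogram plot type:
plot(m5, type = 'empiricalhist') # estimated latent distribution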
coefs_2pl <- coef(m5, as.data.frame = TRUE) %>%
t() %>%
as_tibble() %>%
dplyr::select(-c(149:150)) %>% # drop the group mean and variance columns
pivot_longer(everything()) %>%
tidyr::separate(col=name, into=c("item", "parameter"), sep="([.])") %>%
pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
dplyr::select(item, a1, d) %>%
mutate(
category = gsub('[[:digit:]]+', '', item)
)
ggplot(coefs_2pl,
aes(x = a1, y = -d)) +
geom_point(alpha = .3) +
ggrepel::geom_text_repel(data = coefs_2pl,
aes(label = item), size = 3) +
xlab("Discrimination") +
ylab("Difficulty") + theme_minimal()
The discrimination parameters look much more reasonable, though they are still correlated with the difficulty parameters.
Let’s try the weighted empirical histogram. I’ve set zeroExtreme to TRUE because there are so many participants with raw scores of 0.
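A quick count of those all-zero response patterns (columns 1 to 37 are the items, as in the rowSums() calls elsewhere in this file):
sum(rowSums(Complex_short_grammatical[,1:37]) == 0) # participants who got every item wrong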
m6 <- mirt(Complex_short_grammatical, 1, dentype="EHW", technical = list(zeroExtreme = TRUE), verbose=FALSE)
coefs_2pl <- coef(m6, as.data.frame = TRUE) %>%
t() %>%
as_tibble() %>%
dplyr::select(-c(149:150)) %>% # drop the group mean and variance columns
pivot_longer(everything()) %>%
tidyr::separate(col=name, into=c("item", "parameter"), sep="([.])") %>%
pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
dplyr::select(item, a1, d) %>%
mutate(
category = gsub('[[:digit:]]+', '', item)
)
ggplot(coefs_2pl,
aes(x = a1, y = -d)) +
geom_point(alpha = .3) +
ggrepel::geom_text_repel(data = coefs_2pl,
aes(label = item), size = 3) +
xlab("Discrimination") +
ylab("Difficulty") + theme_minimal()
This compresses the difficulty and discrimination parameters a bit more.
Next, let’s try semi-parametric Davidian curves.
m7 <- mirt(Complex_short_grammatical, 1, "2PL", dentype="Davidian-4", verbose=FALSE)
coefs_2pl <- coef(m7, as.data.frame = TRUE) %>%
t() %>%
as_tibble() %>%
dplyr::select(-c(149:150)) %>% # drop the group mean and variance columns
pivot_longer(everything()) %>%
tidyr::separate(col=name, into=c("item", "parameter"), sep="([.])") %>%
pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
dplyr::select(item, a1, d) %>%
mutate(
category = gsub('[[:digit:]]+', '', item)
)
ggplot(coefs_2pl,
aes(x = a1, y = -d)) +
geom_point(alpha = .3) +
ggrepel::geom_text_repel(data = coefs_2pl,
aes(label = item), size = 3) +
xlab("Discrimination") +
ylab("Difficulty") + theme_minimal()
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_text_repel).
This makes the discrimination parameters even larger.
Again, I’ll compare theta across these models.
thetas <- Complex_short_grammatical %>%
mutate(
Raw = rowSums(.[,1:37]),
Theta_logit = fscores(m1, method="MAP")[,1], # Used MAP here, but EAP gives same results.
Theta_EH = fscores(m5, method="MAP")[,1],
Theta_EHW = fscores(m6, method="MAP")[,1],
Theta_Dav = fscores(m7, method="MAP")[,1]
)
## Warning: The following factor score estimates failed to converge successfully:
## 152
## Warning: The following factor score estimates failed to converge successfully:
## 97,804
ggpairs(thetas[,38:42])
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 542 rows containing missing values
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 542 rows containing missing values
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 542 rows containing missing values
## Warning: Removed 542 rows containing missing values (geom_point).
## Warning: Removed 542 rows containing missing values (geom_point).
## Warning: Removed 542 rows containing missing values (geom_point).
## Warning: Removed 542 rows containing non-finite values (stat_density).
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 542 rows containing missing values
## Warning: Removed 542 rows containing missing values (geom_point).
The correlations between the 2PL factor scores and those from the other models are at least .992.
Let’s look at the IRFs.
mods <- list(twoPL = m1, EH = m5, EHW = m6, Dav=m7)
itemplot(mods,1)
itemplot(mods,6)
itemplot(mods,11)
itemplot(mods,16)
itemplot(mods,21)
itemplot(mods,26)
itemplot(mods,31)
itemplot(mods,36)
The EH model’s IRFs rise a bit more slowly than the 2PL’s; EHW is in between, and the Davidian version is generally a bit steeper.
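As a rough numeric complement to these visual comparisons, mirt’s anova() method prints relative fit statistics (AIC, BIC, log-likelihood) for a pair of fitted models, e.g. the Gaussian 2PL against the EH version. Treat this as a sketch: comparing models that differ in how the latent density is estimated is only approximate.
anova(m1, m5) # relative fit of the Gaussian 2PL vs. the EH 2PL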
Finally, to check whether the extreme parameter estimates are being driven by the many participants with raw scores of 0, let’s refit the 2PL after excluding them.
no_0 <- Complex_short_grammatical %>%
mutate(
Raw = rowSums(.[,1:37])
) %>%
filter(Raw > 0) %>%
dplyr::select(-Raw)
m8 <- mirt(no_0, 1, "2PL", verbose=FALSE)
coefs_2pl <- coef(m8, as.data.frame = TRUE) %>%
t() %>%
as_tibble() %>%
dplyr::select(-c(149:150)) %>% # drop the group mean and variance columns
pivot_longer(everything()) %>%
tidyr::separate(col=name, into=c("item", "parameter"), sep="([.])") %>%
pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
dplyr::select(item, a1, d) %>%
mutate(
category = gsub('[[:digit:]]+', '', item)
)
ggplot(coefs_2pl,
aes(x = a1, y = -d)) +
geom_point(alpha = .3) +
ggrepel::geom_text_repel(data = coefs_2pl,
aes(label = item), size = 3) +
xlab("Discrimination") +
ylab("Difficulty") + theme_minimal()
# Summary
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252
## [3] LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C
## [5] LC_TIME=Dutch_Netherlands.1252
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] lordif_0.3-3 rms_6.2-0 SparseM_1.81 Hmisc_4.5-0
## [5] Formula_1.2-4 survival_3.2-11 GGally_2.1.2 patchwork_1.1.1
## [9] knitr_1.33 Gifi_0.3-9 psych_2.1.6 ltm_1.1-1
## [13] polycor_0.7-10 msm_1.6.8 MASS_7.3-54 mirt_1.34
## [17] lattice_0.20-44 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
## [21] purrr_0.3.4 readr_2.0.0 tidyr_1.1.3 tibble_3.1.3
## [25] ggplot2_3.3.5 tidyverse_1.3.1 wordbankr_0.3.1
##
## loaded via a namespace (and not attached):
## [1] TH.data_1.0-10 colorspace_2.0-2 ellipsis_0.3.2
## [4] htmlTable_2.2.1 base64enc_0.1-3 fs_1.5.0
## [7] rstudioapi_0.13 farver_2.1.0 Deriv_4.1.3
## [10] MatrixModels_0.5-0 ggrepel_0.9.1 fansi_0.5.0
## [13] mvtnorm_1.1-2 lubridate_1.7.10 xml2_1.3.2
## [16] codetools_0.2-18 splines_4.1.0 mnormt_2.0.2
## [19] jsonlite_1.7.2 broom_0.7.9 cluster_2.1.2
## [22] dbplyr_2.1.1 png_0.1-7 compiler_4.1.0
## [25] httr_1.4.2 backports_1.2.1 assertthat_0.2.1
## [28] Matrix_1.3-3 cli_3.0.1 htmltools_0.5.1.1
## [31] quantreg_5.86 tools_4.1.0 gtable_0.3.0
## [34] glue_1.4.2 Rcpp_1.0.7 cellranger_1.1.0
## [37] jquerylib_0.1.4 vctrs_0.3.8 nlme_3.1-152
## [40] conquer_1.0.2 xfun_0.24 rvest_1.0.1
## [43] lifecycle_1.0.0 dcurver_0.9.2 polspline_1.1.19
## [46] zoo_1.8-9 scales_1.1.1 hms_1.1.0
## [49] sandwich_3.0-1 parallel_4.1.0 expm_0.999-6
## [52] RColorBrewer_1.1-2 yaml_2.2.1 gridExtra_2.3
## [55] sass_0.4.0 rpart_4.1-15 reshape_0.8.8
## [58] latticeExtra_0.6-29 stringi_1.7.3 highr_0.9
## [61] RMySQL_0.10.22 checkmate_2.0.0 permute_0.9-5
## [64] rlang_0.4.11 pkgconfig_2.0.3 matrixStats_0.60.0
## [67] evaluate_0.14 labeling_0.4.2 htmlwidgets_1.5.3
## [70] quantregGrowth_1.3-0 tidyselect_1.1.1 plyr_1.8.6
## [73] magrittr_2.0.1 R6_2.5.0 generics_0.1.0
## [76] multcomp_1.4-17 DBI_1.1.1 pillar_1.6.2
## [79] haven_2.4.3 foreign_0.8-81 withr_2.4.2
## [82] mgcv_1.8-35 nnet_7.3-16 modelr_0.1.8
## [85] crayon_1.4.1 utf8_1.2.2 tmvnsim_1.0-2
## [88] tzdb_0.1.2 rmarkdown_2.9 jpeg_0.1-9
## [91] grid_4.1.0 readxl_1.3.1 data.table_1.14.0
## [94] vegan_2.5-7 reprex_2.0.1 digest_0.6.27
## [97] GPArotation_2014.11-1 munsell_0.5.0 bslib_0.2.5.1