This is a scratch pad. I’m playing around with MIRT to try to identify alternative models to the 2PL. Jay pointed out that the correlation between discrimination and difficulty parameters in the 2PL model may be evidence of a misfitting model. The form of the link function will strongly affect estimation of the extreme values of the latent variable. Here, I try some alternative models.

I’ve excluded all the preliminary data processing steps from this file, and will move on to the models.

1 2pl Model

The model I have been fitting to the syntactic complexity data is the 2PL model.

m1 <- mirt(Complex_short_grammatical, 1, itemtype="2PL", verbose=FALSE) 

However, the pattern of discrimination and difficulty parameters here is a little worrisome.

coefs_2pl <- coef(m1, as.data.frame = TRUE) %>% 
 t() %>%
  as_tibble() %>%
  dplyr::select(-c(149:150)) %>%
  pivot_longer(everything()) %>%
  tidyr::separate(, col=name, into=c("item", "parameter"), sep="([.])") %>%
  pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
  dplyr::select(item, a1, d) %>%
  mutate(
   category =  gsub('[[:digit:]]+', '', item)
  )

ggplot(coefs_2pl,  
       aes(x = a1, y = -d)) + 
  geom_point(alpha = .3) + 
  ggrepel::geom_text_repel(data = coefs_2pl, 
                  aes(label = item), size = 3) + 
  xlab("Discrimination") + 
  ylab("Difficulty") + theme_minimal()

Two things stand out: First, the discrimination parameters are extremely large. According to DeAyala 2009, good discrimination parameters are between .8 and 2.5. The parameters we have here, are much larger, suggesting they will only discriminate between people at a relatively narrow range of values on the scale of syntactic proficiency. This can be seen in the item response functions below (selected for 4 random items):

itemplot(m1, 1)
itemplot(m1, 10)
itemplot(m1, 17)
itemplot(m1, 30)

Second, there is a correlation between the difficulty and discrimination parameters, which could indicate an estimation issue.

The way I see it, there could be two problems leading to those patterns above: First, the IRF could be incorrect, and second, the form of the latent variable could be incorrect. Below I compare the fit of this model to alternative models.

2 Changing the IRF

2.1 cloglog

One alternative link function is the cloglog link function, written below \(F\left(\eta_{i}\right)=1-e^{-e^{\eta_{i}}}\) where \(\eta_{i}\) is the linear predictor.

Let’s make a function to plot it and compare it to the logistic distribution

cloglog <- function(a, b, th){ # I broke it into three parts, linear prediction, inner exp and outer exp
  lin = a*(th-b) # linear predictor
  inner = -exp(lin) # exp inside
  outer = -exp(inner) # exp outside
  pi = 1+outer # combine together
  return(pi)
}
cloglog(5, 2, 2)
## [1] 0.6321206

And now the logistic link function

logit<- function(a, b, th){# I also broke it into three parts, linear prediction, inner exp and outer exp
  lin = a*(th - b) # linear predictor
  top = exp(lin) # numerator
  bottom = 1 + top # denominator
  pi = top/bottom # combine together
  return(pi)
}
logit(5, 2, 2)
## [1] 0.5

Let’s plot the two against each other

tibble(
  theta = rnorm(1000, 0, 1)
) %>%
  mutate(
    pi_clog = cloglog(1, .5, theta),
    pi_logit = logit(1, .5, theta)
  ) %>%
  pivot_longer(col=starts_with("pi"), names_to="link", names_prefix="pi_", values_to="pi") %>%
  ggplot(aes(x = theta, y=pi, colour=link)) + geom_point()

So the two make similar predictions about th lowest levels of theta, but not the higher values of theta. The logit link function is symmetrical but the cloglog is not.

We can use the cloglog link function in MIRT using the createItem function. I’ve had a lot of trouble with this, however. The model converges in 1-3 iterations, sometimes returning the starting values and other times returning an estimated model that fits poorly. I wonder if I’ve used this function incorrectly. I asked a question about it on the MIRT Google Groups. I found that with the starting values below, the model converges after a couple iterations, so I’m using it to compare theta estimate and IRFs to those from the other models, but I’m not confident about this approach.

name <- 'cloglog'
par <- c(a = .5, b = 2)
est <- c(TRUE, TRUE)
P.cloglog <- function(par, Theta, ncat){
  a <- par[1]
  b <- par[2]
  P1 <- 1 - exp(-exp(a*(Theta - b)))
  cbind(1 - P1, P1)
}

clog <- createItem(name, par=par, est=est, P=P.cloglog)
m2 <- mirt(Complex_short_grammatical, 1, "cloglog", customItems=list(cloglog=clog), verbose=FALSE)
coefs_2pl <- coef(m2, as.data.frame = TRUE) %>% 
 t() %>%
  as_tibble() %>%
  dplyr::select(-c(75:76)) %>%
  pivot_longer(everything()) %>%
  tidyr::separate(, col=name, into=c("item", "parameter"), sep="([.])") %>%
  pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
  dplyr::select(item, a, b) %>%
  mutate(
   category =  gsub('[[:digit:]]+', '', item)
  )

ggplot(coefs_2pl,  
       aes(x = a, y = b)) + 
  geom_point(alpha = .3) + 
  ggrepel::geom_text_repel(data = coefs_2pl, 
                  aes(label = item), size = 3) + 
  xlab("Discrimination") + 
  ylab("Difficulty") + theme_minimal()

Based on the fact that this model only ran through 2 iterations, I think it’s giving me the starting values back. The discrimination parameters are smaller, but I think that’s due to the model not leaving it’s starting values. There’s also still a bit of correlation between the two items. Also, many, many of the IRFs significantly misfit the data, which was not the case with teh 2PL model.

itemfit(m2)

2.2 Semi-Parametric

Since the cloglog model won’t run, I’m going to try a couple semi-parametric models. These can serve as a useful baseline to compare the 2PL and clog model to since they will fit the data quite well at the cost of interpretability of their parameters.

These spline models start with 4 knots. I’ve increase the NCYCLES to 4000 and it still doesn’t quite reach the convergence criteria, but the deviation is quite small

m3 <- mirt(Complex_short_grammatical, 1, "spline", technical=list(NCYCLES=4000), verbose=FALSE) 
## EM cycles terminated after 4000 iterations.

I’ve tried a monopoly model with mode knots that the one above. This takes a long time to run so I’ve saved the model and re-opened it.

#m4 <- mirt(Complex_short_grammatical, 1, "monopoly", monopoly.k=3)
#saveRDS(m4, "test_models/m4.rds")
m4 <- readRDS("test_models/m4.rds")

Now we can compare these models to the 2PL and CLOG Model. I’ll start by looking at the correlation between factor scores. I’ll start by comparing theta across the three models.

thetas <- Complex_short_grammatical %>%
  mutate(
    Raw = rowSums(.[,1:37]), 
    Theta_logit = fscores(m1, method="EAP")[,1], 
    Theta_clog = fscores(m2, method="EAP")[,1],
    Theta_spline = fscores(m3, method="EAP")[,1], 
    Theta_mono = fscores(m4, method="EAP")[,1], 
    ) 


ggpairs(thetas[,38:42])

The factor scores from the 2PL model correlate at .997 with those from both of the semi parametric models, suggesting the extra flexibility in the IRFs from these models aren’t adding very much. Also note the realy high correlation between cloglog and the raw data, suggesting it’s probably not estimating the latent variable.

Next we can look at the IRFs for each of these models, to see if the steepness of the IRFs for the 2PL is due to a misfitting link function.

mods <- list(twoPL = m1, clog = m2, spline = m3, mono=m4)
itemplot(mods,1)
itemplot(mods,6)
itemplot(mods,11)
itemplot(mods,16)
itemplot(mods,21)
itemplot(mods,26)
itemplot(mods,31)
itemplot(mods,36)

For al of these items, the two semi-parametric models produce IRFs that are very similar to that of the 2PL model, except for some semi-parametric silliness (spikes at the extreme values, etc)

3 Changing the LV

Another possibility is that the gaussian latent variable is inappropriate. We can assume a non-normal distribution using the three methods from MIRT.

First the empirical histogram.

m5 <- mirt(Complex_short_grammatical, 1, "2PL", dentype="EH", verbose=FALSE)
coefs_2pl <- coef(m5, as.data.frame = TRUE) %>% 
 t() %>%
  as_tibble() %>%
  dplyr::select(-c(149:150)) %>%
  pivot_longer(everything()) %>%
  tidyr::separate(, col=name, into=c("item", "parameter"), sep="([.])") %>%
  pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
  dplyr::select(item, a1, d) %>%
  mutate(
   category =  gsub('[[:digit:]]+', '', item)
  )

ggplot(coefs_2pl,  
       aes(x = a1, y = -d)) + 
  geom_point(alpha = .3) + 
  ggrepel::geom_text_repel(data = coefs_2pl, 
                  aes(label = item), size = 3) + 
  xlab("Discrimination") + 
  ylab("Difficulty") + theme_minimal()

Discrimination parameters looks much more reasonable, though they are still correlated with the difficulty parameter.

Let’s try the weighted empirical histogram. I’ve set zeroExtreme to TRUE, because there are so many participants with 0s.

m6 <- mirt(Complex_short_grammatical, 1, dentype="EHW", technical = list(zeroExtreme = TRUE), verbose=FALSE)
coefs_2pl <- coef(m6, as.data.frame = TRUE) %>% 
 t() %>%
  as_tibble() %>%
  dplyr::select(-c(149:150)) %>%
  pivot_longer(everything()) %>%
  tidyr::separate(, col=name, into=c("item", "parameter"), sep="([.])") %>%
  pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
  dplyr::select(item, a1, d) %>%
  mutate(
   category =  gsub('[[:digit:]]+', '', item)
  )

ggplot(coefs_2pl,  
       aes(x = a1, y = -d)) + 
  geom_point(alpha = .3) + 
  ggrepel::geom_text_repel(data = coefs_2pl, 
                  aes(label = item), size = 3) + 
  xlab("Discrimination") + 
  ylab("Difficulty") + theme_minimal()

This compresses the difficulty and discrimination parameters are bit more.

Try with semi-parametric davidian curves.

m7<- mirt(Complex_short_grammatical, 1, "2PL", dentype="Davidian-4", verbose=FALSE)
coefs_2pl <- coef(m7, as.data.frame = TRUE) %>% 
 t() %>%
  as_tibble() %>%
  dplyr::select(-c(149:150)) %>%
  pivot_longer(everything()) %>%
  tidyr::separate(, col=name, into=c("item", "parameter"), sep="([.])") %>%
  pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
  dplyr::select(item, a1, d) %>%
  mutate(
   category =  gsub('[[:digit:]]+', '', item)
  )

ggplot(coefs_2pl,  
       aes(x = a1, y = -d)) + 
  geom_point(alpha = .3) + 
  ggrepel::geom_text_repel(data = coefs_2pl, 
                  aes(label = item), size = 3) + 
  xlab("Discrimination") + 
  ylab("Difficulty") + theme_minimal()
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_text_repel).

This makes the discrimination parameters even larger.

4 Comparing the four models

I’ll start by comparing theta across the three models.

thetas <- Complex_short_grammatical %>%
  mutate(
    Raw = rowSums(.[,1:37]), 
    Theta_logit = fscores(m1, method="MAP")[,1], # Used MAP here, but EAP gives same results. 
    Theta_EH = fscores(m5, method="MAP")[,1],
    Theta_EHW = fscores(m6, method="MAP")[,1], 
    Theta_Dav = fscores(m7, method="MAP")[,1]
  ) 
## Warning: The following factor score estimates failed to converge successfully:
##     152
## Warning: The following factor score estimates failed to converge successfully:
##     97,804
ggpairs(thetas[,38:42])
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 542 rows containing missing values
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 542 rows containing missing values

## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 542 rows containing missing values
## Warning: Removed 542 rows containing missing values (geom_point).

## Warning: Removed 542 rows containing missing values (geom_point).

## Warning: Removed 542 rows containing missing values (geom_point).
## Warning: Removed 542 rows containing non-finite values (stat_density).
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 542 rows containing missing values
## Warning: Removed 542 rows containing missing values (geom_point).

Correlation between factor scores in 2PL and other models is at least .992.

Let’s look at the IRFs

mods <- list(twoPL = m1, EH = m5, EHW = m6, Dav=m7)
itemplot(mods,1)
itemplot(mods,6)
itemplot(mods,11)
itemplot(mods,16)
itemplot(mods,21)
itemplot(mods,26)
itemplot(mods,31)
itemplot(mods,36)

The EH rises a bit more slowly than the 2PL. EHW is in the middle, Dav is generally a bit steeper.

5 Dropping the 0s.

no_0 <- Complex_short_grammatical %>%
  mutate(
    Raw = rowSums(.[,1:37])
  ) %>%
  filter(Raw > 0) %>%
  dplyr::select(-Raw)


m8 <- mirt(no_0, 1, "2PL", verbose=FALSE)
coefs_2pl <- coef(m8, as.data.frame = TRUE) %>% 
 t() %>%
  as_tibble() %>%
  dplyr::select(-c(149:150)) %>%
  pivot_longer(everything()) %>%
  tidyr::separate(, col=name, into=c("item", "parameter"), sep="([.])") %>%
  pivot_wider(id_cols=item, names_from=parameter, values_from=value) %>%
  dplyr::select(item, a1, d) %>%
  mutate(
   category =  gsub('[[:digit:]]+', '', item)
  )

ggplot(coefs_2pl,  
       aes(x = a1, y = -d)) + 
  geom_point(alpha = .3) + 
  ggrepel::geom_text_repel(data = coefs_2pl, 
                  aes(label = item), size = 3) + 
  xlab("Discrimination") + 
  ylab("Difficulty") + theme_minimal()

# Summary

  • Changing the IRF from logistic to a non-parametric function does not appear to affect the shape of the IRFs for most variables.
  • While models with the cloglog link function were difficult to estimate, it does not seem like the shape of the cloglog link function is appropriate given the results of the models with splines.
  • Changing from a gaussian to a non-parametric latent variable did substantially reduce the estimates of the discrimination parameters, and the IRFs were less steep. However, the correlation between the factor scores from the gaussian and non-gaussian models were greater than .99. Moreover, using a non-parametric latent variable precludes using other useful functions in MIRT for testing the model’s dimensionality and possible checking for DIF. Therefore my preference is to use the 2PL model with a gaussian latent variable, as we don’t care about the discrimination parameters pe se–we only care about their effect on the latent variable, which is, virtually, unchanged across these parameterizations.
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252   
## [3] LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C                      
## [5] LC_TIME=Dutch_Netherlands.1252    
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] lordif_0.3-3    rms_6.2-0       SparseM_1.81    Hmisc_4.5-0    
##  [5] Formula_1.2-4   survival_3.2-11 GGally_2.1.2    patchwork_1.1.1
##  [9] knitr_1.33      Gifi_0.3-9      psych_2.1.6     ltm_1.1-1      
## [13] polycor_0.7-10  msm_1.6.8       MASS_7.3-54     mirt_1.34      
## [17] lattice_0.20-44 forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7    
## [21] purrr_0.3.4     readr_2.0.0     tidyr_1.1.3     tibble_3.1.3   
## [25] ggplot2_3.3.5   tidyverse_1.3.1 wordbankr_0.3.1
## 
## loaded via a namespace (and not attached):
##  [1] TH.data_1.0-10        colorspace_2.0-2      ellipsis_0.3.2       
##  [4] htmlTable_2.2.1       base64enc_0.1-3       fs_1.5.0             
##  [7] rstudioapi_0.13       farver_2.1.0          Deriv_4.1.3          
## [10] MatrixModels_0.5-0    ggrepel_0.9.1         fansi_0.5.0          
## [13] mvtnorm_1.1-2         lubridate_1.7.10      xml2_1.3.2           
## [16] codetools_0.2-18      splines_4.1.0         mnormt_2.0.2         
## [19] jsonlite_1.7.2        broom_0.7.9           cluster_2.1.2        
## [22] dbplyr_2.1.1          png_0.1-7             compiler_4.1.0       
## [25] httr_1.4.2            backports_1.2.1       assertthat_0.2.1     
## [28] Matrix_1.3-3          cli_3.0.1             htmltools_0.5.1.1    
## [31] quantreg_5.86         tools_4.1.0           gtable_0.3.0         
## [34] glue_1.4.2            Rcpp_1.0.7            cellranger_1.1.0     
## [37] jquerylib_0.1.4       vctrs_0.3.8           nlme_3.1-152         
## [40] conquer_1.0.2         xfun_0.24             rvest_1.0.1          
## [43] lifecycle_1.0.0       dcurver_0.9.2         polspline_1.1.19     
## [46] zoo_1.8-9             scales_1.1.1          hms_1.1.0            
## [49] sandwich_3.0-1        parallel_4.1.0        expm_0.999-6         
## [52] RColorBrewer_1.1-2    yaml_2.2.1            gridExtra_2.3        
## [55] sass_0.4.0            rpart_4.1-15          reshape_0.8.8        
## [58] latticeExtra_0.6-29   stringi_1.7.3         highr_0.9            
## [61] RMySQL_0.10.22        checkmate_2.0.0       permute_0.9-5        
## [64] rlang_0.4.11          pkgconfig_2.0.3       matrixStats_0.60.0   
## [67] evaluate_0.14         labeling_0.4.2        htmlwidgets_1.5.3    
## [70] quantregGrowth_1.3-0  tidyselect_1.1.1      plyr_1.8.6           
## [73] magrittr_2.0.1        R6_2.5.0              generics_0.1.0       
## [76] multcomp_1.4-17       DBI_1.1.1             pillar_1.6.2         
## [79] haven_2.4.3           foreign_0.8-81        withr_2.4.2          
## [82] mgcv_1.8-35           nnet_7.3-16           modelr_0.1.8         
## [85] crayon_1.4.1          utf8_1.2.2            tmvnsim_1.0-2        
## [88] tzdb_0.1.2            rmarkdown_2.9         jpeg_0.1-9           
## [91] grid_4.1.0            readxl_1.3.1          data.table_1.14.0    
## [94] vegan_2.5-7           reprex_2.0.1          digest_0.6.27        
## [97] GPArotation_2014.11-1 munsell_0.5.0         bslib_0.2.5.1