Item Response Theory (Binary Data)

Library

In this session , we will used some packages with each unique functions. Hereby, brief explanation of the packages:

dplyr: for tidying and formatting dataset
openxlsx: for operating xlsx format (exporting and so on)
readxl: for importing xlsx data
knitr: for displaying in certain theme
lavaan: for conducting Confirmatory Factor Analysis
mirt: for conducting item response theory analysis
gridExtra: for tidying some graphs

library(dplyr)
library(openxlsx)
library(readxl)
library(knitr)
library(lavaan)
library(mirt)
library(gridExtra)

Data Preparation

In this section, we simulated data to get a data set containing binary data (1/0). The data represent correct answer (1) and wrong answer (0) for six dummy items. The items were already determined in term of alpha (\(\alpha\)) and delta (\(\delta\)). Alpha referred to item discrimination which were 1.25, 1.5, 0, 0.75, 1.75 and 2.25 for item 1 to item 6 respectively. Delta (\(\delta\)) can be defined as item difficulty or item location where it were -2, 0, 1, 0, 0.5, and -1 respectively.

## Create Binary Data Simulation
#alpha <- c(1.25, 1.5, 0, 0.75, 1.75, 2.25)
#J <- length(alpha)
#delta <- c(-2, 0, 1,0,  0.5, -1)
#chi <- c(0,0,0,0,0,0)
#theta <- rnorm(1000)
#N <- length(theta)
#data.binary <- matrix(0, nrow = N, ncol = J)
  
#for(j in 1:J){
#  data.binary[, j] <- runif(N) < (chi[j] + (1 - chi[j]) / 
#                            (1 + exp(-alpha[j] * (theta - delta[j]))))
#}

## Assign Column Name
#colnames(data.binary) <- c("item01", "item02", "item03", "item04", "item05", "item06")
#data.binary <- data.binary %>% as.data.frame()

## Save data 
#write.xlsx(data.binary, "Data Binary.xlsx")

## Load data again
data.binary <- read_excel("Data Binary.xlsx") %>% as.data.frame()

# displaying data
kable(head(data.binary))

item01	item02	item03	item04	item05	item06
1	0	0	0	0	0
1	1	1	1	1	1
1	0	1	1	0	1
1	0	1	1	1	1
1	0	1	1	0	1
1	1	0	0	0	1

Dimensionality

IRT, in general, needed a uni-dimensional model which was reflected by some amount of items. Therefore, dimensionality test should be performed before conducting IRT analysis. This example, we analyzed the \(data.binary\) data frame to prove that it was uni-dimensional construct measuring a specific concept. We conducted Confirmatory Factor Analysis with uni-dimensional model.

Model Specification

Since we used a uni-dimensional model, we specified that all items were reflection of the specific context that we assumed.

cfa.spec <- 'factor =~ item01 + item02 + item03 + 
                       item04 + item05 + item06 '

Confirmatory Factor Analysis

### Running Analysis
cfa.mod <- cfa(cfa.spec, data.binary)

### Displyaing Result
summary(cfa.mod, fit.measures = T, standardized = T)

lavaan 0.6-12 ended normally after 51 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        12

  Number of observations                          1000

Model Test User Model:
                                                      
  Test statistic                                17.402
  Degrees of freedom                                 9
  P-value (Chi-square)                           0.043

Model Test Baseline Model:

  Test statistic                               444.174
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.980
  Tucker-Lewis Index (TLI)                       0.967

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3424.583
  Loglikelihood unrestricted model (H1)      -3415.882
                                                      
  Akaike (AIC)                                6873.166
  Bayesian (BIC)                              6932.059
  Sample-size adjusted Bayesian (BIC)         6893.946

Root Mean Square Error of Approximation:

  RMSEA                                          0.031
  90 Percent confidence interval - lower         0.005
  90 Percent confidence interval - upper         0.052
  P-value RMSEA <= 0.05                          0.930

Standardized Root Mean Square Residual:

  SRMR                                           0.021

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv
  factor =~                                                    
    item01            1.000                               0.082
    item02            3.570    0.581    6.147    0.000    0.292
    item03            0.340    0.248    1.372    0.170    0.028
    item04            2.438    0.430    5.667    0.000    0.199
    item05            3.010    0.498    6.041    0.000    0.246
    item06            3.034    0.492    6.161    0.000    0.248
  Std.all
         
    0.264
    0.584
    0.056
    0.399
    0.516
    0.602

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv
   .item01            0.090    0.004   21.380    0.000    0.090
   .item02            0.164    0.011   15.171    0.000    0.164
   .item03            0.249    0.011   22.320    0.000    0.249
   .item04            0.210    0.011   19.837    0.000    0.210
   .item05            0.167    0.010   17.404    0.000    0.167
   .item06            0.108    0.007   14.496    0.000    0.108
    factor            0.007    0.002    3.349    0.001    1.000
  Std.all
    0.930
    0.658
    0.997
    0.841
    0.734
    0.637
    1.000

Based on the result above, we have to notice that the CFI and TLI were more than 0.95 and the RMSEA was lower than 0.060 (cut-off by Hu & Bentler, xxxx). Those parameter created a concept of fit model which reflected our assumption that the measurement was developed in uni-dimensional model. Then, we should have a look for factor loading (\(\lambda\)) which also indicated item discrimination (McDonald, 2222) where value above 0.40 or higher was expected (Furr & Bacharach, 2017). In this case, we found item 03 had value below 0.40 which then we can take a note for this item in case of bad discriminating item.

Finally, we can move to next step which is IRT analysis with various techniques!

1-Parameter Logistic using Rasch

In brief, 1-Parameter Logistic emphasized on the utily of item difficulty which led to the different value of \(\delta\). Hereby the script for this analysis.

Model Analysis

### Running Analyis
model.1pl <- mirt(data.binary, 1, rep('Rasch', 6), SE = TRUE, verbose = F)

### Displaying result with SE by list of items
coef(model.1pl, IRT = T, SE = T)

$item01
         a      b  g  u
par      1 -2.492  0  1
CI_2.5  NA -2.722 NA NA
CI_97.5 NA -2.262 NA NA

$item02
         a      b  g  u
par      1 -0.063  0  1
CI_2.5  NA -0.215 NA NA
CI_97.5 NA  0.088 NA NA

$item03
         a      b  g  u
par      1  0.010  0  1
CI_2.5  NA -0.141 NA NA
CI_97.5 NA  0.162 NA NA

$item04
         a      b  g  u
par      1  0.020  0  1
CI_2.5  NA -0.132 NA NA
CI_97.5 NA  0.171 NA NA

$item05
         a     b  g  u
par      1 0.749  0  1
CI_2.5  NA 0.591 NA NA
CI_97.5 NA 0.907 NA NA

$item06
         a      b  g  u
par      1 -1.546  0  1
CI_2.5  NA -1.726 NA NA
CI_97.5 NA -1.365 NA NA

$GroupPars
        MEAN_1 COV_11
par          0  1.069
CI_2.5      NA  0.846
CI_97.5     NA  1.291

### Displaying result with out SE
coef(model.1pl, IRT = T, SE = T, simplify = T)

$items
       a      b g u
item01 1 -2.492 0 1
item02 1 -0.063 0 1
item03 1  0.010 0 1
item04 1  0.020 0 1
item05 1  0.749 0 1
item06 1 -1.546 0 1

$means
F1 
 0 

$cov
      F1
F1 1.069

Result showed that the \(\alpha\) were constant namely 1. However, the item difficulty varied across the item. Important to notice, the alpha parameter can merely interpreted as an easy item which was less equal to -1, moderate ranging from greater -1 until greater equal to 1, and a hard item which was greater than 1 (…..). Implication, this parameter can be indicators for test assembly which focus on the variety of item difficulty and also the foundation of computer adaptive test.

Item Characteristic Curve

Not only numerical value, this analysis also showed the Item Characteristic Curve (ICC). It plotted the probability of individual can success to answer a specific item correctly. However, using 1-PL, it only compared the item location across the item. Remember, the concept of item difficult was the .50 probability with certain value can successfully choose the correct answer. For example, item difficulty of item01 was got from .50 or 50% probability where person with ability (\(\theta\)) equal to -2.492 have 50% chance to answer correctly or incorrectly.

### Individual Plot
itemplot(model.1pl, 1, "trace")

(#fig:1-Parameter Logistic ICC)Single ICC for Item 1

### Arranged Plot
icc.1pl.1 <- itemplot(model.1pl, 1, "trace")
icc.1pl.2 <- itemplot(model.1pl, 2, "trace")
icc.1pl.3 <- itemplot(model.1pl, 3, "trace")
icc.1pl.4 <- itemplot(model.1pl, 4, "trace")
icc.1pl.5 <- itemplot(model.1pl, 5, "trace")
icc.1pl.6 <- itemplot(model.1pl, 6, "trace")

#### Arrange Plot for 1 and 2
grid.arrange(icc.1pl.1, icc.1pl.2, ncol = 2, heights = 3)

(#fig:1-Parameter Logistic ICC 2-1)Arranged ICC by 2 items

#### Arrange Plot for 3 and 4
grid.arrange(icc.1pl.3, icc.1pl.4, ncol = 2, heights = 3)

(#fig:1-Parameter Logistic ICC 2-2)Arranged ICC by 2 items

#### Arrange Plot for 5 and 6
grid.arrange(icc.1pl.5, icc.1pl.6, ncol = 2, heights = 3)

(#fig:1-Parameter Logistic ICC 2-3)Arranged ICC by 2 items

So, from the graphs above, we can merely conclude that item01 and item06 can be categorized as easy items in term of item difficulty. The remaining can be said as moderate items.

Item Information Curve

Different with ICC, Item Information Curve (IIC) provided how many information that the item can capture from the persons’ responses in order to differentiate the persons’ theta (\(\theta\)) based on their response. Let us have a look from graphs below.

### Individual Plot
itemplot(model.1pl, 1, "info", heights=6)

(#fig:1-Parameter Logistic IIC)Single IIC for Item 1

Item 1 had optimum information for person whose theta was approximately -3. It indicated that the item can give information about the different ability of person that close to -3. Such as, person who got -3,4 and -3,4 will be differentiated. On the other side, this item cannot provided the differentiation between persons whose theta were 4 and 6 because the information for these theta is so small that is around 0.01.

### Individual Plot
### Arranged Plot
icc.1pl.1 <- itemplot(model.1pl, 1, "info")
icc.1pl.2 <- itemplot(model.1pl, 2, "info")
icc.1pl.3 <- itemplot(model.1pl, 3, "info")
icc.1pl.4 <- itemplot(model.1pl, 4, "info")
icc.1pl.5 <- itemplot(model.1pl, 5, "info")
icc.1pl.6 <- itemplot(model.1pl, 6, "info")

#### Arrange Plot for 1 and 2
grid.arrange(icc.1pl.1, icc.1pl.2, ncol = 2, heights = 3)

(#fig:1-Parameter Logistic IIC 2-1)Arranged IIC by 2 items

#### Arrange Plot for 3 and 4
grid.arrange(icc.1pl.3, icc.1pl.4, ncol = 2, heights = 3)

(#fig:1-Parameter Logistic IIC 2-2)Arranged IIC by 2 items

#### Arrange Plot for 5 and 6
grid.arrange(icc.1pl.5, icc.1pl.6, ncol = 2, heights = 3)

(#fig:1-Parameter Logistic IIC 2-3)Arranged IIC by 2 items

2-Parameter Logistic

For this model, it modified the basic formula of 1-PL that was changing the \(\alpha\) value to be freely estimated. Let us check following script and its result.

Model Analysis

### Running Analyis
model.2pl <- mirt(data.binary, 1, rep('2PL', 6), SE = TRUE, verbose = F)

### Displaying result with SE by list of items
coef(model.2pl, IRT = T, SE = T)

$item01
            a      b  g  u
par     0.880 -2.720  0  1
CI_2.5  0.583 -3.481 NA NA
CI_97.5 1.177 -1.959 NA NA

$item02
            a      b  g  u
par     1.673 -0.049  0  1
CI_2.5  1.293 -0.159 NA NA
CI_97.5 2.053  0.062 NA NA

$item03
             a      b  g  u
par      0.107  0.075  0  1
CI_2.5  -0.049 -1.091 NA NA
CI_97.5  0.263  1.242 NA NA

$item04
            a      b  g  u
par     0.953  0.019  0  1
CI_2.5  0.729 -0.136 NA NA
CI_97.5 1.178  0.174 NA NA

$item05
            a     b  g  u
par     1.641 0.549  0  1
CI_2.5  1.254 0.416 NA NA
CI_97.5 2.028 0.682 NA NA

$item06
            a      b  g  u
par     3.125 -0.898  0  1
CI_2.5  1.908 -1.030 NA NA
CI_97.5 4.343 -0.767 NA NA

$GroupPars
        MEAN_1 COV_11
par          0      1
CI_2.5      NA     NA
CI_97.5     NA     NA

### Displaying result with out SE
coef(model.2pl, IRT = T, SE = T, simplify = T)

$items
           a      b g u
item01 0.880 -2.720 0 1
item02 1.673 -0.049 0 1
item03 0.107  0.075 0 1
item04 0.953  0.019 0 1
item05 1.641  0.549 0 1
item06 3.125 -0.898 0 1

$means
F1 
 0 

$cov
   F1
F1  1

Result from the analysis above displayed different value of \(\alpha\). This parameter was pretty unique as compared to item difficulty. The value greater than 1.3 (…..) indicated that the item can differentiate between high ability and low ability. Then, we can conclude that item02, item 05, and item06 had a good discrimination. The remaining were opposite. This finding, also, can be investigated from the graph below.

Item Characteristic Curve

### Individual Plot
itemplot(model.2pl, 1, "trace")

Basically, the graph was quite similar to the former model yet it can be different if we compared to the rest of items in this model.

### Arranged Plot
icc.2pl.1 <- itemplot(model.2pl, 1, "trace")
icc.2pl.2 <- itemplot(model.2pl, 2, "trace")
icc.2pl.3 <- itemplot(model.2pl, 3, "trace")
icc.2pl.4 <- itemplot(model.2pl, 4, "trace")
icc.2pl.5 <- itemplot(model.2pl, 5, "trace")
icc.2pl.6 <- itemplot(model.2pl, 6, "trace")

#### Arrange Plot for 1 and 2
grid.arrange(icc.2pl.1, icc.2pl.2, ncol = 2, heights = 3)

#### Arrange Plot for 3 and 4
grid.arrange(icc.2pl.3, icc.2pl.4, ncol = 2, heights = 3)

#### Arrange Plot for 5 and 6
grid.arrange(icc.2pl.5, icc.2pl.6, ncol = 2, heights = 3)

Based on the graph between item01 and item02, we can see that the slope (the other word of \(\alpha\)) was more slipper in item02 as compared to item01. It meant that item02 can differentiate individu better than item01.

Then, if we give concern to item03 in this model, it exhibited a linear model meaning that the item cannot differentiate the person with high and low ability. As compared to former model, this item was look so terrible as hte former got pretty slippery slope.

For the best practices, item05 and item06 represented good cases of item that can differentiate well. So, it will be expected case for practices.

Item Information Curve

### Individual Plot
itemplot(model.2pl, 1, "info", heights=6)

### Individual Plot
### Arranged Plot
icc.2pl.1 <- itemplot(model.2pl, 1, "info")
icc.2pl.2 <- itemplot(model.2pl, 2, "info")
icc.2pl.3 <- itemplot(model.2pl, 3, "info")
icc.2pl.4 <- itemplot(model.2pl, 4, "info")
icc.2pl.5 <- itemplot(model.2pl, 5, "info")
icc.2pl.6 <- itemplot(model.2pl, 6, "info")

#### Arrange Plot for 1 and 2
grid.arrange(icc.2pl.1, icc.2pl.2, ncol = 2, heights = 3)

#### Arrange Plot for 3 and 4
grid.arrange(icc.2pl.3, icc.2pl.4, ncol = 2, heights = 3)

#### Arrange Plot for 5 and 6
grid.arrange(icc.2pl.5, icc.2pl.6, ncol = 2, heights = 3)

3-Parameter Logistic

In term of 3-Parameter Logistic, it added a pseudo-guessing factor to the model. It meant that persons will have a higher chance to get correct answer if they answered the item merely by guessing. Then, we can investigate the influence of this parameter to the model.

Model Analysis

### Running Analyis
model.3pl <- mirt(data.binary, 1, rep('3PL', 6), SE = TRUE, verbose = F)

### Displaying result with SE by list of items
coef(model.3pl, IRT = T, SE = T)

$item01
             a      b     g  u
par      1.192 -0.871 0.648  1
CI_2.5  -0.002 -3.424 0.174 NA
CI_97.5  2.386  1.682 1.122 NA

$item02
            a      b g  u
par     1.408 -0.050 0  1
CI_2.5  1.064 -0.171 0 NA
CI_97.5 1.752  0.071 0 NA

$item03
             a       b      g  u
par      0.109   1.394  0.067  1
CI_2.5  -0.243 -43.426 -2.227 NA
CI_97.5  0.462  46.215  2.361 NA

$item04
            a      b      g  u
par     1.061  0.249  0.089  1
CI_2.5  0.385 -0.568 -0.228 NA
CI_97.5 1.737  1.066  0.406 NA

$item05
             a     b     g  u
par      6.897 0.618 0.105  1
CI_2.5  -6.160 0.496 0.065 NA
CI_97.5 19.954 0.741 0.146 NA

$item06
             a      b      g  u
par      5.658 -0.818  0.001  1
CI_2.5  -2.564 -0.964 -0.037 NA
CI_97.5 13.879 -0.671  0.039 NA

$GroupPars
        MEAN_1 COV_11
par          0      1
CI_2.5      NA     NA
CI_97.5     NA     NA

### Displaying result with out SE
coef(model.3pl, IRT = T, SE = T, simplify = T)

$items
           a      b     g u
item01 1.192 -0.871 0.648 1
item02 1.408 -0.050 0.000 1
item03 0.109  1.394 0.067 1
item04 1.061  0.249 0.089 1
item05 6.897  0.618 0.105 1
item06 5.658 -0.818 0.001 1

$means
F1 
 0 

$cov
   F1
F1  1

inevitably, the value of \(\alpha\) and \(\delta\) will be changed because of the psuedo-guessing factor. For instance, item01 had \(\delta\) of -0.871 which it got value of -2.720 in model 2-PL and -2.490 in model 1-PL which it can be explained from the unveiled pseudo-guessing factor that .648 or 64.8%. This percentage meant that a person had 64.8% chance to get correct answer if they answered the item by guessing. Then, the slope above 64.8% have a exponential shape where persons whose ability score was around -0.871 will be 50% have higher chance as compared to persons with the lowest ability score to answer correctly. Also, the slope begun more slipper than model 2 as the item discrimination or \(\alpha\) increased.

On the other hand, item02 and item06 exhibited lowest value of psuedo-guessing. It was absolutely expected results for practices.

Item Characteristic Curve

### Individual Plot
itemplot(model.3pl, 1, "trace")

### Arranged Plot
icc.3pl.1 <- itemplot(model.3pl, 1, "trace")
icc.3pl.2 <- itemplot(model.3pl, 2, "trace")
icc.3pl.3 <- itemplot(model.3pl, 3, "trace")
icc.3pl.4 <- itemplot(model.3pl, 4, "trace")
icc.3pl.5 <- itemplot(model.3pl, 5, "trace")
icc.3pl.6 <- itemplot(model.3pl, 6, "trace")

#### Arrange Plot for 1 and 2
grid.arrange(icc.3pl.1, icc.3pl.2, ncol = 2, heights = 3)

#### Arrange Plot for 3 and 4
grid.arrange(icc.3pl.3, icc.3pl.4, ncol = 2, heights = 3)

#### Arrange Plot for 5 and 6
grid.arrange(icc.3pl.5, icc.3pl.6, ncol = 2, heights = 3)

Item Information Curve

### Individual Plot
itemplot(model.2pl, 1, "info", heights=6)

### Arranged Plot
icc.3pl.1 <- itemplot(model.3pl, 1, "info")
icc.3pl.2 <- itemplot(model.3pl, 2, "info")
icc.3pl.3 <- itemplot(model.3pl, 3, "info")
icc.3pl.4 <- itemplot(model.3pl, 4, "info")
icc.3pl.5 <- itemplot(model.3pl, 5, "info")
icc.3pl.6 <- itemplot(model.3pl, 6, "info")

#### Arrange Plot for 1 and 2
grid.arrange(icc.3pl.1, icc.2pl.2, ncol = 2, heights = 3)

#### Arrange Plot for 3 and 4
grid.arrange(icc.3pl.3, icc.3pl.4, ncol = 2, heights = 3)

#### Arrange Plot for 5 and 6
grid.arrange(icc.3pl.5, icc.3pl.6, ncol = 2, heights = 3)

Comparasion

Item Characteristic Curve

Item03

### Arranged Plot
item03.1pl <- itemplot(model.1pl, 3, "trace")
item03.2pl <- itemplot(model.2pl, 3, "trace")
item03.3pl <- itemplot(model.3pl, 3, "trace")


#### Arrange Plot for 1 and 2
grid.arrange(item03.1pl, item03.2pl, item03.3pl, nrow = 2, layout_matrix = rbind(c(1,1,2,2),c(NA,3,3,NA)))

Item05

### Arranged Plot
item05.1pl <- itemplot(model.1pl, 5, "trace")
item05.2pl <- itemplot(model.2pl, 5, "trace")
item05.3pl <- itemplot(model.3pl, 5, "trace")


#### Arrange Plot for 1 and 2
grid.arrange(item05.1pl, item05.2pl, item05.3pl, nrow = 2, layout_matrix = rbind(c(1,1,2,2),c(NA,3,3,NA)))

Item06

### Arranged Plot
item06.1pl <- itemplot(model.1pl, 6, "trace")
item06.2pl <- itemplot(model.2pl, 6, "trace")
item06.3pl <- itemplot(model.3pl, 6, "trace")


#### Arrange Plot for 1 and 2
grid.arrange(item06.1pl, item06.2pl, item06.3pl, nrow = 2, layout_matrix = rbind(c(1,1,2,2),c(NA,3,3,NA)))

Item Information Curve

Item03

### Arranged Plot
item03.1pl <- itemplot(model.1pl, 3, "info")
item03.2pl <- itemplot(model.2pl, 3, "info")
item03.3pl <- itemplot(model.3pl, 3, "info")


#### Arrange Plot for 1 and 2
grid.arrange(item03.1pl, item03.2pl, item03.3pl, nrow = 2, layout_matrix = rbind(c(1,1,2,2),c(NA,3,3,NA)))

Item05

### Arranged Plot
item05.1pl <- itemplot(model.1pl, 5, "info")
item05.2pl <- itemplot(model.2pl, 5, "info")
item05.3pl <- itemplot(model.3pl, 5, "info")


#### Arrange Plot for 1 and 2
grid.arrange(item05.1pl, item05.2pl, item05.3pl, nrow = 2, layout_matrix = rbind(c(1,1,2,2),c(NA,3,3,NA)))

Item06

### Arranged Plot
item06.1pl <- itemplot(model.1pl, 6, "info")
item06.2pl <- itemplot(model.2pl, 6, "info")
item06.3pl <- itemplot(model.3pl, 6, "info")


#### Arrange Plot for 1 and 2
grid.arrange(item06.1pl, item06.2pl, item06.3pl, nrow = 2, layout_matrix = rbind(c(1,1,2,2),c(NA,3,3,NA)))

Test Characteristic Curve

### Arranged Plot
par(mfrow = c(1, 3))
plot(seq(-6, 6, by = 0.01), 
      expected.test(model.1pl, matrix(seq(-6, 6, by = 0.01))),
      col = "black",
      ylim=c(0, 6), type = "l", xlab=expression(theta), 
      ylab="Expected Score", main="Test Characteristic Curve", lwd = 2)
plot(seq(-6, 6, by = 0.01), 
      expected.test(model.2pl, matrix(seq(-6, 6, by = 0.01))),
      col = "black",
      ylim=c(0, 6), type = "l", xlab=expression(theta), 
      ylab="Expected Score", main="Test Characteristic Curve", lwd = 2)
plot(seq(-6, 6, by = 0.01), 
      expected.test(model.3pl, matrix(seq(-6, 6, by = 0.01))),
      col = "black",
      ylim=c(0, 6), type = "l", xlab=expression(theta), 
      ylab="Expected Score", main="Test Characteristic Curve", lwd = 2)

Precision of Measurement

### Arranged Plot
par(mfrow = c(3, 1))
theta = matrix(seq(-6, 6, by = 0.01))
testinfo.1pl = testinfo(model.1pl, Theta = matrix(seq(-6, 6, by = 0.01)), which.items = 1:6)
plot(theta, testinfo.1pl, type = "l", lty = 1,
     xlab = expression(theta),
     ylab = "Test Information 1-PL")
testinfo.2pl = testinfo(model.2pl, Theta = matrix(seq(-6, 6, by = 0.01)), which.items = 1:6)
plot(theta, testinfo.2pl, type = "l", lty = 1,
     xlab = expression(theta),
     ylab = "Test Information 2-PL")
testinfo.3pl = testinfo(model.3pl, Theta = matrix(seq(-6, 6, by = 0.01)), which.items = 1:6)
plot(theta, testinfo.3pl, type = "l", lty = 1,
     xlab = expression(theta),
     ylab = "Test Information 3-PL")

IRT Empirical Reliability

# 1-PL
empirical_rxx(fscores(object = model.1pl, method = "EAP", full.scores.SE = T))

       F1 
0.5220524

# 2-PL
empirical_rxx(fscores(object = model.2pl, method = "EAP", full.scores.SE = T))

       F1 
0.6394948

# 3-PL
empirical_rxx(fscores(object = model.3pl, method = "EAP", full.scores.SE = T))

       F1 
0.7264842

How we choose model?

Comparison between 1 and 2-PL

# Comparison 1-PL with 2-3PL
anova(model.1pl, model.2pl)

               AIC    SABIC       HQ      BIC    logLik      X2 df p
model.1pl 6934.356 6946.478 6947.413 6968.710 -3460.178             
model.2pl 6773.160 6793.940 6795.544 6832.053 -3374.580 171.196  5 0

This analysis was similar to ANOVA test. However it calculated the fit indices from the models. Results showed that there was a significant different between 1-PL and 2-PL models where the latter model had lower value of Bayesian Information Criteria (BIC) to deal with overfit issues and assess the model selection uncertainty (Wu, Fai Cheung, & On Leung, 2019). Therefore, based on this, we can choose model 2-PL rather than 1-PL

Comparison between 1 and 3-PL

# Comparison 2-PL with 3-PL
anova(model.1pl, model.3pl)

               AIC    SABIC       HQ      BIC    logLik      X2 df p
model.1pl 6934.356 6946.478 6947.413 6968.710 -3460.178             
model.3pl 6773.812 6804.983 6807.388 6862.152 -3368.906 182.543 11 0

Similar to the former analysis, it pointed out that the 1-PL models had greater significant value of BIC which thus we should use 3-PL than 1-PL model.

Comparison between 2 and 3-PL

# Comparison 1-PL with 3-PL
anova(model.2pl, model.3pl)

               AIC    SABIC       HQ      BIC    logLik     X2 df
model.2pl 6773.160 6793.940 6795.544 6832.053 -3374.580          
model.3pl 6773.812 6804.983 6807.388 6862.152 -3368.906 11.348  6
              p
model.2pl      
model.3pl 0.078

Inversely, we found that 2-PL and 3-PL models were not significantly different even though 2-PL model had lower BIC value. So, in this situation, based on Wu, Fai Cheung, and On Leung (2019), we can choose 2-PL model by focusing on its BIC value.

How about if we cannot compute anova() for certain models?

# Comparing BIC
kable(data.frame(cbind(Model =c("1-PL", "2-PL", "3-PL"),
                       BIC = c(round(extract.mirt(model.1pl, 'BIC'),3),
                       round(extract.mirt(model.2pl, 'BIC'),3),
                       round(extract.mirt(model.3pl, 'BIC'),3)))))

Model	BIC
1-PL	6968.71
2-PL	6832.053
3-PL	6862.152

For three models, it exhibited that 2-PL had the lowest BIC value. So, we can take an conclusion that we should use 2-PL model for this case.