Module ‘UWO-1’: dichotomous IRT using University of Western Ontario R code

This report has been created by an R program written to read item scores data from a csv file, such as the Omega-IScores.csv file often used with Lertap 5, and then to use functions in selected R packages in order to get a variety of IRT statistics and graphs.

Large sections of the R code used in this module are from a paper by Smyth and Johnson, University of Western Ontario. A copy of their paper, ‘Item Response Theory for Dichotomous Items’, might be available here.

This Rmd module has been assembled by Larry Nelson, Curtin University, Western Australia (l.nelson@curtin.edu.au). It was last updated 28 June 2018.

Omega-IScores.csv files are created by Lertap. A reference to the respective Lertap help page is here. It is NOT necessary to use Omega-IScores.csv files – replace the ‘scored <- read.csv(“Omega-IScores.csv”)’ below with whatever csv file is appropriate.

A reference for the Rasch model is here.

The csv file used in this analysis was obtained from this folder:

[1] "/Users/nelsonlarry/Dropbox/MyStuff/Lertap/SampleDatasets/U-Western-Ontario"

The files in this folder are listed below:

dir()

 [1] "D1_scored.csv"                          
 [2] "DichotomousIRTtutorial.pdf"             
 [3] "EIRT-binaries.csv"                      
 [4] "IRTmoduleUWO-1.html"                    
 [5] "IRTmoduleUWO-1.Rmd"                     
 [6] "LSATsample.xlsx"                        
 [7] "Omega-From-IScores.Rmd"                 
 [8] "Rasch-Analysis-TAM.Rmd"                 
 [9] "Rasch-Analysis-UWO-experimental.html"   
[10] "Rasch-Analysis-UWO-experimental.nb.html"
[11] "Rasch-Analysis-UWO-experimental.Rmd"    
[12] "Rasch-Analysis-UWO.html"                
[13] "Rasch-Analysis-UWO.Rmd"                 
[14] "rsconnect"                              
[15] "Smyth-Johnson1.txt"                     
[16] "Smyth-Johnson2.txt"

The first step gets R to read the csv file into an internal working dataset called “scored”. Then the first few records of the dataset are displayed as a check. You should confirm that these records are as expected! The numbers in the first column have been inserted by R. The second column should contain scores for the first item being processed.

Note: lines that begin with ## are ignored by the processor.

##scored <- read.csv("Omega-IScores.csv")
##scored <- read.csv("D1_scored.csv") ## from the FIMS study
scored <- read.csv("EIRT-binaries.csv") ## from EIRT's main dataset
head (scored)

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  1  0  1  0  0  0  0  0  0   1
2  1  0  1  1  1  1  0  1  1   0
3  1  1  1  1  1  1  0  1  1   1
4  0  0  0  0  0  0  0  1  0   0
5  1  0  1  0  1  0  1  1  1   1
6  1  1  1  0  1  0  0  1  0   1

Now try fitting the Rasch model with discrimination fixed (‘constrained’) to a value of 1 – this will be referred to in this report to as ‘Model1’.

fit1 <- rasch(scored, constraint=cbind(length(scored)+1,1))

We can use the ‘summary’ function to get fit statistics for the constrained Rash model. Results will appear below under ‘Model Summary’.

summary(fit1)


Call:
rasch(data = scored, constraint = cbind(length(scored) + 1, 1))

Model Summary:
  log.Lik      AIC      BIC
 -6250.77 12521.54 12570.62

Coefficients:
             value std.err   z.vals
Dffclt.V1   0.6608  0.0787   8.4014
Dffclt.V2  -0.1606  0.0766  -2.0974
Dffclt.V3  -0.2432  0.0767  -3.1694
Dffclt.V4  -0.4448  0.0774  -5.7462
Dffclt.V5  -0.5756  0.0781  -7.3717
Dffclt.V6   0.8504  0.0801  10.6138
Dffclt.V7   0.4681  0.0775   6.0359
Dffclt.V8  -1.3505  0.0859 -15.7148
Dffclt.V9   0.6348  0.0785   8.0886
Dffclt.V10 -0.3801  0.0771  -4.9274
Dscrmn      1.0000      NA       NA

Integration:
method: Gauss-Hermite
quadrature points: 21 

Optimization:
Convergence: 0 
max(|grad|): 0.06 
quasi-Newton: BFGS

The ‘coef’ function can be used to get a table of item difficulty, item discrimination (fixed at a value of 1 in this case), and the probablility that an ‘average’ individual (z-score of 0) will answer an item correctly.

coef(fit1,prob=TRUE)

        Dffclt Dscrmn P(x=1|z=0)
V1   0.6607775      1  0.3405650
V2  -0.1605864      1  0.5400606
V3  -0.2431810      1  0.5604974
V4  -0.4448344      1  0.6094104
V5  -0.5755829      1  0.6400504
V6   0.8504122      1  0.2993464
V7   0.4680712      1  0.3850729
V8  -1.3505109      1  0.7942131
V9   0.6347879      1  0.3464257
V10 -0.3801402      1  0.5939069

A goodness-of-fit measure, ‘GoF’, is next. A significant p-value (such as 0.050 or less) is taken to suggest that the model is not a good fit to the data. BE PATIENT with this step as a FEW MINUTES might be required for the computation cycle to conclude.

(Note: lines with ## at the front are not processed by R, they’re ignored. Computing GoF can be computationally demanding, possibly taking more than a minute or two to finish – for this reason GoF will sometimes be deactivated by the use of ## at the start of some of the lines below.)

##GoF.rasch(fit1,B=199)
GoF.rasch(fit1)


Bootstrap Goodness-of-Fit using Pearson chi-squared

Call:
rasch(data = scored, constraint = cbind(length(scored) + 1, 1))

Tobs: 1153.75 
# data-sets: 50 
p-value: 0.02

Following the suggestions in the Smyth and Johnson paper referenced above, we can now see what might happen if the constraint on the discrimination parameter in the Rasch model is relaxed (above it was constrained to a value of 1; now it’ll be free to assume any value – the value appears as ‘Dscrmn’ below). This will be referred to here as ‘Model2’.

fit2 <- rasch(scored)
summary(fit2)


Call:
rasch(data = scored)

Model Summary:
   log.Lik      AIC     BIC
 -6250.658 12523.32 12577.3

Coefficients:
             value std.err   z.vals
Dffclt.V1   0.6708  0.0826   8.1174
Dffclt.V2  -0.1625  0.0778  -2.0886
Dffclt.V3  -0.2469  0.0782  -3.1578
Dffclt.V4  -0.4515  0.0798  -5.6582
Dffclt.V5  -0.5837  0.0813  -7.1766
Dffclt.V6   0.8632  0.0859  10.0503
Dffclt.V7   0.4749  0.0801   5.9280
Dffclt.V8  -1.3707  0.0978 -14.0219
Dffclt.V9   0.6447  0.0823   7.8385
Dffclt.V10 -0.3869  0.0792  -4.8854
Dscrmn      0.9818  0.0391  25.1168

Integration:
method: Gauss-Hermite
quadrature points: 21 

Optimization:
Convergence: 0 
max(|grad|): 0.25 
quasi-Newton: BFGS

Is Model2 a better fit to the Rasch model? The ‘anova’ function in the ltm package may be used to test the hypothesis that there’s no significant difference between models. If the p-value below is less than 0.050, the hypothesis is rejected, and the second model might be regarded as providing a better fit.

A reference for these fit statistics is this Wikipedia page.

anova(fit1,fit2)


 Likelihood Ratio Table
          AIC      BIC  log.Lik  LRT df p.value
fit1 12521.54 12570.62 -6250.77                
fit2 12523.32 12577.30 -6250.66 0.22  1   0.636

Okay, now let’s try a third model by allowing items to have different discrimination values, we’ll refer to it here as ‘Model3’. This model is called the ‘2PL’ model in the literature, the ‘two-parameter logistic model’. The table of coefficients (below) has the difficulty estimate (‘Dffclt’) for each item, followed by the discrimination estimate (‘Dscrmn’) for each item.

fit3 <-ltm(scored ~ z1)
summary(fit3)


Call:
ltm(formula = scored ~ z1)

Model Summary:
   log.Lik      AIC      BIC
 -6215.946 12471.89 12570.05

Coefficients:
             value std.err  z.vals
Dffclt.V1   0.5789  0.0799  7.2468
Dffclt.V2  -0.1287  0.0613 -2.0997
Dffclt.V3  -0.2865  0.0948 -3.0218
Dffclt.V4  -0.7775  0.1829 -4.2499
Dffclt.V5  -0.4761  0.0711 -6.6912
Dffclt.V6   1.2294  0.1992  6.1712
Dffclt.V7   0.4624  0.0852  5.4251
Dffclt.V8  -1.4604  0.1740 -8.3928
Dffclt.V9   0.5445  0.0768  7.0883
Dffclt.V10 -0.3694  0.0801 -4.6098
Dscrmn.V1   1.2236  0.1342  9.1211
Dscrmn.V2   1.4469  0.1535  9.4286
Dscrmn.V3   0.8014  0.1022  7.8450
Dscrmn.V4   0.5015  0.0885  5.6682
Dscrmn.V5   1.3518  0.1468  9.2061
Dscrmn.V6   0.6261  0.0965  6.4858
Dscrmn.V7   1.0197  0.1168  8.7311
Dscrmn.V8   0.9004  0.1201  7.4995
Dscrmn.V9   1.2672  0.1383  9.1608
Dscrmn.V10  1.0468  0.1182  8.8531

Integration:
method: Gauss-Hermite
quadrature points: 21 

Optimization:
Convergence: 0 
max(|grad|): 1.3e-06 
quasi-Newton: BFGS

Is Model3, 2PL, better than Model2, the unconstrained Rasch model? That is, does Model3 appear to fit the data better than Model2? If the p-value below is significant, then yes, it would seem so.

anova(fit2, fit3)


 Likelihood Ratio Table
          AIC      BIC  log.Lik   LRT df p.value
fit2 12523.32 12577.30 -6250.66                 
fit3 12471.89 12570.05 -6215.95 69.42  9  <0.001

The Rasch and 2PL models might be seen as incomplete as they do not reflect the possible effects of guessing. The ‘3PL’ IRT model is used to estimate item difficultly, item discrimination, and an item guessing parameter.

We’ll get 3PL results next, and, since this is the fourth ‘model’ used in this report, we’ll refer to it as ‘Model4’ in this report (the first model was the constained Rasch model, the second was the unconstrained Rasch model, the third was the 2PL model, and now we’ll have 3PL, which we’re referring to here as ‘Model4’). In the table below, estimates of item guessing parameters are labelled as ‘Gussng’.

fit4 <-tpm(scored, type="latent.trait", max.guessing=1)

Warning in tpm(scored, type = "latent.trait", max.guessing = 1): Hessian matrix at convergence is not positive definite; unstable solution.

summary(fit4)


Call:
tpm(data = scored, type = "latent.trait", max.guessing = 1)

Model Summary:
   log.Lik      AIC      BIC
 -6212.672 12485.34 12632.58

Coefficients:
             value std.err  z.vals
Gussng.V1   0.0117     NaN     NaN
Gussng.V2   0.0001  0.0038  0.0262
Gussng.V3   0.0171  0.1746  0.0981
Gussng.V4   0.0060  0.0786  0.0767
Gussng.V5   0.0037  0.0558  0.0661
Gussng.V6   0.0815  0.1379  0.5908
Gussng.V7   0.0028  0.0371  0.0750
Gussng.V8   0.4240  0.1814  2.3372
Gussng.V9   0.1348  0.0451  2.9920
Gussng.V10  0.0006  0.0150  0.0421
Dffclt.V1   0.6009     NaN     NaN
Dffclt.V2  -0.1245  0.0615 -2.0236
Dffclt.V3  -0.2374  0.5028 -0.4722
Dffclt.V4  -0.7677  0.3755 -2.0444
Dffclt.V5  -0.4636  0.1265 -3.6658
Dffclt.V6   1.4556  0.3598  4.0460
Dffclt.V7   0.4682  0.1200  3.9031
Dffclt.V8  -0.3010  0.5961 -0.5049
Dffclt.V9   0.7625  0.0963  7.9217
Dffclt.V10 -0.3706  0.0884 -4.1940
Dscrmn.V1   1.2734     NaN     NaN
Dscrmn.V2   1.4567  0.1534  9.4984
Dscrmn.V3   0.8150  0.1823  4.4698
Dscrmn.V4   0.4935  0.0928  5.3179
Dscrmn.V5   1.3706  0.1686  8.1282
Dscrmn.V6   0.7651  0.3599  2.1256
Dscrmn.V7   1.0337  0.1395  7.4098
Dscrmn.V8   1.3862  0.5300  2.6154
Dscrmn.V9   2.2590  0.7031  3.2129
Dscrmn.V10  1.0276  0.1180  8.7046

Integration:
method: Gauss-Hermite
quadrature points: 21 

Optimization:
Optimizer: optim (BFGS)
Convergence: 0 
max(|grad|): 0.16

Let’s now see if the data fit the ‘3PL’ IRT model better than the ‘2PL’ IRT model. If the p-value below is significant (less than 0.050), we will have evidence to suggest that it does.

anova(fit3, fit4)


 Likelihood Ratio Table
          AIC      BIC  log.Lik  LRT df p.value
fit3 12471.89 12570.05 -6215.95                
fit4 12485.34 12632.58 -6212.67 6.55 10   0.767

Graphics are always nice to have. Here are some ‘ICCs’, item characteristic curves, for Model1, the Rasch model with discrimination fixed at a value of 1.

plot(fit1,
legend = TRUE,
     cx = "bottomright",
     lwd = 3,
     cex.main = 1.5,
     cex.lab = 1.3,
cex = 0.8)

Next, ICCs for the 2PL model:

plot(fit3,
legend = TRUE,
     cx = "bottomright",
     lwd = 3,
     cex.main = 1.5,
     cex.lab = 1.3,
cex = 0.8)

Next: ICCs for the 3PL model:

plot(fit4,
legend = TRUE,
     cx = "bottomright",
     lwd = 3,
     cex.main = 1.5,
     cex.lab = 1.3,
cex = 0.8)

Item information curves are shown below. They’re used to indicate where each item best measures the latent trait. The curves are based on results from the 2PL model, ‘fit3’ in this case.

plot(fit3,
legend = TRUE,
     cx = "topright",
     cex = 0.8,
     type = "IIC",
     annot = FALSE,
     lwd = 3,
     cex.main = 1.5,
     cex.lab = 1.3)

Last but not least we come to a graph of the ‘test information function’. It’s used as an indicator of where the items, taken as a whole (that is, as a ‘test’), provide the most precise information on the latent trait (‘ability’). In this example, ‘fit3’ has been employed, the 2PL IRT model.

plot(fit3, type="IIC",
     items = 0,
     lwd = 3,
     cex.main = 1.5,
     cex.lab = 1.3)

Module ‘UWO-1’: dichotomous IRT using University of Western Ontario R code

That’s all there is for now, this is THE END.