2PL Item Characteristic Curves (ICC)

The 2PL, or two-parameter logistic, model assumes a single (unidimensional) latent variable, the person's ability, and describes each item with two parameters: difficulty (where on the ability scale the item sits) and discrimination (how good the item is at figuring a person out).

Binary IRT (2PL) Model

Interpretation of the coefficients (expressed on the ability scale, in standard deviation units):

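Difficulty = b = the item's location on the ability (theta) scale, i.e. the ability level at which a person has a 50% chance of answering correctly. 0 is right in the middle, indicating average difficulty. Here, we see that Q3 (b = -0.28) is of roughly average difficulty, whereas Q1 and Q5 are very easy: a person would have to be more than 3 standard deviations below the mean to have only a 50% chance of getting them right.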

Discrimination = a = how good the item is at figuring a person out, i.e. how sharply the probability of a correct answer changes around the item's location (the slope of the curve at b). Ideally, it is close to or above 1. Here, you can see that items 1 and 3 have the highest discriminating power.

           Dffclt    Dscrmn
Item 1 -3.3597341 0.8253715
Item 2 -1.3696497 0.7229499
Item 3 -0.2798983 0.8904748
Item 4 -1.8659189 0.6885502
Item 5 -3.1235725 0.6574516
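
As a minimal sketch of what these two columns mean, the 2PL curve can be evaluated directly from the table above. The Python below is only an illustration (the estimates themselves come from R's ltm); the names ITEMS and icc_2pl are made up for this example, and it assumes the usual IRT parameterization P(theta) = 1 / (1 + exp(-a * (theta - b))).

```python
import math

# (difficulty b, discrimination a) copied from the 2PL table above.
ITEMS = {
    "Item 1": (-3.3597341, 0.8253715),
    "Item 2": (-1.3696497, 0.7229499),
    "Item 3": (-0.2798983, 0.8904748),
    "Item 4": (-1.8659189, 0.6885502),
    "Item 5": (-3.1235725, 0.6574516),
}


def icc_2pl(theta, b, a):
    """Probability of a correct answer at ability theta under the 2PL."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


for name, (b, a) in ITEMS.items():
    # At theta = b the probability is 0.5 by construction; theta = 0 is an
    # average person, who should rarely miss the very easy items.
    print(f"{name}: P(theta=b) = {icc_2pl(b, b, a):.2f}, "
          f"P(theta=0) = {icc_2pl(0.0, b, a):.2f}")
```

Reading the output next to the table shows why a strongly negative difficulty means "easy": an average person (theta = 0) already has a high chance of answering items 1 and 5 correctly.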

Test Information Function (TIF)

The test information function shows how precisely the test measures ability at each point of the ability scale; it is not the distribution of the ability scores. In this particular case, the test gives the most information about people who perform about 2 standard deviations below the mean (peak of the curve), and less about people at or above the mean. In practical terms: do you really want your test to be most precise about people who are not likely to be very good lawyers?

Factor Scores

Factor scores give an estimated ability (z1) for every observed response pattern, which lets you compare participants and draw interesting conclusions. For instance, if you had to decide who gets a scholarship, you would want to discriminate best among the people at the top of the ability list, and factor scores let you pick those at the higher end of the spectrum (say, 2 standard deviations above the mean).

In this LSAT example, two response patterns have only 2 questions right out of 5, but their ability scores differ. Pattern 4 (items 4 and 5 right) has a z1 (latent trait) of -1.041, and the Obs column shows how many people had that pattern; in this case, 11. Pattern 6 (items 3 and 5 right) has a z1 of -0.911, and only one participant had that pattern. So although both patterns contain 2 correct items, the participant with pattern 6 answered item 3, which is harder and more discriminating than item 4, and therefore obtained a higher ability estimate.


Call:
ltm(formula = LSAT ~ z1, IRT.param = TRUE)

Scoring Method: Empirical Bayes

Factor-Scores for observed response patterns:
   Item 1 Item 2 Item 3 Item 4 Item 5 Obs     Exp     z1 se.z1
1       0      0      0      0      0   3   2.277 -1.895 0.795
2       0      0      0      0      1   6   5.861 -1.479 0.796
3       0      0      0      1      0   2   2.596 -1.460 0.796
4       0      0      0      1      1  11   8.942 -1.041 0.800
5       0      0      1      0      0   1   0.696 -1.331 0.797
6       0      0      1      0      1   1   2.614 -0.911 0.802
7       0      0      1      1      0   3   1.179 -0.891 0.803
8       0      0      1      1      1   4   5.955 -0.463 0.812
9       0      1      0      0      0   1   1.840 -1.438 0.796
10      0      1      0      0      1   8   6.431 -1.019 0.801
11      0      1      0      1      1  16  13.577 -0.573 0.809
12      0      1      1      0      1   3   4.370 -0.441 0.813
13      0      1      1      1      0   2   2.000 -0.420 0.813
14      0      1      1      1      1  15  13.920  0.023 0.828
15      1      0      0      0      0  10   9.480 -1.373 0.797
16      1      0      0      0      1  29  34.616 -0.953 0.802
17      1      0      0      1      0  14  15.590 -0.933 0.802
18      1      0      0      1      1  81  76.562 -0.506 0.811
19      1      0      1      0      0   3   4.659 -0.803 0.804
20      1      0      1      0      1  28  24.989 -0.373 0.815
21      1      0      1      1      0  15  11.463 -0.352 0.815
22      1      0      1      1      1  80  83.541  0.093 0.831
23      1      1      0      0      0  16  11.254 -0.911 0.802
24      1      1      0      0      1  56  56.105 -0.483 0.812
25      1      1      0      1      0  21  25.646 -0.463 0.812
26      1      1      0      1      1 173 173.310 -0.022 0.827
27      1      1      1      0      0  11   8.445 -0.329 0.816
28      1      1      1      0      1  61  62.520  0.117 0.832
29      1      1      1      1      0  28  29.127  0.139 0.833
30      1      1      1      1      1 298 296.693  0.606 0.855
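
The z1 column can be roughly reproduced by hand. The sketch below assumes that the "Empirical Bayes" scores are posterior modes of ability under a standard normal prior, and it finds the mode by bisection on the derivative of the log-posterior. The names PARAMS, score_gradient and map_score are hypothetical, and the item parameters are the 2PL estimates from the earlier table.

```python
import math

# (difficulty b, discrimination a) for items 1-5, from the 2PL table.
PARAMS = [
    (-3.3597341, 0.8253715),
    (-1.3696497, 0.7229499),
    (-0.2798983, 0.8904748),
    (-1.8659189, 0.6885502),
    (-3.1235725, 0.6574516),
]


def score_gradient(theta, pattern):
    """Derivative of the log-posterior of theta for one response pattern."""
    grad = -theta  # contribution of the assumed N(0, 1) prior
    for (b, a), x in zip(PARAMS, pattern):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        grad += a * (x - p)  # contribution of each item's likelihood
    return grad


def map_score(pattern, lo=-6.0, hi=6.0):
    """Posterior mode of theta; the gradient is strictly decreasing,
    so plain bisection on its sign is enough."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if score_gradient(mid, pattern) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


pattern_4 = (0, 0, 0, 1, 1)  # items 4 and 5 correct
pattern_6 = (0, 0, 1, 0, 1)  # items 3 and 5 correct
print(f"pattern 4: {map_score(pattern_4):.3f}")
print(f"pattern 6: {map_score(pattern_6):.3f}")
```

The gradient also shows where the difference comes from: a correct answer contributes through the item's discrimination a, so swapping item 4 for the more discriminating item 3 pushes the estimate up.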

Binary IRT (3PL)

In a 3PL, or three-parameter logistic, model you add a guessing parameter that estimates how likely a person of very low ability is to get the item right by guessing (the lower asymptote of the curve), and see whether the model improves overall. Here, the guessing parameters are pretty low (< 0.08): the items are easy, but nonetheless they are not easy to guess.

           Gussng     Dffclt    Dscrmn
Item 1 0.03738668 -3.2964761 0.8286287
Item 2 0.07770994 -1.1451487 0.7603748
Item 3 0.01178206 -0.2490144 0.9015777
Item 4 0.03529306 -1.7657862 0.7006545
Item 5 0.05315665 -2.9902046 0.6657969

3PL ICC

The curves are pretty much the same as in the 2PL, although their lower tails are lifted a tiny bit because the guessing parameter raises the lower asymptote from 0 to the value in the Gussng column.
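
A small sketch of the 3PL curve makes the effect of the guessing parameter visible. It assumes the usual form P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b))) and uses the values from the 3PL table above; ITEMS_3PL and icc_3pl are names invented for this illustration.

```python
import math

# (guessing c, difficulty b, discrimination a) from the 3PL table.
ITEMS_3PL = {
    "Item 1": (0.03738668, -3.2964761, 0.8286287),
    "Item 2": (0.07770994, -1.1451487, 0.7603748),
    "Item 3": (0.01178206, -0.2490144, 0.9015777),
    "Item 4": (0.03529306, -1.7657862, 0.7006545),
    "Item 5": (0.05315665, -2.9902046, 0.6657969),
}


def icc_3pl(theta, c, b, a):
    """Probability of a correct answer at ability theta under the 3PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


for name, (c, b, a) in ITEMS_3PL.items():
    # Far to the left the curve flattens out at c instead of 0, and at
    # theta = b the probability is (1 + c) / 2 rather than 0.5.
    print(f"{name}: P(theta=-50) = {icc_3pl(-50.0, c, b, a):.3f}, "
          f"P(theta=b) = {icc_3pl(b, c, b, a):.3f}")
```

Because every c is below 0.08, the floor of each curve sits only slightly above zero.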

Model Comparison

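To check whether a 2PL or a 3PL is best, you run a likelihood ratio test and compare the information criteria, preferring the model with the lower AIC/BIC. In this case, the log-likelihoods are practically identical (p-value = 1) and both AIC and BIC are lower for the 2PL, so the simpler model with two parameters is better, i.e. adding the guessing parameter is not needed.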


 Likelihood Ratio Table
                AIC     BIC  log.Lik   LRT df p.value
LSAT.model  4953.31 5002.38 -2466.65                 
LSAT.model2 4963.32 5036.94 -2466.66 -0.01  5       1
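
The AIC and BIC columns can be checked by hand from the log-likelihoods. This sketch assumes the standard definitions AIC = 2k - 2 logLik and BIC = k ln(n) - 2 logLik, with k = 10 parameters for the 2PL and k = 15 for the 3PL, and n = 1000 respondents (the sum of the Obs column in the factor-score table). The function names are illustrative, and the log-likelihoods are the rounded values from the table, so the last digit may differ.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion."""
    return n_params * math.log(n_obs) - 2 * log_lik


N_OBS = 1000  # total of the Obs column

# model name -> (rounded log-likelihood, number of estimated parameters)
MODELS = {
    "2PL": (-2466.65, 10),  # 5 difficulties + 5 discriminations
    "3PL": (-2466.66, 15),  # plus 5 guessing parameters
}

for name, (log_lik, k) in MODELS.items():
    print(f"{name}: AIC = {aic(log_lik, k):.2f}, "
          f"BIC = {bic(log_lik, k, N_OBS):.2f}")

# Degrees of freedom of the likelihood ratio test: extra parameters in the 3PL.
df = MODELS["3PL"][1] - MODELS["2PL"][1]
print(f"df = {df}")
```

Since the two log-likelihoods are essentially equal, the five extra parameters only add to the penalty term, which is why both criteria favour the 2PL.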