Matthew Sigal
December 5th, 2013
Computerized Adaptive Testing (CAT) is a powerful procedure that utilizes item response theory (IRT) models and a logic-based interface to obtain estimates of participant ability on a theoretical construct.
In educational testing, items can easily be collapsed into correct/incorrect (dichotomous) scores, e.g. for responses from a multiple-choice scantron exam, see mirt::key2binary().
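As a rough base-R illustration of what key-based scoring does, each response is compared with the keyed option and recoded as 1 (correct) or 0 (incorrect). The response matrix and key below are made up; in practice mirt::key2binary() handles this.

```r
# Hypothetical multiple-choice responses: 3 examinees x 4 items (options 1-4)
responses <- matrix(c(1, 2, 3, 4,
                      1, 3, 3, 2,
                      2, 2, 3, 4),
                    nrow = 3, byrow = TRUE)

# Hypothetical answer key: the correct option for each item
key <- c(1, 2, 3, 4)

# Compare each column with its keyed option: 1 = correct, 0 = incorrect
scored <- sweep(responses, 2, key, FUN = "==") * 1L
scored
rowSums(scored)  # number correct per examinee
```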
However, in the health outcomes context, items don’t dichotomize so easily (e.g. an item on pain intensity might ask whether the respondent’s pain is none, mild, moderate, or severe); there is no real “right” answer, so we need a better model.
Models for polytomous items primarily differ in the definition of the probabilities being compared and the number of item parameters.
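To make "the probabilities being compared" concrete: the graded response model works with cumulative probabilities \( P(X \ge k) \), while divide-by-total models such as the generalized partial credit model compare adjacent categories. A minimal sketch for one four-category item with hypothetical parameters:

```r
a     <- 1.2             # hypothetical slope
b     <- c(-1, 0, 1.5)   # hypothetical ordered thresholds
theta <- 0.5

# Graded response model: boundary curves P(X >= k) for k = 1..3, then differences
p_star   <- c(1, plogis(a * (theta - b)), 0)  # P(X >= 0) = 1, P(X >= 4) = 0
p_graded <- p_star[1:4] - p_star[2:5]

# Generalized partial credit model: log(P_k / P_{k-1}) = a * (theta - b_k)
z      <- c(0, cumsum(a * (theta - b)))
p_gpcm <- exp(z) / sum(exp(z))

round(rbind(graded = p_graded, gpcm = p_gpcm), 3)
```

Both rows are valid probability distributions over the same four categories, but the same \( a \) and \( b \) values mean different things in each model.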
In each case, we are interested in the item information function (how much information, i.e. measurement precision, an item provides at each level of \( \theta \)) and how it contributes to the overall test information function, which is the sum of the information functions of the items a participant was exposed to.
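In symbols, with \( S \) the set of items administered to a participant and \( I_j(\theta) \) the information of item \( j \), the standard relations are:

```latex
I(\theta) = \sum_{j \in S} I_j(\theta), \qquad
\mathrm{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{I(\hat{\theta})}}
```

For example, a total information of 4 at \( \hat{\theta} \) corresponds to a standard error of about \( 1/\sqrt{4} = 0.5 \).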
The most general of these models is for nominal data, where no rank order is assumed for the response categories.
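For reference, Bock's nominal model in its common textbook form gives each category \( k \) of item \( j \) its own slope \( a_{jk} \) and intercept \( c_{jk} \), identified by a constraint such as \( a_{j0} = c_{j0} = 0 \) (or sum-to-zero constraints):

```latex
P(X_j = k \mid \theta) =
  \frac{\exp(a_{jk}\theta + c_{jk})}
       {\sum_{m=0}^{K_j - 1} \exp(a_{jm}\theta + c_{jm})},
  \qquad k = 0, \dots, K_j - 1
```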
mirt uses a different parameterization: by specifying lower and upper anchors (in the output below, ak0 = 0 and ak3 = 3), we can evaluate the ordering of the categories in terms of their relation to the latent trait. If the data are actually ordinal, we would expect the ak values to increase steadily across response categories.
Simple example: Attitude to Science and Technology data from mirt and ltm
library(mirt)  # the Science data set is included in mirt
nominal.mod <- mirt(Science, 1, 'nominal')
Factor loadings metric:
           F1    h2
Comfort 0.509 0.259
Work    0.443 0.196
Future  0.770 0.593
Benefit 0.416 0.173
SS loadings: 1.221
Factor covariance:
   F1
F1  1
We can think of the scores in terms of response patterns:
Method: EAP
Empirical Reliability:
   F1
0.675
     Comfort Work Future Benefit Freq      F1  SE_F1
[1,]       1    1      1       1    2 -2.7684 0.5673
[2,]       1    3      2       1    1 -1.8257 0.5724
[3,]       1    4      2       3    1 -0.9887 0.5625
Or by participant:
  Comfort Work Future Benefit       F1
1       4    4      3       2  0.52753
2       3    3      3       3 -0.02208
3       3    2      2       3 -0.96610
Each item provides a different amount of information at different levels of \( \theta \). The estimated nominal-model parameters for one item:
       a1 ak0  ak1   ak2 ak3 d0    d1    d2   d3
par 1.007   0 1.54 1.999   3  0 3.636 5.902 4.53
We can see that the ak values increase steadily, which means that as we move up the categories, they become more positively related to the latent trait being measured.
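To see what increasing ak values imply, the sketch below plugs the estimates from the table above into the nominal-model category probabilities, assuming the form mirt documents for this parameterization, \( P(X = k \mid \theta) \propto \exp(ak_k \, a_1 \theta + d_k) \) (worth checking against the mirt manual for your version). Because ak increases with category, the expected category rises with \( \theta \):

```r
# Nominal-model estimates for the item in the table above (assumed mirt parameterization)
a1 <- 1.007
ak <- c(0, 1.54, 1.999, 3)      # category scoring parameters (ak0 and ak3 are the fixed anchors)
d  <- c(0, 3.636, 5.902, 4.53)  # category intercepts (d0 fixed at 0)

# P(X = k | theta) = exp(ak_k * a1 * theta + d_k) / sum_m exp(ak_m * a1 * theta + d_m)
nominal_probs <- function(theta, a1, ak, d) {
  z <- ak * a1 * theta + d
  z <- z - max(z)               # subtract the max for numerical stability
  exp(z) / sum(exp(z))
}

thetas   <- seq(-3, 3, by = 1)
probs    <- t(sapply(thetas, nominal_probs, a1 = a1, ak = ak, d = d))  # rows = theta, cols = categories
expected <- as.vector(probs %*% (0:3))  # expected category (coded 0-3) at each theta
round(cbind(theta = thetas, probs, expected = expected), 3)
```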
When all items are taken as a group, we have the overall test information function:
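Because test information is simply the sum of item information, it can be sketched directly. The item information used here, \( a_1^2 \) times the variance of the ak scores at \( \theta \), is the standard result for divide-by-total models under the parameterization assumed above. The first item reuses the estimates shown earlier; the second is entirely hypothetical:

```r
# Item information under the assumed nominal parameterization: a1^2 * Var(ak | theta)
nominal_info <- function(theta, a1, ak, d) {
  z <- ak * a1 * theta + d
  p <- exp(z - max(z)) / sum(exp(z - max(z)))
  a1^2 * (sum(ak^2 * p) - sum(ak * p)^2)
}

items <- list(
  shown = list(a1 = 1.007, ak = c(0, 1.54, 1.999, 3), d = c(0, 3.636, 5.902, 4.53)),
  hypo  = list(a1 = 1.5,   ak = c(0, 1, 2, 3),        d = c(0, 1, 1.5, 0.5))  # hypothetical item
)

thetas    <- seq(-3, 3, by = 0.5)
# Rows = theta values, columns = items
item_info <- sapply(items, function(it) sapply(thetas, nominal_info, a1 = it$a1, ak = it$ak, d = it$d))
test_info <- rowSums(item_info)   # test information = sum of item information
se        <- 1 / sqrt(test_info)  # approximate standard error of the theta estimate
round(cbind(theta = thetas, item_info, test = test_info, se = se), 3)
```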
gpcm.mod <- mirt(Science, 1, 'gpcm')
Factor loadings metric:
           F1    h2
Comfort 0.453 0.205
Work    0.443 0.196
Future  0.793 0.629
Benefit 0.391 0.153
SS loadings: 1.183
Factor covariance:
   F1
F1  1
$Comfort
        a     b1     b2    b3
par 0.864 -3.274 -2.886 1.535
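To read these values, the sketch below assumes the usual GPCM form in which a is the slope and b1 to b3 are step parameters, so the log-odds of category k versus k - 1 is \( a(\theta - b_k) \). It plugs in the Comfort estimates and shows the expected item score rising with \( \theta \):

```r
# Comfort estimates from the output above (assumed to be GPCM slope and step parameters)
a <- 0.864
b <- c(-3.274, -2.886, 1.535)

# GPCM: P(X = k | theta) is proportional to exp(sum_{v <= k} a * (theta - b_v));
# category 0 uses an empty sum, i.e. exp(0)
gpcm_probs <- function(theta, a, b) {
  z <- c(0, cumsum(a * (theta - b)))
  z <- z - max(z)  # numerical stability
  exp(z) / sum(exp(z))
}

thetas   <- seq(-4, 4, by = 2)
probs    <- t(sapply(thetas, gpcm_probs, a = a, b = b))  # rows = theta, cols = categories 0-3
expected <- as.vector(probs %*% (0:3))                    # expected item score
round(cbind(theta = thetas, probs, expected = expected), 3)
```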
Factor loadings metric:
           F1    h2
Comfort 0.507 0.257
Work    0.507 0.257
Future  0.507 0.257
Benefit 0.507 0.257
SS loadings: 1.026
Factor covariance:
      F1
F1 1.003
$Comfort
    a     b1     b2    b3
par 1 -3.091 -2.596 1.389
$Work
    a     b1     b2    b3
par 1 -1.897 -0.911 1.859
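In this run every slope is fixed at 1 (hence the identical loadings), which is consistent with a Rasch-family partial credit model; the call that produced it is not shown, so treat that label as an assumption. Under that reading, each b is the point on \( \theta \) where two adjacent categories are equally likely, which the sketch below checks with the Work estimates:

```r
# Work estimates from the output above, slope fixed at 1
a <- 1
b <- c(-1.897, -0.911, 1.859)

# Adjacent-category probabilities: log(P_k / P_{k-1}) = a * (theta - b_k)
pcm_probs <- function(theta, a, b) {
  z <- c(0, cumsum(a * (theta - b)))
  exp(z - max(z)) / sum(exp(z - max(z)))
}

# Evaluate at theta = b_k: categories k - 1 (lower) and k (upper) should tie
crossings <- sapply(seq_along(b), function(k) {
  p <- pcm_probs(b[k], a, b)
  c(lower = p[k], upper = p[k + 1])
})
round(crossings, 3)
```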
Factor loadings metric:
           F1    h2
Comfort 0.507 0.257
Work    0.507 0.257
Future  0.507 0.257
Benefit 0.507 0.257
SS loadings: 1.026
Factor covariance:
      F1
F1 0.827
$Comfort
    a1 d0    d1    d2    d3 c
par  1  0 3.704 5.031 3.669 0
$Work
    a1 d0    d1    d2    d3      c
par  1  0 3.704 5.031 3.669 -2.201
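Here Comfort and Work share identical d parameters and differ only in c, the kind of constraint a rating scale model imposes: one set of category thresholds shared by every item, plus an item-specific location. As a minimal sketch of that idea, written in Andrich's textbook form with hypothetical thresholds and locations (not mirt's internal parameterization or the estimates above), one item's curves are shifted copies of another's:

```r
# Hypothetical shared thresholds (tau) and item locations (delta)
tau   <- c(-1.5, 0, 1.5)
delta <- c(item1 = 0, item2 = 0.8)

# Andrich rating scale model: log(P_k / P_{k-1}) = theta - (delta_j + tau_k)
rsm_probs <- function(theta, delta_j, tau) {
  z <- c(0, cumsum(theta - (delta_j + tau)))
  exp(z - max(z)) / sum(exp(z - max(z)))
}

# Item 2 at theta behaves exactly like item 1 at theta - (delta2 - delta1)
theta           <- 0.3
p_item2         <- rsm_probs(theta, delta[["item2"]], tau)
p_item1_shifted <- rsm_probs(theta - (delta[["item2"]] - delta[["item1"]]), delta[["item1"]], tau)
round(rbind(p_item2, p_item1_shifted), 4)
```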
Factor loadings metric:
           F1    h2
Comfort 0.522 0.272
Work    0.584 0.342
Future  0.803 0.645
Benefit 0.541 0.293
SS loadings: 1.552
Factor covariance:
   F1
F1  1
$Comfort
        a    b1     b2    b3
par 1.041 -4.67 -2.535 1.407
$Work
        a     b1     b2    b3
par 1.226 -2.385 -0.735 1.849
Steps:
Stopping Rules:
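A typical CAT cycle (select an item, record the response, re-estimate \( \theta \), check the stopping rule) can be sketched as follows. This is a generic illustration, not necessarily the procedure used in this talk, and every choice in it is an assumption: a randomly generated, hypothetical GPCM item bank, maximum-information item selection, EAP scoring with a standard normal prior, simulated responses, and stopping when the posterior SD drops below 0.4 or after 10 items.

```r
set.seed(123)

# Hypothetical GPCM item bank: 20 items with 4 categories (scored 0-3)
bank <- lapply(1:20, function(j) list(a = runif(1, 0.8, 2), b = sort(rnorm(3))))

gpcm_probs <- function(theta, a, b) {
  z <- c(0, cumsum(a * (theta - b)))
  exp(z - max(z)) / sum(exp(z - max(z)))
}

# GPCM item information: a^2 * Var(X | theta)
gpcm_info <- function(theta, a, b) {
  p <- gpcm_probs(theta, a, b)
  k <- seq_along(p) - 1
  a^2 * (sum(k^2 * p) - sum(k * p)^2)
}

# EAP estimate and posterior SD on a quadrature grid with a N(0, 1) prior
eap <- function(responses, items, grid = seq(-4, 4, length.out = 81)) {
  log_post <- dnorm(grid, log = TRUE)
  for (i in seq_along(items)) {
    it  <- bank[[items[i]]]
    lik <- sapply(grid, function(th) gpcm_probs(th, it$a, it$b)[responses[i] + 1])
    log_post <- log_post + log(lik)
  }
  w   <- exp(log_post - max(log_post))
  w   <- w / sum(w)
  est <- sum(grid * w)
  c(theta = est, se = sqrt(sum((grid - est)^2 * w)))
}

run_cat <- function(true_theta, se_target = 0.4, max_items = 10) {
  administered <- integer(0)
  responses    <- integer(0)
  est <- c(theta = 0, se = Inf)  # start at the prior mean
  while (length(administered) < max_items && est[["se"]] > se_target) {
    remaining <- setdiff(seq_along(bank), administered)
    info <- sapply(remaining, function(j) gpcm_info(est[["theta"]], bank[[j]]$a, bank[[j]]$b))
    next_item <- remaining[which.max(info)]                   # maximum-information selection
    p <- gpcm_probs(true_theta, bank[[next_item]]$a, bank[[next_item]]$b)
    responses    <- c(responses, sample(0:3, 1, prob = p))    # simulated response
    administered <- c(administered, next_item)
    est <- eap(responses, administered)                       # update the theta estimate
  }
  list(items = administered, responses = responses,
       theta = est[["theta"]], se = est[["se"]])
}

result <- run_cat(true_theta = 0.5)
result
```

Maximum-information selection with EAP scoring is only one common pairing; exposure control and content balancing are typically layered on top in operational CATs.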
Valid Item Bank:
Item banks can be based upon established questionnaires, with caveats:
Goal: develop a practical, user-friendly system for assessing mental health that does not place undue response burden on the patient
Instrument: Mental Health Inventory, which measures 5 sub-domains:
Advanced CAT and Future Research: