This document serves an an exercise and illustrates a CAT simulation.
CAT is based on IRT and it requires a prior IRT model parameter
estimation. A purpose of CAT is to provide short questionnaires to
participants tailored to their abilities. In CAT, questions for
participants are chosen from an item pool in real time. For example, if
a participant answers a difficult question correctly, it seems that
there is no need to show him a much easier question to answer, on the
contrary, it would make more sense to provide a more difficult question
to pinpoint his ability estimate. In that sense, in CAT, participants do
not have to respond to the entire questionnaire. In this simulation we
will use an item pool of 200 questions in a CAT where the maximum number
of questions asked is 40.
For the purposes of this simulation we will use catR for the simulation,
and foreach and doSNOW for parallel computation.
library(catR)
library(foreach)
library(doSNOW)
options(width=160)
For this simulation we will use a 2PL model, that is a model with 2
parameters for each item, a discrimination parameter, which indicates
how good a question discriminates abilities, and a difficulty parameter
which indicates how difficult a question is. catR provides the function
genDichoMatrix, this function generates parameters for binary items.
These items have 2 possible values: question answered correctly (value
1) and question answered incorrectly (value 0). That type of responses
could be provided by multiple choice questions or whatever. All we need
to know is if a participant answered a question correctly or not and
represent this with a 0 or 1.
The genDichoMatrix will generate 200 items of a 2PL model with parameter
a modeled after a prior distribution with a mean 2 and sd .1. This will
hopefully provide positive parameters for a. For parameter b we need
both positive and negative difficulty and thus the simulated
distribution has a mean of 0 and a relatively large sd. A very low
negative difficulty indicates an easy to answer item which can be
answered correctly by a participant with low ability. An item with high
difficulty is interpreted using the same criteria. In addition, a high a
parameter indicates that the item is able to discriminate between
abilities, thus in a sense this value indicates how confident we can be
that this item measures the difficulty it is supposed to measure without
variability.
The genDichoMatrix also provides a c and a d parameter which stand for
guessing or inattentiveness, however because their values are always 0
and 1, they have no effect in the ability estimation formula. As a
result catR may use a 4PL formula internally, however item parameters
for c and d, will make the relevant formulas equivalent of a 2PL
model.
The resulting item bank from the genDichoMatrix, provides relatively
good discrimination for items that measure a relatively wide ability
levels.
itemBank<-genDichoMatrix(items=200,model="2PL",aPrior=c("norm",2,.1),bPrior=c("norm",0,4))
head(itemBank)
## a b c d ## 1 2.040940 -2.5058152 0 1 ## 2 2.168887 0.7345733 0 1 ## 3 2.158659 -3.3425144 0 1 ## 4 1.966909 6.3811232 0 1 ## 5 1.771476 1.3180311 0 1 ## 6 2.249766 -3.2818735 0 1
summary(itemBank)
## a b c d ## Min. :1.711 Min. :-8.8588 Min. :0 Min. :1 ## 1st Qu.:1.950 1st Qu.:-2.4553 1st Qu.:0 1st Qu.:1 ## Median :2.000 Median :-0.1975 Median :0 Median :1 ## Mean :2.004 Mean : 0.1422 Mean :0 Mean :1 ## 3rd Qu.:2.073 3rd Qu.: 2.4520 3rd Qu.:0 3rd Qu.:1 ## Max. :2.265 Max. : 9.6065 Max. :0 Max. :1
In addition, catR provides the genPattern function witch allows to generate item responses or participant responses for that matter, given an itemBank provided by genDichoMatrix or other means. For illustration purposes the code generates descriptive statistics for the first 9 columns of the generated data.
data<-genPattern(th=seq(-10,10,length=1000),it=itemBank,model=NULL,seed=1)
summary(data[,1:9])
## V1 V2 V3 V4 V5 V6 V7 V8 V9 ## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 ## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 ## Median :1.000 Median :0.000 Median :1.000 Median :0.000 Median :0.000 Median :1.000 Median :0.000 Median :0.000 Median :0.000 ## Mean :0.629 Mean :0.462 Mean :0.674 Mean :0.182 Mean :0.434 Mean :0.663 Mean :0.405 Mean :0.361 Mean :0.379 ## 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 ## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000
Assuming that the generated data are participant responses with each row representing a single participant, we can use a third package, the mirt package, to estimate item parameters, the overall model and participant scores, assuming participants answered all questions.
mirt_model<-mirt::mirt(data=data.frame(data),model=1,itemtype="2PL",technical=list(theta_lim=c(-6,6),NCYCLES=10000))
Here we will simulate a CAT scenario where participants fill in a CAT
questionnaire with a maximum of 40 selected questions out of a 200
questionnaire item pool. * The start list defines the parameters for
first item selection. In this case the start list selects one item which
has a difficulty level as close as the ability level of 0. The
difficulty and the ability level in this case are the same. This
selection method is signified by the “bOpt”.
* The test list defines how to choose the next question, provided the
first or more questions are already answered, and theta scores, and
other values -when necessary- are already estimated. The test list uses
the Bayesian modal estimation (Birnbaum, 1969) in order to estimate
scores. The EAP method is usually the recommended method for theta
estimation of unidimensional IRT models, however, “BM” in this case
seems to provide highly correlated CAT estimated scores and mirt EAP
estimated scores. The test list here uses “bOpt” for the selection of
next item, however there is a randomization parameter allowing it to
randomly choose 5 items around the most efficient “bOpt” choice. In
addition, the range of theta estimation will be between -20, 20. This is
a rather extreme range and its purpose is to avoid any extreme scores
accumulating at the ends of the distributions.
* The final list defines the options for the final ability
estimation.
* The stop list defines the conditions where the test will stop. In this
case, the test stops when 40 items out of 200 are asked and when the
precision criterion falls bellow .3. The precision criterion is
estimated using the SE of the provisional ability estimate.
start<-list(theta=0,nrItems=1,startSelect="bOpt")
test<-list(method="BM",itemSelect="bOpt",randomesque=5,range=c(-20,20))
final<-list(method="BM",range=c(-20,20))
stop<-list(rule=c("length","precision"),thr=c(40,0.3))
cl<-makeCluster(parallel::detectCores())
registerDoSNOW(cl)
result<-foreach(participant=1:nrow(data),.combine=rbind,.packages=c("catR")) %dopar% {
rc_result<-randomCAT(itemBank=itemBank,model=NULL,responses=data[participant,],start=start,test=test,stop=stop,final=final)
data.frame(questions=length(rc_result$thetaProv),est_theta=rc_result$thFinal,se=rc_result$seFinal)
}
stopCluster(cl)
result<-data.frame(result,mirt_scores=as.numeric(mirt::fscores(mirt_model)))
A summary of the results is best provided graphically. Here we will summarize what each graph represents.
par(mfrow=c(3,2),mai=c(.5,.5,.2,.2))
plot(result$est_theta,result$mirt_scores,xlab="CAT Theta",ylab="mirt Theta",main=paste("Pearson r",round(cor(result$est_theta,result$mirt_scores),4)))
plot(result$questions,result$est_theta,xlab="Questions asked",ylab="CAT Theta")
plot(result$questions,result$se,xlab="Questions asked",ylab="SE")
plot(result$est_theta,result$se,xlab="CAT Theta",ylab="SE")
plot(table(round(result$est_theta,0)),xlab="CAT Theta",ylab="Frequency",main="")
plot(table(round(result$mirt_scores,0)),xlab="mirt Theta",ylab="Frequency",main="")
It seems that the distribution of scores is uniform. The function which generated responses, produced responses from all thetas in about the same proportion. This does not seem to be worrying since in a real life scenario we would expect thetas with near normal distribution where the most responses and ability levels would fall on the highest test information areas.
In additions to the simulation used above, catR also provides the simulateRespondents which can provide an interesting summary.
result_simulate_respondents<-simulateRespondents(thetas=seq(-6,6,.001),itemBank=itemBank,genSeed=NULL,start=start,test=test,stop=stop,final=final)
result_simulate_respondents
plot(result_simulate_respondents,cex.main=5,cex.lab=5,cex.axis=5)