Simulating Response Data With Partial Credit Model

The following example demonstrates how to simulate polytomous response data based on Partial Credit model (PCM; Masters, 1982). In the example, there are four response options (0, 1, 2, or 3) for each item. There are 10 items in total and the sample size is 10000 individuals.

For more details about the PCM simulation process, see https://www.rasch.org/rmt/rmt213a.htm.

Thresholds for the items

First, we need to simulate thresholds for the items. Although PCM does not require thresholds to follow the same order as resonse options, the following codes set the thresholds in ascending order. That is, the higher response options (e.g., 2 or 3) are more difficult to endorse than the lower response options (e.g., 0 and 1).

set.seed(666)
thresholds <- t(apply(matrix(runif(10*3, .3, 1), 10), 1, cumsum))
thresholds <- -(thresholds - rowMeans(thresholds))
thresholds <- thresholds + rnorm(10)
thresholds <- thresholds*-1
thresholds

##              [,1]       [,2]        [,3]
##  [1,] -0.81590916  0.0272160  1.01617285
##  [2,] -1.30625806 -0.9947927 -0.27385084
##  [3,] -0.69852826 -0.3315069 -0.00466587
##  [4,]  0.18309404  0.5826085  0.98165551
##  [5,] -1.37285258 -0.9250642 -0.06059434
##  [6,] -0.04871371  0.8191658  1.30584577
##  [7,]  0.85602437  1.1816074  1.51149884
##  [8,] -2.12778889 -1.2036427 -0.47512053
##  [9,] -0.34207341  0.2961921  0.98353596
## [10,] -0.74749822 -0.1208330  0.77661749

Step Parameters

Next, we calculate the step parameters based on the thresholds that we generated above. The difference is that the step parameters are centered around the mean threshold for each item. Therefore, they add up to zero.

difficulty = rowMeans(thresholds)
step1 = thresholds[,1] - difficulty
step2 = thresholds[,2] - difficulty
step3 = thresholds[,3] - difficulty
steps <- cbind(difficulty, step1, step2, step3)
steps

##        difficulty      step1         step2     step3
##  [1,]  0.07582656 -0.8917357 -0.0486105666 0.9403463
##  [2,] -0.85830054 -0.4479575 -0.1364921820 0.5844497
##  [3,] -0.34490035 -0.3536279  0.0133934353 0.3402345
##  [4,]  0.58245269 -0.3993586  0.0001558314 0.3992028
##  [5,] -0.78617038 -0.5866822 -0.1388938338 0.7255760
##  [6,]  0.69209929 -0.7408130  0.1270665124 0.6137465
##  [7,]  1.18304354 -0.3270192 -0.0014361328 0.3284553
##  [8,] -1.26885071 -0.8589382  0.0652080078 0.7937302
##  [9,]  0.31255155 -0.6546250 -0.0163594636 0.6709844
## [10,] -0.03057126 -0.7169270 -0.0902617887 0.8071887

Data Generation

In this stage, we will use the step parameters to simulate responses for a hypothetical sample of examinees. The sample size is 10000 and the latent trait (i.e., theta) is normally distributed with a mean of zero and standard deviation of 1.

# Generate theta from a normal distribution
theta <- rnorm(10000, mean=0, sd=1)
hist(theta, main="Distribution of Theta Values")

# Create an empty response matrix
responses <- matrix(NA, nrow = length(theta), ncol = 10)

for(k in 1:length(theta)) {
  for(i in 1:10) {
    measure=0
    p <- vector()
    p[1] <- 1
    
    for(j in 2:4) {
      measure <- measure + theta[k] - steps[i,1] - steps[i,j]
      p[j] <- p[(j-1)] + exp(measure)
    }
    
    U <- runif(1, 0, 1)
    U = U * p[4]
    
    for(j in 1:4) {
      if(U <= p[j]) {
        responses[k,i] <- (j-1)
        break
      }
    }
  }
}

# Save responses as a data frame and rename the columns
responses <- as.data.frame(responses)
colnames(responses) <- paste0("item_", 1:10)
head(responses)

##   item_1 item_2 item_3 item_4 item_5 item_6 item_7 item_8 item_9 item_10
## 1      0      0      0      0      1      0      0      0      1       1
## 2      0      0      0      1      2      0      1      2      1       0
## 3      0      0      0      0      1      0      0      0      0       0
## 4      2      0      0      1      3      0      0      1      2       0
## 5      1      2      1      0      1      0      0      2      0       1
## 6      3      2      3      3      3      1      1      3      1       3

We can the descriptive statistics for the simulated data using the skim function from the skimr package.

library(skimr)
skim(responses)

## Skim summary statistics
##  n obs: 10000 
##  n variables: 10 
## 
## Variable type: numeric 
##  variable missing complete     n mean   sd p0 p25 median p75 p100
##    item_1       0    10000 10000 1.46 1.05  0   1      1   2    3
##   item_10       0    10000 10000 1.54 1.06  0   1      2   2    3
##    item_2       0    10000 10000 2.12 1.01  0   1      2   3    3
##    item_3       0    10000 10000 1.77 1.14  0   1      2   3    3
##    item_4       0    10000 10000 1.1  1.11  0   0      1   2    3
##    item_5       0    10000 10000 2.05 1     0   1      2   3    3
##    item_6       0    10000 10000 1.03 1.02  0   0      1   2    3
##    item_7       0    10000 10000 0.71 0.96  0   0      0   1    3
##    item_8       0    10000 10000 2.28 0.89  0   2      3   3    3
##    item_9       0    10000 10000 1.31 1.09  0   0      1   2    3

Checking the Simulated Data

In the final step, we can estimate the item parameters for the simulated data and see if the thresholds are similar to what we used for data generation.

library(mirt)
model.pcm <- 'f = 1-10' 
results.pcm <- mirt(data=responses, model=model.pcm, itemtype="Rasch", SE=TRUE, verbose=FALSE)
coef.pcm <- coef(results.pcm, IRTpars=TRUE, simplify=TRUE)
items.pcm <- as.data.frame(coef.pcm$items)
items.pcm

##         a          b1          b2           b3
## item_1  1 -0.78207249 -0.01056343  1.015020560
## item_2  1 -1.29572722 -1.03397889 -0.324319484
## item_3  1 -0.72500511 -0.33882761 -0.009037995
## item_4  1  0.15422789  0.58951333  0.878493096
## item_5  1 -1.37634998 -0.97178628 -0.084653158
## item_6  1 -0.07134146  0.82446136  1.295413471
## item_7  1  0.82265021  1.10684987  1.538457217
## item_8  1 -2.12984462 -1.20997200 -0.487904167
## item_9  1 -0.36451970  0.27640577  0.922328204
## item_10 1 -0.82036647 -0.12556923  0.818317915

Similarly, we can compare the estimated theta values from PCM with the true theta values.

estimated.theta <- fscores(results.pcm, method = "EAP", full.scores = TRUE)
head(estimated.theta)

##              F1
## [1,] -1.7314710
## [2,] -1.0579713
## [3,] -2.1927591
## [4,] -0.7826180
## [5,] -0.9170791
## [6,]  0.9586421

cor(theta, estimated.theta)

##             F1
## [1,] 0.9278974

plot(theta, estimated.theta, xlab="True Theta", ylab="Estimated Theta")

Simulating Response Data With Partial Credit Model

Okan Bulut

March 28, 2018

Thresholds for the items

Step Parameters

Data Generation

Checking the Simulated Data