Computerized Adaptive Testing (CAT) is an assessment method that dynamically adjusts the difficulty of test items to the test-taker’s ability level. It is grounded in Item Response Theory (IRT) and involves the following key features:
Adaptive Item Selection and Tailored Testing Experience: In CAT, each question is selected based on the test-taker’s previous responses. If a participant answers correctly, the next question is more challenging; if they answer incorrectly, the next question is easier. Because questions are tailored in real time to the participant’s current ability level, the test can be shorter and more focused: participants do not need to work through the entire questionnaire for their ability to be assessed accurately.
Efficiency: Since CAT focuses on providing questions that are neither too easy nor too difficult for the test-taker, it often results in shorter tests without compromising the accuracy of the assessment.
Item Pool and IRT: A large and diverse set of questions (the item pool) is pre-calibrated on difficulty, discrimination, and guessing parameters using Item Response Theory (IRT). IRT models the probability of a correct response as a function of the individual’s ability level and the properties of the item (a short code sketch of this response model follows this list). In CAT, these calibrated items are used to dynamically select the most appropriate question for each test-taker, ensuring that the difficulty level matches their ability. This link between CAT and IRT enables precise, individualized assessment by leveraging the statistical properties of the items in the pool.
Stopping Criteria: The test can end based on predefined criteria, such as a maximum number of questions, a minimum standard error of measurement, or a desired confidence level in the ability estimate.
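To make the IRT connection concrete, here is a minimal sketch of the three-parameter logistic (3PL) response function that underlies everything below. The function name p_3pl and the parameter values are illustrative only (they are not part of catR), and the scaling constant D is taken as 1, matching catR’s defaults.
# Minimal 3PL sketch: probability of a correct response given ability theta
# and item parameters a (discrimination), b (difficulty), and c (guessing)
p_3pl <- function(theta, a, b, c) {
  c + (1 - c) / (1 + exp(-a * (theta - b)))
}
# A medium-difficulty item (b = 0) separates abilities near theta = 0 well
p_3pl(theta = c(-2, 0, 2), a = 1, b = 0, c = 0.2)
In CAT, the next question is chosen where this curve is steepest, that is, most informative, around the current ability estimate.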
For this simulation we will use the catR package, together with ggplot2 for the visualizations. The catR package in R, developed by David Magis, provides tools for generating, administering, and scoring computerized adaptive tests (CAT): it lets users generate calibrated item banks, select items adaptively, simulate response patterns, and estimate ability and its standard error.
# Clear the workspace
rm(list=ls())
# Install and load the catR package
# install.packages("catR")
library(catR)
packageVersion("catR")
## [1] '3.17'
The catR library version 3.17 in R facilitates Computerized Adaptive Testing (CAT) using Item Response Theory (IRT) models. It supports various item selection algorithms like Maximum Fisher Information (MFI) and provides ability estimation methods such as Maximum Likelihood Estimation (MLE), making it ideal for adaptive testing simulations and psychometric research.
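As a quick illustration of the MFI criterion, the snippet below computes item information for two made-up items using catR’s Ii() function (the parameter values are arbitrary and only for demonstration; the call assumes Ii()’s documented th/it interface). MFI simply administers the item whose information is highest at the current ability estimate.
# Illustrative only: Fisher information of two example 3PL items at theta = 0;
# the columns of the parameter matrix are a, b, c, d
example_items <- matrix(c(1.2, -0.5, 0.2, 1,
                          0.8,  0.5, 0.1, 1),
                        nrow = 2, byrow = TRUE)
Ii(th = 0, it = example_items)$Ii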
Now… let’s simulate test data. The R code below generates a bank of 100 items for a 3-Parameter Logistic (3PL) model using random parameters, prints a confirmation message, and displays the first 5 rows of the item bank.
# Generate a bank of 100 items with random parameters in a 3PL model
set.seed(123)
item_bank <- genDichoMatrix(100, model = "3PL")
print("Item bank generated!")
## [1] "Item bank generated!"
head(item_bank, 5) # Show only the first 5 rows
## a b c d
## 1 0.8759267 -0.6264538 0.16471940 1
## 2 1.0084232 0.1836433 0.04626749 1
## 3 0.8178157 -0.8356286 0.23859453 1
## 4 1.0316058 1.5952808 0.22446212 1
## 5 0.8690831 0.3295078 0.23592426 1
This output shows the first 5 items from a 3-Parameter Logistic (3PL) model item bank. Each row represents an item, with columns for a (discrimination), b (difficulty), c (guessing), and d (the upper asymptote, fixed at 1 in the 3PL model). The a parameter reflects how well the item discriminates between ability levels, b indicates its difficulty, and c is the probability that a test taker of very low ability answers correctly by guessing.
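As an optional sanity check (not required for the simulation), we can summarize the generated parameters to see the ranges that genDichoMatrix produced:
# Optional: inspect the distribution of the generated item parameters
summary(item_bank[, c("a", "b", "c")])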
Now, it’s time to set up the simulation and its stopping criteria. The code below defines an initial ability estimate (init_theta) of 0 and establishes the stopping rules: the test ends after 20 items have been administered or once the standard error (se) falls below 0.3. It also simulates a test taker with a true ability level (true_theta) of 0.5. Finally, it initializes the variables that track the test: the current ability estimate (current_theta), empty vectors for responses and selected items, an initial standard error (se) of infinity, and vectors recording the history of ability estimates (theta_history) and standard errors (se_history).
# Initial theta estimate
init_theta <- 0
# Stopping rules: stop after 20 items or when the standard error is below 0.3
stop_criteria <- list(nmax = 20, se = 0.3)
# Simulate a test taker with a true ability level of 0.5
true_theta <- 0.5
# Initialize variables
current_theta <- init_theta
responses <- c()
selected_items <- c()
se <- Inf
theta_history <- c(current_theta)
se_history <- c()
And, finally, we set up an empty data frame named results_df to store information about each test iteration: the item selected, the response, the estimated ability, and the standard error. Setting stringsAsFactors = FALSE ensures that any text data would be treated as text, not factors.
# Create a data frame to store results
results_df <- data.frame(
  Iteration = integer(),
  Selected_Item = integer(),
  Response = integer(),
  Estimated_Theta = numeric(),
  Standard_Error = numeric(),
  stringsAsFactors = FALSE
)
Basically, we can now run a simulation of a computerized adaptive test. The code below sets up an iteration counter and a loop that continues until either 20 items have been administered or the standard error falls below 0.3. In each iteration, it selects the next item based on the current ability estimate (excluding items already administered), simulates a response, updates the vectors of responses and selected items, and recalculates the ability estimate and standard error. It then records all these details in the data frame and prints the results at the end.
Having said that, we can focus on the step-by-step explanation of the code:
Initialize Iteration: Sets the iteration counter to 0.
iteration <- 0
Start While Loop: Begins a loop that continues as long as fewer than 20 items are selected and the standard error is above 0.3. Increments the iteration counter by 1.
while (length(selected_items) < stop_criteria$nmax && se > stop_criteria$se) {
  iteration <- iteration + 1
  # Select the next item, excluding items that were already administered
  next_item <- nextItem(item_bank, theta = current_theta,
                        out = selected_items, criterion = "MFI")
  print("Next item selected:")
  print(next_item)
  # Simulate the response
  response <- genPattern(true_theta, item_bank[next_item$item, ])
  print("Response simulated:")
  print(response)
  # Update responses and selected items
  responses <- c(responses, response)
  selected_items <- c(selected_items, next_item$item)
  # Estimate the new theta from all items administered so far
  current_theta <- eapEst(item_bank[selected_items, ], responses)
  theta_history <- c(theta_history, current_theta)
  print("Theta estimated:")
  print(current_theta)
  # Calculate the standard error
  se <- semTheta(current_theta, item_bank[selected_items, ], responses)
  se_history <- c(se_history, se)
  print(paste("Selected items:", length(selected_items), "Current Theta:", current_theta, "SE:", se))
  # Record the results in the data frame
  results_df <- rbind(results_df, data.frame(
    Iteration = iteration,
    Selected_Item = next_item$item,
    Response = response,
    Estimated_Theta = current_theta,
    Standard_Error = se,
    stringsAsFactors = FALSE
  ))
}
Let’s display the results showing all recorded test iterations.
# Display the results
print(results_df)
## Iteration Selected_Item Response Estimated_Theta Standard_Error
## 1 1 60 1 0.45686593 0.8474678
## 2 2 70 0 0.33088867 0.8269710
## 3 3 47 1 0.65201531 0.7390628
## 4 4 61 0 0.56126097 0.7176086
## 5 5 74 0 0.16670039 0.6911755
## 6 6 77 1 0.29773537 0.6596196
## 7 7 63 0 0.09951352 0.6173616
## 8 8 42 0 -0.17668781 0.5912996
## 9 9 6 1 -0.06191513 0.5607308
## 10 10 9 1 0.16156580 0.5373811
## 11 11 70 1 0.26183927 0.5350180
## 12 12 83 0 0.17079581 0.5269133
## 13 13 58 1 0.21816727 0.5195733
## 14 14 8 0 0.10299793 0.5083346
## 15 15 23 1 0.19076359 0.4966168
## 16 16 51 0 0.07183835 0.4887139
## 17 17 10 1 0.15038814 0.4718532
## 18 18 6 1 0.20383337 0.4597729
## 19 19 70 0 0.17648171 0.4579459
## 20 20 27 0 0.05921344 0.4481348
In Computerized Adaptive Testing (CAT), the test adjusts to the test taker’s ability level based on their responses. Here’s how it works using the results from our example:
Starting Point: The test begins with an item chosen to estimate the test taker’s ability. In our example, the first item selected was Item 60.
Item Selection and Ability Estimation: After each response, the system updates the estimate of the test taker’s ability. For instance, the correct response to Item 60 moved the estimate from the starting value of 0 up to about 0.457, and the incorrect response to the next item pulled it back down to about 0.331.
Updating Estimates: The system continually updates the test taker’s ability with each new response, selecting items that best determine their ability. For example, after ten items the estimate stood at about 0.162 with a standard error of roughly 0.537.
Refinement: The test adapts by selecting questions that provide the most information about the test taker’s ability. For example, over the last five items the standard error shrank from about 0.497 to 0.448 while the ability estimate fluctuated around 0.1.
Adaptive Nature: The test dynamically adjusts to the test taker’s ability, selecting questions that are appropriately challenging. The final estimates are based on the entire set of responses, aiming to gauge the test taker’s ability as accurately as possible.
Stopping Rule: The test continues as long as fewer than 20 items have been administered and the standard error is above 0.3. In our run the 0.3 threshold was never reached, so the test concluded at the 20-item maximum with a final standard error of about 0.448.
Summary: CAT uses each response to select the next item, adapting in real-time to provide a precise measure of the test taker’s ability by choosing questions that best match their performance level.
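It is worth noting that catR also provides a higher-level randomCAT() function that runs an equivalent simulation in a single call. The sketch below follows randomCAT()’s start/test/stop/final interface; treat it as a starting point rather than a drop-in replacement, since argument defaults can vary across catR versions.
# One-call alternative: catR manages item selection, scoring, and stopping
cat_run <- randomCAT(trueTheta = true_theta, itemBank = item_bank,
                     start = list(theta = init_theta),
                     test = list(method = "EAP", itemSelect = "MFI"),
                     stop = list(rule = c("length", "precision"),
                                 thr = c(20, 0.3)),
                     final = list(method = "EAP"))
cat_run$thFinal # final ability estimate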
Some of these conclusions can be illustrated with visualizations using the ggplot2 library. One option is to plot the estimated theta against the number of test items.
# Load the ggplot2 library
library(ggplot2)
# Plot the Estimated Theta over the Test Items
df_theta <- data.frame(
  Item = 0:length(selected_items),
  Theta = theta_history
)
ggplot(df_theta, aes(x = Item, y = Theta)) +
  geom_line() +
  geom_point() +
  geom_text(aes(label = round(Theta, 3)), vjust = -0.5, size = 2.5) + # Add labels for Theta values
  labs(title = "Estimated Theta Over the Test Items",
       x = "Number of Items",
       y = "Estimated Theta")
As seen in the chart, the ability estimates are continuously refined with each item until the test reaches 20 items.
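Because we simulated the test taker with a known true ability of 0.5, we can also overlay that value on the same chart to see how the estimate moves relative to it; the snippet below simply adds a dashed reference line to the previous plot.
# Overlay the simulated true ability (true_theta) as a dashed reference line
ggplot(df_theta, aes(x = Item, y = Theta)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = true_theta, linetype = "dashed") +
  labs(title = "Estimated Theta vs. True Ability",
       x = "Number of Items",
       y = "Estimated Theta")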
Now, if we analyze the standard error in relation to the number of items presented, we observe a clear downward trend, with the standard error decreasing to roughly 0.448 by the end of the test. This is expected: each additional item adds statistical information about the test taker, so the uncertainty of the ability estimate keeps shrinking.
# Plot the Standard Error over the Test Items
df_se <- data.frame(
  Item = 1:length(se_history),
  SE = se_history
)
ggplot(df_se, aes(x = Item, y = SE)) +
  geom_line() +
  geom_point() +
  geom_text(aes(label = round(SE, 3)), vjust = -0.5, size = 2.5) + # Add labels for SE values
  labs(title = "Standard Error Over the Test Items",
       x = "Number of Items",
       y = "Standard Error")
Finally, this piece of code plots Item Characteristic Curves (ICCs) for the first items administered in our simulated example, so that the behavior of different items can be compared across the ability range.
# Item Characteristic Curves (ICCs)
plotICC <- function(item_params, item_labels = NULL) {
  theta_vals <- seq(-3, 3, by = 0.1)
  # 3PL/4PL response function; the columns of item_params are a, b, c, d
  # (with d = 1 this reduces to the 3PL; D = 1 matches catR's default scaling)
  prob_correct <- apply(item_params, 1, function(params) {
    params[3] + (params[4] - params[3]) / (1 + exp(-params[1] * (theta_vals - params[2])))
  })
  if (is.null(item_labels)) {
    item_labels <- paste("Item", seq_len(nrow(item_params)))
  }
  matplot(theta_vals, prob_correct, type = "l", col = seq_len(nrow(item_params)),
          lty = 1, xlab = "Theta", ylab = "Probability of Correct Response",
          main = "Item Characteristic Curves")
  legend("bottomright", legend = item_labels, col = seq_len(nrow(item_params)), lty = 1)
}
# Plot ICCs for the first 5 selected items, labeled with their bank indices
if (length(selected_items) > 0) {
  first_items <- selected_items[1:min(5, length(selected_items))]
  plotICC(item_bank[first_items, ], item_labels = paste("Item", first_items))
}
Magis, D. (n.d.). catR: Computerized adaptive testing using item response theory [R package]. Retrieved from https://CRAN.R-project.org/package=catR
Magis, D., & Raîche, G. (2012). Random generation of response patterns under computerized adaptive testing with the R package catR. Journal of Statistical Software, 48, 1-31.