Computerized Adaptive Testing (CAT) is an assessment method that dynamically adjusts the difficulty of test items to the test-taker’s ability level. It is grounded in Item Response Theory (IRT) and involves the following key features:
Adaptive Item Selection and Tailored Testing Experience: In CAT, each question is selected based on the test-taker’s previous responses. If a participant answers correctly, the next question is more challenging; if they answer incorrectly, the next question is easier. Because questions are tailored in real time to the participant’s current ability level, the test can be shorter and more focused: participants do not need to work through the entire questionnaire for their ability to be assessed accurately.
Efficiency: Since CAT focuses on providing questions that are neither too easy nor too difficult for the test-taker, it often results in shorter tests without compromising the accuracy of the assessment.
Item Pool and IRT: A large and diverse set of questions (the item pool) is pre-calibrated on difficulty, discrimination, and guessing parameters using Item Response Theory (IRT). IRT models the probability of a correct response as a function of the individual’s ability level and the properties of the item (a short code sketch of this response model follows this list). In CAT, these calibrated items are used to dynamically select the most appropriate question for each test-taker, ensuring that the difficulty level matches their ability. This link between CAT and IRT enables precise, individualized assessment by leveraging the statistical properties of the items in the pool.
Stopping Criteria: The test can end based on predefined criteria, such as a maximum number of questions, a minimum standard error of measurement, or a desired confidence level in the ability estimate.
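To make the IRT connection concrete, here is a minimal sketch of the three-parameter logistic (3PL) response function that underlies everything below. The function name p_3pl and the parameter values are illustrative only (they are not part of catR), and the scaling constant D is taken as 1, matching catR’s defaults.
# Minimal 3PL sketch: probability of a correct response given ability theta
# and item parameters a (discrimination), b (difficulty), and c (guessing)
p_3pl <- function(theta, a, b, c) {
  c + (1 - c) / (1 + exp(-a * (theta - b)))
}
# A medium-difficulty item (b = 0) separates abilities near theta = 0 well
p_3pl(theta = c(-2, 0, 2), a = 1, b = 0, c = 0.2)
In CAT, the next question is chosen where this curve is steepest, that is, most informative, around the current ability estimate.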
For this simulation we will use the catR package, together with ggplot2 for the visualizations. The catR package in R, developed by David Magis, provides tools for generating, administering, and scoring computerized adaptive tests (CAT): it lets users generate calibrated item banks, select items adaptively, simulate response patterns, and estimate ability and its standard error.
# Clear the workspace
rm(list=ls())
# Install and load the catR package
# install.packages("catR")
library(catR)
packageVersion("catR")
## [1] '3.17'
The catR library version 3.17 in R facilitates Computerized Adaptive Testing (CAT) using Item Response Theory (IRT) models. It supports various item selection algorithms like Maximum Fisher Information (MFI) and provides ability estimation methods such as Maximum Likelihood Estimation (MLE), making it ideal for adaptive testing simulations and psychometric research.
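As a quick illustration of the MFI criterion, the snippet below computes item information for two made-up items using catR’s Ii() function (the parameter values are arbitrary and only for demonstration; the call assumes Ii()’s documented th/it interface). MFI simply administers the item whose information is highest at the current ability estimate.
# Illustrative only: Fisher information of two example 3PL items at theta = 0;
# the columns of the parameter matrix are a, b, c, d
example_items <- matrix(c(1.2, -0.5, 0.2, 1,
                          0.8,  0.5, 0.1, 1),
                        nrow = 2, byrow = TRUE)
Ii(th = 0, it = example_items)$Ii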
Now… let’s simulate test data. The R code below generates a bank of 100 items for a 3-Parameter Logistic (3PL) model using random parameters, prints a confirmation message, and displays the first 5 rows of the item bank.
# Generate a bank of 100 items with random parameters in a 3PL model
set.seed(123)
item_bank <- genDichoMatrix(100, model = "3PL")
print("Item bank generated!")
## [1] "Item bank generated!"
head(item_bank, 5) # Show only the first 5 rows
## a b c d
## 1 0.8759267 -0.6264538 0.16471940 1
## 2 1.0084232 0.1836433 0.04626749 1
## 3 0.8178157 -0.8356286 0.23859453 1
## 4 1.0316058 1.5952808 0.22446212 1
## 5 0.8690831 0.3295078 0.23592426 1
This output shows the first 5 items from a 3-Parameter Logistic (3PL) model item bank. Each row represents an item, with columns for a (discrimination), b (difficulty), c (guessing), and d (the upper asymptote, fixed at 1 in the 3PL model). The a parameter reflects how well the item discriminates between ability levels, b indicates its difficulty, and c is the probability that a test taker of very low ability answers correctly by guessing.
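As an optional sanity check (not required for the simulation), we can summarize the generated parameters to see the ranges that genDichoMatrix produced:
# Optional: inspect the distribution of the generated item parameters
summary(item_bank[, c("a", "b", "c")])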
Now, it’s time to set up the simulation and its stopping criteria. The code below defines an initial ability estimate (init_theta) of 0 and establishes the stopping rules: the test ends after 20 items have been administered or once the standard error (se) falls below 0.3. It also simulates a test taker with a true ability level (true_theta) of 0.5. Finally, it initializes the variables that track the test: the current ability estimate (current_theta), empty vectors for responses and selected items, an initial standard error (se) of infinity, and vectors recording the history of ability estimates (theta_history) and standard errors (se_history).
# Initial theta estimate
init_theta <- 0
# Stopping rules: stop after 20 items or when the standard error is below 0.3
stop_criteria <- list(nmax = 20, se = 0.3)
# Simulate a test taker with a true ability level of 0.5
true_theta <- 0.5
# Initialize variables
current_theta <- init_theta
responses <- c()
selected_items <- c()
se <- Inf
theta_history <- c(current_theta)
se_history <- c()
And, finally, we set up an empty data frame named results_df to store information about each test iteration: the item selected, the response, the estimated ability, and the standard error. Setting stringsAsFactors = FALSE ensures that any text data would be treated as text, not factors.
# Create a data frame to store results
results_df <- data.frame(
  Iteration = integer(),
  Selected_Item = integer(),
  Response = integer(),
  Estimated_Theta = numeric(),
  Standard_Error = numeric(),
  stringsAsFactors = FALSE
)
Basically, we can now run a simulation of a computerized adaptive test. The code below sets up an iteration counter and a loop that continues until either 20 items have been administered or the standard error falls below 0.3. In each iteration, it selects the next item based on the current ability estimate (excluding items already administered), simulates a response, updates the vectors of responses and selected items, and recalculates the ability estimate and standard error. It then records all these details in the data frame and prints the results at the end.
Having said that, we can focus on the step-by-step explanation of the code:
Initialize Iteration: Sets the iteration counter to 0.
iteration <- 0
Start While Loop: Begins a loop that continues as long as fewer than 20 items are selected and the standard error is above 0.3. Increments the iteration counter by 1.
while (length(selected_items) < stop_criteria$nmax && se > stop_criteria$se) {
  iteration <- iteration + 1
  # Select the next item, excluding items that were already administered
  next_item <- nextItem(item_bank, theta = current_theta,
                        out = selected_items, criterion = "MFI")
  print("Next item selected:")
  print(next_item)
  # Simulate the response
  response <- genPattern(true_theta, item_bank[next_item$item, ])
  print("Response simulated:")
  print(response)
  # Update responses and selected items
  responses <- c(responses, response)
  selected_items <- c(selected_items, next_item$item)
  # Estimate the new theta from all items administered so far
  current_theta <- eapEst(item_bank[selected_items, ], responses)
  theta_history <- c(theta_history, current_theta)
  print("Theta estimated:")
  print(current_theta)
  # Calculate the standard error
  se <- semTheta(current_theta, item_bank[selected_items, ], responses)
  se_history <- c(se_history, se)
  print(paste("Selected items:", length(selected_items), "Current Theta:", current_theta, "SE:", se))
  # Record the results in the data frame
  results_df <- rbind(results_df, data.frame(
    Iteration = iteration,
    Selected_Item = next_item$item,
    Response = response,
    Estimated_Theta = current_theta,
    Standard_Error = se,
    stringsAsFactors = FALSE
  ))
}
Let’s display the results showing all recorded test iterations.
# Display the results
print(results_df)
## Iteration Selected_Item Response Estimated_Theta Standard_Error
## 1 1 60 1 0.45686593 0.8474678
## 2 2 70 0 0.33088867 0.8269710
## 3 3 47 1 0.65201531 0.7390628
## 4 4 61 0 0.56126097 0.7176086
## 5 5 74 0 0.16670039 0.6911755
## 6 6 77 1 0.29773537 0.6596196
## 7 7 63 0 0.09951352 0.6173616
## 8 8 42 0 -0.17668781 0.5912996
## 9 9 6 1 -0.06191513 0.5607308
## 10 10 9 1 0.16156580 0.5373811
## 11 11 70 1 0.26183927 0.5350180
## 12 12 83 0 0.17079581 0.5269133
## 13 13 58 1 0.21816727 0.5195733
## 14 14 8 0 0.10299793 0.5083346
## 15 15 23 1 0.19076359 0.4966168
## 16 16 51 0 0.07183835 0.4887139
## 17 17 10 1 0.15038814 0.4718532
## 18 18 6 1 0.20383337 0.4597729
## 19 19 70 0 0.17648171 0.4579459
## 20 20 27 0 0.05921344 0.4481348
In Computerized Adaptive Testing (CAT), the test adjusts to the test taker’s ability level based on their responses. Here’s how it works using the results from our example:
Starting Point: The test begins with an item chosen to estimate the test taker’s ability. In our example, the first item selected was Item 60.
Item Selection and Ability Estimation: After each response, the system updates the estimate of the test taker’s ability. For instance, the correct response to Item 60 moved the estimate from the starting value of 0 up to about 0.457, and the incorrect response to the next item pulled it back down to about 0.331.
Updating Estimates: The system continually updates the test taker’s ability with each new response, selecting items that best determine their ability. For example, after ten items the estimate stood at about 0.162 with a standard error of roughly 0.537.
Refinement: The test adapts by selecting questions that provide the most information about the test taker’s ability. For example, over the last five items the standard error shrank from about 0.497 to 0.448 while the ability estimate fluctuated around 0.1.
Adaptive Nature: The test dynamically adjusts to the test taker’s ability, selecting questions that are appropriately challenging. The final estimates are based on the entire set of responses, aiming to gauge the test taker’s ability as accurately as possible.
Stopping Rule: The test continues as long as fewer than 20 items have been administered and the standard error is above 0.3. In our run the 0.3 threshold was never reached, so the test concluded at the 20-item maximum with a final standard error of about 0.448.
Summary: CAT uses each response to select the next item, adapting in real-time to provide a precise measure of the test taker’s ability by choosing questions that best match their performance level.
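It is worth noting that catR also provides a higher-level randomCAT() function that runs an equivalent simulation in a single call. The sketch below follows randomCAT()’s start/test/stop/final interface; treat it as a starting point rather than a drop-in replacement, since argument defaults can vary across catR versions.
# One-call alternative: catR manages item selection, scoring, and stopping
cat_run <- randomCAT(trueTheta = true_theta, itemBank = item_bank,
                     start = list(theta = init_theta),
                     test = list(method = "EAP", itemSelect = "MFI"),
                     stop = list(rule = c("length", "precision"),
                                 thr = c(20, 0.3)),
                     final = list(method = "EAP"))
cat_run$thFinal # final ability estimate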
Some of these conclusions can be illustrated with visualizations using the ggplot2 library. One option is to plot the estimated theta against the number of test items.
# Load the ggplot2 library
library(ggplot2)
# Plot the Estimated Theta over the Test Items
df_theta <- data.frame(
  Item = 0:length(selected_items),
  Theta = theta_history
)
ggplot(df_theta, aes(x = Item, y = Theta)) +
  geom_line() +
  geom_point() +
  geom_text(aes(label = round(Theta, 3)), vjust = -0.5, size = 2.5) + # Add labels for Theta values
  labs(title = "Estimated Theta Over the Test Items",
       x = "Number of Items",
       y = "Estimated Theta")
As seen in the chart, the ability estimates are continuously refined with each item until the test reaches 20 items.
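Because we simulated the test taker with a known true ability of 0.5, we can also overlay that value on the same chart to see how the estimate moves relative to it; the snippet below simply adds a dashed reference line to the previous plot.
# Overlay the simulated true ability (true_theta) as a dashed reference line
ggplot(df_theta, aes(x = Item, y = Theta)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = true_theta, linetype = "dashed") +
  labs(title = "Estimated Theta vs. True Ability",
       x = "Number of Items",
       y = "Estimated Theta")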
Now, if we analyze the standard error in relation to the number of items presented, we observe a clear downward trend, with the standard error decreasing to roughly 0.448 by the end of the test. This is expected: each additional item adds statistical information about the test taker, so the uncertainty of the ability estimate keeps shrinking.
# Plot the Standard Error over the Test Items
df_se <- data.frame(
  Item = 1:length(se_history),
  SE = se_history
)
ggplot(df_se, aes(x = Item, y = SE)) +
  geom_line() +
  geom_point() +
  geom_text(aes(label = round(SE, 3)), vjust = -0.5, size = 2.5) + # Add labels for SE values
  labs(title = "Standard Error Over the Test Items",
       x = "Number of Items",
       y = "Standard Error")
Finally, this piece of code plots Item Characteristic Curves (ICCs) for the first items administered in our simulated example, so that the behavior of different items can be compared across the ability range.
# Item Characteristic Curves (ICCs)
plotICC <- function(item_params, item_labels = NULL) {
  theta_vals <- seq(-3, 3, by = 0.1)
  # 3PL/4PL response function; the columns of item_params are a, b, c, d
  # (with d = 1 this reduces to the 3PL; D = 1 matches catR's default scaling)
  prob_correct <- apply(item_params, 1, function(params) {
    params[3] + (params[4] - params[3]) / (1 + exp(-params[1] * (theta_vals - params[2])))
  })
  if (is.null(item_labels)) {
    item_labels <- paste("Item", seq_len(nrow(item_params)))
  }
  matplot(theta_vals, prob_correct, type = "l", col = seq_len(nrow(item_params)),
          lty = 1, xlab = "Theta", ylab = "Probability of Correct Response",
          main = "Item Characteristic Curves")
  legend("bottomright", legend = item_labels, col = seq_len(nrow(item_params)), lty = 1)
}
# Plot ICCs for the first 5 selected items, labeled with their bank indices
if (length(selected_items) > 0) {
  first_items <- selected_items[1:min(5, length(selected_items))]
  plotICC(item_bank[first_items, ], item_labels = paste("Item", first_items))
}
Magis, D. (n.d.). catR: Computerized adaptive testing using item response theory [R package]. Retrieved from https://CRAN.R-project.org/package=catR
Magis, D., & Raîche, G. (2012). Random generation of response patterns under computerized adaptive testing with the R package catR. Journal of Statistical Software, 48, 1-31.