Coomans et al. explore and compare latent variable models that seek to measure problem-solving ability more completely by including the speed of solution finding in addition to respondents' accuracy, which is typically the only measure used in Item Response Theory (IRT). Their paper evaluates new and commonly used psychometric models that incorporate response time, explicating how the underlying latent structure is represented in each. With the dramatic increase in the availability of participant process data, driven by the growing use of technology in educational assessments, these methods also shed light on how other assessment process data might be modeled and used to better measure the desired underlying latent traits. The authors analyze item pairs: two assessment items that seek to measure the same underlying trait. They explore these models using empirical data across a variety of cognitive tasks, justify their selection of a recommended model, and supplement the evaluation with error analyses that highlight qualitative differences in erroneous responses depending on response speed.
### Key Analyses

The most important reproduction analyses are those of Tables 3 and 4, which are used to justify the conclusions of the paper. No additional analyses were initially planned, but additional analyses were run post-reproduction that enhance model interpretability.
This study represents an explicit link between psychometrics and cognitive psychology, written by respected psychometricians. Although written in 2017, its content is more relevant today than ever, as it compares methods of incorporating process data into the understanding of latent variables targeted by IRT. The study uses data from a variety of domains, including arithmetic, language learning, game-playing problems, and chess, to demonstrate the generalizability of the models. This reproduction offers the author the opportunity to learn these concepts from some of the best researchers in the field, using empirical data from multiple domains.
Project repository (on GitHub): https://github.com/psych251/coomans2016
Original paper (as hosted on GitHub): https://github.com/psych251/coomans2016/blob/main/original_paper/distinguishing_fast_slow_article.pdf
No differences from the original paper were planned: the original data, definitions, and formulas are used for the reproduction. However, notes and code from an additional analysis completed post-reproduction are included in the discussion below.

The measure of success is a comparison against the data in their Table 3 and Table 4, since these are what the authors use to draw their conclusions.

Data preparation followed the analysis plan above, as described on pages 12 and 13 of the paper.
### Data Preparation
#### Load Relevant Libraries and Functions
```r
library(rootSolve) # multiroot() for solving the MLE root-finding equations
library(tidyr)
library(dplyr)
library(stringr)
library(ggplot2)
library(data.table)

## Set to the local path of the downloaded OSF data archive
setwd("~/Library/CloudStorage/OneDrive-Stanford/PSY251/Reproducibility/Fast_slow_processes/osfstorage-archive")
```
```r
## If set to TRUE, the threshold for whether a response is fast or slow is set
## at the item level as the mean response time for that item (non-chess
## datasets only) instead of the fixed 10 seconds. See the discussion below.
use_mean_fast_slow_threshold <- FALSE

## If set to TRUE, instances of retesting item pairs on a different day are
## included in the dataset.
include_retesters <- FALSE
## Set up the final table structures for tables 3 and 4
assessment.list <- c("multiplication", "division", "addition", "subtraction", "set", "letterchaos", "chess")
model.list <- c("CIM", "CDM", "C2P&3I", "C3P&2I") ## data.frame() converts "&" to ".", giving column names C2P.3I and C3P.2I
table3.assessment.list <- assessment.list[1:4]
table4.assessment.list <- assessment.list[5:7]
table3 <- data.frame(matrix(ncol = 5, nrow = 4, dimnames = list(NULL, c("domain", model.list))))
table4 <- data.frame(matrix(ncol = 5, nrow = 3, dimnames = list(NULL, c("domain", model.list))))
table3$domain <- table3.assessment.list
table4$domain <- table4.assessment.list
## Each model function below takes a contingency table of observed scores and
## returns the model-predicted contingency table
c2p3i.model <- function(dist.table) {
############################
# Estimation of 2P&3I Model
############################
## The variables correspond to the variables in table 7
## where A is the observed proportion contingency table
A <- dist.table / sum(dist.table)
phi1 <- A[1, 1] # phi1
phi2 <- A[1, 4] + A[4, 1] # phi2
phi4 <- A[2, 2] # phi4
phi5 <- A[2, 3] + A[3, 2] # phi5
phi6 <- A[3, 3] # phi6
phi7 <- A[1, 2] # phi7
phi8 <- A[1, 3] + A[4, 2] # phi8
phi9 <- A[4, 3] # phi9
phi10 <- A[2, 1] # phi10
phi11 <- A[2, 4] + A[3, 1] # phi11
phi12 <- A[3, 4] # phi12
## The three MLE equations in Eq. 16 for three roots
eq.16 <- function(x) {
c(
(A[3, 1] + A[4, 1]) / x[1] - (phi2) / (1 + x[1]) - (phi11) * x[2] / (x[1] * x[2] + x[3]),
(A[3, 1] + A[3, 2]) / x[2] - (phi5) / (1 + x[2]) - (phi11) * x[1] / (x[1] * x[2] + x[3]),
(A[2, 4] + A[4, 2]) / x[3] - (phi8) / (1 + x[3]) - (phi11) / (x[1] * x[2] + x[3])
)
}
## Use two conditions to guide root finding on a complex function surface
nonzero.roots <- 1 ## number of roots equal to zero (loop until none remain)
roots.in.range <- 1 ## number of roots outside the plausible range (loop until none remain)
iteration <- 0 ## counter: attempt root finding at most 300 times before quitting
while (((nonzero.roots > 0) | (roots.in.range > 0)) & (iteration < 300)) {
root <- multiroot(f = eq.16, start = runif(3), positive = TRUE, verbose = F)$root
nonzero.roots <- sum(root == c(0, 0, 0))
roots.in.range <- sum(root > 250) ## we know that our roots shouldn't go this high
iteration <- iteration + 1
}
if (iteration < 300) { ## If the roots for the above were found, use them to construct table 7
alpha1 <- root[1] # alpha1
alpha2 <- root[2] # alpha2
alpha3 <- root[3] # alpha3
## Construction of Table 7; the last cell (phi3) is calculated in the final
## step of the function as the remainder summing to 1, and the model
## contingency table is returned
table7 <- c(
phi1, phi7, phi8 / (1 + alpha3), phi2 / (1 + alpha1),
phi10, phi4, phi5 / (1 + alpha2), phi11 * alpha3 / (alpha1 * alpha2 + alpha3),
phi11 * alpha1 * alpha2 / (alpha1 * alpha2 + alpha3), phi5 * alpha2 / (1 + alpha2), phi6, phi12,
phi2 * alpha1 / (1 + alpha1), phi8 * alpha3 / (1 + alpha3), phi9
)
matrix(c(table7, 1 - sum(table7)), 4, 4, byrow = TRUE) * sum(dist.table)
} else {
NA
}
}
c3p2i.model <- function(dist.table) {
############################
# Estimation of 3P&2I Model
############################
## The variables correspond to the variables in table 8
## where A is the observed proportion contingency table
A <- dist.table / sum(dist.table)
phi1 <- A[1, 1] # phi1
phi2 <- A[1, 4] + A[4, 1] # phi2
phi4 <- A[2, 2] # phi4
phi5 <- A[2, 3] + A[3, 2] # phi5
phi6 <- A[3, 3] # phi6
phi7 <- A[1, 2] # phi7
phi8 <- A[1, 3] # phi8
phi9 <- A[4, 2] # phi9
phi10 <- A[4, 3] # phi10
phi11 <- A[2, 1] # phi11
phi12 <- A[2, 4] # phi12
phi13 <- A[3, 1] # phi13
phi14 <- A[3, 4] # phi14
alpha <- (A[3, 2] + A[4, 1]) / (A[1, 4] + A[2, 3]) # alpha from Eq. 17
## Construction of Table 8; the last cell (phi3) is calculated in the final
## step of the function as the remainder summing to 1, and the model
## contingency table is returned
table8 <- c(
phi1, phi7, phi8, phi2 / (1 + alpha),
phi11, phi4, phi5 / (1 + alpha), phi12,
phi13, phi5 * alpha / (1 + alpha), phi6, phi14,
phi2 * alpha / (1 + alpha), phi9, phi10
)
matrix(c(table8, 1 - sum(table8)), 4, 4, byrow = TRUE) * sum(dist.table)
}
cdm.model <- function(dist.table) {
#####################
# Estimation of CDM Model
#####################
## The variables correspond to the variables in table 10
## where A is the observed proportion contingency table
## the piX comments correspond to the original code
A <- dist.table / sum(dist.table)
phi1 <- A[1, 1] # pi0
phi2 <- A[1, 2] + A[2, 1] # pi12
phi3 <- A[1, 3] + A[2, 2] + A[3, 1] # pi1
phi4 <- A[1, 4] + A[2, 3] + A[3, 2] + A[4, 1] # pi32
phi5 <- A[2, 4] + A[3, 3] + A[4, 2] # pi2
phi6 <- A[3, 4] + A[4, 3] # pi52
## Equation 30
eq.30 <- function(x) {
phi2 * x / (2 + 2 * x) +
phi3 * (x^2 + x / 2) / (1 + x + x^2) +
phi4 * (3 / 2 * x^3 + x^2 + x / 2) / (1 + x + x^2 + x^3) +
phi5 * (3 / 2 * x^2 + x + 1 / 2) / (x^2 + x + 1) +
phi6 * (3 / 2 * x + 1) / (x + 1) +
3 * A[4, 4] / 2 - sum(A[, 2]) / 2 - sum(A[, 3]) - 3 * sum(A[, 4]) / 2
}
## Solve for alpha from Eq. 30
alpha <- multiroot(f = eq.30, start = 1, positive = TRUE, verbose = F)$root
## Values for the cells of contingency Table 10; the last cell (phi7) is
## added as the remainder to 1 in the matrix() call below
table10 <- c(
phi1, phi2 / (1 + alpha), phi3 / (1 + alpha + alpha^2), phi4 / (1 + alpha + alpha^2 + alpha^3),
phi2 * alpha / (1 + alpha), phi3 * alpha / (1 + alpha + alpha^2), phi4 * alpha / (1 + alpha + alpha^2 + alpha^3), phi5 / (1 + alpha + alpha^2),
phi3 * alpha^2 / (1 + alpha + alpha^2), phi4 * alpha^2 / (1 + alpha + alpha^2 + alpha^3), phi5 * alpha / (1 + alpha + alpha^2), phi6 / (1 + alpha),
phi4 * alpha^3 / (1 + alpha + alpha^2 + alpha^3), phi5 * alpha^2 / (1 + alpha + alpha^2), phi6 * alpha / (1 + alpha)
)
matrix(c(table10, 1 - sum(table10)), 4, 4) * sum(dist.table)
}
cim.model <- function(dist.table) {
############################
# Estimation of CIM
############################
## The variables correspond to the variables in table 9
## where A is the observed proportion contingency table
## the piX comments correspond to the original code
A <- dist.table / sum(dist.table)
phi1 <- A[1, 1] # pi02
phi2 <- A[1, 2] + A[2, 1] # pi01
phi3 <- A[2, 2] # pi00
phi4 <- A[1, 3] + A[2, 4] + A[3, 1] + A[4, 2] # pi11
phi5 <- A[2, 3] + A[3, 2] # pi10
phi6 <- A[1, 4] + A[4, 1] # pi12
phi7 <- A[3, 3] # pi20
phi8 <- A[3, 4] + A[4, 3] # pi21
## The system of equations in Eq. 23, solved for alpha1 and alpha2
eq.23 <- function(x) {
c(
phi4 * (x[1] + x[1] * x[2]) / (1 + x[1] + x[2] + x[1] * x[2]) + phi5 * x[1] / (1 + x[1]) + phi6 * x[1] / (1 + x[1]) + phi7 + phi8 + A[4, 4] - sum(A[, 3]) - sum(A[, 4]),
phi4 * (x[2] + x[1] * x[2]) / (1 + x[1] + x[2] + x[1] * x[2]) + phi2 * x[2] / (1 + x[2]) + phi8 * x[2] / (1 + x[2]) + phi1 + phi6 + A[4, 4] - sum(A[, 1]) - sum(A[, 4])
)
}
root <- multiroot(f = eq.23, start = c(1, 1), positive = TRUE, verbose = F)$root
alpha1 <- root[1] # alpha1
alpha2 <- root[2] # alpha2
## Values for the cells of Table 9, organized into rows of four; the last
## cell (phi9) is added in the matrix() call below as the remainder summing to 1
table.9 <- c(
phi1, phi2 * alpha2 / (1 + alpha2), phi4 * alpha2 / (1 + alpha1 + alpha2 + alpha1 * alpha2), phi6 / (1 + alpha1),
phi2 / (1 + alpha2), phi3, phi5 / (1 + alpha1), phi4 / (1 + alpha1 + alpha2 + alpha1 * alpha2),
phi4 * alpha1 / (1 + alpha1 + alpha2 + alpha1 * alpha2), phi5 * alpha1 / (1 + alpha1), phi7, phi8 / (1 + alpha2),
phi6 * alpha1 / (1 + alpha1), phi4 * alpha1 * alpha2 / (1 + alpha1 + alpha2 + alpha1 * alpha2), phi8 * alpha2 / (1 + alpha2)
)
matrix(c(table.9, 1 - sum(table.9)), 4, 4) * sum(dist.table)
}
for (asmt in 1:(length(assessment.list) - 1)) { ## chess (the last entry) is handled separately below
assessment <- assessment.list[asmt]
load(str_c(getwd(), "/", assessment, ".RData"))
## Data are stored as multiplication.RData, division.RData, addition.RData,
## subtraction.RData, set.RData, letterchaos.RData, and chess.RData, as found
## in the original paper's files. Item score data are in the res_max2
## data.frame; item pairs are in the combpair table.
print(paste("Now computing for", assessment))
## Set up collection vectors for each model's chi-square values, used later to
## compute the % of tables exceeding the p = .001 critical value
n.item.pairs <- ncol(combpair)
chi.sq.CDM <- rep(NA, n.item.pairs)
chi.sq.CIM <- rep(NA, n.item.pairs)
chi.sq.2p3i <- rep(NA, n.item.pairs)
chi.sq.3p2i <- rep(NA, n.item.pairs)
countpair <- 0 # count pairs with more than 500 observations
nosolution <- rep(0, n.item.pairs) ## collect any without solutions/or insufficient model data
## This changes the cutpoints for fast vs. slow to the item's mean response
## time, keeping the original category coding: 3 = fast correct, 2 = slow
## correct, 1 = slow incorrect, 0 = fast incorrect
if (use_mean_fast_slow_threshold) {
res_max2 <- res_max2 %>%
left_join(res_max2 %>%
group_by(item_id) %>%
summarize(mean.t = mean(response_in_milliseconds)), by = "item_id") %>%
mutate(score_new = if_else(correct_answered == 1,
if_else(response_in_milliseconds < mean.t, 3, 2),
if_else(response_in_milliseconds < mean.t, 0, 1)
))
}
for (pair in 1:n.item.pairs) {
#####################################
# Set new item pair
#####################################
item.1 <- combpair[1, pair]
item.2 <- combpair[2, pair]
#####################################
# Create the pair dataframe
#####################################
## Keep responses with a nonzero score or that used the full 20 s limit, then
## take each user's first attempt at each item on each day (applied to both
## items of the pair)
df.1 <- res_max2 %>%
filter(item_id == item.1) %>%
filter((score != 0) | (response_in_milliseconds == 20000)) %>%
arrange(days, user_id, created_UNIX) %>%
distinct(user_id, days, .keep_all = TRUE)
df.2 <- res_max2 %>%
filter(item_id == item.2) %>%
filter((score != 0) | (response_in_milliseconds == 20000)) %>%
arrange(days, user_id, created_UNIX) %>%
distinct(user_id, days, .keep_all = TRUE)
## Join the two items' attempts by user and day, then either keep one paired
## attempt per user per day (other-day retests included) or only each user's
## first paired attempt (other-day retests excluded)
df <- inner_join(df.1, df.2, by = c("user_id", "days"), suffix = c("_1", "_2")) %>%
arrange(days, user_id, desc(created_UNIX_1))
if (include_retesters) {
df <- distinct(df, user_id, days, .keep_all = TRUE)
} else {
df <- distinct(df, user_id, .keep_all = TRUE)
}
## This conversion, used by the authors, maps the item scores onto the
## whole-number categories 0-3 used for the contingency tables
if (use_mean_fast_slow_threshold) {
TMP <- df[, c("score_new_1", "score_new_2")]
} else {
TMP <- floor((1 + df[, c("score_1", "score_2")]) * 2)
}
dist.table <- table(factor(x = TMP[, 1], levels = c(0, 1, 2, 3)), factor(x = TMP[, 2], levels = c(0, 1, 2, 3)))
## only keeping item pairs if there are at least 500 total observations
if (sum(dist.table) > 500) {
## For each model: use the observed contingency table to produce the predicted
## contingency table, then compute the chi-square value. A value is recorded
## only when expected counts are adequate (no expected cell < 1 and at least
## 13 of the 16 expected cells >= 5)
model.est <- cdm.model(dist.table)
if ((length(which(model.est < 1)) == 0) && (length(which(model.est >= 5)) >= 13)) {
chi.sq.CDM[pair] <- sum((model.est - dist.table)^2 / model.est)
}
model.est <- cim.model(dist.table)
if ((length(which(model.est < 1)) == 0) && (length(which(model.est >= 5)) >= 13)) {
chi.sq.CIM[pair] <- sum((model.est - dist.table)^2 / model.est)
}
model.est <- c3p2i.model(dist.table)
if ((length(which(model.est < 1)) == 0) && (length(which(model.est >= 5)) >= 13)) {
chi.sq.3p2i[pair] <- sum((model.est - dist.table)^2 / model.est)
}
model.est <- c2p3i.model(dist.table)
if ((length(which(model.est < 1)) == 0) && (length(which(model.est >= 5)) >= 13)) {
chi.sq.2p3i[pair] <- sum((model.est - dist.table)^2 / model.est)
}
}
}
## Populate Tables 3 and 4 with the proportion of item pairs whose chi-square
## value exceeds the p = .001 critical value for each model
if (asmt < 5) {
table3[asmt, "CDM"] <- length(which(chi.sq.CDM > 26.12)) / sum(!is.na(chi.sq.CDM))
table3[asmt, "CIM"] <- length(which(chi.sq.CIM > 20.52)) / sum(!is.na(chi.sq.CIM))
table3[asmt, "C2P.3I"] <- length(which(chi.sq.2p3i > 10.83)) / sum(!is.na(chi.sq.2p3i))
table3[asmt, "C3P.2I"] <- length(which(chi.sq.3p2i > 10.83)) / sum(!is.na(chi.sq.3p2i))
print(table3)
} else {
table4[asmt - 4, "CDM"] <- length(which(chi.sq.CDM > 26.12)) / sum(!is.na(chi.sq.CDM))
table4[asmt - 4, "CIM"] <- length(which(chi.sq.CIM > 20.52)) / sum(!is.na(chi.sq.CIM))
table4[asmt - 4, "C2P.3I"] <- length(which(chi.sq.2p3i > 10.83)) / sum(!is.na(chi.sq.2p3i))
table4[asmt - 4, "C3P.2I"] <- length(which(chi.sq.3p2i > 10.83)) / sum(!is.na(chi.sq.3p2i))
print(table4)
}
}
#####################
# CHESS DATA SET
#####################
asmt <- 7
assessment <- assessment.list[asmt]
load(str_c(getwd(), "/", assessment, ".RData"))
print(paste("Now computing for", assessment))
n.item.pairs <- 780 # choose(40, 2) pairwise combinations of the 40 chess items
pair <- 1 ## running index into the chi-square vectors for the irregularly formatted chess data
chi.sq.CDM <- rep(NA, n.item.pairs)
chi.sq.CIM <- rep(NA, n.item.pairs)
chi.sq.2p3i <- rep(NA, n.item.pairs)
chi.sq.3p2i <- rep(NA, n.item.pairs)
for (r in 2:40) {
for (c in 1:(r - 1)) {
#####################
# Extract item pair contingency table for chess
#####################
dist.table <- table(factor(x = Y[, r], levels = c(0, 0.5, 1, 1.5)), factor(x = Y[, c], levels = c(0, 0.5, 1, 1.5)))
## For each model: produce the predicted contingency table and compute the
## chi-square value, subject to the same expected-count requirements as above
model.est <- cdm.model(dist.table)
if ((length(which(model.est < 1)) == 0) && (length(which(model.est >= 5)) >= 13)) {
chi.sq.CDM[pair] <- sum((model.est - dist.table)^2 / model.est)
}
model.est <- cim.model(dist.table)
if ((length(which(model.est < 1)) == 0) && (length(which(model.est >= 5)) >= 13)) {
chi.sq.CIM[pair] <- sum((model.est - dist.table)^2 / model.est)
}
model.est <- c3p2i.model(dist.table)
if ((length(which(model.est < 1)) == 0) && (length(which(model.est >= 5)) >= 13)) {
chi.sq.3p2i[pair] <- sum((model.est - dist.table)^2 / model.est)
}
model.est <- c2p3i.model(dist.table)
if ((length(which(model.est < 1)) == 0) && (length(which(model.est >= 5)) >= 13)) {
chi.sq.2p3i[pair] <- sum((model.est - dist.table)^2 / model.est)
}
pair <- pair + 1
}
}
table4[asmt - 4, "CDM"] <- length(which(chi.sq.CDM > 26.12)) / sum(!is.na(chi.sq.CDM))
table4[asmt - 4, "CIM"] <- length(which(chi.sq.CIM > 20.52)) / sum(!is.na(chi.sq.CIM))
table4[asmt - 4, "C2P.3I"] <- length(which(chi.sq.2p3i > 10.83)) / sum(!is.na(chi.sq.2p3i))
table4[asmt - 4, "C3P.2I"] <- length(which(chi.sq.3p2i > 10.83)) / sum(!is.na(chi.sq.3p2i))
print(table4)
```
```r
knitr::kable(table3)
```
| domain | CIM | CDM | C2P.3I | C3P.2I |
|---|---|---|---|---|
| multiplication | 0.9555035 | 1 | 0.0023095 | 0.3148148 |
| division | 0.9450867 | 1 | 0.0058140 | 0.2086957 |
| addition | 0.6076696 | 1 | 0.0058309 | 0.0882353 |
| subtraction | 0.6000000 | 1 | 0.0022989 | 0.2000000 |
```r
knitr::kable(table4)
```
| domain | CIM | CDM | C2P.3I | C3P.2I |
|---|---|---|---|---|
| set | 1.0000000 | 1.0000000 | 0 | 0.0000000 |
| letterchaos | 0.1538462 | 1.0000000 | 0 | 0.0000000 |
| chess | 0.9017341 | 0.9673367 | 0 | 0.3920705 |
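For future reruns (for example, after toggling the flags above), the baseline can be verified programmatically. A minimal sketch, using the Table 3 values shown above as the expected baseline (Table 4 can be checked analogously):

```r
## Expected Table 3 values, row by row, transcribed from the table above
expected.table3 <- matrix(c(
  0.9555035, 1, 0.0023095, 0.3148148,
  0.9450867, 1, 0.0058140, 0.2086957,
  0.6076696, 1, 0.0058309, 0.0882353,
  0.6000000, 1, 0.0022989, 0.2000000
), nrow = 4, byrow = TRUE)

## Compare the computed table (domain column dropped) against the baseline;
## the tolerance allows for rounding in the printed values
all.equal(as.matrix(table3[, -1]), expected.table3,
  tolerance = 1e-6, check.attributes = FALSE)
```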
Tables 3 and 4, shown above, exactly reproduce the corresponding results tables of the paper. Each entry reports, for the item pairs retained in a given domain, the percentage whose \(\chi^2\)-value exceeds the p = .001 critical value for the corresponding degrees of freedom. In other words, of all the thousands of tests run, each entry is the percentage for which the null hypothesis that the model fits the observed contingency table is rejected.
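The hard-coded cutoffs in the script (26.12, 20.52, and 10.83) are the p = .001 critical values of the \(\chi^2\) distribution for each model's degrees of freedom: 8 for the CDM, 5 for the CIM, and 1 for the C2P&3I and C3P&2I models:

```r
## p = .001 critical values for df = 8 (CDM), 5 (CIM), and 1 (C2P&3I, C3P&2I)
round(qchisq(0.999, df = c(8, 5, 1)), 2)
#> [1] 26.12 20.52 10.83
```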
The first reproduction attempts were not successful because I failed to use the created_UNIX variables in the datasets, as they were not referenced in the paper or the code. From studying the code used to generate their results, it appears that the authors filtered and aligned the data so that an individual's first attempt at an item is the one used in the analysis (identified as the smallest created_UNIX value). Not using this variable caused slight deviations in each model's results from those of the original authors.
In their code for item-pair creation (all domains except chess), the authors use a "sort, shift, and concatenate" matching method: they sort all observations by day, then by user, then by item, and finally by the created_UNIX variable (which selects among responses when an individual answered an item more than once on the same day). They copy this full sorted matrix, remove the top observation from the original and the bottom observation from the copy, and concatenate the two before scanning for matches between adjacent rows. Their method has the effect of matching the lowest created_UNIX value of the second item in a pair with the lowest of the first, and it eliminates occasions on which a participant completed both items of a pair on another day. I think this is an oversight: if both items are completed on another occasion, the model should still be valid. Additionally, the removal of the top and bottom rows occasionally removes a data point from some of the item pairs.
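To make the mechanics concrete, here is a minimal toy sketch (hypothetical data and column names, not the authors' code) of this shift-and-concatenate matching:

```r
library(dplyr)

## Toy observations, already sorted by day, user, item, then created_UNIX
obs <- data.frame(
  user_id = c(1, 1, 2, 2, 2),
  days = c(1, 1, 1, 1, 2),
  item_id = c("A", "B", "A", "B", "A"),
  created_UNIX = c(100, 105, 200, 210, 300)
)

## Remove the top row of the original and the bottom row of the copy, then
## bind them side by side so every row sits next to its predecessor
shifted <- cbind(
  obs[-1, ],
  setNames(obs[-nrow(obs), ], paste0(names(obs), "_prev"))
)

## Adjacent rows sharing a user and day but differing in item form a pair
shifted %>%
  filter(user_id == user_id_prev, days == days_prev, item_id != item_id_prev)
```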
The subjects being tested are in the process of learning the domain area (e.g., arithmetic). We assume that as they learn, the underlying latent mechanisms that influence speed and accuracy may change. Thus, if a student takes the same two items on another day, the models can, and in this author's opinion should, include the new observation. This change would strengthen the authors' claims because, we presume, a retesting student's latent processes will on average have matured by the later date. When the analysis is run with retesters included, all of the original conclusions are upheld. Users may set include_retesters to TRUE to see the effects of including instances of other-day retesting.
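The entire difference between the two settings is which columns are passed to distinct(); a toy illustration with hypothetical values:

```r
library(dplyr)

## Three paired attempts: user 10 attempted the item pair on day 1 and again
## on day 5; user 11 attempted it once
paired <- data.frame(user_id = c(10, 10, 11), days = c(1, 5, 1))

## include_retesters = TRUE: one paired attempt per user per day (3 rows kept)
paired %>% distinct(user_id, days, .keep_all = TRUE)

## include_retesters = FALSE: only each user's first paired attempt (2 rows kept)
paired %>% distinct(user_id, .keep_all = TRUE)
```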
The original authors wrote, "We choose, quite arbitrarily, to define fast responses as those responses that are given before half of the time has expired and to call all other responses slow responses." This arbitrary 10-second cutoff for fast vs. slow may work for illustrative purposes, but it may not represent what is fast or slow for a given domain or item. To explore the arbitrariness of this cutoff, a second analysis was run in which the fast-slow distinction was defined by whether a participant was faster or slower than the mean response time for the given item. (This analysis excludes chess: the chess dataset is formatted differently from the others, providing only categorized scores, which makes this analysis impossible there.)
This analysis perturbed the overall results in Table 3 for most models, but for the 3-person-2-item (3P2I) model it led to a large degradation in performance. When the fast vs. slow cutoff was set non-arbitrarily and thus captured some item-level information, the model could no longer discriminate as easily, since the new contingency tables had a more even distribution of fast and slow responses. Cursory analysis showed that the 3P2I model would outperform the CIM and CDM models on some contingency tables in which the proportions of fast and slow responses were imbalanced. It may be that the 3P2I model classifies more easily when speed types are imbalanced; removing that imbalance degraded the model.
Removing the arbitrariness of the cutoff also increased the number of valid item-pair contingency-table predictions for each model and made those numbers comparable across models. In other words, making the classification of "fast" less arbitrary gives the models more valid predictions and more item pairs in common, which strengthens confidence in the conclusions drawn from them.
This new analysis can be performed by setting the boolean variable use_mean_fast_slow_threshold to TRUE, which reruns all of the above analyses with the new fast-slow cut scores.
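For example, to rerun everything with both alternative settings enabled, change the two flags near the top of the script before executing the rest:

```r
## Flags near the top of the script (both default to FALSE)
use_mean_fast_slow_threshold <- TRUE # item-level mean cutoff instead of 10 s
include_retesters <- TRUE # keep other-day retest pairs
```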
It should be noted that the code above, for both the reproduction and this second analysis, runs much faster than the original authors' code, as it avoids the costly process of duplicating large arrays to create the item-pair contingency tables.
The code above successfully reproduces Tables 3 and 4 from the original paper exactly. Additionally, it is set up for future users to test how adjusting the fast-slow cutoff changes performance overall, and especially for the 3P2I model. Users can also choose to include retesters in the dataset to further stress-test the models. The original authors' omission of any explanation of the created_UNIX variable was an oversight, but it allowed this author to investigate the question of whether to include students testing an item pair on a later date.