Q-matrix and variational autoencoders to estimate multidimensional item response theory models with correlated and independent latent variables

VAE Results (continued)

Mahbubul Hasan

11/18/2021

Research Questions

How can a variational autoencoder be used to estimate MIRT models with large numbers of correlated and independent latent traits?

How do factors such as the percentage of misfit items in the test and item quality (item discrimination) affect item and model fit when the Q-matrix is misspecified?

## Data

Two small sample sizes:
6 traits, 35 items, and 10,000 students
6 traits, 40 items, and 18,000 students
Two large sample sizes:
6 traits, 50 items, and 25,000 students
20 traits, 200 items, and 60,000 students
One real data set:
Reading Inventory and Scholastic Evaluation (RISE): 3 traits, 32 items, and 4,100 students

Data

Simulated data: ML2Pvae R package (Variational Autoencoder Models for IRT Parameter Estimation).

Discrimination parameters were sampled uniformly from \([0.25, 1.75]\) and difficulty parameters uniformly from \([-3, 3]\); Q-matrix entries were drawn from \(\text{Bern}(0.2)\).

Probabilities for each student answering each question correctly were calculated with the ML2P model (McKinley and Reckase, 1980).
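
As a rough illustration of this simulation design, the following sketch draws item parameters and a Q-matrix from the stated distributions and generates binary responses with the ML2P model. It is plain Python (standard library only) and does not use the ML2Pvae package. The function name `simulate_ml2p`, the items-by-traits layout of the Q-matrix, the independent standard-normal abilities, and the rule that every item loads on at least one trait are all assumptions made for the example.

```python
import math
import random


def simulate_ml2p(num_students, num_items, num_traits, seed=0):
    """Simulate binary responses under the ML2P model (illustrative sketch)."""
    rng = random.Random(seed)

    # Q-matrix entries ~ Bern(0.2), stored as items x traits (assumed layout).
    q = [[1 if rng.random() < 0.2 else 0 for _ in range(num_traits)]
         for _ in range(num_items)]
    # Assumption: force every item to measure at least one trait.
    for row in q:
        if sum(row) == 0:
            row[rng.randrange(num_traits)] = 1

    # Discrimination ~ U[0.25, 1.75] where Q is 1, and 0 elsewhere.
    a = [[rng.uniform(0.25, 1.75) * q[i][k] for k in range(num_traits)]
         for i in range(num_items)]
    # Difficulty ~ U[-3, 3], one value per item.
    b = [rng.uniform(-3.0, 3.0) for _ in range(num_items)]

    # Assumption: independent standard-normal latent abilities.
    theta = [[rng.gauss(0.0, 1.0) for _ in range(num_traits)]
             for _ in range(num_students)]

    responses = []
    for t in theta:
        row = []
        for i in range(num_items):
            # ML2P: P = 1 / (1 + exp(-sum_k a_ik * theta_jk + b_i))
            exponent = -sum(a[i][k] * t[k] for k in range(num_traits)) + b[i]
            p = 1.0 / (1.0 + math.exp(exponent))
            row.append(1 if rng.random() < p else 0)
        responses.append(row)
    return q, a, b, theta, responses
```

For example, `simulate_ml2p(10000, 35, 6)` would produce a data set with the dimensions of the first small sample, although not the actual data used in these slides.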

Method

Q-matrix misspecification (Table 1: correctly specified Q-matrix, misspecified Q-matrix, and misfit items). For instance, with 200 items, 20% misfit corresponds to 40 items and 40% misfit to 80 items. (The table example is from Sunbul et al., 2018, for 15 items.)
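
The misfit arithmetic can be sketched in code. The example below selects a given fraction of items and alters their Q-matrix rows. The function name `misspecify_q` is hypothetical, and the three modes are assumed definitions rather than the exact procedure used in this study: "under" drops one required trait, "over" adds one extra trait, and "mixed" does both. The Q-matrix is assumed to be stored as items by traits.

```python
import random


def misspecify_q(q, misfit_fraction, mode, seed=0):
    """Return a misspecified copy of q and the indices of the misfit items.

    q is a list of item rows (items x traits). The "under", "over", and
    "mixed" modes are illustrative assumptions, not the study's procedure.
    """
    rng = random.Random(seed)
    num_items = len(q)
    # e.g. 200 items at 20% misfit gives 40 misfit items.
    num_misfit = round(misfit_fraction * num_items)
    chosen = rng.sample(range(num_items), num_misfit)

    new_q = [row[:] for row in q]  # copy so the true Q-matrix is untouched
    for i in chosen:
        ones = [k for k, v in enumerate(new_q[i]) if v == 1]
        zeros = [k for k, v in enumerate(new_q[i]) if v == 0]
        if mode in ("under", "mixed") and ones:
            new_q[i][rng.choice(ones)] = 0   # remove a required trait
        if mode in ("over", "mixed") and zeros:
            new_q[i][rng.choice(zeros)] = 1  # add a trait that is not required
    return new_q, sorted(chosen)
```

With 200 items, `misspecify_q(q, 0.2, "under")` alters 40 rows and `misspecify_q(q, 0.4, "mixed")` alters 80.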

Research Model

(Research model: 1. low- and high-dimensional data from the population; 2. VAE architecture; 3. synthesize data; 4. incorporate the Q-matrix; 5. model parameter estimation)

What is a latent variable?

A latent variable is a variable that cannot be directly measured.

Myth of the cave:

The shadows are the observed variables, and the latent variables are the underlying objects generating the shadows.

For example, we can measure height directly in various ways. But how can we directly measure socioeconomic status or a personality variable?

What is a Q-matrix and what can we do with it?

A Q-matrix shows the relationship between test items and latent (underlying) attributes or concepts.

A Q-matrix is an \(M \times N\) matrix, where \(M\) is the number of questions in an assessment and \(N\) is the total number of concepts required for understanding all questions.

A Q-matrix can be used for understanding students’ performance: by assigning the closest ideal response to a student’s response vector, we can infer which concepts the student knows and which they do not.
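
The idea of assigning the closest ideal response can be sketched as follows. The code enumerates every possible concept state, builds the ideal response vector each state would produce, and returns the state whose ideal response is nearest to the observed responses in Hamming distance. The function names are hypothetical, and the conjunctive rule (a question is answered correctly only if all of its required concepts are known) is an assumption made for the example.

```python
from itertools import product


def ideal_response(q, state):
    """Ideal responses for a concept state under a conjunctive rule (assumed).

    q is questions x concepts; state[k] is 1 if concept k is known.
    """
    return [int(all(state[k] for k, needed in enumerate(row) if needed))
            for row in q]


def closest_concept_state(q, observed):
    """Return the concept state whose ideal response is closest to observed."""
    num_concepts = len(q[0])
    best_state, best_dist = None, None
    for state in product((0, 1), repeat=num_concepts):
        ideal = ideal_response(q, state)
        # Hamming distance between observed and ideal response vectors.
        dist = sum(o != e for o, e in zip(observed, ideal))
        if best_dist is None or dist < best_dist:
            best_state, best_dist = state, dist
    return best_state, best_dist


# Three questions, two concepts: Q1 needs concept 0, Q2 needs concept 1,
# Q3 needs both.
q = [[1, 0], [0, 1], [1, 1]]
state, dist = closest_concept_state(q, [1, 0, 0])
print(state, dist)  # (1, 0) 0
```

Here the student answered only the first question correctly, which matches the ideal response of a student who knows concept 0 but not concept 1. Exhaustive enumeration is only practical for a small number of concepts, since the number of states grows as \(2^N\).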

Q-matrix: how is it used in the VAE model?

AE and VAE architectures are combined with the ML2P model in the following ways:

The decoder has no hidden layer, and the output layer nodes use a sigmoidal activation function (with non-negative weights).
The Q-matrix determines the connections between the latent traits and the output items.

VAE model [Converse et al., 2021]

Multidimensional item response theory (MIRT)

A naive approach to quantifying student knowledge is to look at the percentage of questions that the student answered correctly.
This does not take into account the fact that each item on the assessment is different, both in difficulty and in content.

For example, if one student answers only questions 1 and 4 incorrectly, and another student answers only questions 3 and 7 incorrectly, they have the same percentage score. But it is unlikely that the two students share the same latent trait values: questions 3 and 7 may have tested a different skill than questions 1 and 4, and could differ greatly in difficulty.

This is where MIRT models are useful, and why they play a major role in educational measurement.

Multidimensional Logistic 2-Parameter (ML2P) model

The ML2P model gives the probability of a student answering a particular question correctly as a continuous function of student ability (McKinley and Reckase, 1980).
It gives the probability of student \(j\) with latent abilities \(\Theta_j = (\theta_{j1}, \ldots, \theta_{jK})^\top\) answering item \(i\) correctly as
\(P(u_{ij} = 1 \mid \Theta_j; a_i, b_i) = \frac{1}{1+\exp\Big[-\sum_{k=1}^{K} a_{ik}\theta_{jk} + b_i\Big]}\)
where \(b_i\) is the difficulty parameter for item \(i\), and
the discrimination parameter \(a_{ik} \ge 0\) for each latent trait \(k\) quantifies the level of ability \(k\) required to answer item \(i\) correctly.
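
The ML2P formula above can be evaluated directly. The following minimal sketch computes the probability for one student and one item; the function name `ml2p_probability` is hypothetical and the inputs are plain Python lists.

```python
import math


def ml2p_probability(theta, a, b):
    """Probability of a correct response under the ML2P model.

    theta: latent abilities of one student (length K)
    a:     discrimination parameters of one item (length K, each >= 0)
    b:     difficulty parameter of that item
    """
    # Exponent is -sum_k a_k * theta_k + b, matching the formula's sign.
    exponent = -sum(a_k * t_k for a_k, t_k in zip(a, theta)) + b
    return 1.0 / (1.0 + math.exp(exponent))


# A student whose weighted ability exactly offsets the difficulty
# has a 50% chance of answering correctly.
print(ml2p_probability([1.0, 0.0], [1.0, 0.0], 1.0))  # 0.5
```

Raising an ability on a trait the item discriminates on increases the probability, while raising the difficulty lowers it.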

ML2P-VAE model

The decoder has no hidden layers. Instead, the non-zero weights in the decoder are determined by a given Q-matrix.
Next, the encoded distribution layer is connected directly to the output layer. The output layer must use the sigmoidal activation function:

\(\sigma(z_i) = \frac{1}{1+e^{-z_i}}\)
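
A forward pass through such a decoder can be sketched without any deep learning library. In the example below, the Q-matrix masks the weights so that an item is connected only to the traits it measures, negative weights are clamped to zero, and a sigmoid is applied at the output. The function names and the items-by-traits layout are assumptions; this is not the ML2Pvae implementation. Under the sign convention of the ML2P formula, the output bias plays the role of \(-b_i\).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode(theta, weights, biases, q):
    """Single-layer decoder: latent traits -> item response probabilities.

    theta:   latent trait values for one student (length K)
    weights: items x traits weight matrix (estimates of discrimination)
    biases:  one bias per item (corresponds to -b_i in the ML2P formula)
    q:       items x traits binary Q-matrix (assumed layout)
    """
    probs = []
    for i, q_row in enumerate(q):
        z = biases[i]
        for k, t_k in enumerate(theta):
            # Keep a connection only where Q is 1, and keep weights >= 0.
            z += max(weights[i][k], 0.0) * q_row[k] * t_k
        probs.append(sigmoid(z))
    return probs
```

Because there is no hidden layer, each masked weight can be read directly as an estimate of a discrimination parameter \(a_{ik}\), which is what makes the decoder interpretable as an ML2P model.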

VAE model [Converse et al., 2021]
We could try inputting non-binary responses (partial credit) into the ML2P-VAE model, but it would not have as much theory backing it. The Samejima graded response model would be a good fit for such data.

Model parameters

num_items and num_skills describe the assessment length and the number of abilities being evaluated by the assessment.
Q_matrix is a num_skills by num_items matrix that specifies the relationship between items and abilities (the transpose of the questions-by-concepts layout described earlier).

Get parameter estimates for Model after training

get_item_parameter_estimates() retrieves all trainable parameters of the decoder part of the VAE and returns their values, which serve as estimates of the item parameters.

Load in true values (included in this package)

# disc_true <- as.matrix(disc_true)
# diff_true <- as.matrix(diff_true)
# theta_true <- as.matrix(theta_true)

Assumes latent traits are correlated

Correlation plots of discrimination parameter estimates for data set (i) with 35 items and 6 latent traits.
Each color represents discrimination parameters relating to one of the 6 latent skills.

Ability parameter: the 6 colors in the plot represent discrimination and ability parameters associated with the 6 latent traits.
Difficulty parameters are on the item level, not the latent trait level, so each item \(i\) has exactly one difficulty parameter \(b_i\), regardless of the number of latent skills.

(Small sample: ML2P-VAE parameter estimates for the data set with 35 items, 6 correlated latent traits, and 10,000 students. Each color corresponds to discrimination parameters related to one of the 6 latent traits. Difficulty parameters are shown by themselves for the 35 items.)

Assumes latent traits are correlated (continued)

(Small sample: ML2P-VAE parameter estimates for the data set with 40 items, 6 correlated latent traits, and 18,000 students. Each color corresponds to discrimination parameters related to one of the 6 latent traits. Difficulty parameters are shown by themselves for the 40 items.)

Assumes latent traits are correlated (continued)

(Large sample: ML2P-VAE parameter estimates for the data set with 50 items, 6 correlated latent traits, and 25,000 students. Each color corresponds to discrimination parameters related to one of the 6 latent traits. Difficulty parameters are shown by themselves for the 50 items.)

Assumes latent traits are correlated (continued)

Ability parameter: the 20 colors in the plot represent discrimination and ability parameters associated with the 20 latent traits.
Difficulty parameters are on the item level, not the latent trait level, so each item \(i\) has exactly one difficulty parameter \(b_i\), regardless of the number of latent skills.

(Large sample: ML2P-VAE parameter estimates for the data set with 200 items, 20 correlated latent traits, and 60,000 students. Each color corresponds to discrimination parameters related to one of the 20 latent traits. Difficulty parameters are shown by themselves for the 200 items.)

Assumes latent traits are independent

(Small sample: ML2P-VAE parameter estimates for the data set with 35 items, 6 independent latent traits, and 10,000 students. Each color corresponds to discrimination parameters related to one of the 6 latent traits. Difficulty parameters are shown by themselves for the 35 items.)

Assumes latent traits are independent (continued)

(Small sample: ML2P-VAE parameter estimates for the data set with 40 items, 6 independent latent traits, and 18,000 students. Each color corresponds to discrimination parameters related to one of the 6 latent traits. Difficulty parameters are shown by themselves for the 40 items.)

Assumes latent traits are independent (continued)

(Large sample: ML2P-VAE parameter estimates for the data set with 50 items, 6 independent latent traits, and 25,000 students. Each color corresponds to discrimination parameters related to one of the 6 latent traits. Difficulty parameters are shown by themselves for the 50 items.)

Assumes latent traits are independent (continued)

(Large sample: ML2P-VAE parameter estimates for the data set with 200 items, 20 independent latent traits, and 60,000 students. Each color corresponds to discrimination parameters related to one of the 20 latent traits. Difficulty parameters are shown by themselves for the 200 items.)

Q-matrix Misspecification/Validation

(Small sample: Q-matrix misspecification plots comparing the correctly specified Q-matrix with the under-, over-, and mixed-specification methods, with both 20 percent and 40 percent of items changed. The figure shows how the data points fall below the 45-degree line once the different misspecification methods are applied. The RMSE and correlation scores also confirm these changes.)

Q-matrix Misspecification/Validation (continued)

(Large sample: Q-matrix misspecification plots comparing the correctly specified Q-matrix with the under-, over-, and mixed-specification methods, with both 20 percent and 40 percent of items changed. The figure shows how the data points fall below the 45-degree line once the different misspecification methods are applied. The RMSE and correlation scores also confirm these changes.)

Q-matrix Misspecification/Validation (continued)

(Error measures for ability (theta) parameters from various parameter estimation methods on two different data sets. The table shows how the RMSE, bias, and correlation scores change as the Q-matrix is misspecified. Entries for the other methods are intentionally left blank.)
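
The three error measures named in the table can be computed from paired true and estimated values. The sketch below uses common definitions, which are assumptions here since the slides do not state them: RMSE as the root of the mean squared difference, bias as the mean of (estimate minus true value), and correlation as the Pearson coefficient.

```python
import math


def rmse(true, est):
    """Root mean squared error between true and estimated values."""
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true, est)) / len(true))


def bias(true, est):
    """Mean signed error, defined here as estimate minus true value."""
    return sum(e - t for t, e in zip(true, est)) / len(true)


def correlation(true, est):
    """Pearson correlation between true and estimated values."""
    n = len(true)
    mean_t = sum(true) / n
    mean_e = sum(est) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true, est))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_e = sum((e - mean_e) ** 2 for e in est)
    return cov / math.sqrt(var_t * var_e)
```

Estimates that are shifted by a constant keep a correlation of 1 while showing non-zero RMSE and bias, which is one reason to report all three measures together.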


Thank you!

Questions?