Q-matrix and variational autoencoders to estimate multidimensional item response theory models with correlated and independent latent variables
The Western Tennessee Chapter of the American Statistical Association: Annual Fall Chapter Symposium
Mahbubul Hasan (presenter); Faculty Mentors & Team Members: Dale Bowman, Lih Y. Deng, Ching-Chi Yang, John Sabatini, and John Hollander.
11/04/2021
Plan for this lecture
- Research Questions
- Data description
- Method: Q-matrix misspecification
- Generative model, why generative model?
- Autoencoder, Variational Autoencoder (VAE)
- Q-matrix & how to use it in VAE model?
- Multidimensional Logistic 2-Parameter (ML2P) Model
- VAE/deep learning frontiers in educational assessment
- Future directions or extensions
Which face is fake?
[Image: MIT 6.S191]
Research Questions
- How can a variational autoencoder be used to estimate MIRT models with large numbers of correlated and independent latent traits? What do the error measures look like as the number of latent traits and the number of students change?
- What are the effects of factors such as sample size, percentage of misfit items in the test, and item quality (item discrimination) on item and model fit when the Q-matrix is misspecified?
Data
- Two small sample sizes:
- 6 traits, 35 items, and 10,000 students
- 6 traits, 40 items, and 18,000 students
- Two large sample sizes:
- 6 traits, 50 items, and 25,000 students
- 20 traits, 200 items, and 60,000 students
- One real data set:
- Reading Inventory and Scholastic Evaluation (RISE): 3 traits, 32 items, and 4,100 students
Data
- ML2Pvae (Variational Autoencoder Models for IRT Parameter Estimation) R package (simulated data)
- Abilities for the \(N\) students, found in the data frame theta_true, were sampled from \(N(0, \Sigma)\).
- \(\Sigma\) specifies the correlations between the abilities and is found in the data frame correlation_matrix.
- Discrimination parameters were sampled uniformly from \([0.25, 1.75]\) and difficulty parameters uniformly from \([-3, 3]\); Q-matrix entries were sampled from \(Bern(0.2)\).
- Probabilities for each student answering each question correctly were calculated with the ML2P model (McKinley and Reckase, 1980).
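A minimal sketch of loading these bundled data sets; the object names here are assumed from the ML2Pvae package documentation:

```r
# A minimal sketch, assuming the simulated data bundled with ML2Pvae
library(ML2Pvae)
data("response_data")      # binary responses (students x items)
data("q_matrix")           # Q-matrix relating latent traits to items
data("correlation_matrix") # Sigma: correlations among the abilities
data("theta_true")         # true abilities, sampled from N(0, Sigma)
dim(response_data)         # N students by number of items
```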
Method
- Q-matrix misspecification (Table 1: correctly specified Q-matrix, misspecified Q-matrix, and misfit items). For instance, with 200 items, 20% misfit = 40 items and 40% misfit = 80 items (the analogous example in Sunbul et al., 2018, uses 15 items).

Research Model
(Research model: 1. low- and high-dimensional data from the population; 2. VAE architecture; 3. synthesize data; 4. incorporate Q-matrix; 5. model parameter estimation)
DataFrame (head)
- data(response)

- head(q-matrix)

Unsupervised learning
- Data: \(Y\) is the data; no labels exist.
- Objective: explore and learn the hidden, underlying structure of the data
- Example: dimensionality reduction, clustering
Generative model
- Generative means that once we learn the underlying structure of the data, we can create new examples that do not exist in the training set.
- Goal: take as input training samples from some distribution and learn a model that represents that distribution.
- The generative modeling process
[O'Reilly, 2021]
Why generative model?
- Facial classification, debiasing
- Able to uncover underlying latent variables in the data set

- Outlier detection
- Detect outliers in the distribution and use them during training to improve the model even more

[Amini, 2019]
What is latent variable?
- Plato's Myth of the Cave

- Shadows are the observed variables; latent variables are the underlying objects generating the shadows.
- The latent variables in this case (response data) are the knowledge components needed to solve the problems (i.e., the questions).
What is autoencoder?
- Autoencoders are a special class of neural networks in which the output layer is trained to reconstruct the input layer.
- An AE consists of two neural networks: an encoder and a decoder. The encoder takes the high-dimensional input \(x\) and feeds it forward through one or more hidden layers to some latent space.
- The decoder takes this latent representation and feeds it forward through hidden layers to output a reconstruction of the original input, \(\hat{x}\).
- AE model

[Rocca, 2019]
Why do we care about the low-dimensional \(z\)?
- Lower-dimensional feature representation learned from unlabeled data
- Reduces noise; keeps the most meaningful features of the distribution
- Train the model to use these features to reconstruct the original data
- Bottleneck hidden layers
What is VAE?
- A VAE is a probabilistic twist on the autoencoder.
- A VAE still consists of an encoder and a decoder, but the latent space returned from the encoder is trained to learn a probability distribution for \(\Theta\) given \(X\).
- The VAE allows the network to map the training data to a (latent) normal distribution.
- Sample from this distribution, and feed that sample forward through the decoder.
- VAE model (MNIST data)

VAE optimization
- Training VAE model
[Doersch, 2016]
- Left is without the “reparameterization trick”, and right is with it.
- Red shows sampling operations that are non-differentiable. Blue shows loss layers.
- The feed forward behavior of these networks is identical, but backpropagation can be applied only to the right network.
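For reference, the quantity being minimized in this training setup is the standard negative evidence lower bound (as in the cited tutorial, Doersch, 2016): the reconstruction error plus a KL penalty that keeps the encoded distribution close to the prior,
\[
\mathcal{L}(x) = -\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] + D_{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big).
\]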
Reparametrizing the sampling layer
- Reparametrizing
[MIT 6.S191]
- Enables end-to-end training of the VAE by backpropagation with respect to \(z\) and the actual weights of the encoder.
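A minimal base-R sketch of the trick (function and variable names are illustrative, not from the ML2Pvae package):

```r
# Reparameterization trick: instead of sampling z ~ N(mu, sigma^2) directly
# (a non-differentiable node), sample eps ~ N(0, 1) and compute z
# deterministically, so gradients flow back to mu and log_var.
reparameterize <- function(mu, log_var) {
  eps <- rnorm(length(mu))       # noise independent of the network weights
  mu + exp(0.5 * log_var) * eps  # z = mu + sigma * eps
}
z <- reparameterize(mu = c(0.2, -1.0), log_var = c(0.0, 0.5))
```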
VAE: latent perturbation during training
- Perturbation: a small change in a system, for instance as a result of a third object interacting with the system
- Slowly increase or decrease one latent variable while keeping the other variables fixed
- Disentanglement

VAE summary
- Compress information to a representation we can use to learn
- Reconstruction allows for unsupervised learning (only the data, no labels \(Y\))
- Reparametrization trick to train end to end
- Use perturbation to interpret hidden layers
- Generate new data/examples
What is Q-matrix and What can we do with Q-matrix?
- Shows the relationship between test items and latent or underlying attributes (concepts).
- The Q-matrix is an \(M \times N\) matrix, where \(M\) equals the number of questions in an assessment and \(N\) equals the total number of concepts required for understanding all questions.

- The Q-matrix can be used for understanding students' performance:
- by assigning the closest ideal response pattern to a student's response vector, we can infer which concepts the student does, and which s/he does not, know.
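As a toy illustration (values invented for this sketch, not from the study), a Q-matrix for 4 items and 3 concepts could be built in R as:

```r
# Toy Q-matrix: rows are items, columns are concepts; a 1 means the item
# requires that concept (illustrative values only)
# (note: the ML2Pvae package expects the transpose, num_skills x num_items)
Q <- matrix(c(1, 0, 0,
              1, 1, 0,
              0, 1, 0,
              0, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("item", 1:4), paste0("skill", 1:3)))
Q
```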
Q-matrix & how to use it in VAE model?
- The AE and VAE architectures are combined with the ML2P model in the following ways:
- no hidden layers in the decoder; a sigmoidal activation function on the output-layer nodes (with non-negative weights);
- the Q-matrix determines the connections between the latent traits and the output items.
- VAE model
[Converse et al.,2021]
What are the difficulty and discrimination parameters?
- Reflect students' varied levels of understanding
- Difficulty

- Difficulty parameters can be any real number, but usually lie within \([-3, 3]\)
Discrimination parameters
- To be fair, students with better understanding should do better on the test
- Discrimination

- Discrimination parameters are always positive, usually within \([0.25, 3]\)
Multidimensional item response theory (MIRT)
- A naive approach to quantifying student knowledge is to look at the percentage of questions that the student answered correctly.
- This does not take into account the fact that each item on the assessment is different, both in difficulty and in content.
- For example, if one student answers only questions 1 and 4 incorrectly, and another student answers only questions 3 and 7 incorrectly, they have the same percentage score. But it is not likely that the two students share the same latent trait values: questions 3 and 7 may have tested a different skill than items 1 and 4, and could vary greatly in difficulty level.
- This is where MIRT models are useful; they play a major role in educational measurement.
Multidimensional Logistic 2-Parameter (ML2P) model
- The ML2P model gives the probability of a student answering a particular question correctly as a continuous function of student ability (McKinley and Reckase, 1980).
- For student \(j\) with latent abilities \(\Theta_j = (\theta_{j1}, \ldots, \theta_{jK})^\top\), the probability of answering item \(i\) correctly is
\(P(u_{ij} = 1 \mid \Theta_j; a_i, b_i) = \frac{1}{1 + \exp\big[-\sum_{k=1}^{K} a_{ik}\theta_{jk} + b_i\big]}\)
- difficulty parameter \(b_i\) for item \(i\)
- discrimination parameter \(a_{ik} \ge 0\) for each latent trait \(k\), quantifying the level of ability \(k\) required to answer item \(i\) correctly
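As a sketch, this formula transcribes directly into base R (function and variable names are mine, not from the package):

```r
# ML2P probability that a student answers one item correctly
# theta: length-K ability vector; a: length-K discrimination vector (a >= 0)
# b: scalar difficulty for the item
ml2p_prob <- function(theta, a, b) {
  1 / (1 + exp(-sum(a * theta) + b))
}
ml2p_prob(theta = c(0.5, -0.2), a = c(1.2, 0.8), b = 0.3)
```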
ML2P-VAE model
- No hidden layers in the decoder; instead, the non-zero weights in the decoder are determined by a given Q-matrix.
- Next, connect the encoded distribution layer directly to the output layer. The output layer must use the sigmoidal activation function
\(Q(z_i) = \frac{1}{1 + e^{-z_i}}\)
- VAE model
[Converse et al., 2021]
- We could try inputting non-binary responses (partial credit) into the ML2P-VAE model, but it would not have as much theory backing it; Samejima's graded response model would be a better fit.
Model parameters
- num_items and num_skills describe the assessment length and the number of abilities being evaluated by the assessment.
- Q_matrix is a num_skills by num_items matrix which specifies the relationship between items and abilities.
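A hedged sketch of passing these parameters to the ML2Pvae builder for independent traits (the return structure is assumed from the package documentation; check ?build_vae_independent):

```r
# Build encoder/decoder/VAE for independent latent traits (sketch)
library(ML2Pvae)
data("q_matrix")                # assumed name; num_skills x num_items
models <- build_vae_independent(
  num_items  = ncol(q_matrix),  # assessment length
  num_skills = nrow(q_matrix),  # number of abilities evaluated
  Q_matrix   = q_matrix
)
# models is assumed to hold the encoder, decoder, and full VAE
encoder <- models[[1]]; decoder <- models[[2]]; vae <- models[[3]]
```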
Get parameter estimates for the model after training
- get_item_parameter_estimates() takes all trainable parameters of the decoder part of the VAE and returns the values, which serve as estimates of the item parameters.
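Putting this together, a hedged end-to-end sketch (function names from the ML2Pvae package; the signature of get_ability_parameter_estimates() and the training settings are assumptions, not the study's configuration):

```r
# Train the VAE on the binary response data, then extract item parameters
data("response_data")
train_model(vae, train_data = as.matrix(response_data), num_epochs = 10)
item_est <- get_item_parameter_estimates(decoder)  # decoder weights -> a, b
# ability estimates come from the encoder side (assumed signature)
theta_est <- get_ability_parameter_estimates(encoder,
                                             as.matrix(response_data))
```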
Load in true values (included in this package)
data("disc_true"); data("diff_true"); data("theta_true")
disc_true <- as.matrix(disc_true)
diff_true <- as.matrix(diff_true)
theta_true <- as.matrix(theta_true)
Assumes latent traits are independent
(Small sample: ML2P-VAE parameter estimates for the data set with 35 items, 6 independent latent traits, and 10,000 students. Each color corresponds to the discrimination parameters for one of the 6 latent traits; the difficulty parameters are plotted separately for all 35 items.)
Assumes latent traits are independent (continued)
(Small sample: ML2P-VAE parameter estimates for the data set with 40 items, 6 independent latent traits, and 18,000 students. Each color corresponds to the discrimination parameters for one of the 6 latent traits; the difficulty parameters are plotted separately for all 40 items.)
Assumes latent traits are independent (continued)
(Large sample: ML2P-VAE parameter estimates for the data set with 50 items, 6 independent latent traits, and 25,000 students. Each color corresponds to the discrimination parameters for one of the 6 latent traits; the difficulty parameters are plotted separately for all 50 items.)
Assumes latent traits are independent (continued)
(Large sample: ML2P-VAE parameter estimates for the data set with 200 items, 20 independent latent traits, and 60,000 students. Each color corresponds to the discrimination parameters for one of the 20 latent traits; the difficulty parameters are plotted separately for all 200 items.)
Q-matrix Misspecification/Validation
(Small sample: Q-matrix misspecification plots comparing the correctly specified Q-matrix against under-, over-, and mixed-specification methods, for both 20 percent and 40 percent changes of items. The figure shows how the data points fall below the 45-degree line once the different misspecification methods are applied; the RMSE and correlation scores also confirm these changes.)
Q-matrix Misspecification/Validation (continued)
(Large sample: Q-matrix misspecification plots comparing the correctly specified Q-matrix against under-, over-, and mixed-specification methods, for both 20 percent and 40 percent changes of items. The figure shows how the data points fall below the 45-degree line once the different misspecification methods are applied; the RMSE and correlation scores also confirm these changes.)
Q-matrix Misspecification/Validation (continued)
(Error measures for ability (theta) parameters from various parameter estimation methods on two different data sets. The table shows how the RMSE, bias, and correlation scores change as the Q-matrix is misfit; the other methods are left blank intentionally.)
Correlated latent traits model
- Most effective when parameters specify a more complicated distribution, such as \(N(\mu, \Sigma)\) or \(N(0, \Sigma)\)
- Uses the TensorFlow Probability library
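Continuing the earlier sketch, a hedged call to the correlated-traits builder (argument names assumed from the ML2Pvae documentation; see ?build_vae_correlated; requires the TensorFlow Probability backend):

```r
# Build a VAE whose latent layer learns N(0, Sigma) with correlated traits
data("correlation_matrix")  # Sigma for the simulated abilities
models_corr <- build_vae_correlated(
  num_items  = ncol(q_matrix),
  num_skills = nrow(q_matrix),
  Q_matrix   = q_matrix,
  covariance_matrix = as.matrix(correlation_matrix)
)
```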
VAE for educational assessment (summary)
- ML2P-VAE methods are most useful on high-dimensional data
- Faster runtime
- Preserves data privacy
- Estimates of students' latent abilities are often more accurate when using ML2P-VAE methods.
- This is especially interesting, as the estimates of \(\Theta_j\) are not updated in the iterations of a gradient descent algorithm, while the estimates of \(a_{ik}\) and \(b_i\) are.
- Highly interpretable model.
- During estimation of difficulty parameters, the improvement gained from using traditional methods is incredibly small. [Converse et al., 2021]
Future directions
- Data anonymization via VAE to protect learners' privacy.
- Use of other distributions in the VAE model, e.g., exponential families.
- Compare ML2P-VAE methods with sampling-based parameter estimation approaches, e.g., Gibbs sampling, Hamiltonian Monte Carlo, Metropolis-Hastings, and the Robbins-Monro algorithm.
- Extend the ML2P-VAE method to an ML3P-VAE model.