Item-Response Theory

Let’s start with some basic notation and work through several increasingly complex models. We’re following Chapter 1 and Chapter 8 of An introduction to psychometric theory with applications in R by William Revelle, which can be found here.

A bit of notation:

\(\theta_i\) - the latent ‘ability’ of the \(i\)th person.
\(\delta_j\) - the ‘difficulty’ of the \(j\)th item.

Using the example of an exam, the latent ability of a subject (\(\theta_i\)) might be how knowledgeable Tim is about conservation. The difficulty of the item (\(\delta_j\)) might be how challenging a conservation exam question is. We’re interested in the probability (\(\mathrm{Pr}\)) of Tim (\(i\)) getting a question (\(j\)) correct (\(corr\)) given his ability (\(\theta_i\)) and the difficulty of the question (\(\delta_j\)). This is described by some function (\(f\)) of the person’s ability and the question’s difficulty. Here the response is dichotomous, but later we’ll move on to polytomous responses like the ones we’re dealing with in our study. This is the ability model:

\[\mathrm{Pr}(corr|\theta_i, \delta_j) = f(\theta_i - \delta_j)\]

The probability of someone answering correctly should always increase with their ability (monotonically increasing).

A similar but distinct model is one where someone’s endorsement of an item depends on some latent variable, and where the probability of endorsement is greatest at a given level of the latent variable. For example, the probability of endorsing the statement “I don’t like nature” is going to peak at low levels of biophilia, but the probability of endorsing the statement “Nature is more important than anything else” will peak at high levels. This is called the attitude model:

\[\mathrm{Pr}(endor|\theta_i, \delta_j) = f(|\theta_i - \delta_j|) \]

Whereas the ability model refers to “ability” and “difficulty”, the attitude model refers to “latent attribute” and “item location” - the notation for both is the same (\(\theta_i\) and \(\delta_j\) respectively). The equations for the ability and attitude models look very similar. But in the ability model the probability of a correct answer is a function of the signed difference between ability and difficulty (\(\theta_i - \delta_j\)), whereas in the attitude model the probability of endorsement is a function of the absolute difference between latent attribute and item location (\(|\theta_i - \delta_j|\)). Below, the left plot shows the probability function of the basic ability model, and the right shows the basic attitude model.
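
To make the contrast concrete, here is a minimal Python sketch of the two curve shapes. The text leaves \(f\) unspecified, so both choices are assumptions made purely for illustration: a logistic curve for the ability model and an exponential decay in distance for the attitude model. The function names are hypothetical.

```python
import math

def ability_curve(theta, delta):
    """Assumed monotone f: logistic in the signed difference theta - delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def attitude_curve(theta, delta):
    """Assumed peaked f: decays with the absolute distance |theta - delta|."""
    return math.exp(-abs(theta - delta))

delta = 0.0  # item difficulty / item location
for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(ability_curve(theta, delta), 3),
          round(attitude_curve(theta, delta), 3))
```

The ability curve keeps rising as \(\theta_i\) grows, while the attitude curve peaks where \(\theta_i = \delta_j\) and falls away symmetrically on either side.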

We’ll discuss later whether the ability or the attitude model is more important. For the time being, we’ll proceed with the ability model to explore some basic concepts. A basic model that we can build on to understand the more complex models later is the Rasch model. This is described by:

\[\mathrm{Pr}(corr|\theta_i, \delta_j) = \frac{1}{1+e^{\delta_j-\theta_i}} \]

Here the probability of the \(i\)th respondent getting the correct answer to the \(j\)th item is a logistic function of the difference between their ability (\(\theta_i\)) and the item difficulty (\(\delta_j\)). But how do we know a person’s ability and the item difficulty? These are estimated from the pattern of correct answers across multiple respondents and multiple items, as discussed in the chapters mentioned above. One feature of the Rasch model is that it assumes every item discriminates between ability levels equally well.
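
As a quick numerical check, the Rasch curve can be sketched in a few lines of Python. It is written so that the probability rises with ability, that is, as a logistic function of \(\theta_i - \delta_j\). The function name `rasch_prob` is hypothetical, and the ability and difficulty values are made up for illustration.

```python
import math

def rasch_prob(theta, delta):
    """Probability of a correct answer: logistic function of theta - delta."""
    return 1.0 / (1.0 + math.exp(delta - theta))

# When ability equals difficulty, the respondent has an even chance.
print(rasch_prob(0.0, 0.0))  # 0.5

# Higher ability on the same item gives a higher probability.
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(rasch_prob(theta, 0.0), 3))
```

Only the difference between ability and difficulty matters, so shifting both by the same amount leaves the probability unchanged.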

Here we can introduce Item Characteristic Curves and Item Information Curves. We’ve already seen an Item Characteristic Curve - it gives the probability of getting a correct answer at a given level of ability. The plot below shows Item Characteristic Curves for three items, of increasing difficulty, from a Rasch model. We see that although the difficulty changes, the shape of the curve - its ability to discriminate - stays the same.

The Item Information Curve (not to be confused with the basic attitude model) is the first derivative of the Item Characteristic Curve. It essentially shows where the curve is steepest, and so where the item has the greatest discriminatory power.
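
For the Rasch curve this derivative has a simple closed form, \(P(1-P)\), where \(P\) is the probability of a correct answer. The sketch below compares that closed form with a numerical slope. The function names are hypothetical and the values are illustrative only.

```python
import math

def rasch_prob(theta, delta):
    """Rasch Item Characteristic Curve."""
    return 1.0 / (1.0 + math.exp(delta - theta))

def rasch_information(theta, delta):
    """Closed-form slope of the Rasch curve with respect to theta: P * (1 - P)."""
    p = rasch_prob(theta, delta)
    return p * (1.0 - p)

def numerical_slope(theta, delta, h=1e-5):
    """Central-difference approximation to the slope of the curve."""
    return (rasch_prob(theta + h, delta) - rasch_prob(theta - h, delta)) / (2 * h)

delta = 1.0
for theta in (0.0, 1.0, 2.0):
    print(theta,
          round(rasch_information(theta, delta), 4),
          round(numerical_slope(theta, delta), 4))
```

The information peaks where ability equals difficulty, which is where the Item Characteristic Curve is steepest.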

These one-parameter models assume that items vary only in their difficulty. However, this is often unrealistic, since items can also differ in how well they discriminate based on ability. Two-parameter models introduce a discrimination parameter, \(\alpha\). We can extend the logistic model introduced above by adding this new parameter:

\[\mathrm{Pr}(corr|\theta_i, \alpha_j, \delta_j) = \frac{1}{1+e^{\alpha_j(\delta_j-\theta_i)}} \]

The addition of this parameter leads to better model fit and to non-parallel Item Characteristic Curves and Item Information Curves. I’m not going to bother plotting this - just imagine the curves at different slopes to each other.
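
In place of a plot, a small Python sketch shows the effect of the discrimination parameter. It assumes the logistic form in which the probability rises with \(\theta_i\). The name `two_pl_prob` and the \(\alpha\) values are hypothetical.

```python
import math

def two_pl_prob(theta, alpha, delta):
    """Two-parameter logistic curve: alpha scales the slope around delta."""
    return 1.0 / (1.0 + math.exp(alpha * (delta - theta)))

delta = 0.0
for theta in (-1.0, 0.0, 1.0):
    flat = two_pl_prob(theta, 0.5, delta)   # weakly discriminating item
    steep = two_pl_prob(theta, 2.0, delta)  # strongly discriminating item
    print(theta, round(flat, 3), round(steep, 3))
```

Both items have the same difficulty, so the two curves cross at \(\theta_i = \delta_j\). The steeper curve separates respondents just below and just above that point more sharply.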

Now, both these models assume that people’s reported answers are a true reflection of what they believe. However, we know that in tests you can still get the correct answer by guessing - through random luck. Three-parameter models include this so-called “guessing parameter”, denoted by \(\gamma\). This third parameter might not be necessary in our analysis, since we’re not assuming people would need to guess. The three-parameter model is described by:

\[\mathrm{Pr}(corr|\theta_i, \alpha_j, \gamma, \delta_j) = \gamma + \frac{1-\gamma}{1+e^{\alpha_j(\delta_j-\theta_i)}} \]
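
The effect of the guessing parameter is easiest to see numerically. This sketch uses the common form of the three-parameter model in which \(\gamma\) sets the lower asymptote: \(\gamma\) plus \((1-\gamma)\) times the two-parameter logistic curve. The name `three_pl_prob` is hypothetical, and \(\gamma = 0.25\) is an assumed value (for example, a four-option multiple-choice question).

```python
import math

def three_pl_prob(theta, alpha, gamma, delta):
    """Three-parameter logistic curve with lower asymptote gamma."""
    logistic = 1.0 / (1.0 + math.exp(alpha * (delta - theta)))
    return gamma + (1.0 - gamma) * logistic

gamma = 0.25  # assumed guessing rate
for theta in (-6.0, 0.0, 6.0):
    print(theta, round(three_pl_prob(theta, 1.0, gamma, 0.0), 3))
```

Even a respondent with very low ability answers correctly about a quarter of the time, while the probability still approaches 1 for high ability.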

You can extend the model even further, but we’ll not go into those extensions. So far we’ve been talking about models with dichotomous responses - right or wrong, endorsed or not endorsed. In our analysis, however, we have polytomous responses - levels of agreement, etc.

This is where it becomes useful for us to have introduced the attitude model. Let’s use the example of asking Diogo “How much time do you spend talking about conservation?” with the responses “None of the time”, “Some of the time”, “Half the time”, “Most of the time” and “All of the time”. We assume that the more Diogo enjoys talking about conservation, the higher the response level he chooses. The probability of saying “None of the time” increases monotonically the less he likes talking about conservation, and the probability of endorsing “All of the time” increases monotonically the more he likes talking about it. But if he says “Some of the time”, then he’s above the threshold between “None of the time” and “Some of the time” but below the threshold between “Some of the time” and “Half the time”. This is a graded response model. The probability of giving the lowest response is just the probability of not passing the first threshold, and the probability of giving the highest response is the probability of passing the last threshold. The probability of each intermediate response is the probability of passing one threshold but not the next.

For a two-parameter logistic model, the probability of endorsing the \(k\)th response is a function of the latent attribute, the item thresholds, and the discrimination parameter. This graded response model is described by:

\[\mathrm{Pr}(r=k|\theta_i, \delta_k, \delta_{k-1}, \alpha_k) =\mathrm{Pr}(r|\theta_i, \delta_{k-1}, \alpha_k) -\mathrm{Pr}(r|\theta_i, \delta_{k}, \alpha_k)= \frac{1}{1+e^{\alpha_{k}(\delta_{k-1}-\theta_i)}}-\frac{1}{1+e^{\alpha_{k}(\delta_k-\theta_i)}} \]
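
The threshold logic can be sketched in Python using the Diogo example. The threshold values, the single shared discrimination value, and the function names are all assumptions made for illustration, not estimates from any data. Each response probability is the probability of passing the threshold below it minus the probability of passing the threshold above it, with 1 and 0 standing in at the two ends of the scale.

```python
import math

def pass_prob(theta, alpha, threshold):
    """Probability of passing a threshold (two-parameter logistic curve)."""
    return 1.0 / (1.0 + math.exp(alpha * (threshold - theta)))

def graded_response_probs(theta, alpha, thresholds):
    """Probability of each ordered response, given increasing thresholds."""
    # Everyone "passes" the bottom of the scale and nobody passes the top.
    cumulative = [1.0] + [pass_prob(theta, alpha, d) for d in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]

labels = ["None of the time", "Some of the time", "Half the time",
          "Most of the time", "All of the time"]
thresholds = [-2.0, -0.5, 0.5, 2.0]  # hypothetical item thresholds

for label, p in zip(labels, graded_response_probs(0.0, 1.0, thresholds)):
    print(f"{label}: {p:.3f}")
```

Five responses need four thresholds, and the five probabilities sum to 1 for any value of \(\theta_i\).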

The long and the short of it is that for each item we get Item Characteristic Curves that look something like the figure below.