A Gentle Introduction to Item Response Theory

Author

James Silva Garcia

Published

September 8, 2025

Introductory ideas

Not long ago, I was asked an interesting question: imagine you have a dataset with 1000 rows—each row representing a test taker—and their answers to a 10-question test, recorded as either correct (1) or incorrect (0). How would you analyze this data?

A natural first step might be to score each person by counting their total correct answers, and then, for each question, calculate the fraction of test takers who got it right. But we can go further. With this kind of information, we can ask deeper questions: for each test item, what is the probability that someone answers it correctly? Thinking about this brought me back to when I first learned how logistic regression could be used to study binary outcomes in a course on categorical data analysis.
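Before going further, here is a minimal sketch of that first scoring step, assuming a purely hypothetical 0/1 response matrix named responses (the object name and simulated values are illustrative only):

# Hypothetical 0/1 response matrix: 1000 test takers (rows) x 10 questions (columns)
set.seed(123)
responses = matrix(rbinom(1000 * 10, size = 1, prob = 0.6), nrow = 1000, ncol = 10)

totalScores = rowSums(responses)   # total correct answers per test taker
itemPvalues = colMeans(responses)  # fraction of test takers answering each question correctly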

This idea—using logistic regression to study test responses—forms the foundation of a psychometric framework known as Item Response Theory (IRT). At its core, IRT assumes that each person has an underlying trait, such as knowledge or ability, that we cannot observe directly. What we can observe are their answers to test questions, and IRT provides a way to model how this hidden trait influences the probability of getting each question right.

For an arbitrary question, one popular choice is the two-parameter logistic (2PL) model, which can be expressed as

\[ \Pr(\textsf{Correct answer}) = \frac{1}{1 + \exp\left[-\,discrimination \,(z - difficulty)\right]} \]

where \(z\) is a latent trait (for example, latent knowledge or ability) and \(discrimination\) and \(difficulty\) are two model parameters to be estimated from the data. In typical applications, \(z\) is assumed to follow a standard Normal distribution. Once these two parameters are estimated, we can let \(z\) vary across a range of values (for example, from -6 to 6) to obtain a visualization known as the Item Characteristic Curve (ICC).

Note that when \(z = difficulty\), the probability of a correct answer becomes 0.5. Therefore, the \(difficulty\) parameter (usually denoted by \(\beta\)) can be interpreted as the ability level required to reach a 0.5 probability of answering correctly. The \(discrimination\) parameter (usually denoted by \(\alpha\)) is directly related to the steepness of the ICC and measures how effectively an item (i.e., a question) differentiates between examinees with different ability levels. In practice, highly discriminating items are preferred because, around the item's difficulty level, the probability of answering correctly changes rapidly with small changes in ability.
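To make these interpretations concrete, here is a minimal sketch using hypothetical parameter values (a = 1.5, b = 0.5) that evaluates the 2PL curve over a range of abilities and confirms that the probability equals 0.5 when \(z\) equals the difficulty:

# Hypothetical 2PL item: discrimination a = 1.5, difficulty b = 0.5
a = 1.5
b = 0.5
zGrid = seq(-6, 6, by = 0.1)                 # range of latent ability values
pCorrect = 1 / (1 + exp(-a * (zGrid - b)))   # 2PL probability of a correct answer

1 / (1 + exp(-a * (b - b)))                  # probability at z = b is exactly 0.5
plot(zGrid, pCorrect, type = "l", xlab = "Ability (z)",
     ylab = "Probability of correct answer", main = "Item Characteristic Curve")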

Does IRT make sense in genomics or other research areas?

In genetics, a similar data structure can be found:

  • Each item = a genetic marker (e.g., SSRs, SNPs, mutations, gene variants).

  • Each test taker = a study participant.

  • The latent trait = an underlying biological risk, susceptibility, or expression profile.

Just like test items vary in difficulty and discrimination, genetic variants differ in how strongly they associate with traits or diseases. For example:

  • An SSR or a SNP with a high “discrimination parameter” could sharply separate high-risk individuals from low-risk ones.

  • An SSR or a SNP with a high “difficulty parameter” might represent a rare allele, only present in people with a high level of genetic susceptibility.

Similar analogies can be established with other research areas.
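As a rough, purely illustrative sketch of this analogy (the marker matrix below is simulated, not real genotype data), the same 2PL machinery could in principle be applied to binary marker calls:

library(ltm)

# Hypothetical binary marker matrix: rows = participants, columns = markers
# (1 = minor allele present, 0 = absent); values are simulated for illustration only
set.seed(101)
markers = as.data.frame(matrix(rbinom(500 * 5, size = 1, prob = 0.3), nrow = 500))
names(markers) = paste0("Marker", 1:5)

# Fit a 2PL model: "difficulty" relates to allele rarity along the latent trait,
# "discrimination" to how sharply a marker separates low- from high-susceptibility individuals
markerMod = ltm(markers ~ z1)
coef(markerMod)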

Data simulation from a two-parameter logistic model

Now, let me show you one of the possible ways to simulate data from a two-parameter logistic model using R. We start by making sure a folder called “images” exists (it is used to save the script-generated *.png files). We also load some of the tidyverse R packages, together with bslib (for Bootstrap styling and theming), plotly (for interactive visualizations), and ltm/mirt to perform the IRT analyses. A colorblind-friendly palette is also defined.

Show code
# setwd("C:/Users/james/workspace/quartoHub/IRTintro")
if (!dir.exists("images")) dir.create("images")

library(dplyr)
library(ggplot2)
library(purrr)
library(bslib)
library(plotly)
library(ltm)
library(mirt)

# Specify colorblind tones for plotting
Colors = c("#33A02C",  "#66C2A5",  "#1F78B4",  "#7570B3",  "#666666",  "#FFD92F",  "#6A3D9A", "#E7298A",  "#FF7F00",  "#E31A1C")

Let’s simulate data for an instrument with 10 independent items (nItems = 10) and 1000 examinees (nPersons = 1000). A set of 10 discrimination values is drawn from a Uniform distribution between 0.25 and 2. Likewise, 10 difficulty values are drawn from a standard Normal distribution.

Let’s take a look at the simulated difficulty and discrimination parameters for each of the 10 items.

Show code
##### Create objects for simulation
# Simulate data for a 2PL model
set.seed(37573)
nItems = 10
nPersons = 1000

# Generate item parameters
discrimination = round(runif(nItems, 0.25, 2), 3)  # Discrimination a
difficulty = round(rnorm(nItems, 0, 1),3)          # Difficulty b

print(data.frame(difficulty,discrimination))
   difficulty discrimination
1      -0.680          0.642
2       1.412          1.941
3       0.826          1.550
4      -0.460          1.854
5       1.396          1.672
6       1.606          1.708
7      -1.066          0.473
8       1.598          1.549
9      -0.749          0.997
10      0.803          0.888

Next, we simulate 1000 ability values that also follow a standard Normal distribution. Two ways of simulating responses are illustrated. The first option uses the simdata function from mirt; the second uses base R functions and purrr to generate a data frame called baseSim, which is then used to illustrate the model-fitting functions (mirt and ltm).

Parameter estimates using mirt are shown below.

Show code
# Generate person abilities
ability = round(rnorm(nPersons, 0, 1), 3)

# Simulate response data using mirt's simdata
# Note: simdata uses the slope-intercept parameterization a*theta + d,
# so the intercept matching our difficulty values is d = -a*b
mirtSim = simdata(a = discrimination, d = -discrimination * difficulty,
                  itemtype = "2PL", Theta = matrix(ability))


# Simulate response data using base R
# Compute probabilities and simulate responses
myfxGetResponses = function(eaITEM) {
  data.frame(
    response = rbinom(nPersons, size = 1,
           prob = 1 / (1 + exp(-discrimination[eaITEM] * (ability - difficulty[eaITEM])))
           )
  ) %>% 
    set_names(paste0("Item",eaITEM))
}

baseSim = map_dfc(1:nItems, ~myfxGetResponses(.))


##### Model fitting for base R simulated data
mirtMod = mirt(baseSim, 1, "2PL", verbose=F, SE=T)
ltmMod = ltm(baseSim ~ z1)

coefMirt = coef(mirtMod, simplify=TRUE)$items
coefLtm = coef(ltmMod, prob=T, order=F)

data.frame(difficulty=-coefMirt[,2]/coefMirt[,1], discrimination=coefMirt[, 1]) %>% 
  mutate(across(everything(), ~ round(.x, 3)))
       difficulty discrimination
Item1      -0.766          0.555
Item2       1.543          1.620
Item3       0.811          1.652
Item4      -0.432          1.854
Item5       1.380          1.790
Item6       1.743          1.456
Item7      -1.127          0.468
Item8       1.538          1.747
Item9      -0.803          0.911
Item10      0.802          0.865

And here are the nearly identical parameter estimates obtained with ltm.

Show code
coefLtm %>%
  as.data.frame() %>% 
  mutate(across(everything(), ~ round(.x, 3)))
       Dffclt Dscrmn P(x=1|z=0)
Item1  -0.766  0.555      0.605
Item2   1.544  1.619      0.076
Item3   0.811  1.652      0.208
Item4  -0.432  1.855      0.690
Item5   1.380  1.789      0.078
Item6   1.743  1.456      0.073
Item7  -1.127  0.468      0.629
Item8   1.538  1.746      0.064
Item9  -0.803  0.911      0.675
Item10  0.802  0.865      0.333

Smart IRT model visualizations

We can use the parameter estimates from ltm, plus some data wrangling, to generate a data frame called parameters that stores the item annotations, and a pointsDF data frame with the columns needed to overlay points on top of a faceted ggplot2 visualization.

Show code
##### Calculate predicted 2PL functions and build elegant visualizations
# Define the 2-Parameter Logistic function
myfx2PL =  function(x, a, b) {
  1 / (1 + exp(-a * (x - b)))
}

# Create a data frame with two sets of parameters estimated using ltm
parameters = data.frame(
  Item = 1:nItems,
  a = as.numeric(coefLtm[,2]),  # Discrimination
  b = as.numeric(coefLtm[,1])   # Difficulty
) %>% 
  arrange(b) %>% 
  mutate(Color = Colors,
         Label = paste0("Item ", Item, " [a = ", round(a, 2), "; b = ", round(b, 2), "]"))


# Define the ability range
xValues = seq(-6, 6, by = 0.1)

# Evaluate the function for each item
myfxCalc = function(eaITEM) {
  data.frame(
    Item = parameters$Item[eaITEM],
    Color = parameters$Color[eaITEM],
    Label = parameters$Label[eaITEM],
    a = parameters$a[eaITEM],
    b = parameters$b[eaITEM],
    x = xValues,
    y = myfx2PL(xValues,
                a = parameters$a[eaITEM], 
                b = parameters$b[eaITEM])
  )
}

#Generate line plotting data
plotDF = map_df(1:nrow(parameters), ~myfxCalc(.))

plotDF = plotDF %>%
  mutate(tooltipText = paste0("Item ", Item, " [Ability = ", round(x, 2), "; Prob = ", round(y, 2), "]"))

#Generate point plotting data
pointsDF = parameters %>%
  dplyr::select(Item, Color, b, Label) %>%
  mutate(x = b,
         y = 0.5,
         tooltipText = paste0("Item ", Item, " [Ability = ", round(x, 2), "; Prob = ", round(y, 2), "]"))

#Static item-faceted visualization
facetPlot = ggplot(plotDF, aes(x = x, y = y, color = I(Color), text=tooltipText, group=1))+
  geom_line(linewidth = 1.2) +
  facet_wrap(~factor(Label, levels=unique(Label)), ncol = 5) +
  # facet_wrap(~factor(Label, levels=unique(Label)), ncol = 5, as.table=F, scales="free_x") +
  geom_vline(aes(xintercept=b),linewidth=0.15,colour="gray50") +
  geom_hline(aes(yintercept=0.5),linewidth=0.15,colour="gray50") +
  geom_point(data=pointsDF, aes(color=I(Color), shape=intToUtf8(9679)), size=6) +
  geom_label(data=pointsDF, aes(label=Item, fill=I("white")), color="black", size=3.25, label.size=NA,
             label.padding=unit(0.01, "lines"), fontface="bold") +
  ylab(paste0("Probability of correct answer")) +
  xlab("Ability") +
  theme_classic()+
  theme(axis.line.x=element_line(color="gray20", linewidth=1.0),
        axis.line.y=element_line(color="gray20", linewidth=0.5),
        axis.text.x=element_text(hjust = 1),
        # panel.grid.major.x=element_line(size=0.05, color="#B3B3B3"),
        panel.grid.minor=element_blank(), 
        panel.border=element_rect(colour="gray50",fill=NA,linewidth=1.0),
        panel.background = element_rect(colour = "gray50", linewidth=1.0),
        panel.spacing = unit(0.2, "lines"),
        text=element_text(size=14),
        strip.text=element_text(size=11,face = "bold"),
        legend.position = "none")


ggsave("images/facetedPlot.png",facetPlot, width=11, height=11/1.618, units="in", dpi=320)

Once the facetedPlot.png file is saved inside the images folder, we can display it within our document.
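One simple way to do that from an R chunk (a minimal sketch, assuming knitr is available, as it is in a typical Quarto setup) is:

# Embed the saved static plot in the rendered document
knitr::include_graphics("images/facetedPlot.png")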

For the sake of interpretation, the item panels are ordered in ascending order of the estimated \(difficulty\) parameter. Reading left to right and top to bottom, we can see that item ➆ was the least difficult one (\(difficulty\) = -1.13), while item ➅ was the most difficult (\(difficulty\) = 1.74). We can also see that the ICC for item ➆ was the flattest, indicating that it was the least discriminating item, while the ICC for item ➃ was the steepest, which makes it a highly discriminating item.

The facet-annotated plot is a fancy static visualization. We can also make a few minor modifications to the facetPlot object in order to generate a version that can be displayed as an interactive visualization (called facetPlotly) within the HTML document.

Show code
#Interactive item-faceted visualization
#note: geom_label has not been implemented in plotly, so ggplotly may emit a warning for that layer
facetPlot = ggplot(plotDF, aes(x = x, y = y, color = I(Color), text=tooltipText, group=1))+
  geom_line(linewidth = 0.75) +
  facet_wrap(~factor(Label, levels=unique(Label)), ncol = 5) +
  # facet_wrap(~factor(Label, levels=unique(Label)), ncol = 5, as.table=F, scales="free_x") +
  geom_vline(aes(xintercept=b),linewidth=0.15,colour="gray50") +
  geom_hline(aes(yintercept=0.5),linewidth=0.15,colour="gray50") +
  geom_point(data=pointsDF, aes(color=I(Color), shape=intToUtf8(9679)), size=2) +
  geom_label(data=pointsDF, aes(label=Item, fill=I("white")), color="black", size=3.25, label.size=NA,
             label.padding=unit(0.01, "lines"), fontface="bold") +
  expand_limits(y = c(0,1)) +
  ylab(paste0("Probability of correct answer")) +
  xlab("Ability") +
  theme_classic()+
  theme(axis.line.x=element_line(color="gray20", linewidth=1.0),
        axis.line.y=element_line(color="gray20", linewidth=0.5),
        axis.text.x=element_text(hjust = 1),
        # panel.grid.major.x=element_line(size=0.05, color="#B3B3B3"),
        panel.grid.minor=element_blank(), 
        panel.border=element_rect(colour="gray50",fill=NA,linewidth=1.0),
        panel.background = element_rect(colour = "gray50", linewidth=1.0),
        panel.spacing = unit(0.2, "lines"),
        text=element_text(size=10),
        strip.text=element_text(size=7.75,face = "bold"),
        legend.position = "none")

facetPlotly = ggplotly(facetPlot, tooltip="text") %>% config(displayModeBar=F)

card(
  align = "center",
  style = "border: 4px double #b3b3b3; background: mintcream; border-radius: 15px; padding: 2px;
  margin: 2.5%",
  card_header(style= "background: aquamarine", 
  "Interactive Visualization of Faceted Item Characteristic Curves"),
  full_screen = F, height=750,
  facetPlotly
  )
Interactive Visualization of Faceted Item Characteristic Curves

We can also generate a static visualization that displays all ICCs on a single plot, with a properly built, annotated legend, as shown below.

Show code
#Static visualization of all models fitted
fitsPlot = ggplot(plotDF, aes(x = x, y = y, color = factor(Label, levels=unique(Label)), data_id = Item)) +
  geom_line(linewidth = 1.2) +
  scale_color_manual(name="Parameter estimates",values = Colors) +
  geom_hline(aes(yintercept=0.5),linewidth=0.15,colour="gray50") +
  geom_vline(aes(xintercept=b),linewidth=0.15,colour="gray50") +
  geom_point(data=pointsDF, aes(color=I(Color), shape=intToUtf8(9679)), size=3, show.legend=F) +
  ylab(paste0("Probability of correct answer")) +
  xlab("Ability") +
  theme_classic()+
  theme(axis.line.x=element_line(color="gray20", linewidth=1.0),
        axis.line.y=element_line(color="gray20", linewidth=0.5),
        # panel.grid.major.x=element_line(size=0.05, color="#B3B3B3"),
        panel.grid.minor=element_blank(), 
        panel.border=element_rect(colour="gray50",fill=NA,linewidth=1.0),
        panel.background = element_rect(colour = "gray50", linewidth=1.0),
        panel.spacing = unit(0.2, "lines"),
        text=element_text(size=14),
        strip.text=element_text(size=12,face = "bold"),
        legend.position = "right",
        legend.key = element_rect(fill = NA, color = NA))

ggsave("images/fitsPlot.png",fitsPlot, width=11, height=11/1.618, units="in", dpi=320)

A few changes can be made to the ggplot2 code that generated the previous visualization, which can then be passed to ggplotly to get the interactive version shown below.

Show code
#Interactive visualization of all fitted ICCs on a single plot
fitsPlotly = ggplot(plotDF, 
                    aes(x = x, y = y, color = factor(Label, levels=unique(Label)), data_id = Item, group=1,
                        text=tooltipText)) +
  geom_line(linewidth = 0.75) +
  geom_point(data=pointsDF, 
             aes(color = factor(Label, levels=unique(Label)), shape=intToUtf8(9679)), size=2, show.legend=F) +
  scale_color_manual(name="Parameter estimates",values = Colors) +
  geom_hline(aes(yintercept=0.5),linewidth=0.15,colour="gray50") +
  ylab(paste0("Probability of correct answer")) +
  xlab("Ability") +
  theme_classic()+
  theme(axis.line.x=element_line(color="gray20", linewidth=1.0),
        axis.line.y=element_line(color="gray20", linewidth=0.5),
        # panel.grid.major.x=element_line(size=0.05, color="#B3B3B3"),
        panel.grid.minor=element_blank(), 
        panel.border=element_rect(colour="gray50",fill=NA,linewidth=1.0),
        panel.background = element_rect(colour = "gray50", linewidth=1.0),
        panel.spacing = unit(0.2, "lines"),
        text=element_text(size=12),
        strip.text=element_text(size=12,face = "bold"),
        legend.position = "right",
        legend.key = element_rect(fill = NA, color = NA))

fitsggPlotly = ggplotly(fitsPlotly, tooltip="text", height = 420) %>%
  style(showlegend = FALSE, traces = 1:nItems) %>% 
  config(displayModeBar=F)

card(
  align = "center",
  style = "border: 4px double #b3b3b3; background: mintcream; border-radius: 20px; padding: 2px;
  margin: 2.5%",
  card_header(style= "background: aquamarine", 
  "Interactive Item Characteristic Curves"),
  full_screen = F,
  fitsggPlotly
    )
Interactive Item Characteristic Curves

A few more utilities from ltm

When we simulated the dataset of responses (i.e., baseSim), we assumed that the probability of a correct answer was driven by a single latent trait (ability). The unidimTest() function from ltm provides a way to check whether that assumption is reasonable for the simulated data (which, by construction, it should be). P-values above the chosen significance level (say 0.05) provide no evidence against the null hypothesis of a single latent trait. For the baseSim data the function returns a p-value of 0.505, so the unidimensionality assumption is not rejected.

Show code
##### Some other features available in ltm and mirt
# Test for unidimensionality
# Ho: All items can be explained by a single latent ability factor
# P-values higher than 0.05 support Ho
unidimTest(ltmMod)

Unidimensionality Check using Modified Parallel Analysis

Call:
ltm(formula = baseSim ~ z1)

Matrix of tertachoric correlations
        Item1  Item2  Item3  Item4  Item5  Item6  Item7  Item8  Item9 Item10
Item1  1.0000 0.2055 0.2099 0.2815 0.2369 0.2222 0.1247 0.1865 0.1569 0.0188
Item2  0.2055 1.0000 0.4501 0.5121 0.4021 0.4334 0.2405 0.5421 0.2488 0.2798
Item3  0.2099 0.4501 1.0000 0.4737 0.5497 0.4183 0.1737 0.4163 0.4202 0.3176
Item4  0.2815 0.5121 0.4737 1.0000 0.4907 0.4247 0.1890 0.5080 0.3439 0.3643
Item5  0.2369 0.4021 0.5497 0.4907 1.0000 0.4160 0.1063 0.5321 0.3476 0.3096
Item6  0.2222 0.4334 0.4183 0.4247 0.4160 1.0000 0.2522 0.4833 0.1958 0.3032
Item7  0.1247 0.2405 0.1737 0.1890 0.1063 0.2522 1.0000 0.1985 0.1437 0.1482
Item8  0.1865 0.5421 0.4163 0.5080 0.5321 0.4833 0.1985 1.0000 0.2077 0.2771
Item9  0.1569 0.2488 0.4202 0.3439 0.3476 0.1958 0.1437 0.2077 1.0000 0.3151
Item10 0.0188 0.2798 0.3176 0.3643 0.3096 0.3032 0.1482 0.2771 0.3151 1.0000

Alternative hypothesis: the second eigenvalue of the observed data is substantially larger 
            than the second eigenvalue of data under the assumed IRT model

Second eigenvalue in the observed data: 0.576
Average of second eigenvalues in Monte Carlo samples: 0.5807
Monte Carlo samples: 100
p-value: 0.505

ltm comes with a plot() method that allows us to build different kinds of visualizations. If we pass the output from descript(baseSim) to the plot function and use custom Unicode symbols as plotting characters, we can obtain a visualization of the proportion of correctly answered items at each level of the Total Score. The resulting visualization shows that, at every Total Score level, the least difficult items (such as item ➆) have higher proportions of correct responses than the most difficult ones (such as item ➅).

Show code
# Get fancy Unicode characters for plotting
unicodePC = data.frame(Item=1:10, unicodePC = map_chr(10122:10131, intToUtf8))
pointsDF = pointsDF %>% 
  left_join(unicodePC) %>% 
  arrange(Item)
                       
# Plotting observed total scores
invisible(capture.output({
png("images/plotDescript.png", width=7.75, height=7.75/1.618, units="in", res=320, type="cairo-png",
    family="Segoe UI Symbol")
plot(descript(baseSim), pch=pointsDF$unicodePC, col=pointsDF$Color, cex=1.25,
     main="Proportion of Item Correct Responses at each Total Score Level")
dev.off()
}))

By default, the ltm::plot() method produces a visualization of all ICCs. Using base R, this visualization can be enhanced with Unicode symbols and a legend to make it more elegant, as illustrated in the following static visualization. The drawback is that, currently, there is no direct way to convert base R plots into interactive visualizations.

Show code
# Plotting Item Characteristic Curves
invisible(capture.output({
png("images/ltmFits.png", width=7.75, height=7.75/1.618, units="in", res=320, type="cairo-png",
    family="Segoe UI Symbol")

par(mar = c(5, 4, 4, 12.5) + 0.1)
plot(ltmMod, col=pointsDF$Color, lwd=2.5, zrange=c(-6,6), labels=rep(NA,nItems))
abline(h=0.5, col="gray50", lwd=1)
points(pointsDF$x, pointsDF$y,
       pch=rep(intToUtf8(9679), nItems), col=rep("white", nItems), cex=1.25)
points(pointsDF$x, pointsDF$y, pch=pointsDF$unicodePC, col=pointsDF$Color, cex=1.25)
legend(6.9,1.1, legend=pointsDF$Label, col=pointsDF$Color, pt.cex=1.25, cex=1, bty="n", pch=pointsDF$unicodePC, xpd=T, x.intersp=1, y.intersp=1.5)
dev.off()
}))

# Reset default graphical margins
par(mar = c(5, 4, 4, 2) + 0.1)

Finally, the ltm::plot() method can be used to visualize an Item Information Curve (IIC). An IIC shows how much statistical information (precision) a test item provides about a person’s latent trait (ability) at different ability levels. The higher the curve at a given ability level, the more precisely the item measures ability around that level; for a 2PL item, information is highest near the item’s \(difficulty\).
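For the 2PL model the item information function has a simple closed form, \(I(z) = a^2 \, P(z)\,[1 - P(z)]\), which is maximized at \(z = b\). Here is a minimal sketch that reuses the myfx2PL helper and the parameters data frame defined above to locate the peak for item 6:

# Item information for a 2PL item: I(z) = a^2 * P(z) * (1 - P(z)), maximized at z = b
myfxInfo = function(x, a, b) {
  p = myfx2PL(x, a, b)
  a^2 * p * (1 - p)
}

# Information curve for item 6 (the most difficult item)
info6 = myfxInfo(xValues,
                 a = parameters$a[parameters$Item == 6],
                 b = parameters$b[parameters$Item == 6])
xValues[which.max(info6)]   # the peak lies close to the estimated difficulty of ~1.74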

The following visualization shows the performance of the two most extreme items (namely, the least difficult item ➆ and the most difficult item ➅). Between these two, only item ➅ is highly informative, and it is most informative around its estimated \(difficulty\) value of ~1.74.

Show code
# Specify the item to plot (e.g., Item 1)
# Use itemNumber = c(7,9) to plot items 7 and 9
# Use itemNumber = 7:9 to plot items 7 to 9
# The following line selects items with the minimum and maximum difficulties
itemNumber = c(which(pointsDF$b==min(pointsDF$b)), which(pointsDF$b==max(pointsDF$b)))
# Set up a 1x2 plotting layout
invisible(capture.output({
png("images/itemFits.png", width=7.75, height=7.75/1.618, units="in", res=320, type="cairo-png",
    family="Segoe UI Symbol")

par(mar = c(4.25, 4.25, 2, 0.5))
par(mfrow = c(1, 2))
# Plot the Item Characteristic Curve (ICC)
plot(ltmMod, type = "ICC", items = itemNumber, zrange=c(-6,6),
     main = "Item Characteristic Curve for selected Item(s)", cex.main=0.9,
     col=pointsDF$Color[itemNumber], lwd=2, labels=rep(NA, length(itemNumber)))
abline(h=0.5, col="gray50", lwd=1)
abline(v=pointsDF$x[itemNumber], col="gray50", lwd=1)
points(pointsDF$x[itemNumber], pointsDF$y[itemNumber],
       pch=rep(intToUtf8(9679),length(itemNumber)), col=rep("white", length(itemNumber)), cex=1.25)
points(pointsDF$x[itemNumber], pointsDF$y[itemNumber],
       pch=pointsDF$unicodePC[itemNumber], col=pointsDF$Color[itemNumber], cex=1.25)
# Plot the Item Information Curve (IIC)
plot(ltmMod, type = "IIC", items = itemNumber, zrange=c(-6,6),
     main = "Item Information Curves", cex.main=0.9,
     col=pointsDF$Color[itemNumber], lwd=2, labels=rep(NA, length(itemNumber)))
abline(v=pointsDF$x[itemNumber], col="gray50", lwd=1)
dev.off()
}))

# Reset default 1x1 frame and graphical margins
par(mfrow = c(1, 1))
par(mar = c(5, 4, 4, 2) + 0.1)

Closing remarks

I invite you to keep learning about the essence of IRT and to extend its application to your research area. Should you need additional assistance with the ideas captured in this short publication, please do not hesitate to reach out to me.

Enjoy🛸!