Visualizing Percentage Correct with Heat Maps and Lollipop Plots

Introduction

This is an RMarkdown document with R code for generating two visualizations of the percentage of respondents answering each item on an assessment correctly. It also includes code for simulating the percentage correct for each item, drawn from a normal distribution truncated to the interval [0, 1]. The percentages choosing the two remaining (incorrect) response options are then simulated: one from a truncated normal distribution, and the other as whatever is left over.

One plot is a heat map of response options, color-coded to denote correct and incorrect responses. The intensity of the color reflects the percentage that endorsed each response option.

The second plot is a lollipop plot showing each item's percentage correct along a single axis. The color of each point shows whether most respondents answered the item correctly or incorrectly. A line segment runs from 50% correct to the item's percentage correct.

Initial Simulation of Percentage Endorsing Response Options

The first block of code accomplishes the following:

Load the required R packages.
Simulate percentages for response options A, B, and C for 30 items on each of 3 tests (90 items in total).
Assign percentages based on a randomized answer key and simulated “percentage correct” values.

library(ggplot2)
library(dplyr)
library(MCMCglmm)  # provides rtnorm() for truncated normal draws

keymat <- matrix(0, 90, 3)                    # proportion choosing A, B, C for each of 90 items
keyind <- matrix(0, 90, 3)                    # 1 marks the keyed (correct) option
keyvec <- sample(1:3, 90, replace = TRUE)     # randomized answer key
for (i in 1:90) {
    k <- keyvec[i]                # column of the correct option
    others <- setdiff(1:3, k)     # the two distractor columns, in order
    # proportion correct: N(0.65, 0.2) truncated to [0, 1]
    keymat[i, k] <- rtnorm(1, .65, .2, lower = 0, upper = 1)
    maxleft <- 1 - keymat[i, k]   # proportion left for the two distractors
    halfleft <- maxleft / 2
    sdtemp <- halfleft / 3
    # first distractor: truncated normal centered on half of the remainder
    keymat[i, others[1]] <- rtnorm(1, halfleft, sdtemp, lower = 0, upper = maxleft)
    # second distractor takes the rest, so each row sums to 1
    keymat[i, others[2]] <- maxleft - keymat[i, others[1]]
    keyind[i, k] <- 1
}
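If MCMCglmm is not available, the truncated normal draws above can be produced in base R with the inverse-CDF method. The sketch below is not the author's code. `rtnorm_base()` and `simulate_item()` are hypothetical helpers: the first draws uniforms between the CDF values of the bounds and maps them back through `qnorm()`, and the second simulates one item's three option proportions the same way the loop does.

```r
# Hypothetical helper (not part of MCMCglmm): truncated normal via inverse CDF.
rtnorm_base <- function(n, mean, sd, lower = -Inf, upper = Inf) {
  p_lo <- pnorm(lower, mean, sd)
  p_hi <- pnorm(upper, mean, sd)
  x <- qnorm(runif(n, p_lo, p_hi), mean, sd)
  # clamp to guard against floating-point round-off at the bounds
  pmin(pmax(x, lower), upper)
}

# Hypothetical helper: proportions choosing options 1-3 for an item keyed k,
# mirroring the loop above (correct ~ N(.65, .2) on [0, 1], one distractor
# centered on half the remainder, the other distractor gets the rest).
simulate_item <- function(k) {
  props <- numeric(3)
  others <- setdiff(1:3, k)
  props[k] <- rtnorm_base(1, .65, .2, lower = 0, upper = 1)
  maxleft <- 1 - props[k]
  halfleft <- maxleft / 2
  props[others[1]] <- rtnorm_base(1, halfleft, halfleft / 3, lower = 0, upper = maxleft)
  props[others[2]] <- maxleft - props[others[1]]
  props
}

set.seed(1)
sims <- t(sapply(c(1, 2, 3, 1), simulate_item))  # 4 items x 3 options
rowSums(sims)  # each row sums to 1, up to floating-point error
```

The inverse-CDF approach needs only `pnorm()`, `qnorm()`, and `runif()`. It can lose precision when the truncation interval lies far out in a tail, but that does not arise for these parameters.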

Creation of a Master Data Frame

The next block of code creates Test, Item, and Response variables. It then combines them with the simulated percentages and answer-key indicators into a single long data frame, with one row per test, item, and response option.

temp_a <- cbind(keymat[,1],keyind[,1])
temp_b <- cbind(keymat[,2],keyind[,2])
temp_c <- cbind(keymat[,3],keyind[,3])

perc_mat <- rbind(temp_a,temp_b,temp_c)
perc_mat <- as.data.frame(perc_mat)

Test <- rep(c(rep("Test 1",30), rep("Test 2",30), rep("Test 3",30)),3)
Test <- as.data.frame(Test)

Item <- rep(seq(1,30),9)
Item <- as.data.frame(Item)

Response <- c(rep("A",90), rep("B",90), rep("C",90))
Response <- as.data.frame(Response)

part_mat <- cbind(Test,Item,Response)
resp_mat <- cbind(part_mat, perc_mat)
colnames(resp_mat)[4] <- "Percentage"
colnames(resp_mat)[5] <- "Key"
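The stacking relies on row order: rows 1 to 90 are option A, 91 to 180 option B, and 181 to 270 option C, and each set of 90 is split into Tests 1 to 3. A quick check can therefore catch misalignment. `check_resp_mat()` below is a hypothetical helper that assumes the column names used above. It is demonstrated on a two-item toy data frame rather than on the simulated data.

```r
library(dplyr)

# Hypothetical helper: every Test x Item should have three options whose
# proportions sum to 1 and exactly one keyed (correct) option.
check_resp_mat <- function(df) {
  df %>%
    group_by(Test, Item) %>%
    summarise(n_options = n(),
              total = sum(Percentage),
              n_keys = sum(Key),
              .groups = "drop") %>%
    mutate(ok = n_options == 3 & abs(total - 1) < 1e-8 & n_keys == 1)
}

# Toy data: two items on one test, proportions before scaling to percentages
toy <- data.frame(
  Test = "Test 1",
  Item = rep(1:2, times = 3),
  Response = rep(c("A", "B", "C"), each = 2),
  Percentage = c(0.70, 0.10, 0.20, 0.60, 0.10, 0.30),
  Key = c(1, 0, 0, 1, 0, 0)
)
check_resp_mat(toy)
```

On the simulated data, `all(check_resp_mat(resp_mat)$ok)` should be `TRUE` if it is run before `Percentage` is multiplied by 100.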

Format Decimal Places and Coerce Variables into Appropriate Classes

The code below converts proportions to percentages and rounds them to one decimal place for display. Because `format()` returns character strings, the percentages are converted back to numeric, and item numbers are converted to a factor.

resp_mat <- resp_mat %>% mutate(Percentage = Percentage*100)
resp_mat$Percentage <- format(round(resp_mat$Percentage, 1), nsmall = 1)

resp_mat$Percentage <- as.numeric(resp_mat$Percentage)
resp_mat$Item <- as.factor(resp_mat$Item)
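Converting back with `as.numeric()` discards the padding and trailing zeros that `format()` adds. The net effect is plain rounding, and a value such as 10.0 prints as `10` in plot labels. The sketch below uses made-up proportions to show that `round()` alone gives the same numbers. It also suggests one assumed alternative for keeping one decimal in labels: format with `sprintf()` at plot time.

```r
library(dplyr)

# Made-up proportions for illustration only
df <- data.frame(Item = c(3, 1, 2), Percentage = c(0.6523, 0.1, 0.24751))

# Approach in the text: round, format() to character, then back to numeric
via_format <- as.numeric(format(round(df$Percentage * 100, 1), nsmall = 1))

# One-step alternative: round() alone, plus the factor conversion for Item
df <- df %>% mutate(Percentage = round(Percentage * 100, 1),
                    Item = as.factor(Item))

all.equal(via_format, df$Percentage)  # TRUE: format() changes nothing numerically
levels(df$Item)                       # "1" "2" "3": levels follow numeric order

# To keep one decimal place in printed labels, format at plot time instead
labels_1dp <- sprintf("%.1f", df$Percentage)  # "65.2" "10.0" "24.8"
```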

Create Heat Maps

Heat maps are then constructed. Green tiles denote correct response options and red tiles denote incorrect ones. The higher the percentage endorsing a response option, the more intense the color. Percentages are printed on the tiles. All items from all three tests appear in a single plot, with one panel per test.

heat_plot <- resp_mat %>%
    ggplot(aes(Response, Item)) +
    geom_tile(aes(fill = Key, alpha = Percentage)) +   # fill: key (0/1); alpha: percentage
    geom_text(aes(label = Percentage), size = 4) +
    scale_fill_gradient(low = "red", high = "forestgreen") +
    facet_wrap(~Test, nrow = 1)                         # one panel per test

heat_plot
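`Key` is numeric (0/1), so `scale_fill_gradient()` treats it as continuous. The legend therefore shows a red-to-green color bar even though only the two end colors appear. One alternative is to map a factor version of `Key` to `scale_fill_manual()`, which gives a two-entry legend. The sketch below applies this to made-up data; the "Incorrect"/"Correct" labels are assumptions rather than part of the original.

```r
library(ggplot2)

# Made-up long data: two items on one test, three options each (percentages)
toy <- data.frame(
  Test = "Test 1",
  Item = factor(rep(1:2, times = 3)),
  Response = rep(c("A", "B", "C"), each = 2),
  Percentage = c(70.0, 10.0, 20.0, 60.0, 10.0, 30.0),
  Key = c(1, 0, 0, 1, 0, 0)
)

heat_toy <- ggplot(toy, aes(Response, Item)) +
  # factor() makes the correct/incorrect fill discrete
  geom_tile(aes(fill = factor(Key, levels = c(0, 1),
                              labels = c("Incorrect", "Correct")),
                alpha = Percentage)) +
  # sprintf() keeps one decimal place in the printed labels (e.g. "10.0")
  geom_text(aes(label = sprintf("%.1f", Percentage)), size = 4) +
  scale_fill_manual(name = "Key",
                    values = c(Incorrect = "red", Correct = "forestgreen")) +
  facet_wrap(~Test, nrow = 1)

heat_toy
```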

Creating an Indicator for Most Correct vs. Most Incorrect

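Before creating lollipop plots that show whether most respondents got each item correct or incorrect, an indicator variable is created. Only the rows for the keyed (correct) option are kept, so `Percentage` becomes each item's percentage correct. Items below 50% are then labeled "Most Incorrect" and the rest "Most Correct".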

corr_mat <- resp_mat %>% filter(Key == 1)
corr_mat <- corr_mat %>%  arrange(Test,Item)
corr_mat <- corr_mat %>% mutate(Difficulty = ifelse(Percentage < 50, "Most Incorrect", "Most Correct"))
corr_mat$Difficulty <- as.factor(corr_mat$Difficulty)
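The indicator can also be tabulated to summarize overall difficulty across tests. The sketch below runs `dplyr::count()` and a per-test mean on made-up percent-correct values. It uses the same `ifelse()` threshold as above, so an item at exactly 50% is labeled "Most Correct".

```r
library(dplyr)

# Made-up percent-correct values for three items on each of two tests
corr_toy <- data.frame(
  Test = rep(c("Test 1", "Test 2"), each = 3),
  Item = factor(rep(1:3, times = 2)),
  Percentage = c(72.4, 50.0, 38.1, 45.9, 81.3, 66.0)
) %>%
  mutate(Difficulty = factor(ifelse(Percentage < 50, "Most Incorrect", "Most Correct")))

# Number of items in each category per test
counts <- corr_toy %>% count(Test, Difficulty)
# Mean percent correct per test
means <- corr_toy %>% group_by(Test) %>% summarise(mean_correct = mean(Percentage))

counts
means
```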

Create Lollipop Plots

Lollipop plots are then constructed. Light green points denote items that most respondents got correct, and salmon points denote items that most respondents got wrong. Percentages are plotted on a horizontal axis, with a line segment extending from 50% to the point for each item. The goal is to make the most difficult and easiest items easier to spot and to give a sense of overall difficulty across tests.

lolli_plot <- corr_mat %>%
    ggplot(aes(x = Item, y = Percentage, label = Percentage)) +
    geom_segment(aes(x = Item, xend = Item, y = 50, yend = Percentage)) +   # stick from 50%
    geom_point(stat = 'identity', aes(col = Difficulty), size = 7) +
    geom_point(shape = 1, size = 7, color = "black") +                      # black outline
    scale_color_manual(name = "Difficulty",
                       labels = c("Most Correct", "Most Incorrect"),
                       values = c("Most Correct" = "darkseagreen1", "Most Incorrect" = "salmon")) +
    geom_text(color = "black", size = 2) +                                  # percentage label
    ylim(0, 100) +
    coord_flip() +                                                          # percentages on horizontal axis
    facet_wrap(~Test, nrow = 1)

lolli_plot
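The plot makes extreme items visible, but a short table can list them directly. The sketch below uses `dplyr::slice_min()` and `slice_max()` (available in dplyr 1.0.0 and later) on made-up percent-correct values to pull the two hardest and two easiest items within each test. The choice of two items per test is arbitrary.

```r
library(dplyr)

# Made-up percent-correct values: four items on each of two tests
corr_toy <- data.frame(
  Test = rep(c("Test 1", "Test 2"), each = 4),
  Item = factor(rep(1:4, times = 2)),
  Percentage = c(72.4, 35.0, 88.9, 51.2, 45.9, 81.3, 29.7, 66.0)
)

# Two hardest items (lowest percent correct) within each test
hardest <- corr_toy %>% group_by(Test) %>% slice_min(Percentage, n = 2) %>% ungroup()
# Two easiest items (highest percent correct) within each test
easiest <- corr_toy %>% group_by(Test) %>% slice_max(Percentage, n = 2) %>% ungroup()

hardest
easiest
```

On the simulated data, the same pipelines could be run on `corr_mat` in place of `corr_toy`.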