Producing Percentage Correct Line Plots with Error Bands

Introduction

This is an RMarkdown document displaying R code for generating two versions of a line plot showing percentage correct for three groups over a series of items. Response data are simulated for 1,000 respondents across 20 items. This is done by simulating the probability of a correct response for each of three subgroups (of size 800, 150, and 50). The simulated probabilities are compared against random values drawn from a U[0,1] distribution, and a response is coded as correct when the random value falls below the simulated probability.

The simulation conditions are manipulated to exhibit similar trends for percentage correct for each group, but with constant differences in overall percentage correct between the three groups.

The first plot is a simple line plot with a different color for each group. The second plot is similar but adds colored ribbons spanning one standard error above and below each sample proportion.

The plots are intended for inspecting similarities and differences in group performance while also taking group-specific sample sizes into account.

Initial Response Data Simulation and Data Frame Preparation

The first block of code accomplishes the following:

Load appropriate R packages.
Simulate probabilities for a correct response for the 20 items.
Create two adjusted vectors of probabilities with lower chances of success to be used for the second and third groups.
Use these vectors as the means of draws from a normal distribution (standard deviation .25) to get response-specific probabilities for the 20,000 individual responses.
Assign correct/incorrect codes by comparing each probability against a random value from a U[0,1] distribution.
Name the item columns, create the group variable, and combine all data into a data frame.

library(ggplot2)
library(dplyr)

respmat <- matrix(0, 1000, 20)

cor_vec1 <- runif(20, .4, .9)
cor_vec2 <- cor_vec1 - .1 + runif(1, -.05, .05)
cor_vec3 <- cor_vec1 - .3 + runif(1, -.05, .05)

cor_vec1 <- as.data.frame(cor_vec1)
cor_vec2 <- as.data.frame(cor_vec2)
cor_vec3 <- as.data.frame(cor_vec3)

for (j in 1:20) {
    for (i in 1:800) {
        randval <- runif(1, 0, 1)
        corrval <-  rnorm(1, cor_vec1[j, 1], .25)
        respmat[i, j] = ifelse(randval < corrval, 1, 0)
    }

    for (i in 801:950) {
        randval <- runif(1, 0, 1)
        corrval <-  rnorm(1, cor_vec2[j, 1], .25)
        respmat[i, j] = ifelse(randval < corrval, 1, 0)
    }

    for (i in 951:1000) {
        randval <- runif(1, 0, 1)
        corrval <-  rnorm(1, cor_vec3[j, 1], .25)
        respmat[i, j] = ifelse(randval < corrval, 1, 0)
    }
}

colnames(respmat) <- paste0("s", 1:20)

Group <- c(rep(1,800), rep(2, 150), rep(3, 50))
Group <- as.data.frame(Group)

Resp_set <- cbind(respmat, Group)
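
The nested loops above can also be written without explicit iteration. The following is a minimal sketch of a vectorized version, not the code used for the plots in this document. The seed and all object names (p_means, respmat_v, Resp_set_v, and so on) are assumptions introduced for illustration.

```r
# Hypothetical vectorized version of the simulation (names are illustrative)
set.seed(123)  # assumed seed for reproducibility; the original sets none

n_items <- 20
group_sizes <- c(800, 150, 50)
group_id <- rep(1:3, times = group_sizes)
n_resp <- sum(group_sizes)

# Item-level mean probabilities, shifted by one constant offset per group
p1 <- runif(n_items, .4, .9)
p_means <- rbind(p1,
                 p1 - .1 + runif(1, -.05, .05),
                 p1 - .3 + runif(1, -.05, .05))

# One row per respondent, holding the item means of that respondent's group
mean_mat <- p_means[group_id, ]

# Response-specific probabilities and the uniform values they are compared to
prob_mat <- matrix(rnorm(n_resp * n_items, mean = mean_mat, sd = .25),
                   n_resp, n_items)
rand_mat <- matrix(runif(n_resp * n_items), n_resp, n_items)

# Code a response as 1 (correct) when the uniform value is below the probability
respmat_v <- (rand_mat < prob_mat) * 1
colnames(respmat_v) <- paste0("s", seq_len(n_items))

Resp_set_v <- data.frame(respmat_v, Group = group_id)
```

Because the random draws occur in a different order than in the loops, this sketch does not reproduce the same responses as the original code, even with a shared seed. It only follows the same simulation design.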

Creation of a Summary Set for Proportion Correct for Each Group for All Items

The next block of code creates a summary data frame holding the proportion correct on each item for each of the three groups of interest.

First, proportion correct is calculated within group for all items and stored in a new data frame. Then the data frame is transposed so that items are rows and groups are columns, and the row holding the group identifier is dropped. Last, group names are given to the columns of this new data frame.

Prop_set_t <- Resp_set %>% group_by(Group) %>% summarize_all(list(prp = mean))
Prop_set_t <- as.data.frame(Prop_set_t)

Prop_set <- t(Prop_set_t)

Prop_set <- Prop_set[2:21,1:3]
colnames(Prop_set) <- c("Group_1", "Group_2", "Group_3")
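
As a cross-check on the summary step, the same within-group proportions can be computed in base R. The sketch below uses a small made-up data set (toy, with six respondents, two items, and two groups) so that the result can be verified by hand. The object names are hypothetical and are not part of the original code.

```r
# Toy data: values are invented purely for illustration
toy <- data.frame(
  s1 = c(1, 1, 0, 1, 0, 0),
  s2 = c(1, 0, 0, 1, 1, 0),
  Group = c(1, 1, 1, 1, 2, 2)
)

# Mean of each item column within each group (the proportion correct)
prop_by_group <- aggregate(cbind(s1, s2) ~ Group, data = toy, FUN = mean)

# Transpose so items are rows and groups are columns, dropping the Group column
prop_items <- t(prop_by_group[, -1])
colnames(prop_items) <- paste0("Group_", prop_by_group$Group)

prop_items
```

In this toy set, three of the four Group 1 respondents answer s1 correctly, so the s1 entry for Group_1 is 0.75.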

Creation of Item List and Formation of Final Data Frame

The item variable is created and combined with the previous data frame to create a finalized data frame for creating the plots of interest.

Item <- seq(1,20,1)
Item <- as.data.frame(Item)

Full_set <- cbind(Item, Prop_set)

Creating a Simple Line Plot

A line plot with three lines representing the three groups is produced. Points are also included to represent the specific proportions correct at each item.

plot1 <- Full_set %>% 
ggplot() + ggtitle("Percent Correct (3 Groups)") + 
geom_line(aes(y = Group_1, x = Item), col = "blue", stat = "identity") +
geom_line(aes(y = Group_2, x = Item), col = "red", stat = "identity") +
geom_line(aes(y = Group_3, x = Item), col = "green3", stat = "identity") + 
geom_point(aes(y = Group_1, x = Item), col = "blue", stat = "identity") +
geom_point(aes(y = Group_2, x = Item), col = "red", stat = "identity") +
geom_point(aes(y = Group_3, x = Item), col = "green3", stat = "identity") + 
ylab("Percent Correct") + scale_x_continuous(breaks=seq(1,20,1)) + ylim(0,1) + theme(plot.title = element_text(hjust = 0.5))

plot1
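
The plot above draws each group with its own pair of layers, which produces no legend. One possible alternative, sketched here under assumptions, is to stack the group columns into a long format and map color to the group. The data frame demo_set is a made-up stand-in for Full_set, and long_set and plot_long are hypothetical names. The plotting step runs only if ggplot2 is installed.

```r
# Stand-in for Full_set with invented proportions (illustration only)
demo_set <- data.frame(
  Item = 1:4,
  Group_1 = c(.80, .70, .75, .60),
  Group_2 = c(.70, .60, .65, .50),
  Group_3 = c(.50, .40, .45, .30)
)

# Stack the three group columns into one long data frame
groups <- c("Group_1", "Group_2", "Group_3")
long_set <- data.frame(
  Item = rep(demo_set$Item, times = length(groups)),
  Group = rep(groups, each = nrow(demo_set)),
  Prop = unlist(demo_set[groups], use.names = FALSE)
)

# One line and one point layer cover all groups; the legend is generated
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  plot_long <- ggplot(long_set, aes(x = Item, y = Prop, colour = Group)) +
    geom_line() +
    geom_point() +
    scale_colour_manual(values = c(Group_1 = "blue", Group_2 = "red",
                                   Group_3 = "green3")) +
    ylab("Proportion Correct") +
    ylim(0, 1)
  print(plot_long)
}
```

The trade-off is an extra reshaping step in exchange for fewer layers and an automatic legend.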

Creating a Line Plot with Error Bands

A second version of the line plot is produced that puts colored ribbons around the lines to represent the standard error of each sample proportion. The code calculates this standard error as the square root of p(1 - p)/n, where p is the proportion correct and n is the group size (800, 150, or 50), then adds it to and subtracts it from each proportion to produce the boundaries of the ribbons.

plot2 <- Full_set %>% 
mutate(se_g1 = sqrt((Group_1*(1 - Group_1))/800)) %>%
mutate(se_g2 = sqrt((Group_2*(1 - Group_2))/150)) %>%
mutate(se_g3 = sqrt((Group_3*(1 - Group_3))/50)) %>%
mutate(low_g1 = Group_1 - se_g1, high_g1 = Group_1 + se_g1) %>%
mutate(low_g2 = Group_2 - se_g2, high_g2 = Group_2 + se_g2) %>%
mutate(low_g3 = Group_3 - se_g3, high_g3 = Group_3 + se_g3) %>%
ggplot() + ggtitle("Percent Correct (3 Groups)") + 
geom_line(aes(y = Group_1, x = Item), col = "blue", stat = "identity") +
geom_line(aes(y = Group_2, x = Item), col = "red", stat = "identity") +
geom_line(aes(y = Group_3, x = Item), col = "green3", stat = "identity") + 
geom_point(aes(y = Group_1, x = Item), col = "blue", stat = "identity") +
geom_point(aes(y = Group_2, x = Item), col = "red", stat = "identity") +
geom_point(aes(y = Group_3, x = Item), col = "green3", stat = "identity") + 
geom_ribbon(aes(ymin = low_g1, ymax = high_g1, x = Item), fill = "blue", alpha = .5) +
geom_ribbon(aes(ymin = low_g2, ymax = high_g2, x = Item), fill = "red", alpha = .5) +
geom_ribbon(aes(ymin = low_g3, ymax = high_g3, x = Item), fill = "green3", alpha = .5) +
ylab("Percent Correct") + scale_x_continuous(breaks=seq(1,20,1)) + ylim(0,1) + theme(plot.title = element_text(hjust = 0.5))

plot2
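
To see how group size drives the ribbon widths, the standard error calculation used above can be worked through for a single proportion. This is a small illustrative sketch: the value p = 0.6 is an arbitrary assumption, and the names n, se, and bands are hypothetical.

```r
# Assumed proportion correct, chosen only for illustration
p <- 0.6

# Group sizes as used in the simulation
n <- c(Group_1 = 800, Group_2 = 150, Group_3 = 50)

# Standard error of a sample proportion: sqrt(p * (1 - p) / n)
se <- sqrt(p * (1 - p) / n)

# Ribbon boundaries: one standard error below and above the proportion
bands <- data.frame(group = names(n), low = p - se, high = p + se)

round(se, 3)
bands
```

At the same proportion, the standard error for Group 3 is four times that of Group 1, since sqrt(800/50) = 4. The green ribbon is therefore expected to be much wider than the blue one.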