This RMarkdown document presents R code for generating two versions of a line plot showing percentage correct for three groups across a series of items. Response data are simulated for 1,000 respondents across 20 items. For each of three subgroups (of size 800, 150, and 50), the probability of a correct response was simulated for every item. The simulated probabilities were then compared against random draws from a U[0,1] distribution, and a response was coded correct (1) when the draw fell below the probability and incorrect (0) otherwise.
The simulation conditions are set so that the three groups show similar item-by-item trends in percentage correct, but with a constant difference in overall percentage correct between groups.
The first plot is a simple line plot with a different color for each group. The second plot adds colored ribbons showing one standard error on either side of each sample proportion.
The plots are intended for inspecting similarities and differences in group performance while also taking the group-specific sample sizes into account.
The first block of code loads the required packages, simulates the item-level probabilities for each group, generates the response matrix, and attaches a group indicator:
library(ggplot2)
library(dplyr)
respmat <- matrix(0, 1000, 20)
cor_vec1 <- runif(20, .4, .9)
cor_vec2 <- cor_vec1 - .1 + runif(1, -.05, .05)
cor_vec3 <- cor_vec1 - .3 + runif(1, -.05, .05)
cor_vec1 <- as.data.frame(cor_vec1)
cor_vec2 <- as.data.frame(cor_vec2)
cor_vec3 <- as.data.frame(cor_vec3)
for (j in 1:20) {
  for (i in 1:800) {  # group 1
    randval <- runif(1, 0, 1)
    corrval <- rnorm(1, cor_vec1[j, 1], .25)
    respmat[i, j] <- ifelse(randval < corrval, 1, 0)
  }
  for (i in 801:950) {  # group 2
    randval <- runif(1, 0, 1)
    corrval <- rnorm(1, cor_vec2[j, 1], .25)
    respmat[i, j] <- ifelse(randval < corrval, 1, 0)
  }
  for (i in 951:1000) {  # group 3
    randval <- runif(1, 0, 1)
    corrval <- rnorm(1, cor_vec3[j, 1], .25)
    respmat[i, j] <- ifelse(randval < corrval, 1, 0)
  }
}
colnames(respmat) <- paste0("s", 1:20)
Group <- c(rep(1,800), rep(2, 150), rep(3, 50))
Group <- as.data.frame(Group)
Resp_set <- cbind(respmat, Group)
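The nested loops above can also be written without explicit indexing. The following is a minimal sketch of a vectorized version, assuming the same group sizes and noise level; the helper `simulate_group()`, the names `p1`, `p2`, `p3`, `resp_vec`, and `group_vec`, and the seed are all hypothetical and introduced only for illustration. It matches the loop in distribution, not draw for draw.

```r
# Sketch: vectorized version of the simulation (same distribution as the
# loop, but the random draws occur in a different order).
set.seed(123)  # hypothetical seed, added only for reproducibility

n_items <- 20
group_sizes <- c(800, 150, 50)

# Item-level base probabilities, with one constant shift per group
p1 <- runif(n_items, .4, .9)
p2 <- p1 - .1 + runif(1, -.05, .05)
p3 <- p1 - .3 + runif(1, -.05, .05)

# Simulate one group's responses: every respondent-item cell gets a noisy
# probability, which is compared against a U[0,1] draw.
simulate_group <- function(n, p) {
  # rep(p, each = n) lines up with column-major filling, so column j
  # uses item j's probability for all n respondents
  noisy_p <- matrix(rnorm(n * length(p), mean = rep(p, each = n), sd = .25),
                    nrow = n)
  u <- matrix(runif(n * length(p)), nrow = n)
  (u < noisy_p) * 1  # 1 = correct, 0 = incorrect
}

resp_vec <- rbind(
  simulate_group(group_sizes[1], p1),
  simulate_group(group_sizes[2], p2),
  simulate_group(group_sizes[3], p3)
)
colnames(resp_vec) <- paste0("s", seq_len(n_items))
group_vec <- rep(1:3, times = group_sizes)
```

Because the noisy probability can fall outside [0, 1], some cells are effectively always correct or always incorrect; this is also true of the loop version.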
The next block of code creates a summary data frame that calculates percentage correct for each item for each of three groups of interest.
First, the proportion correct is calculated within each group for all items and stored in a new data frame. The data frame is then transposed so that items are rows and groups are columns, and the row holding the group identifier is dropped. Finally, group names are assigned to the columns of the new data frame.
Prop_set_t <- Resp_set %>% group_by(Group) %>% summarize_all(list(prp = mean))
Prop_set_t <- as.data.frame(Prop_set_t)
Prop_set <- t(Prop_set_t)
Prop_set <- Prop_set[2:21,1:3]
colnames(Prop_set) <- c("Group_1", "Group_2", "Group_3")
The item variable is created and combined with the previous data frame to create a finalized data frame for creating the plots of interest.
Item <- seq(1,20,1)
Item <- as.data.frame(Item)
Full_set <- cbind(Item, Prop_set)
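To make the summarize, transpose, and combine steps easier to follow, here is a small sketch on toy data using only base R. The data frame `toy` and its values are hypothetical, and `aggregate()` is swapped in for the dplyr pipeline, so this illustrates the shape of the result rather than reproducing the code above.

```r
# Toy data: 6 respondents, 2 items, 2 groups (hypothetical values)
toy <- data.frame(
  s1 = c(1, 1, 0, 1, 0, 0),
  s2 = c(1, 0, 0, 1, 1, 0),
  Group = c(1, 1, 1, 1, 2, 2)
)

# Proportion correct per item within each group (one row per group)
toy_prop <- aggregate(. ~ Group, data = toy, FUN = mean)

# Drop the Group column, then transpose so rows are items and
# columns are groups
toy_items <- t(toy_prop[, -1])
colnames(toy_items) <- paste0("Group_", toy_prop$Group)

# Add the item index to obtain the plotting layout
toy_full <- data.frame(Item = seq_len(nrow(toy_items)), toy_items)
toy_full
```

The result has one row per item with columns `Item`, `Group_1`, and `Group_2`, which is the same layout as `Full_set` with fewer items and groups.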
A line plot with three lines representing the three groups is produced. Points are also included to represent the specific proportions correct at each item.
plot1 <- Full_set %>%
  ggplot() + ggtitle("Percent Correct (3 Groups)") +
  geom_line(aes(y = Group_1, x = Item), col = "blue", stat = "identity") +
  geom_line(aes(y = Group_2, x = Item), col = "red", stat = "identity") +
  geom_line(aes(y = Group_3, x = Item), col = "green3", stat = "identity") +
  geom_point(aes(y = Group_1, x = Item), col = "blue", stat = "identity") +
  geom_point(aes(y = Group_2, x = Item), col = "red", stat = "identity") +
  geom_point(aes(y = Group_3, x = Item), col = "green3", stat = "identity") +
  ylab("Percent Correct") + scale_x_continuous(breaks = seq(1, 20, 1)) + ylim(0, 1) +
  theme(plot.title = element_text(hjust = 0.5))
plot1
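Because the colors above are set outside `aes()`, the plot has no legend identifying the groups. One possible alternative, sketched here under the assumption that a legend is wanted, is to stack the group columns into long format and map color to group. The data frame `demo_set` is a hypothetical stand-in with the same column layout as `Full_set`, and `demo_long` and `plot1_long` are illustrative names.

```r
library(ggplot2)

# Hypothetical stand-in for Full_set with the same column layout
demo_set <- data.frame(
  Item = 1:5,
  Group_1 = c(.80, .70, .75, .60, .65),
  Group_2 = c(.70, .60, .65, .50, .55),
  Group_3 = c(.50, .40, .45, .30, .35)
)

# Stack the three group columns into one long data frame (base R only)
demo_long <- data.frame(
  Item = rep(demo_set$Item, times = 3),
  Group = rep(c("Group_1", "Group_2", "Group_3"), each = nrow(demo_set)),
  Prop = c(demo_set$Group_1, demo_set$Group_2, demo_set$Group_3)
)

# Mapping colour inside aes() draws one line per group and adds a legend
plot1_long <- ggplot(demo_long, aes(x = Item, y = Prop, colour = Group)) +
  geom_line() +
  geom_point() +
  scale_colour_manual(values = c(Group_1 = "blue", Group_2 = "red",
                                 Group_3 = "green3")) +
  scale_x_continuous(breaks = demo_set$Item) +
  ylim(0, 1) +
  ggtitle("Percent Correct (3 Groups)") +
  ylab("Percent Correct") +
  theme(plot.title = element_text(hjust = 0.5))
plot1_long
```

The trade-off is an extra reshaping step in exchange for a single `geom_line()` and `geom_point()` call and an automatic legend.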
A second version of the line plot adds colored ribbons around the lines to represent one standard error on either side of each sample proportion. The code computes the standard error of each proportion p as sqrt(p(1 - p)/n), using each group's sample size n, then adds it to and subtracts it from the proportion to produce the ribbon boundaries.
plot2 <- Full_set %>%
  mutate(se_g1 = sqrt((Group_1*(1 - Group_1))/800)) %>%
  mutate(se_g2 = sqrt((Group_2*(1 - Group_2))/150)) %>%
  mutate(se_g3 = sqrt((Group_3*(1 - Group_3))/50)) %>%
  mutate(low_g1 = Group_1 - se_g1, high_g1 = Group_1 + se_g1) %>%
  mutate(low_g2 = Group_2 - se_g2, high_g2 = Group_2 + se_g2) %>%
  mutate(low_g3 = Group_3 - se_g3, high_g3 = Group_3 + se_g3) %>%
  ggplot() + ggtitle("Percent Correct (3 Groups)") +
  geom_line(aes(y = Group_1, x = Item), col = "blue", stat = "identity") +
  geom_line(aes(y = Group_2, x = Item), col = "red", stat = "identity") +
  geom_line(aes(y = Group_3, x = Item), col = "green3", stat = "identity") +
  geom_point(aes(y = Group_1, x = Item), col = "blue", stat = "identity") +
  geom_point(aes(y = Group_2, x = Item), col = "red", stat = "identity") +
  geom_point(aes(y = Group_3, x = Item), col = "green3", stat = "identity") +
  geom_ribbon(aes(ymin = low_g1, ymax = high_g1, x = Item), fill = "blue", alpha = .5) +
  geom_ribbon(aes(ymin = low_g2, ymax = high_g2, x = Item), fill = "red", alpha = .5) +
  geom_ribbon(aes(ymin = low_g3, ymax = high_g3, x = Item), fill = "green3", alpha = .5) +
  ylab("Percent Correct") + scale_x_continuous(breaks = seq(1, 20, 1)) + ylim(0, 1) +
  theme(plot.title = element_text(hjust = 0.5))
plot2
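To see why the ribbons differ in width across groups, the short sketch below applies the same standard error formula to a single hypothetical proportion (0.6) at each of the three group sizes used in this document. The helper `se_prop()` is an illustrative name, not part of the code above.

```r
# Standard error of a sample proportion: sqrt(p * (1 - p) / n)
se_prop <- function(p, n) sqrt(p * (1 - p) / n)

p <- 0.6              # hypothetical proportion correct
n <- c(800, 150, 50)  # group sizes used in this document

se <- se_prop(p, n)

# Ribbon boundaries: one standard error below and above the proportion
ribbon <- data.frame(n = n, se = se, low = p - se, high = p + se)
ribbon
```

For the same proportion, the standard error is roughly 0.017 with 800 respondents, 0.04 with 150, and 0.069 with 50, so the smallest group has a ribbon about four times as wide as the largest.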