This document is to create the figures and tables needed for the resubmission of the Extended Time Paper to Frontiers in STEM Education

Libraries

library(readxl)
library(dplyr)
library(boot)
library(ggplot2)
library(cowplot)
library(colorspace)
library(knitr)

Import, Clean, and Combine Data Files

This includes demographic and exam data for Winter terms 2013-2016 and 2018. (not displayed… importing all these data files, selecting relevant variables, combining demographic and exam data)

Combine all of the cleaned exam data and demographic data from Winter 2013-2016 and 2018 into one table - \(allData\). Note, students that took the course multiple times might show up in data multiple times. But, they have a distinct row for each time they took the class.

Update - 1/15/2022. \(allData\) is updated so that it only includes students who completed the course. This applies for the old terms and WN 18. WN 18 also is limited by demographic data. Although 633 students completed the course, we only have demographic data for 596 students.

# Anonymous, all data.
allData <- left_join(clean_combined_Dem_Data, combined_RepQuest_Scores,
                     by = c("umid", "term")) %>%
  select(c(11,3,5:9,4,12:49))

Better Than Expected Scores on Repeated Questions for All Students

BTE Score

To do all of our BTE comparisons, we need a BTE score on repeated questions for every student. Better Than Expected (BTE) is the student’s grade on the topic of interest minus their cumulative grade in all of their other classes (GPAO). (\(BTE = grade - GPAO\)). We need to calculate their Repeated Questions Grade.

First, we need to find out how many of the 38 total repeated questions each student had the opportunity to answer. This is because for the old terms, only a subset of the repeated questions were included in the exams. Second, we need to calculate their score on the repeated questions they did have the opportunity to answer.

Students got either a 1 (correct) or 0 (incorrect) for every repeated question they were exposed to. If we add up their score on all the repeated questions, we get their repeated questions total correct (\(SumQScore\))

Error on BTE Score

We will also need to calculate the error on the Repeated Question BTE Score for every student. This is because BTE is a calculation based on the student’s performance on the repeated questions and their GPAO. If they were to take these exams again, how likely would it be for them to get the same scores? We answer this through bootstrap resampling, a statistical technique that makes no assumptions about the distribution.

Bootstrap Resampling

To get set up for the bootstrap resampling standard error calculation, we need to set up a list of vectors where every vector is of length 38 and holds the student’s scores on the repeated questions. \(RepQuestScores\) will be the list that holds all these vectors. These are the vectors we will use to calculate the standard error for the BTE scores.

Note that we could have used these vectors to also calculate BTE. I chose to do this in smaller steps using \(NAcount\) and \(SumQScore\) so I could check calculations more easily along the way and make sure my code was working as expected. This also becomes useful later in that we are separating the constant (\(NAcount\)) and the variable (\(SumQScore\)) instead of lumping them into one calculation (\(grade\)).

allData <- mutate(allData, NAcount = 0, SumQScore = 0)

# Create empty list where each element will be the vector for one student
RepQuestScores <- vector(mode = "list", length = nrow(allData))

for (i in 1:nrow(allData)) {
  for(j in 9:46) {
    # allData[i,j] means allData[row number, column number]
    if (is.na(allData[i,j]) == TRUE) {
      allData$NAcount[i] = allData$NAcount[i] + 1
      }
    if (is.na(allData[i,j]) == FALSE) {
      allData$SumQScore[i] = allData$SumQScore[i] + allData[[i,j]]
      }
    RepQuestScores[[i]] = append(RepQuestScores[[i]],
                                 allData[[i,j]])
  }
}

To calcuate BTE, the repeated questions grade and the GPAO need to be on the same scale. GPAO is on a 4.0 scale. To calculate repeated questions grade on a 4.0 scale, need to divide total correct by total exposed to and multiply by 4.

BTE and Error Calculation (Bootstrap)

For every student, there are a fixed number of repeated questions they were exposed to. With that constraint, we need to resample for the questions they were exposed to in order to estimate what their repeated question grade would be if they took these exams again.

Note that by error propagation rules, we can calculate the error on the repeated question grade and that will be the same as for the BTE, because we are assuming no error on the students’ GPAO.

My understanding of error propagation:
For a calculation \(A = B - C\)
\(A_{error} = \sqrt{B_{error}^2 + C_{error}^2}\)
In our case of \(BTE = grade - gpao\)
there is no error on the \(gpao\). So \(BTE_{error} = grade_{error}\)

For a calculation \(X = aK\), where \(a\) is constant, \(X_{error} = aK_{error}\)
In our case \(grade = aSumQScore\) where \(a = \frac{4}{38-NAcount}\)
\(NAcount\) is constant for each student, because they were only exposed to a specific set of the repeated questions.
So, \(BTE_{error} = SumQScore_{error}\frac{4}{38-NAcount}\)

Additional notes on error propagation (to justify to myself this is the right process):
For a calculation \(X = \frac{FG}{K}\)
\(X_{error} = X\sqrt{(\frac{F_{error}}{F})^2 + (\frac{G_{error}}{G})^2 + (\frac{K{error}}{K})^2}\)
In our case of \(grade = \frac{SumQScore(4)}{38-(NAcount)}\)
there is no error on \(4\), \(38\), or \(NAcount\). So \(grade_{error} = grade\sqrt{(\frac{SumQScore_{error}}{SumQScore})^2}\) or, \(grade_{error} = grade\frac{SumQScore_{error}}{SumQScore}\)

Thus, \(BTE_{error} = \frac{SumQScore(4)}{38-(NAcount)} * \frac{SumQScore_{error}}{SumQScore}\) which simplifies to \(BTE_{error} = SumQScore_{error}\frac{4}{38-NAcount}\), the same as above.

To calculate BTE error in code We need to set up a function that will be used for the bootstrap resampling. We are setting up this function now so we can calculate BTE and BTE standard error during the same for loop. This saves on the time it takes for the code to run if we limit the for loops. Our bootstrap resampling will be on the \(RepQuestScores\) for each student. The function we are doing over and over again is summing the scores to calculate \(SumQScore\) and the standard error on this value.

sample_sum <- function(data, indices) {
  sample = data[indices]
  bar = sum(sample)
  return(bar)
}

allData$RepQuest_BTE <- NA
allData$RepQuest_PercentScore <- NA
allData$RepQuest_SumQScore_std_error <- NA

for (k in 1:nrow(allData)) {
  allData$RepQuest_BTE[[k]] = ((allData$SumQScore[[k]]*4)/
    (38-allData$NAcount[[k]])) - allData$gpao[[k]]
  
  allData$RepQuest_PercentScore[[k]] = (allData$SumQScore[[k]]/
          (38-allData$NAcount[[k]])) * 100
  
  allData$RepQuest_SumQScore_std_error[[k]] = sd(boot(RepQuestScores[[k]][!is.na(RepQuestScores[[k]])],
                                                sample_sum,R=1000)$t)
}

Note, we also cacluate \(PercentScore\) above. This is because if later we decide to plot using overall score instead of BTE score, we will have the calculations done.
\(PercentScore = SumQScore\frac{100}{38-NAcount}\)
\(PercentScore_{error} = SumQScore{error}\frac{100}{38-NAcount}\)

allData$RepQuest_BTE_error <- allData$RepQuest_SumQScore_std_error * 
  (4/(38-allData$NAcount))
allData$RepQuest_PercentScore_error <- allData$RepQuest_SumQScore_std_error*
  (100/(38-allData$NAcount))

Note that for students who got all right or all wrong for the repeated questions they were exposed to, the BTE standard error and percent score standard error are zero. This is because with bootstrap resampling, the only options to sample from are either all right or all wrong. So, there is no standard error because the resampled versions would also be either all right or all wrong.
Is this a problem?

Now we are ready to plot average BTEs for specific student groups and the error on these averages. We will also calculate the p-value for the differences between average BTEs to show significance for skeptical reviewers (that don’t understand the error bars…)!

Data to Use for Research Questions 1 and 2

graphData <- allData %>% select(c(1:8,47,48,51,50,53,49,52))
#graphData <- allData %>% select(c(4:10,49,50,53,51,54,52,55))

# Make names for data more formal
for (a in 1:nrow(graphData)) {
   if (is.na(graphData$gender[a])) {
    graphData$gender[a] = "Gender Unknown"
   }
  if (graphData$gender[a] != "Female" && graphData$gender[a] != "Male") {
    graphData$gender[a] = "Gender Unknown"
  }
  if (is.na(graphData$urm[a])) {
    graphData$urm[a] = "Race/Ethnicity Unknown"
  }
  if (graphData$urm[a] != "a URM" && graphData$urm[a] != "b non-URM") {
    graphData$urm[a] = "Race/Ethnicity Unknown"
  }
  if (graphData$urm[a] == "a URM") {
    graphData$urm[a] = "Underrepresented/Marginalized"
  }
  if (graphData$urm[a] == "b non-URM") {
    graphData$urm[a] = "Racial Majority"
  }
  if (is.na(graphData$firstgen[a])) {
    graphData$firstgen[a] = "College Generation Unknown"
  }
  if (graphData$firstgen[a] != "FG" && graphData$firstgen[a] != "non-FG") {
    graphData$firstgen[a] = "College Generation Unknown"
  }
  if (graphData$firstgen[a] == "FG") {
    graphData$firstgen[a] = "First Generation"
  }
  if (graphData$firstgen[a] == "non-FG") {
    graphData$firstgen[a] = "Continuing Generation"
  }
}
 
# Define an order for the data so graphs are consistent
graphData$gender <- factor(graphData$gender,
                           levels = c("Female", "Male", "Gender Unknown"))
graphData$urm  <- factor(graphData$urm,
                         levels = c("Underrepresented/Marginalized",
                                    "Racial Majority",
                                    "Race/Ethnicity Unknown"))
graphData$firstgen <- factor(graphData$firstgen,
                             levels = c("First Generation",
                                        "Continuing Generation",
                                        "College Generation Unknown"))

Research Questions & Plots

RESEARCH QUESTION 1

How do performance differences between different demographic groups change when the course is restructured to alleviate time pressure on exams?

Raw Data Figures

Added as a result of reviewer comments. Creating boxplots for grades on repeated questions, by the different demographic groups, by WN 2018 vs. prior terms

Used in 1/16/2022 Resubmission - FIGURE 1

RQ1_box_gender <- ggplot(graphData, 
                         aes(term_group, RepQuest_PercentScore)) +
  geom_boxplot(aes(fill = gender, linetype = gender)) + theme_bw() +
  xlab("") + ylab("Average Performance (%)") +
  scale_x_discrete(breaks=c("old", "WN18"),
                      labels=c("Prior Terms", "Winter 2018")) +
  scale_y_continuous(breaks = c(0,20,40,60,80,100)) +
  theme(legend.position = "bottom", legend.title = element_blank(),
          legend.text = element_text(size=10)) +
    scale_fill_manual(values = c("#66ccee","#4477AA")) +
    scale_linetype_manual(values=c("solid", "twodash")) +
  annotate("text", x=.8,y=102, label = "n=809", size=3) +
  annotate("text", x=1.2,y=102, label = "n=1684", size=3) +
  annotate("text", x=1.8,y=102, label = "n=229", size=3) +
  annotate("text", x=2.2,y=102, label = "n=367", size=3) +
  facet_grid(.~"Sex")


RQ1_box_urm <- ggplot(graphData, 
                      aes(term_group, RepQuest_PercentScore)) +
  geom_boxplot(aes(fill = urm, linetype = urm)) + theme_bw() +
  xlab("Term") + ylab("") +
  scale_x_discrete(breaks=c("old", "WN18"),
                      labels=c("Prior Terms", "Winter 2018")) +
  scale_y_continuous(breaks = c(0,20,40,60,80,100)) +
  theme(legend.position = "bottom", legend.title = element_blank(),
          legend.text = element_text(size=10)) +
  scale_fill_manual(values = c("#228833","#ccbb44","#D1DF86"),
                    labels = c("Underrepresented/\nMarginalized (URM)", 
                               "Racial Majority", "Race/Ethnicity\nUnknown")) +
  scale_linetype_manual(values=c("solid", "twodash", "dotted"),
                        labels = c("Underrepresented/\nMarginalized (URM)", 
                               "Racial Majority", "Race/Ethnicity\nUnknown")) +
  annotate("text", x=.75,y=102, label = "n=321", size=3) +
  annotate("text", x=1,y=102, label = "n=2031", size=3) +
  annotate("text", x=1.25,y=102, label = "n=141", size=3) +
  annotate("text", x=1.75,y=102, label = "n=129", size=3) +
  annotate("text", x=2,y=102, label = "n=440", size=3) +
  annotate("text", x=2.25,y=102, label = "n=27", size=3) +
  facet_grid(.~"Race/Ethnicity")

RQ1_box_firstgen <- ggplot(graphData, 
                           aes(term_group, RepQuest_PercentScore)) +
  geom_boxplot(aes(fill = firstgen, linetype = firstgen)) + theme_bw() +
  xlab("") + ylab("") +
  scale_x_discrete(breaks=c("old", "WN18"),
                      labels=c("Prior Terms", "Winter 2018")) +
  scale_y_continuous(breaks = c(0,20,40,60,80,100)) +
  theme(legend.position = "bottom", legend.title = element_blank(),
          legend.text = element_text(size=10)) +
  scale_fill_manual(values = c("#EE6677","#AA3377", "#EBCAE1"),
                    labels = c("First\nGeneration",
                               "Continuing\nGeneration",
                               "College Generation\nUnknown")) +
  scale_linetype_manual(values=c("solid", "twodash", "dotted"),
                        labels = c("First\nGeneration",
                               "Continuing\nGeneration",
                               "College Generation\nUnknown")) +
  annotate("text", x=.75,y=102, label = "n=120", size=3) +
  annotate("text", x=1,y=102, label = "n=2219", size=3) +
  annotate("text", x=1.25,y=102, label = "n=154", size=3) +
  annotate("text", x=1.75,y=102, label = "n=55", size=3) +
  annotate("text", x=2,y=102, label = "n=528", size=3) +
  annotate("text", x=2.25,y=102, label = "n=13", size=3) +
  facet_grid(.~"College Generation")

RQ1_box <- plot_grid(RQ1_box_gender, RQ1_box_urm,
                     RQ1_box_firstgen, nrow = 1)
print(RQ1_box)

Notes:
In setting up this question, we specify the gaps we are interested in:
1. Gendered performance gap, comparing performance of females and males
2. Racial performance gap, comparing underrepresented/marginalized to racial majority 3. College-generation gap, comparing first-generation college students to continuing generation college students

To answer this question, we need to identify the gap for each comparison in the original (normal time) terms for the Repeated Questions and compare it to the gap in the Winter 2018 (extended time) term. We need to determine if the change in the gaps are significant and justify this with a p-value.

Preparing Data for Comparisons

gender_BTE <- graphData %>% group_by(term_group, gender) %>% 
  summarise(BTE_avg = mean(RepQuest_BTE), number_people = n(), 
            BTE_avg_error = sqrt(sum(RepQuest_BTE_error^2))/number_people, 
            .groups = "keep")

gender_BTE <- rename(gender_BTE, demographic = gender)

urm_BTE <- graphData %>% group_by(term_group, urm) %>% 
  summarise(BTE_avg = mean(RepQuest_BTE), number_people = n(), 
            BTE_avg_error = sqrt(sum(RepQuest_BTE_error^2))/number_people, 
            .groups = "keep")

urm_BTE <- rename(urm_BTE, demographic = urm)

firstgen_BTE <- graphData %>% group_by(term_group, firstgen) %>% 
  summarise(BTE_avg = mean(RepQuest_BTE), number_people = n(), 
            BTE_avg_error = sqrt(sum(RepQuest_BTE_error^2))/number_people, 
            .groups = "keep")

firstgen_BTE <- rename(firstgen_BTE, demographic = firstgen)

RQ1_data <- bind_rows(gender_BTE, urm_BTE, firstgen_BTE, .id = "source")
RQ1_data$demographic <- factor(RQ1_data$demographic,
                               levels = c("Female", "Male",
                                          "Underrepresented/Marginalized", 
                                          "Racial Majority",
                                          "Race/Ethnicity Unknown", 
                                          "First Generation",
                                          "Continuing Generation", 
                                          "College Generation Unknown"))
RQ1_data

Used in 1/16/2022 Resubmission - FIGURE 2

# Remove "Race/Ethnicity unknown" and "College generation unknown" from plot
RQ1_plot_BTE <- ggplot(filter(RQ1_data, demographic != "Race/Ethnicity Unknown",
                              demographic != "College Generation Unknown"),
                       aes(term_group, BTE_avg,
                           ymin = BTE_avg - BTE_avg_error,
                           ymax = BTE_avg + BTE_avg_error,
                           fill = demographic)) +
  scale_fill_manual(values = c("#66ccee","#4477AA",
                               "#228833","#ccbb44",
                               "#EE6677","#AA3377")) +
  scale_y_continuous(limits = c(-0.52,0.0)) +
  geom_col(position = position_dodge(.9)) +
  geom_errorbar(position = position_dodge(.9), width = .2, color = "gray20") +
  geom_point(aes(term_group, y = BTE_avg, 
                 shape = demographic), color = "gray20", 
             position = position_dodge(.9)) +
  geom_text(aes(label = paste("n=",filter(RQ1_data, demographic != "Race/Ethnicity Unknown",
                               demographic != "College Generation Unknown")$number_people)),
            vjust = 3.5, size = 3,
            position = position_dodge(.9)) +
  facet_wrap(~source, labeller = as_labeller(c(`1` = "Sex", 
                                               `2` = "Race/Ethnicity", 
                                               `3` = "College Generation"))) +
  theme_bw() + theme(legend.position = "bottom", 
                     legend.title = element_blank()) +
  xlab("Term") + ylab("Average Better Than Expected (BTE) Score") +
  labs(caption = expression(italic("BTE = (Repeated Questions grade) - (Grades in other courses (GPAO))"))) +
  scale_x_discrete(breaks=c("old", "WN18"),
                      labels=c("Prior Terms", "Winter 2018"))

print(RQ1_plot_BTE)

Are the changes in performance differences significant?

PerformanceDifferences <- data.frame(matrix(ncol = 0, nrow = 3))

PerformanceDifferences$Comparison <- c("Gender", "Race/Ethnicity", "College Generation")
PerformanceDifferences$Comparison <- factor(PerformanceDifferences$Comparison,
                               levels = c("Gender", "Race/Ethnicity", "College Generation"))

# Old Differences
PerformanceDifferences$Old_Diff <- c((filter(RQ1_data, demographic == "Female", 
                                             term_group == "old")$BTE_avg - 
                                        filter(RQ1_data, demographic == "Male", 
                                               term_group == "old")$BTE_avg),
                                     (filter(RQ1_data, 
                                             demographic == "Underrepresented/Marginalized", 
                                             term_group == "old")$BTE_avg - 
                                        filter(RQ1_data, demographic == "Racial Majority", 
                                               term_group == "old")$BTE_avg),
                                     (filter(RQ1_data, demographic == "First Generation", 
                                             term_group == "old")$BTE_avg - 
                                        filter(RQ1_data, demographic == "Continuing Generation", 
                                               term_group == "old")$BTE_avg))

PerformanceDifferences$Old_Diff_error <- c(sqrt(filter(RQ1_data, demographic == "Female", 
                                                       term_group == "old")$BTE_avg_error^2 + 
                                                  filter(RQ1_data, demographic == "Male", 
                                                         term_group == "old")$BTE_avg_error^2),
                                     sqrt(filter(RQ1_data, 
                                                 demographic == "Underrepresented/Marginalized", 
                                                 term_group == "old")$BTE_avg_error^2 + 
                                            filter(RQ1_data, demographic == "Racial Majority", 
                                                   term_group == "old")$BTE_avg_error^2),
                                     sqrt(filter(RQ1_data, demographic == "First Generation", 
                                                 term_group == "old")$BTE_avg_error^2 + 
                                            filter(RQ1_data, demographic == "Continuing Generation", 
                                                   term_group == "old")$BTE_avg_error^2))

# Winter 2018 Differences
PerformanceDifferences$WN18_Diff <- c((filter(RQ1_data, demographic == "Female", 
                                              term_group == "WN18")$BTE_avg - 
                                         filter(RQ1_data, demographic == "Male", 
                                                term_group == "WN18")$BTE_avg),
                                     (filter(RQ1_data, 
                                             demographic == "Underrepresented/Marginalized", 
                                             term_group == "WN18")$BTE_avg - 
                                        filter(RQ1_data, demographic == "Racial Majority", 
                                               term_group == "WN18")$BTE_avg),
                                     (filter(RQ1_data, demographic == "First Generation", 
                                             term_group == "WN18")$BTE_avg - 
                                        filter(RQ1_data, demographic == "Continuing Generation", 
                                               term_group == "WN18")$BTE_avg))

PerformanceDifferences$WN18_Diff_error <- c(sqrt(filter(RQ1_data, demographic == "Female", 
                                                        term_group == "WN18")$BTE_avg_error^2 + 
                                                   filter(RQ1_data, demographic == "Male", 
                                                          term_group == "WN18")$BTE_avg_error^2),
                                     sqrt(filter(RQ1_data, 
                                                 demographic == "Underrepresented/Marginalized", 
                                                 term_group == "WN18")$BTE_avg_error^2 + 
                                            filter(RQ1_data, demographic == "Racial Majority", 
                                                   term_group == "WN18")$BTE_avg_error^2),
                                     sqrt(filter(RQ1_data, demographic == "First Generation", 
                                                 term_group == "WN18")$BTE_avg_error^2 + 
                                            filter(RQ1_data, 
                                                   demographic == "Continuing Generation", 
                                                   term_group == "WN18")$BTE_avg_error^2))

Adjust PerformanceDifferences table to make it easier to graph next plot.

OldPerfDiff <- PerformanceDifferences[c(1:3)] %>% mutate(Term = "Prior Terms") %>%
  rename(Difference = Old_Diff, Error = Old_Diff_error)

WN18PerfDiff <- PerformanceDifferences[c(1,4,5)] %>% mutate(Term = "Winter 2018") %>%
  rename(Difference = WN18_Diff, Error = WN18_Diff_error)

PerfDiff_Comparison <- bind_rows(OldPerfDiff, WN18PerfDiff, .id = "source")

Used in 1/16/2022 Resubmission - FIGURE 3

RQ1_plot_BTE_Diff <- ggplot(PerfDiff_Comparison, aes(x = Comparison, y = Difference)) + 
  geom_point(aes(color = Term, shape = Term), size = 3, position = position_dodge(.1)) +
  geom_errorbar(aes(ymin = Difference - Error, ymax = Difference + Error,
                    color = Term), width = .1, position = position_dodge(.1)) +
  xlab("Demographic Category") + ylab("Better Than Expected (BTE) Performance Difference") +
  theme_bw() +
   theme(legend.position = "bottom", legend.title = element_blank()) +
  scale_color_manual(values = c("#BB5566", "#004488")) +
    scale_x_discrete(breaks=c("Gender", "Race/Ethnicity", "College Generation"),
                      labels=c("Sex", "Race/Ethnicity", "College Generation")) +
  annotate("text", x=.9,y=-0.2, label = "n=2493", size=3) +
  annotate("text", x=1.1,y=-0.27, label = "n=596", size=3) +
  annotate("text", x=1.9,y=0.05, label = "n=2352", size=3) +
  annotate("text", x=2.1,y=-0.02, label = "n=569", size=3) +
  annotate("text", x=2.9,y=-0.1, label = "n=2339", size=3) +
  annotate("text", x=3.1,y=-0.08, label = "n=583", size=3) +
  labs(caption = expression(italic("BTE = (Repeated Questions grade) - (Grades in other courses (GPAO))\n\nBTE Difference = (BTE minority group) - (BTE majority group)")))
print(RQ1_plot_BTE_Diff)

RESEARCH QUESTION 2

Does overall student performance increase when the course reduces the time pressure by providing longer time limits for exams?

Preparing Data for Comparison

RQ2_data_BTE <- graphData %>% subset(!is.na(RepQuest_BTE)) %>%
  group_by(term_group) %>%
  summarise(Overall_BTE_avg = mean(RepQuest_BTE), number_people_BTE = n(),
            Overall_BTE_avg_error = sqrt(sum(RepQuest_BTE_error^2))/number_people_BTE,
            .groups = "keep")

RQ2_data_percent <- graphData %>% group_by(term_group) %>%
  summarise(Overall_percent_avg = mean(RepQuest_PercentScore),
            number_people_percent = n(), 
            Overall_percent_avg_error = 
              sqrt(sum(RepQuest_PercentScore_error^2))/number_people_percent, .groups = "keep")

RQ2_data <- full_join(RQ2_data_BTE, RQ2_data_percent, by = "term_group")
RQ2_data

Overall BTE

RQ2_plot_BTE <- ggplot(RQ2_data, aes(term_group, Overall_BTE_avg)) + 
  geom_point(size = 3) + 
  geom_errorbar(aes(ymin = Overall_BTE_avg - Overall_BTE_avg_error, 
                    ymax = Overall_BTE_avg + Overall_BTE_avg_error), width = .1) +
  xlab("Term") + ylab("Average Better Than Expected (BTE) Score") +
  scale_x_discrete(breaks=c("old", "WN18"),
                      labels=c("Prior Terms", "Winter 2018")) +
  theme_bw()

print(RQ2_plot_BTE)

Used in 1/16/2022 Resubmission: FIGURE 4

RQ2_plot_percent <- ggplot(RQ2_data, aes(term_group, Overall_percent_avg)) + 
  geom_point(size = 3) + 
  geom_errorbar(aes(ymin = Overall_percent_avg - Overall_percent_avg_error, 
                    ymax = Overall_percent_avg + Overall_percent_avg_error), width = .1) +
  xlab("Term") + ylab("Average Performance on Repeated Questions (%)") +
  scale_x_discrete(breaks=c("old", "WN18"),
                      labels=c("Prior Terms", "Winter 2018")) +
  scale_y_continuous(limits = c(72,76)) +
  theme_bw() +
  annotate("text", x=.9,y=72.9, label = "n=2493", size=3) +
  annotate("text", x=1.9,y=75, label = "n=596", size=3)

print(RQ2_plot_percent)

RESEARCH QUESTION 3

Do students use their extended time differently?

Notes: In initial paper we grouped students into three cohorts - early, middle, and late - for each exam based on when they turned in the exams. We looked to see the distribution of student groups in the different cohorts. We found first generation college students fell in the early cohort more than other student groups, especially in the earlier exams. We also looked at GPAO and ACT Math scores and how they correlated.

Preparing Data for Comparisons

# Redownload demographic data, because accidentally removed useful columns earlier.
RQ3_data <-read_excel("~/Documents/FinalDFNow.xlsx") %>%
  select(c(5:31,44,45)) %>% rename(umid = sis_user_id) %>%
  inner_join(clean_combined_Dem_Data[c(2,3)], by = c("umid", "term")) %>%
  select(c(2:29))

#RQ3_data <- WN18_Term_Dem_Data %>% select(c(6,9:11,24,25,44,45,7,8,26,12:23,27:31)) %>% filter(BTE != "NA")

for (d in 1:nrow(RQ3_data)) {
  if (RQ3_data$satM[d] == "NA") RQ3_data$satM[d] = NA
  if (RQ3_data$actM[d] == "NA") RQ3_data$actM[d] = NA
  if (RQ3_data$BTE[d] == "NA") RQ3_data$BTE[d] = NA
  if (RQ3_data$grade[d] == "NA") RQ3_data$grade[d] = NA
}
RQ3_data$satM <- as.numeric(RQ3_data$satM)
RQ3_data$actM <- as.numeric(RQ3_data$actM)
RQ3_data$BTE <- as.numeric(RQ3_data$BTE)
RQ3_data$grade <- as.numeric(RQ3_data$grade)

Note in general that the \(number\_people\_cluster\) below may not be the exact number that the means are calculated on for SATM and ACTM because we are removing NA values when calculating the mean. So the \(number\_people\_cluster\) just tells you how many people were in that cohort. Not, what the average (\(SATM\) or \(ACTM\)) was for that whole group.

RQ3_data_c1 <- RQ3_data %>% group_by(c1) %>% summarise(Exam_BTE_avg = mean(bte1),
                                                       number_people_cluster = n(),
                                                       Time_avg = mean(t1),
                                                       Exam_Grade_avg = mean(e1),
                                                       BTE_avg = mean(BTE),
                                                       GPAO_avg = mean(gpao),
                                                       GPAO_error = sd(gpao)/sqrt(number_people_cluster),
                                                       Class_Grade_avg = mean(grade),
                                                       SAT_Math_avg = mean(satM[!is.na(satM)]),
                                                       ACT_Math_avg = mean(actM[!is.na(actM)])) %>%
  mutate(Exam = c("Exam 1")) %>% rename(time_cluster = c1)

RQ3_data_c1 <- RQ3_data_c1[c(12,1,3,4,10,11,9,7,8,6,5,2)]

RQ3_data_c2 <- RQ3_data %>% group_by(c2) %>% summarise(Exam_BTE_avg = mean(bte2),
                                                       number_people_cluster = n(),
                                                       Time_avg = mean(t2),
                                                       Exam_Grade_avg = mean(e2),
                                                       BTE_avg = mean(BTE),
                                                       GPAO_avg = mean(gpao),
                                                       GPAO_error = sd(gpao)/sqrt(number_people_cluster),
                                                       Class_Grade_avg = mean(grade),
                                                       SAT_Math_avg = mean(satM[!is.na(satM)]),
                                                       ACT_Math_avg = mean(actM[!is.na(actM)])) %>%
  mutate(Exam = c("Exam 2")) %>% rename(time_cluster = c2)

RQ3_data_c2 <- RQ3_data_c2[c(12,1,3,4,10,11,9,7,8,6,5,2)]

RQ3_data_c3 <- RQ3_data %>% group_by(c3) %>% summarise(Exam_BTE_avg = mean(bte3),
                                                       number_people_cluster = n(),
                                                       Time_avg = mean(t3),
                                                       Exam_Grade_avg = mean(e3),
                                                       BTE_avg = mean(BTE),
                                                       GPAO_avg = mean(gpao),
                                                       GPAO_error = sd(gpao)/sqrt(number_people_cluster),
                                                       Class_Grade_avg = mean(grade),
                                                       SAT_Math_avg = mean(satM[!is.na(satM)]),
                                                       ACT_Math_avg = mean(actM[!is.na(actM)])) %>%
  mutate(Exam = c("Exam 3")) %>% rename(time_cluster = c3)

RQ3_data_c3 <- RQ3_data_c3[c(12,1,3,4,10,11,9,7,8,6,5,2)]

RQ3_data_c4 <- RQ3_data %>% group_by(c4) %>% summarise(Exam_BTE_avg = mean(bte4),
                                                       number_people_cluster = n(),
                                                       Time_avg = mean(t4),
                                                       Exam_Grade_avg = mean(e4),
                                                       BTE_avg = mean(BTE),
                                                       GPAO_avg = mean(gpao),
                                                       GPAO_error = sd(gpao)/sqrt(number_people_cluster),
                                                       Class_Grade_avg = mean(grade),
                                                       SAT_Math_avg = mean(satM[!is.na(satM)]),
                                                       ACT_Math_avg = mean(actM[!is.na(actM)])) %>%
  mutate(Exam = c("Exam 4")) %>% rename(time_cluster = c4)

RQ3_data_c4 <- RQ3_data_c4[c(12,1,3,4,10,11,9,7,8,6,5,2)]

Note that all of the averages calculated in the RQ3 data clusters tables also have errors associated with them. I didn’t calculate all of these yet because I wanted to see what direction we were planning to go with the plots before spending the time on calculating errors. I included the number of people in the clusters to give a sense of size of errors.

RQ3_data_clusters <- bind_rows(RQ3_data_c1, RQ3_data_c2, RQ3_data_c3, RQ3_data_c4, .id = "source")
RQ3_data_clusters <- mutate(RQ3_data_clusters, standard_time = 0)


for (o in (1:nrow(RQ3_data_clusters))) {
  if(RQ3_data_clusters[[o,1]] == "1" || RQ3_data_clusters[[o,1]] == "2" ||
     RQ3_data_clusters[[o,1]] == "3") {
    RQ3_data_clusters$standard_time[o] = 90
  }
  if(RQ3_data_clusters[[o,1]] == "4") {
    RQ3_data_clusters$standard_time[o] = 120
  }
}

RQ3_data_clusters

Research Question 3 Plot

RQ3_plot_ACTM <- ggplot(RQ3_data_clusters, 
                          aes(time_cluster, ACT_Math_avg, color = Exam)) + 
  geom_point(position = position_dodge(.2), size = 3) + scale_color_brewer(palette = "RdYlBu") +
  ggtitle("ACT Math Scores by Cluster") +
  theme(legend.position = "bottom", legend.title = element_blank()) + 
  xlab("Time Cluster") + ylab("Average ACT Math") +
  scale_x_continuous(breaks=c(1, 2, 3),
                      labels=c("Early", "Middle", "Late"))

print(RQ3_plot_ACTM)

Used in 1/16/2022 Resubmission: Figure 5

RQ3_plot_GPAO_error <- ggplot(RQ3_data_clusters, 
                          aes(time_cluster, GPAO_avg, color = Exam, shape = Exam)) + 
  geom_point(position = position_dodge(.2), size = 3) + 
  scale_color_manual(values = c("#4477AA","#228833","#AA3377","#000000")) +
  geom_errorbar(aes(time_cluster, ymin = GPAO_avg - GPAO_error, ymax = GPAO_avg + GPAO_error),
                position = position_dodge(.2), width = .2) +
  geom_text(aes(label = paste("n=",number_people_cluster)),
            vjust = 8, size = 3,
            position = position_dodge(.6)) +
  theme_bw() +
  theme(legend.position = "bottom", legend.title = element_blank()) + 
  xlab("Time Cohort") + ylab("Average GPAO") +
  scale_x_continuous(breaks=c(1, 2, 3),
                      labels=c("Early", "Middle", "Late")) +
  labs(caption = expression(italic("GPAO: cumulative grade point average from all other classes")))

print(RQ3_plot_GPAO_error)

Preparing Data for Demographic Break Down by Cluster

RQ3_data_gender <- RQ3_data %>% group_by(gender,cmean) %>% 
  summarise(number_people_cluster = n(), BTE_avg = mean(BTE[!is.na(BTE)]), .groups = "keep") %>%
  rename(demographic = gender)

RQ3_data_urm <- RQ3_data %>% group_by(urm,cmean) %>% 
  summarise(number_people_cluster = n(), BTE_avg = mean(BTE[!is.na(BTE)]), .groups = "keep") %>%
  rename(demographic = urm)

for (e in 1:nrow(RQ3_data_urm)) {
  if (RQ3_data_urm$demographic[e] != "a URM" && RQ3_data_urm$demographic[e] != "b non-URM") {
    RQ3_data_urm$demographic[e] = "Race/Ethnicity Unknown"
  }
  if (RQ3_data_urm$demographic[e] == "a URM") {
    RQ3_data_urm$demographic[e] = "Underrepresented/Marginalized"
  }
  if (RQ3_data_urm$demographic[e] == "b non-URM") {
    RQ3_data_urm$demographic[e] = "Racial Majority"
  }
}

RQ3_data_firstgen <- RQ3_data %>% group_by(firstgen,cmean) %>% 
  summarise(number_people_cluster = n(), BTE_avg = mean(BTE[!is.na(BTE)]), .groups = "keep") %>%
  rename(demographic = firstgen)

for (f in 1:nrow(RQ3_data_firstgen)) {
  if (RQ3_data_firstgen$demographic[f] != "FG" && RQ3_data_firstgen$demographic[f] != "non-FG") {
    RQ3_data_firstgen$demographic[f] = "College Generation Unknown"
  }
  if (RQ3_data_firstgen$demographic[f] == "FG") {
    RQ3_data_firstgen$demographic[f] = "First Generation"
  }
  if (RQ3_data_firstgen$demographic[f] == "non-FG") {
    RQ3_data_firstgen$demographic[f] = "Continuing Generation"
  }
}

RQ3_data_demographics <- bind_rows(RQ3_data_gender, RQ3_data_urm, RQ3_data_firstgen, .id = "source") %>%
  mutate(total_demographic = NA)

for (g in 1:nrow(RQ3_data_demographics)) {
  if(RQ3_data_demographics$demographic[g] == "Female") {
   RQ3_data_demographics$total_demographic[g] = nrow(RQ3_data %>% filter(gender == "Female"))
  }
  if(RQ3_data_demographics$demographic[g] == "Male") {
   RQ3_data_demographics$total_demographic[g] = nrow(RQ3_data %>% filter(gender == "Male"))
  }
  if(RQ3_data_demographics$demographic[g] == "Underrepresented/Marginalized") {
   RQ3_data_demographics$total_demographic[g] = nrow(RQ3_data %>% filter(urm == "a URM"))
  }
  if(RQ3_data_demographics$demographic[g] == "Racial Majority") {
   RQ3_data_demographics$total_demographic[g] = nrow(RQ3_data %>% filter(urm == "b non-URM"))
  }
  if(RQ3_data_demographics$demographic[g] == "Race/Ethnicity Unknown") {
   RQ3_data_demographics$total_demographic[g] = nrow(RQ3_data %>% filter(urm == "NA"))
  }
  if(RQ3_data_demographics$demographic[g] == "First Generation") {
   RQ3_data_demographics$total_demographic[g] = nrow(RQ3_data %>% filter(firstgen == "FG"))
  }
  if(RQ3_data_demographics$demographic[g] == "College Generation Unknown") {
   RQ3_data_demographics$total_demographic[g] = nrow(RQ3_data %>% filter(firstgen == "NA"))
  }
  if(RQ3_data_demographics$demographic[g] == "Continuing Generation") {
   RQ3_data_demographics$total_demographic[g] = nrow(RQ3_data %>% filter(firstgen == "non-FG"))
  }
}

RQ3_data_demographics$demographic <- factor(RQ3_data_demographics$demographic,
                               levels = c("Female", "Male",
                                          "Underrepresented/Marginalized", "Racial Majority",
                                          "Race/Ethnicity Unknown", "First Generation",
                                          "Continuing Generation", "College Generation Unknown"))
RQ3_data_demographics

Group students into early, middle, and late time cohorts (clusters).

Preparing Data for Cluster Averages

RQ3_data_cluster_avg <- RQ3_data %>% mutate(Cluster_avg = NA)

for (h in (1:nrow(RQ3_data_cluster_avg))) {
  if (RQ3_data_cluster_avg$cmean[h] == 1.00 || 
      RQ3_data_cluster_avg$cmean[h] == 1.25 || 
      RQ3_data_cluster_avg$cmean[h] == 1.50) {
    RQ3_data_cluster_avg$Cluster_avg[h] = "Early"
    }
  if (RQ3_data_cluster_avg$cmean[h] == 1.75 || 
      RQ3_data_cluster_avg$cmean[h] == 2.00 || 
      RQ3_data_cluster_avg$cmean[h] == 2.25) {
    RQ3_data_cluster_avg$Cluster_avg[h] = "Middle"
    }
  if (RQ3_data_cluster_avg$cmean[h] == 2.50 || 
      RQ3_data_cluster_avg$cmean[h] == 2.75 || 
      RQ3_data_cluster_avg$cmean[h] == 3.00) {
    RQ3_data_cluster_avg$Cluster_avg[h] = "Late"
      }
}

RQ3_data_cluster_gender <- RQ3_data_cluster_avg %>% 
  group_by(Cluster_avg, gender) %>% 
  summarise(number_people_cluster = n(), 
            BTE_avg = mean(BTE),
            BTE_error = sd(BTE)/sqrt(number_people_cluster), .groups = "keep") %>%
  rename(demographic = gender)

RQ3_data_cluster_urm <- RQ3_data_cluster_avg %>% 
  group_by(Cluster_avg, urm) %>% 
  summarise(number_people_cluster = n(), 
            BTE_avg = mean(BTE),
            BTE_error = sd(BTE)/sqrt(number_people_cluster), .groups = "keep") %>%
  rename(demographic = urm)

for (l in 1:nrow(RQ3_data_cluster_urm)) {
  if (RQ3_data_cluster_urm$demographic[l] != "a URM" && 
      RQ3_data_cluster_urm$demographic[l] != "b non-URM") {
    RQ3_data_cluster_urm$demographic[l] = "Race/Ethnicity Unknown"
  }
  if (RQ3_data_cluster_urm$demographic[l] == "a URM") {
    RQ3_data_cluster_urm$demographic[l] = "Underrepresented/Marginalized"
  }
  if (RQ3_data_cluster_urm$demographic[l] == "b non-URM") {
    RQ3_data_cluster_urm$demographic[l] = "Racial Majority"
  }
}

RQ3_data_cluster_firstgen <- RQ3_data_cluster_avg %>% 
  group_by(Cluster_avg, firstgen) %>% 
  summarise(number_people_cluster = n(), 
            BTE_avg = mean(BTE),
            BTE_error = sd(BTE)/sqrt(number_people_cluster), .groups = "keep") %>%
  rename(demographic = firstgen)

for (m in 1:nrow(RQ3_data_cluster_firstgen)) {
  if (RQ3_data_cluster_firstgen$demographic[m] != "FG" && 
      RQ3_data_cluster_firstgen$demographic[m] != "non-FG") {
    RQ3_data_cluster_firstgen$demographic[m] = "College Generation Unknown"
  }
  if (RQ3_data_cluster_firstgen$demographic[m] == "FG") {
    RQ3_data_cluster_firstgen$demographic[m] = "First Generation"
  }
  if (RQ3_data_cluster_firstgen$demographic[m] == "non-FG") {
    RQ3_data_cluster_firstgen$demographic[m] = "Continuing Generation"
  }
}

RQ3_data_cluster_demographics <- bind_rows(RQ3_data_cluster_gender,
                                           RQ3_data_cluster_urm,
                                           RQ3_data_cluster_firstgen,
                                           .id = "source") %>%
  mutate(total_demographic = NA)

for (n in 1:nrow(RQ3_data_cluster_demographics)) {
  if(RQ3_data_cluster_demographics$demographic[n] == "Female") {
   RQ3_data_cluster_demographics$total_demographic[n] = nrow(RQ3_data %>% filter(gender == "Female"))
  }
  if(RQ3_data_cluster_demographics$demographic[n] == "Male") {
   RQ3_data_cluster_demographics$total_demographic[n] = nrow(RQ3_data %>% filter(gender == "Male"))
  }
  if(RQ3_data_cluster_demographics$demographic[n] == "Underrepresented/Marginalized") {
   RQ3_data_cluster_demographics$total_demographic[n] = nrow(RQ3_data %>% filter(urm == "a URM"))
  }
  if(RQ3_data_cluster_demographics$demographic[n] == "Racial Majority") {
   RQ3_data_cluster_demographics$total_demographic[n] = nrow(RQ3_data %>% filter(urm == "b non-URM"))
  }
  if(RQ3_data_cluster_demographics$demographic[n] == "Race/Ethnicity Unknown") {
   RQ3_data_cluster_demographics$total_demographic[n] = nrow(RQ3_data %>% filter(urm == "NA"))
  }
  if(RQ3_data_cluster_demographics$demographic[n] == "First Generation") {
   RQ3_data_cluster_demographics$total_demographic[n] = nrow(RQ3_data %>% filter(firstgen == "FG"))
  }
  if(RQ3_data_cluster_demographics$demographic[n] == "College Generation Unknown") {
   RQ3_data_cluster_demographics$total_demographic[n] = nrow(RQ3_data %>% filter(firstgen == "NA"))
  }
  if(RQ3_data_cluster_demographics$demographic[n] == "Continuing Generation") {
   RQ3_data_cluster_demographics$total_demographic[n] = nrow(RQ3_data %>% filter(firstgen == "non-FG"))
  }
}

RQ3_data_cluster_demographics$demographic <- factor(RQ3_data_cluster_demographics$demographic,
                                                    levels = c("Female", "Male",
                                                               "Underrepresented/Marginalized", "Racial Majority",
                                                               "Race/Ethnicity Unknown", "First Generation",
                                                               "Continuing Generation", 
                                                               "College Generation Unknown"))
RQ3_data_cluster_demographics$Cluster_avg <- factor(RQ3_data_cluster_demographics$Cluster_avg,
                                                    levels= c("Early", "Middle", "Late"))
RQ3_data_cluster_demographics

Used in 1/16/2022 Resubmission: FIGURE 6

# Removed "Race/Ethnicity unknown" and "College generation unknown" from plot
RQ3_plot_cluster_demographics_prop <- ggplot(filter(RQ3_data_cluster_demographics, 
                                                    demographic == "Female" || 
                                                      demographic == "Male" || 
                                                      demographic == "Underrepresented/Marginalized" || 
                                                      demographic == "Racial Majority" || 
                                                      demographic == "First Generation" || 
                                                      demographic == "Continuing Generation"), 
                                    aes(Cluster_avg, (number_people_cluster/total_demographic), 
                                        fill = demographic)) + 
  geom_col(position = "dodge") + 
  geom_point(aes(Cluster_avg, y = (number_people_cluster/total_demographic), 
                 shape = demographic), color = "gray20", 
             position = position_dodge(.9)) +
  geom_text(aes(label = paste("n=",filter(RQ3_data_cluster_demographics,
                                          demographic != "Race/Ethnicity Unknown",
                                          demographic != "College Generation Unknown")$number_people_cluster)), vjust = -1.8, size = 3,
            position = position_dodge(.9)) +
  scale_fill_manual(values = c("#66ccee","#4477AA","#228833","#ccbb44","#EE6677","#AA3377")) +
  facet_wrap(~source, labeller = 
               as_labeller(c(`1` = "Sex", `2` = "Race/Ethnicity", `3` = "College Generation"))) +
  xlab("Average Time Cohort Across Exams") + 
  ylab("Proportion of Demographic Group") + 
  theme_bw() +
  theme(legend.position = "bottom", legend.title = element_blank())

print(RQ3_plot_cluster_demographics_prop)

Used in 1/16/2022 Resubmission: FIGURE 7

# Removed "Race/Ethnicity unknown" and "College generation unknown" from plot
RQ3_plot_cluster_demographic_BTE <- ggplot(filter(RQ3_data_cluster_demographics, 
                                                    demographic == "Female" || 
                                                      demographic == "Male" || 
                                                      demographic == "Underrepresented/Marginalized" || 
                                                      demographic == "Racial Majority" || 
                                                      demographic == "First Generation" || 
                                                      demographic == "Continuing Generation"), 
                                    aes(Cluster_avg, BTE_avg, 
                                        fill = demographic)) + 
  geom_col(position = "dodge") + 
  scale_fill_manual(values = c("#66ccee","#4477AA","#228833","#ccbb44","#EE6677","#AA3377")) +
  geom_errorbar(aes(Cluster_avg, ymin = BTE_avg - BTE_error, ymax = BTE_avg + BTE_error),
                position = position_dodge(.9),width = .2, color = "gray20") +
  geom_point(aes(Cluster_avg, y = BTE_avg, shape = demographic), color = "gray20", 
             position = position_dodge(.9)) +
  geom_text(aes(label = paste("n=",filter(RQ3_data_cluster_demographics,
                                          demographic != "Race/Ethnicity Unknown",
                                          demographic != "College Generation Unknown")$number_people_cluster)), vjust = 10, size = 3,
            position = position_dodge(.9)) +
  facet_wrap(~source, labeller = 
               as_labeller(c(`1` = "Sex", `2` = "Race/Ethnicity", `3` = "College Generation"))) +
  xlab("Average Time Cohort Across Exams") +
  ylab("Better Than Expected (BTE) Score Using Course Grade") +
  theme_bw() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  labs(caption = expression(italic("BTE = (Course grade) - (Grades in other courses (GPAO))")))

print(RQ3_plot_cluster_demographic_BTE)

2022_01_16 Time Paper Revisions

Libraries

Import, Clean, and Combine Data Files

Better Than Expected Scores on Repeated Questions for All Students

BTE Score

Error on BTE Score

Bootstrap Resampling

BTE and Error Calculation (Bootstrap)

Data to Use for Research Questions 1 and 2

Research Questions & Plots

RESEARCH QUESTION 1

Raw Data Figures

Used in 1/16/2022 Resubmission - FIGURE 1

Preparing Data for Comparisons

Used in 1/16/2022 Resubmission - FIGURE 2

Are the changes in performance differences significant?

Used in 1/16/2022 Resubmission - FIGURE 3

RESEARCH QUESTION 2

Preparing Data for Comparison

Overall BTE

Used in 1/16/2022 Resubmission: FIGURE 4

RESEARCH QUESTION 3

Preparing Data for Comparisons

Research Question 3 Plot

Used in 1/16/2022 Resubmission: Figure 5

Preparing Data for Demographic Break Down by Cluster

Preparing Data for Cluster Averages

Used in 1/16/2022 Resubmission: FIGURE 6

Used in 1/16/2022 Resubmission: FIGURE 7