Difference in Grades by Day of Presentation

Introduction

During the admissions process for the 2024 cohort of the Master’s program in Infectious Diseases and Tropical Medicine at the Federal University of Minas Gerais (UFMG, Brazil), candidates whose projects were approved in the first phase were split into two groups for the oral presentations. One group presented on December 4th and the other on December 6th. This study analyzes the grade distributions of the two groups, explores the discrepancies between them, and formulates hypotheses to explain these differences.

Methods

Data were extracted directly from the webpage of the admissions process for the 2024 cohort (in Portuguese). The presentation schedules can be accessed directly here (12/4) and here (12/6). The grades were entered into R using the following commands.

## IDs of the students who presented; M06 is not included because the student
## withdrew from the admissions process and did not present
ID <- c("IMT-M04", "IMT-M05", "IMT-M07", "IMT-M08", "IMT-M09","IMT-M10",
        "IMT-M11", "IMT-M12", "IMT-M13", "IMT-M14","IMT-M15","IMT-M16", 
        "IMT-M17", "IMT-M19", "IMT-M20", "IMT-M21", "IMT-M22",
        "IMT-M23", "IMT-M24", "IMT-M25", "IMT-M27")

## Respective grades
grade <- c(81.00, 76.50, 74.50, 79.00, 76.00, 62.50, 76.00, 71.00, 65.00, 73.00,
           90.80, 97.35, 91.09, 90.69, 78.14, 88.45, 85.50, 97.65, 82.10, 71.50,
           92.85)

## Grades for the first phase
gradeFirstPhase <- c(78.01, 78.06, 73.04, 83.65, 73.88, 75.27, 80.16, 73.02,
                     75.11, 77.54, 76.60, 77.00, 80.16, 82.87, 74.54, 74.87,
                     79.80, 82.28, 80.55, 80.14, 80.52)

## factor variable describing which students presented on which day, the first
## ten students presented on the first day and the latter eleven on the second day
day <- factor(rep(c("day 1", "day 2"), c(10, 11)))

## loading dplyr to construct a tibble - which is an easier way to present the data
library(dplyr)

## creating the tibble
tbl1<-tibble(ID = ID, Grade = grade, Day = day, GradeFirstPhase = gradeFirstPhase)

## printing it in its entirety
print(tbl1, n=21)

## # A tibble: 21 × 4
##    ID      Grade Day   GradeFirstPhase
##    <chr>   <dbl> <fct>           <dbl>
##  1 IMT-M04  81   day 1            78.0
##  2 IMT-M05  76.5 day 1            78.1
##  3 IMT-M07  74.5 day 1            73.0
##  4 IMT-M08  79   day 1            83.6
##  5 IMT-M09  76   day 1            73.9
##  6 IMT-M10  62.5 day 1            75.3
##  7 IMT-M11  76   day 1            80.2
##  8 IMT-M12  71   day 1            73.0
##  9 IMT-M13  65   day 1            75.1
## 10 IMT-M14  73   day 1            77.5
## 11 IMT-M15  90.8 day 2            76.6
## 12 IMT-M16  97.4 day 2            77  
## 13 IMT-M17  91.1 day 2            80.2
## 14 IMT-M19  90.7 day 2            82.9
## 15 IMT-M20  78.1 day 2            74.5
## 16 IMT-M21  88.4 day 2            74.9
## 17 IMT-M22  85.5 day 2            79.8
## 18 IMT-M23  97.6 day 2            82.3
## 19 IMT-M24  82.1 day 2            80.6
## 20 IMT-M25  71.5 day 2            80.1
## 21 IMT-M27  92.8 day 2            80.5

The tibble constructed can be compared with the results table available directly from the website to be assured of its validity.

A short analysis was undertaken to describe the data and to make inferences about the differences between the two groups.

Results

Descriptive Analysis

Mean, median, standard deviation, interquartile range, minimum and maximum were calculated for the grade variable in each group, as well as for the overall distribution.

## grouping by day and then employing summary statistics
tbl1 %>% group_by(Day) %>% summarise(N = n(), Grade.mean = mean(Grade),
                                     Grade.median = median(Grade),
                                     Grade.std.dev = sd(Grade),
                                     Grade.IQR = IQR(Grade),
                                     Grade.minimal = min(Grade),
                                     Grade.maximum = max(Grade))-> summarised

## creating summary statistics for the entire distribution
tbl1 %>% summarise(N = n(), Grade.mean = mean(Grade),
                    Grade.median = median(Grade),
                    Grade.std.dev = sd(Grade),
                    Grade.IQR = IQR(Grade),
                    Grade.minimal = min(Grade),
                    Grade.maximum = max(Grade)) %>% 
        mutate (Day = "Overall") -> summarisedTotal

## printing the descriptive statistics by group and overall
rbind(summarised, summarisedTotal)

## # A tibble: 3 × 8
##   Day         N Grade.mean Grade.median Grade.std.dev Grade.IQR Grade.minimal
##   <fct>   <int>      <dbl>        <dbl>         <dbl>     <dbl>         <dbl>
## 1 day 1      10       73.4         75.2          5.86      4.88          62.5
## 2 day 2      11       87.8         90.7          7.99      8.17          71.5
## 3 Overall    21       81.0         79           10.1      16.2           62.5
## # ℹ 1 more variable: Grade.maximum <dbl>

From these values, one can already see that the mean grade on day 2 (87.8) is considerably higher than the maximum grade on day 1 (81.0).
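The separation between the groups can also be counted directly. The sketch below repeats the grade vector from Methods so that it runs on its own, and counts how many day 2 grades lie above the best day 1 grade. The object names `day1`, `day2` and `above` are illustrative and not part of the original analysis.

```r
## grades as listed in Methods: the first ten are day 1, the last eleven day 2
grade <- c(81.00, 76.50, 74.50, 79.00, 76.00, 62.50, 76.00, 71.00, 65.00, 73.00,
           90.80, 97.35, 91.09, 90.69, 78.14, 88.45, 85.50, 97.65, 82.10, 71.50,
           92.85)
day1 <- grade[1:10]
day2 <- grade[11:21]

## number of day 2 grades strictly above the day 1 maximum
above <- sum(day2 > max(day1))
cat(above, "of", length(day2), "day 2 grades exceed the day 1 maximum of",
    max(day1), "\n")
```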

Density plots were built for the distribution of grades as a whole and then for each group separately. Solid lines represent the mean and dashed lines represent the median.

## Loading ggplot2 to build plots
library(ggplot2)

## creating a density plot
ggplot(tbl1, aes(x = Grade)) + geom_density() +
        ## adding a vertical solid line for the mean
        geom_vline(aes(xintercept = mean(Grade))) +
        ## adding a vertical dashed line for the median
        geom_vline(aes(xintercept = median(Grade)),
                   linetype = "dashed")

## creating a density plot for the distributions in day 1 and 2 separately
ggplot(tbl1, aes(x=Grade, color = Day)) + geom_density() +
        ## adding the means and medians for each group
        geom_vline(data = summarised, aes(xintercept = Grade.mean, color = Day)) +
        geom_vline(data = summarised, aes(xintercept = Grade.median, color = Day),
                   linetype = "dashed") +
        ## labelling the lines; annotate() draws each label once, whereas
        ## geom_text() with constant aesthetics draws it once per data row
        annotate("text", x = 73.45000, y = 0.0375, label = "\nMean",
                 family = "sans", colour = "red", angle = 90) +
        annotate("text", x = 75.25, y = 0.0625, label = "\nMedian",
                 family = "sans", colour = "red", angle = 90) +
        annotate("text", x = 87.82909, y = 0.075, label = "\nMean",
                 family = "sans", colour = "cyan", angle = 90) +
        annotate("text", x = 90.69, y = 0.0625, label = "\nMedian",
                 family = "sans", colour = "cyan", angle = 90)

The density plot for the overall distribution resembles a bimodal distribution. The plot by group shows that the two modes correspond to the two presentation days.

Inferential Analysis

To decide whether parametric or non-parametric inferential tests were appropriate, the Shapiro-Wilk test for normality was conducted on the entire distribution.

tbl1 %>% pull(Grade) %>% shapiro.test

## 
##  Shapiro-Wilk normality test
## 
## data:  .
## W = 0.96377, p-value = 0.5951

As the test did not reject the null hypothesis of normality, a t-test was conducted to assess whether the difference observed between the groups could be attributed to chance. R's t.test() performs Welch's t-test by default, which does not assume equal variances in the two groups. The test was two-sided, with the significance level (alpha) set at 0.05.

t.test(data = tbl1, Grade ~ Day)

## 
##  Welch Two Sample t-test
## 
## data:  Grade by Day
## t = -4.7303, df = 18.232, p-value = 0.0001616
## alternative hypothesis: true difference in means between group day 1 and group day 2 is not equal to 0
## 95 percent confidence interval:
##  -20.759616  -7.998566
## sample estimates:
## mean in group day 1 mean in group day 2 
##            73.45000            87.82909

The null hypothesis of no difference between the groups was rejected at this significance level. The 95% confidence interval for the difference in means (day 1 minus day 2) runs from -20.8 to -8.0; that is, day 2 grades were on average between 8.0 and 20.8 points higher.
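To make the origin of the t statistic and the fractional degrees of freedom explicit, the Welch test can be reproduced by hand from the group sizes, means and variances. The following is a minimal sketch that repeats the grade vector so that it runs on its own; the names `g`, `se2`, `t_manual`, `df_manual` and `p_manual` are illustrative and not part of the original analysis.

```r
## grades and day factor as defined in Methods
grade <- c(81.00, 76.50, 74.50, 79.00, 76.00, 62.50, 76.00, 71.00, 65.00, 73.00,
           90.80, 97.35, 91.09, 90.69, 78.14, 88.45, 85.50, 97.65, 82.10, 71.50,
           92.85)
day <- factor(rep(c("day 1", "day 2"), c(10, 11)))

## per-group sample sizes, means and variances
g <- split(grade, day)
n <- lengths(g)
m <- sapply(g, mean)
v <- sapply(g, var)

## squared standard error contributed by each group
se2 <- v / n

## Welch t statistic and Welch-Satterthwaite degrees of freedom
t_manual  <- (m[["day 1"]] - m[["day 2"]]) / sqrt(sum(se2))
df_manual <- sum(se2)^2 / sum(se2^2 / (n - 1))

## two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df_manual)

## comparison with t.test(); TRUE means agreement up to numerical tolerance
fit <- t.test(grade ~ day)
all.equal(c(t_manual, df_manual, p_manual),
          unname(c(fit$statistic, fit$parameter, fit$p.value)))
```

The degrees of freedom are not an integer because the Welch-Satterthwaite approximation weights each group's variance by its sample size instead of pooling them.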

For the grades awarded in the first phase, in which the written version of the project was evaluated, the same Welch t-test comparing the two groups gives the following:

t.test(data = tbl1, GradeFirstPhase ~ Day)

## 
##  Welch Two Sample t-test
## 
## data:  GradeFirstPhase by Day
## t = -1.6426, df = 17.639, p-value = 0.1182
## alternative hypothesis: true difference in means between group day 1 and group day 2 is not equal to 0
## 95 percent confidence interval:
##  -5.1456523  0.6336523
## sample estimates:
## mean in group day 1 mean in group day 2 
##              76.774              79.030

The difference is not significant at the 0.05 level, which suggests that the two groups were comparable after the first phase and that the discrepancy arose in the evaluation of the second phase.
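One way to examine this reasoning jointly is to regress the second-phase grade on the day of presentation while adjusting for the first-phase grade. The following is a minimal sketch and not part of the original analysis: the model form and the object name `fit_adj` are assumptions, and with only 21 observations the estimates should be read with caution.

```r
## grades, first-phase grades and day factor as defined in Methods
grade <- c(81.00, 76.50, 74.50, 79.00, 76.00, 62.50, 76.00, 71.00, 65.00, 73.00,
           90.80, 97.35, 91.09, 90.69, 78.14, 88.45, 85.50, 97.65, 82.10, 71.50,
           92.85)
gradeFirstPhase <- c(78.01, 78.06, 73.04, 83.65, 73.88, 75.27, 80.16, 73.02,
                     75.11, 77.54, 76.60, 77.00, 80.16, 82.87, 74.54, 74.87,
                     79.80, 82.28, 80.55, 80.14, 80.52)
day <- factor(rep(c("day 1", "day 2"), c(10, 11)))

## linear model: second-phase grade explained by first-phase grade and day
fit_adj <- lm(grade ~ gradeFirstPhase + day)

## the row "dayday 2" estimates the day 2 versus day 1 difference
## for candidates with the same first-phase grade
summary(fit_adj)$coefficients
```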

Sensitivity Analysis

To check that the results do not depend on the assumption that second-phase grades are roughly normally distributed, a Mann-Whitney test (Wilcoxon rank-sum test) was also conducted.

wilcox.test(data = tbl1, Grade ~ Day)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Grade by Day
## W = 9, p-value = 0.00135
## alternative hypothesis: true location shift is not equal to 0

The null hypothesis was also rejected by this test, which does not rely on the assumption of normality.
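The Shapiro-Wilk test in the Inferential Analysis was applied to the pooled grades, whereas the normality assumption of the t-test concerns the grades within each group. As a further check, the test can be run separately for each day. This is a minimal sketch that repeats the data so that it runs on its own; the name `p_by_day` is illustrative and not part of the original analysis.

```r
## grades and day factor as defined in Methods
grade <- c(81.00, 76.50, 74.50, 79.00, 76.00, 62.50, 76.00, 71.00, 65.00, 73.00,
           90.80, 97.35, 91.09, 90.69, 78.14, 88.45, 85.50, 97.65, 82.10, 71.50,
           92.85)
day <- factor(rep(c("day 1", "day 2"), c(10, 11)))

## Shapiro-Wilk p-value for each presentation day
p_by_day <- sapply(split(grade, day), function(x) shapiro.test(x)$p.value)
p_by_day
```

With ten and eleven observations per group the test has little power, so a non-significant result is weak evidence of normality; this is one reason the rank-based test above is a useful complement.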

Discussion

The results described above show that the oral presentation grades of the candidates in the admissions process for the 2024 cohort of the Infectious Diseases and Tropical Medicine Master’s Program at UFMG differed significantly depending on the day on which the presentation took place.

Further investigation is necessary to ascertain the underlying reasons for this discrepancy. Appropriate hypotheses should consider: the effect that the day of the week can have on the presenters and the evaluators; whether the students’ IDs were assigned randomly or whether some factor led certain students to receive the lowest numbers and others the highest; and, perhaps most importantly, whether different evaluators were involved on the two presentation days.

Given that the admissions process aims to select the most suitable candidates for the program, an in-depth analysis of the aforementioned factors contributing to the observed differences would be of considerable merit.