library(dplyr)
data <- readxl::read_excel("DevStudentsOverTime.xlsx")

Comparison Groups

# Students who did not graduate
group1 <- data %>%
  filter(GRADUATED == "NO")

# Students who graduated
group2 <- data %>%
  filter(GRADUATED == "YES")

Group 1 contains the students who did not graduate; group 2 contains the students who graduated.

Normality Check

hist(group1$HOURS_EARNED)

hist(group2$HOURS_EARNED)

shapiro.test(group1$HOURS_EARNED)
## 
##  Shapiro-Wilk normality test
## 
## data:  group1$HOURS_EARNED
## W = 0.83686, p-value < 2.2e-16
shapiro.test(group2$HOURS_EARNED)
## 
##  Shapiro-Wilk normality test
## 
## data:  group2$HOURS_EARNED
## W = 0.96089, p-value < 2.2e-16
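
A Q-Q plot is a common companion to the histogram and the Shapiro-Wilk test when checking normality. The sketch below is illustrative only: `hours` is simulated, right-skewed data standing in for HOURS_EARNED, and the 0.05 cutoff is an assumed convention, not something taken from this analysis.

```r
# Hypothetical data: simulated, right-skewed credit hours (NOT the study data).
set.seed(1)
hours <- rexp(500, rate = 1 / 30)

# Q-Q plot: points bending away from the reference line suggest non-normality.
qqnorm(hours)
qqline(hours)

# Shapiro-Wilk test: a small p-value rejects the hypothesis of normality.
sw <- shapiro.test(hours)

# Assumed decision rule: fall back to a rank-based test when normality is rejected.
use_nonparametric <- sw$p.value < 0.05
use_nonparametric
```

With large samples the Shapiro-Wilk test flags even mild departures from normality, so the plot is worth reading alongside the p-value.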

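Both Shapiro-Wilk tests reject normality (p < 2.2e-16), so a nonparametric test is used. Is there a significant difference in hours earned between the non-graduate and graduate groups, or could the observed difference be due to chance?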

Statistical Test

wilcox.test(group1$HOURS_EARNED, group2$HOURS_EARNED, paired = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  group1$HOURS_EARNED and group2$HOURS_EARNED
## W = 890892, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
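
The rank sum test reports whether the two distributions differ, but not in which direction or by how much. One possible follow-up, sketched here on simulated data, is to divide W by the number of pairs to estimate the probability that a value from the first group exceeds a value from the second. The vectors `no_hours` and `yes_hours` are hypothetical stand-ins for the two groups, not the study data.

```r
# Hypothetical data: simulated hours for two groups (NOT the study data).
set.seed(42)
no_hours  <- rexp(300, rate = 1 / 20)  # stand-in for non-graduates
yes_hours <- rexp(200, rate = 1 / 60)  # stand-in for graduates

res <- wilcox.test(no_hours, yes_hours, paired = FALSE)

# W counts the (x, y) pairs in which x exceeds y (ties count as one half),
# so W divided by the number of pairs estimates P(x > y).
n_pairs <- length(no_hours) * length(yes_hours)
prob_no_greater <- unname(res$statistic) / n_pairs
prob_no_greater

# Group medians show the direction of the shift.
median(no_hours)
median(yes_hours)
```

A value near 0.5 would indicate heavily overlapping groups; values near 0 or 1 indicate that one group almost always has the larger value.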

Total credit hours earned per group

sum(group1$HOURS_EARNED)  # NO group
## [1] 36601
sum(group2$HOURS_EARNED)  # YES group
## [1] 67289

Although the NO group contains more students, the YES group earned more total credit hours (67,289 versus 36,601).
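
Because the groups differ in size, totals alone do not show how many hours a typical student earned. A minimal sketch of a per-group summary follows, using a hypothetical toy data frame `toy` with the same column names as the data (the values are invented for illustration) and base R so that it runs without extra packages.

```r
# Hypothetical toy data (NOT the study data): more NO students, fewer hours each.
toy <- data.frame(
  GRADUATED    = c("NO", "NO", "NO", "NO", "YES", "YES"),
  HOURS_EARNED = c(10, 20, 15, 15, 60, 70)
)

# Group size and total hours per group.
n_by_group     <- tapply(toy$HOURS_EARNED, toy$GRADUATED, length)
total_by_group <- tapply(toy$HOURS_EARNED, toy$GRADUATED, sum)

# Mean hours per student puts groups of different sizes on the same scale.
summary_tbl <- data.frame(
  n_students  = as.vector(n_by_group),
  total_hours = as.vector(total_by_group),
  mean_hours  = as.vector(total_by_group / n_by_group),
  row.names   = names(n_by_group)
)
summary_tbl
```

Applied to the real data, the same summary would report the group sizes next to the totals, which makes the comparison of hours per student explicit.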