1 Hypothesis 1

1.1 Student Distress Over Time

We hypothesize that student distress, as represented by mean PHQ-9 score, will have risen significantly over time.

Furthermore, we hypothesize that there will be significant differences in PHQ-9 scores across demographic variables including:

  • Gender
  • Race
  • Sexual orientation
  • Year in education (e.g., first year, sophomore)

Specifically, we hypothesize that students with marginalized identities will have higher PHQ-9 scores than their peers.

1.1.1 Score and Time

# Simple linear regression of PHQ-9 score on survey period
model <- lm(Score ~ Period, data = data)
summary(model)

Call:
lm(formula = Score ~ Period, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.4902  -4.0158  -0.1107   4.0790  13.4585 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 13.25694    0.24641  53.801  < 2e-16 ***
Period       0.09486    0.02663   3.562 0.000376 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.471 on 2411 degrees of freedom
Multiple R-squared:  0.005234,  Adjusted R-squared:  0.004821 
F-statistic: 12.69 on 1 and 2411 DF,  p-value: 0.0003756

At the α = .05 significance level, we reject the null hypothesis that the slope equals zero (t = 3.56, p < .001), so there is statistically significant evidence of a linear relationship between PHQ-9 score and Period. The estimated slope of 0.0949 means that the mean PHQ-9 score increases by approximately 0.095 points per semester. However, the R² value of 0.0052 shows that Period explains only about 0.5% of the variability in PHQ-9 scores. The upward trend over time is therefore statistically significant, but its magnitude is small.
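To make the size of the trend more concrete, the sketch below recomputes the t statistic and an approximate 95% confidence interval for the slope from the rounded values printed in the output above. It is only an illustration: the variable names are ours, and because the inputs are rounded, the result may differ slightly from what confint(model) would return on the full data.

```r
# Values copied from the summary(model) output above (rounded as printed)
estimate  <- 0.09486   # slope for Period
std_error <- 0.02663   # standard error of the slope
df_resid  <- 2411      # residual degrees of freedom

# t statistic: estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided 95% confidence interval for the slope
t_crit <- qt(0.975, df = df_resid)
ci <- c(
  lower = estimate - t_crit * std_error,
  upper = estimate + t_crit * std_error
)

round(t_value, 3)
round(ci, 3)
```

The interval excludes zero, which agrees with the significant t-test, but both ends of it correspond to a change of well under a fifth of a PHQ-9 point per period.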

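library(dplyr)    # %>%, group_by(), summarise()
library(ggplot2)  # ggplot()

# Mean PHQ-9 score and number of non-missing scores in each period
mean_by_period <- data %>%
  group_by(Period) %>%
  summarise(
    mean_score = mean(Score, na.rm = TRUE),
    n = sum(!is.na(Score))
  )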

ggplot(mean_by_period, aes(x = Period, y = mean_score)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Mean PHQ-9 Score by Period",
    x = "Period",
    y = "Mean Score"
  )

1.1.2 Score and Gender

# Gender2 is a character column, so lm() treats it as a factor; "Man" is the reference level
model <- lm(Score ~ Gender2, data = data)
summary(model)

Call:
lm(formula = Score ~ Gender2, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.0922  -3.8546   0.1454   4.1454  13.1454 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)         14.0922     0.2492  56.554  < 2e-16 ***
Gender2Non-binary    4.0165     0.8402   4.780 1.86e-06 ***
Gender2PNA           4.1459     1.2135   3.417 0.000645 ***
Gender2Trans man     2.4078     1.7389   1.385 0.166300    
Gender2Trans woman   2.9078     2.4466   1.189 0.234754    
Gender2Woman        -0.2376     0.2795  -0.850 0.395210    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.442 on 2403 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.01795,   Adjusted R-squared:  0.01591 
F-statistic: 8.784 on 5 and 2403 DF,  p-value: 2.858e-08

At the α = .05 significance level, we reject the null hypothesis of equal mean PHQ-9 scores across gender groups (F(5, 2403) = 8.78, p < .001); at least one gender group differs in mean PHQ-9 score. The effect is small, though: gender explains only about 1.8% of the variability in scores (R² = 0.018). Relative to men, the reference group, the Non-binary and PNA groups score about 4 points higher on average, and both differences are significant. The estimates for trans men and trans women are also positive but not significant, with large standard errors (1.74 and 2.45).

# Mean PHQ-9 score within each gender group
mean_score_gender <- data %>% 
  group_by(Gender2) %>% 
  summarise(avg_score = mean(Score, na.rm = TRUE))

mean_score_gender
# A tibble: 7 × 2
  Gender2     avg_score
  <chr>           <dbl>
1 Man              14.1
2 Non-binary       18.1
3 PNA              18.2
4 Trans man        16.5
5 Trans woman      17  
6 Woman            13.9
7 <NA>             14.8
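The regression coefficients and the group averages carry the same information: each coefficient is an offset from the reference group's mean. As a quick check, the sketch below rebuilds the group means from the rounded coefficients printed in the gender model output. The vector names are ours and serve only as an illustration.

```r
# Coefficients copied from the gender model output (rounded as printed)
intercept <- 14.0922  # mean for the reference level, "Man"
offsets <- c(
  "Non-binary"  = 4.0165,
  "PNA"         = 4.1459,
  "Trans man"   = 2.4078,
  "Trans woman" = 2.9078,
  "Woman"       = -0.2376
)

# Fitted group mean = intercept + that group's coefficient
group_means <- c("Man" = intercept, intercept + offsets)

round(group_means, 1)
```

The rounded values should line up with the avg_score column for the six named groups. The <NA> row has no counterpart because lm() drops observations with a missing predictor.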

1.1.3 Score and Race

# Race2 is a character column; "African/Afro-Caribbean/Black" is the reference level
model <- lm(Score ~ Race2, data = data)
summary(model)

Call:
lm(formula = Score ~ Race2, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.9866  -3.7587   0.2413   4.2413  13.2413 

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)                         14.98661    0.36518  41.039  < 2e-16 ***
Race2Arab/ME                         2.32109    1.55922   1.489  0.13672    
Race2Asia/PI                        -0.58661    0.68210  -0.860  0.38988    
Race2DNI                             2.14673    1.45767   1.473  0.14096    
Race2Multi-ethnic                    0.03034    0.62170   0.049  0.96108    
Race2Native American/Alaskan Native -1.22789    0.38715  -3.172  0.00154 ** 
Race2PNA                            -0.79911    0.86930  -0.919  0.35805    
Race2White                          -0.13105    0.68210  -0.192  0.84766    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.465 on 2397 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared:  0.0112,    Adjusted R-squared:  0.008313 
F-statistic: 3.879 on 7 and 2397 DF,  p-value: 0.0003296

At the α = .05 significance level, we reject the null hypothesis of equal mean PHQ-9 scores across racial groups (F(7, 2397) = 3.88, p < .001); at least one racial group differs in mean PHQ-9 score. The effect is very small, though: race explains only about 1.1% of the variability in scores (R² = 0.011). The only individual coefficient that is significant is Native American/Alaskan Native, whose mean is about 1.2 points lower than that of the reference group (African/Afro-Caribbean/Black).
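The overall F-test and R² are two views of the same quantity, which is why a highly significant F can coexist with a tiny effect when the sample is large. A minimal sketch of that relationship, using the rounded values printed in the race model output (variable names are ours):

```r
# Values copied from the race model output (rounded as printed)
r_squared <- 0.0112   # Multiple R-squared
k         <- 7        # number of Race2 coefficients, excluding the intercept
df_resid  <- 2397     # residual degrees of freedom

# F = (explained variance per model df) / (unexplained variance per residual df)
f_stat <- (r_squared / k) / ((1 - r_squared) / df_resid)

# Upper-tail p-value from the F distribution
p_value <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)

round(f_stat, 2)
signif(p_value, 2)
```

Because R² is rounded in the printout, the recomputed F and p-value only approximately match the reported ones.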

# Mean PHQ-9 score within each racial group
mean_score_race <- data %>% 
  group_by(Race2) %>% 
  summarise(avg_score = mean(Score, na.rm = TRUE))

mean_score_race
# A tibble: 9 × 2
  Race2                          avg_score
  <chr>                              <dbl>
1 African/Afro-Caribbean/Black        15.0
2 Arab/ME                             17.3
3 Asia/PI                             14.4
4 DNI                                 17.1
5 Multi-ethnic                        15.0
6 Native American/Alaskan Native      13.8
7 PNA                                 14.2
8 White                               14.9
9 <NA>                                11.4

1.1.4 Score and Sexual Orientation

# Sorient2 is a character column; "Asexual" is the reference level
model <- lm(Score ~ Sorient2, data = data)
summary(model)

Call:
lm(formula = Score ~ Sorient2, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.867  -4.000  -0.336   3.664  13.664 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)          14.16667    0.66257  21.381   <2e-16 ***
Sorient2Bisexual      1.70023    0.73341   2.318   0.0205 *  
Sorient2DNI          -0.09524    1.21399  -0.078   0.9375    
Sorient2Gay/lesbian   1.22013    0.84400   1.446   0.1484    
Sorient2Heterosexual -0.83068    0.67582  -1.229   0.2191    
Sorient2Panromantic  -0.76667    2.49675  -0.307   0.7588    
Sorient2Pansexual     3.21171    1.10547   2.905   0.0037 ** 
Sorient2PNA           0.83333    0.84100   0.991   0.3218    
Sorient2Queer         2.48551    1.03386   2.404   0.0163 *  
Sorient2Questioning   1.23333    0.88310   1.397   0.1627    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.383 on 2398 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.04094,   Adjusted R-squared:  0.03734 
F-statistic: 11.37 on 9 and 2398 DF,  p-value: < 2.2e-16

At the α = .05 significance level, we reject the null hypothesis of equal mean PHQ-9 scores across sexual orientation groups (F(9, 2398) = 11.37, p < .001); at least one sexual orientation group differs in mean PHQ-9 score. Sexual orientation explains about 4.1% of the variability in scores (R² = 0.041). Although the effect is still small, it is larger than the effects observed for gender and race. Relative to the reference group (Asexual), the Pansexual, Queer, and Bisexual groups score significantly higher, by about 3.2, 2.5, and 1.7 points respectively. Heterosexual respondents have the lowest group mean (13.3).
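With nine coefficient tests against the same reference group, some small p-values could arise by chance. One common safeguard, which the original analysis did not apply and which we show only as an optional sketch, is a Holm adjustment using p.adjust() from base R. The p-values are copied from the output above; the vector names are ours.

```r
# Coefficient p-values copied from the sexual orientation model output
p_raw <- c(
  "Bisexual"     = 0.0205,
  "DNI"          = 0.9375,
  "Gay/lesbian"  = 0.1484,
  "Heterosexual" = 0.2191,
  "Panromantic"  = 0.7588,
  "Pansexual"    = 0.0037,
  "PNA"          = 0.3218,
  "Queer"        = 0.0163,
  "Questioning"  = 0.1627
)

# Holm step-down adjustment for multiple comparisons
p_holm <- p.adjust(p_raw, method = "holm")

# Groups that still differ from the reference level at alpha = .05
names(p_raw)[p_holm < 0.05]
round(p_holm, 3)
```

Note that these tests compare each group only with the Asexual reference level. Comparing every pair of groups would need a separate procedure such as TukeyHSD() on an aov() fit.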

# Mean PHQ-9 score within each sexual orientation group
mean_score_sorient <- data %>% 
  group_by(Sorient2) %>% 
  summarise(avg_score = mean(Score, na.rm = TRUE))

mean_score_sorient
# A tibble: 11 × 2
   Sorient2     avg_score
   <chr>            <dbl>
 1 Asexual           14.2
 2 Bisexual          15.9
 3 DNI               14.1
 4 Gay/lesbian       15.4
 5 Heterosexual      13.3
 6 PNA               15  
 7 Panromantic       13.4
 8 Pansexual         17.4
 9 Queer             16.7
10 Questioning       15.4
11 <NA>              14.6

1.1.5 Score and Class

# Class2 is a character column; "First year" is the reference level
model <- lm(Score ~ Class2, data = data)
summary(model)

Call:
lm(formula = Score ~ Class2, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.4360  -4.0891  -0.0891   3.9109  14.0056 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     14.08912    0.21197  66.467  < 2e-16 ***
Class2Junior     0.03717    0.31047   0.120   0.9047    
Class2Post bacc  1.61088    0.88800   1.814   0.0698 .  
Class2Senior    -0.03129    0.34148  -0.092   0.9270    
Class2Senior+   -2.09474    0.46048  -4.549 5.66e-06 ***
Class2Sophomore  0.34687    0.31641   1.096   0.2731    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.454 on 2406 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.01304,   Adjusted R-squared:  0.01099 
F-statistic: 6.358 on 5 and 2406 DF,  p-value: 7.121e-06

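At the α = .05 significance level, we reject the null hypothesis of equal mean PHQ-9 scores across class year groups (F(5, 2406) = 6.36, p < .001); at least one class year differs in mean PHQ-9 score. The effect is very small, though: year in education explains only about 1.3% of the variability in scores (R² = 0.013). The difference is driven by the Senior+ group, whose mean is about 2.1 points lower than that of first-year students, the reference group. No other class year differs significantly from first-year students at the .05 level (Post bacc: p = .07).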

# Mean PHQ-9 score within each class year
mean_score_class <- data %>% 
  group_by(Class2) %>% 
  summarise(avg_score = mean(Score, na.rm = TRUE))

mean_score_class
# A tibble: 7 × 2
  Class2     avg_score
  <chr>          <dbl>
1 First year      14.1
2 Junior          14.1
3 Post bacc       15.7
4 Senior          14.1
5 Senior+         12.0
6 Sophomore       14.4
7 <NA>             8  
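To compare the five models in this section on a common scale, the sketch below converts each reported R² into the percentage of variance explained and into Cohen's f² = R² / (1 − R²). This summary is our addition rather than part of the original analysis. The conventional benchmarks for f² (0.02 small, 0.15 medium, 0.35 large) are rules of thumb, not thresholds taken from this study.

```r
# Multiple R-squared values copied from the five model outputs in this section
r_squared <- c(
  "Period"             = 0.005234,
  "Gender"             = 0.01795,
  "Race"               = 0.0112,
  "Sexual orientation" = 0.04094,
  "Year in education"  = 0.01304
)

# Cohen's f-squared: ratio of explained to unexplained variance
f_squared <- r_squared / (1 - r_squared)

effect_sizes <- data.frame(
  predictor    = names(r_squared),
  pct_variance = round(100 * r_squared, 1),
  cohen_f2     = round(f_squared, 3),
  row.names    = NULL
)

# Largest effect first
effect_sizes[order(-effect_sizes$cohen_f2), ]
```

By this measure, sexual orientation is the only predictor whose f² exceeds the conventional "small" benchmark, and every predictor explains less than 5% of the variance.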

1.2 Conclusion

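Overall, the mean PHQ-9 score increased significantly over time, although Period explained less than 1% of the variability in scores. We also found significant differences in PHQ-9 scores across gender, race, sexual orientation, and year in education. Some marginalized groups had higher mean PHQ-9 scores than the reference groups, most clearly Non-binary respondents for gender and Pansexual, Queer, and Bisexual respondents for sexual orientation. The pattern did not hold for race, where the only significant difference was a lower mean for Native American/Alaskan Native respondents.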

However, it is important to note that the sample sizes for each period were relatively small and varied considerably from period to period. The survey was also not balanced across demographics. For example, women made up 1,850 of the 2,413 respondents (about 77%), which may influence the results.

Lastly, the overall mean PHQ-9 score across the study was 14.04. This places the average respondent within the moderate depression range, which may indicate that students experiencing greater distress were more likely to participate in the survey.
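For reference, the severity label can be derived from the conventional PHQ-9 cut points (0–4 minimal, 5–9 mild, 10–14 moderate, 15–19 moderately severe, 20–27 severe). The helper below, phq9_band(), is a hypothetical function written for this illustration. Because a mean is not an integer, it treats each band as a half-open interval, so a value such as 14.04 falls in [10, 15).

```r
# Hypothetical helper: map PHQ-9 scores (0 to 27) to conventional severity bands
phq9_band <- function(score) {
  cut(
    score,
    breaks = c(0, 5, 10, 15, 20, 27),
    labels = c("minimal", "mild", "moderate", "moderately severe", "severe"),
    right = FALSE,          # intervals are [lower, upper)
    include.lowest = TRUE   # so the maximum score, 27, falls in "severe"
  )
}

# Band for the overall study mean reported above
as.character(phq9_band(14.04))
```

The same function could be applied to the Score column to tabulate how many respondents fall in each band, which would describe the sample more fully than the mean alone.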

imbalance <- data %>% group_by(Gender2) %>% summarise(counts = n())
imbalance
# A tibble: 7 × 2
  Gender2     counts
  <chr>        <int>
1 Man            477
2 Non-binary      46
3 PNA             21
4 Trans man       10
5 Trans woman      5
6 Woman         1850
7 <NA>             4
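The imbalance is easier to judge as percentages. The sketch below converts the counts in the table above into shares of the full sample. The vector is typed in by hand from that table, with the <NA> row labelled "Missing".

```r
# Counts copied from the gender table above
counts <- c(
  "Man"         = 477,
  "Non-binary"  = 46,
  "PNA"         = 21,
  "Trans man"   = 10,
  "Trans woman" = 5,
  "Woman"       = 1850,
  "Missing"     = 4
)

# Share of the full sample in each group, in percent
shares <- round(100 * counts / sum(counts), 1)

sum(counts)
shares
```

Women make up roughly three quarters of the sample, while the two trans groups together contribute 15 respondents, so estimates for the smaller groups are imprecise.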

2 Hypothesis 2

