How do different factors contribute to medical students’ Center for Epidemiologic Studies Depression Scale (CES-D) scores?

Introduction

To answer the question, I will be using the dataset from the study: The relationship between medical students’ empathy, mental health, and burnout, published under the National Institute of Health. The study explores the relationship between various factors such as age and relationship status with mental health using standardized scores.

I chose this topic because I was interested to see which factors were most significant in contributing to depression in students.

Study dataset provided by the EU Open Research Repository:
https://zenodo.org/records/5702895#.Y8OraNJBwUE

Key Variables:

Age: Age (years)
Year: Year in medical school (1-5)
Sex: Gender (man, women, non-binary)
Part: Partner Status (yes/no)
Job: Job Status (yes/no)
Stud_h: Hours studied weekly
Health: Health satisfaction (Very dissatisfied - Very Satisfied)
CESD: CES-D depression score (continuous from 0-60)

Load libraries and dataset

library(tidyverse)
students <- read_csv("Data_Carrard_et_al_2022_MedTeach.csv")
dim(students)
## [1] 886  20

Data Analysis

This dataset’s classes and variable names are mostly clean; however, the data is presented in numbers which can only be interpreted with the provided reference sheet. I will replace these numbers with the written data. I will then select only relevant variables.

#str(students)
head(students)
## # A tibble: 6 × 20
##      id   age  year   sex glang  part   job stud_h health  psyt  jspe qcae_cog
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>    <dbl>
## 1     2    18     1     1   120     1     0     56      3     0    88       62
## 2     4    26     4     1     1     1     0     20      4     0   109       55
## 3     9    21     3     2     1     0     0     36      3     0   106       64
## 4    10    21     2     2     1     0     1     51      5     0   101       52
## 5    13    21     3     1     1     1     0     22      4     0   102       58
## 6    14    26     5     2     1     1     1     10      2     0   102       48
## # ℹ 8 more variables: qcae_aff <dbl>, amsp <dbl>, erec_mean <dbl>, cesd <dbl>,
## #   stai_t <dbl>, mbi_ex <dbl>, mbi_cy <dbl>, mbi_ea <dbl>

Replace numeric placeholders

students$sex[students$sex == 1] <- "Man"
students$sex[students$sex == 2] <- "Woman"
students$sex[students$sex == 3] <- "Non-binary"

students$health[students$health == 1] <- "Very dissatisfied"
students$health[students$health == 2] <- "Dissatisfied"
students$health[students$health == 3] <- "Neutral"
students$health[students$health == 4] <- "Satisfied"
students$health[students$health == 5] <- "Very satisfied"

Correct Variable Classes

students$part <- as.character(students$part)
students$job <- as.character(students$job)
students$year <- as.factor(students$year)

students$health <- factor(students$health, 
                          levels = c("Very dissatisfied",
                                     "Dissatisfied",
                                     "Neutral",
                                     "Satisfied",
                                     "Very satisfied"))

Select only useful variables

students2 <- students |>
  select(age,       #Select only useful variables
         year, 
         sex, 
         part, 
         job, 
         stud_h, 
         health,
         cesd) |>
  mutate(degree_level = case_when(         #Add a column for degree 
    year %in% c(4, 5, 6) ~ "Master",
    year %in% c(1, 2, 3) ~ "Bachelor"
  )) |>
  arrange(age)      #arrange from youngest to oldest

Statistical Analysis

I will be using a multiple linear regression model because I am using multiple variables to predict a continuous outcome variable (CES-D score).

Multiple Linear Regression Model

students_model <- lm(cesd ~ age + sex + stud_h + part + health, data = students2)
#job removed due to low p-value
#degree and year omitted due to multicollinearity
summary(students_model)
## 
## Call:
## lm(formula = cesd ~ age + sex + stud_h + part + health, data = students2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -25.591  -7.106  -1.318   6.005  37.302 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          19.41911    3.19790   6.072 1.88e-09 ***
## age                  -0.29896    0.10785  -2.772 0.005690 ** 
## sexNon-binary         0.65141    4.51573   0.144 0.885333    
## sexWoman              4.64600    0.72991   6.365 3.14e-10 ***
## stud_h                0.07646    0.02207   3.464 0.000558 ***
## part1                -1.18644    0.69003  -1.719 0.085893 .  
## healthDissatisfied   11.02541    1.94916   5.657 2.09e-08 ***
## healthNeutral         6.35826    1.83911   3.457 0.000572 ***
## healthSatisfied      -0.12952    1.70720  -0.076 0.939543    
## healthVery satisfied -4.44214    1.76330  -2.519 0.011938 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.898 on 876 degrees of freedom
## Multiple R-squared:  0.2641, Adjusted R-squared:  0.2565 
## F-statistic: 34.92 on 9 and 876 DF,  p-value: < 2.2e-16

Coefficients:

Age: ~-0.3, as age increases by 1 year, depression score tends to slightly decrease by around 0.3 points

Study Hours: ~0.08, as study hours increase by 1 hour, depression score slightly increases by around .08 points.

Partner status: Those with partners tend to have a lower depression score, around 1.19 points fewer than those without.

Sex (male baseline): Women tend to have a higher depression score than men (~4.65 points). There is not enough data on non-binary individuals to yield statistically significant results.

Health Satisfaction (baseline very dissatisfied): Those who are very satisfied with their health have lower depression scores (~4.44 fewer points) than those who are very dissatisfied. Those who are dissatisfied have the highest scores, though surprisingly not those who are very dissatisfied, who score about 11.02 points higher. Students who are neutral score around 6.36 points higher than very dissatisfied students.

P-value: 2.2e-16. This small p-value shows that the model is statistically significant. Sex (non-binary) and health (satisfied) are not significant as their p-values are greater than 0.05. (0.885333 and 0.939543)

Adjusted R^2: 0.2565. This indicates that around 25.7% of the variance in depression scores can be explained by this model. This does not surprise me as there are many factors that go into predicting depression beyond the variables in this model.

Model Assumptions and Diagnostics

Diagnostic Plots

par(mfrow=c(2,2)); plot(students_model); par(mfrow=c(1,1))

Linearity:
The residuals vs fitted plot shows randomly scattered residuals around zero with no clear pattern, indicating linearity.

Independence of observations:

plot(resid(students_model), type="b",
     main="Residuals vs Order", ylab="Residuals"); abline(h=0, lty=2)

The residuals vs order plot shows residuals centered evenly around zero, indicating independence.

Homoscedasticity

The spread of points across the residuals vs fitted plot line is distributed evenly around zero which indicates homoscedasticity. In addition, the scale-location plot’s line is mostly horizontal with an even spread of points which demonstrates homoscedasticity.

Normality of residuals:

The Q-Q residuals plot follows the horizontal line, indicating normality.

Multicollinearity

cor(students2[, c("age", "stud_h")], use = "complete.obs")
##               age     stud_h
## age     1.0000000 -0.2935569
## stud_h -0.2935569  1.0000000

There is little correlation between the variables, showing there is little multicollinearity in the model.

Conclusion

Key Findings

Gender and health satisfaction are most significant in predicting medical students’ depression scores, while age, partner status, and hours studied also contribute. Women and those who spend the most time studying tend to have higher depression scores, while those who are older, very satisfied with their health, and those with partners have the lowest depression scores.

Implications

My results answer my question of which factors influence medical students’ depression. This information is useful as it can highlight warning signs for students who may suffer from depression, allowing them to get treatment. It also highlights some factors that are controllable like hours studied, which shows that it is important to take breaks to avoid being more depressed.

Future Research

In the future, I would like to look into why medical students who are women tend to have higher depression scores. I would also like to see if depression scores change over time while students are in medical school, since I had to omit this variable due to multicollinearity with age.

References

Carrard V, Bourquin C, Berney S, Schlegel K, Gaume J, Bart PA, et al. The relationship between medical students’ empathy, mental health, and burnout: A cross-sectional study. Medical Teacher. 2022;44:1392–9. 10.1080/0142159X.2022.2098708