Predictors of Medical Students’ Anxiety, Depression and Burnout

Objective of the analysis

This analysis aims at identifying factors that predict an increased probability of mental heath issues among medical students. This should allow policy makers, faculties of medicine and employers to identify groups at risk.

Source

The data that were used for this analysis were made available on https://www.kaggle.com by Carrard et al. (2022).

Here is the direct link to the dataset and the codebook.

Furthermore, I looked for more background information on the different scales that were used to gather these data.

Stai Score as a metric of anxiety
Maslach Burnout Inventory as a metric of burnout
JSPE as a metric of empathy
AMSP scores as a metric of academic motivation
Gert Test as a metric of ability to identify emotions correctly in others

Processing the data

Before actually starting the analysis, we will start by exploring and preparing the dataset

We’ll use the , Tableone, Tidyverse, GGPLOT2 and DPLYR packages. So we’ll install them first.

–> If needed, you can use the code chunk below to install the packages

install.packages(“dplyr”) install.packages(“ggplot2”) install.packages(“tidyverse”) install.packages(“tableone”) install.packages(“corrplot”) install.packages(“PerformanceAnalytics”) install.packages(“Hmisc”) install.packages(“MASS”)

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.1.8
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ✔ readr     2.1.4

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors

library(tableone)

library(corrplot)

## corrplot 0.92 loaded

library(PerformanceAnalytics)

## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## 
## ################################### WARNING ###################################
## # We noticed you have dplyr installed. The dplyr lag() function breaks how    #
## # base R's lag() function is supposed to work, which breaks lag(my_xts).      #
## #                                                                             #
## # Calls to lag(my_xts) that you enter or source() into this session won't     #
## # work correctly.                                                             #
## #                                                                             #
## # All package code is unaffected because it is protected by the R namespace   #
## # mechanism.                                                                  #
## #                                                                             #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## # You can use stats::lag() to make sure you're not using dplyr::lag(), or you #
## # can add conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop   #
## # dplyr from breaking base R's lag() function.                                #
## ################################### WARNING ###################################
## 
## Attaching package: 'xts'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## 
## 
## Attaching package: 'PerformanceAnalytics'
## 
## The following object is masked from 'package:graphics':
## 
##     legend

library(Hmisc)

## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, units

library(MASS)

## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select

The first step is to load the CSV file that was previously downlaaded from Kaggle.

Mental_Health_Data <- read.csv("~/Downloads/Data_Carrard_et_al_2022_MedTeach.csv")

We start by taking a look at the dataset and the variables:

View(Mental_Health_Data)
glimpse(Mental_Health_Data)

## Rows: 886
## Columns: 20
## $ id        <int> 2, 4, 9, 10, 13, 14, 17, 21, 23, 24, 28, 34, 35, 36, 38, 40,…
## $ age       <int> 18, 26, 21, 21, 21, 26, 23, 23, 23, 22, 18, 18, 24, 21, 22, …
## $ year      <int> 1, 4, 3, 2, 3, 5, 5, 4, 4, 2, 1, 1, 5, 2, 2, 2, 1, 4, 5, 3, …
## $ sex       <int> 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 2, 2, …
## $ glang     <int> 120, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 90, 1, 1, 1, 90, 1, 1,…
## $ part      <int> 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, …
## $ job       <int> 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ stud_h    <int> 56, 20, 36, 51, 22, 10, 15, 8, 20, 20, 20, 9, 25, 51, 42, 40…
## $ health    <int> 3, 4, 3, 5, 4, 2, 3, 4, 2, 5, 4, 5, 5, 2, 3, 4, 4, 4, 4, 4, …
## $ psyt      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ jspe      <int> 88, 109, 106, 101, 102, 102, 117, 118, 118, 108, 105, 115, 9…
## $ qcae_cog  <int> 62, 55, 64, 52, 58, 48, 58, 65, 69, 56, 54, 68, 70, 53, 55, …
## $ qcae_aff  <int> 27, 37, 39, 33, 28, 37, 38, 40, 46, 36, 26, 36, 37, 37, 37, …
## $ amsp      <int> 17, 22, 17, 18, 21, 17, 23, 32, 23, 22, 25, 27, 28, 19, 15, …
## $ erec_mean <dbl> 0.7380952, 0.6904762, 0.6904762, 0.8333333, 0.6904762, 0.690…
## $ cesd      <int> 34, 7, 25, 17, 14, 14, 45, 6, 43, 11, 7, 5, 8, 19, 25, 18, 2…
## $ stai_t    <int> 61, 33, 73, 48, 46, 56, 56, 36, 43, 43, 32, 45, 38, 48, 45, …
## $ mbi_ex    <int> 17, 14, 24, 16, 22, 18, 28, 11, 26, 18, 11, 15, 11, 19, 19, …
## $ mbi_cy    <int> 13, 11, 7, 10, 14, 15, 17, 10, 21, 6, 6, 4, 10, 11, 7, 14, 1…
## $ mbi_ea    <int> 20, 26, 23, 21, 23, 18, 16, 27, 22, 23, 26, 26, 29, 20, 26, …

For our own convenience, we rename the variables:

Mental_Health_Data <- rename(Mental_Health_Data,language = glang, partner=part, study_hours=stud_h, self_reported_health=health, psychotherapy=psyt, empathy_jspe=jspe, cognitive_empathy=qcae_cog, affective_empathy=qcae_aff, academic_drive=amsp,emotion_recognition=erec_mean, depression=cesd, anxiety=stai_t, exhaustion=mbi_ex, cynicism=mbi_cy, personal_efficacy=mbi_ea)

names(Mental_Health_Data)

##  [1] "id"                   "age"                  "year"                
##  [4] "sex"                  "language"             "partner"             
##  [7] "job"                  "study_hours"          "self_reported_health"
## [10] "psychotherapy"        "empathy_jspe"         "cognitive_empathy"   
## [13] "affective_empathy"    "academic_drive"       "emotion_recognition" 
## [16] "depression"           "anxiety"              "exhaustion"          
## [19] "cynicism"             "personal_efficacy"

Let’s check out how many rows there are in our dataset.

nrow(Mental_Health_Data)

## [1] 886

Before moving on with the data preparation, we will make sure that the categorical variables receive labels for their values. We do this by creating new variables (same variable name with addition of ‘Label’). We keep the old variable so we can still use them as a binary variable for calculations further down the road.
* Sex_label * Language_label * Partner_label * Job_label * Psychotherapy_label

Mental_Health_Data$sex_label<-factor(Mental_Health_Data$sex,
                                 levels = c(1,2,3),
                                 labels = c("Man", "Woman", "Non-binary"))

Mental_Health_Data$language_label<-factor(Mental_Health_Data$language,
                                levels=c(1,15,20,37,51,52,53,54,59,60,62,63,82,83,84,85,86,87,88,89,90,92,93,94,95,96,98,100,101,102,104,106,108,112,113,114,116,117,118,119,120,121),
                                 labels = c("French", "German", "English", "Arab","Basque", "Bulgarian", "Catalan", "Chinese", "Korean", "Croatian", "Danish", "Spanish", "Estonian", "Finnish","Galician","Greek","Hebrew","Hindi","Hungarian","Indonesian","Italian","Japanese","Kazakh","Latvian","Lithuanian","Malay","Dutch","Norwegian","Polish","Portuguese","Romanian","Russian","Serbian","Slovak","Slovenian","Swedish","Czech","Thai","Turkish","Ukrainian","Vietnamese","Other"))

Mental_Health_Data$job_label<-factor(Mental_Health_Data$job,
                                 levels = c(0,1),
                                 labels = c("No", "Yes"))


Mental_Health_Data$psychotherapy_label<-factor(Mental_Health_Data$psychotherapy,
                                 levels = c(0,1),
                                 labels = c("No", "Yes"))

Mental_Health_Data$partner_label<-factor(Mental_Health_Data$partner,
                                 levels = c(0,1),
                                 labels = c("No", "Yes"))

In the next step, we will try to get a better understanding of the data by generating some descriptive statistics:

Table_descriptives<-CreateTableOne(data=Mental_Health_Data)
summary(Table_descriptives)

## 
##      ### Summary of continuous variables ###
## 
## strata: Overall
##                        n miss p.miss  mean    sd median   p25   p75  min  max
## id                   886    0      0 889.7 5e+02  876.0 447.5 1e+03  2.0 1790
## age                  886    0      0  22.4 3e+00   22.0  20.0 2e+01 17.0   49
## year                 886    0      0   3.1 2e+00    3.0   1.0 5e+00  1.0    6
## sex                  886    0      0   1.7 5e-01    2.0   1.0 2e+00  1.0    3
## language             886    0      0  14.3 3e+01    1.0   1.0 1e+00  1.0  121
## partner              886    0      0   0.6 5e-01    1.0   0.0 1e+00  0.0    1
## job                  886    0      0   0.3 5e-01    0.0   0.0 1e+00  0.0    1
## study_hours          886    0      0  25.3 2e+01   25.0  12.0 4e+01  0.0   70
## self_reported_health 886    0      0   3.8 1e+00    4.0   3.0 5e+00  1.0    5
## psychotherapy        886    0      0   0.2 4e-01    0.0   0.0 0e+00  0.0    1
## empathy_jspe         886    0      0 106.4 9e+00  107.0 101.0 1e+02 67.0  125
## cognitive_empathy    886    0      0  58.5 7e+00   58.0  54.0 6e+01 37.0   76
## affective_empathy    886    0      0  34.8 5e+00   35.0  31.0 4e+01 18.0   48
## academic_drive       886    0      0  23.2 5e+00   23.0  20.0 3e+01  6.0   35
## emotion_recognition  886    0      0   0.7 9e-02    0.7   0.7 8e-01  0.4    1
## depression           886    0      0  18.1 1e+01   16.0   9.0 2e+01  0.0   56
## anxiety              886    0      0  42.9 1e+01   43.0  34.0 5e+01 20.0   77
## exhaustion           886    0      0  16.9 5e+00   17.0  13.0 2e+01  5.0   30
## cynicism             886    0      0  10.1 5e+00    9.0   6.0 1e+01  4.0   24
## personal_efficacy    886    0      0  24.2 5e+00   24.0  21.0 3e+01 10.0   36
##                       skew  kurt
## id                    0.02 -1.21
## age                   2.07  9.43
## year                  0.26 -1.29
## sex                  -0.69 -1.10
## language              2.28  3.51
## partner              -0.26 -1.94
## job                   0.64 -1.60
## study_hours           0.45 -0.48
## self_reported_health -0.88  0.24
## psychotherapy         1.32 -0.25
## empathy_jspe         -0.66  0.71
## cognitive_empathy    -0.09  0.12
## affective_empathy    -0.20 -0.22
## academic_drive       -0.11 -0.05
## emotion_recognition  -0.41  0.24
## depression            0.68 -0.08
## anxiety               0.24 -0.41
## exhaustion            0.10 -0.39
## cynicism              0.75 -0.03
## personal_efficacy    -0.18 -0.19
## 
## =======================================================================================
## 
##      ### Summary of categorical variables ### 
## 
## strata: Overall
##                  var   n miss p.miss      level freq percent cum.percent
##            sex_label 886    0    0.0        Man  275    31.0        31.0
##                                           Woman  606    68.4        99.4
##                                      Non-binary    5     0.6       100.0
##                                                                         
##       language_label 886    0    0.0     French  717    80.9        80.9
##                                          German   31     3.5        84.4
##                                         English   22     2.5        86.9
##                                            Arab    3     0.3        87.2
##                                          Basque    0     0.0        87.2
##                                       Bulgarian    0     0.0        87.2
##                                         Catalan    0     0.0        87.2
##                                         Chinese    1     0.1        87.4
##                                          Korean    0     0.0        87.4
##                                        Croatian    3     0.3        87.7
##                                          Danish    0     0.0        87.7
##                                         Spanish    5     0.6        88.3
##                                        Estonian    0     0.0        88.3
##                                         Finnish    0     0.0        88.3
##                                        Galician    0     0.0        88.3
##                                           Greek    0     0.0        88.3
##                                          Hebrew    0     0.0        88.3
##                                           Hindi    0     0.0        88.3
##                                       Hungarian    0     0.0        88.3
##                                      Indonesian    0     0.0        88.3
##                                         Italian   45     5.1        93.3
##                                        Japanese    1     0.1        93.5
##                                          Kazakh    0     0.0        93.5
##                                         Latvian    0     0.0        93.5
##                                      Lithuanian    1     0.1        93.6
##                                           Malay    0     0.0        93.6
##                                           Dutch    1     0.1        93.7
##                                       Norwegian    0     0.0        93.7
##                                          Polish    0     0.0        93.7
##                                      Portuguese   27     3.0        96.7
##                                        Romanian    4     0.5        97.2
##                                         Russian    6     0.7        97.9
##                                         Serbian    1     0.1        98.0
##                                          Slovak    0     0.0        98.0
##                                       Slovenian    0     0.0        98.0
##                                         Swedish    1     0.1        98.1
##                                           Czech    0     0.0        98.1
##                                            Thai    0     0.0        98.1
##                                         Turkish    2     0.2        98.3
##                                       Ukrainian    0     0.0        98.3
##                                      Vietnamese    2     0.2        98.5
##                                           Other   13     1.5       100.0
##                                                                         
##            job_label 886    0    0.0         No  577    65.1        65.1
##                                             Yes  309    34.9       100.0
##                                                                         
##  psychotherapy_label 886    0    0.0         No  687    77.5        77.5
##                                             Yes  199    22.5       100.0
##                                                                         
##        partner_label 886    0    0.0         No  387    43.7        43.7
##                                             Yes  499    56.3       100.0
##

The tables above have provided us with a better understanding of the different variables and their distribution.

We also learned from the previous steps that there don’t seem to be any missing values.

When it comes to the respondents, we have learned the following:

The majority of the respondents (68.4%) is female. 31 % is male and 0,5% identifies as non-binary.
The respondents are medical students. The mean age is 22.4 at the time of completing the questionnaire.
On average, respondents spend 25 hours/week on their study. Interestingly, there is a lot of variance (SD=25).
The majority of the respondents are native French speakers (over 80 %).
34.9 % of the respondents has paying job next to their studies.
22.5 % of the respondents consulted a psychotherapist in the course of the last 12 months.
55,3 % of the respondes has a partner

Data analysis

Exploration of correlations

We start our analysis with checking out the existing correlations between the different variables.

col1 <- colorRampPalette(c("#7F0000","red","#FF7F00","yellow","white", 
        "cyan", "#007FFF", "blue","#00007F"))
col2 <- colorRampPalette(c("#67001F", "#B2182B", "#D6604D", "#F4A582", "#FDDBC7",
            "#FFFFFF", "#D1E5F0", "#92C5DE", "#4393C3", "#2166AC", "#053061"))  
col3 <- colorRampPalette(c("red", "white", "blue")) 
col4 <- colorRampPalette(c("#7F0000","red","#FF7F00","yellow","#7FFF7F", 
            "cyan", "#007FFF", "blue","#00007F"))   

M<-cor(Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','depression','anxiety','exhaustion','cynicism','personal_efficacy')])
corrplot(M, method ="number", col=col3(200), number.cex = .5)

M<-cor(Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','depression','anxiety','exhaustion','cynicism','personal_efficacy')])
corrplot(M, method ="number", number.cex = .5)

## using these color spectrums
corrplot(M, addrect=2, col=col1(100))

The included graphs don’t provide dus with any information on the significance levels of the correlations. Therefore, we also calculate the p-values for the correlations.

Cormatrix <- rcorr (as.matrix(Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','depression','anxiety','exhaustion','cynicism','personal_efficacy')]))

Cormatrix

##                        age   sex   job partner study_hours self_reported_health
## age                   1.00 -0.07  0.23    0.19       -0.29                -0.03
## sex                  -0.07  1.00  0.02    0.00       -0.01                -0.09
## job                   0.23  0.02  1.00    0.05       -0.20                -0.02
## partner               0.19  0.00  0.05    1.00       -0.10                 0.08
## study_hours          -0.29 -0.01 -0.20   -0.10        1.00                -0.07
## self_reported_health -0.03 -0.09 -0.02    0.08       -0.07                 1.00
## empathy_jspe          0.22  0.13  0.08    0.12       -0.13                 0.00
## cognitive_empathy     0.06  0.12  0.02    0.04       -0.02                -0.03
## affective_empathy    -0.01  0.36  0.00    0.01       -0.03                -0.06
## psychotherapy         0.02  0.16  0.06    0.03       -0.03                -0.14
## academic_drive        0.00 -0.13  0.03    0.06        0.00                 0.03
## emotion_recognition  -0.02  0.21  0.02    0.03       -0.06                 0.02
## depression           -0.14  0.23 -0.06   -0.11        0.17                -0.36
## anxiety              -0.08  0.25 -0.06   -0.07        0.15                -0.31
## exhaustion           -0.18  0.16 -0.07   -0.01        0.19                -0.29
## cynicism              0.00  0.02  0.01   -0.02       -0.09                -0.19
## personal_efficacy     0.05 -0.04 -0.02    0.05        0.10                 0.22
##                      empathy_jspe cognitive_empathy affective_empathy
## age                          0.22              0.06             -0.01
## sex                          0.13              0.12              0.36
## job                          0.08              0.02              0.00
## partner                      0.12              0.04              0.01
## study_hours                 -0.13             -0.02             -0.03
## self_reported_health         0.00             -0.03             -0.06
## empathy_jspe                 1.00              0.34              0.26
## cognitive_empathy            0.34              1.00              0.26
## affective_empathy            0.26              0.26              1.00
## psychotherapy                0.05              0.05              0.12
## academic_drive               0.10              0.39             -0.07
## emotion_recognition          0.10              0.07              0.14
## depression                  -0.08             -0.03              0.25
## anxiety                     -0.08             -0.08              0.33
## exhaustion                  -0.04             -0.02              0.22
## cynicism                    -0.01             -0.02              0.13
## personal_efficacy            0.08              0.18             -0.11
##                      psychotherapy academic_drive emotion_recognition
## age                           0.02           0.00               -0.02
## sex                           0.16          -0.13                0.21
## job                           0.06           0.03                0.02
## partner                       0.03           0.06                0.03
## study_hours                  -0.03           0.00               -0.06
## self_reported_health         -0.14           0.03                0.02
## empathy_jspe                  0.05           0.10                0.10
## cognitive_empathy             0.05           0.39                0.07
## affective_empathy             0.12          -0.07                0.14
## psychotherapy                 1.00          -0.07                0.00
## academic_drive               -0.07           1.00                0.00
## emotion_recognition           0.00           0.00                1.00
## depression                    0.27          -0.15                0.03
## anxiety                       0.29          -0.25                0.04
## exhaustion                    0.18          -0.07                0.02
## cynicism                      0.15          -0.03                0.06
## personal_efficacy            -0.16           0.22               -0.03
##                      depression anxiety exhaustion cynicism personal_efficacy
## age                       -0.14   -0.08      -0.18     0.00              0.05
## sex                        0.23    0.25       0.16     0.02             -0.04
## job                       -0.06   -0.06      -0.07     0.01             -0.02
## partner                   -0.11   -0.07      -0.01    -0.02              0.05
## study_hours                0.17    0.15       0.19    -0.09              0.10
## self_reported_health      -0.36   -0.31      -0.29    -0.19              0.22
## empathy_jspe              -0.08   -0.08      -0.04    -0.01              0.08
## cognitive_empathy         -0.03   -0.08      -0.02    -0.02              0.18
## affective_empathy          0.25    0.33       0.22     0.13             -0.11
## psychotherapy              0.27    0.29       0.18     0.15             -0.16
## academic_drive            -0.15   -0.25      -0.07    -0.03              0.22
## emotion_recognition        0.03    0.04       0.02     0.06             -0.03
## depression                 1.00    0.72       0.61     0.41             -0.45
## anxiety                    0.72    1.00       0.53     0.33             -0.46
## exhaustion                 0.61    0.53       1.00     0.51             -0.48
## cynicism                   0.41    0.33       0.51     1.00             -0.57
## personal_efficacy         -0.45   -0.46      -0.48    -0.57              1.00
## 
## n= 886 
## 
## 
## P
##                      age    sex    job    partner study_hours
## age                         0.0305 0.0000 0.0000  0.0000     
## sex                  0.0305        0.5348 0.9925  0.6815     
## job                  0.0000 0.5348        0.1569  0.0000     
## partner              0.0000 0.9925 0.1569         0.0019     
## study_hours          0.0000 0.6815 0.0000 0.0019             
## self_reported_health 0.3756 0.0088 0.4945 0.0217  0.0287     
## empathy_jspe         0.0000 0.0002 0.0206 0.0003  0.0000     
## cognitive_empathy    0.0774 0.0006 0.4628 0.2298  0.4932     
## affective_empathy    0.8090 0.0000 0.9960 0.7667  0.3374     
## psychotherapy        0.6497 0.0000 0.0736 0.4249  0.3259     
## academic_drive       0.9887 0.0001 0.4003 0.0668  0.9625     
## emotion_recognition  0.5783 0.0000 0.6545 0.3684  0.0950     
## depression           0.0000 0.0000 0.0752 0.0015  0.0000     
## anxiety              0.0148 0.0000 0.0607 0.0314  0.0000     
## exhaustion           0.0000 0.0000 0.0512 0.7168  0.0000     
## cynicism             0.9561 0.5740 0.7756 0.5813  0.0093     
## personal_efficacy    0.1701 0.1926 0.5824 0.1547  0.0024     
##                      self_reported_health empathy_jspe cognitive_empathy
## age                  0.3756               0.0000       0.0774           
## sex                  0.0088               0.0002       0.0006           
## job                  0.4945               0.0206       0.4628           
## partner              0.0217               0.0003       0.2298           
## study_hours          0.0287               0.0000       0.4932           
## self_reported_health                      0.9163       0.4229           
## empathy_jspe         0.9163                            0.0000           
## cognitive_empathy    0.4229               0.0000                        
## affective_empathy    0.0599               0.0000       0.0000           
## psychotherapy        0.0000               0.1492       0.1688           
## academic_drive       0.4188               0.0031       0.0000           
## emotion_recognition  0.4712               0.0038       0.0282           
## depression           0.0000               0.0172       0.3086           
## anxiety              0.0000               0.0255       0.0204           
## exhaustion           0.0000               0.2280       0.4824           
## cynicism             0.0000               0.8356       0.4606           
## personal_efficacy    0.0000               0.0140       0.0000           
##                      affective_empathy psychotherapy academic_drive
## age                  0.8090            0.6497        0.9887        
## sex                  0.0000            0.0000        0.0001        
## job                  0.9960            0.0736        0.4003        
## partner              0.7667            0.4249        0.0668        
## study_hours          0.3374            0.3259        0.9625        
## self_reported_health 0.0599            0.0000        0.4188        
## empathy_jspe         0.0000            0.1492        0.0031        
## cognitive_empathy    0.0000            0.1688        0.0000        
## affective_empathy                      0.0002        0.0336        
## psychotherapy        0.0002                          0.0308        
## academic_drive       0.0336            0.0308                      
## emotion_recognition  0.0000            0.9179        0.9341        
## depression           0.0000            0.0000        0.0000        
## anxiety              0.0000            0.0000        0.0000        
## exhaustion           0.0000            0.0000        0.0298        
## cynicism             0.0001            0.0000        0.3830        
## personal_efficacy    0.0007            0.0000        0.0000        
##                      emotion_recognition depression anxiety exhaustion cynicism
## age                  0.5783              0.0000     0.0148  0.0000     0.9561  
## sex                  0.0000              0.0000     0.0000  0.0000     0.5740  
## job                  0.6545              0.0752     0.0607  0.0512     0.7756  
## partner              0.3684              0.0015     0.0314  0.7168     0.5813  
## study_hours          0.0950              0.0000     0.0000  0.0000     0.0093  
## self_reported_health 0.4712              0.0000     0.0000  0.0000     0.0000  
## empathy_jspe         0.0038              0.0172     0.0255  0.2280     0.8356  
## cognitive_empathy    0.0282              0.3086     0.0204  0.4824     0.4606  
## affective_empathy    0.0000              0.0000     0.0000  0.0000     0.0001  
## psychotherapy        0.9179              0.0000     0.0000  0.0000     0.0000  
## academic_drive       0.9341              0.0000     0.0000  0.0298     0.3830  
## emotion_recognition                      0.3743     0.2624  0.6482     0.0652  
## depression           0.3743                         0.0000  0.0000     0.0000  
## anxiety              0.2624              0.0000             0.0000     0.0000  
## exhaustion           0.6482              0.0000     0.0000             0.0000  
## cynicism             0.0652              0.0000     0.0000  0.0000             
## personal_efficacy    0.2996              0.0000     0.0000  0.0000     0.0000  
##                      personal_efficacy
## age                  0.1701           
## sex                  0.1926           
## job                  0.5824           
## partner              0.1547           
## study_hours          0.0024           
## self_reported_health 0.0000           
## empathy_jspe         0.0140           
## cognitive_empathy    0.0000           
## affective_empathy    0.0007           
## psychotherapy        0.0000           
## academic_drive       0.0000           
## emotion_recognition  0.2996           
## depression           0.0000           
## anxiety              0.0000           
## exhaustion           0.0000           
## cynicism             0.0000           
## personal_efficacy

To make it easier for us to focus on the significant correlations, we create a matrix in which only the correlations that are significant on a 5 percent level are displayed.

Cormatrix <- rcorr (as.matrix(Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','depression','anxiety','exhaustion','cynicism','personal_efficacy')]))

flattenCorrMatrix(Cormatrix$r,Cormatrix$P)

## Error in flattenCorrMatrix(Cormatrix$r, Cormatrix$P): could not find function "flattenCorrMatrix"

length(Cormatrix$P)<-length(Cormatrix$r)

corrplot(Cormatrix$r, type="upper",  
         p.mat = Cormatrix$P, sig.level = 0.05, insig = "blank")

## Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 153, 136

This visual provides us with a clear idea of the number and strength of the significant correlations. If we need more detail, we can generate a very similar matrix using the actual numbers of the correlationcoefficients:

corrplot(Cormatrix$r, method="number", type="upper", number.cex = .5,
         p.mat = Cormatrix$P, sig.level = 0.05, insig = "blank")

## Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 153, 136

Our conclusion so far, is that there seem to be quite some correlations between the dependent variables (depression, anxiety and burnout). This may indicate that there is certain underlying ‘Mental health’ dimension.

There are also some correlations between de independent variables amongst each other and some correlations between dependent and independent variables.

To further unravel the connections and correct for multicollinearity, we will do some regression analyses.

Regression analyses: Predicting Depression, anxiety and exhaustion/burnout

intercept_only_depression <-lm(depression ~ 1, data=Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','depression')])

all_depression <- lm(depression ~ ., data=Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','depression')])

forward_depression <- step(intercept_only_depression, direction='forward', scope=formula(all_depression), trace=0)

forward_depression$anova

##                      Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1                         NA         NA       885  116608.71 4325.558
## 2  + self_reported_health -1 14949.1722       884  101659.54 4206.004
## 3     + affective_empathy -1  6102.3016       883   95557.24 4153.157
## 4         + psychotherapy -1  4494.4898       882   91062.75 4112.472
## 5           + study_hours -1  3161.8948       881   87900.86 4083.162
## 6          + empathy_jspe -1  1979.7783       880   85921.08 4064.978
## 7                   + sex -1  1361.4073       879   84559.67 4052.827
## 8        + academic_drive -1   886.9895       878   83672.68 4045.484
## 9                   + age -1   643.5346       877   83029.15 4040.644
## 10              + partner -1   224.2780       876   82804.87 4040.247

forward_depression$coefficients

##          (Intercept) self_reported_health    affective_empathy 
##          33.86021307          -3.21977804           0.42412202 
##        psychotherapy          study_hours         empathy_jspe 
##           5.29780075           0.09057173          -0.14083035 
##                  sex       academic_drive                  age 
##           2.41194048          -0.20495994          -0.25264881 
##              partner 
##          -1.04303066

backward_depression <- step(all_depression, direction='backward', scope=formula(all_depression), trace=0)

backward_depression$anova

##                    Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1                       NA         NA       873   82530.47 4043.306
## 2 - emotion_recognition  1   4.217434       874   82534.68 4041.352
## 3                 - job  1  83.495810       875   82618.18 4040.248
## 4   - cognitive_empathy  1 186.689120       876   82804.87 4040.247

backward_depression$coefficients

##          (Intercept)                  age                  sex 
##          33.86021307          -0.25264881           2.41194048 
##              partner          study_hours self_reported_health 
##          -1.04303066           0.09057173          -3.21977804 
##         empathy_jspe    affective_empathy        psychotherapy 
##          -0.14083035           0.42412202           5.29780075 
##       academic_drive 
##          -0.20495994

both_depression <- step(intercept_only_depression, direction='both', scope=formula(all_depression), trace=0)

both_depression$anova

##                      Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1                         NA         NA       885  116608.71 4325.558
## 2  + self_reported_health -1 14949.1722       884  101659.54 4206.004
## 3     + affective_empathy -1  6102.3016       883   95557.24 4153.157
## 4         + psychotherapy -1  4494.4898       882   91062.75 4112.472
## 5           + study_hours -1  3161.8948       881   87900.86 4083.162
## 6          + empathy_jspe -1  1979.7783       880   85921.08 4064.978
## 7                   + sex -1  1361.4073       879   84559.67 4052.827
## 8        + academic_drive -1   886.9895       878   83672.68 4045.484
## 9                   + age -1   643.5346       877   83029.15 4040.644
## 10              + partner -1   224.2780       876   82804.87 4040.247

both_depression$coefficients

##          (Intercept) self_reported_health    affective_empathy 
##          33.86021307          -3.21977804           0.42412202 
##        psychotherapy          study_hours         empathy_jspe 
##           5.29780075           0.09057173          -0.14083035 
##                  sex       academic_drive                  age 
##           2.41194048          -0.20495994          -0.25264881 
##              partner 
##          -1.04303066

We learn from these analyses that the backward stepwise regression produced a different model than the forward and both directions model. The analysis generates a model with 9 predictors.

In the next step, we will do the same analyses for the dependent variable ‘anxiety’.

intercept_only_anxiety <-lm(anxiety ~ 1, data=Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','anxiety')])

all_anxiety <- lm(anxiety ~ ., data=Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','anxiety')])

forward_anxiety <- step(intercept_only_anxiety, direction='forward', scope=formula(all_anxiety), trace=0)

forward_anxiety$anova

##                      Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1                         NA         NA       885  126982.86 4401.070
## 2     + affective_empathy -1 13941.8401       884  113041.02 4300.027
## 3  + self_reported_health -1 10311.2818       883  102729.74 4217.282
## 4        + academic_drive -1  6130.1605       882   96599.58 4164.769
## 5         + psychotherapy -1  5380.7563       881   91218.82 4115.989
## 6           + study_hours -1  2837.8658       880   88380.95 4089.988
## 7          + empathy_jspe -1  2004.4280       879   86376.53 4071.662
## 8                   + sex -1   859.7539       878   85516.77 4064.799
## 9     + cognitive_empathy -1   605.1668       877   84911.60 4060.507
## 10                  + job -1   219.2899       876   84692.31 4060.216

forward_anxiety$coefficients

##          (Intercept)    affective_empathy self_reported_health 
##          55.89211583           0.66958826          -2.69640112 
##       academic_drive        psychotherapy          study_hours 
##          -0.35699843           6.13118711           0.09509483 
##         empathy_jspe                  sex    cognitive_empathy 
##          -0.15547782           2.41009973          -0.15018646 
##                  job 
##          -1.07033559

backward_anxiety <- step(all_anxiety, direction='backward', scope=formula(all_anxiety), trace=0)

backward_anxiety$anova

##                    Step Df  Deviance Resid. Df Resid. Dev      AIC
## 1                       NA        NA       873   84628.59 4065.549
## 2                 - age  1  1.040617       874   84629.63 4063.560
## 3 - emotion_recognition  1  4.589701       875   84634.22 4061.608
## 4             - partner  1 58.097876       876   84692.31 4060.216

backward_anxiety$coefficients

##          (Intercept)                  sex                  job 
##          55.89211583           2.41009973          -1.07033559 
##          study_hours self_reported_health         empathy_jspe 
##           0.09509483          -2.69640112          -0.15547782 
##    cognitive_empathy    affective_empathy        psychotherapy 
##          -0.15018646           0.66958826           6.13118711 
##       academic_drive 
##          -0.35699843

both_anxiety <- step(intercept_only_anxiety, direction='both', scope=formula(all_anxiety), trace=0)

both_anxiety$anova

##                      Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1                         NA         NA       885  126982.86 4401.070
## 2     + affective_empathy -1 13941.8401       884  113041.02 4300.027
## 3  + self_reported_health -1 10311.2818       883  102729.74 4217.282
## 4        + academic_drive -1  6130.1605       882   96599.58 4164.769
## 5         + psychotherapy -1  5380.7563       881   91218.82 4115.989
## 6           + study_hours -1  2837.8658       880   88380.95 4089.988
## 7          + empathy_jspe -1  2004.4280       879   86376.53 4071.662
## 8                   + sex -1   859.7539       878   85516.77 4064.799
## 9     + cognitive_empathy -1   605.1668       877   84911.60 4060.507
## 10                  + job -1   219.2899       876   84692.31 4060.216

both_anxiety$coefficients

##          (Intercept)    affective_empathy self_reported_health 
##          55.89211583           0.66958826          -2.69640112 
##       academic_drive        psychotherapy          study_hours 
##          -0.35699843           6.13118711           0.09509483 
##         empathy_jspe                  sex    cognitive_empathy 
##          -0.15547782           2.41009973          -0.15018646 
##                  job 
##          -1.07033559

We can see that the resulting model is also a model with 9 predictors. Job and cognitive empathy are included in the model that predicts anxiety and not in the depression-model. It’s the other way around for the variables partner and age.

In the next step, we are going to calculate the indicators of burnout. There are 3 separate indicators for burn-out: 2 positive indicators (exhaustion and cynicism) and 1 negative (personal efficacy). By considering the 3 parameters together, it’s possible to get a good picture of the whole profile of a group or individual. It is, however, not methodologically correct to make the sum of the different variables to get 1 score (1 score can refer to different underlying realities).

For the sake of this exercise, we will use ‘exhaustion’ as an indicator of burnout. It seems wiser to be cautious and conscientiously work with a narrowed down, specific definition than to develop metrix that are methodologically questionable.

intercept_only_exhaustion <-lm(exhaustion ~ 1, data=Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','exhaustion')])

all_exhaustion <- lm(exhaustion ~ ., data=Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','exhaustion')])

forward_exhaustion <- step(intercept_only_exhaustion, direction='forward', scope=formula(all_exhaustion), trace=0)

forward_exhaustion$anova

##                     Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1                        NA         NA       885   24448.84 2941.412
## 2 + self_reported_health -1 1993.72558       884   22455.11 2868.045
## 3    + affective_empathy -1  960.65200       883   21494.46 2831.306
## 4                  + age -1  876.54838       882   20617.91 2796.418
## 5          + study_hours -1  363.61968       881   20254.29 2782.653
## 6        + psychotherapy -1  377.76027       880   19876.53 2767.972
## 7    + cognitive_empathy -1  139.03844       879   19737.49 2763.753
## 8              + partner -1   54.71038       878   19682.78 2763.293

forward_exhaustion$coefficients

##          (Intercept) self_reported_health    affective_empathy 
##          22.09201383          -1.26630413           0.20275853 
##                  age          study_hours        psychotherapy 
##          -0.24707207           0.04471573           1.58432376 
##    cognitive_empathy              partner 
##          -0.06370225           0.51292423

backward_exhaustion <- step(all_exhaustion, direction='backward', scope=formula(all_exhaustion), trace=0)

backward_exhaustion$anova

##                    Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1                       NA         NA       873   19602.29 2769.663
## 2 - emotion_recognition  1  0.5513058       874   19602.84 2767.687
## 3      - academic_drive  1  2.1358736       875   19604.97 2765.784
## 4                 - job  1  7.9881306       876   19612.96 2764.145
## 5        - empathy_jspe  1 26.9527435       877   19639.92 2763.362
## 6                 - sex  1 42.8651545       878   19682.78 2763.293

backward_exhaustion$coefficients

##          (Intercept)                  age              partner 
##          22.09201383          -0.24707207           0.51292423 
##          study_hours self_reported_health    cognitive_empathy 
##           0.04471573          -1.26630413          -0.06370225 
##    affective_empathy        psychotherapy 
##           0.20275853           1.58432376

both_exhaustion <- step(intercept_only_exhaustion, direction='both', scope=formula(all_exhaustion), trace=0)

both_exhaustion$anova

##                     Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1                        NA         NA       885   24448.84 2941.412
## 2 + self_reported_health -1 1993.72558       884   22455.11 2868.045
## 3    + affective_empathy -1  960.65200       883   21494.46 2831.306
## 4                  + age -1  876.54838       882   20617.91 2796.418
## 5          + study_hours -1  363.61968       881   20254.29 2782.653
## 6        + psychotherapy -1  377.76027       880   19876.53 2767.972
## 7    + cognitive_empathy -1  139.03844       879   19737.49 2763.753
## 8              + partner -1   54.71038       878   19682.78 2763.293

both_exhaustion$coefficients

##          (Intercept) self_reported_health    affective_empathy 
##          22.09201383          -1.26630413           0.20275853 
##                  age          study_hours        psychotherapy 
##          -0.24707207           0.04471573           1.58432376 
##    cognitive_empathy              partner 
##          -0.06370225           0.51292423

We learn from these analysis that the prediction of exhaustion requires a relatively more economic model (7 instead of 9 variables). Sex, job, JSPE measurement of empathy and academic motivation, were variables that were included in the models that predicted.

So far, we have identified the predictors of depression, anxiety and burnout/exhaustion in medical students. The resulting models include 7-9 independent variables. Whilst there is some overlap in the variables, we can also see some differences between the models.

Identifying which variables that predict important risks for mental health issues

In an attempt to generate a model that allows us to identify the group of students which is most at risk, we will explore the following approach: * Identify the top 25 percent of highest scores of depression, anxiety and exhaustion. * Identify the cases that are in the top 25 for the 3 metrics. * Perform a logistic regression to identify the variables that allow us to identify these high risk profiles.

Step 1 is to calculate the cut off scores for depression, anxiety and exhaustion:

quantile(Mental_Health_Data$depression, probs = 0.75)

## 75% 
##  25

quantile(Mental_Health_Data$anxiety, probs = 0.75)

## 75% 
##  51

quantile(Mental_Health_Data$exhaustion, probs = 0.75)

## 75% 
##  20

Now that we know these values, we can create a binary variable to indicate if someone belongs to the top 25 percent or not.

Mental_Health_Data$depression_binary <- ifelse (Mental_Health_Data$depression >25,1,0)
Mental_Health_Data$anxiety_binary <- ifelse (Mental_Health_Data$anxiety >51,1,0)   
Mental_Health_Data$exhaustion_binary <- ifelse (Mental_Health_Data$exhaustion >20,1,0)

Now that we have these new variables, we can create a new variable to indicate if somebody belongs to the top 25 pct of depression, anxiety and exhaustion:

Mental_Health_Data$Mental_Health_risk_cont <- Mental_Health_Data$depression_binary + Mental_Health_Data$anxiety_binary + Mental_Health_Data$exhaustion_binary
Mental_Health_Data$Mental_Health_risk_bin <- ifelse (Mental_Health_Data$Mental_Health_risk_cont > 2.5,1,0)

Mental_Health_Data$Mental_Health_risk_bin

##   [1] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
##  [38] 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
## [149] 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## [334] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 0 0
## [371] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
## [482] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## [519] 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [556] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
## [593] 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
## [667] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
## [704] 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## [741] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [778] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [815] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [852] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0

Now that we have created this variable, we can check how many people are to be considered ‘high risk’ (in the sense that they are in the highest 25 pct scores of depression, anxiety and exhaustion)

factor(Mental_Health_Data$Mental_Health_risk_bin)

##   [1] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
##  [38] 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
## [149] 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## [334] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 0 0
## [371] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
## [482] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## [519] 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [556] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
## [593] 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
## [667] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
## [704] 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## [741] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [778] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [815] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [852] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
## Levels: 0 1

table(Mental_Health_Data$Mental_Health_risk_bin)

## 
##   0   1 
## 808  78

Before doing a final logistic regression, we explore the point-bisserial correlations between our new variable and the other independent variables by checking out the different signifant correlations (on a 5 pct level).

Cormatrix_risk <- rcorr (as.matrix(Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','Mental_Health_risk_bin')]))

flattenCorrMatrix(Cormatrix_risk$r,Cormatrix_risk$P)

## Error in flattenCorrMatrix(Cormatrix_risk$r, Cormatrix_risk$P): could not find function "flattenCorrMatrix"

corrplot(Cormatrix_risk$r, method = "number", type="upper", number.cex = .5,
         p.mat = Cormatrix_risk$P, sig.level = 0.05, insig = "blank")

## Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 91, 78

Our last step consists of doing a logistical regression with the overall mental health risk variable we’ve just created as a dependent variable.

full.model <- glm(Mental_Health_risk_bin~.,data = Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','Mental_Health_risk_bin')], family=binomial)
coef(full.model)

##          (Intercept)                  age                  sex 
##          0.242386799         -0.070676193          1.174014694 
##                  job              partner          study_hours 
##         -0.292119953         -0.453792663          0.015851591 
## self_reported_health         empathy_jspe    cognitive_empathy 
##         -0.604078122         -0.001360804         -0.040380690 
##    affective_empathy        psychotherapy       academic_drive 
##          0.053058816          0.751145492         -0.001389802 
##  emotion_recognition 
##         -1.095532081

step.model <- full.model %>% stepAIC(trace=FALSE)
coef(step.model)

##          (Intercept)                  age                  sex 
##          -0.37094644          -0.08080130           1.11888487 
##              partner          study_hours self_reported_health 
##          -0.45326258           0.01727268          -0.60914421 
##    cognitive_empathy    affective_empathy        psychotherapy 
##          -0.04186289           0.05132707           0.76216492

The result is a model with 8 variables that have an impact on the probability of in the top 25 percent of depression, anxiety AND exhaustion: Age, sex, partner, hours of study, self reported health, cognitive empathy, affective empathy and having consulted a psycho therapist over the last 12 months.

Capstone project

Michael

2023-03-02