This analysis aims at identifying factors that predict an increased probability of mental heath issues among medical students. This should allow policy makers, faculties of medicine and employers to identify groups at risk.
The data that were used for this analysis were made available on https://www.kaggle.com by Carrard et al. (2022).
Here is the direct link to the dataset and the codebook.
Furthermore, I looked for more background information on the different scales that were used to gather these data.
Before actually starting the analysis, we will start by exploring and preparing the dataset
We’ll use the , Tableone, Tidyverse, GGPLOT2 and DPLYR packages. So we’ll install them first.
–> If needed, you can use the code chunk below to install the packages
install.packages(“dplyr”) install.packages(“ggplot2”) install.packages(“tidyverse”) install.packages(“tableone”) install.packages(“corrplot”) install.packages(“PerformanceAnalytics”) install.packages(“Hmisc”) install.packages(“MASS”)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.1.8
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ✔ readr 2.1.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors
library(tableone)
library(corrplot)
## corrplot 0.92 loaded
library(PerformanceAnalytics)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
##
## ################################### WARNING ###################################
## # We noticed you have dplyr installed. The dplyr lag() function breaks how #
## # base R's lag() function is supposed to work, which breaks lag(my_xts). #
## # #
## # Calls to lag(my_xts) that you enter or source() into this session won't #
## # work correctly. #
## # #
## # All package code is unaffected because it is protected by the R namespace #
## # mechanism. #
## # #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## # You can use stats::lag() to make sure you're not using dplyr::lag(), or you #
## # can add conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## ################################### WARNING ###################################
##
## Attaching package: 'xts'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
##
##
## Attaching package: 'PerformanceAnalytics'
##
## The following object is masked from 'package:graphics':
##
## legend
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
##
## The following objects are masked from 'package:dplyr':
##
## src, summarize
##
## The following objects are masked from 'package:base':
##
## format.pval, units
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
The first step is to load the CSV file that was previously downlaaded from Kaggle.
Mental_Health_Data <- read.csv("~/Downloads/Data_Carrard_et_al_2022_MedTeach.csv")
We start by taking a look at the dataset and the variables:
View(Mental_Health_Data)
glimpse(Mental_Health_Data)
## Rows: 886
## Columns: 20
## $ id <int> 2, 4, 9, 10, 13, 14, 17, 21, 23, 24, 28, 34, 35, 36, 38, 40,…
## $ age <int> 18, 26, 21, 21, 21, 26, 23, 23, 23, 22, 18, 18, 24, 21, 22, …
## $ year <int> 1, 4, 3, 2, 3, 5, 5, 4, 4, 2, 1, 1, 5, 2, 2, 2, 1, 4, 5, 3, …
## $ sex <int> 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 2, 2, …
## $ glang <int> 120, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 90, 1, 1, 1, 90, 1, 1,…
## $ part <int> 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, …
## $ job <int> 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ stud_h <int> 56, 20, 36, 51, 22, 10, 15, 8, 20, 20, 20, 9, 25, 51, 42, 40…
## $ health <int> 3, 4, 3, 5, 4, 2, 3, 4, 2, 5, 4, 5, 5, 2, 3, 4, 4, 4, 4, 4, …
## $ psyt <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, …
## $ jspe <int> 88, 109, 106, 101, 102, 102, 117, 118, 118, 108, 105, 115, 9…
## $ qcae_cog <int> 62, 55, 64, 52, 58, 48, 58, 65, 69, 56, 54, 68, 70, 53, 55, …
## $ qcae_aff <int> 27, 37, 39, 33, 28, 37, 38, 40, 46, 36, 26, 36, 37, 37, 37, …
## $ amsp <int> 17, 22, 17, 18, 21, 17, 23, 32, 23, 22, 25, 27, 28, 19, 15, …
## $ erec_mean <dbl> 0.7380952, 0.6904762, 0.6904762, 0.8333333, 0.6904762, 0.690…
## $ cesd <int> 34, 7, 25, 17, 14, 14, 45, 6, 43, 11, 7, 5, 8, 19, 25, 18, 2…
## $ stai_t <int> 61, 33, 73, 48, 46, 56, 56, 36, 43, 43, 32, 45, 38, 48, 45, …
## $ mbi_ex <int> 17, 14, 24, 16, 22, 18, 28, 11, 26, 18, 11, 15, 11, 19, 19, …
## $ mbi_cy <int> 13, 11, 7, 10, 14, 15, 17, 10, 21, 6, 6, 4, 10, 11, 7, 14, 1…
## $ mbi_ea <int> 20, 26, 23, 21, 23, 18, 16, 27, 22, 23, 26, 26, 29, 20, 26, …
For our own convenience, we rename the variables:
Mental_Health_Data <- rename(Mental_Health_Data,language = glang, partner=part, study_hours=stud_h, self_reported_health=health, psychotherapy=psyt, empathy_jspe=jspe, cognitive_empathy=qcae_cog, affective_empathy=qcae_aff, academic_drive=amsp,emotion_recognition=erec_mean, depression=cesd, anxiety=stai_t, exhaustion=mbi_ex, cynicism=mbi_cy, personal_efficacy=mbi_ea)
names(Mental_Health_Data)
## [1] "id" "age" "year"
## [4] "sex" "language" "partner"
## [7] "job" "study_hours" "self_reported_health"
## [10] "psychotherapy" "empathy_jspe" "cognitive_empathy"
## [13] "affective_empathy" "academic_drive" "emotion_recognition"
## [16] "depression" "anxiety" "exhaustion"
## [19] "cynicism" "personal_efficacy"
Let’s check out how many rows there are in our dataset.
nrow(Mental_Health_Data)
## [1] 886
Before moving on with the data preparation, we will make sure that
the categorical variables receive labels for their values. We do this by
creating new variables (same variable name with addition of ‘Label’). We
keep the old variable so we can still use them as a binary variable for
calculations further down the road.
* Sex_label * Language_label * Partner_label * Job_label *
Psychotherapy_label
Mental_Health_Data$sex_label<-factor(Mental_Health_Data$sex,
levels = c(1,2,3),
labels = c("Man", "Woman", "Non-binary"))
Mental_Health_Data$language_label<-factor(Mental_Health_Data$language,
levels=c(1,15,20,37,51,52,53,54,59,60,62,63,82,83,84,85,86,87,88,89,90,92,93,94,95,96,98,100,101,102,104,106,108,112,113,114,116,117,118,119,120,121),
labels = c("French", "German", "English", "Arab","Basque", "Bulgarian", "Catalan", "Chinese", "Korean", "Croatian", "Danish", "Spanish", "Estonian", "Finnish","Galician","Greek","Hebrew","Hindi","Hungarian","Indonesian","Italian","Japanese","Kazakh","Latvian","Lithuanian","Malay","Dutch","Norwegian","Polish","Portuguese","Romanian","Russian","Serbian","Slovak","Slovenian","Swedish","Czech","Thai","Turkish","Ukrainian","Vietnamese","Other"))
Mental_Health_Data$job_label<-factor(Mental_Health_Data$job,
levels = c(0,1),
labels = c("No", "Yes"))
Mental_Health_Data$psychotherapy_label<-factor(Mental_Health_Data$psychotherapy,
levels = c(0,1),
labels = c("No", "Yes"))
Mental_Health_Data$partner_label<-factor(Mental_Health_Data$partner,
levels = c(0,1),
labels = c("No", "Yes"))
In the next step, we will try to get a better understanding of the data by generating some descriptive statistics:
Table_descriptives<-CreateTableOne(data=Mental_Health_Data)
summary(Table_descriptives)
##
## ### Summary of continuous variables ###
##
## strata: Overall
## n miss p.miss mean sd median p25 p75 min max
## id 886 0 0 889.7 5e+02 876.0 447.5 1e+03 2.0 1790
## age 886 0 0 22.4 3e+00 22.0 20.0 2e+01 17.0 49
## year 886 0 0 3.1 2e+00 3.0 1.0 5e+00 1.0 6
## sex 886 0 0 1.7 5e-01 2.0 1.0 2e+00 1.0 3
## language 886 0 0 14.3 3e+01 1.0 1.0 1e+00 1.0 121
## partner 886 0 0 0.6 5e-01 1.0 0.0 1e+00 0.0 1
## job 886 0 0 0.3 5e-01 0.0 0.0 1e+00 0.0 1
## study_hours 886 0 0 25.3 2e+01 25.0 12.0 4e+01 0.0 70
## self_reported_health 886 0 0 3.8 1e+00 4.0 3.0 5e+00 1.0 5
## psychotherapy 886 0 0 0.2 4e-01 0.0 0.0 0e+00 0.0 1
## empathy_jspe 886 0 0 106.4 9e+00 107.0 101.0 1e+02 67.0 125
## cognitive_empathy 886 0 0 58.5 7e+00 58.0 54.0 6e+01 37.0 76
## affective_empathy 886 0 0 34.8 5e+00 35.0 31.0 4e+01 18.0 48
## academic_drive 886 0 0 23.2 5e+00 23.0 20.0 3e+01 6.0 35
## emotion_recognition 886 0 0 0.7 9e-02 0.7 0.7 8e-01 0.4 1
## depression 886 0 0 18.1 1e+01 16.0 9.0 2e+01 0.0 56
## anxiety 886 0 0 42.9 1e+01 43.0 34.0 5e+01 20.0 77
## exhaustion 886 0 0 16.9 5e+00 17.0 13.0 2e+01 5.0 30
## cynicism 886 0 0 10.1 5e+00 9.0 6.0 1e+01 4.0 24
## personal_efficacy 886 0 0 24.2 5e+00 24.0 21.0 3e+01 10.0 36
## skew kurt
## id 0.02 -1.21
## age 2.07 9.43
## year 0.26 -1.29
## sex -0.69 -1.10
## language 2.28 3.51
## partner -0.26 -1.94
## job 0.64 -1.60
## study_hours 0.45 -0.48
## self_reported_health -0.88 0.24
## psychotherapy 1.32 -0.25
## empathy_jspe -0.66 0.71
## cognitive_empathy -0.09 0.12
## affective_empathy -0.20 -0.22
## academic_drive -0.11 -0.05
## emotion_recognition -0.41 0.24
## depression 0.68 -0.08
## anxiety 0.24 -0.41
## exhaustion 0.10 -0.39
## cynicism 0.75 -0.03
## personal_efficacy -0.18 -0.19
##
## =======================================================================================
##
## ### Summary of categorical variables ###
##
## strata: Overall
## var n miss p.miss level freq percent cum.percent
## sex_label 886 0 0.0 Man 275 31.0 31.0
## Woman 606 68.4 99.4
## Non-binary 5 0.6 100.0
##
## language_label 886 0 0.0 French 717 80.9 80.9
## German 31 3.5 84.4
## English 22 2.5 86.9
## Arab 3 0.3 87.2
## Basque 0 0.0 87.2
## Bulgarian 0 0.0 87.2
## Catalan 0 0.0 87.2
## Chinese 1 0.1 87.4
## Korean 0 0.0 87.4
## Croatian 3 0.3 87.7
## Danish 0 0.0 87.7
## Spanish 5 0.6 88.3
## Estonian 0 0.0 88.3
## Finnish 0 0.0 88.3
## Galician 0 0.0 88.3
## Greek 0 0.0 88.3
## Hebrew 0 0.0 88.3
## Hindi 0 0.0 88.3
## Hungarian 0 0.0 88.3
## Indonesian 0 0.0 88.3
## Italian 45 5.1 93.3
## Japanese 1 0.1 93.5
## Kazakh 0 0.0 93.5
## Latvian 0 0.0 93.5
## Lithuanian 1 0.1 93.6
## Malay 0 0.0 93.6
## Dutch 1 0.1 93.7
## Norwegian 0 0.0 93.7
## Polish 0 0.0 93.7
## Portuguese 27 3.0 96.7
## Romanian 4 0.5 97.2
## Russian 6 0.7 97.9
## Serbian 1 0.1 98.0
## Slovak 0 0.0 98.0
## Slovenian 0 0.0 98.0
## Swedish 1 0.1 98.1
## Czech 0 0.0 98.1
## Thai 0 0.0 98.1
## Turkish 2 0.2 98.3
## Ukrainian 0 0.0 98.3
## Vietnamese 2 0.2 98.5
## Other 13 1.5 100.0
##
## job_label 886 0 0.0 No 577 65.1 65.1
## Yes 309 34.9 100.0
##
## psychotherapy_label 886 0 0.0 No 687 77.5 77.5
## Yes 199 22.5 100.0
##
## partner_label 886 0 0.0 No 387 43.7 43.7
## Yes 499 56.3 100.0
##
The tables above have provided us with a better understanding of the different variables and their distribution.
We also learned from the previous steps that there don’t seem to be any missing values.
When it comes to the respondents, we have learned the following:
We start our analysis with checking out the existing correlations between the different variables.
col1 <- colorRampPalette(c("#7F0000","red","#FF7F00","yellow","white",
"cyan", "#007FFF", "blue","#00007F"))
col2 <- colorRampPalette(c("#67001F", "#B2182B", "#D6604D", "#F4A582", "#FDDBC7",
"#FFFFFF", "#D1E5F0", "#92C5DE", "#4393C3", "#2166AC", "#053061"))
col3 <- colorRampPalette(c("red", "white", "blue"))
col4 <- colorRampPalette(c("#7F0000","red","#FF7F00","yellow","#7FFF7F",
"cyan", "#007FFF", "blue","#00007F"))
M<-cor(Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','depression','anxiety','exhaustion','cynicism','personal_efficacy')])
corrplot(M, method ="number", col=col3(200), number.cex = .5)
M<-cor(Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','depression','anxiety','exhaustion','cynicism','personal_efficacy')])
corrplot(M, method ="number", number.cex = .5)
## using these color spectrums
corrplot(M, addrect=2, col=col1(100))
The included graphs don’t provide dus with any information on the significance levels of the correlations. Therefore, we also calculate the p-values for the correlations.
Cormatrix <- rcorr (as.matrix(Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','depression','anxiety','exhaustion','cynicism','personal_efficacy')]))
Cormatrix
## age sex job partner study_hours self_reported_health
## age 1.00 -0.07 0.23 0.19 -0.29 -0.03
## sex -0.07 1.00 0.02 0.00 -0.01 -0.09
## job 0.23 0.02 1.00 0.05 -0.20 -0.02
## partner 0.19 0.00 0.05 1.00 -0.10 0.08
## study_hours -0.29 -0.01 -0.20 -0.10 1.00 -0.07
## self_reported_health -0.03 -0.09 -0.02 0.08 -0.07 1.00
## empathy_jspe 0.22 0.13 0.08 0.12 -0.13 0.00
## cognitive_empathy 0.06 0.12 0.02 0.04 -0.02 -0.03
## affective_empathy -0.01 0.36 0.00 0.01 -0.03 -0.06
## psychotherapy 0.02 0.16 0.06 0.03 -0.03 -0.14
## academic_drive 0.00 -0.13 0.03 0.06 0.00 0.03
## emotion_recognition -0.02 0.21 0.02 0.03 -0.06 0.02
## depression -0.14 0.23 -0.06 -0.11 0.17 -0.36
## anxiety -0.08 0.25 -0.06 -0.07 0.15 -0.31
## exhaustion -0.18 0.16 -0.07 -0.01 0.19 -0.29
## cynicism 0.00 0.02 0.01 -0.02 -0.09 -0.19
## personal_efficacy 0.05 -0.04 -0.02 0.05 0.10 0.22
## empathy_jspe cognitive_empathy affective_empathy
## age 0.22 0.06 -0.01
## sex 0.13 0.12 0.36
## job 0.08 0.02 0.00
## partner 0.12 0.04 0.01
## study_hours -0.13 -0.02 -0.03
## self_reported_health 0.00 -0.03 -0.06
## empathy_jspe 1.00 0.34 0.26
## cognitive_empathy 0.34 1.00 0.26
## affective_empathy 0.26 0.26 1.00
## psychotherapy 0.05 0.05 0.12
## academic_drive 0.10 0.39 -0.07
## emotion_recognition 0.10 0.07 0.14
## depression -0.08 -0.03 0.25
## anxiety -0.08 -0.08 0.33
## exhaustion -0.04 -0.02 0.22
## cynicism -0.01 -0.02 0.13
## personal_efficacy 0.08 0.18 -0.11
## psychotherapy academic_drive emotion_recognition
## age 0.02 0.00 -0.02
## sex 0.16 -0.13 0.21
## job 0.06 0.03 0.02
## partner 0.03 0.06 0.03
## study_hours -0.03 0.00 -0.06
## self_reported_health -0.14 0.03 0.02
## empathy_jspe 0.05 0.10 0.10
## cognitive_empathy 0.05 0.39 0.07
## affective_empathy 0.12 -0.07 0.14
## psychotherapy 1.00 -0.07 0.00
## academic_drive -0.07 1.00 0.00
## emotion_recognition 0.00 0.00 1.00
## depression 0.27 -0.15 0.03
## anxiety 0.29 -0.25 0.04
## exhaustion 0.18 -0.07 0.02
## cynicism 0.15 -0.03 0.06
## personal_efficacy -0.16 0.22 -0.03
## depression anxiety exhaustion cynicism personal_efficacy
## age -0.14 -0.08 -0.18 0.00 0.05
## sex 0.23 0.25 0.16 0.02 -0.04
## job -0.06 -0.06 -0.07 0.01 -0.02
## partner -0.11 -0.07 -0.01 -0.02 0.05
## study_hours 0.17 0.15 0.19 -0.09 0.10
## self_reported_health -0.36 -0.31 -0.29 -0.19 0.22
## empathy_jspe -0.08 -0.08 -0.04 -0.01 0.08
## cognitive_empathy -0.03 -0.08 -0.02 -0.02 0.18
## affective_empathy 0.25 0.33 0.22 0.13 -0.11
## psychotherapy 0.27 0.29 0.18 0.15 -0.16
## academic_drive -0.15 -0.25 -0.07 -0.03 0.22
## emotion_recognition 0.03 0.04 0.02 0.06 -0.03
## depression 1.00 0.72 0.61 0.41 -0.45
## anxiety 0.72 1.00 0.53 0.33 -0.46
## exhaustion 0.61 0.53 1.00 0.51 -0.48
## cynicism 0.41 0.33 0.51 1.00 -0.57
## personal_efficacy -0.45 -0.46 -0.48 -0.57 1.00
##
## n= 886
##
##
## P
## age sex job partner study_hours
## age 0.0305 0.0000 0.0000 0.0000
## sex 0.0305 0.5348 0.9925 0.6815
## job 0.0000 0.5348 0.1569 0.0000
## partner 0.0000 0.9925 0.1569 0.0019
## study_hours 0.0000 0.6815 0.0000 0.0019
## self_reported_health 0.3756 0.0088 0.4945 0.0217 0.0287
## empathy_jspe 0.0000 0.0002 0.0206 0.0003 0.0000
## cognitive_empathy 0.0774 0.0006 0.4628 0.2298 0.4932
## affective_empathy 0.8090 0.0000 0.9960 0.7667 0.3374
## psychotherapy 0.6497 0.0000 0.0736 0.4249 0.3259
## academic_drive 0.9887 0.0001 0.4003 0.0668 0.9625
## emotion_recognition 0.5783 0.0000 0.6545 0.3684 0.0950
## depression 0.0000 0.0000 0.0752 0.0015 0.0000
## anxiety 0.0148 0.0000 0.0607 0.0314 0.0000
## exhaustion 0.0000 0.0000 0.0512 0.7168 0.0000
## cynicism 0.9561 0.5740 0.7756 0.5813 0.0093
## personal_efficacy 0.1701 0.1926 0.5824 0.1547 0.0024
## self_reported_health empathy_jspe cognitive_empathy
## age 0.3756 0.0000 0.0774
## sex 0.0088 0.0002 0.0006
## job 0.4945 0.0206 0.4628
## partner 0.0217 0.0003 0.2298
## study_hours 0.0287 0.0000 0.4932
## self_reported_health 0.9163 0.4229
## empathy_jspe 0.9163 0.0000
## cognitive_empathy 0.4229 0.0000
## affective_empathy 0.0599 0.0000 0.0000
## psychotherapy 0.0000 0.1492 0.1688
## academic_drive 0.4188 0.0031 0.0000
## emotion_recognition 0.4712 0.0038 0.0282
## depression 0.0000 0.0172 0.3086
## anxiety 0.0000 0.0255 0.0204
## exhaustion 0.0000 0.2280 0.4824
## cynicism 0.0000 0.8356 0.4606
## personal_efficacy 0.0000 0.0140 0.0000
## affective_empathy psychotherapy academic_drive
## age 0.8090 0.6497 0.9887
## sex 0.0000 0.0000 0.0001
## job 0.9960 0.0736 0.4003
## partner 0.7667 0.4249 0.0668
## study_hours 0.3374 0.3259 0.9625
## self_reported_health 0.0599 0.0000 0.4188
## empathy_jspe 0.0000 0.1492 0.0031
## cognitive_empathy 0.0000 0.1688 0.0000
## affective_empathy 0.0002 0.0336
## psychotherapy 0.0002 0.0308
## academic_drive 0.0336 0.0308
## emotion_recognition 0.0000 0.9179 0.9341
## depression 0.0000 0.0000 0.0000
## anxiety 0.0000 0.0000 0.0000
## exhaustion 0.0000 0.0000 0.0298
## cynicism 0.0001 0.0000 0.3830
## personal_efficacy 0.0007 0.0000 0.0000
## emotion_recognition depression anxiety exhaustion cynicism
## age 0.5783 0.0000 0.0148 0.0000 0.9561
## sex 0.0000 0.0000 0.0000 0.0000 0.5740
## job 0.6545 0.0752 0.0607 0.0512 0.7756
## partner 0.3684 0.0015 0.0314 0.7168 0.5813
## study_hours 0.0950 0.0000 0.0000 0.0000 0.0093
## self_reported_health 0.4712 0.0000 0.0000 0.0000 0.0000
## empathy_jspe 0.0038 0.0172 0.0255 0.2280 0.8356
## cognitive_empathy 0.0282 0.3086 0.0204 0.4824 0.4606
## affective_empathy 0.0000 0.0000 0.0000 0.0000 0.0001
## psychotherapy 0.9179 0.0000 0.0000 0.0000 0.0000
## academic_drive 0.9341 0.0000 0.0000 0.0298 0.3830
## emotion_recognition 0.3743 0.2624 0.6482 0.0652
## depression 0.3743 0.0000 0.0000 0.0000
## anxiety 0.2624 0.0000 0.0000 0.0000
## exhaustion 0.6482 0.0000 0.0000 0.0000
## cynicism 0.0652 0.0000 0.0000 0.0000
## personal_efficacy 0.2996 0.0000 0.0000 0.0000 0.0000
## personal_efficacy
## age 0.1701
## sex 0.1926
## job 0.5824
## partner 0.1547
## study_hours 0.0024
## self_reported_health 0.0000
## empathy_jspe 0.0140
## cognitive_empathy 0.0000
## affective_empathy 0.0007
## psychotherapy 0.0000
## academic_drive 0.0000
## emotion_recognition 0.2996
## depression 0.0000
## anxiety 0.0000
## exhaustion 0.0000
## cynicism 0.0000
## personal_efficacy
To make it easier for us to focus on the significant correlations, we create a matrix in which only the correlations that are significant on a 5 percent level are displayed.
Cormatrix <- rcorr (as.matrix(Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','depression','anxiety','exhaustion','cynicism','personal_efficacy')]))
flattenCorrMatrix(Cormatrix$r,Cormatrix$P)
## Error in flattenCorrMatrix(Cormatrix$r, Cormatrix$P): could not find function "flattenCorrMatrix"
length(Cormatrix$P)<-length(Cormatrix$r)
corrplot(Cormatrix$r, type="upper",
p.mat = Cormatrix$P, sig.level = 0.05, insig = "blank")
## Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 153, 136
This visual provides us with a clear idea of the number and strength of
the significant correlations. If we need more detail, we can generate a
very similar matrix using the actual numbers of the
correlationcoefficients:
corrplot(Cormatrix$r, method="number", type="upper", number.cex = .5,
p.mat = Cormatrix$P, sig.level = 0.05, insig = "blank")
## Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 153, 136
Our conclusion so far, is that there seem to be quite some correlations between the dependent variables (depression, anxiety and burnout). This may indicate that there is certain underlying ‘Mental health’ dimension.
There are also some correlations between de independent variables amongst each other and some correlations between dependent and independent variables.
To further unravel the connections and correct for multicollinearity, we will do some regression analyses.
intercept_only_depression <-lm(depression ~ 1, data=Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','depression')])
all_depression <- lm(depression ~ ., data=Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','depression')])
forward_depression <- step(intercept_only_depression, direction='forward', scope=formula(all_depression), trace=0)
forward_depression$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 885 116608.71 4325.558
## 2 + self_reported_health -1 14949.1722 884 101659.54 4206.004
## 3 + affective_empathy -1 6102.3016 883 95557.24 4153.157
## 4 + psychotherapy -1 4494.4898 882 91062.75 4112.472
## 5 + study_hours -1 3161.8948 881 87900.86 4083.162
## 6 + empathy_jspe -1 1979.7783 880 85921.08 4064.978
## 7 + sex -1 1361.4073 879 84559.67 4052.827
## 8 + academic_drive -1 886.9895 878 83672.68 4045.484
## 9 + age -1 643.5346 877 83029.15 4040.644
## 10 + partner -1 224.2780 876 82804.87 4040.247
forward_depression$coefficients
## (Intercept) self_reported_health affective_empathy
## 33.86021307 -3.21977804 0.42412202
## psychotherapy study_hours empathy_jspe
## 5.29780075 0.09057173 -0.14083035
## sex academic_drive age
## 2.41194048 -0.20495994 -0.25264881
## partner
## -1.04303066
backward_depression <- step(all_depression, direction='backward', scope=formula(all_depression), trace=0)
backward_depression$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 873 82530.47 4043.306
## 2 - emotion_recognition 1 4.217434 874 82534.68 4041.352
## 3 - job 1 83.495810 875 82618.18 4040.248
## 4 - cognitive_empathy 1 186.689120 876 82804.87 4040.247
backward_depression$coefficients
## (Intercept) age sex
## 33.86021307 -0.25264881 2.41194048
## partner study_hours self_reported_health
## -1.04303066 0.09057173 -3.21977804
## empathy_jspe affective_empathy psychotherapy
## -0.14083035 0.42412202 5.29780075
## academic_drive
## -0.20495994
both_depression <- step(intercept_only_depression, direction='both', scope=formula(all_depression), trace=0)
both_depression$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 885 116608.71 4325.558
## 2 + self_reported_health -1 14949.1722 884 101659.54 4206.004
## 3 + affective_empathy -1 6102.3016 883 95557.24 4153.157
## 4 + psychotherapy -1 4494.4898 882 91062.75 4112.472
## 5 + study_hours -1 3161.8948 881 87900.86 4083.162
## 6 + empathy_jspe -1 1979.7783 880 85921.08 4064.978
## 7 + sex -1 1361.4073 879 84559.67 4052.827
## 8 + academic_drive -1 886.9895 878 83672.68 4045.484
## 9 + age -1 643.5346 877 83029.15 4040.644
## 10 + partner -1 224.2780 876 82804.87 4040.247
both_depression$coefficients
## (Intercept) self_reported_health affective_empathy
## 33.86021307 -3.21977804 0.42412202
## psychotherapy study_hours empathy_jspe
## 5.29780075 0.09057173 -0.14083035
## sex academic_drive age
## 2.41194048 -0.20495994 -0.25264881
## partner
## -1.04303066
We learn from these analyses that the backward stepwise regression produced a different model than the forward and both directions model. The analysis generates a model with 9 predictors.
In the next step, we will do the same analyses for the dependent variable ‘anxiety’.
intercept_only_anxiety <-lm(anxiety ~ 1, data=Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','anxiety')])
all_anxiety <- lm(anxiety ~ ., data=Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','anxiety')])
forward_anxiety <- step(intercept_only_anxiety, direction='forward', scope=formula(all_anxiety), trace=0)
forward_anxiety$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 885 126982.86 4401.070
## 2 + affective_empathy -1 13941.8401 884 113041.02 4300.027
## 3 + self_reported_health -1 10311.2818 883 102729.74 4217.282
## 4 + academic_drive -1 6130.1605 882 96599.58 4164.769
## 5 + psychotherapy -1 5380.7563 881 91218.82 4115.989
## 6 + study_hours -1 2837.8658 880 88380.95 4089.988
## 7 + empathy_jspe -1 2004.4280 879 86376.53 4071.662
## 8 + sex -1 859.7539 878 85516.77 4064.799
## 9 + cognitive_empathy -1 605.1668 877 84911.60 4060.507
## 10 + job -1 219.2899 876 84692.31 4060.216
forward_anxiety$coefficients
## (Intercept) affective_empathy self_reported_health
## 55.89211583 0.66958826 -2.69640112
## academic_drive psychotherapy study_hours
## -0.35699843 6.13118711 0.09509483
## empathy_jspe sex cognitive_empathy
## -0.15547782 2.41009973 -0.15018646
## job
## -1.07033559
backward_anxiety <- step(all_anxiety, direction='backward', scope=formula(all_anxiety), trace=0)
backward_anxiety$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 873 84628.59 4065.549
## 2 - age 1 1.040617 874 84629.63 4063.560
## 3 - emotion_recognition 1 4.589701 875 84634.22 4061.608
## 4 - partner 1 58.097876 876 84692.31 4060.216
backward_anxiety$coefficients
## (Intercept) sex job
## 55.89211583 2.41009973 -1.07033559
## study_hours self_reported_health empathy_jspe
## 0.09509483 -2.69640112 -0.15547782
## cognitive_empathy affective_empathy psychotherapy
## -0.15018646 0.66958826 6.13118711
## academic_drive
## -0.35699843
both_anxiety <- step(intercept_only_anxiety, direction='both', scope=formula(all_anxiety), trace=0)
both_anxiety$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 885 126982.86 4401.070
## 2 + affective_empathy -1 13941.8401 884 113041.02 4300.027
## 3 + self_reported_health -1 10311.2818 883 102729.74 4217.282
## 4 + academic_drive -1 6130.1605 882 96599.58 4164.769
## 5 + psychotherapy -1 5380.7563 881 91218.82 4115.989
## 6 + study_hours -1 2837.8658 880 88380.95 4089.988
## 7 + empathy_jspe -1 2004.4280 879 86376.53 4071.662
## 8 + sex -1 859.7539 878 85516.77 4064.799
## 9 + cognitive_empathy -1 605.1668 877 84911.60 4060.507
## 10 + job -1 219.2899 876 84692.31 4060.216
both_anxiety$coefficients
## (Intercept) affective_empathy self_reported_health
## 55.89211583 0.66958826 -2.69640112
## academic_drive psychotherapy study_hours
## -0.35699843 6.13118711 0.09509483
## empathy_jspe sex cognitive_empathy
## -0.15547782 2.41009973 -0.15018646
## job
## -1.07033559
We can see that the resulting model is also a model with 9 predictors. Job and cognitive empathy are included in the model that predicts anxiety and not in the depression-model. It’s the other way around for the variables partner and age.
In the next step, we are going to calculate the indicators of burnout. There are 3 separate indicators for burn-out: 2 positive indicators (exhaustion and cynicism) and 1 negative (personal efficacy). By considering the 3 parameters together, it’s possible to get a good picture of the whole profile of a group or individual. It is, however, not methodologically correct to make the sum of the different variables to get 1 score (1 score can refer to different underlying realities).
For the sake of this exercise, we will use ‘exhaustion’ as an indicator of burnout. It seems wiser to be cautious and conscientiously work with a narrowed down, specific definition than to develop metrix that are methodologically questionable.
intercept_only_exhaustion <-lm(exhaustion ~ 1, data=Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','exhaustion')])
all_exhaustion <- lm(exhaustion ~ ., data=Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','exhaustion')])
forward_exhaustion <- step(intercept_only_exhaustion, direction='forward', scope=formula(all_exhaustion), trace=0)
forward_exhaustion$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 885 24448.84 2941.412
## 2 + self_reported_health -1 1993.72558 884 22455.11 2868.045
## 3 + affective_empathy -1 960.65200 883 21494.46 2831.306
## 4 + age -1 876.54838 882 20617.91 2796.418
## 5 + study_hours -1 363.61968 881 20254.29 2782.653
## 6 + psychotherapy -1 377.76027 880 19876.53 2767.972
## 7 + cognitive_empathy -1 139.03844 879 19737.49 2763.753
## 8 + partner -1 54.71038 878 19682.78 2763.293
forward_exhaustion$coefficients
## (Intercept) self_reported_health affective_empathy
## 22.09201383 -1.26630413 0.20275853
## age study_hours psychotherapy
## -0.24707207 0.04471573 1.58432376
## cognitive_empathy partner
## -0.06370225 0.51292423
backward_exhaustion <- step(all_exhaustion, direction='backward', scope=formula(all_exhaustion), trace=0)
backward_exhaustion$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 873 19602.29 2769.663
## 2 - emotion_recognition 1 0.5513058 874 19602.84 2767.687
## 3 - academic_drive 1 2.1358736 875 19604.97 2765.784
## 4 - job 1 7.9881306 876 19612.96 2764.145
## 5 - empathy_jspe 1 26.9527435 877 19639.92 2763.362
## 6 - sex 1 42.8651545 878 19682.78 2763.293
backward_exhaustion$coefficients
## (Intercept) age partner
## 22.09201383 -0.24707207 0.51292423
## study_hours self_reported_health cognitive_empathy
## 0.04471573 -1.26630413 -0.06370225
## affective_empathy psychotherapy
## 0.20275853 1.58432376
both_exhaustion <- step(intercept_only_exhaustion, direction='both', scope=formula(all_exhaustion), trace=0)
both_exhaustion$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 885 24448.84 2941.412
## 2 + self_reported_health -1 1993.72558 884 22455.11 2868.045
## 3 + affective_empathy -1 960.65200 883 21494.46 2831.306
## 4 + age -1 876.54838 882 20617.91 2796.418
## 5 + study_hours -1 363.61968 881 20254.29 2782.653
## 6 + psychotherapy -1 377.76027 880 19876.53 2767.972
## 7 + cognitive_empathy -1 139.03844 879 19737.49 2763.753
## 8 + partner -1 54.71038 878 19682.78 2763.293
both_exhaustion$coefficients
## (Intercept) self_reported_health affective_empathy
## 22.09201383 -1.26630413 0.20275853
## age study_hours psychotherapy
## -0.24707207 0.04471573 1.58432376
## cognitive_empathy partner
## -0.06370225 0.51292423
We learn from these analysis that the prediction of exhaustion requires a relatively more economic model (7 instead of 9 variables). Sex, job, JSPE measurement of empathy and academic motivation, were variables that were included in the models that predicted.
So far, we have identified the predictors of depression, anxiety and burnout/exhaustion in medical students. The resulting models include 7-9 independent variables. Whilst there is some overlap in the variables, we can also see some differences between the models.
In an attempt to generate a model that allows us to identify the group of students which is most at risk, we will explore the following approach: * Identify the top 25 percent of highest scores of depression, anxiety and exhaustion. * Identify the cases that are in the top 25 for the 3 metrics. * Perform a logistic regression to identify the variables that allow us to identify these high risk profiles.
Step 1 is to calculate the cut off scores for depression, anxiety and exhaustion:
quantile(Mental_Health_Data$depression, probs = 0.75)
## 75%
## 25
quantile(Mental_Health_Data$anxiety, probs = 0.75)
## 75%
## 51
quantile(Mental_Health_Data$exhaustion, probs = 0.75)
## 75%
## 20
Now that we know these values, we can create a binary variable to indicate if someone belongs to the top 25 percent or not.
Mental_Health_Data$depression_binary <- ifelse (Mental_Health_Data$depression >25,1,0)
Mental_Health_Data$anxiety_binary <- ifelse (Mental_Health_Data$anxiety >51,1,0)
Mental_Health_Data$exhaustion_binary <- ifelse (Mental_Health_Data$exhaustion >20,1,0)
Now that we have these new variables, we can create a new variable to indicate if somebody belongs to the top 25 pct of depression, anxiety and exhaustion:
Mental_Health_Data$Mental_Health_risk_cont <- Mental_Health_Data$depression_binary + Mental_Health_Data$anxiety_binary + Mental_Health_Data$exhaustion_binary
Mental_Health_Data$Mental_Health_risk_bin <- ifelse (Mental_Health_Data$Mental_Health_risk_cont > 2.5,1,0)
Mental_Health_Data$Mental_Health_risk_bin
## [1] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [38] 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
## [149] 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## [334] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 0 0
## [371] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
## [482] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## [519] 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [556] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
## [593] 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
## [667] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
## [704] 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## [741] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [778] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [815] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [852] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
Now that we have created this variable, we can check how many people are to be considered ‘high risk’ (in the sense that they are in the highest 25 pct scores of depression, anxiety and exhaustion)
factor(Mental_Health_Data$Mental_Health_risk_bin)
## [1] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## [38] 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1
## [149] 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## [334] 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 0 0
## [371] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
## [482] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## [519] 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [556] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
## [593] 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
## [667] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
## [704] 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## [741] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [778] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [815] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [852] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0
## Levels: 0 1
table(Mental_Health_Data$Mental_Health_risk_bin)
##
## 0 1
## 808 78
Before doing a final logistic regression, we explore the point-bisserial correlations between our new variable and the other independent variables by checking out the different signifant correlations (on a 5 pct level).
Cormatrix_risk <- rcorr (as.matrix(Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','Mental_Health_risk_bin')]))
flattenCorrMatrix(Cormatrix_risk$r,Cormatrix_risk$P)
## Error in flattenCorrMatrix(Cormatrix_risk$r, Cormatrix_risk$P): could not find function "flattenCorrMatrix"
corrplot(Cormatrix_risk$r, method = "number", type="upper", number.cex = .5,
p.mat = Cormatrix_risk$P, sig.level = 0.05, insig = "blank")
## Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 91, 78
Our last step consists of doing a logistical regression with the overall mental health risk variable we’ve just created as a dependent variable.
full.model <- glm(Mental_Health_risk_bin~.,data = Mental_Health_Data[,c('age','sex','job', 'partner', 'study_hours', 'self_reported_health','empathy_jspe','cognitive_empathy','affective_empathy','psychotherapy','academic_drive','emotion_recognition','Mental_Health_risk_bin')], family=binomial)
coef(full.model)
## (Intercept) age sex
## 0.242386799 -0.070676193 1.174014694
## job partner study_hours
## -0.292119953 -0.453792663 0.015851591
## self_reported_health empathy_jspe cognitive_empathy
## -0.604078122 -0.001360804 -0.040380690
## affective_empathy psychotherapy academic_drive
## 0.053058816 0.751145492 -0.001389802
## emotion_recognition
## -1.095532081
step.model <- full.model %>% stepAIC(trace=FALSE)
coef(step.model)
## (Intercept) age sex
## -0.37094644 -0.08080130 1.11888487
## partner study_hours self_reported_health
## -0.45326258 0.01727268 -0.60914421
## cognitive_empathy affective_empathy psychotherapy
## -0.04186289 0.05132707 0.76216492
The result is a model with 8 variables that have an impact on the probability of in the top 25 percent of depression, anxiety AND exhaustion: Age, sex, partner, hours of study, self reported health, cognitive empathy, affective empathy and having consulted a psycho therapist over the last 12 months.