Context: Mental Health in the Technology Industry

The tech industry is growing quickly and is worth billions of dollars. Conversations about mental health have also become more prominent over the past few years, and our group is interested in the intersection of these two areas. Because our group members are interested in entering this industry, it is worth asking what the strongest predictors of mental illness are. We would also like to investigate general attitudes towards mental health in the industry.
A notion exists in the technology sector that its employees are highly motivated and highly productive. But the tech world is often fast-paced and high-pressure, and this environment affects both employees and their productivity. Mental health issues impede people from living happy lives at home and at work. We hope to shed light on attitudes about mental health in the technology workplace by exploring these questions: Are there any trends in the free-response questions of the survey? Is the sentiment of these trends positive or negative?
We will be using data collected by Open Sourcing Mental Illness (OSMI). OSMI is a non-profit dedicated to bringing light to mental health issues in the technology sector and to helping companies create environments that support their employees’ mental health.
Each year, OSMI publishes a publicly-available survey that details respondents’ demographic data and experience with mental illness and stigma as employees in the technology sector. Due to low response volume in 2020, we will be conducting analysis of the 2019 set of responses.
The data was first opened and cleaned in Excel. All open-ended text columns were moved to a separate Excel file so that they could be used for text analysis later on. Then, we inspected the data to see which columns had many missing values, and we deleted 13 columns for this reason. Much of the information in these columns was already captured in other columns, such as information about opening up about mental health.

Sample Responses

Describe the conversation you had with your employer about your mental health, including their reactions and what actions were taken to address your mental health issue/questions:
“My supervisor was dumbfounded the first time he observed me during a (job-related) anxiety attack when I completely forgot how to do something simple and routine. He spent two hours asking me leading questions to help me solve my problem instead of gently reminding me. I thought I was going to cry. He said “what can I do to help you”, made a few suggestions, faked empathy. And then fired me three months later after creating a performance plan that was impossible to complete successfully.”

Would you bring up a mental illness with a potential employer? Why or why not?
“Mental issues aren’t as easily “proven”. An employer might not fully understand what it is, what it entails. He or she might interpret it wrongly, maybe think I’m not capable.”

“Because it’s weird, and there hasn’t been an established trust. Would you tell a girl on your first date you were bipolar and lost every relationship because of it? You can ask leading questions to find out if they respect people’s personal time and needs.”

Exploratory Data Analysis

Heat Map

Figure 1

This is a visual representation of a two-way table showing whether or not employees might be willing to bring up mental or physical issues in an interview with a potential employer.

Side-by-side Boxplot

Figure 2 shows the age distribution of respondents, split by whether or not they currently have a mental illness, with age on the x-axis.

Barchart

This figure shows the gender distribution of those who are or aren’t openly identified as a person with a mental health disorder.

Hypothesized important variables

-Does your family have a history of mental illness?
-Has your employer ever formally discussed mental health?
-Would you feel more comfortable talking to your coworkers about your physical health or your mental health?

Method: Text Exploration

Because many of the survey questions were open-ended, we decided to analyze the text data to assess the general sentiment around mental health in tech jobs. Some examples of the free-response questions were “Describe the conversation with coworkers you had about your mental health including their reactions.”, “Would you be willing to bring up physical issue with potential employer? Why or why not?”, and “Briefly describe what you think the industry as a whole and/or employers could do to improve mental health support for employees.”

Sentiment Analysis

We first conducted sentiment analysis using the NRC and Bing lexicons. The Bing analysis showed that the words used in the free-response portions of the survey were more negative than positive, with 197 coded as negative and 113 coded as positive. Interestingly, the NRC analysis coded more words as positive (211) than negative (174).
The most common sentiments according to NRC were “trust”, “sadness”, “fear”, and “anticipation”. This leads us to believe that for positive accounts, trust - of employers or among coworkers - plays a role in how comfortable employees feel discussing mental health in the workplace. However, “sadness”, “fear”, and “anticipation” indicate that many respondents discussed poor mental health, including depression and anxiety, in relation to the workplace.

## 
## negative positive 
##      197      113
## 
##        anger anticipation      disgust         fear          joy     negative 
##           64           75           37           97           52          174 
##     positive      sadness     surprise        trust 
##          211           98           33          136
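To make the difference between the two lexicons concrete, here is a minimal Python sketch of word-level lexicon counting. It is an independent illustration and not the code used for this report; the small `BING_LIKE` and `NRC_LIKE` dictionaries and the two example responses are invented stand-ins, not entries from the real Bing or NRC lexicons or from the survey.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, invented for illustration only.
# A Bing-style lexicon gives each word a single polarity label.
BING_LIKE = {
    "supportive": "positive",
    "comfortable": "positive",
    "afraid": "negative",
    "stigma": "negative",
}
# An NRC-style lexicon can attach several labels to one word.
NRC_LIKE = {
    "supportive": ["positive", "trust"],
    "comfortable": ["positive", "joy"],
    "afraid": ["negative", "fear"],
    "manager": ["positive", "trust"],  # absent from the Bing-style lexicon
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())

def count_bing(responses):
    """Count one polarity label per matched word occurrence."""
    counts = Counter()
    for response in responses:
        for word in tokenize(response):
            if word in BING_LIKE:
                counts[BING_LIKE[word]] += 1
    return counts

def count_nrc(responses):
    """Count every label attached to each matched word occurrence."""
    counts = Counter()
    for response in responses:
        for word in tokenize(response):
            counts.update(NRC_LIKE.get(word, []))
    return counts

# Made-up responses, not taken from the survey.
responses = [
    "My manager was supportive.",
    "I was afraid of the stigma.",
]
bing_counts = count_bing(responses)
nrc_counts = count_nrc(responses)
print(dict(bing_counts))
print(dict(nrc_counts))
```

On these two made-up responses the Bing-style count leans negative while the NRC-style count leans positive, simply because the two toy lexicons cover different words. A similar difference in coverage is one plausible explanation for the disagreement between Bing and NRC reported above.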

Sentiment Range

We also conducted AFINN sentiment analysis to look at the sentiment range of responses, between -5 and 5. Most of the words analyzed were not coded as extremely positive or extremely negative, but the highest count (around 80 words) was around -2, leaning negative. The second most common sentiment score, with about 60 words, was 2. The AFINN sentiment range aligned with the Bing sentiment analysis, because more words were coded as negative than positive. This suggests that while opinions and feelings surrounding mental health in the tech industry are not entirely negative, more of the words were associated with negative sentiments than positive. However, it is important to note that AFINN sentiment analysis looks at individual words and not phrases. This negative skew could therefore be affected by single words like depression, anxiety, and disorder, even if the respondent indicated they had a positive experience discussing these issues in the workplace. For example, if someone had written that they felt comfortable having conversations about depression or other mental health diagnoses, AFINN would still code the word “depression” as a negative word.
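The word-level limitation described above can be illustrated with a minimal Python sketch. The scores in `TOY_SCORES` are hypothetical values on the AFINN-style -5 to 5 scale, invented for this example rather than copied from the real lexicon, and the response text is made up.

```python
import re

# Hypothetical AFINN-style scores (integers from -5 to 5), invented for
# illustration; they are not copied from the real AFINN lexicon.
TOY_SCORES = {
    "comfortable": 2,
    "supportive": 2,
    "depression": -2,
    "anxiety": -2,
}

def score_response(text):
    """Sum the scores of individual words, ignoring context and negation."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(TOY_SCORES.get(word, 0) for word in words)

# A made-up response describing a positive experience.
positive_experience = "I felt comfortable discussing my depression and anxiety."
response_score = score_response(positive_experience)
print(response_score)  # -2: one positive word is outweighed by two diagnosis words
```

Even though the sentence describes a good experience, the summed score is negative because each diagnosis term is scored on its own.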

Word Cloud

Finally, we made a word cloud that provides a visual with the most common words occurring in the responses. Some of the common words were “disorder”, “depression”, “bipolar”, “anxiety”, and “phobia”. This suggests that these issues could be particularly prevalent. Some other words that stuck out to us were “support”, “stress”, “conversation”, “discussed”, and “talk”.

Method: Decision Tree

Our target variable: Have you ever discussed your mental health with your employer?

We think that if we are able to identify factors that indicate whether or not an employee will discuss mental health with his or her employer, we would be able to address any potential issues or markers.

Initial step: Base Rate

## [1] 0.6719368

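Our base rate is .67: about 67% of respondents have not discussed mental health with their employer. With no model at all, always predicting the majority class (“has not discussed”) would therefore be correct about 67% of the time, so a useful model needs to beat this benchmark.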
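The arithmetic behind the base rate can be sketched in a few lines of Python. The class counts are not stated in this section; they are inferred from the model output reported later in this report (203 training rows with 67 in the minority class, and 34 controls and 16 cases in the testing set), so treat them as an assumption. The function name `base_rate` is illustrative only.

```python
# Class counts inferred from the model output reported later in this report
# (an assumption, not figures stated directly by the authors).
train_counts = {"FALSE": 203 - 67, "TRUE": 67}  # training set
test_counts = {"FALSE": 34, "TRUE": 16}         # testing set

def base_rate(*count_tables):
    """Return the most common class and its share across the count tables."""
    totals = {}
    for table in count_tables:
        for label, n in table.items():
            totals[label] = totals.get(label, 0) + n
    majority_label = max(totals, key=totals.get)
    return majority_label, totals[majority_label] / sum(totals.values())

majority_label, rate = base_rate(train_counts, test_counts)
print(majority_label, round(rate, 7))
```

Under these inferred counts, 170 of 253 respondents fall in the FALSE class, which agrees with the reported base rate.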

Gini method using all variables/ create initial tree

## n= 203 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 203 67 FALSE (0.6699507 0.3300493)  
##    2) Have.you.ever.discussed.your.mental.health.with.coworkers?< 0.5 102 14 FALSE (0.8627451 0.1372549) *
##    3) Have.you.ever.discussed.your.mental.health.with.coworkers?>=0.5 101 48 TRUE (0.4752475 0.5247525)  
##      6) Have.you.observed.or.experienced.a.*supportive.or.well.handled.response*.to.a.mental.health.issue.in.your.current.or.previous.workplace?=Maybe/Not sure,No,Yes, I observed 63 24 FALSE (0.6190476 0.3809524)  
##       12) Are.you.openly.identified.at.work.as.a.person.with.a.mental.health.issue?< 0.5 51 14 FALSE (0.7254902 0.2745098) *
##       13) Are.you.openly.identified.at.work.as.a.person.with.a.mental.health.issue?>=0.5 12  2 TRUE (0.1666667 0.8333333) *
##      7) Have.you.observed.or.experienced.a.*supportive.or.well.handled.response*.to.a.mental.health.issue.in.your.current.or.previous.workplace?=Yes, I experienced 38  9 TRUE (0.2368421 0.7631579) *
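As a rough check on how the first split separates the classes, the Gini impurity before and after it can be computed from the node counts printed above. This Python sketch uses the standard two-class Gini formula; it is not the fitting software's internal calculation, and the improvement value that software reports may be scaled differently. The per-class counts are derived from each node's `n` and `loss`.

```python
def gini(n_false, n_true):
    """Gini impurity of a node containing two classes."""
    total = n_false + n_true
    p_false = n_false / total
    p_true = n_true / total
    return 1.0 - p_false**2 - p_true**2

# (FALSE count, TRUE count) per node, derived from the printed n and loss.
root = (136, 67)   # node 1: n = 203, loss = 67, majority FALSE
left = (88, 14)    # node 2: n = 102, loss = 14, majority FALSE
right = (48, 53)   # node 3: n = 101, loss = 48, majority TRUE

n_root, n_left, n_right = sum(root), sum(left), sum(right)

# Impurity after the split is the size-weighted average of the children.
weighted_child_gini = (
    n_left / n_root * gini(*left) + n_right / n_root * gini(*right)
)
gini_decrease = gini(*root) - weighted_child_gini

print(round(gini(*root), 4), round(weighted_child_gini, 4), round(gini_decrease, 4))
```

The impurity falls from roughly 0.44 at the root to roughly 0.37 after the split, with most of the gain coming from the fairly pure left-hand node of respondents who have not discussed mental health with coworkers.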

Decision Tree

CP plot and Variable importance

##                                                                                   Have.you.ever.discussed.your.mental.health.with.coworkers? 
##                                                                                                                                   15.9239453 
##     Have.you.observed.or.experienced.a.*supportive.or.well.handled.response*.to.a.mental.health.issue.in.your.current.or.previous.workplace? 
##                                                                                                                                   11.4288519 
##                                                     Have.you.ever.had.a.coworker.discuss.their.or.another.coworker's.mental.health.with.you? 
##                                                                                                                                    7.2381570 
##                                                                    Are.you.openly.identified.at.work.as.a.person.with.a.mental.health.issue? 
##                                                                                                                                    6.7961858 
##                                                                                           Have.you.had.a.mental.health.disorder.in.the.past? 
##                                                                                                                                    4.8254380 
##                                                                                            Do.you.*currently*.have.a.mental.health.disorder? 
##                                                                                                                                    4.6645900 
##              If.you.have.a.mental.health.disorder,.how.often.do.you.feel.that.it.interferes.with.your.work.*when.being.treated.effectively?* 
##                                                                                                                                    4.3428942 
## Have.you.observed.or.experienced.an.*unsupportive.or.badly.handled.response*.to.a.mental.health.issue.in.your.current.or.previous.workplace? 
##                                                                                                                                    0.9111987 
##                                                                      Overall,.how.much.importance.does.your.employer.place.on.mental.health? 
##                                                                                                                                    0.9111987 
##                                                                    Overall,.how.much.importance.does.your.employer.place.on.physical.health? 
##                                                                                                                                    0.5467192 
##                                                                                                                            What.is.your.age? 
##                                                                                                                                    0.5467192 
##                                                           Would.you.bring.up.your.*mental*.health.with.a.potential.employer.in.an.interview? 
##                                                                                                                                    0.5056022

The CP chart indicates that the optimal size of the tree is 4 terminal nodes, with a complexity parameter of .035.
Variable importance output suggests that ‘Have you ever discussed your mental health with coworkers?’ is the most important variable (importance of about 15.9) in predicting whether or not someone has discussed mental health with an employer. This makes sense intuitively: if someone has spoken with their peers about mental health, they might be more comfortable speaking with their employer. This variable is also the first split of the decision tree.

Predict target variable, run against testing set

##           Reference
## Prediction FALSE TRUE
##      FALSE    31    6
##      TRUE      3   10

Error count, error rate

## [1] 9
## [1] 0.18
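The error count and error rate follow directly from the confusion matrix above. The Python sketch below reproduces them and, for comparison, the error rate of always predicting the majority class; the dictionary layout and variable names are illustrative choices, not taken from the original analysis.

```python
# Confusion matrix from the testing-set output: keys are (prediction, reference).
confusion = {
    ("FALSE", "FALSE"): 31,
    ("FALSE", "TRUE"): 6,
    ("TRUE", "FALSE"): 3,
    ("TRUE", "TRUE"): 10,
}

n_test = sum(confusion.values())

# Errors are the cells where the prediction disagrees with the reference.
error_count = sum(n for (pred, ref), n in confusion.items() if pred != ref)
error_rate = error_count / n_test

# Error rate of always predicting the majority reference class (FALSE).
n_reference_true = confusion[("FALSE", "TRUE")] + confusion[("TRUE", "TRUE")]
baseline_error_rate = n_reference_true / n_test

print(error_count, error_rate, baseline_error_rate)
```

The model misclassifies 9 of 50 respondents (.18), compared with 16 of 50 (.32) for the majority-class guess.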

Our model has performed well on the testing set: it misclassified 9 of 50 respondents, for an error rate of .18.

Final diagnostics: AUC, ROC, Confusion Matrix Output

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction FALSE TRUE
##      FALSE    31    6
##      TRUE      3   10
##                                           
##                Accuracy : 0.82            
##                  95% CI : (0.6856, 0.9142)
##     No Information Rate : 0.68            
##     P-Value [Acc > NIR] : 0.02057         
##                                           
##                   Kappa : 0.5648          
##                                           
##  Mcnemar's Test P-Value : 0.50499         
##                                           
##             Sensitivity : 0.9118          
##             Specificity : 0.6250          
##          Pos Pred Value : 0.8378          
##          Neg Pred Value : 0.7692          
##              Prevalence : 0.6800          
##          Detection Rate : 0.6200          
##    Detection Prevalence : 0.7400          
##       Balanced Accuracy : 0.7684          
##                                           
##        'Positive' Class : FALSE           
## 
## Setting levels: control = FALSE, case = TRUE
## Setting direction: controls < cases
## 
## Call:
## roc.default(response = testingset$`Have.you.ever.discussed.your.mental.health.with.your.employer?`,     predictor = as.numeric(prediction), plot = TRUE)
## 
## Data: as.numeric(prediction) in 34 controls (testingset$`Have.you.ever.discussed.your.mental.health.with.your.employer?` FALSE) < 16 cases (testingset$`Have.you.ever.discussed.your.mental.health.with.your.employer?` TRUE).
## Area under the curve: 0.7684
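Several of the statistics above can be recomputed by hand from the four cells of the confusion matrix, which helps when interpreting them. The Python sketch below is an independent check, with variable names chosen for illustration; it treats FALSE as the positive class, as the output does. Because the ROC call is given hard class predictions (`as.numeric(prediction)`) rather than class probabilities, the curve has a single operating point and its area equals the balanced accuracy, which is why both are reported as 0.7684.

```python
# Testing-set confusion matrix; FALSE is the 'positive' class in the output.
tp = 31  # predicted FALSE, reference FALSE
fn = 3   # predicted TRUE,  reference FALSE
fp = 6   # predicted FALSE, reference TRUE
tn = 10  # predicted TRUE,  reference TRUE
n = tp + fn + fp + tn

accuracy = (tp + tn) / n
sensitivity = tp / (tp + fn)   # share of reference FALSE cases predicted FALSE
specificity = tn / (tn + fp)   # share of reference TRUE cases predicted TRUE
balanced_accuracy = (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for agreement expected by chance.
expected_agreement = (
    (tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)
) / n**2
kappa = (accuracy - expected_agreement) / (1 - expected_agreement)

print(round(accuracy, 4), round(sensitivity, 4), round(specificity, 4))
print(round(balanced_accuracy, 4), round(kappa, 4))
```

These values agree with the reported accuracy, sensitivity, specificity, balanced accuracy, and kappa.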

Decision Tree Conclusions

Overall, this model has performed well on the testing set. Its accuracy (.82) is significantly higher than the no-information rate of .68 (p = .02), sensitivity is high (.91), the area under the curve is moderate-to-high (.77), and kappa is moderate (.56). Note that the ‘positive’ class in this output is FALSE, so the high sensitivity means the model is good at identifying employees who have not discussed mental health with their employer; it is weaker (specificity of .63) at identifying those who have.
This means that the top predictors in the decision tree (Have you discussed mental health with your coworkers? and Have you observed or experienced a supportive or well handled response to a mental health issue in your current or previous workplace?) are useful indicators of whether or not an employee has talked to their employer about mental health. Presumably, then, if we want to increase the volume of conversation between employees and employers, we should aim to facilitate conversation between coworkers as well.

Limitations, Future Analysis

Our analysis is somewhat limited in that the survey contains few numerical variables. In addition, it is difficult to make any causal inferences from our conclusions. While we might be able to claim that two variables are associated, we are unable to make claims about which causes which. For example, our decision tree showed that whether or not an employee has discussed mental health with their employer is related to whether or not that employee has discussed it with their coworkers. We can claim that these two are associated, but we have no information about whether one kind of conversation leads to the other.
In a more indirect way, the coronavirus pandemic also limited our analysis. Since we chose to explore the 2019 survey data rather than work with the smaller 2020 set, we do not see anything about the pandemic reflected in our analysis. One could easily presume that the pandemic has changed the tech working environment and that might’ve been reflected in the more recent survey.
To further our analysis, we could compile data from multiple years of the OSMI survey. This would have been difficult in terms of data cleaning because the survey format changes slightly each year, but it would’ve enhanced our understanding to include an analysis of yearly changes. This also would have increased the response volume. In addition, if similar datasets exist in other sectors of the workforce, we could also look at other industries and see how rates of mental health issues compare with the tech world.