Project Overview

As mental health is discussed more openly in our society, treatment still lags far behind. As a student aspiring to work in the tech industry, I have had my own experience with mental health and the discussions around it. This project examines the state of mental health in the tech industry and looks for the key factors behind it. For example, who is most likely to stay silent about their mental health struggles? What kind of employees seek the most help? Should more companies provide mental health assistance as part of their benefits? These are some of the questions I hope to answer as we explore the data. Most importantly, and as the essential goal of the project: where are tech companies falling short in supporting their employees?

Import Dataset

Importing the Mental Health Survey from the csv file

The dataset was retrieved from Kaggle.

This dataset contains information from an anonymous 2014 survey on mental health among employees in the tech industry. It was conducted by OSMI (Open Sourcing Mental Illness) and received over 1,200 responses.

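library(tidyverse)  # read_csv, dplyr verbs, ggplot2, stringr
library(reshape2)   # assumed source of melt(), used for the heatmap below

survey_data <- read_csv("survey.csv")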
## Rows: 1259 Columns: 27
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (25): Gender, Country, state, self_employed, family_history, treatment,...
## dbl   (1): Age
## dttm  (1): Timestamp
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(survey_data)
## # A tibble: 6 × 27
##   Timestamp             Age Gender Country    state self_employed family_history
##   <dttm>              <dbl> <chr>  <chr>      <chr> <chr>         <chr>         
## 1 2014-08-27 11:29:31    37 Female United St… IL    <NA>          No            
## 2 2014-08-27 11:29:37    44 M      United St… IN    <NA>          No            
## 3 2014-08-27 11:29:44    32 Male   Canada     <NA>  <NA>          No            
## 4 2014-08-27 11:29:46    31 Male   United Ki… <NA>  <NA>          Yes           
## 5 2014-08-27 11:30:22    31 Male   United St… TX    <NA>          No            
## 6 2014-08-27 11:31:22    33 Male   United St… TN    <NA>          Yes           
## # ℹ 20 more variables: treatment <chr>, work_interfere <chr>,
## #   no_employees <chr>, remote_work <chr>, tech_company <chr>, benefits <chr>,
## #   care_options <chr>, wellness_program <chr>, seek_help <chr>,
## #   anonymity <chr>, leave <chr>, mental_health_consequence <chr>,
## #   phys_health_consequence <chr>, coworkers <chr>, supervisor <chr>,
## #   mental_health_interview <chr>, phys_health_interview <chr>,
## #   mental_vs_physical <chr>, obs_consequence <chr>, comments <chr>

Narrative and Explanation of the Data

The column variables are as follows for this dataset:

Age, Gender, Country

state: If you live in the United States, which state or territory do you live in?

self_employed: Are you self-employed?

family_history: Do you have a family history of mental illness?

treatment: Have you sought treatment for a mental health condition?

work_interfere: If you have a mental health condition, do you feel that it interferes with your work?

no_employees: How many employees does your company or organization have?

remote_work: Do you work remotely (outside of an office) at least 50% of the time?

tech_company: Is your employer primarily a tech company/organization?

benefits: Does your employer provide mental health benefits?

care_options: Do you know the options for mental health care your employer provides?

wellness_program: Has your employer ever discussed mental health as part of an employee wellness program?

seek_help: Does your employer provide resources to learn more about mental health issues and how to seek help?

anonymity: Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?

leave: How easy is it for you to take medical leave for a mental health condition?

mental_health_consequence: Do you think that discussing a mental health issue with your employer would have negative consequences?

phys_health_consequence: Do you think that discussing a physical health issue with your employer would have negative consequences?

coworkers: Would you be willing to discuss a mental health issue with your coworkers?

supervisor: Would you be willing to discuss a mental health issue with your direct supervisor(s)?

mental_health_interview: Would you bring up a mental health issue with a potential employer in an interview?

phys_health_interview: Would you bring up a physical health issue with a potential employer in an interview?

mental_vs_physical: Do you feel that your employer takes mental health as seriously as physical health?

obs_consequence: Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?

comments: Any additional notes or comments

These descriptions come from the Kaggle page the dataset was retrieved from.

While this survey does not cover every company in the world, it has a healthy sample size of people in the tech industry from which we can potentially draw some conclusions.

Cleaning the Data

In this step we tidy up some null values and group the data for better exploratory analysis.

## Replace NA in self_employed with No
survey_data <- survey_data %>%
  mutate(self_employed = replace(self_employed, 
                                 is.na(self_employed), "No")) %>%
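  ## Collapse work_interfere to "Yes" (Often/Sometimes) or "No" (every other answer)
  ## Note: missing answers (NA) are also counted as "No" by this step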
  mutate(work_interfere = if_else(work_interfere %in% c("Often", "Sometimes"), 
                               "Yes", "No"))
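# I've taken the small, medium, and large company sizes based on startup size, growing company size, and corporate companies.
# no_employees is a character column of ranges, so it is matched by label rather than compared numerically.
# The labels "1-5", "6-25", and "26-100" are assumed from the survey's answer options.
survey_data <- survey_data %>%
  select(-Timestamp) %>% 
  mutate(family_history = family_history == "Yes") %>% 
  mutate(company_size = case_when(
    no_employees == "1-5" ~ "small",
    no_employees %in% c("6-25", "26-100") ~ "medium",
    TRUE ~ "large"))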
# The columns were checked with unique values and I've organized the different answers into categories.

# Before subgrouping the genders, here are the unique values 
unique(survey_data$Gender)
##  [1] "Female"                                        
##  [2] "M"                                             
##  [3] "Male"                                          
##  [4] "male"                                          
##  [5] "female"                                        
##  [6] "m"                                             
##  [7] "Male-ish"                                      
##  [8] "maile"                                         
##  [9] "Trans-female"                                  
## [10] "Cis Female"                                    
## [11] "F"                                             
## [12] "something kinda male?"                         
## [13] "Cis Male"                                      
## [14] "Woman"                                         
## [15] "f"                                             
## [16] "Mal"                                           
## [17] "Male (CIS)"                                    
## [18] "queer/she/they"                                
## [19] "non-binary"                                    
## [20] "Femake"                                        
## [21] "woman"                                         
## [22] "Make"                                          
## [23] "Nah"                                           
## [24] "All"                                           
## [25] "Enby"                                          
## [26] "fluid"                                         
## [27] "Genderqueer"                                   
## [28] "Androgyne"                                     
## [29] "Agender"                                       
## [30] "cis-female/femme"                              
## [31] "Guy (-ish) ^_^"                                
## [32] "male leaning androgynous"                      
## [33] "Man"                                           
## [34] "Trans woman"                                   
## [35] "msle"                                          
## [36] "Neuter"                                        
## [37] "Female (trans)"                                
## [38] "queer"                                         
## [39] "Female (cis)"                                  
## [40] "Mail"                                          
## [41] "cis male"                                      
## [42] "A little about you"                            
## [43] "Malr"                                          
## [44] "p"                                             
## [45] "femail"                                        
## [46] "Cis Man"                                       
## [47] "ostensibly male, unsure what that really means"
# This grouping is not meant to undermine those who identify outside of the male/female/non-binary/trans categories. Rather, responses are grouped so that possible patterns in mental health barriers can be observed for those who identify outside of these categories.
survey_data <- survey_data %>%   
  mutate(standardized_gender = 
           case_when(
             Gender %in% c("non-binary", "trans", "Trans") ~ "Non-binary/Trans",
             Gender %in% c('M', 'Male', 'male','Male ,', 'Man','m','Male ',"msle",'Cis Man', 'Malr','Mail', 'Man','Male-ish','maile','Mal','cis male','All','Cis Male','Male (CIS)', 'Make') ~ "Male",
             Gender %in% c('Female','Cis Female', 'Woman','Female ', 'f', 'Femake', 'woman','femail','female','Female (cis)','F') ~ "Female",
             TRUE ~ "Other"
           )
  ) %>%
  ## Keep only ages that seem reasonable for a tech workplace survey
  filter(between(Age, 10, 95)) %>%
  ## work_interfere was already collapsed to "Yes"/"No" earlier, so it needs no further recoding
  mutate(standardized_gender = str_to_lower(standardized_gender))


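## Note: this recode maps "Yes" and "No" to themselves, so the column is left unchanged;
## any other answers are filtered out later where needed.
survey_data <- survey_data %>%
  mutate(mental_vs_physical = recode(mental_vs_physical,
                                     "Yes" = "Yes",
                                     "No" = "No"))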
table(survey_data$standardized_gender)
## 
##           female             male non-binary/trans            other 
##              246              987                1               18

I’ve edited some columns to keep the formatting consistent. I also removed and standardized some columns to make data analysis easier for future research. Grouping the genders felt necessary because there were many typos and only a very small portion of the data represented a gender outside of female/male. Many of the columns were sparse, and some had a lot of NA’s and blanks.
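As a rough sketch of how the gender grouping could be made less dependent on listing every spelling, the answers can be lowercased and stripped of punctuation and "cis" qualifiers before matching. The helper name `standardize_gender` and its word lists are hypothetical and deliberately incomplete; this is not the grouping used for the table above, and it folds everything unmatched into "other".

```r
# Hypothetical helper (not part of the original analysis): normalize free-text
# gender answers before grouping. The word lists are illustrative only.
standardize_gender <- function(x) {
  cleaned <- tolower(trimws(x))
  # Replace punctuation with spaces so "Male (CIS)" becomes "male  cis "
  cleaned <- gsub("[^a-z ]", " ", cleaned)
  # Drop the standalone word "cis"
  cleaned <- gsub("\\bcis\\b", " ", cleaned, perl = TRUE)
  # Collapse repeated spaces and trim the ends
  cleaned <- trimws(gsub(" +", " ", cleaned))

  male <- c("m", "male", "man", "mal", "maile", "msle", "malr", "mail", "make")
  female <- c("f", "female", "woman", "femake", "femail")

  ifelse(cleaned %in% male, "male",
         ifelse(cleaned %in% female, "female", "other"))
}

standardize_gender(c("Male (CIS)", "cis male", " F ", "Femake", "Enby"))
# "male" "male" "female" "female" "other"
```

Normalizing first keeps the lookup lists short, but the lists still have to be reviewed by hand against `unique(survey_data$Gender)`.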

Descriptive Statistics and Analysis

# Age
mean(survey_data$Age, na.rm = TRUE)
## [1] 32.0599
# Treatment
mean(survey_data$treatment == "No")
## [1] 0.4952077
# Work Interference  
mean(survey_data$work_interfere == "Yes")
## [1] 0.4824281

Some quick overall numbers to consider: the average age of respondents is around 32, which is relatively young. This is important to note, as the findings of this analysis might not properly represent older workers in the industry.

Work interference stands at a striking 48%. This means that 48% of respondents said their mental health often or sometimes interferes with their work. Given this information, we can analyze further who in this group actually seeks treatment and what their experiences are.

We can also already observe that only about 50% of respondents have sought treatment, which seems low given how many report work interference.
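One caveat on the 48% figure is that the cleaning step counts missing `work_interfere` answers as "No". The sketch below uses a small made-up vector (not survey data) to show how the share changes depending on whether skipped answers stay in the denominator.

```r
# Toy answers, invented for illustration; NA means the question was skipped
work_interfere_raw <- c("Often", "Sometimes", "Rarely", "Never",
                        NA, NA, "Sometimes", "Never")

# Share computed as in the cleaning step: NA is counted as "No"
collapsed <- ifelse(work_interfere_raw %in% c("Often", "Sometimes"), "Yes", "No")
share_all <- mean(collapsed == "Yes")

# Share among respondents who actually answered the question
answered <- work_interfere_raw[!is.na(work_interfere_raw)]
share_answered <- mean(answered %in% c("Often", "Sometimes"))

share_all       # 0.375
share_answered  # 0.5
```

Which denominator is appropriate depends on the question being asked; the point is only that the two can differ noticeably when many answers are missing.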

Employer Data Analysis

Here we look at some data regarding the benefits provided by the employer.

filter_NA_mental_vs_physical <- survey_data %>%
 filter(mental_vs_physical %in% c("Yes", "No")) %>%
 filter(benefits %in% c("Yes", "No"))
# Do employers take mental health seriously?
ggplot(filter_NA_mental_vs_physical, aes(x = mental_vs_physical)) + 
  # No fill aesthetic is mapped, so the bar color is set directly
  geom_bar(fill = "#69b3a2") +
  labs(title = "Employer Mental Health Awareness",
       x = "Does Employer Take Mental Health as Seriously as Physical Health?",
       y = "Count")

mental_vs_physical_xtab <- table(filter_NA_mental_vs_physical$mental_vs_physical, filter_NA_mental_vs_physical$benefits)

# Melt the data for visualization
melted_table <- melt(mental_vs_physical_xtab)

# Create a heatmap
ggplot(melted_table, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "white", high = "blue") +
  # Var1 holds mental_vs_physical (table rows) and Var2 holds benefits (table columns)
  labs(title = "Employer Support for Mental Health",
       x = "Treats Mental Health Equally",
       y = "Offers Benefits") +
  theme_minimal()

We can clearly observe that employees at companies that offer mental health benefits most often say their employer treats mental health as seriously as physical health. On the other hand, employees at companies that do not provide any mental health benefit tend to say it is not treated equally. From the employees' perspective, then, a company should provide some sort of mental health benefit.

fil_leave_data <- survey_data %>%
  filter(leave %in% c("Somewhat difficult", "Somewhat easy", "Very difficult", "Very easy"))

ggplot(fil_leave_data, aes(x = leave)) +
  geom_bar()

table(fil_leave_data$leave, fil_leave_data$work_interfere)
##                     
##                       No Yes
##   Somewhat difficult  46  79
##   Somewhat easy      135 130
##   Very difficult      24  73
##   Very easy          114  90
# Observed negative consequences
mean(survey_data$obs_consequence == "Yes")
## [1] 0.1445687

This is another supporting view: those who think it is very easy to take leave tend to say that their mental health does not affect their work. On the other hand, among those who think it is very difficult, about 3 times as many people say their mental health interferes with their work as say it does not (73 versus 24). This relationship suggests a possible link between making leave for mental health reasons easier and better work outcomes, although the survey alone cannot show cause and effect.
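To make the comparison across leave categories easier to read, the counts from the table above can be turned into row shares. The sketch below re-enters those counts by hand, so it runs without the survey file; the chi-squared test at the end is an addition for illustration and was not part of the original analysis.

```r
# Counts copied from the leave by work_interfere table above
leave_by_interfere <- matrix(
  c( 46,  79,
    135, 130,
     24,  73,
    114,  90),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    leave = c("Somewhat difficult", "Somewhat easy",
              "Very difficult", "Very easy"),
    work_interfere = c("No", "Yes")
  )
)

# Share of each leave category reporting work interference (rows sum to 1)
row_shares <- prop.table(leave_by_interfere, margin = 1)
round(row_shares[, "Yes"], 2)
# Somewhat difficult 0.63, Somewhat easy 0.49, Very difficult 0.75, Very easy 0.44

# Chi-squared test of independence between ease of leave and work interference
test_result <- chisq.test(leave_by_interfere)
test_result$p.value
```

A small p-value here would indicate that the two answers are associated in this sample; it says nothing about which one drives the other.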

An interesting observation is that only about 14% of respondents have heard of or observed negative consequences for coworkers with mental health conditions. While it is still not the 0% it should be, the workforce overall seems to be doing fairly well at limiting negative consequences. Given these statistics on employer benefits, the bigger gap is that tech companies in general should provide some sort of benefit to truly get their support across to employees.

Gender Data Analysis

# Gender bar plot
survey_data %>%
  ggplot(aes(x = standardized_gender)) +
  geom_bar()

# Here is the gender bar plot without any grouping, to show everyone's responses.
ggplot(survey_data) +
  geom_bar(aes(x = Gender)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Age histogram
ggplot(data = survey_data, aes(x = Age)) +
  
  geom_histogram(
    bins = 15,
    color="black", 
    fill="#69b3a2"
  ) +
  
  labs(
    title="Distribution of Ages",
    x="Age",
    y="Count"
  )

We can already observe some important statistics: the average age in the industry, the dominant share of males, and the fact that only about 50% of respondents have sought treatment. It is also worth noting that almost 48% of respondents say their mental health interferes with their work, which should be looked at more closely in the future.

ggplot(survey_data, aes(x = benefits, fill = standardized_gender)) +
  geom_bar() +
  facet_wrap(~standardized_gender)

table(survey_data$leave, survey_data$standardized_gender)
##                     
##                      female male non-binary/trans other
##   Don't know            115  440                1     5
##   Somewhat difficult     26   92                0     7
##   Somewhat easy          56  206                0     3
##   Very difficult         19   76                0     2
##   Very easy              30  173                0     1

In the faceted benefits chart, women appear less often among respondents whose companies offer no benefits or whose benefits are unknown. Because the bars show raw counts and women make up a much smaller share of the sample (246 respondents versus 987 men), this mainly reflects the lower visibility of women in the workplace. The pattern points to a problem beyond the scope of mental health alone: representation in the workplace.

We can also observe that a somewhat smaller share of women think it is easy to take leave for mental health reasons: about 35% of women (86 of 246) answered "Somewhat easy" or "Very easy", compared with about 38% of men (379 of 987), and the gap is wider for "Very easy" alone (about 12% versus 18%). This may indicate additional pressures on women, since their male counterparts report an easier time taking leave for mental health reasons.
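Since the gender groups differ so much in size, comparing shares rather than counts gives a fairer picture. The sketch below re-enters the female and male columns of the table above by hand and computes the share of each group that rated leave as easy; the other two groups are left out because their counts are too small to compare.

```r
# Counts copied from the leave by gender table above (female and male only)
leave_by_gender <- matrix(
  c(115, 440,
     26,  92,
     56, 206,
     19,  76,
     30, 173),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    leave = c("Don't know", "Somewhat difficult", "Somewhat easy",
              "Very difficult", "Very easy"),
    gender = c("female", "male")
  )
)

# Share of each answer within each gender (columns sum to 1)
col_shares <- prop.table(leave_by_gender, margin = 2)

# Combined share answering "Somewhat easy" or "Very easy"
easy_share <- colSums(col_shares[c("Somewhat easy", "Very easy"), ])
round(easy_share, 3)
# female 0.350, male 0.384
```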

Treatment and Those Who Have Sought It

ggplot(data = survey_data,  
       mapping = aes(x = treatment)) +
  # Both bars use the same color, so it is set directly and no legend is needed
  geom_bar(color = "black", fill = "#69b3a2") +
  labs(
    title = "Counts of Those That Have Sought Treatment",
    x = "Treatment",
    y = "Count"
  )

ggplot(data = filter(survey_data, !is.na(benefits)), 
       mapping = aes(x = benefits, fill = benefits)) +
  geom_bar(color="black") + 
  scale_fill_manual(values = c("#69b3a2", "#404080", "#bbbbbb")) +
  labs(
    title = "Mental Health Benefits Offered By Employer",
    x = "Benefits Offered",
    y = "Count",
    fill = "Legend"
  ) +
  theme(legend.position = "bottom")

With the treatment variable we can start to address how many tech workers seek help for their mental health, and whether that differs from other industries. We see roughly a 50/50 split between yes and no, which is a relatively low treatment rate. In the second visualization, on whether a workplace offers mental health benefits, the responses also appear to be split roughly evenly.

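age_group_data <- survey_data %>%
  mutate(age_group = case_when(
    ## Ages below 18 would otherwise fall into the catch-all bucket
    Age < 18 ~ "Under 18",
    between(Age, 18, 25) ~ "18-25",
    between(Age, 26, 35) ~ "26-35",
    between(Age, 36, 45) ~ "36-45",
    between(Age, 46, 55) ~ "46-55",
    between(Age, 56, 65) ~ "56-65",
    TRUE ~ "66+"
  )) %>% 
  group_by(age_group) %>%
  summarize(
    ## Proportion (not count) who sought treatment, comparable to the overall mean line
    mean_treatment = mean(treatment == "Yes", na.rm = TRUE),
    n = n()
  )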

ggplot(age_group_data, aes(x = age_group, y = mean_treatment)) +
  geom_col() +
  geom_hline(yintercept = mean(survey_data$treatment == "Yes"), 
             linetype = "dashed",
             color = "red")

# Here is the faceted histogram for an even better view at this data
ggplot(survey_data, aes(x = Age)) +
  geom_histogram(bins = 20) +
  facet_wrap(~treatment) +
  labs(
    title = "Age Distribution by Treatment Status",
    x = "Age",
    y = "Count"
  )

The age distributions look similar for respondents who have and have not sought treatment. In both the No and Yes panels, respondents in roughly the 24-30 age range account for the most responses.
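As an alternative to chained `between()` calls, base R's `cut()` can assign age groups from a single vector of break points, which makes it harder for out-of-range ages to slip into the wrong bucket. The ages and treatment answers below are made up for illustration, and the "Under 18" and "66+" labels are my own additions.

```r
# Toy data, invented for illustration (not taken from the survey)
ages <- c(17, 18, 25, 26, 40, 65, 66, 72)
treatment <- c("No", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes")

# Each interval is closed on the right, e.g. (17, 25] covers ages 18 to 25
age_group <- cut(
  ages,
  breaks = c(-Inf, 17, 25, 35, 45, 55, 65, Inf),
  labels = c("Under 18", "18-25", "26-35", "36-45", "46-55", "56-65", "66+")
)

as.character(age_group)
# "Under 18" "18-25" "18-25" "26-35" "36-45" "56-65" "66+" "66+"

# Proportion who sought treatment per group (groups with no respondents give NA)
tapply(treatment == "Yes", age_group, mean)
```

Because `cut()` returns a factor with the labels in break order, plots built on it keep the age groups sorted from youngest to oldest.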

added_data <- survey_data %>%
  mutate(age_group = case_when(
    ## Ages below 18 would otherwise fall into the catch-all bucket
    Age < 18 ~ "Under 18",
    between(Age, 18, 25) ~ "18-25",
    between(Age, 26, 35) ~ "26-35",
    between(Age, 36, 45) ~ "36-45",
    between(Age, 46, 55) ~ "46-55",
    between(Age, 56, 65) ~ "56-65",
    TRUE ~ "66+"
  )) %>% 
  group_by(age_group)

ggplot(added_data, aes(x = Age, fill = standardized_gender)) +
  geom_histogram(bins=20, alpha=0.5) + 
  facet_wrap(~standardized_gender) +
  scale_fill_brewer(palette="Spectral") +
  labs(title="Age Distribution by Gender",
       x="Age",
       y="Count")

added_data %>%
  group_by(age_group) %>%
  summarize(prop_with_benefits = mean(benefits == "Yes", na.rm = TRUE)) %>% 
  ggplot(aes(x = age_group, y = prop_with_benefits)) + 
    geom_col()

Apart from the very low representation of genders other than female and male, we observe that most female respondents fall in the younger age groups.

In the last bar graph, we can observe that workers in the early and middle stages of their careers face more uncertainty about benefits, while older workers are more often aware of the benefits their company offers. Given this uncertainty among younger workers, benefit transparency may also play a role in workplace mental health, since only those who look for benefits may end up receiving them.

Critical Conclusions

This concludes my examination of the 2014 OSMI mental health in tech survey, which aimed to uncover patterns in treatment-seeking behavior and employer support. Several useful insights have emerged.

Most importantly, 48% of respondents reported that their mental health often or sometimes interferes with their work, yet only around 50% have actually sought treatment, suggesting that there are unmet mental health needs across the industry.

There also appear to be gaps in the support policies and benefits available from tech companies. Only half of the respondents were able to confirm that their employer provides mental health benefits. Additionally, a majority of those working at a company felt that their leadership does not prioritize psychological health on par with physical health when it comes to benefits, highlighting the stigma around mental health in the workplace.

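Further analysis by factors like age and gender exposed additional themes: women were less likely than men to describe taking leave for mental health reasons as easy, and younger workers were less certain than older workers about the benefits their company offers.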

In summary, despite the sample skewing relatively early-career, there remain considerable hurdles across the tech workplace in helping employees find treatment and in communicating supportive programs for mental health disorders. Based on the survey data analyzed, reducing stigma through leadership, making it easier for workers to access benefits, and investing in readily accessible help all appear to be imperatives across the tech industry.

This analysis can serve as common ground for further exploration of mental health, ideally with a more extensive survey that can pinpoint the causes of these gaps in the workplace. The dataset and the analysis are limited to this one survey and are constrained by its sample size.