The tech and startup world is built on the backs of incredibly bright minds. It’s known for its innovation, its resilience, and a culture that fosters high productivity.
But it has a dark underbelly.
Tech is a fast-paced game with high stakes: founders of startups have to transform an idea into a successful, scalable business, and quickly. They’re under intense pressure to run a successful company and stay on top of a fast-paced, competitive industry, all while maintaining the same image as the tech titans before them.
Employees of these companies operate under the same high stress: late nights, abnormal hours, and tight deadlines, all while wearing multiple hats and being available at any time of day.
The above isn’t unique to startups: the pressure to excel and climb the corporate ladder in any career creates a culture that exacerbates mental health issues. It’s also important to remember that there is no fixed state: mental health ebbs and flows along a spectrum, just like our physical health, ranging from thriving to coping or struggling to clinically treated mental illness.
But the tech industry fosters a “crunch” culture, where demanding work must be completed in a short amount of time. There’s an increased motivation to neglect one’s health by forgoing proper diet, exercise, and sleep in the name of increased output. Left unchecked, this can lead to a rise in burnout, depression, anxiety, and loneliness.
Everyone around the world has mental health, but not everyone talks about it.
According to OSMI data, 51% of tech professionals have been diagnosed with a mental health condition. By comparison, 19.1% of U.S. adults experience mental illness, according to the National Alliance on Mental Illness.
A study by Michael Freeman found that entrepreneurs are 50% more likely to report having a mental health condition:
Founders are:
The terrifying problem with mental illness is that it is invisible; it’s a private battle that people have, and it’s hard to know when people need help.
Mental health affects our emotional, psychological, and social well-being. It affects how we think, feel, and act. It also helps determine how we handle stress, relate to others, and make choices. In the workplace, communication and inclusion are key skills for successful, high-performing teams and employees. For an organization, poor mental health can mean more days absent from work and lower productivity and engagement. In the United States, approximately 70% of adults with depression are in the workforce. Employees with depression miss an estimated 35 million workdays a year due to mental illness. Workers experiencing unresolved depression are estimated to suffer a 35% drop in productivity, costing employers $105 billion each year. In the UK, better mental health support in the workplace can save UK businesses up to EUR 8 billion per year.
Open Sourcing Mental Illness (OSMI) is a non-profit corporation dedicated to raising awareness, educating, and providing resources to support mental wellness in the tech and open source communities. OSMI began in 2013, with Ed Finkler speaking at tech conferences about his personal experiences as a web developer and open source advocate with a mental health disorder. The response was overwhelming, and thus OSMI was born.
Every year, OSMI runs a new survey on how employees at tech companies around the world seek mental health treatment. I picked the survey from 2014.
The survey was filled in by respondents at tech companies who suffer from mental health disorders (medically diagnosed or undiagnosed, even if it’s just a feeling), and it lets us see which factors affect whether an employee gets treatment.
From this research, we will build a machine learning model that helps HR see which factors the company needs to support so that employees want to get mental health treatment. We call it Mental Health First Aid.
Mental Health First Aid teaches HR how to notice and support an individual who may be experiencing a mental health or substance use concern or crisis and connect them with the appropriate employee resources. It teaches employees critical communication and support skills that can influence your organization’s bottom line.
Research shows that employees who go through Mental Health First Aid have an increased awareness of mental health among themselves and their co-workers. It allows them to recognize the signs of someone who may be struggling and teaches them when to reach out and what resources are available. This in turn creates beneficial interventions that increase engagement and create an environment of inclusion and support.
Employers can also offer robust benefit packages to support employees who go through mental health issues. That includes Employee Assistance Programs, Wellness programs that focus on mental and physical health, Health and Disability Insurance or flexible working schedules or time off policies.
Organizations that incorporate mental health awareness help to create a healthy and productive work environment that reduces the stigma associated with mental illness, increases the organization’s mental health literacy, and teaches the skills to safely and responsibly respond to a co-worker’s mental health concern.
Incorporating mental health awareness in the workplace can help lead the way for mental health issues throughout your community by equipping people with the tools they need to start a dialogue so that more people can get the help they need.
The output of this project is a dashboard for analysis and machine learning prediction, built with R Shiny. The HR team can use this dashboard to predict whether an individual may be experiencing a mental health issue or not.
As mentioned in the problem statement, employees with depression miss an estimated 35 million workdays a year due to mental illness. Workers experiencing unresolved depression are estimated to suffer a 35% drop in productivity, costing employers $105 billion each year. This is a huge loss in business terms.
If employers can solve this issue, they can not only retain their employees, decrease the turnover rate, and increase employee productivity, but also save a great deal of money.
Before the analysis, we load the required libraries.
library(dplyr)
library(ggplot2)
library(plotly)
library(esquisse)
Now we load the data for further analysis.
Data source : https://www.kaggle.com/osmi/mental-health-in-tech-survey
mental <- read.csv("survey.csv")
mental
Below is a description of each column, for our understanding:
glimpse(mental) # Check data types
#> Rows: 1,259
#> Columns: 27
#> $ Timestamp <chr> "2014-08-27 11:29:31", "2014-08-27 11:29:37"~
#> $ Age <dbl> 37, 44, 32, 31, 31, 33, 35, 39, 42, 23, 31, ~
#> $ Gender <chr> "Female", "M", "Male", "Male", "Male", "Male~
#> $ Country <chr> "United States", "United States", "Canada", ~
#> $ state <chr> "IL", "IN", NA, NA, "TX", "TN", "MI", NA, "I~
#> $ self_employed <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
#> $ family_history <chr> "No", "No", "No", "Yes", "No", "Yes", "Yes",~
#> $ treatment <chr> "Yes", "No", "No", "Yes", "No", "No", "Yes",~
#> $ work_interfere <chr> "Often", "Rarely", "Rarely", "Often", "Never~
#> $ no_employees <chr> "6-25", "More than 1000", "6-25", "26-100", ~
#> $ remote_work <chr> "No", "No", "No", "No", "Yes", "No", "Yes", ~
#> $ tech_company <chr> "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Ye~
#> $ benefits <chr> "Yes", "Don't know", "No", "No", "Yes", "Yes~
#> $ care_options <chr> "Not sure", "No", "No", "Yes", "No", "Not su~
#> $ wellness_program <chr> "No", "Don't know", "No", "No", "Don't know"~
#> $ seek_help <chr> "Yes", "Don't know", "No", "No", "Don't know~
#> $ anonymity <chr> "Yes", "Don't know", "Don't know", "No", "Do~
#> $ leave <chr> "Somewhat easy", "Don't know", "Somewhat dif~
#> $ mental_health_consequence <chr> "No", "Maybe", "No", "Yes", "No", "No", "May~
#> $ phys_health_consequence <chr> "No", "No", "No", "Yes", "No", "No", "Maybe"~
#> $ coworkers <chr> "Some of them", "No", "Yes", "Some of them",~
#> $ supervisor <chr> "Yes", "No", "Yes", "No", "Yes", "Yes", "No"~
#> $ mental_health_interview <chr> "No", "No", "Yes", "Maybe", "Yes", "No", "No~
#> $ phys_health_interview <chr> "Maybe", "No", "Yes", "Maybe", "Yes", "Maybe~
#> $ mental_vs_physical <chr> "Yes", "Don't know", "No", "No", "Don't know~
#> $ obs_consequence <chr> "No", "No", "No", "Yes", "No", "No", "No", "~
#> $ comments <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
colSums(is.na(mental)) # Check for missing values (NA)
#> Timestamp Age Gender
#> 0 0 0
#> Country state self_employed
#> 0 515 18
#> family_history treatment work_interfere
#> 0 0 264
#> no_employees remote_work tech_company
#> 0 0 0
#> benefits care_options wellness_program
#> 0 0 0
#> seek_help anonymity leave
#> 0 0 0
#> mental_health_consequence phys_health_consequence coworkers
#> 0 0 0
#> supervisor mental_health_interview phys_health_interview
#> 0 0 0
#> mental_vs_physical obs_consequence comments
#> 0 0 1095
Some information we get from the glimpse and the summary of the data:
mental %>% count(Country) %>% arrange(desc(n))
Some notes on the data above:
Following the data summary above, we drop the columns Timestamp, Country, state, and comments.
mental_clean <- mental %>% select(-Timestamp,-Country,-state,-comments)
head(mental_clean)
In this section, I clean the columns whose values are not yet tidy.
unique(mental_clean$Age)
#> [1] 37 44 32 31 33 35
#> [7] 39 42 23 29 36 27
#> [13] 46 41 34 30 40 38
#> [19] 50 24 18 28 26 22
#> [25] 19 25 45 21 -29 43
#> [31] 56 60 54 329 55 99999999999
#> [37] 48 20 57 58 47 62
#> [43] 51 65 49 -1726 5 53
#> [49] 61 8 11 -1 72
mental_clean1 <- mental_clean %>%
  filter(Age > 15, Age < 100) # keep plausible ages only
unique(mental_clean1$Age)
#> [1] 37 44 32 31 33 35 39 42 23 29 36 27 46 41 34 30 40 38 50 24 18 28 26 22 19
#> [26] 25 45 21 43 56 60 54 55 48 20 57 58 47 62 51 65 49 53 61 72
unique(mental_clean1$Gender)
#> [1] "Female"
#> [2] "M"
#> [3] "Male"
#> [4] "male"
#> [5] "female"
#> [6] "m"
#> [7] "Male-ish"
#> [8] "maile"
#> [9] "Trans-female"
#> [10] "Cis Female"
#> [11] "F"
#> [12] "something kinda male?"
#> [13] "Cis Male"
#> [14] "Woman"
#> [15] "f"
#> [16] "Mal"
#> [17] "Male (CIS)"
#> [18] "queer/she/they"
#> [19] "non-binary"
#> [20] "Femake"
#> [21] "woman"
#> [22] "Make"
#> [23] "Nah"
#> [24] "Enby"
#> [25] "fluid"
#> [26] "Genderqueer"
#> [27] "Female "
#> [28] "Androgyne"
#> [29] "Agender"
#> [30] "cis-female/femme"
#> [31] "Guy (-ish) ^_^"
#> [32] "male leaning androgynous"
#> [33] "Male "
#> [34] "Man"
#> [35] "Trans woman"
#> [36] "msle"
#> [37] "Neuter"
#> [38] "Female (trans)"
#> [39] "queer"
#> [40] "Female (cis)"
#> [41] "Mail"
#> [42] "cis male"
#> [43] "Malr"
#> [44] "femail"
#> [45] "Cis Man"
#> [46] "ostensibly male, unsure what that really means"
The Gender column has 46 distinct responses. I rename and combine the responses that mean the same thing, which reduces the data to the following categories:
- Male, or cis male: assigned male at birth and identifies as male.
- Female, or cis female: assigned female at birth and identifies as female.
- Queer: a word that describes sexual and gender identities other than straight and cisgender. Lesbian, gay, bisexual, and transgender people may all identify with the word queer.
mental_clean1["Gender"][mental_clean1["Gender"] == 'Male ' |
mental_clean1["Gender"] == 'male' |
mental_clean1["Gender"] == 'M' |
mental_clean1["Gender"] == 'm' |
mental_clean1["Gender"] == 'Male' |
mental_clean1["Gender"] == 'Cis Male' |
mental_clean1["Gender"] == 'Man' |
mental_clean1["Gender"] == 'cis male' |
mental_clean1["Gender"] == 'Mail' |
mental_clean1["Gender"] == 'Male-ish' |
mental_clean1["Gender"] == 'Male (CIS)' |
mental_clean1["Gender"] == 'Cis Man' |
mental_clean1["Gender"] == 'msle' |
mental_clean1["Gender"] == 'Malr' |
mental_clean1["Gender"] == 'Mal' |
mental_clean1["Gender"] == 'maile' |
mental_clean1["Gender"] == 'Make'] <- "Male"
mental_clean1["Gender"][mental_clean1["Gender"] == 'Female ' |
mental_clean1["Gender"] == 'female' |
mental_clean1["Gender"] == 'F' |
mental_clean1["Gender"] == 'f' |
mental_clean1["Gender"] == 'Woman' |
mental_clean1["Gender"] == 'Female' |
mental_clean1["Gender"] == 'femail' |
mental_clean1["Gender"] == 'cis Female' |
mental_clean1["Gender"] == 'cis-female/femme' |
mental_clean1["Gender"] == 'Femake' |
mental_clean1["Gender"] == 'Female (cis)' |
mental_clean1["Gender"] == 'Cis Female' |
mental_clean1["Gender"] == 'woman' ] <- "Female"
mental_clean1["Gender"][mental_clean1["Gender"] == 'Female (trans)' |
mental_clean1["Gender"] == 'queer/she/they' |
mental_clean1["Gender"] == 'non-binary' |
mental_clean1["Gender"] == 'fluid' |
mental_clean1["Gender"] == 'queer' |
mental_clean1["Gender"] == 'Androgyne' |
mental_clean1["Gender"] == 'Trans-female' |
mental_clean1["Gender"] == 'male leaning androgynous' |
mental_clean1["Gender"] == 'Agender' |
mental_clean1["Gender"] == 'A little about you' |
mental_clean1["Gender"] == 'Nah' |
mental_clean1["Gender"] == 'All' |
mental_clean1["Gender"] == 'ostensibly male, unsure what that really means' |
mental_clean1["Gender"] == 'Genderqueer' |
mental_clean1["Gender"] == 'Enby' |
mental_clean1["Gender"] == 'p' |
mental_clean1["Gender"] == 'Neuter' |
mental_clean1["Gender"] == 'something kinda male?' |
mental_clean1["Gender"] == 'Guy (-ish) ^_^' |
mental_clean1["Gender"] == 'Trans woman' ] <- "Queer"
unique(mental_clean1$Gender)
#> [1] "Female" "Male" "Queer"
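The long chains of comparisons above can also be written as a lookup. The following is a minimal sketch, not the code that produced the results in this post: recode_gender and the toy vector are hypothetical names introduced for illustration. The sketch lowercases and trims each answer first, so variants such as "Male " and "male" need only one entry. One difference in behaviour to be aware of: anything not listed as male or female, including a missing value, falls into "Queer".

```r
# Hypothetical helper: map free-text gender answers onto three groups.
recode_gender <- function(x) {
  key <- tolower(trimws(x))  # normalise case and surrounding whitespace
  male <- c("male", "m", "man", "cis male", "cis man", "male (cis)",
            "male-ish", "mail", "maile", "mal", "malr", "msle", "make")
  female <- c("female", "f", "woman", "cis female", "female (cis)",
              "cis-female/femme", "femail", "femake")
  # Everything that is neither listed as male nor female becomes "Queer"
  ifelse(key %in% male, "Male",
         ifelse(key %in% female, "Female", "Queer"))
}

# Toy input (made up for illustration, not read from survey.csv)
toy <- c("Male ", "F", "Cis Female", "Enby", "msle")
recoded <- recode_gender(toy)
print(recoded)
```

On the survey data, the equivalent call would be mental_clean1$Gender <- recode_gender(mental_clean1$Gender), assuming the column is still of type character at that point.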
We have missing (NA) values in the self_employed and work_interfere columns.
colSums(is.na(mental_clean1))
#> Age Gender self_employed
#> 0 0 18
#> family_history treatment work_interfere
#> 0 0 262
#> no_employees remote_work tech_company
#> 0 0 0
#> benefits care_options wellness_program
#> 0 0 0
#> seek_help anonymity leave
#> 0 0 0
#> mental_health_consequence phys_health_consequence coworkers
#> 0 0 0
#> supervisor mental_health_interview phys_health_interview
#> 0 0 0
#> mental_vs_physical obs_consequence
#> 0 0
Let us fill these missing values to make the data ready for further processing.
mental_clean2 <- mental_clean1 %>%
  mutate(work_interfere = ifelse(is.na(work_interfere), "Don't Know", work_interfere),
         self_employed = ifelse(is.na(self_employed), "No", self_employed))
colSums(is.na(mental_clean2))
#> Age Gender self_employed
#> 0 0 0
#> family_history treatment work_interfere
#> 0 0 0
#> no_employees remote_work tech_company
#> 0 0 0
#> benefits care_options wellness_program
#> 0 0 0
#> seek_help anonymity leave
#> 0 0 0
#> mental_health_consequence phys_health_consequence coworkers
#> 0 0 0
#> supervisor mental_health_interview phys_health_interview
#> 0 0 0
#> mental_vs_physical obs_consequence
#> 0 0
After cleaning, we convert the character columns to factors, which is the correct data type for them.
mental_clean2 <- mental_clean2 %>%
  mutate_if(is.character, as.factor)
glimpse(mental_clean2)
#> Rows: 1,251
#> Columns: 23
#> $ Age <dbl> 37, 44, 32, 31, 31, 33, 35, 39, 42, 23, 31, ~
#> $ Gender <fct> Female, Male, Male, Male, Male, Male, Female~
#> $ self_employed <fct> No, No, No, No, No, No, No, No, No, No, No, ~
#> $ family_history <fct> No, No, No, Yes, No, Yes, Yes, No, Yes, No, ~
#> $ treatment <fct> Yes, No, No, Yes, No, No, Yes, No, Yes, No, ~
#> $ work_interfere <fct> Often, Rarely, Rarely, Often, Never, Sometim~
#> $ no_employees <fct> 6-25, More than 1000, 6-25, 26-100, 100-500,~
#> $ remote_work <fct> No, No, No, No, Yes, No, Yes, Yes, No, No, Y~
#> $ tech_company <fct> Yes, No, Yes, Yes, Yes, Yes, Yes, Yes, Yes, ~
#> $ benefits <fct> Yes, Don't know, No, No, Yes, Yes, No, No, Y~
#> $ care_options <fct> Not sure, No, No, Yes, No, Not sure, No, Yes~
#> $ wellness_program <fct> No, Don't know, No, No, Don't know, No, No, ~
#> $ seek_help <fct> Yes, Don't know, No, No, Don't know, Don't k~
#> $ anonymity <fct> Yes, Don't know, Don't know, No, Don't know,~
#> $ leave <fct> Somewhat easy, Don't know, Somewhat difficul~
#> $ mental_health_consequence <fct> No, Maybe, No, Yes, No, No, Maybe, No, Maybe~
#> $ phys_health_consequence <fct> No, No, No, Yes, No, No, Maybe, No, No, No, ~
#> $ coworkers <fct> Some of them, No, Yes, Some of them, Some of~
#> $ supervisor <fct> Yes, No, Yes, No, Yes, Yes, No, No, Yes, Yes~
#> $ mental_health_interview <fct> No, No, Yes, Maybe, Yes, No, No, No, No, May~
#> $ phys_health_interview <fct> Maybe, No, Yes, Maybe, Yes, Maybe, No, No, M~
#> $ mental_vs_physical <fct> Yes, Don't know, No, No, Don't know, Don't k~
#> $ obs_consequence <fct> No, No, No, Yes, No, No, No, No, No, No, No,~
mental_clean2
Let us begin the data analysis by understanding the target variable.
plot1 <- ggplot(mental_clean2) +
aes(x = treatment, fill = treatment) +
geom_bar() +
scale_fill_hue(direction = 1) +
labs(
x = "Treatment (Yes/No)",
y = "Counts",
title = "Do Respondents receive Treatments?"
) +
theme_classic()
ggplotly(plot1)
This shows the respondents’ answers to the question ‘Have you sought treatment for a mental health condition?’.
This is our target variable. Looking at the first graph, we see that the percentage of respondents who want to get treatment is about 50%. Workplaces that promote mental health and support people with mental disorders are more likely to see increased productivity, reduced absenteeism, and the associated economic gains. If employees enjoy good mental health, they can:
Now let’s take a look at the age distribution of our respondents.
plot2 <- ggplot(mental_clean2) +
aes(x = Age, colour = Age) +
geom_histogram(bins = 50L, fill = "orange") +
scale_color_distiller(palette = "PuBu",
direction = -1) +
labs(title = "Age Distribution") +
theme_classic()
ggplotly(plot2)
plot3 <- ggplot(mental_clean2) +
aes(x = treatment, y = Age, fill = treatment) +
geom_boxplot(shape = "circle") +
scale_fill_hue(direction = 1) +
labs(title = "Treatments with Age Distribution") +
theme_classic()
ggplotly(plot3)
If we look at Plot 2 and Plot 3:
Now we will take a look at Gender Distribution
plot4 <- ggplot(mental_clean2) +
aes(x = Gender, fill = treatment) +
geom_bar() +
scale_fill_hue(direction = 1) +
labs(title = "Treatments with Gender Distribution") +
theme_classic()
ggplotly(plot4)
Looking at plot4 above, the majority of respondents are male, which is not surprising in the tech field. The very large gap between men and women may create higher competitive pressure for women than for men. Based on the plot, the share of female respondents who want to get treatment is high, around 70%. Perhaps some of them experience sexual harassment or racism at work because women are scarce in the tech industry.
Queer respondents make up less than 2% of the entries. Although that percentage is very low, it still deserves a closer look. Even in such a small group, there is a clear difference in the count of those who want treatment, indicating that mental health problems are serious for queer respondents too. In my opinion, they may have experienced hate speech or discrimination in the workplace.
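Percentages such as the 70% quoted above can be read from a row-wise proportion table instead of being estimated from the bars. The following is a minimal sketch on made-up data: the toy data frame is hypothetical and is not the survey. On the real data, the same call would take mental_clean2$Gender and mental_clean2$treatment.

```r
# Toy data (made up for illustration, not the survey)
toy <- data.frame(
  Gender    = c("Female", "Female", "Female", "Male", "Male", "Male", "Male", "Queer"),
  treatment = c("Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes")
)

# Cross-tabulate, then divide each row by its total (margin = 1),
# so every row gives the share of "No" and "Yes" within one gender group
share <- prop.table(table(toy$Gender, toy$treatment), margin = 1)
print(round(share, 2))
```

Each row sums to 1, so the "Yes" column is the share of that group that reports treatment.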
plot5 <- ggplot(mental_clean2) +
aes(x = family_history, fill = treatment) +
geom_bar() +
scale_fill_hue(direction = 1) +
labs(title = "Family History with Illness") +
theme_classic()
ggplotly(plot5)
Among respondents who report a family history of mental illness, the plot shows that far more want to get treatment than among those without a family history. This makes sense, since people with a family history tend to pay more attention to mental illness. Family history is a significant risk factor for many mental health disorders. The apple does not fall far from the tree: it is relatively common for people with mental illness symptoms to have one or more relatives with histories of similar difficulties.
plot6 <- ggplot(mental_clean2) +
aes(x = work_interfere, fill = treatment) +
geom_bar() +
scale_fill_hue(direction = 1) +
labs(title = "Work Interfere Survey Respondents") +
theme_classic()
ggplotly(plot6)
This shows the respondents’ answers to the question ‘If you have a mental health condition, do you feel that it interferes with your work?’. More than half of the respondents have experienced interference at work, whether rarely, sometimes, or often, and the majority of them want to get treatment. Surprisingly, even among respondents whose mental health has never interfered with their work, a small group still wants to get treatment before it turns into job stress. Job stress can be triggered when the requirements of the job do not match the capabilities, resources, or needs of the worker.
plot7 <-ggplot(mental_clean2) +
aes(x = remote_work, fill = treatment) +
geom_bar() +
scale_fill_hue(direction = 1) +
labs(title = "Working Style (Remote or Not)") +
theme_classic()
ggplotly(plot7)
The majority of respondents don’t work remotely, which suggests that the biggest triggers of mental health disorders arise in the workplace. Among them, there is only a slight difference between employees who want to get treatment and those who don’t. It gets more interesting when we look at respondents who do work remotely (50% of the workday): the number who want to get treatment is a little higher. I can’t analyze why those employees work remotely, because the data doesn’t provide that information.
plot8 <- ggplot(mental_clean2) +
aes(x = tech_company, fill = treatment) +
geom_bar() +
scale_fill_hue(direction = 1) +
labs(title = "Company Type") +
theme_classic()
ggplotly(plot8)
Even though the main target of the survey is the tech field, a small number of respondents work at non-tech companies. The plot shows that mental health is a big problem whether or not the company belongs to the tech field. I think the environment affects a lot of employees, and some things, like abuse in the workplace, can’t simply be tolerated.
However, among employees in tech companies, slightly fewer want to get treatment than don’t, while in non-tech companies it is the opposite. Maybe non-tech companies give employees more support to get treatment?
plot9 <- ggplot(mental_clean2) +
aes(x = coworkers, fill = treatment) +
geom_bar() +
scale_fill_hue(direction = 1) +
labs(title = "Coworkers of Survey Respondents") +
theme_classic()
plot10<- ggplot(mental_clean2) +
aes(x = supervisor, fill = treatment) +
geom_bar() +
scale_fill_hue(direction = 1) +
labs(title = "Supervisor of Survey Respondents") +
theme_classic()
ggplotly(plot9)
ggplotly(plot10)
Plot 9 shows the respondents’ answers to the question ‘Would you be willing to discuss a mental health issue with your coworkers?’.
Plot 10 shows the respondents’ answers to the question ‘Would you be willing to discuss a mental health issue with your direct supervisor(s)?’.
plot11 <- ggplot(mental_clean2) +
aes(x = obs_consequence, fill = treatment) +
geom_bar() +
scale_fill_hue(direction = 1) +
labs(title = "Observed Consequence of Survey Respondents") +
theme_classic()
ggplotly(plot11)
This shows the respondents’ answers to the question ‘Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?’. Of the respondents who say they know of negative consequences for coworkers with mental health conditions, almost 70% want to get treatment. Knowing about the negative consequences seems to be a strong trigger for someone to get treatment and prevent mental health conditions.
plot12 <- ggplot(mental_clean2) +
aes(x = benefits, fill = treatment) +
geom_bar() +
scale_fill_hue(direction = 1) +
labs(title = "Employer Benefits Survey Respondents") +
theme_classic()
ggplotly(plot12)
This shows the respondents’ answers to the question ‘Does your employer provide mental health benefits?’. Only around a third of respondents know about the mental health benefits their company provides. Among employees who know about the benefits, almost 60% want to get treatment. Surprisingly, some employees who don’t know, or who say that the company doesn’t provide benefits, still want to get treatment. I assume the company may be unable to provide benefits properly because of budget constraints or financial struggles.
plot13 <- ggplot(mental_clean2) +
aes(x = wellness_program, fill = treatment) +
geom_bar() +
scale_fill_hue(direction = 1) +
labs(title = "Wellness Program Survey Respondents") +
theme_classic()
ggplotly(plot13)
This shows the respondents’ answers to the question ‘Has your employer ever discussed mental health as part of an employee wellness program?’. Of the respondents who say yes, around 60% want to get treatment. I assume that employees who are part of a wellness program feel good about it.
The majority of respondents say that their company doesn’t provide any wellness program. But half of those respondents want to get treatment, which means companies need to provide one soon. Following up on my question about company benefits above, I think it makes sense that this comes down to the company budget. A program costs a lot of money, especially when the company has many employees to take care of. For a small company, budgeting is still the issue, and it will be even more of a struggle.
plot14 <- ggplot(mental_clean2) +
aes(x = anonymity, fill = treatment) +
geom_bar() +
scale_fill_hue(direction = 1) +
labs(title = "Anonymity Survey Respondents") +
theme_classic()
ggplotly(plot14)
This shows the respondents’ answers to the question ‘Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?’. Around 30% of respondents say that their anonymity is protected when they take advantage of mental health or substance abuse treatment resources, and more than half of those employees want to get treatment. These employees feel that the company protects their privacy, which is a good way for the company to build trust with its employees. Because of that, they are willing to get treatment.
By providing employees access to mental health benefits, the company can begin to create a culture of understanding and compassion at the tech company. And having employees who feel cared for and happy isn’t just good, it’s good business.
Based on the respondent profiles, companies should know that gender and family history greatly influence an employee’s decision to get treatment. If the company wants to provide more support, it should assess each employee’s personality, because different characters have different needs. Age can also be a factor: most respondents are young, so there is a good chance that they are open-minded about getting treatment.
Based on the respondents’ work environment, work interference is the most influential factor for employees who want to get treatment. This means the company should consider providing facilities that anticipate job stress. Some companies set up a private or quiet room in case employees suddenly feel stressed and need a private moment to recover.
Based on the mental health facilities available to respondents, the company needs to provide good benefits so that employees can maintain their mental health. If the company doesn’t have the resources for it, there are many third parties that can be hired to run a wellness program. Building trust, for example by keeping private which employees get treatment, can also encourage employees to get treatment.
Now that the EDA is done, the next step is to build the machine learning app using R Shiny. The details are as follows:
Machine Learning: supervised learning (classification). I will try 3 basic models and 4 ensemble models.
Basic models:
Ensemble models:
Target Variable :
Predictor Variable :
Check Proportion of the target variable
prop.table(table(mental_clean2$treatment))
#>
#> No Yes
#> 0.4948042 0.5051958
Cross validation (80/20 train-test split)
RNGkind(sample.kind = "Rounding")
set.seed(901)
index <- sample(nrow(mental_clean2),
                nrow(mental_clean2) * 0.8)
mental_train <- mental_clean2[index, ]
mental_test <- mental_clean2[-index, ]
prop.table(table(mental_train$treatment))
#>
#> No Yes
#> 0.492 0.508
prop.table(table(mental_test$treatment))
#>
#> No Yes
#> 0.5059761 0.4940239
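The split above samples rows at random, so the class balance differs slightly between the training and test sets, as the two proportion tables show. One possible refinement is to sample 80% within each class. This is a sketch under assumptions and is not what this post uses: stratified_index is a hypothetical helper and y is toy data.

```r
# Hypothetical helper: return row indices for a training set that keeps
# the same share of every class as the full vector y.
stratified_index <- function(y, prop = 0.8) {
  rows_by_class <- split(seq_along(y), y)        # row numbers grouped by class
  picked <- lapply(rows_by_class, function(rows) {
    n_take <- floor(length(rows) * prop)         # how many rows to keep per class
    rows[sample.int(length(rows), n_take)]
  })
  sort(unname(unlist(picked)))
}

set.seed(901)
# Toy target with 50 "No" and 50 "Yes" (made up for illustration)
y <- factor(rep(c("No", "Yes"), times = c(50, 50)))
train_rows <- stratified_index(y)
print(table(y[train_rows]))
```

With the survey data, the call would be stratified_index(mental_clean2$treatment), and the remaining rows would form the test set.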
Model Fitting
set.seed(901)
model_mental1 <- glm(treatment ~ ., data = mental_train, family = "binomial")
summary(model_mental1)
#>
#> Call:
#> glm(formula = treatment ~ ., family = "binomial", data = mental_train)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -2.6042 -0.3345 0.1470 0.5657 3.0040
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -6.270245 1.050890 -5.967 0.00000000242243824
#> Age 0.030615 0.014844 2.062 0.03916
#> GenderMale -0.625525 0.264289 -2.367 0.01794
#> GenderQueer -0.106362 0.752660 -0.141 0.88762
#> self_employedYes -0.269167 0.371419 -0.725 0.46864
#> family_historyYes 0.994806 0.207528 4.794 0.00000163821341525
#> work_interfereNever 2.527367 0.641595 3.939 0.00008175598010119
#> work_interfereOften 6.269190 0.677950 9.247 < 0.0000000000000002
#> work_interfereRarely 4.979120 0.634174 7.851 0.00000000000000412
#> work_interfereSometimes 5.629618 0.621845 9.053 < 0.0000000000000002
#> no_employees100-500 0.370461 0.447437 0.828 0.40769
#> no_employees26-100 0.235395 0.396009 0.594 0.55223
#> no_employees500-1000 0.240185 0.601238 0.399 0.68954
#> no_employees6-25 0.095547 0.368728 0.259 0.79554
#> no_employeesMore than 1000 -0.107364 0.434280 -0.247 0.80474
#> remote_workYes -0.105585 0.235832 -0.448 0.65436
#> tech_companyYes 0.026521 0.264340 0.100 0.92008
#> benefitsNo 0.246491 0.305606 0.807 0.41992
#> benefitsYes 0.453807 0.297866 1.524 0.12763
#> care_optionsNot sure -0.127078 0.275436 -0.461 0.64453
#> care_optionsYes 0.775483 0.270404 2.868 0.00413
#> wellness_programNo 0.041164 0.343696 0.120 0.90467
#> wellness_programYes -0.002521 0.415695 -0.006 0.99516
#> seek_helpNo -0.641289 0.296784 -2.161 0.03071
#> seek_helpYes -0.885077 0.372569 -2.376 0.01752
#> anonymityNo -0.118298 0.455709 -0.260 0.79518
#> anonymityYes 0.545620 0.263178 2.073 0.03815
#> leaveSomewhat difficult 0.311562 0.352825 0.883 0.37721
#> leaveSomewhat easy -0.541614 0.262469 -2.064 0.03906
#> leaveVery difficult -0.168617 0.387557 -0.435 0.66351
#> leaveVery easy 0.193794 0.336972 0.575 0.56522
#> mental_health_consequenceNo -0.157437 0.280416 -0.561 0.57450
#> mental_health_consequenceYes -0.249186 0.285645 -0.872 0.38301
#> phys_health_consequenceNo 0.169670 0.263001 0.645 0.51884
#> phys_health_consequenceYes -0.009532 0.473185 -0.020 0.98393
#> coworkersSome of them 0.437436 0.272071 1.608 0.10788
#> coworkersYes 1.089209 0.408831 2.664 0.00772
#> supervisorSome of them -0.389383 0.274613 -1.418 0.15621
#> supervisorYes -0.250359 0.324283 -0.772 0.44009
#> mental_health_interviewNo 0.533873 0.337731 1.581 0.11393
#> mental_health_interviewYes 0.681839 0.712726 0.957 0.33874
#> phys_health_interviewNo 0.210555 0.232338 0.906 0.36481
#> phys_health_interviewYes 0.722901 0.331206 2.183 0.02906
#> mental_vs_physicalNo -0.061048 0.255130 -0.239 0.81089
#> mental_vs_physicalYes 0.026089 0.278549 0.094 0.92538
#> obs_consequenceYes 0.310931 0.290390 1.071 0.28429
#>
#> (Intercept) ***
#> Age *
#> GenderMale *
#> GenderQueer
#> self_employedYes
#> family_historyYes ***
#> work_interfereNever ***
#> work_interfereOften ***
#> work_interfereRarely ***
#> work_interfereSometimes ***
#> no_employees100-500
#> no_employees26-100
#> no_employees500-1000
#> no_employees6-25
#> no_employeesMore than 1000
#> remote_workYes
#> tech_companyYes
#> benefitsNo
#> benefitsYes
#> care_optionsNot sure
#> care_optionsYes **
#> wellness_programNo
#> wellness_programYes
#> seek_helpNo *
#> seek_helpYes *
#> anonymityNo
#> anonymityYes *
#> leaveSomewhat difficult
#> leaveSomewhat easy *
#> leaveVery difficult
#> leaveVery easy
#> mental_health_consequenceNo
#> mental_health_consequenceYes
#> phys_health_consequenceNo
#> phys_health_consequenceYes
#> coworkersSome of them
#> coworkersYes **
#> supervisorSome of them
#> supervisorYes
#> mental_health_interviewNo
#> mental_health_interviewYes
#> phys_health_interviewNo
#> phys_health_interviewYes *
#> mental_vs_physicalNo
#> mental_vs_physicalYes
#> obs_consequenceYes
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 1386.04 on 999 degrees of freedom
#> Residual deviance: 694.46 on 954 degrees of freedom
#> AIC: 786.46
#>
#> Number of Fisher Scoring iterations: 7
library(car)
vif(model_mental1)
#> GVIF Df GVIF^(1/(2*Df))
#> Age 1.265467 1 1.124930
#> Gender 1.217911 2 1.050519
#> self_employed 1.748977 1 1.322489
#> family_history 1.096573 1 1.047174
#> work_interfere 1.424562 4 1.045226
#> no_employees 3.101563 5 1.119845
#> remote_work 1.242115 1 1.114502
#> tech_company 1.156530 1 1.075421
#> benefits 2.918420 2 1.307034
#> care_options 1.998940 2 1.189050
#> wellness_program 2.742849 2 1.286917
#> seek_help 3.384830 2 1.356389
#> anonymity 1.671851 2 1.137102
#> leave 2.048889 4 1.093805
#> mental_health_consequence 2.692632 2 1.280986
#> phys_health_consequence 1.738562 2 1.148279
#> coworkers 1.948017 2 1.181403
#> supervisor 2.339783 2 1.236784
#> mental_health_interview 1.825899 2 1.162436
#> phys_health_interview 1.643544 2 1.132258
#> mental_vs_physical 1.871099 2 1.169564
#> obs_consequence 1.220991 1 1.104984
No multicollinearity is detected: every GVIF is below 10.
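The last column of the `vif()` output can be reproduced by hand, which makes the check easier to read. The sketch below copies three GVIF values and their degrees of freedom from the table above and applies the adjustment GVIF^(1/(2*Df)). The names `gvif`, `df`, and `adjusted` are illustrative and not part of the original analysis. The cutoff of 10 is a rule of thumb; a common alternative is to square the adjusted value and compare it with whatever ordinary VIF threshold you prefer.

```r
# GVIF and Df copied from the vif() output above (three terms only)
gvif <- c(no_employees = 3.101563, seek_help = 3.384830, benefits = 2.918420)
df   <- c(no_employees = 5,        seek_help = 2,        benefits = 2)

# Adjust for the number of coefficients each factor contributes to the model
adjusted <- gvif^(1 / (2 * df))
round(adjusted, 6)

# Squared adjusted values sit on the same scale as an ordinary VIF
round(adjusted^2, 3)
```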
#linearity check
data.frame(prediction=model_mental1$fitted.values,
error=model_mental1$residuals) %>%
ggplot(aes(prediction,error)) +
geom_hline(yintercept=0) +
geom_point() +
geom_smooth() +
theme_bw()
saveRDS(model_mental1, "model_logreg.RDS")
Prediction
mental_test$pred_result <- predict(object = model_mental1,
newdata = mental_test,
type = "response")
mental_test$pred_label <- ifelse(mental_test$pred_result < 0.5, "No", "Yes")
mental_test$pred_label <- as.factor(mental_test$pred_label)
head(mental_test)
library(caret)
confusionMatrix(mental_test$pred_label, mental_test$treatment, positive = "Yes")
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction No Yes
#> No 97 9
#> Yes 30 115
#>
#> Accuracy : 0.8446
#> 95% CI : (0.7938, 0.8871)
#> No Information Rate : 0.506
#> P-Value [Acc > NIR] : < 0.00000000000000022
#>
#> Kappa : 0.6898
#>
#> Mcnemar's Test P-Value : 0.001362
#>
#> Sensitivity : 0.9274
#> Specificity : 0.7638
#> Pos Pred Value : 0.7931
#> Neg Pred Value : 0.9151
#> Prevalence : 0.4940
#> Detection Rate : 0.4582
#> Detection Prevalence : 0.5777
#> Balanced Accuracy : 0.8456
#>
#> 'Positive' Class : Yes
#>
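To see where these numbers come from, the headline statistics can be recomputed from the four cell counts alone. This is a minimal sketch in base R using the counts printed above; the object names `cm` and `metrics` are ours, and "Yes" is treated as the positive class, as in the `confusionMatrix()` call.

```r
# Counts copied from the logistic regression confusion matrix above.
# matrix() fills column by column: Reference = No first, then Reference = Yes.
cm <- matrix(c(97, 30, 9, 115), nrow = 2,
             dimnames = list(Prediction = c("No", "Yes"),
                             Reference  = c("No", "Yes")))

tp <- cm["Yes", "Yes"]; fn <- cm["No", "Yes"]
fp <- cm["Yes", "No"];  tn <- cm["No", "No"]

metrics <- c(
  accuracy    = (tp + tn) / sum(cm),
  sensitivity = tp / (tp + fn),  # share of actual "Yes" cases predicted "Yes"
  specificity = tn / (tn + fp),  # share of actual "No" cases predicted "No"
  ppv         = tp / (tp + fp),  # how often a predicted "Yes" is correct
  npv         = tn / (tn + fn)   # how often a predicted "No" is correct
)
round(metrics, 4)
```

Reading the table this way shows that most of the model's mistakes are false positives (30) rather than false negatives (9), which is why sensitivity is higher than specificity.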
Model Fitting
library(partykit)
set.seed(901)
model_dt <- ctree(treatment ~ ., mental_train)
Prediction
pred_dt <- predict(model_dt, newdata = mental_test, type = "response")
confusionMatrix(pred_dt, mental_test$treatment, positive = "Yes")
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction No Yes
#> No 86 8
#> Yes 41 116
#>
#> Accuracy : 0.8048
#> 95% CI : (0.7503, 0.8519)
#> No Information Rate : 0.506
#> P-Value [Acc > NIR] : < 0.00000000000000022
#>
#> Kappa : 0.6107
#>
#> Mcnemar's Test P-Value : 0.000004844
#>
#> Sensitivity : 0.9355
#> Specificity : 0.6772
#> Pos Pred Value : 0.7389
#> Neg Pred Value : 0.9149
#> Prevalence : 0.4940
#> Detection Rate : 0.4622
#> Detection Prevalence : 0.6255
#> Balanced Accuracy : 0.8063
#>
#> 'Positive' Class : Yes
#>
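The Kappa value is worth unpacking, since it corrects accuracy for the agreement we would expect by chance. Here is a small sketch that recomputes Cohen's kappa from the decision tree counts above, in base R only. The names `cm_dt`, `observed`, `expected`, and `kappa_dt` are illustrative.

```r
# Counts copied from the decision tree confusion matrix above
cm_dt <- matrix(c(86, 41, 8, 116), nrow = 2,
                dimnames = list(Prediction = c("No", "Yes"),
                                Reference  = c("No", "Yes")))

n <- sum(cm_dt)

# Observed agreement is plain accuracy
observed <- sum(diag(cm_dt)) / n

# Chance agreement: product of matching row and column totals, summed over classes
expected <- sum(rowSums(cm_dt) * colSums(cm_dt)) / n^2

kappa_dt <- (observed - expected) / (1 - expected)
round(kappa_dt, 4)
```

Because the two classes are almost balanced in the test set, chance agreement is close to 0.5, so kappa ends up at roughly twice the distance between accuracy and 0.5.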
plot(model_dt, type="simple")
Random forest trained with 5-fold cross-validation, repeated 3 times. The training code is commented out below; the saved model is reloaded from disk instead.
#set.seed(901)
#ctrl <- trainControl(method = "repeatedcv",
# number = 5,
# repeats = 3)
#model_forest <- train(treatment ~ .,
# data = mental_train,
# method = "rf",
# trControl = ctrl)
#saveRDS(model_forest, "model_forest_update.RDS")
model_rf <- readRDS("model_forest_update.RDS")
Prediction
pred_rf <- predict(model_rf, mental_test, type = "raw")
confusionMatrix(pred_rf, mental_test$treatment, positive = "Yes")
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction No Yes
#> No 94 13
#> Yes 33 111
#>
#> Accuracy : 0.8167
#> 95% CI : (0.7632, 0.8626)
#> No Information Rate : 0.506
#> P-Value [Acc > NIR] : < 0.00000000000000022
#>
#> Kappa : 0.6341
#>
#> Mcnemar's Test P-Value : 0.005088
#>
#> Sensitivity : 0.8952
#> Specificity : 0.7402
#> Pos Pred Value : 0.7708
#> Neg Pred Value : 0.8785
#> Prevalence : 0.4940
#> Detection Rate : 0.4422
#> Detection Prevalence : 0.5737
#> Balanced Accuracy : 0.8177
#>
#> 'Positive' Class : Yes
#>
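The McNemar p-value in each output asks whether the two kinds of error, false positives and false negatives, occur equally often. As a rough check, the sketch below runs base R's `mcnemar.test()` on the random forest counts above. We assume the default continuity correction is what the output reports; `cm_rf` and `mc` are illustrative names.

```r
# Counts copied from the random forest confusion matrix above
cm_rf <- matrix(c(94, 33, 13, 111), nrow = 2,
                dimnames = list(Prediction = c("No", "Yes"),
                                Reference  = c("No", "Yes")))

# Only the off-diagonal cells (33 false positives, 13 false negatives) matter here
mc <- mcnemar.test(cm_rf)  # continuity correction is on by default
mc$p.value
```

A small p-value here says the errors are lopsided: the model predicts "Yes" for people who did not seek treatment more often than it misses people who did.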
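After comparing the three models on the test set, logistic regression performs best overall. It has the highest accuracy (0.8446, against 0.8167 for the random forest and 0.8048 for the decision tree) and the highest specificity (0.7638), while its sensitivity (0.9274) is only slightly below the decision tree's (0.9355).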
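The comparison can be laid out side by side from the three confusion matrices printed above. This is a sketch in base R: the cell counts are copied from the outputs, while the helper `summarise_cm()` and the list `cms` are hypothetical names introduced for illustration.

```r
# Cell counts copied from the three confusion matrices above
cms <- list(
  logistic_regression = c(tn = 97, fp = 30, fn = 9,  tp = 115),
  decision_tree       = c(tn = 86, fp = 41, fn = 8,  tp = 116),
  random_forest       = c(tn = 94, fp = 33, fn = 13, tp = 111)
)

# Hypothetical helper: accuracy, sensitivity and specificity from four counts
summarise_cm <- function(x) {
  c(accuracy    = (x[["tp"]] + x[["tn"]]) / sum(x),
    sensitivity = x[["tp"]] / (x[["tp"]] + x[["fn"]]),
    specificity = x[["tn"]] / (x[["tn"]] + x[["fp"]]))
}

# One row per model, one column per metric
comparison <- t(sapply(cms, summarise_cm))
round(comparison, 4)

# Model with the highest test-set accuracy
rownames(comparison)[which.max(comparison[, "accuracy"])]
```

Accuracy is only one way to rank the models. If missing someone who needs treatment is the costlier error, sorting by sensitivity instead would favor the decision tree.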