Biostatistics Project
Introduction
Mental health is gaining more and more attention, especially in the workplace. A company is no longer considered successful solely because of the income it generates; employers must take a more responsible approach and be aware of the ethical dimension of their actions.
From becoming more eco-friendly to looking after their workers’ well-being, companies must now build a reputation from the ground up to attract a higher number of customers and add to their success.
In recent years, and even more so in recent months because of the COVID-19 pandemic, the technological sector has grown significantly, so the number of people working for technological companies is also increasing.
Surveying employees on how they feel their employers handle mental health issues is one way to place these companies under scrutiny and hold them accountable for their employees’ welfare. This is especially relevant for the technological industry, since it is creating more and more jobs every year and these employees need to be taken care of by their employers.
This class project aims to shed light on how mental health is perceived in the workplace, and therefore to identify whether companies are succeeding in guaranteeing their employees’ comfort and well-being.
The database I used can be found online by clicking on the following hyperlink:
https://www.kaggle.com/osmi/mental-health-in-tech-survey#survey.csv
The data corresponds to a mental health survey carried out in 2014 (both ‘tech’ and ‘non-tech’ employees were surveyed).
Thesis Statement
The aim of this project is to show that, even though employees’ mental health receives more and more attention in the workplace, a significant bias still exists between an employer’s perception of physical illness and mental illness.
I will mainly be focusing on how technological companies compare to non-technological ones in terms of how much of a bias there is towards mental health. Because this industry is more recent, I believe it will be more sensitive towards workers’ mental health than other more traditional companies, and the bias regarding it will be smaller.
Data Visualization and analysis approach
The survey included the following information for every surveyed employee:
Timestamp
Age
Gender
Country
state: If you live in the United States, which state or territory do you live in?
self_employed: Are you self-employed?
family_history: Do you have a family history of mental illness?
treatment: Have you sought treatment for a mental health condition?
work_interfere: If you have a mental health condition, do you feel that it interferes with your work?
no_employees: How many employees does your company organization have?
remote_work: Do you work remotely (outside of an office) at least 50% of the time?
tech_company: Is your employer primarily a tech company/organization?
benefits: Does your employer provide mental health benefits?
care_options: Do you know the options for mental health care your employer provides?
wellness_program: Has your employer ever discussed mental health as part of an employee wellness program?
seek_help: Does your employer provide resources to learn more about mental health issues and how to seek help?
anonymity: Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?
leave: How easy is it for you to take medical leave for a mental health condition?
mental_health_consequence: Do you think that discussing a mental health issue with your employer would have negative consequences?
phys_health_consequence: Do you think that discussing a physical health issue with your employer would have negative consequences?
coworkers: Would you be willing to discuss a mental health issue with your coworkers?
supervisor: Would you be willing to discuss a mental health issue with your direct supervisor(s)?
mental_health_interview: Would you bring up a mental health issue with a potential employer in an interview?
phys_health_interview: Would you bring up a physical health issue with a potential employer in an interview?
mental_vs_physical: Do you feel that your employer takes mental health as seriously as physical health?
obs_consequence: Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?
comments: Any additional notes or comments
In order to explore if employees believe their employers are biased in how they deal with mental health, I have selected a few key questions out of all those shown above. I find the employees’ answers to said questions provide the most relevant information for me to prove my thesis statement.
I will crosslink the information provided by the following questions in order to explore the presence of a mental health stigma both in technological and non-technological workplaces:
‘mental_health_consequence’: Do you think that discussing a mental health issue with your employer would have negative consequences?
‘phys_health_consequence’: Do you think that discussing a physical health issue with your employer would have negative consequences?
‘mental_health_interview’: Would you bring up a mental health issue with a potential employer in an interview?
‘phys_health_interview’: Would you bring up a physical health issue with a potential employer in an interview?
‘tech_company’: Is your employer primarily a tech company/organization?
‘mental_vs_physical’: Do you feel that your employer takes mental health as seriously as physical health?
‘obs_consequence’: Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?
‘leave’: How easy is it for you to take medical leave for a mental health condition?
‘benefits’: Does your employer provide mental health benefits?
‘wellness_program’: Has your employer ever discussed mental health as part of an employee wellness program?
Below you can see a sample of the data:
#Three R packages are required for the code to run smoothly: 'readxl', 'ggplot2', and 'dplyr'
#When using the read_excel function, you will most likely need to either modify the path to the document named Datos_Mariana_Fuentes manually or place the document in your working directory.
library("readxl")
library("dplyr")
library("ggplot2")
rawdata<-read_excel("C:\\Users\\maria\\OneDrive\\Escritorio\\Mariana Cuarto Biomed TECNUN\\Bioestadistica\\Datos_Mariana_Fuentes.xlsx")
## # A tibble: 10 x 27
## Timestamp Age Gender Country state self_employed family_history
## <dttm> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 2014-08-29 09:58:55 46 Femal~ United~ CT No No
## 2 2014-08-30 20:19:37 35 Male United~ NY No No
## 3 2014-08-29 09:33:43 26 female United~ OH No No
## 4 2014-08-28 01:30:12 31 Female United~ WA No No
## 5 2014-08-28 17:59:54 51 m United~ CO Yes No
## 6 2014-08-28 03:42:58 31 female Belgium NA No Yes
## 7 2014-08-28 22:17:15 38 Male United~ WA No Yes
## 8 2014-08-27 15:25:47 29 Male Canada NA No No
## 9 2014-08-28 17:50:32 24 Male United~ TX No No
## 10 2014-08-27 12:17:01 31 male Mexico NA No Yes
## # ... with 20 more variables: treatment <chr>, work_interfere <chr>,
## # no_employees <chr>, remote_work <chr>, tech_company <chr>, benefits <chr>,
## # care_options <chr>, wellness_program <chr>, seek_help <chr>,
## # anonymity <chr>, leave <chr>, mental_health_consequence <chr>,
## # phys_health_consequence <chr>, coworkers <chr>, supervisor <chr>,
## # mental_health_interview <chr>, phys_health_interview <chr>,
## # mental_vs_physical <chr>, obs_consequence <chr>, comments <chr>
Since the data is categorical, I will work with contingency tables and logistic regression to carry out my analysis.
My null hypothesis will therefore be the opposite of the thesis statement: there is no difference between an employer’s perception of mental or physical health.
The dataset has a large number of variables, so I will carry out my data analysis on a couple of questions at a time.
For each of the survey questions I analyze, the null hypothesis will be the equivalent of the above statement adapted to that specific case.
First line of inquiry: Are negative consequences more likely for mental health issues than for physical health issues?
The first two questions I selected in my analysis are the following:
Do you think that discussing a mental health issue with your employer would have negative consequences? variable name associated to this question: ‘mental_health_consequence’
Do you think that discussing a physical health issue with your employer would have negative consequences? variable name associated to this question: ‘phys_health_consequence’
Analysing the answers to the questions above allows me to dive into the employees’ perception of their work environment.
The results will show if workers believe that their job could be endangered due to a mental health issue, or if their employer is neglecting their mental health.
The null hypothesis in this case is the following statement:
“In an employee’s opinion, there is no difference in how discussing a physical health issue or a mental health issue with their employers will affect their current position”.
The first step is to create a contingency table for the results of these two questions, which I show and plot below:
##
## Maybe No Yes
## Maybe 153 320 4
## No 8 482 0
## Yes 112 123 57
Then, I carry out a Chi-squared test:
##
## Pearson's Chi-squared test
##
## data: cont_table_1
## X-squared = 404.42, df = 4, p-value < 2.2e-16
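The code that builds cont_table_1 is not shown in this report. As a minimal sketch, assuming it was created with table() on the two columns, the printed counts can be entered as a matrix so that the test can be re-run without the spreadsheet. The orientation (rows for the mental health question, columns for the physical health question) is an assumption, although it does not change the chi-squared statistic.

```r
# Counts copied from the printed contingency table.
# Assumption: rows = mental_health_consequence, columns = phys_health_consequence.
cont_table_1 <- matrix(
  c(153, 320,  4,
      8, 482,  0,
    112, 123, 57),
  nrow = 3, byrow = TRUE,
  dimnames = list(mental   = c("Maybe", "No", "Yes"),
                  physical = c("Maybe", "No", "Yes"))
)

# With the full data this would presumably be:
# cont_table_1 <- table(rawdata$mental_health_consequence,
#                       rawdata$phys_health_consequence)

# Pearson chi-squared test of independence between the two answers
test_1 <- chisq.test(cont_table_1)
print(test_1)

# Expected counts under independence, useful to check that none is very small
print(round(test_1$expected, 1))
```

The expected counts are worth inspecting because the chi-squared approximation becomes unreliable when several of them fall below 5.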
The resulting p-value is well below the threshold value of 0.05, which leads to rejecting the null hypothesis.
This implies that the answers to the two questions are not independent, and the table shows that employees do not answer them in the same way: the off-diagonal cells are strongly unbalanced (320 against 8, 112 against 4 and 123 against 0). Employees therefore judge the consequences of discussing mental and physical health with their employers differently.
So far, there seems to be a bias between how mental health and physical health are perceived by employers.
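The chi-squared test of independence checks whether the two answers are associated; it does not directly test whether the same employee answers the two questions differently. A complementary check, which is not part of the original analysis, is a test of symmetry on the paired answers. Base R offers this through mcnemar.test, which compares each pair of off-diagonal cells when the table is larger than 2x2. The sketch below re-enters the printed counts; the row and column labels are assumptions.

```r
# Paired answers to the two 'consequence' questions, copied from the printed table.
# Assumption: rows = mental health question, columns = physical health question.
paired_answers <- matrix(
  c(153, 320,  4,
      8, 482,  0,
    112, 123, 57),
  nrow = 3, byrow = TRUE,
  dimnames = list(mental   = c("Maybe", "No", "Yes"),
                  physical = c("Maybe", "No", "Yes"))
)

# Null hypothesis: cell [i, j] and cell [j, i] are equally likely, i.e. employees
# move between answer categories equally often in both directions.
sym_test <- mcnemar.test(paired_answers)
print(sym_test)
```

A small p-value here would indicate that the answers shift systematically between the two questions, which is closer to the null hypothesis stated for this step than independence is.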
Second step in the analysis: How likely is it for employees to bring up mental or physical health issues during a job interview?
After my first step, I decided to look into how employees would deal with mentioning mental health issues during interviews, in contrast to how they would do so with physical health issues. For this purpose, I selected the following two questions:
Would you bring up a mental health issue with a potential employer in an interview? variable name associated to this question: ‘mental_health_interview’
Would you bring up a physical health issue with a potential employer in an interview? variable name associated to this question: ‘phys_health_interview’
Here the null hypothesis will be the following statement:
“Employees are equally as likely to bring up mental or physical health issues during an interview with their potential employer.”
Again, the first step will be to create a contingency table for these two questions and their results:
##
## Maybe No Yes
## Maybe 143 6 58
## No 413 492 103
## Yes 1 2 41
Then, I will carry out the chi-squared test:
##
## Pearson's Chi-squared test
##
## data: cont_table_2
## X-squared = 357.17, df = 4, p-value < 2.2e-16
Once again, the p-value is well below 0.05 (the threshold value), so the null hypothesis can be rejected. The off-diagonal cells are again unbalanced (413 against 6, 103 against 2 and 58 against 1).
Therefore, there is a relevant difference between how comfortable employees are with mentioning physical health and mental health in an interview.
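To see how the answers to the two interview questions differ, the row and column totals of the table can be compared side by side. This is a sketch built from the printed counts; it assumes that the rows hold the mental health answers and the columns the physical health answers, since the code that builds cont_table_2 is not shown.

```r
# Counts copied from the printed contingency table for the interview questions.
# Assumption: rows = mental_health_interview, columns = phys_health_interview.
cont_table_2 <- matrix(
  c(143,   6,  58,
    413, 492, 103,
      1,   2,  41),
  nrow = 3, byrow = TRUE,
  dimnames = list(mental   = c("Maybe", "No", "Yes"),
                  physical = c("Maybe", "No", "Yes"))
)

# Marginal totals: how often each answer was given to each question
margins <- rbind(mental   = rowSums(cont_table_2),
                 physical = colSums(cont_table_2))
print(margins)

# The same totals as percentages of all respondents
print(round(100 * prop.table(margins, margin = 1), 1))
```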
Conclusions for all sectors (no focus on tech yet)
These two Chi-squared tests were carried out on two key questions without taking into account whether the employees worked in a technological company or not.
The first conclusion I reach in my analysis is that, no matter the sector in which they work, employees do feel there are certain disadvantages to having a mental health condition.
They fear negative consequences for mentioning mental health issues in their current employment situation, and they would fear mentioning them when interviewing for any other job positions. This, however, is not the case for physical health.
This further reinforces the notion that there might be a bias between how mental health and physical health are perceived by all employers, which was briefly mentioned in the beginning of the introduction.
Knowing this makes it worth continuing with the analysis and exploring how this bias exists in the technological industry in comparison to other industries, meaning whether it is more or less prominent.
Does this bias decrease when working in a technological company?
For the final part of the analysis, I picked the questions below to test my hypothesis, which is that there is a far less prominent bias in technological companies than in other industries.
- Is your employer primarily a tech company/organization? variable name associated to this question: ‘tech_company’
- Do you feel that your employer takes mental health as seriously as physical health? variable name associated to this question: ‘mental_vs_physical’
- Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace? variable name associated to this question: ‘obs_consequence’
My aim is to create two distinct contingency tables (and therefore analyse two different phenomena).
The first one will link these two concepts:
whether the employer takes mental health as seriously as physical health
whether the employees work in the tech sector
The second table will link the negative consequences of talking about mental health, with working in the tech sector.
Therefore, there are two null hypotheses for this analysis:
Firstly: “Working in a tech company does not affect how seriously a worker’s superior may take mental health in comparison to physical health.”
Secondly: “Working in a tech company has nothing to do with whether employees have heard of or observed negative consequences in the workplace due to mental health conditions.”
Hopefully this two-step analysis will allow me to find out if this bias is as strong in technological companies as it is in non-tech companies, or if there is any difference.
For the first step, I will create, print and plot the two contingency tables:
cont_table_3<-table(rawdata$tech_company,rawdata$mental_vs_physical)
cont_table_4<-table(rawdata$tech_company,rawdata$obs_consequence)
##
## Don't know No Yes
## No 98 86 44
## Yes 478 254 299
##
## No Yes
## No 184 44
## Yes 891 140
Then, I will carry out the chi-squared test for each of the tables:
##
## Pearson's Chi-squared test
##
## data: cont_table_3
## X-squared = 18.752, df = 2, p-value = 8.473e-05
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: cont_table_4
## X-squared = 4.4464, df = 1, p-value = 0.03497
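A chi-squared test says whether an association exists, but not in which direction. Row percentages make the direction visible. The sketch below re-enters the counts printed for cont_table_3 and cont_table_4 (rows are tech_company, as in the table() calls) and uses prop.table; the names pct_3 and pct_4 are introduced here for illustration only.

```r
# Counts copied from the printed tables; rows are tech_company (No / Yes).
cont_table_3 <- matrix(
  c( 98,  86,  44,
    478, 254, 299),
  nrow = 2, byrow = TRUE,
  dimnames = list(tech_company       = c("No", "Yes"),
                  mental_vs_physical = c("Don't know", "No", "Yes"))
)
cont_table_4 <- matrix(
  c(184,  44,
    891, 140),
  nrow = 2, byrow = TRUE,
  dimnames = list(tech_company    = c("No", "Yes"),
                  obs_consequence = c("No", "Yes"))
)

# Percentage of each answer within non-tech and within tech respondents
pct_3 <- round(100 * prop.table(cont_table_3, margin = 1), 1)
pct_4 <- round(100 * prop.table(cont_table_4, margin = 1), 1)
print(pct_3)
print(pct_4)
```

Each row sums to 100, so the two rows can be compared directly even though far more tech than non-tech employees answered the survey.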
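Again, the p-values are lower than 0.05 in both cases (especially the first one), so both null hypotheses can be rejected. This implies that working in a tech company is associated both with how seriously employers are perceived to take mental health and with whether negative consequences have been observed. The chi-squared test does not give the direction of that association, meaning the bias might be lower or higher than in a more traditional industry.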
The second contingency table and chi-squared test is one of the few relationships where both questions can only have a ‘Yes’ or ‘No’ answer, and this will enable me to carry out a logistic regression.
Through this logistic regression, I will estimate how the odds of having observed negative consequences differ between tech and non-tech employees. This will help me either confirm or disregard my theory that tech companies have less of a bias regarding mental health issues in comparison to other companies.
The first step is to create the model I need to carry out the logistic regression. For this I will use the glm function with the binomial family, where the matrix M holds the counts of ‘Yes’ and ‘No’ answers for tech employees (first row) and non-tech employees (second row).
M<-rbind(c(140,891),c(44,184))
workstech<-c(1,0)
model<-glm(M~workstech, family = 'binomial')
summary(model)
##
## Call:
## glm(formula = M ~ workstech, family = "binomial")
##
## Deviance Residuals:
## [1] 0 0
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.4307 0.1678 -8.526 <2e-16 ***
## workstech -0.4200 0.1909 -2.200 0.0278 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4.6068e+00 on 1 degrees of freedom
## Residual deviance: 9.7700e-15 on 0 degrees of freedom
## AIC: 16.046
##
## Number of Fisher Scoring iterations: 3
After carrying out this logistic regression, I have obtained a value for the natural logarithm of the odds ratio of the corresponding contingency table (the binomial family uses the logit link by default, so the coefficient is on the natural log scale, not base 10). The value I have obtained is -0.42.
I will now proceed to finding the value of the odds ratio, exp(-0.42), to determine what type of relationship can be found between the two variables:
## The odds ratio value is: 0.6570468
The odds ratio is 0.657, which is smaller than one, and this confirms that there is a negative relationship between working in tech and observing negative consequences. This means that the odds of having heard of or observed negative consequences due to mental health issues are 0.657 times as high for someone who works in tech as for someone who works in another industry, that is, about 34.3% lower. The same value follows directly from the contingency table: (140/891)/(44/184) ≈ 0.657.
Overall, both p-values were below the threshold value, which implied that working in the tech sector did have an effect on the questions. The logistic regression I carried out confirms this and supports my hypothesis. Working in the tech sector is associated with a moderate decrease (about a third) in the odds of observing negative consequences due to mental health issues.
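The conversion from the model coefficient to an odds ratio can be sketched as follows. The model is the same as the one fitted above; the Wald confidence interval and the names odds_ratio, wald_ci and or_table are additions for illustration and were not part of the original analysis.

```r
# Same model as in the report: rows of M are tech (1) and non-tech (0),
# columns are the counts of 'Yes' and 'No' for obs_consequence.
M <- rbind(c(140, 891), c(44, 184))
workstech <- c(1, 0)
model <- glm(M ~ workstech, family = binomial)

# The coefficient is a natural-log odds ratio, so exp() converts it
est <- coef(summary(model))["workstech", ]
odds_ratio <- exp(unname(est["Estimate"]))

# Approximate 95% Wald interval on the odds-ratio scale (an added illustration)
half_width <- qnorm(0.975) * unname(est["Std. Error"])
wald_ci <- exp(unname(est["Estimate"]) + c(-1, 1) * half_width)

# Cross-check against the odds ratio computed directly from the 2x2 table
or_table <- (140 / 891) / (44 / 184)

cat("Odds ratio from the model:", round(odds_ratio, 4), "\n")
cat("Odds ratio from the table:", round(or_table, 4), "\n")
cat("Approximate 95% interval: ", round(wald_ci, 4), "\n")
```

Because the model has one parameter per row of the table, its odds ratio should agree with the one computed directly from the four counts, which makes the cross-check a quick way to catch a wrong logarithm base.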
Care options
In this final section of contingency tables and chi-squared analysis, I would like to analyze the healthcare options that employees might have access to regarding mental health. I will be analyzing whether working in the tech sector affects the healthcare resources available to employees.
The questions I will analyze are:
- Is your employer primarily a tech company/organization? variable name associated to this question: ‘tech_company’
- How easy is it for you to take medical leave for a mental health condition? variable name associated to this question:‘leave’
- Does your employer provide mental health benefits? variable name associated to this question:‘benefits’
- Has your employer ever discussed mental health as part of an employee wellness program? variable name associated to this question:‘wellness_program’
Here I will have three different null hypotheses, which are the following:
“Working in a tech company does not affect how easy it is for an employee to take a medical leave due to mental health.”
“Working in a tech company does not affect the benefits regarding mental health that are available to employees.”
“Working in a tech company has no impact on the wellness programs that the employees may access.”
Firstly, I will create the contingency tables.
cont_table_5<-table(rawdata$tech_company,rawdata$leave)
cont_table_6<-table(rawdata$tech_company,rawdata$benefits)
cont_table_7<-table(rawdata$tech_company,rawdata$wellness_program)
##
## Don't know Somewhat difficult Somewhat easy Very difficult Very easy
## No 107 28 47 19 27
## Yes 456 98 219 79 179
##
## Don't know No Yes
## No 73 51 104
## Yes 335 323 373
##
## Don't know No Yes
## No 27 134 67
## Yes 161 708 162
Then, I will carry out the chi-squared tests.
##
## Pearson's Chi-squared test
##
## data: cont_table_5
## X-squared = 5.361, df = 4, p-value = 0.2522
##
## Pearson's Chi-squared test
##
## data: cont_table_6
## X-squared = 9.4468, df = 2, p-value = 0.008885
##
## Pearson's Chi-squared test
##
## data: cont_table_7
## X-squared = 23.707, df = 2, p-value = 7.113e-06
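Since the chi-squared tests do not show which group reports more benefits or wellness programs, row percentages help with the interpretation. This sketch re-enters the counts printed for cont_table_6 and cont_table_7 (rows are tech_company); the names pct_6 and pct_7 are illustrative.

```r
# Counts copied from the printed tables; rows are tech_company (No / Yes).
answers <- c("Don't know", "No", "Yes")
cont_table_6 <- matrix(
  c( 73,  51, 104,
    335, 323, 373),
  nrow = 2, byrow = TRUE,
  dimnames = list(tech_company = c("No", "Yes"), benefits = answers)
)
cont_table_7 <- matrix(
  c( 27, 134,  67,
    161, 708, 162),
  nrow = 2, byrow = TRUE,
  dimnames = list(tech_company = c("No", "Yes"), wellness_program = answers)
)

# Percentage of each answer within non-tech and within tech respondents
pct_6 <- round(100 * prop.table(cont_table_6, margin = 1), 1)
pct_7 <- round(100 * prop.table(cont_table_7, margin = 1), 1)
print(pct_6)
print(pct_7)
```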
In the analysis for the care options that I just carried out, one result stands out: the p-value for how easy it is to get a medical leave is above 0.05. This implies that the null hypothesis cannot be rejected. The data show no significant difference between tech and non-tech companies in how difficult it is to get medical leave for mental health conditions, which is not the same as proving that no difference exists.
For how easy it is to get a leave, I decided to create the following plot (I discarded the ‘Don’t know’ answers for more clarity since they don’t give us much information):
#Counts copied from cont_table_5 without the 'Don't know' column (non-tech first, then tech)
values<-c(19,28,47,27,79,98,219,179)
workers<-c("Non-tech workers","Non-tech workers","Non-tech workers","Non-tech workers","Tech workers","Tech workers","Tech workers","Tech workers")
categories<-c("Very Difficult","Somewhat Difficult","Somewhat Easy","Very Easy","Very Difficult","Somewhat Difficult","Somewhat Easy","Very Easy")
df6<-data.frame(values,workers,categories)
ggplot(df6, aes(factor(categories), values, fill = workers)) +
geom_bar(stat="identity", position = "dodge") +
scale_fill_brewer(palette = "Set1")
The other two results, however, present p-values lower than the significance level. This implies that working in the tech sector does have an effect on the benefits and wellness programs that employees have access to. Based on the previous results (logistic regression), one could expect a larger range of options in the tech sector, but the contingency tables point the other way: 104 of 228 non-tech respondents (45.6%) report having mental health benefits against 373 of 1031 tech respondents (36.2%), and 67 of 228 (29.4%) against 162 of 1031 (15.7%) report that mental health was discussed as part of a wellness program.
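Typing the counts by hand into the plotting data frame is error-prone. As an alternative sketch, the data frame can be derived from the contingency table itself. The counts below are those printed for cont_table_5; with the full data, the matrix could be replaced by the table() result. The column names leave, values and workers are choices made for this illustration.

```r
# Counts copied from the printed cont_table_5; rows are tech_company (No / Yes).
cont_table_5 <- matrix(
  c(107, 28,  47, 19,  27,
    456, 98, 219, 79, 179),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    tech_company = c("No", "Yes"),
    leave = c("Don't know", "Somewhat difficult", "Somewhat easy",
              "Very difficult", "Very easy")
  )
)

# Long format: one row per (tech_company, leave) combination
df6 <- as.data.frame(as.table(cont_table_5), responseName = "values")

# Drop the 'Don't know' answers and order the categories from hardest to easiest
df6 <- df6[df6$leave != "Don't know", ]
df6$leave <- factor(df6$leave,
                    levels = c("Very difficult", "Somewhat difficult",
                               "Somewhat easy", "Very easy"))
df6$workers <- ifelse(df6$tech_company == "Yes", "Tech workers", "Non-tech workers")
print(df6)

# With ggplot2 loaded, the plot call would then be along the lines of:
# ggplot(df6, aes(leave, values, fill = workers)) +
#   geom_bar(stat = "identity", position = "dodge")
```

Setting the factor levels explicitly also puts the bars in order of difficulty instead of alphabetical order.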
Overall conclusions
So far I have been able to confirm that:
- There is a difference between the consequences that employees believe they will face when discussing their mental VS physical health.
- There is a difference between how likely it is for an employee to talk about mental VS physical health during an interview.
- Working in the tech sector has an effect on the following things:
- The seriousness with which a superior takes a physical health VS mental health issue.
- Whether negative consequences of suffering from mental illness have been observed in the workplace (less likely to happen in tech companies)
- The healthcare benefits regarding mental health that employees have access to.
- The wellness programs that employees have access to.
This all seems to indicate that my initial thesis statement is supported and that there is a bias in how mental health and physical health are treated in the workplace.
This also points to the fact that working in the tech sector affects the way mental health is regarded in comparison to the non-tech sector. If we base ourselves on the logistic regression results, perhaps the initial hypothesis, where working in tech means less of a bias towards mental health issues, is correct, although the tables for benefits and wellness programs show that tech respondents report these less often than non-tech respondents.