email: “tanya.harsh@gmail.com” College: IIIT-Delhi
This dataset is from a 2014 survey that measures attitudes towards mental health and frequency of mental health disorders in the tech workplace.
A crucial aspect of a healthy and productive workplace is management’s understanding of the importance of mental health. Given the tech industry’s rapid growth over the past few decades, it would be valuable to examine its employees’ views on mental health.
This dataset contains the following data:
Timestamp
Age
Gender
Country
state: If you live in the United States, which state or territory do you live in?
self_employed: Are you self-employed?
family_history: Do you have a family history of mental illness?
treatment: Have you sought treatment for a mental health condition?
work_interfere: If you have a mental health condition, do you feel that it interferes with your work?
no_employees: How many employees does your company or organization have?
remote_work: Do you work remotely (outside of an office) at least 50% of the time?
tech_company: Is your employer primarily a tech company/organization?
benefits: Does your employer provide mental health benefits?
care_options: Do you know the options for mental health care your employer provides?
wellness_program: Has your employer ever discussed mental health as part of an employee wellness program?
seek_help: Does your employer provide resources to learn more about mental health issues and how to seek help?
anonymity: Is your anonymity protected if you choose to take advantage of mental health or substance abuse treatment resources?
leave: How easy is it for you to take medical leave for a mental health condition?
mental_health_consequence: Do you think that discussing a mental health issue with your employer would have negative consequences?
phys_health_consequence: Do you think that discussing a physical health issue with your employer would have negative consequences?
coworkers: Would you be willing to discuss a mental health issue with your coworkers?
supervisor: Would you be willing to discuss a mental health issue with your direct supervisor(s)?
mental_health_interview: Would you bring up a mental health issue with a potential employer in an interview?
phys_health_interview: Would you bring up a physical health issue with a potential employer in an interview?
mental_vs_physical: Do you feel that your employer takes mental health as seriously as physical health?
obs_consequence: Have you heard of or observed negative consequences for coworkers with mental health conditions in your workplace?
comments: Any additional notes or comments
Read your dataset in R and visualize the length and breadth of your dataset.
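# stringsAsFactors = TRUE keeps text columns as factors (the default became FALSE in R 4.0.0)
health.df <- read.csv("survey.csv", stringsAsFactors = TRUE)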
#View(health.df)
dim(health.df)
## [1] 1259 27
Cleaning data
# Drop the free-text and timestamp columns, which are not used in the analysis
health.df <- subset(health.df, select = -c(Timestamp, comments))
dim(health.df)
## [1] 1259 25
Create descriptive statistics (min, max, median, etc.) for each variable.
str(health.df)
## 'data.frame': 1259 obs. of 25 variables:
## $ Age : num 37 44 32 31 31 33 35 39 42 23 ...
## $ Gender : Factor w/ 49 levels "A little about you",..: 16 24 30 30 30 30 16 24 16 30 ...
## $ Country : Factor w/ 48 levels "Australia","Austria",..: 46 46 8 45 46 46 46 8 46 8 ...
## $ state : Factor w/ 45 levels "AL","AZ","CA",..: 11 12 NA NA 38 37 19 NA 11 NA ...
## $ self_employed : Factor w/ 2 levels "No","Yes": NA NA NA NA NA NA NA NA NA NA ...
## $ family_history : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 2 2 1 2 1 ...
## $ treatment : Factor w/ 2 levels "No","Yes": 2 1 1 2 1 1 2 1 2 1 ...
## $ work_interfere : Factor w/ 4 levels "Never","Often",..: 2 3 3 2 1 4 4 1 4 1 ...
## $ no_employees : Factor w/ 6 levels "1-5","100-500",..: 5 6 5 3 2 5 1 1 2 3 ...
## $ remote_work : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 1 2 2 1 1 ...
## $ tech_company : Factor w/ 2 levels "No","Yes": 2 1 2 2 2 2 2 2 2 2 ...
## $ benefits : Factor w/ 3 levels "Don't know","No",..: 3 1 2 2 3 3 2 2 3 1 ...
## $ care_options : Factor w/ 3 levels "No","Not sure",..: 2 1 1 3 1 2 1 3 3 1 ...
## $ wellness_program : Factor w/ 3 levels "Don't know","No",..: 2 1 2 2 1 2 2 2 2 1 ...
## $ seek_help : Factor w/ 3 levels "Don't know","No",..: 3 1 2 2 1 1 2 2 2 1 ...
## $ anonymity : Factor w/ 3 levels "Don't know","No",..: 3 1 1 2 1 1 2 3 2 1 ...
## $ leave : Factor w/ 5 levels "Don't know","Somewhat difficult",..: 3 1 2 2 1 1 2 1 4 1 ...
## $ mental_health_consequence: Factor w/ 3 levels "Maybe","No","Yes": 2 1 2 3 2 2 1 2 1 2 ...
## $ phys_health_consequence : Factor w/ 3 levels "Maybe","No","Yes": 2 2 2 3 2 2 1 2 2 2 ...
## $ coworkers : Factor w/ 3 levels "No","Some of them",..: 2 1 3 2 2 3 2 1 3 3 ...
## $ supervisor : Factor w/ 3 levels "No","Some of them",..: 3 1 3 1 3 3 1 1 3 3 ...
## $ mental_health_interview : Factor w/ 3 levels "Maybe","No","Yes": 2 2 3 1 3 2 2 2 2 1 ...
## $ phys_health_interview : Factor w/ 3 levels "Maybe","No","Yes": 1 2 3 1 3 1 2 2 1 1 ...
## $ mental_vs_physical : Factor w/ 3 levels "Don't know","No",..: 3 1 2 2 1 1 1 2 2 3 ...
## $ obs_consequence : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 1 1 1 1 ...
summary(health.df)
## Age Gender Country state
## Min. :-1.726e+03 Male :615 United States :751 CA :138
## 1st Qu.: 2.700e+01 male :206 United Kingdom:185 WA : 70
## Median : 3.100e+01 Female :121 Canada : 72 NY : 57
## Mean : 7.943e+07 M :116 Germany : 45 TN : 45
## 3rd Qu.: 3.600e+01 female : 62 Ireland : 27 TX : 44
## Max. : 1.000e+11 F : 38 Netherlands : 27 (Other):390
## (Other):101 (Other) :152 NA's :515
## self_employed family_history treatment work_interfere
## No :1095 No :767 No :622 Never :213
## Yes : 146 Yes:492 Yes:637 Often :144
## NA's: 18 Rarely :173
## Sometimes:465
## NA's :264
##
##
## no_employees remote_work tech_company benefits
## 1-5 :162 No :883 No : 228 Don't know:408
## 100-500 :176 Yes:376 Yes:1031 No :374
## 26-100 :289 Yes :477
## 500-1000 : 60
## 6-25 :290
## More than 1000:282
##
## care_options wellness_program seek_help anonymity
## No :501 Don't know:188 Don't know:363 Don't know:819
## Not sure:314 No :842 No :646 No : 65
## Yes :444 Yes :229 Yes :250 Yes :375
##
##
##
##
## leave mental_health_consequence
## Don't know :563 Maybe:477
## Somewhat difficult:126 No :490
## Somewhat easy :266 Yes :292
## Very difficult : 98
## Very easy :206
##
##
## phys_health_consequence coworkers supervisor
## Maybe:273 No :260 No :393
## No :925 Some of them:774 Some of them:350
## Yes : 61 Yes :225 Yes :516
##
##
##
##
## mental_health_interview phys_health_interview mental_vs_physical
## Maybe: 207 Maybe:557 Don't know:576
## No :1008 No :500 No :340
## Yes : 44 Yes :202 Yes :343
##
##
##
##
## obs_consequence
## No :1075
## Yes: 184
##
##
##
##
##
Create contingency tables for the categorical variables in your dataset.
t <- xtabs(~Age , data= health.df)
t
## Age
## -1726 -29 -1 5 8 11
## 1 1 1 1 1 1
## 18 19 20 21 22 23
## 7 9 6 16 21 51
## 24 25 26 27 28 29
## 46 61 75 71 68 85
## 30 31 32 33 34 35
## 63 67 82 70 65 55
## 36 37 38 39 40 41
## 37 43 39 33 33 21
## 42 43 44 45 46 47
## 20 28 11 12 12 2
## 48 49 50 51 53 54
## 6 4 6 5 1 3
## 55 56 57 58 60 61
## 3 4 3 1 2 1
## 62 65 72 329 99999999999
## 1 1 1 1 1
prop.table(t)
## Age
## -1726 -29 -1 5 8
## 0.0007942812 0.0007942812 0.0007942812 0.0007942812 0.0007942812
## 11 18 19 20 21
## 0.0007942812 0.0055599682 0.0071485306 0.0047656871 0.0127084988
## 22 23 24 25 26
## 0.0166799047 0.0405083400 0.0365369341 0.0484511517 0.0595710882
## 27 28 29 30 31
## 0.0563939635 0.0540111199 0.0675138999 0.0500397141 0.0532168388
## 32 33 34 35 36
## 0.0651310564 0.0555996823 0.0516282764 0.0436854647 0.0293884035
## 37 38 39 40 41
## 0.0341540905 0.0309769658 0.0262112788 0.0262112788 0.0166799047
## 42 43 44 45 46
## 0.0158856235 0.0222398729 0.0087370929 0.0095313741 0.0095313741
## 47 48 49 50 51
## 0.0015885624 0.0047656871 0.0031771247 0.0047656871 0.0039714059
## 53 54 55 56 57
## 0.0007942812 0.0023828435 0.0023828435 0.0031771247 0.0023828435
## 58 60 61 62 65
## 0.0007942812 0.0015885624 0.0007942812 0.0007942812 0.0007942812
## 72 329 99999999999
## 0.0007942812 0.0007942812 0.0007942812
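The Age table above includes values such as -1726, 329 and 99999999999, which cannot be real ages and which distort the mean and the extremes. A minimal sketch of one way to handle them follows. It uses a toy vector built from values in the table, and the 18 to 75 range is an assumption made for illustration, not something the survey defines.

```r
# Toy vector built from a few Age values that appear in the table above.
age <- c(37, 44, 32, -1726, -29, -1, 5, 8, 11, 72, 329, 99999999999)

# Assumed plausible age range for a workplace survey (hypothetical cut-offs).
min_age <- 18
max_age <- 75

# Replace out-of-range entries with NA rather than dropping the rows,
# so the other answers from those respondents are kept.
age_clean <- ifelse(age >= min_age & age <= max_age, age, NA)

sum(is.na(age_clean))   # number of entries flagged as invalid
summary(age_clean)      # the statistics now ignore the flagged values
```

On the survey itself the same expression could be applied to health.df$Age. With these cut-offs, eight entries in the table above would be flagged.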
t <- xtabs(~Gender , data= health.df)
t
## Gender
## A little about you
## 1
## Agender
## 1
## All
## 1
## Androgyne
## 1
## Cis Female
## 1
## cis male
## 1
## Cis Male
## 2
## Cis Man
## 1
## cis-female/femme
## 1
## Enby
## 1
## f
## 15
## F
## 38
## femail
## 1
## Femake
## 1
## female
## 62
## Female
## 121
## Female
## 2
## Female (cis)
## 1
## Female (trans)
## 2
## fluid
## 1
## Genderqueer
## 1
## Guy (-ish) ^_^
## 1
## m
## 34
## M
## 116
## Mail
## 1
## maile
## 1
## Make
## 4
## Mal
## 1
## male
## 206
## Male
## 615
## Male
## 3
## Male (CIS)
## 1
## male leaning androgynous
## 1
## Male-ish
## 1
## Malr
## 1
## Man
## 2
## msle
## 1
## Nah
## 1
## Neuter
## 1
## non-binary
## 1
## ostensibly male, unsure what that really means
## 1
## p
## 1
## queer
## 1
## queer/she/they
## 1
## something kinda male?
## 1
## Trans woman
## 1
## Trans-female
## 1
## woman
## 1
## Woman
## 3
prop.table(t)
## Gender
## A little about you
## 0.0007942812
## Agender
## 0.0007942812
## All
## 0.0007942812
## Androgyne
## 0.0007942812
## Cis Female
## 0.0007942812
## cis male
## 0.0007942812
## Cis Male
## 0.0015885624
## Cis Man
## 0.0007942812
## cis-female/femme
## 0.0007942812
## Enby
## 0.0007942812
## f
## 0.0119142176
## F
## 0.0301826847
## femail
## 0.0007942812
## Femake
## 0.0007942812
## female
## 0.0492454329
## Female
## 0.0961080222
## Female
## 0.0015885624
## Female (cis)
## 0.0007942812
## Female (trans)
## 0.0015885624
## fluid
## 0.0007942812
## Genderqueer
## 0.0007942812
## Guy (-ish) ^_^
## 0.0007942812
## m
## 0.0270055600
## M
## 0.0921366164
## Mail
## 0.0007942812
## maile
## 0.0007942812
## Make
## 0.0031771247
## Mal
## 0.0007942812
## male
## 0.1636219222
## Male
## 0.4884829230
## Male
## 0.0023828435
## Male (CIS)
## 0.0007942812
## male leaning androgynous
## 0.0007942812
## Male-ish
## 0.0007942812
## Malr
## 0.0007942812
## Man
## 0.0015885624
## msle
## 0.0007942812
## Nah
## 0.0007942812
## Neuter
## 0.0007942812
## non-binary
## 0.0007942812
## ostensibly male, unsure what that really means
## 0.0007942812
## p
## 0.0007942812
## queer
## 0.0007942812
## queer/she/they
## 0.0007942812
## something kinda male?
## 0.0007942812
## Trans woman
## 0.0007942812
## Trans-female
## 0.0007942812
## woman
## 0.0007942812
## Woman
## 0.0023828435
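Gender was a free-text field, so the 49 levels above are mostly spelling and capitalisation variants of a few answers. The sketch below shows one possible normalisation on a toy vector of labels taken from the table. The lookup lists and the three output groups are assumptions made for illustration; how to group the remaining answers is a judgment call for the analyst.

```r
# Toy vector of raw labels copied from the Gender table above.
gender_raw <- c("Male", "male", "M", "m", "Male ", "Mal",
                "Female", "female", "F", "f", "Woman", "femail",
                "non-binary", "Agender")

# Lower-case and trim whitespace so "Male", "male" and "Male " coincide.
g <- tolower(trimws(gender_raw))

# Hypothetical lookup lists covering only unambiguous spellings.
male_labels   <- c("male", "m", "man", "mal", "maile", "malr", "msle", "mail")
female_labels <- c("female", "f", "woman", "femail", "femake")

# Everything not matched falls into "Other" (a simplification).
gender_clean <- ifelse(g %in% male_labels, "Male",
                ifelse(g %in% female_labels, "Female", "Other"))

table(gender_clean)
```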
t <- xtabs(~Country , data= health.df)
t
## Country
## Australia Austria Bahamas, The
## 21 3 1
## Belgium Bosnia and Herzegovina Brazil
## 6 1 6
## Bulgaria Canada China
## 4 72 1
## Colombia Costa Rica Croatia
## 2 1 2
## Czech Republic Denmark Finland
## 1 2 3
## France Georgia Germany
## 13 1 45
## Greece Hungary India
## 2 1 10
## Ireland Israel Italy
## 27 5 7
## Japan Latvia Mexico
## 1 1 3
## Moldova Netherlands New Zealand
## 1 27 8
## Nigeria Norway Philippines
## 1 1 1
## Poland Portugal Romania
## 7 2 1
## Russia Singapore Slovenia
## 3 4 1
## South Africa Spain Sweden
## 6 1 7
## Switzerland Thailand United Kingdom
## 7 1 185
## United States Uruguay Zimbabwe
## 751 1 1
prop.table(t)
## Country
## Australia Austria Bahamas, The
## 0.0166799047 0.0023828435 0.0007942812
## Belgium Bosnia and Herzegovina Brazil
## 0.0047656871 0.0007942812 0.0047656871
## Bulgaria Canada China
## 0.0031771247 0.0571882446 0.0007942812
## Colombia Costa Rica Croatia
## 0.0015885624 0.0007942812 0.0015885624
## Czech Republic Denmark Finland
## 0.0007942812 0.0015885624 0.0023828435
## France Georgia Germany
## 0.0103256553 0.0007942812 0.0357426529
## Greece Hungary India
## 0.0015885624 0.0007942812 0.0079428118
## Ireland Israel Italy
## 0.0214455917 0.0039714059 0.0055599682
## Japan Latvia Mexico
## 0.0007942812 0.0007942812 0.0023828435
## Moldova Netherlands New Zealand
## 0.0007942812 0.0214455917 0.0063542494
## Nigeria Norway Philippines
## 0.0007942812 0.0007942812 0.0007942812
## Poland Portugal Romania
## 0.0055599682 0.0015885624 0.0007942812
## Russia Singapore Slovenia
## 0.0023828435 0.0031771247 0.0007942812
## South Africa Spain Sweden
## 0.0047656871 0.0007942812 0.0055599682
## Switzerland Thailand United Kingdom
## 0.0055599682 0.0007942812 0.1469420175
## United States Uruguay Zimbabwe
## 0.5965051628 0.0007942812 0.0007942812
t <- xtabs(~family_history , data= health.df)
t
## family_history
## No Yes
## 767 492
prop.table(t)
## family_history
## No Yes
## 0.6092137 0.3907863
t <- xtabs(~treatment , data= health.df)
t
## treatment
## No Yes
## 622 637
prop.table(t)
## treatment
## No Yes
## 0.4940429 0.5059571
t <- xtabs(~work_interfere , data= health.df)
t
## work_interfere
## Never Often Rarely Sometimes
## 213 144 173 465
prop.table(t)
## work_interfere
## Never Often Rarely Sometimes
## 0.2140704 0.1447236 0.1738693 0.4673367
t <- xtabs(~tech_company , data= health.df)
t
## tech_company
## No Yes
## 228 1031
prop.table(t)
## tech_company
## No Yes
## 0.1810961 0.8189039
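The tables above each describe a single variable. A contingency table in the stricter sense crosses two variables, for example family_history against treatment. The sketch below shows the idea on a toy data frame. The rows are made up for illustration and are not survey results. It also shows how the one-way tables could be produced in a single pass instead of one call per column.

```r
# Toy data frame with the same column names as the survey; the rows are made up.
toy <- data.frame(
  family_history = c("No", "Yes", "Yes", "No", "Yes", "No"),
  treatment      = c("No", "Yes", "Yes", "No", "No", "Yes"),
  stringsAsFactors = TRUE
)

# One-way tables for every factor column in a single pass.
factor_cols <- names(toy)[sapply(toy, is.factor)]
one_way <- lapply(toy[factor_cols], table)

# Two-way contingency table and its row-wise proportions
# (each family_history row sums to 1).
two_way <- xtabs(~ family_history + treatment, data = toy)
two_way
prop.table(two_way, margin = 1)
```

Replacing toy with health.df would give the corresponding table for the survey.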
Draw plots for suitable data fields.
barplot(table(health.df$Age), main = "Distribution of Age", xlab = "Age", ylab = "Number of people")
barplot(table(health.df$Gender), main = "Distribution of Gender", xlab = "Gender", ylab = "Number of people")
barplot(table(health.df$Country), main = "Distribution of Country", xlab = "Country", ylab = "Number of people")
barplot(table(health.df$family_history), main = "Do you have a family history of mental illness?", ylab = "Number of people")
barplot(table(health.df$treatment), main = "Have you sought treatment for a mental health condition?", ylab = "Number of people")
barplot(table(health.df$work_interfere), main = "If you have a mental health condition, do you feel that it interferes with your work?", ylab = "Number of people")
barplot(table(health.df$no_employees), main = "How many employees does your company or organization have?", ylab = "Number of people")
barplot(table(health.df$tech_company), main = "Is your employer primarily a tech company/organization?", ylab = "Number of people")
barplot(table(health.df$benefits), main = "Does your employer provide mental health benefits?", ylab = "Number of people")
barplot(table(health.df$leave), main = "How easy is it for you to take medical leave for a mental health condition?", ylab = "Number of people")
barplot(table(health.df$mental_health_consequence), main = "Do you think that discussing a mental health issue with your employer would have negative consequences?", ylab = "Number of people")
barplot(table(health.df$mental_vs_physical), main = "Do you feel that your employer takes mental health as seriously as physical health?", ylab = "Number of people")
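With 48 countries, the Country bar plot has too many bars for the labels to be readable. One option is to plot only the largest groups, sorted, as horizontal bars. The sketch below uses the six largest counts copied from the Country table; limiting the plot to six countries and the margin values are arbitrary choices made for illustration.

```r
# Counts copied from the Country table above (six largest groups).
country_counts <- c("United States" = 751, "United Kingdom" = 185,
                    "Canada" = 72, "Germany" = 45,
                    "Ireland" = 27, "Netherlands" = 27)

# Sort ascending so the largest bar ends up at the top of a horizontal plot.
sorted_counts <- sort(country_counts)

# Widen the left margin so the country names fit, then restore the settings.
old_par <- par(mar = c(4, 8, 3, 1))
barplot(sorted_counts, horiz = TRUE, las = 1,
        main = "Respondents by country (top 6)",
        xlab = "Number of people")
par(old_par)
```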
Analysis indicates the following:
About 39% of respondents (492 of 1,259) report a family history of mental illness.
About 51% of respondents (637 of 1,259) have sought treatment for a mental health condition, which suggests that mental health issues are common in this sample.
Nearly 50% of the sample (609 of 1,259) indicated that a mental health condition sometimes or often interferes with their work; the rest either did not answer (264) or said it never or rarely does (386).
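There is a need for help resources: only about 20% of respondents (250 of 1,259) say their employer provides resources to learn about mental health issues and how to seek help.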
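Employees lack clear channels for discussing mental health issues or taking medical leave for them: about 45% of respondents (563 of 1,259) do not know how easy it would be to take such leave. Furthermore, only about 39% (490 of 1,259) think that discussing a mental health issue with their employer would have no negative consequences; the remaining 61% answered Yes or Maybe.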
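The percentages quoted for work interference can be checked directly from the counts in the summary output. The sketch below copies those counts by hand (the variable names are illustrative) and shows that the share depends on whether the 264 missing answers are kept in the denominator.

```r
# Counts copied from the summary output above.
n_total <- 1259
work_interfere <- c(Never = 213, Often = 144, Rarely = 173, Sometimes = 465)
n_missing <- 264

# Sanity check: answered plus missing should equal the sample size.
stopifnot(sum(work_interfere) + n_missing == n_total)

# Respondents reporting "Sometimes" or "Often".
affected <- sum(work_interfere[c("Sometimes", "Often")])

# Share of the whole sample versus share of those who answered.
share_of_sample  <- affected / n_total
share_of_answers <- affected / sum(work_interfere)

round(100 * c(sample = share_of_sample, answered = share_of_answers), 1)
# sample 48.4, answered 61.2
```

The second figure matches the proportions printed by prop.table for work_interfere (0.467 + 0.145), because xtabs drops missing values by default.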