Mental Health Prediction

Introduction

Mental illness is one of the four major health problems in developed countries, yet it remains poorly understood among the general public. In the past, many people considered mental illness to be untreatable (Hwari, 2001). A mental disorder is a disturbance of mental functioning, covering emotion, thought, behavior, motivation, drive, self-insight, and perception. It causes a decline in all psychological functions, especially interest and motivation, and thereby disrupts a person's life in society (Nasir & Muhith 2011).

1.1 Problem Statement

Based on the introduction above, the research question is formulated as follows: does the work environment affect mental illness?

1.2 Objective

Based on the problem statement above, the objective of this write-up is to predict which factors influence mental illness.

1.3 Methodology

The steps used in this project to verify the facts contained in the data are as follows:

1.4 Data Description

The data used in this project is the Mental Health dataset available on Kaggle.

2. Data Preparation

This section covers the process of turning raw data into data that is easier to understand and of higher quality. The following are some of the steps typically carried out in the Data Preparation stage:

2.1 Packages

Below are the packages used to process the data in this project. If any of them are not yet installed on your PC, please install them before running any of the code.
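A minimal sketch for checking which of the packages above are still missing, assuming you only want to install what is not already present. The helper name `missing_packages()` is hypothetical and not part of any package; it relies on `system.file(package = ...)` returning an empty string when a package is not installed.

```r
# Packages loaded in this project
required_pkgs <- c("plyr", "dplyr", "ggplot2", "tidyverse",
                   "plotly", "tidytext", "knitr")

# Hypothetical helper: return the packages that are not installed.
# system.file(package = p) returns "" when package p is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

to_install <- missing_packages(required_pkgs)
print(to_install)

# Only install when running interactively and something is missing
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Checking first avoids reinstalling packages that are already present; the `interactive()` guard keeps the snippet from triggering downloads when the document is knitted.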

# load libraries
# plyr is loaded first: if it is attached after dplyr, plyr's arrange(), count(),
# mutate(), rename() and summarise() mask the dplyr versions
library(plyr)       # revalue()
library(tidyverse)  # attaches dplyr, ggplot2, tidyr, readr, purrr, tibble, stringr, forcats
library(plotly)
library(tidytext)
library(knitr)

2.2 Import Data

Here I show how to import the data, using the read.csv() function:

Mental <- read.csv(file = "Mental.csv")
head(Mental)
##             Timestamp Age Gender        Country state self_employed
## 1 2014-08-27 11:29:31  37 Female  United States    IL          <NA>
## 2 2014-08-27 11:29:37  44      M  United States    IN          <NA>
## 3 2014-08-27 11:29:44  32   Male         Canada  <NA>          <NA>
## 4 2014-08-27 11:29:46  31   Male United Kingdom  <NA>          <NA>
## 5 2014-08-27 11:30:22  31   Male  United States    TX          <NA>
## 6 2014-08-27 11:31:22  33   Male  United States    TN          <NA>
##   family_history treatment work_interfere   no_employees remote_work
## 1             No       Yes          Often           6-25          No
## 2             No        No         Rarely More than 1000          No
## 3             No        No         Rarely           6-25          No
## 4            Yes       Yes          Often         26-100          No
## 5             No        No          Never        100-500         Yes
## 6            Yes        No      Sometimes           6-25          No
##   tech_company   benefits care_options wellness_program  seek_help  anonymity
## 1          Yes        Yes     Not sure               No        Yes        Yes
## 2           No Don't know           No       Don't know Don't know Don't know
## 3          Yes         No           No               No         No Don't know
## 4          Yes         No          Yes               No         No         No
## 5          Yes        Yes           No       Don't know Don't know Don't know
## 6          Yes        Yes     Not sure               No Don't know Don't know
##                leave mental_health_consequence phys_health_consequence
## 1      Somewhat easy                        No                      No
## 2         Don't know                     Maybe                      No
## 3 Somewhat difficult                        No                      No
## 4 Somewhat difficult                       Yes                     Yes
## 5         Don't know                        No                      No
## 6         Don't know                        No                      No
##      coworkers supervisor mental_health_interview phys_health_interview
## 1 Some of them        Yes                      No                 Maybe
## 2           No         No                      No                    No
## 3          Yes        Yes                     Yes                   Yes
## 4 Some of them         No                   Maybe                 Maybe
## 5 Some of them        Yes                     Yes                   Yes
## 6          Yes        Yes                      No                 Maybe
##   mental_vs_physical obs_consequence comments
## 1                Yes              No     <NA>
## 2         Don't know              No     <NA>
## 3                 No              No     <NA>
## 4                 No             Yes     <NA>
## 5         Don't know              No     <NA>
## 6         Don't know              No     <NA>
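`read.csv()` only defaults to `stringsAsFactors = FALSE` from R 4.0.0 onward, so it can be safer to state the arguments explicitly: on older R versions, text columns such as `treatment` would be read as factors, and `as.numeric()` on a factor returns its level codes rather than its labels. The sketch below uses a small inline CSV (hypothetical rows shaped like the survey columns) instead of `Mental.csv`, and additionally assumes that blank cells should count as missing.

```r
# Small inline CSV standing in for Mental.csv (hypothetical rows)
csv_text <- "Age,Gender,treatment,self_employed
37,Female,Yes,NA
44,M,No,
32,Male,No,No"

Mental_demo <- read.csv(text = csv_text,
                        stringsAsFactors = FALSE,  # keep text as character
                        na.strings = c("NA", ""))  # treat "NA" and blanks as missing
str(Mental_demo)

# Why the column type matters: a factor converts to its level codes, not its labels
f <- factor(c("0", "1", "1"))
as.numeric(f)                 # level codes: 1 2 2
as.numeric(as.character(f))   # labels: 0 1 1
```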

2.3 Data Structure

Before going further into the analysis, it is worth reviewing the structure of the dataset we will use:

glimpse(Mental)
## Rows: 1,259
## Columns: 27
## $ Timestamp                 <chr> "2014-08-27 11:29:31", "2014-08-27 11:29:...
## $ Age                       <dbl> 37, 44, 32, 31, 31, 33, 35, 39, 42, 23, 3...
## $ Gender                    <chr> "Female", "M", "Male", "Male", "Male", "M...
## $ Country                   <chr> "United States", "United States", "Canada...
## $ state                     <chr> "IL", "IN", NA, NA, "TX", "TN", "MI", NA,...
## $ self_employed             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ family_history            <chr> "No", "No", "No", "Yes", "No", "Yes", "Ye...
## $ treatment                 <chr> "Yes", "No", "No", "Yes", "No", "No", "Ye...
## $ work_interfere            <chr> "Often", "Rarely", "Rarely", "Often", "Ne...
## $ no_employees              <chr> "6-25", "More than 1000", "6-25", "26-100...
## $ remote_work               <chr> "No", "No", "No", "No", "Yes", "No", "Yes...
## $ tech_company              <chr> "Yes", "No", "Yes", "Yes", "Yes", "Yes", ...
## $ benefits                  <chr> "Yes", "Don't know", "No", "No", "Yes", "...
## $ care_options              <chr> "Not sure", "No", "No", "Yes", "No", "Not...
## $ wellness_program          <chr> "No", "Don't know", "No", "No", "Don't kn...
## $ seek_help                 <chr> "Yes", "Don't know", "No", "No", "Don't k...
## $ anonymity                 <chr> "Yes", "Don't know", "Don't know", "No", ...
## $ leave                     <chr> "Somewhat easy", "Don't know", "Somewhat ...
## $ mental_health_consequence <chr> "No", "Maybe", "No", "Yes", "No", "No", "...
## $ phys_health_consequence   <chr> "No", "No", "No", "Yes", "No", "No", "May...
## $ coworkers                 <chr> "Some of them", "No", "Yes", "Some of the...
## $ supervisor                <chr> "Yes", "No", "Yes", "No", "Yes", "Yes", "...
## $ mental_health_interview   <chr> "No", "No", "Yes", "Maybe", "Yes", "No", ...
## $ phys_health_interview     <chr> "Maybe", "No", "Yes", "Maybe", "Yes", "Ma...
## $ mental_vs_physical        <chr> "Yes", "Don't know", "No", "No", "Don't k...
## $ obs_consequence           <chr> "No", "No", "No", "Yes", "No", "No", "No"...
## $ comments                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
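Before removing missing values, it can help to count them per column, since `na.omit()` drops a row if any of its columns is `NA`. The sketch below uses a toy data frame (hypothetical values) with a mostly empty free-text column, similar in spirit to `comments`, to show how one sparse column can shrink the data and how dropping that column first keeps more rows. On the real data the equivalent count would be `colSums(is.na(Mental))`.

```r
# Toy data: 'comments' is mostly empty, like a free-text survey field
toy <- data.frame(
  treatment    = c("Yes", "No", "No", "Yes", "No", "Yes"),
  tech_company = c("Yes", "No", "Yes", "Yes", NA, "Yes"),
  comments     = c(NA, NA, "some text", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# na.omit() removes every row that has an NA in any column
nrow(na.omit(toy))           # only 1 row left

# Dropping the sparse column first keeps far more rows
toy_reduced <- toy[, names(toy) != "comments"]
nrow(na.omit(toy_reduced))   # 5 rows left
```

Whether a sparse column can be dropped depends on the analysis; a free-text field that is not used as a predictor is a typical candidate.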

3. Data Cleaning

Data cleaning (also called data cleansing or data scrubbing) is the process of assessing data quality and then correcting or deleting records that are incorrect, corrupt, inaccurate, incomplete, or improperly formatted. The steps I carried out in this project are:

3.1 Removing Missing (NA) Values

Based on the size of the dataset and the plot of missing values, it seemed that we could remove the missing values and still have a large enough dataset to work with, so let's start there. Keep in mind that na.omit() drops every row that contains an NA in any column, so columns with many NA values, such as self_employed and comments, can remove a large share of the rows: dim() below shows that only 86 of the original 1,259 rows remain.

Mental <- na.omit(Mental)
dim(Mental)
## [1] 86 27
# First, re-value the variables of interest
Mental$no_employees <- as.factor(revalue(Mental$no_employees,
                                           c("1-5"="1", "6-25"="2", "26-100"="3", "100-500"="4", "500-1000"="5", "More than 1000"="6")))

Mental$treatment <- as.numeric(revalue(Mental$treatment,
                                           c("No"="0", "Yes"="1")))

# Bar chart of `treatment` (treated mental health condition) by `no_employees` (number of employees) and `tech_company` (company type)
ggplot(Mental, aes(x = no_employees, y = treatment, fill = factor(tech_company))) +
  # bar height = mean of the 0/1 treatment variable within each group
  stat_summary(fun = mean, position = position_dodge(), geom = "bar") +
  labs(x = "Number of employees", y = "Probability of mental health condition",
       fill = "Tech company",
       title = "Probability of mental health illness by workplace type and size") +
  scale_x_discrete(labels = c("1" = "1-5", "2" = "6-25", "3" = "26-100", "4" = "100-500", "5" = "500-1000", "6" = ">1000"))
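`stat_summary()` with `mean` and `geom = "bar"` draws, for each combination of `no_employees` and `tech_company`, the mean of the 0/1 `treatment` variable, i.e. the share of respondents in that group who answered "Yes". A base-R sketch of the same computation on hypothetical toy rows (not survey data), useful for reading the bar heights as numbers:

```r
# Hypothetical toy rows with the same column names as the survey
toy <- data.frame(
  no_employees = factor(c("1", "1", "2", "2", "2", "6")),
  tech_company = c("Yes", "Yes", "No", "No", "Yes", "No"),
  treatment    = c("Yes", "No", "Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Recode Yes/No to 1/0 by comparison; this works for character and factor input
toy$treatment <- as.numeric(toy$treatment == "Yes")

# Mean of a 0/1 variable per group = proportion answering "Yes"
props <- aggregate(treatment ~ no_employees + tech_company, data = toy, FUN = mean)
print(props)
```

Only combinations that actually occur in the data appear in `props`, just as the chart only draws bars for groups that have observations.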

Can we test whether no_employees and tech_company are statistically significant predictors of a mental health condition?
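The logistic regression fitted next can be written as follows, where $p_i$ is the probability that respondent $i$ has `treatment` = 1 and $[\cdot]$ is 1 when the condition holds and 0 otherwise. With R's default dummy coding the reference levels are `tech_company` = "No" and `no_employees` = "1" (1-5 employees), which is why the output lists `tech_companyYes` and `no_employees2` to `no_employees6`. Exponentiating a coefficient gives the odds ratio relative to the reference level; the second line works this through for `tech_companyYes` using its estimate from the summary.

```latex
\begin{aligned}
\log\frac{p_i}{1 - p_i} &= \beta_0 + \beta_1\,[\text{tech\_company}_i = \text{Yes}]
  + \sum_{k=2}^{6} \gamma_k\,[\text{no\_employees}_i = k] \\
\widehat{\text{OR}}_{\text{tech\_companyYes}} &= e^{\hat\beta_1} = e^{-1.6091} \approx 0.200
\end{aligned}
```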

# Fit a logistic regression
# convert tech_company to a factor variable
Mental$tech_company <- as.factor(Mental$tech_company)
logit <- glm(treatment ~ tech_company + no_employees, data = Mental, family = "binomial")
summary(logit)
## 
## Call:
## glm(formula = treatment ~ tech_company + no_employees, family = "binomial", 
##     data = Mental)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1932   0.1983   0.5543   0.6984   1.1774  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)  
## (Intercept)       2.2273     1.2178   1.829   0.0674 .
## tech_companyYes  -1.6091     1.1151  -1.443   0.1490  
## no_employees2     0.9115     0.9548   0.955   0.3398  
## no_employees3     0.6685     0.7892   0.847   0.3969  
## no_employees4     1.6922     1.1848   1.428   0.1532  
## no_employees5    -0.6182     1.1430  -0.541   0.5886  
## no_employees6    -0.1350     0.7627  -0.177   0.8595  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 93.284  on 85  degrees of freedom
## Residual deviance: 84.382  on 79  degrees of freedom
## AIC: 98.382
## 
## Number of Fisher Scoring iterations: 5
# Exponentiating the estimates gives odds ratios
# Odds ratios with 95% confidence intervals (profile likelihood, via confint())
exp(cbind(OR = coef(logit), confint(logit))) 
## Waiting for profiling to be done...
##                        OR      2.5 %     97.5 %
## (Intercept)     9.2746190 1.17656305 205.983851
## tech_companyYes 0.2000713 0.01016392   1.243415
## no_employees2   2.4879746 0.41683525  20.560870
## no_employees3   1.9513940 0.41499099   9.754965
## no_employees4   5.4312846 0.70115224 114.425345
## no_employees5   0.5389135 0.05052239   5.623884
## no_employees6   0.8737272 0.18855903   3.910407
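None of the predictor coefficients above reaches p < 0.05 in its individual Wald test, and every 95% interval for the predictor odds ratios includes 1. To ask whether `tech_company` and `no_employees` together improve on an intercept-only model, one option is a likelihood-ratio test on the drop in deviance. With the fitted object this could be run as `anova(update(logit, . ~ 1), logit, test = "Chisq")`; the sketch below does the same arithmetic directly from the deviances printed by `summary(logit)`.

```r
# Deviances reported by summary(logit) above
null_deviance  <- 93.284  # intercept-only model, 85 df
resid_deviance <- 84.382  # tech_company + no_employees, 79 df
df_diff <- 85L - 79L      # 6 extra parameters

# Likelihood-ratio statistic: drop in deviance, approximately chi-squared
lr_stat <- null_deviance - resid_deviance
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

cat(sprintf("LR = %.3f on %d df, p = %.3f\n", lr_stat, df_diff, p_value))
```

With p of roughly 0.18, this test would not reject the intercept-only model at the 5% level; note that it rests on only the 86 complete rows left after na.omit().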