Mental illness is one of the four most common health problems in developed countries, yet it is still poorly understood by the general public. In the past, many people regarded mental illness as a disease that could not be treated (Hwari, 2001). A mental disorder is a disturbance of mental function, covering emotion, thought, behavior, motivation, self-insight, and perception, that lowers overall psychological functioning, especially interest and motivation, and so interferes with a person's life in society (Nasir & Muhith 2011).
Based on the introduction above, the researcher formulates the problem as follows: does the work environment have an effect on mental health illness?
Based on this problem statement, the aim of this report is to predict the factors that influence mental illness.
The steps used to verify the facts contained in the data for this project are as follows:
The data used in this project is the Mental Health dataset available on Kaggle.
This section describes the steps taken to turn the raw data into data that is easier to understand and of better quality. The following are some of the things usually done during the Data Preparation stage:
Below are the packages used for data processing in this project. If they are not yet installed on your PC, please install them before trying to run any of the code.
# load libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
library(tidyverse)
## -- Attaching packages ------------------- tidyverse 1.3.0 --
## v tibble 3.0.1 v purrr 0.3.4
## v tidyr 1.1.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ---------------------- tidyverse_conflicts() --
## x plyr::arrange() masks dplyr::arrange()
## x purrr::compact() masks plyr::compact()
## x plyr::count() masks dplyr::count()
## x plyr::failwith() masks dplyr::failwith()
## x dplyr::filter() masks stats::filter()
## x plyr::id() masks dplyr::id()
## x dplyr::lag() masks stats::lag()
## x plyr::mutate() masks dplyr::mutate()
## x plyr::rename() masks dplyr::rename()
## x plyr::summarise() masks dplyr::summarise()
## x plyr::summarize() masks dplyr::summarize()
library(plotly)
##
## Attaching package: 'plotly'
## The following objects are masked from 'package:plyr':
##
## arrange, mutate, rename, summarise
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(tidytext)
library(knitr)
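The installation check mentioned above could be sketched as follows. This is a minimal sketch and not part of the original project: the helper name `missing_packages` is hypothetical, and it relies only on base R's `requireNamespace()`. The vector lists plyr before dplyr because the plyr startup message above recommends that order when both packages are needed.

```r
# Packages used in this project; plyr is listed before dplyr, as the
# plyr startup message recommends when both are needed.
pkgs <- c("plyr", "dplyr", "ggplot2", "tidyverse", "plotly", "tidytext", "knitr")

# Hypothetical helper: return the packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```

Leaving the `install.packages()` call commented out keeps the check free of side effects. Whether to install automatically is a choice left to the reader.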
The data is imported with the read.csv() function:
Mental <- read.csv(file = "Mental.csv")
head(Mental)
## Timestamp Age Gender Country state self_employed
## 1 2014-08-27 11:29:31 37 Female United States IL <NA>
## 2 2014-08-27 11:29:37 44 M United States IN <NA>
## 3 2014-08-27 11:29:44 32 Male Canada <NA> <NA>
## 4 2014-08-27 11:29:46 31 Male United Kingdom <NA> <NA>
## 5 2014-08-27 11:30:22 31 Male United States TX <NA>
## 6 2014-08-27 11:31:22 33 Male United States TN <NA>
## family_history treatment work_interfere no_employees remote_work
## 1 No Yes Often 6-25 No
## 2 No No Rarely More than 1000 No
## 3 No No Rarely 6-25 No
## 4 Yes Yes Often 26-100 No
## 5 No No Never 100-500 Yes
## 6 Yes No Sometimes 6-25 No
## tech_company benefits care_options wellness_program seek_help anonymity
## 1 Yes Yes Not sure No Yes Yes
## 2 No Don't know No Don't know Don't know Don't know
## 3 Yes No No No No Don't know
## 4 Yes No Yes No No No
## 5 Yes Yes No Don't know Don't know Don't know
## 6 Yes Yes Not sure No Don't know Don't know
## leave mental_health_consequence phys_health_consequence
## 1 Somewhat easy No No
## 2 Don't know Maybe No
## 3 Somewhat difficult No No
## 4 Somewhat difficult Yes Yes
## 5 Don't know No No
## 6 Don't know No No
## coworkers supervisor mental_health_interview phys_health_interview
## 1 Some of them Yes No Maybe
## 2 No No No No
## 3 Yes Yes Yes Yes
## 4 Some of them No Maybe Maybe
## 5 Some of them Yes Yes Yes
## 6 Yes Yes No Maybe
## mental_vs_physical obs_consequence comments
## 1 Yes No <NA>
## 2 Don't know No <NA>
## 3 No No <NA>
## 4 No Yes <NA>
## 5 Don't know No <NA>
## 6 Don't know No <NA>
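How `read.csv()` treats text columns and missing values depends on its arguments and on the R version, so it can help to state them explicitly. The sketch below uses a small inline CSV as a stand-in for the first columns of `Mental.csv`. The inline text and the choice of `na.strings` are assumptions for illustration, not taken from the original file.

```r
# Toy CSV text standing in for the first columns of "Mental.csv" (assumed layout).
csv_text <- "Timestamp,Age,Gender,Country,state
2014-08-27 11:29:31,37,Female,United States,IL
2014-08-27 11:29:37,44,M,United States,IN
2014-08-27 11:29:44,32,Male,Canada,NA"

# stringsAsFactors = FALSE keeps text columns as character on any R version;
# na.strings lists the spellings that should be read as missing values.
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                na.strings = c("NA", ""))

str(toy)
sum(is.na(toy$state))  # number of missing state entries in the toy data
```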
Before going further into the analysis, it is worth reviewing the structure of the data set we will use:
glimpse(Mental)
## Rows: 1,259
## Columns: 27
## $ Timestamp <chr> "2014-08-27 11:29:31", "2014-08-27 11:29:...
## $ Age <dbl> 37, 44, 32, 31, 31, 33, 35, 39, 42, 23, 3...
## $ Gender <chr> "Female", "M", "Male", "Male", "Male", "M...
## $ Country <chr> "United States", "United States", "Canada...
## $ state <chr> "IL", "IN", NA, NA, "TX", "TN", "MI", NA,...
## $ self_employed <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ family_history <chr> "No", "No", "No", "Yes", "No", "Yes", "Ye...
## $ treatment <chr> "Yes", "No", "No", "Yes", "No", "No", "Ye...
## $ work_interfere <chr> "Often", "Rarely", "Rarely", "Often", "Ne...
## $ no_employees <chr> "6-25", "More than 1000", "6-25", "26-100...
## $ remote_work <chr> "No", "No", "No", "No", "Yes", "No", "Yes...
## $ tech_company <chr> "Yes", "No", "Yes", "Yes", "Yes", "Yes", ...
## $ benefits <chr> "Yes", "Don't know", "No", "No", "Yes", "...
## $ care_options <chr> "Not sure", "No", "No", "Yes", "No", "Not...
## $ wellness_program <chr> "No", "Don't know", "No", "No", "Don't kn...
## $ seek_help <chr> "Yes", "Don't know", "No", "No", "Don't k...
## $ anonymity <chr> "Yes", "Don't know", "Don't know", "No", ...
## $ leave <chr> "Somewhat easy", "Don't know", "Somewhat ...
## $ mental_health_consequence <chr> "No", "Maybe", "No", "Yes", "No", "No", "...
## $ phys_health_consequence <chr> "No", "No", "No", "Yes", "No", "No", "May...
## $ coworkers <chr> "Some of them", "No", "Yes", "Some of the...
## $ supervisor <chr> "Yes", "No", "Yes", "No", "Yes", "Yes", "...
## $ mental_health_interview <chr> "No", "No", "Yes", "Maybe", "Yes", "No", ...
## $ phys_health_interview <chr> "Maybe", "No", "Yes", "Maybe", "Yes", "Ma...
## $ mental_vs_physical <chr> "Yes", "Don't know", "No", "No", "Don't k...
## $ obs_consequence <chr> "No", "No", "No", "Yes", "No", "No", "No"...
## $ comments <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
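The structure above shows several columns whose first values are all `NA`. A per-column count of missing values makes this easier to judge than the first few rows alone. The following is a minimal base R sketch on a toy data frame. The values are invented for illustration, and only the column names are borrowed from the survey.

```r
# Toy data frame with the same kind of gaps as the survey (values assumed).
toy <- data.frame(
  treatment = c("Yes", "No", "No", "Yes"),
  state     = c("IL", "IN", NA, NA),
  comments  = c(NA, NA, NA, "example comment"),
  stringsAsFactors = FALSE
)

# Number and share of missing values in each column, largest first.
na_count <- sort(colSums(is.na(toy)), decreasing = TRUE)
na_share <- round(na_count / nrow(toy), 2)
print(data.frame(na_count, na_share))
```

Applied to the real data, the same two lines would be run on `Mental` instead of `toy`.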
Data cleansing, also called data scrubbing, is the process of assessing data quality and then modifying, correcting, or deleting data that is wrong, corrupted, inaccurate, incomplete, or wrongly formatted. In this project I did the following:
Judging by the size of the dataset and the plot of missing values, it seems we can drop the rows with missing values and still have enough data to work with, so let's start there. Note that na.omit() drops a row if any of its columns is missing, which here leaves 86 of the original 1,259 rows:
Mental <- na.omit(Mental)
dim(Mental)
## [1] 86 27
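Because `na.omit()` works across all columns, a sparsely filled column such as a free-text comment field can remove most rows even when the variables of interest are complete. One possible alternative, sketched here on toy data, is to restrict the deletion to the columns that enter the model. This is an illustration under assumed values, not the approach taken in this project, and whether it suits the survey depends on why the values are missing.

```r
# Toy data: only one row is complete in every column (values assumed).
toy <- data.frame(
  treatment    = c("Yes", "No", "No", "Yes", NA),
  tech_company = c("Yes", "No", "Yes", "Yes", "No"),
  no_employees = c("6-25", "More than 1000", "6-25", "26-100", "1-5"),
  comments     = c(NA, NA, NA, "example comment", NA),
  stringsAsFactors = FALSE
)

# Listwise deletion over all columns: only rows with no NA anywhere survive.
nrow(na.omit(toy))

# Deletion restricted to the columns that enter the model.
model_vars <- c("treatment", "tech_company", "no_employees")
toy_model <- toy[complete.cases(toy[, model_vars]), ]
nrow(toy_model)
```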
# First, re-value the variables of interest
Mental$no_employees <- as.factor(revalue(Mental$no_employees,
  c("1-5" = "1", "6-25" = "2", "26-100" = "3", "100-500" = "4",
    "500-1000" = "5", "More than 1000" = "6")))
Mental$treatment <- as.numeric(revalue(Mental$treatment,
  c("No" = "0", "Yes" = "1")))
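The same recoding can be written in base R without `revalue()`. The sketch below uses short toy vectors in place of the survey columns, and the names `size_levels`, `size_code`, and `treated` are illustrative. It also shows why the 0/1 conversion needs care: `as.numeric()` on a factor returns level positions, so the conversion above only gives 0 and 1 when `treatment` is a character column.

```r
size_levels <- c("1-5", "6-25", "26-100", "100-500", "500-1000", "More than 1000")

# Toy vectors standing in for Mental$no_employees and Mental$treatment.
no_employees <- c("6-25", "More than 1000", "26-100", "1-5")
treatment    <- c("Yes", "No", "Yes", "No")

# factor() with explicit levels and labels recodes and fixes the order at once.
size_code <- factor(no_employees, levels = size_levels, labels = as.character(1:6))

# A logical comparison gives 0/1 directly, whether the input is character or factor.
treated <- as.numeric(treatment == "Yes")

# Caution: as.numeric() on a factor returns level positions, not the labels.
as.numeric(factor(c("0", "1")))  # 1 2
```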
# Bar chart of `treatment` (treated for a mental health condition) by `no_employees` (number of employees) and `tech_company` (type of company)
ggplot(Mental, aes(x = no_employees, y = treatment, fill = factor(tech_company))) +
  stat_summary(fun = mean, position = position_dodge(), geom = "bar") +
  labs(x = "Number of employees", y = "Probability of mental health condition",
       title = "Probability of mental health illness by workplace type and size") +
  scale_x_discrete(labels = c("1" = "1-5", "2" = "6-25", "3" = "26-100", "4" = "100-500", "5" = "500-1000", "6" = ">1000"))
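Each bar in this chart is the mean of the 0/1 `treatment` variable within one combination of company size and company type, that is, the share of respondents in that group who reported treatment. The numbers behind such a chart can be tabulated with base R's `aggregate()`. The sketch below uses invented toy values rather than the survey data.

```r
# Toy data with two size codes and two company types (values assumed).
toy <- data.frame(
  no_employees = c("1", "1", "1", "2", "2", "2"),
  tech_company = c("Yes", "Yes", "No", "Yes", "No", "No"),
  treatment    = c(1, 0, 1, 1, 0, 0)
)

# Mean of the 0/1 treatment indicator in each size-by-type cell,
# i.e. the share of respondents in that cell who reported treatment.
cell_means <- aggregate(treatment ~ no_employees + tech_company,
                        data = toy, FUN = mean)
print(cell_means)
```

With only 86 rows spread over up to twelve cells, some bars may rest on very few respondents, so a count per cell (for example with `table()`) is a useful companion to the means.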
Can we test whether no_employees and tech_company are statistically significant predictors of a mental health condition?
# Logistic regression
# Convert tech_company to a factor
Mental$tech_company <- as.factor(Mental$tech_company)
logit <- glm(treatment ~ tech_company + no_employees, data = Mental, family = "binomial")
summary(logit)
##
## Call:
## glm(formula = treatment ~ tech_company + no_employees, family = "binomial",
## data = Mental)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1932 0.1983 0.5543 0.6984 1.1774
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.2273 1.2178 1.829 0.0674 .
## tech_companyYes -1.6091 1.1151 -1.443 0.1490
## no_employees2 0.9115 0.9548 0.955 0.3398
## no_employees3 0.6685 0.7892 0.847 0.3969
## no_employees4 1.6922 1.1848 1.428 0.1532
## no_employees5 -0.6182 1.1430 -0.541 0.5886
## no_employees6 -0.1350 0.7627 -0.177 0.8595
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 93.284 on 85 degrees of freedom
## Residual deviance: 84.382 on 79 degrees of freedom
## AIC: 98.382
##
## Number of Fisher Scoring iterations: 5
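None of the individual coefficients above has a p-value below 0.05. A complementary check is a likelihood-ratio test of the whole model against the intercept-only model, using the null and residual deviances printed in the summary. The sketch below copies those four numbers from the output. It is a worked calculation, not an additional analysis from the original project.

```r
# Deviances and degrees of freedom copied from the summary output.
null_dev  <- 93.284; null_df  <- 85
resid_dev <- 84.382; resid_df <- 79

lr_stat <- null_dev - resid_dev   # drop in deviance due to the predictors
lr_df   <- null_df - resid_df     # number of estimated slope coefficients
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p = p_value)
```

The statistic is 8.902 on 6 degrees of freedom, which gives a p-value above 0.10. Taken together, the two predictors do not improve significantly on the intercept-only model in this sample.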
# Exponentiating the estimates gives odds ratios
# The result shows the odds ratios with 95% confidence intervals
exp(cbind(OR = coef(logit), confint(logit)))
## Waiting for profiling to be done...
## OR 2.5 % 97.5 %
## (Intercept) 9.2746190 1.17656305 205.983851
## tech_companyYes 0.2000713 0.01016392 1.243415
## no_employees2 2.4879746 0.41683525 20.560870
## no_employees3 1.9513940 0.41499099 9.754965
## no_employees4 5.4312846 0.70115224 114.425345
## no_employees5 0.5389135 0.05052239 5.623884
## no_employees6 0.8737272 0.18855903 3.910407
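To show where the odds ratios come from, the sketch below reproduces the `tech_companyYes` row by hand from the estimate and standard error in the summary output. The interval here is a Wald approximation (estimate plus or minus about 1.96 standard errors, then exponentiated), which is an assumption made for illustration. The table above was produced by profiling the likelihood, so its limits differ.

```r
# Estimate and standard error for tech_companyYes, copied from the summary output.
est <- -1.6091
se  <-  1.1151

odds_ratio <- exp(est)                                 # about 0.20, matching the OR column
wald_ci    <- exp(est + c(-1, 1) * qnorm(0.975) * se)  # Wald approximation

round(c(OR = odds_ratio, lower = wald_ci[1], upper = wald_ci[2]), 3)
```

An odds ratio of about 0.20 means that the estimated odds of treatment for respondents at tech companies are about one fifth of those at non-tech companies. Both the Wald interval and the profile interval in the table include 1, so the data are also compatible with no difference.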