PADK Tugas Individu 2 Praktikum UAS

Packages

library(readxl)
library(corrplot)
## corrplot 0.92 loaded
library(ggplot2)
library(pscl)
## Warning: package 'pscl' was built under R version 4.3.2
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
library(car)
## Warning: package 'car' was built under R version 4.3.2
## Loading required package: carData
library(rcompanion)
library(caret)
## Warning: package 'caret' was built under R version 4.3.2
## Loading required package: lattice
library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(generalhoslem)
## Warning: package 'generalhoslem' was built under R version 4.3.2
## Loading required package: reshape
## Loading required package: MASS
library(fastDummies)
## Warning: package 'fastDummies' was built under R version 4.3.2
## Thank you for using fastDummies!
## To acknowledge our work, please cite the package:
## Kaplan, J. & Schlegel, B. (2023). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables. Version 1.7.1. URL: https://github.com/jacobkap/fastDummies, https://jacobkap.github.io/fastDummies/.
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
## 
##     select
## The following object is masked from 'package:reshape':
## 
##     rename
## The following object is masked from 'package:car':
## 
##     recode
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(viridis)
## Loading required package: viridisLite
library(tm)
## Warning: package 'tm' was built under R version 4.3.2
## Loading required package: NLP
## 
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
## 
##     annotate
library(wordcloud)
## Warning: package 'wordcloud' was built under R version 4.3.2
## Loading required package: RColorBrewer
library(RColorBrewer)
library(pROC)
## Warning: package 'pROC' was built under R version 4.3.2
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

Input Data

Pada analisis ini, data yang digunakan berupa data sekunder yaitu data Turnover. Data terdiri dari 1129 amatan dan 7 peubah. Data sebagai berikut :

Nama Peubah Keterangan Tipe Peubah
Event (Y) Kejadian Turnover Kategorik (Turnover=1, lainnya=0)
Gender Jenis Kelamin Kategorik (Male/Female)
Age Usia Numerik (18 sampai 58 tahun)
Industry Bidang Pekerjaan Kategorik (16 Level)
Profession Profesi Kategorik (15 level)
Greywage Membayar Pajak/Tidak Kategorik (White atau Grey)
Way Transportasi Ke Kantor Kategorik (Bus, Car dan Foot)

Analisis ini hanya menggunakan 4 dari 7 peubah yaitu Event sebagai peubah respon (Y), dan Gender, Age, serta Way sebagai peubah prediktor ataupun dummy.

# Input Data
dt <- read_excel("D:/Kuliah S1 Statistika/Semester 5/PADK/Data_Tugas_UAS.xlsx")
head(dt)
## # A tibble: 6 × 7
##   event gender   age industry        profession greywage way  
##   <dbl> <chr>  <dbl> <chr>           <chr>      <chr>    <chr>
## 1     1 m         35 Banks           HR         white    bus  
## 2     1 m         33 Banks           HR         white    bus  
## 3     1 f         35 PowerGeneration HR         white    bus  
## 4     1 f         35 PowerGeneration HR         white    bus  
## 5     1 m         32 Retail          Commercial white    bus  
## 6     1 f         42 manufacture     HR         white    bus
str(dt)
## tibble [1,129 × 7] (S3: tbl_df/tbl/data.frame)
##  $ event     : num [1:1129] 1 1 1 1 1 1 1 1 1 1 ...
##  $ gender    : chr [1:1129] "m" "m" "f" "f" ...
##  $ age       : num [1:1129] 35 33 35 35 32 42 42 28 29 30 ...
##  $ industry  : chr [1:1129] "Banks" "Banks" "PowerGeneration" "PowerGeneration" ...
##  $ profession: chr [1:1129] "HR" "HR" "HR" "HR" ...
##  $ greywage  : chr [1:1129] "white" "white" "white" "white" ...
##  $ way       : chr [1:1129] "bus" "bus" "bus" "bus" ...
summary(dt)
##      event           gender               age          industry        
##  Min.   :0.0000   Length:1129        Min.   :18.00   Length:1129       
##  1st Qu.:0.0000   Class :character   1st Qu.:26.00   Class :character  
##  Median :1.0000   Mode  :character   Median :30.00   Mode  :character  
##  Mean   :0.5058                      Mean   :31.07                     
##  3rd Qu.:1.0000                      3rd Qu.:36.00                     
##  Max.   :1.0000                      Max.   :58.00                     
##   profession          greywage             way           
##  Length:1129        Length:1129        Length:1129       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
## 

Preposesing Data

Cek Missing Value

dt[rowSums((is.na(dt))>0),]
## # A tibble: 0 × 7
## # ℹ 7 variables: event <dbl>, gender <chr>, age <dbl>, industry <chr>,
## #   profession <chr>, greywage <chr>, way <chr>

Tidak ada missing value

Eksplorasi Data

Bar Plot

ggplot(dt, aes(x= as.character(event),fill=as.character(event))) + 
  geom_bar() +
  theme_light() +
  scale_fill_manual(values = c('yellow','green')) +
  theme(legend.position = "NONE") + labs(x="Kejadian Turnover", y="Jumlah") + 
  geom_text(stat = "count", aes(label=..count..), vjust = -0.2, size = 4) + 
  ggtitle("Bar Plot Turnover Event")
## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Pemodelan

Pembentukan Variable Dummy

Menggunakan data Event sebagai peubah respon (Y), dan Gender, Age, serta Way sebagai peubah prediktor

data <- dt[,c(1,2,3,7)]
data$event <- as.factor(data$event)
data$gender <- as.factor(data$gender)
data$way <- as.factor(data$way)
head(data)
## # A tibble: 6 × 4
##   event gender   age way  
##   <fct> <fct>  <dbl> <fct>
## 1 1     m         35 bus  
## 2 1     m         33 bus  
## 3 1     f         35 bus  
## 4 1     f         35 bus  
## 5 1     m         32 bus  
## 6 1     f         42 bus

Model Regresi Logistik

model_regresi_logistik= glm(event~gender+age+way, data=data, family = binomial(link="logit"))
summary(model_regresi_logistik)
## 
## Call:
## glm(formula = event ~ gender + age + way, family = binomial(link = "logit"), 
##     data = data)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  0.54976    0.27559   1.995  0.04606 * 
## genderm     -0.09498    0.14058  -0.676  0.49927   
## age         -0.01476    0.00865  -1.706  0.08797 . 
## waycar       0.06229    0.13631   0.457  0.64767   
## wayfoot     -0.62224    0.20684  -3.008  0.00263 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1565.0  on 1128  degrees of freedom
## Residual deviance: 1551.4  on 1124  degrees of freedom
## AIC: 1561.4
## 
## Number of Fisher Scoring iterations: 4

Didapatkan hanya peubah wayfoot yang signifikan

Model regresi logistik : \[Logit[μ(x)]=0.54976-0.09498(genderm)-0.01476(age)+0.06229(waycar)-0.62224(wayfoot)\]

Uji Simultan Model

nagelkerke(model_regresi_logistik)
## $Models
##                                                                           
## Model: "glm, event ~ gender + age + way, binomial(link = \"logit\"), data"
## Null:  "glm, event ~ 1, binomial(link = \"logit\"), data"                 
## 
## $Pseudo.R.squared.for.model.vs.null
##                              Pseudo.R.squared
## McFadden                            0.0086783
## Cox and Snell (ML)                  0.0119575
## Nagelkerke (Cragg and Uhler)        0.0159440
## 
## $Likelihood.ratio.test
##  Df.diff LogLik.diff  Chisq   p.value
##       -4     -6.7907 13.581 0.0087584
## 
## $Number.of.observations
##            
## Model: 1129
## Null:  1129
## 
## $Messages
## [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
## 
## $Warnings
## [1] "None"

Hipotesis: \[H_0:\beta_0=\beta_1=\beta_2=\beta_3=\beta_4=0\] \[H_1:\] ada satu \[\beta_i\ne0\] Hasil uji simultan menggunakan uji G menerangkan bahwa nilai p-value sebesar 0.0087584 yang lebih kecil dari alpha 0,05. Hal itu berarti tolak H0. Artinya, setidaknya ada satu peubah yang berpengaruh.

Uji Parsial

Anova(model_regresi_logistik, type="II", test="Wald")
## Analysis of Deviance Table (Type II tests)
## 
## Response: event
##        Df   Chisq Pr(>Chisq)   
## gender  1  0.4565   0.499267   
## age     1  2.9111   0.087971 . 
## way     2 10.2808   0.005855 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Hasil uji simultan menerangkan hal yang sama yaitu adanya nilai p-value yang lebih kecil dari alpha 0,05 yaitu pada peubah way. Artinya peubah yang berpengaruh signifikan terhadap model adalah peubah way.