PADK Tugas Individu 2 Praktikum UAS
Packages
## corrplot 0.92 loaded
## Warning: package 'pscl' was built under R version 4.3.2
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
## Warning: package 'car' was built under R version 4.3.2
## Loading required package: carData
## Warning: package 'caret' was built under R version 4.3.2
## Loading required package: lattice
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Warning: package 'generalhoslem' was built under R version 4.3.2
## Loading required package: reshape
## Loading required package: MASS
## Warning: package 'fastDummies' was built under R version 4.3.2
## Thank you for using fastDummies!
## To acknowledge our work, please cite the package:
## Kaplan, J. & Schlegel, B. (2023). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables. Version 1.7.1. URL: https://github.com/jacobkap/fastDummies, https://jacobkap.github.io/fastDummies/.
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following object is masked from 'package:reshape':
##
## rename
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: viridisLite
## Warning: package 'tm' was built under R version 4.3.2
## Loading required package: NLP
##
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
##
## annotate
## Warning: package 'wordcloud' was built under R version 4.3.2
## Loading required package: RColorBrewer
## Warning: package 'pROC' was built under R version 4.3.2
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
Input Data
Pada analisis ini, data yang digunakan berupa data sekunder yaitu data Turnover. Data terdiri dari 1129 amatan dan 7 peubah. Data sebagai berikut :
| Nama Peubah | Keterangan | Tipe Peubah |
|---|---|---|
| Event (Y) | Kejadian Turnover | Kategorik (Turnover=1, lainnya=0) |
| Gender | Jenis Kelamin | Kategorik (Male/Female) |
| Age | Usia | Numerik (18 sampai 58 tahun) |
| Industry | Bidang Pekerjaan | Kategorik (16 Level) |
| Profession | Profesi | Kategorik (15 level) |
| Greywage | Membayar Pajak/Tidak | Kategorik (White atau Grey) |
| Way | Transportasi Ke Kantor | Kategorik (Bus, Car dan Foot) |
Analisis ini hanya menggunakan 4 dari 7 peubah yaitu Event sebagai peubah respon (Y), dan Gender, Age, serta Way sebagai peubah prediktor ataupun dummy.
# Input Data
dt <- read_excel("D:/Kuliah S1 Statistika/Semester 5/PADK/Data_Tugas_UAS.xlsx")
head(dt)## # A tibble: 6 × 7
## event gender age industry profession greywage way
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr>
## 1 1 m 35 Banks HR white bus
## 2 1 m 33 Banks HR white bus
## 3 1 f 35 PowerGeneration HR white bus
## 4 1 f 35 PowerGeneration HR white bus
## 5 1 m 32 Retail Commercial white bus
## 6 1 f 42 manufacture HR white bus
## tibble [1,129 × 7] (S3: tbl_df/tbl/data.frame)
## $ event : num [1:1129] 1 1 1 1 1 1 1 1 1 1 ...
## $ gender : chr [1:1129] "m" "m" "f" "f" ...
## $ age : num [1:1129] 35 33 35 35 32 42 42 28 29 30 ...
## $ industry : chr [1:1129] "Banks" "Banks" "PowerGeneration" "PowerGeneration" ...
## $ profession: chr [1:1129] "HR" "HR" "HR" "HR" ...
## $ greywage : chr [1:1129] "white" "white" "white" "white" ...
## $ way : chr [1:1129] "bus" "bus" "bus" "bus" ...
## event gender age industry
## Min. :0.0000 Length:1129 Min. :18.00 Length:1129
## 1st Qu.:0.0000 Class :character 1st Qu.:26.00 Class :character
## Median :1.0000 Mode :character Median :30.00 Mode :character
## Mean :0.5058 Mean :31.07
## 3rd Qu.:1.0000 3rd Qu.:36.00
## Max. :1.0000 Max. :58.00
## profession greywage way
## Length:1129 Length:1129 Length:1129
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
Preposesing Data
Eksplorasi Data
Bar Plot
ggplot(dt, aes(x= as.character(event),fill=as.character(event))) +
geom_bar() +
theme_light() +
scale_fill_manual(values = c('yellow','green')) +
theme(legend.position = "NONE") + labs(x="Kejadian Turnover", y="Jumlah") +
geom_text(stat = "count", aes(label=..count..), vjust = -0.2, size = 4) +
ggtitle("Bar Plot Turnover Event")## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Pemodelan
Pembentukan Variable Dummy
Menggunakan data Event sebagai peubah respon (Y), dan Gender, Age, serta Way sebagai peubah prediktor
data <- dt[,c(1,2,3,7)]
data$event <- as.factor(data$event)
data$gender <- as.factor(data$gender)
data$way <- as.factor(data$way)
head(data)## # A tibble: 6 × 4
## event gender age way
## <fct> <fct> <dbl> <fct>
## 1 1 m 35 bus
## 2 1 m 33 bus
## 3 1 f 35 bus
## 4 1 f 35 bus
## 5 1 m 32 bus
## 6 1 f 42 bus
Model Regresi Logistik
model_regresi_logistik= glm(event~gender+age+way, data=data, family = binomial(link="logit"))
summary(model_regresi_logistik)##
## Call:
## glm(formula = event ~ gender + age + way, family = binomial(link = "logit"),
## data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.54976 0.27559 1.995 0.04606 *
## genderm -0.09498 0.14058 -0.676 0.49927
## age -0.01476 0.00865 -1.706 0.08797 .
## waycar 0.06229 0.13631 0.457 0.64767
## wayfoot -0.62224 0.20684 -3.008 0.00263 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1565.0 on 1128 degrees of freedom
## Residual deviance: 1551.4 on 1124 degrees of freedom
## AIC: 1561.4
##
## Number of Fisher Scoring iterations: 4
Didapatkan hanya peubah wayfoot yang signifikan
Model regresi logistik : \[Logit[μ(x)]=0.54976-0.09498(genderm)-0.01476(age)+0.06229(waycar)-0.62224(wayfoot)\]
Uji Simultan Model
## $Models
##
## Model: "glm, event ~ gender + age + way, binomial(link = \"logit\"), data"
## Null: "glm, event ~ 1, binomial(link = \"logit\"), data"
##
## $Pseudo.R.squared.for.model.vs.null
## Pseudo.R.squared
## McFadden 0.0086783
## Cox and Snell (ML) 0.0119575
## Nagelkerke (Cragg and Uhler) 0.0159440
##
## $Likelihood.ratio.test
## Df.diff LogLik.diff Chisq p.value
## -4 -6.7907 13.581 0.0087584
##
## $Number.of.observations
##
## Model: 1129
## Null: 1129
##
## $Messages
## [1] "Note: For models fit with REML, these statistics are based on refitting with ML"
##
## $Warnings
## [1] "None"
Hipotesis: \[H_0:\beta_0=\beta_1=\beta_2=\beta_3=\beta_4=0\] \[H_1:\] ada satu \[\beta_i\ne0\] Hasil uji simultan menggunakan uji G menerangkan bahwa nilai p-value sebesar 0.0087584 yang lebih kecil dari alpha 0,05. Hal itu berarti tolak H0. Artinya, setidaknya ada satu peubah yang berpengaruh.
Uji Parsial
## Analysis of Deviance Table (Type II tests)
##
## Response: event
## Df Chisq Pr(>Chisq)
## gender 1 0.4565 0.499267
## age 1 2.9111 0.087971 .
## way 2 10.2808 0.005855 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Hasil uji simultan menerangkan hal yang sama yaitu adanya nilai p-value yang lebih kecil dari alpha 0,05 yaitu pada peubah way. Artinya peubah yang berpengaruh signifikan terhadap model adalah peubah way.