Deep learning is a framework that is often used when dealing with unstructured data, but it can also be applied to tabular data. In this article we will build a classification model using deep learning with the Keras framework. We will use the HR Analytics Employee Attrition & Performance data and try to predict whether employees will leave (resign). First, we import the libraries as shown below.
Source: Kaggle
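The exact library calls are not listed in the original post; the sketch below is an assumption based on the functions that appear in the code later in this article (dplyr, ggplot2, caret, rsample, and keras):

# Assumed imports, based on the functions used in the code below
library(dplyr)    # select(), mutate_if(), group_by(), summarise()
library(ggplot2)  # exploratory bar charts
library(caret)    # upSample(), confusionMatrix()
library(rsample)  # initial_split(), training(), testing()
library(keras)    # keras_model_sequential(), fit(), predict()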
We’ll load the data with the read.csv() function below:
Attrition <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")

With the head() function we’ll take a quick look at our data:

head(Attrition)

The data contains:
- Age: Age of employee
- Attrition: Attrition of employee (Yes, No)
- BusinessTravel: Frequency of business travel (Non-Travel, Travel_Rarely, Travel_Frequently)
- DailyRate: Amount of money the company pays the employee for a day of work
- Department: Work department (Research and Development, Sales, Human Resources)
- DistanceFromHome: Distance between the company and home
- Education: Level of education (1: Below College, 2: College, 3: Bachelor, 4: Master, 5: Doctor)
- EducationField: Field of education (Life Sciences, Medical, Human Resources, Technical Degree, Marketing, Other)
- EmployeeCount: Count of employee (always 1)
- EmployeeNumber: Employee ID
- EnvironmentSatisfaction: Environment satisfaction score (1: Low, 2: Medium, 3: High, 4: Very High)
- Gender: Gender (Male/Female)
- HourlyRate: Amount of money the company pays the employee for an hour of work
- JobInvolvement: Level of job involvement (1: Low, 2: Medium, 3: High, 4: Very High)
- JobLevel: Level of job (1-5)
- JobRole: Role of job (Sales Executive, Research Scientist, Laboratory Technician, Manager, Healthcare Representative, Sales Representative, Manufacturing Director, Human Resources, Research Director)
- JobSatisfaction: Satisfaction with the job (1: Low, 2: Medium, 3: High, 4: Very High)
- MaritalStatus: Marital status (Married, Single, Divorced)
- MonthlyIncome: Monthly income
- MonthlyRate: Monthly rate
- NumCompaniesWorked: Total number of companies the employee has worked for
- Over18: Whether the employee is over 18 years old
- OverTime: Whether the employee frequently works overtime (Yes, No)
- PercentSalaryHike: Percent of salary hike
- PerformanceRating: Level of performance assessment (1: Low, 2: Good, 3: Excellent, 4: Outstanding)
- RelationshipSatisfaction: Level of relationship satisfaction (1: Low, 2: Medium, 3: High, 4: Very High)
- StandardHours: Standard work hours (always 80)
- StockOptionLevel: Stock option level (0-3)
- TotalWorkingYears: Total working years
- TrainingTimesLastYear: Number of training sessions attended last year
- WorkLifeBalance: Level of work-life balance (1: Bad, 2: Good, 3: Better, 4: Best)
- YearsAtCompany: Years at the company
- YearsInCurrentRole: Years in the current role
- YearsSinceLastPromotion: Years since the last promotion
- YearsWithCurrManager: Years with the current manager

With the glimpse() command we will check whether each column has an appropriate data type or whether anything needs to be changed:
glimpse(Attrition)
#> Rows: 1,470
#> Columns: 35
#> $ Age <int> 41, 49, 37, 33, 27, 32, 59, 30, 38, 36, 35, 2…
#> $ Attrition <chr> "Yes", "No", "Yes", "No", "No", "No", "No", "…
#> $ BusinessTravel <chr> "Travel_Rarely", "Travel_Frequently", "Travel…
#> $ DailyRate <int> 1102, 279, 1373, 1392, 591, 1005, 1324, 1358,…
#> $ Department <chr> "Sales", "Research & Development", "Research …
#> $ DistanceFromHome <int> 1, 8, 2, 3, 2, 2, 3, 24, 23, 27, 16, 15, 26, …
#> $ Education <int> 2, 1, 2, 4, 1, 2, 3, 1, 3, 3, 3, 2, 1, 2, 3, …
#> $ EducationField <chr> "Life Sciences", "Life Sciences", "Other", "L…
#> $ EmployeeCount <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ EmployeeNumber <int> 1, 2, 4, 5, 7, 8, 10, 11, 12, 13, 14, 15, 16,…
#> $ EnvironmentSatisfaction <int> 2, 3, 4, 4, 1, 4, 3, 4, 4, 3, 1, 4, 1, 2, 3, …
#> $ Gender <chr> "Female", "Male", "Male", "Female", "Male", "…
#> $ HourlyRate <int> 94, 61, 92, 56, 40, 79, 81, 67, 44, 94, 84, 4…
#> $ JobInvolvement <int> 3, 2, 2, 3, 3, 3, 4, 3, 2, 3, 4, 2, 3, 3, 2, …
#> $ JobLevel <int> 2, 2, 1, 1, 1, 1, 1, 1, 3, 2, 1, 2, 1, 1, 1, …
#> $ JobRole <chr> "Sales Executive", "Research Scientist", "Lab…
#> $ JobSatisfaction <int> 4, 2, 3, 3, 2, 4, 1, 3, 3, 3, 2, 3, 3, 4, 3, …
#> $ MaritalStatus <chr> "Single", "Married", "Single", "Married", "Ma…
#> $ MonthlyIncome <int> 5993, 5130, 2090, 2909, 3468, 3068, 2670, 269…
#> $ MonthlyRate <int> 19479, 24907, 2396, 23159, 16632, 11864, 9964…
#> $ NumCompaniesWorked <int> 8, 1, 6, 1, 9, 0, 4, 1, 0, 6, 0, 0, 1, 0, 5, …
#> $ Over18 <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", …
#> $ OverTime <chr> "Yes", "No", "Yes", "Yes", "No", "No", "Yes",…
#> $ PercentSalaryHike <int> 11, 23, 15, 11, 12, 13, 20, 22, 21, 13, 13, 1…
#> $ PerformanceRating <int> 3, 4, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 3, …
#> $ RelationshipSatisfaction <int> 1, 4, 2, 3, 4, 3, 1, 2, 2, 2, 3, 4, 4, 3, 2, …
#> $ StandardHours <int> 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 8…
#> $ StockOptionLevel <int> 0, 1, 0, 0, 1, 0, 3, 1, 0, 2, 1, 0, 1, 1, 0, …
#> $ TotalWorkingYears <int> 8, 10, 7, 8, 6, 8, 12, 1, 10, 17, 6, 10, 5, 3…
#> $ TrainingTimesLastYear <int> 0, 3, 3, 3, 3, 2, 3, 2, 2, 3, 5, 3, 1, 2, 4, …
#> $ WorkLifeBalance <int> 1, 3, 3, 3, 3, 2, 2, 3, 3, 2, 3, 3, 2, 3, 3, …
#> $ YearsAtCompany <int> 6, 10, 0, 8, 2, 7, 1, 1, 9, 7, 5, 9, 5, 2, 4,…
#> $ YearsInCurrentRole <int> 4, 7, 0, 7, 2, 7, 0, 0, 7, 7, 4, 5, 2, 2, 2, …
#> $ YearsSinceLastPromotion <int> 0, 1, 0, 3, 2, 3, 0, 0, 1, 7, 0, 0, 4, 1, 0, …
#> $ YearsWithCurrManager <int> 5, 7, 0, 0, 2, 6, 0, 0, 8, 7, 3, 8, 3, 2, 3, …
From the output above, we can see that the data has 1,470 rows and 35 columns. We will remove the columns EmployeeNumber, Over18, EmployeeCount, and StandardHours, and convert columns of character type into factors. We will also recode the target, Attrition, to 1 (Yes) and 0 (No) and store it as a factor.
attrition_clean <- Attrition %>%
  select(-EmployeeNumber, -Over18, -EmployeeCount, -StandardHours) %>%
  mutate_if(.predicate = is.character, .funs = as.factor) %>%
  mutate(Attrition = if_else(Attrition == "Yes", 1, 0),
         Attrition = as.factor(Attrition))

head(attrition_clean)

prop.table(table(attrition_clean$Attrition))
#>
#> 0 1
#> 0.8387755 0.1612245
Next, we will check whether there are any missing values:

colSums(is.na(attrition_clean))
#> Age Attrition BusinessTravel
#> 0 0 0
#> DailyRate Department DistanceFromHome
#> 0 0 0
#> Education EducationField EnvironmentSatisfaction
#> 0 0 0
#> Gender HourlyRate JobInvolvement
#> 0 0 0
#> JobLevel JobRole JobSatisfaction
#> 0 0 0
#> MaritalStatus MonthlyIncome MonthlyRate
#> 0 0 0
#> NumCompaniesWorked OverTime PercentSalaryHike
#> 0 0 0
#> PerformanceRating RelationshipSatisfaction StockOptionLevel
#> 0 0 0
#> TotalWorkingYears TrainingTimesLastYear WorkLifeBalance
#> 0 0 0
#> YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion
#> 0 0 0
#> YearsWithCurrManager
#> 0
From the output above, we can see that there are no missing values. However, the target class is imbalanced (about 84% no versus 16% yes), so we will balance it by upsampling the minority class.
# upsampling
RNGkind(sample.kind = "Rounding")
set.seed(100)
library(caret)
attrition_up <- upSample(x = attrition_clean %>% select(-Attrition),
                         y = attrition_clean$Attrition,
                         yname = "Attrition")

prop.table(table(attrition_up$Attrition))
#>
#> 0 1
#> 0.5 0.5
Now the target class is balanced.
Next, we perform one-hot encoding to convert the categorical variables into dummy variables, except for the target column.
attrition_ohe <- model.matrix(Attrition ~ ., attrition_up)

We must remove the (Intercept) column and add the target column Attrition back:

attrition_ohe_clean1 <- attrition_ohe %>%
  as.data.frame() %>%
  select(-'(Intercept)')

attrition_ohe_clean1$Attrition <- attrition_up$Attrition

After one-hot encoding, several of the new column names contain symbols such as _, &, and spaces. The neuralnet model cannot accept column names that contain symbols, so we must remove them.
colnames(attrition_ohe_clean1)
#> [1] "Age" "BusinessTravelTravel_Frequently"
#> [3] "BusinessTravelTravel_Rarely" "DailyRate"
#> [5] "DepartmentResearch & Development" "DepartmentSales"
#> [7] "DistanceFromHome" "Education"
#> [9] "EducationFieldLife Sciences" "EducationFieldMarketing"
#> [11] "EducationFieldMedical" "EducationFieldOther"
#> [13] "EducationFieldTechnical Degree" "EnvironmentSatisfaction"
#> [15] "GenderMale" "HourlyRate"
#> [17] "JobInvolvement" "JobLevel"
#> [19] "JobRoleHuman Resources" "JobRoleLaboratory Technician"
#> [21] "JobRoleManager" "JobRoleManufacturing Director"
#> [23] "JobRoleResearch Director" "JobRoleResearch Scientist"
#> [25] "JobRoleSales Executive" "JobRoleSales Representative"
#> [27] "JobSatisfaction" "MaritalStatusMarried"
#> [29] "MaritalStatusSingle" "MonthlyIncome"
#> [31] "MonthlyRate" "NumCompaniesWorked"
#> [33] "OverTimeYes" "PercentSalaryHike"
#> [35] "PerformanceRating" "RelationshipSatisfaction"
#> [37] "StockOptionLevel" "TotalWorkingYears"
#> [39] "TrainingTimesLastYear" "WorkLifeBalance"
#> [41] "YearsAtCompany" "YearsInCurrentRole"
#> [43] "YearsSinceLastPromotion" "YearsWithCurrManager"
#> [45] "Attrition"
Using gsub() we can remove spaces, &, and _ from the column names:
colnames(attrition_ohe_clean1) <- gsub(pattern = "[ &_]", replacement = "", x = colnames(attrition_ohe_clean1))

In this section, we will show some statistics on the distribution of the data for each variable in the attrition_clean dataset. Using the summary() function, the following output is produced:
summary(attrition_clean)
#> Age Attrition BusinessTravel DailyRate
#> Min. :18.00 0:1233 Non-Travel : 150 Min. : 102.0
#> 1st Qu.:30.00 1: 237 Travel_Frequently: 277 1st Qu.: 465.0
#> Median :36.00 Travel_Rarely :1043 Median : 802.0
#> Mean :36.92 Mean : 802.5
#> 3rd Qu.:43.00 3rd Qu.:1157.0
#> Max. :60.00 Max. :1499.0
#>
#> Department DistanceFromHome Education
#> Human Resources : 63 Min. : 1.000 Min. :1.000
#> Research & Development:961 1st Qu.: 2.000 1st Qu.:2.000
#> Sales :446 Median : 7.000 Median :3.000
#> Mean : 9.193 Mean :2.913
#> 3rd Qu.:14.000 3rd Qu.:4.000
#> Max. :29.000 Max. :5.000
#>
#> EducationField EnvironmentSatisfaction Gender HourlyRate
#> Human Resources : 27 Min. :1.000 Female:588 Min. : 30.00
#> Life Sciences :606 1st Qu.:2.000 Male :882 1st Qu.: 48.00
#> Marketing :159 Median :3.000 Median : 66.00
#> Medical :464 Mean :2.722 Mean : 65.89
#> Other : 82 3rd Qu.:4.000 3rd Qu.: 83.75
#> Technical Degree:132 Max. :4.000 Max. :100.00
#>
#> JobInvolvement JobLevel JobRole JobSatisfaction
#> Min. :1.00 Min. :1.000 Sales Executive :326 Min. :1.000
#> 1st Qu.:2.00 1st Qu.:1.000 Research Scientist :292 1st Qu.:2.000
#> Median :3.00 Median :2.000 Laboratory Technician :259 Median :3.000
#> Mean :2.73 Mean :2.064 Manufacturing Director :145 Mean :2.729
#> 3rd Qu.:3.00 3rd Qu.:3.000 Healthcare Representative:131 3rd Qu.:4.000
#> Max. :4.00 Max. :5.000 Manager :102 Max. :4.000
#> (Other) :215
#> MaritalStatus MonthlyIncome MonthlyRate NumCompaniesWorked OverTime
#> Divorced:327 Min. : 1009 Min. : 2094 Min. :0.000 No :1054
#> Married :673 1st Qu.: 2911 1st Qu.: 8047 1st Qu.:1.000 Yes: 416
#> Single :470 Median : 4919 Median :14236 Median :2.000
#> Mean : 6503 Mean :14313 Mean :2.693
#> 3rd Qu.: 8379 3rd Qu.:20462 3rd Qu.:4.000
#> Max. :19999 Max. :26999 Max. :9.000
#>
#> PercentSalaryHike PerformanceRating RelationshipSatisfaction StockOptionLevel
#> Min. :11.00 Min. :3.000 Min. :1.000 Min. :0.0000
#> 1st Qu.:12.00 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:0.0000
#> Median :14.00 Median :3.000 Median :3.000 Median :1.0000
#> Mean :15.21 Mean :3.154 Mean :2.712 Mean :0.7939
#> 3rd Qu.:18.00 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:1.0000
#> Max. :25.00 Max. :4.000 Max. :4.000 Max. :3.0000
#>
#> TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany
#> Min. : 0.00 Min. :0.000 Min. :1.000 Min. : 0.000
#> 1st Qu.: 6.00 1st Qu.:2.000 1st Qu.:2.000 1st Qu.: 3.000
#> Median :10.00 Median :3.000 Median :3.000 Median : 5.000
#> Mean :11.28 Mean :2.799 Mean :2.761 Mean : 7.008
#> 3rd Qu.:15.00 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.: 9.000
#> Max. :40.00 Max. :6.000 Max. :4.000 Max. :40.000
#>
#> YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
#> Min. : 0.000 Min. : 0.000 Min. : 0.000
#> 1st Qu.: 2.000 1st Qu.: 0.000 1st Qu.: 2.000
#> Median : 3.000 Median : 1.000 Median : 3.000
#> Mean : 4.229 Mean : 2.188 Mean : 4.123
#> 3rd Qu.: 7.000 3rd Qu.: 3.000 3rd Qu.: 7.000
#> Max. :18.000 Max. :15.000 Max. :17.000
#>
Insight:
- The mean and median age are close (about 36-37 years).
- Most employees have attrition no (1,233).
- The most common BusinessTravel category is Travel_Rarely.
- DailyRate has a large gap between the minimum and maximum values.
- The department with the most employees is Research & Development.
- The farthest DistanceFromHome is 29 km.
- The median Education level is Bachelor (3).
- The most common EducationField is Life Sciences, with 606 employees.
- The mean EnvironmentSatisfaction score is 2.722.
- Male employees are the majority, but the gap is not large.
- The mean HourlyRate is 65.89; the maximum hourly rate is more than 3 times the minimum.
- The mean JobInvolvement is 2.73, close to High.
- The mean JobLevel is about 2.
- The most common JobRole is Sales Executive, with 326 employees.
- The mean JobSatisfaction is close to High and close to the median.
- The most common MaritalStatus is Married.
- The mean MonthlyIncome is 6,503; the maximum monthly income is almost 20 times the minimum.
- The mean MonthlyRate is 14,313.
- The mean NumCompaniesWorked is 2.693.
- Most employees do not work overtime (1,054 no).
- The mean PercentSalaryHike is 15.21.
- The mean PerformanceRating is 3.154.
- The mean RelationshipSatisfaction is 2.712, close to High.
- The mean StockOptionLevel is 0.79.
- The maximum TotalWorkingYears is 40, and the mean is 11.28.
- The mean TrainingTimesLastYear is 2.79.
- The mean WorkLifeBalance is 2.76, close to Better (3).
- The maximum YearsAtCompany is 40 years.
- The mean YearsInCurrentRole is 4.229.
- The maximum YearsSinceLastPromotion is 15 years.
- The maximum YearsWithCurrManager is 17 years.

From the preprocessed data we will now explore the relationship between attrition and several of its predictors.
attrition_BT <- attrition_clean %>% group_by(Attrition, BusinessTravel) %>% summarise(count= n())
ggplot(attrition_BT, mapping = aes(x = count, y = reorder(Attrition, count)))+
geom_col(aes(fill = BusinessTravel), position = "dodge")+
scale_fill_brewer(palette = "Blues")+
labs(
title = "Total Attrition in HR Analytics Employee Attrition & Performance",
subtitle = "Attrition Vs Business Travel",
x = "Count Attrition",
y = NULL,
fill = "Business Travel"
) + theme_light()

From the plot above, Travel_Rarely is the most common business travel category among employees with attrition no, and it is also the most common among employees with attrition yes.
attrition_dpt <- attrition_clean %>% group_by(Attrition, Department) %>% summarise(count= n())
ggplot(attrition_dpt, mapping = aes(x = count, y = reorder(Attrition, count)))+
geom_col(aes(fill = Department), position = "dodge")+
scale_fill_brewer(palette = "Blues")+
labs(
title = "Total Attrition in HR Analytics Employee Attrition & Performance",
subtitle = "Attrition Vs Department",
x = "Count Attrition",
y = NULL,
fill = "Department"
) + theme_light()

From the plot above, the Research & Development department has the most employees with attrition no, and it also has the most employees with attrition yes.
attrition_ef <- attrition_clean %>% group_by(Attrition, EducationField) %>% summarise(count= n())
ggplot(attrition_ef, mapping = aes(x = count, y = reorder(Attrition, count)))+
geom_col(aes(fill = EducationField), position = "dodge")+
scale_fill_brewer(palette = "Blues")+
labs(
title = "Total Attrition in HR Analytics Employee Attrition & Performance",
subtitle = "Attrition Vs Education Field",
x = "Count Attrition",
y = NULL,
fill = "Education Field"
) + theme_light()

From the plot above, the Life Sciences education field has the most employees with attrition no, and it also has the most employees with attrition yes.
attrition_jr <- attrition_clean %>% group_by(Attrition, JobRole) %>% summarise(count= n())
ggplot(attrition_jr, mapping = aes(x = count, y = reorder(Attrition, count)))+
geom_col(aes(fill = JobRole), position = "dodge")+
scale_fill_brewer(palette = "Blues")+
labs(
title = "Total Attrition in HR Analytics Employee Attrition & Performance",
subtitle = "Attrition Vs Job Role",
x = "Count Attrition",
y = NULL,
fill = "Job Role"
) + theme_light()

From the plot above, the Sales Executive job role has the most employees with attrition no, followed by Research Scientist, while the Laboratory Technician job role has the most employees with attrition yes.
attrition_ms <- attrition_clean %>% group_by(Attrition, MaritalStatus) %>% summarise(count= n())
ggplot(attrition_ms, mapping = aes(x = count, y = reorder(Attrition, count)))+
geom_col(aes(fill = MaritalStatus), position = "dodge")+
scale_fill_brewer(palette = "Blues")+
labs(
title = "Total Attrition in HR Analytics Employee Attrition & Performance",
subtitle = "Attrition Vs Marital Status",
x = "Count Attrition",
y = NULL,
fill = "Marital Status"
) + theme_light()

From the plot above, married employees have the most attrition no, while single employees have the most attrition yes.
attrition_ot <- attrition_clean %>% group_by(Attrition, OverTime) %>% summarise(count= n())
ggplot(attrition_ot, mapping = aes(x = count, y = reorder(Attrition, count)))+
geom_col(aes(fill = OverTime), position = "dodge")+
scale_fill_brewer(palette = "Blues")+
labs(
title = "Total Attrition in HR Analytics Employee Attrition & Performance",
subtitle = "Attrition Vs Over Time",
x = "Count Attrition",
y = NULL,
fill = "Over Time"
) + theme_light()

From the plot above, employees without overtime have the most attrition no, while employees with overtime have the most attrition yes.
Next, we split the data using initial_split(), with a proportion of 80% for the training data:
set.seed(100)
# Splitting
attrition_split <- initial_split(data = attrition_ohe_clean1, # data frame
                                 prop = 0.8,                  # proportion for the training data
                                 strata = "Attrition")        # target label, to keep the split stratified/balanced
# Training data
attrition_train <- training(attrition_split)
# Test data
attrition_test <- testing(attrition_split)

We will check the class balance of the training data:
prop.table(table(attrition_train$Attrition))
#>
#> 0 1
#> 0.5 0.5
Separate the predictors from the target and convert the data into matrices:
# predictors
train_x <- attrition_train %>%
select(-Attrition) %>%
as.matrix()
test_x <- attrition_test %>%
select(-Attrition) %>%
as.matrix()
#target
train_y <- attrition_train %>%
select(Attrition) %>%
as.matrix()
test_y <- attrition_test %>%
select(Attrition) %>%
as.matrix()

The Keras framework accepts data in array form, so the predictor data in matrix form needs to be converted into arrays using array_reshape():
# predictors
train_x_keras <- train_x %>%
array_reshape(dim = dim(train_x))
test_x_keras <- test_x %>%
array_reshape(dim = dim(test_x))

Convert the target (categorical data) into one-hot encoded form using the to_categorical() function:
train_y_keras <- train_y %>%
to_categorical()
test_y_keras <- test_y %>%
to_categorical()

The first step is to build the architecture of our deep learning model. Below are the steps for building a deep learning model architecture with Keras.

Step 1. Always start with keras_model_sequential():
# keras initialization
model <- keras_model_sequential()

Step 2. Build each layer (input, hidden & output):
# To lock the random seed
set.seed(100)
model %>%
layer_dense(input_shape = 44,
units = 256,
activation = "relu",
name = "H1") %>%
layer_dense(units = 128,
activation = "relu",
name = "H2") %>%
layer_dense(units = 32,
activation = "relu",
name = "H3") %>%
layer_dense(units = 2,
activation = "sigmoid",
name = "Out")
summary(model)
#> Model: "sequential"
#> ________________________________________________________________________________
#> Layer (type) Output Shape Param #
#> ================================================================================
#> H1 (Dense) (None, 256) 11520
#> H2 (Dense) (None, 128) 32896
#> H3 (Dense) (None, 32) 4128
#> Out (Dense) (None, 2) 66
#> ================================================================================
#> Total params: 48,610
#> Trainable params: 48,610
#> Non-trainable params: 0
#> ________________________________________________________________________________
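As a quick check, the parameter counts in the summary above can be reproduced by hand: a dense layer has (number of inputs + 1 bias) × units parameters. A minimal sketch of that arithmetic (h1, h2, h3, and out are just helper names for this check, not objects from the article):

# Dense-layer parameters = (inputs + 1 bias) * units
h1  <- (44 + 1)  * 256   # 11,520
h2  <- (256 + 1) * 128   # 32,896
h3  <- (128 + 1) * 32    #  4,128
out <- (32 + 1)  * 2     #     66
h1 + h2 + h3 + out       # 48,610 total params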
Insight:
- Total params: the total number of weights in the model (48,610).
- Trainable params: parameters/connections whose weights can change during training.
- Non-trainable params: weights/parameters whose values do not change (they are locked).

At the architecture-building stage, we have not yet specified how the error will be calculated. At this stage we define the loss calculation, optimizer, and metric using the compile() function.
set.seed(100)
model %>%
  compile(loss = "binary_crossentropy",
          optimizer = optimizer_adam(learning_rate = 0.001),
          metrics = "accuracy")

set.seed(100)
history <- model %>%
  fit(x = train_x_keras,
      y = train_y_keras,
      batch_size = 5,
      epochs = 15,
      verbose = TRUE,
      validation_data = list(test_x_keras, test_y_keras))

Plotting the model:
plot(history)

Make predictions on the test_x_keras data using the predict() function:
set.seed(00)
predict_class <- predict(model, test_x_keras) %>%
k_argmax() %>%
as.array() %>%
as.factor()

To see the prediction results:
predict_class %>%
head()
#> [1] 1 1 0 1 1 0
#> Levels: 0 1
Business question: predict whether an employee will leave (attrition yes) or not (attrition no).
caret::confusionMatrix(predict_class, as.factor(attrition_test$Attrition), positive = "0")
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction 0 1
#> 0 55 34
#> 1 192 213
#>
#> Accuracy : 0.5425
#> 95% CI : (0.4974, 0.5871)
#> No Information Rate : 0.5
#> P-Value [Acc > NIR] : 0.03249
#>
#> Kappa : 0.085
#>
#> Mcnemar's Test P-Value : < 0.0000000000000002
#>
#> Sensitivity : 0.2227
#> Specificity : 0.8623
#> Pos Pred Value : 0.6180
#> Neg Pred Value : 0.5259
#> Prevalence : 0.5000
#> Detection Rate : 0.1113
#> Detection Prevalence : 0.1802
#> Balanced Accuracy : 0.5425
#>
#> 'Positive' Class : 0
#>
From the evaluation, we can interpret the confusion matrix (with attrition no = 0 as the positive class) as follows:

- FN: the model predicts attrition yes even though the actual attrition is no. Risk: the company unnecessarily provides facilities, salary increases, or promotions.
- FP: the model predicts attrition no even though the actual attrition is yes. Risk: the company loses employees.

The more concerning risk is an FP, so we choose precision as the evaluation metric.
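As a sanity check (a small sketch, not part of the original analysis; TP, FP, FN, and TN are helper names for the confusion-matrix cells shown above), the precision reported as Pos Pred Value can be recomputed directly:

# Confusion-matrix cells from the output above (positive class = "0")
TP <- 55    # predicted 0, actual 0
FP <- 34    # predicted 0, actual 1
FN <- 192   # predicted 1, actual 0
TN <- 213   # predicted 1, actual 1

TP / (TP + FP)                   # precision / Pos Pred Value = 0.6180
TP / (TP + FN)                   # recall / Sensitivity       = 0.2227
(TP + TN) / (TP + FP + FN + TN)  # accuracy                   = 0.5425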
There are 48 potential predictors in total, but we only use 44 because 4 of them have values that do not vary (or are only identifiers). In business travel, Travel_Rarely has the highest count for both attrition yes and attrition no. The Research & Development department has the highest count for both attrition no and attrition yes, and the same holds for the Life Sciences education field. The Sales Executive job role has the highest count of attrition no, while the Laboratory Technician job role has the highest count of attrition yes. For marital status, married employees have the highest count of attrition no and single employees have the highest count of attrition yes. Employees without overtime have the highest count of attrition no, and employees with overtime have the highest count of attrition yes. The model architecture uses three hidden layers plus an output layer and was trained for 15 epochs. The prediction results are 0 and 1, where 0 is no and 1 is yes. Because the main risk to the company is losing employees, we use precision (Pos Pred Value) as the evaluation metric.