Classification assigns each input pattern to exactly one category: rather than allowing several outcomes for a given set of inputs, the network produces a single response per pattern. The classification network chooses the category with the greatest output value. When combined with other kinds of predictive neural networks in a hybrid system, classification networks become very powerful.
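As a minimal illustration with made-up scores (not from any model in this post), picking the class with the greatest output value is simply an argmax:
# hypothetical output scores for one input pattern across three categories
scores <- c(low = 0.10, medium = 0.25, high = 0.65)
# the predicted category is the one with the greatest output value
names(scores)[which.max(scores)] # "high"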
The cleaned IBM HR Analytics Employee Attrition & Performance data set was obtained from GitHub, and the raw data from Kaggle.
We will predict attrition of valuable IBM employees using neural network classification. We will use two main libraries to build the model: neuralnet and keras.
library(dplyr)
library(keras)
library(neuralnet)
library(tensorflow)
library(rsample)
library(deepviz)
library(lime)
library(plotly)
library(tidyverse)
library(tidymodels)
library(themis)
data_clean <- read.csv("data/data-clean.csv")
rmarkdown::paged_table(data_clean)
We have 35 columns in total. The first is the attrition
column, our target variable; the others are predictor variables.
We will not convert the data into factors or numerics yet, because we
will apply one-hot encoding later in the preprocessing step.
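As a toy illustration of what one-hot encoding looks like (made-up labels, not our data), keras::to_categorical() turns integer class codes into indicator columns:
# a two-level factor encoded as 0/1, then expanded into one indicator column per class
toy <- factor(c("yes", "no", "no", "yes"), levels = c("yes", "no"))
to_categorical(as.numeric(toy) - 1)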
Before starting the analysis, we should check the class proportions of
our data. Our target column is attrition.
prop.table(table(data_clean$attrition))
##
## no yes
## 0.8387755 0.1612245
Our data is imbalanced, so we will need to downsample it later (inside the preprocessing recipe). Next, we check for missing values.
data_clean %>% is.na() %>% colSums()
## attrition age
## 0 0
## business_travel daily_rate
## 0 0
## department distance_from_home
## 0 0
## education education_field
## 0 0
## employee_count employee_number
## 0 0
## environment_satisfaction gender
## 0 0
## hourly_rate job_involvement
## 0 0
## job_level job_role
## 0 0
## job_satisfaction marital_status
## 0 0
## monthly_income monthly_rate
## 0 0
## num_companies_worked over_18
## 0 0
## over_time percent_salary_hike
## 0 0
## performance_rating relationship_satisfaction
## 0 0
## standard_hours stock_option_level
## 0 0
## total_working_years training_times_last_year
## 0 0
## work_life_balance years_at_company
## 0 0
## years_in_current_role years_since_last_promotion
## 0 0
## years_with_curr_manager
## 0
We don’t have any missing values in our data.
At this stage, data splitting is performed. The training data is 80% of
the total data, and the split uses stratified sampling based on the
attrition column. Stratified sampling draws observations so that each
partition keeps the class proportions of the strata, which makes it
more suitable for imbalanced data.
# create initial split
set.seed(133)
test_split <- initial_split(data_clean, prop = 0.8, strata = "attrition")
# create validation split
set.seed(133)
val_split <- initial_split(training(test_split), prop = 0.8, strata = "attrition")
# quick check
test_split
## <Training/Testing/Total>
## <1175/295/1470>
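As a quick sanity check, the stratified split should roughly preserve the original 84:16 attrition proportions in each partition (output not shown):
# class proportions in the train and test partitions should be close to the full data
prop.table(table(training(test_split)$attrition))
prop.table(table(testing(test_split)$attrition))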
Before we build the NN architecture, we need to preprocess the data.
In the first step of preprocessing, we will remove some columns we don’t
need, downsample the data because it is imbalanced, scale the numeric
columns, and convert everything into numeric form. We will use the
recipes package, loaded as part of tidymodels; step_downsample() comes
from the themis package, which we loaded above. In the second step of
preprocessing, we will one-hot encode the target and convert the data
into arrays.
Recipe
# define preprocess recipe from train dataset
rec <- recipe(attrition ~ ., data = training(val_split)) %>%
step_rm(employee_count, employee_number) %>%
step_nzv(all_predictors()) %>%
step_string2factor(all_nominal(), -attrition) %>%
step_string2factor(attrition, levels = c("yes", "no")) %>%
step_downsample(attrition, under_ratio = 1, seed = 100) %>%
step_center(all_numeric()) %>%
step_scale(all_numeric()) %>%
step_dummy(all_nominal(), -attrition, one_hot = FALSE) %>%
prep(strings_as_factors = FALSE)
# get train-val-test dataset
data_train <- juice(rec)
data_val <- bake(rec, testing(val_split))
data_test <- bake(rec, testing(test_split))
# quick check
head(data_train, 10)
## # A tibble: 10 × 45
## age daily_rate distance_from_home education environment_satisfaction
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -0.130 -0.188 -0.467 -1.84 -0.520
## 2 0.371 0.305 -0.588 0.117 1.21
## 3 1.47 0.240 -0.831 -0.863 -1.38
## 4 -0.932 1.47 1.84 0.117 -1.38
## 5 0.571 1.48 0.262 0.117 -0.520
## 6 -0.731 1.66 -0.588 1.10 0.343
## 7 0.0703 -1.15 -0.103 0.117 1.21
## 8 1.07 -0.264 -0.103 -0.863 0.343
## 9 0.171 0.0836 -0.467 1.10 0.343
## 10 -1.53 0.245 -0.467 0.117 1.21
## # ℹ 40 more variables: hourly_rate <dbl>, job_involvement <dbl>,
## # job_level <dbl>, job_satisfaction <dbl>, monthly_income <dbl>,
## # monthly_rate <dbl>, num_companies_worked <dbl>, percent_salary_hike <dbl>,
## # performance_rating <dbl>, relationship_satisfaction <dbl>,
## # stock_option_level <dbl>, total_working_years <dbl>,
## # training_times_last_year <dbl>, work_life_balance <dbl>,
## # years_at_company <dbl>, years_in_current_role <dbl>, …
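Because step_downsample() with under_ratio = 1 was applied when prepping the recipe, the juiced training data should now be balanced. A quick check (output not shown):
# the two attrition classes should now have roughly equal counts
table(data_train$attrition)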
Our target variable is attrition. Because this is a
classification task, we one-hot encode the target with
to_categorical(). In addition, we convert the predictors into numeric
matrices using the data.matrix() function.
# prepare train arrays
train_y <- to_categorical(as.numeric(data_train$attrition) - 1)
train_x <- data_train %>%
select(-attrition) %>%
data.matrix()
# prepare validation arrays
val_y <- to_categorical(as.numeric(data_val$attrition) - 1)
val_x <- data_val %>%
select(-attrition) %>%
data.matrix()
# prepare test arrays
test_y <- to_categorical(as.numeric(data_test$attrition) - 1)
test_x <- data_test %>%
select(-attrition) %>%
data.matrix()
# quick check
dim(train_x)
## [1] 302 44
dim(train_y)
## [1] 302 2
Neuralnet
model1 <- neuralnet(formula = attrition ~ .,
data = data_train,
hidden = c(3, 2),
err.fct = "ce",
act.fct = "logistic",
linear.output = FALSE
)
plot(model1)
# predict on test
pred_1 <- as_tibble(predict(model1, test_x)) %>%
set_names(levels(data_train$attrition)) %>%
mutate(class = ifelse(yes > 0.5, "yes", "no")) %>%
mutate(class = factor(class, levels = levels(data_train$attrition))) %>%
set_names(paste0(".pred_", colnames(.)))
# combine with test dataset
pred_1 <- data_test %>%
select(attrition) %>%
bind_cols(pred_1)
# quick check
head(pred_1, 10)
## # A tibble: 10 × 4
## attrition .pred_yes .pred_no .pred_class
## <fct> <dbl> <dbl> <fct>
## 1 yes 0.0130 0.987 no
## 2 yes 0.0000347 1.00 no
## 3 no 0.937 0.0633 yes
## 4 no 0.937 0.0633 yes
## 5 yes 0.00000000683 1.00 no
## 6 no 0.937 0.0632 yes
## 7 no 0.164 0.838 no
## 8 no 0.106 0.896 no
## 9 yes 0.00000000683 1.00 no
## 10 yes 0.0000000198 1.00 no
# confusion matrix
pred_1 %>%
conf_mat(attrition, .pred_class) %>%
autoplot(type = "heatmap")
# metrics summary
metrics1 <- pred_1 %>%
summarise(
accuracy = accuracy_vec(attrition, .pred_class),
sensitivity = sens_vec(attrition, .pred_class),
specificity = spec_vec(attrition, .pred_class),
precision = precision_vec(attrition, .pred_class)
)
metrics1
## # A tibble: 1 × 4
## accuracy sensitivity specificity precision
## <dbl> <dbl> <dbl> <dbl>
## 1 0.305 0.25 0.316 0.0663
set_random_seed(133)
# define input
input <- layer_input(name = "input", shape = ncol(train_x))
# define hidden layers
hiddens <- input %>%
layer_dense(name = "dense_1", units = 64) %>%
layer_activation_leaky_relu(name = "dense_1_act") %>%
layer_batch_normalization(name = "dense_1_bn") %>%
layer_dropout(name = "dense_1_dp", rate = 0.15) %>%
layer_dense(name = "dense_2", units = 32) %>%
layer_activation_leaky_relu(name = "dense_2_act") %>%
layer_batch_normalization(name = "dense_2_bn") %>%
layer_dropout(name = "dense_2_dp", rate = 0.15)
# define output
output <- hiddens %>%
layer_dense(name = "output", units = ncol(train_y)) %>%
layer_batch_normalization(name = "output_bn") %>%
layer_activation(name = "output_act", activation = "sigmoid")
# define full model
model2 <- keras_model(inputs = input, outputs = output)
# compile the model
model2 %>% compile(
optimizer = optimizer_adamax(lr = 0.001),
metrics = "accuracy",
loss = "binary_crossentropy"
)
# model summary
summary(model2)
## Model: "model"
## ________________________________________________________________________________
## Layer (type) Output Shape Param # Trainable
## ================================================================================
## input (InputLayer) [(None, 44)] 0 Y
## dense_1 (Dense) (None, 64) 2880 Y
## dense_1_act (LeakyReLU) (None, 64) 0 Y
## dense_1_bn (BatchNormalizatio (None, 64) 256 Y
## n)
## dense_1_dp (Dropout) (None, 64) 0 Y
## dense_2 (Dense) (None, 32) 2080 Y
## dense_2_act (LeakyReLU) (None, 32) 0 Y
## dense_2_bn (BatchNormalizatio (None, 32) 128 Y
## n)
## dense_2_dp (Dropout) (None, 32) 0 Y
## output (Dense) (None, 2) 66 Y
## output_bn (BatchNormalization (None, 2) 8 Y
## )
## output_act (Activation) (None, 2) 0 Y
## ================================================================================
## Total params: 5,418
## Trainable params: 5,222
## Non-trainable params: 196
## ________________________________________________________________________________
# visualize model
plot_model(model2)
# fit the model
history <- model2 %>% fit(
x = train_x,
y = train_y,
batch_size = 3000, # larger than the 302 training rows, so each epoch is a single full-batch update (hence "1/1" in the log)
epochs = 40,
validation_data = list(val_x, val_y)
)
## Epoch 1/40
## 1/1 - 1s - loss: 0.8212 - accuracy: 0.4636 - val_loss: 0.5843 - val_accuracy: 0.7924 - 1s/epoch - 1s/step
## Epoch 2/40
## 1/1 - 0s - loss: 0.7864 - accuracy: 0.4901 - val_loss: 0.5793 - val_accuracy: 0.7966 - 43ms/epoch - 43ms/step
## Epoch 3/40
## 1/1 - 0s - loss: 0.7484 - accuracy: 0.5430 - val_loss: 0.5748 - val_accuracy: 0.8178 - 29ms/epoch - 29ms/step
## Epoch 4/40
## 1/1 - 0s - loss: 0.7386 - accuracy: 0.5894 - val_loss: 0.5705 - val_accuracy: 0.8263 - 28ms/epoch - 28ms/step
## Epoch 5/40
## 1/1 - 0s - loss: 0.7455 - accuracy: 0.5894 - val_loss: 0.5663 - val_accuracy: 0.8263 - 28ms/epoch - 28ms/step
## Epoch 6/40
## 1/1 - 0s - loss: 0.7277 - accuracy: 0.5993 - val_loss: 0.5628 - val_accuracy: 0.8093 - 29ms/epoch - 29ms/step
## Epoch 7/40
## 1/1 - 0s - loss: 0.7017 - accuracy: 0.6060 - val_loss: 0.5594 - val_accuracy: 0.8051 - 30ms/epoch - 30ms/step
## Epoch 8/40
## 1/1 - 0s - loss: 0.7026 - accuracy: 0.6258 - val_loss: 0.5566 - val_accuracy: 0.8051 - 27ms/epoch - 27ms/step
## Epoch 9/40
## 1/1 - 0s - loss: 0.6618 - accuracy: 0.6291 - val_loss: 0.5542 - val_accuracy: 0.8008 - 28ms/epoch - 28ms/step
## Epoch 10/40
## 1/1 - 0s - loss: 0.6406 - accuracy: 0.6589 - val_loss: 0.5521 - val_accuracy: 0.8008 - 30ms/epoch - 30ms/step
## Epoch 11/40
## 1/1 - 0s - loss: 0.6560 - accuracy: 0.6755 - val_loss: 0.5502 - val_accuracy: 0.7966 - 28ms/epoch - 28ms/step
## Epoch 12/40
## 1/1 - 0s - loss: 0.6603 - accuracy: 0.6722 - val_loss: 0.5482 - val_accuracy: 0.7924 - 29ms/epoch - 29ms/step
## Epoch 13/40
## 1/1 - 0s - loss: 0.6466 - accuracy: 0.6788 - val_loss: 0.5467 - val_accuracy: 0.7839 - 28ms/epoch - 28ms/step
## Epoch 14/40
## 1/1 - 0s - loss: 0.6388 - accuracy: 0.6523 - val_loss: 0.5455 - val_accuracy: 0.7712 - 28ms/epoch - 28ms/step
## Epoch 15/40
## 1/1 - 0s - loss: 0.6292 - accuracy: 0.6987 - val_loss: 0.5445 - val_accuracy: 0.7754 - 30ms/epoch - 30ms/step
## Epoch 16/40
## 1/1 - 0s - loss: 0.6186 - accuracy: 0.7020 - val_loss: 0.5438 - val_accuracy: 0.7754 - 28ms/epoch - 28ms/step
## Epoch 17/40
## 1/1 - 0s - loss: 0.6212 - accuracy: 0.7020 - val_loss: 0.5430 - val_accuracy: 0.7797 - 30ms/epoch - 30ms/step
## Epoch 18/40
## 1/1 - 0s - loss: 0.6052 - accuracy: 0.7152 - val_loss: 0.5424 - val_accuracy: 0.7712 - 28ms/epoch - 28ms/step
## Epoch 19/40
## 1/1 - 0s - loss: 0.6059 - accuracy: 0.7119 - val_loss: 0.5418 - val_accuracy: 0.7712 - 27ms/epoch - 27ms/step
## Epoch 20/40
## 1/1 - 0s - loss: 0.6093 - accuracy: 0.7152 - val_loss: 0.5414 - val_accuracy: 0.7627 - 29ms/epoch - 29ms/step
## Epoch 21/40
## 1/1 - 0s - loss: 0.5915 - accuracy: 0.7517 - val_loss: 0.5413 - val_accuracy: 0.7542 - 30ms/epoch - 30ms/step
## Epoch 22/40
## 1/1 - 0s - loss: 0.5973 - accuracy: 0.6887 - val_loss: 0.5410 - val_accuracy: 0.7500 - 28ms/epoch - 28ms/step
## Epoch 23/40
## 1/1 - 0s - loss: 0.5933 - accuracy: 0.7252 - val_loss: 0.5407 - val_accuracy: 0.7500 - 29ms/epoch - 29ms/step
## Epoch 24/40
## 1/1 - 0s - loss: 0.5951 - accuracy: 0.7119 - val_loss: 0.5404 - val_accuracy: 0.7500 - 28ms/epoch - 28ms/step
## Epoch 25/40
## 1/1 - 0s - loss: 0.5864 - accuracy: 0.7053 - val_loss: 0.5402 - val_accuracy: 0.7415 - 30ms/epoch - 30ms/step
## Epoch 26/40
## 1/1 - 0s - loss: 0.5928 - accuracy: 0.7351 - val_loss: 0.5401 - val_accuracy: 0.7415 - 28ms/epoch - 28ms/step
## Epoch 27/40
## 1/1 - 0s - loss: 0.5891 - accuracy: 0.7351 - val_loss: 0.5399 - val_accuracy: 0.7373 - 28ms/epoch - 28ms/step
## Epoch 28/40
## 1/1 - 0s - loss: 0.5860 - accuracy: 0.7252 - val_loss: 0.5399 - val_accuracy: 0.7373 - 29ms/epoch - 29ms/step
## Epoch 29/40
## 1/1 - 0s - loss: 0.5874 - accuracy: 0.7285 - val_loss: 0.5396 - val_accuracy: 0.7373 - 30ms/epoch - 30ms/step
## Epoch 30/40
## 1/1 - 0s - loss: 0.5796 - accuracy: 0.7351 - val_loss: 0.5394 - val_accuracy: 0.7415 - 35ms/epoch - 35ms/step
## Epoch 31/40
## 1/1 - 0s - loss: 0.5646 - accuracy: 0.7219 - val_loss: 0.5392 - val_accuracy: 0.7373 - 30ms/epoch - 30ms/step
## Epoch 32/40
## 1/1 - 0s - loss: 0.5651 - accuracy: 0.7351 - val_loss: 0.5391 - val_accuracy: 0.7415 - 29ms/epoch - 29ms/step
## Epoch 33/40
## 1/1 - 0s - loss: 0.5862 - accuracy: 0.7219 - val_loss: 0.5391 - val_accuracy: 0.7373 - 29ms/epoch - 29ms/step
## Epoch 34/40
## 1/1 - 0s - loss: 0.5586 - accuracy: 0.7384 - val_loss: 0.5391 - val_accuracy: 0.7373 - 29ms/epoch - 29ms/step
## Epoch 35/40
## 1/1 - 0s - loss: 0.5718 - accuracy: 0.7219 - val_loss: 0.5389 - val_accuracy: 0.7373 - 30ms/epoch - 30ms/step
## Epoch 36/40
## 1/1 - 0s - loss: 0.5519 - accuracy: 0.7550 - val_loss: 0.5389 - val_accuracy: 0.7458 - 27ms/epoch - 27ms/step
## Epoch 37/40
## 1/1 - 0s - loss: 0.5778 - accuracy: 0.7351 - val_loss: 0.5389 - val_accuracy: 0.7500 - 30ms/epoch - 30ms/step
## Epoch 38/40
## 1/1 - 0s - loss: 0.5558 - accuracy: 0.7649 - val_loss: 0.5391 - val_accuracy: 0.7500 - 28ms/epoch - 28ms/step
## Epoch 39/40
## 1/1 - 0s - loss: 0.5360 - accuracy: 0.7682 - val_loss: 0.5390 - val_accuracy: 0.7542 - 28ms/epoch - 28ms/step
## Epoch 40/40
## 1/1 - 0s - loss: 0.5538 - accuracy: 0.7384 - val_loss: 0.5391 - val_accuracy: 0.7500 - 29ms/epoch - 29ms/step
# plot history
plot(history)
# predict on test
pred_2 <- as_tibble(predict(model2, test_x)) %>%
set_names(levels(data_train$attrition)) %>%
mutate(class = ifelse(yes > 0.5, "yes", "no")) %>%
mutate(class = factor(class, levels = levels(data_train$attrition))) %>%
set_names(paste0(".pred_", colnames(.)))
## 10/10 - 0s - 116ms/epoch - 12ms/step
# combine with test dataset
pred_2 <- data_test %>%
select(attrition) %>%
bind_cols(pred_2)
# quick check
head(pred_2, 10)
## # A tibble: 10 × 4
## attrition .pred_yes .pred_no .pred_class
## <fct> <dbl> <dbl> <fct>
## 1 yes 0.493 0.413 no
## 2 yes 0.595 0.407 yes
## 3 no 0.398 0.592 no
## 4 no 0.524 0.499 yes
## 5 yes 0.710 0.359 yes
## 6 no 0.356 0.615 no
## 7 no 0.373 0.465 no
## 8 no 0.487 0.519 no
## 9 yes 0.561 0.542 yes
## 10 yes 0.605 0.381 yes
# confusion matrix
pred_2 %>%
conf_mat(attrition, .pred_class) %>%
autoplot(type = "heatmap")
# metrics summary
metrics2 <- pred_2 %>%
summarise(
accuracy = accuracy_vec(attrition, .pred_class),
sensitivity = sens_vec(attrition, .pred_class),
specificity = spec_vec(attrition, .pred_class),
precision = precision_vec(attrition, .pred_class)
)
metrics2
## # A tibble: 1 × 4
## accuracy sensitivity specificity precision
## <dbl> <dbl> <dbl> <dbl>
## 1 0.715 0.708 0.717 0.327
We should consider the following when selecting the optimal model for
our neural network:
- Prefer the simplest model that does the job.
- Consider computation time.
- Make sure the model is neither overfit nor underfit, since we require it to perform
well on both the train and test data (see the sketch after this list).
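As a rough check of the last point, we can compare model2's performance on the training and test arrays; a large gap between the two would suggest overfitting (a minimal sketch, output not shown):
# compare train vs test loss and accuracy for the keras model
model2 %>% evaluate(train_x, train_y, verbose = 0)
model2 %>% evaluate(test_x, test_y, verbose = 0)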
bind_rows(list(metrics1, metrics2), .id = "Model")
## # A tibble: 2 × 5
## Model accuracy sensitivity specificity precision
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.305 0.25 0.316 0.0663
## 2 2 0.715 0.708 0.717 0.327
Based on the comparison above, we can conclude that model2 (the
keras model) performs better than model1 (the neuralnet
model).