Introduction

Classification assigns each input pattern to exactly one category: the network produces one output score per class and chooses the category with the greatest output value, rather than returning several possible responses for a single set of inputs. When integrated with other forms of predictive neural networks in a hybrid system, classification networks become very powerful.
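As a minimal illustration (hypothetical scores, not produced by any model in this post), choosing the class is simply an argmax over the per-class outputs:

# hypothetical per-class outputs for one employee; the predicted class is
# the one with the largest score
scores <- c(no = 0.84, yes = 0.16)
names(which.max(scores))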

The IBM HR Analytics Employee Attrition & Performance data set used here is a cleaned version obtained from GitHub; the raw data is available on Kaggle. We will predict the attrition of IBM's valuable employees using neural network classification, building the models with two main libraries: neuralnet and keras.

Data Wrangling

Importing Libraries

library(dplyr)
library(keras)
library(neuralnet)
library(tensorflow)
library(rsample)
library(deepviz)
library(lime)
library(plotly)
library(tidyverse)
library(tidymodels)
library(themis)

Importing Dataset

data_clean <- read.csv("data/data-clean.csv")
rmarkdown::paged_table(data_clean)

We have 35 columns in total. The first is the attrition column, which is our target variable; the rest are predictor variables. We will not convert the data into factors or numerics at this point because the conversion will be handled during preprocessing and one-hot encoding.
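As a quick check (a minimal sketch, not part of the original workflow), we can look at the column types before any conversion:

# count the columns of each type; the character and integer columns will be
# converted later by the recipe and the one-hot encoding step
table(sapply(data_clean, class))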

Exploratory Data Analysis

Is our data balanced or imbalanced?

Before we start the analysis, we should check the class proportions of our data. Our target column is Attrition.

prop.table(table(data_clean$attrition))
## 
##        no       yes 
## 0.8387755 0.1612245

Our data is imbalanced, so we will need to downsample it.
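For illustration only (the actual downsampling happens later inside the recipe via themis::step_downsample()), a base-R sketch of the idea looks like this:

# sample the majority class ("no") down to the size of the minority class ("yes")
set.seed(100)
yes_rows <- which(data_clean$attrition == "yes")
no_rows  <- sample(which(data_clean$attrition == "no"), length(yes_rows))

balanced_sketch <- data_clean[c(yes_rows, no_rows), ]
prop.table(table(balanced_sketch$attrition))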

Do we have missing values?

data_clean %>% is.na() %>% colSums()
##                  attrition                        age 
##                          0                          0 
##            business_travel                 daily_rate 
##                          0                          0 
##                 department         distance_from_home 
##                          0                          0 
##                  education            education_field 
##                          0                          0 
##             employee_count            employee_number 
##                          0                          0 
##   environment_satisfaction                     gender 
##                          0                          0 
##                hourly_rate            job_involvement 
##                          0                          0 
##                  job_level                   job_role 
##                          0                          0 
##           job_satisfaction             marital_status 
##                          0                          0 
##             monthly_income               monthly_rate 
##                          0                          0 
##       num_companies_worked                    over_18 
##                          0                          0 
##                  over_time        percent_salary_hike 
##                          0                          0 
##         performance_rating  relationship_satisfaction 
##                          0                          0 
##             standard_hours         stock_option_level 
##                          0                          0 
##        total_working_years   training_times_last_year 
##                          0                          0 
##          work_life_balance           years_at_company 
##                          0                          0 
##      years_in_current_role years_since_last_promotion 
##                          0                          0 
##    years_with_curr_manager 
##                          0

We don’t have any missing values in our data.

Cross Validation

At this stage, the data will be split: 80% of the total data goes to training. The split uses stratified sampling on the Attrition column, which preserves the class proportions of the target in each partition and is therefore better suited to imbalanced data. We also carve a validation set out of the training data with a second 80/20 stratified split.

# create initial split
set.seed(133)

test_split <- initial_split(data_clean, prop = 0.8, strata = "attrition")

# create validation split
set.seed(133)

val_split <- initial_split(training(test_split), prop = 0.8, strata = "attrition")

# quick check
test_split
## <Training/Testing/Total>
## <1175/295/1470>

Data Preprocessing

Before we build the NN architecture, we need to preprocess the data. In the first step we will remove some columns we don’t need, downsample the data because it is imbalanced, scale the numeric variables, and convert everything into numeric form. We will use the recipes package, which is loaded as part of tidymodels. In the second step we will one-hot encode the target and convert the data into arrays.

Data preparation using Recipe

# define preprocess recipe from train dataset
rec <- recipe(attrition ~ ., data = training(val_split)) %>% 
  step_rm(employee_count, employee_number) %>%
  step_nzv(all_predictors()) %>% 
  step_string2factor(all_nominal(), -attrition) %>%
  step_string2factor(attrition, levels = c("yes", "no")) %>%
  step_downsample(attrition, under_ratio = 1, seed = 100) %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  step_dummy(all_nominal(), -attrition, one_hot = FALSE) %>%
  prep(strings_as_factors = FALSE)

# get train-val-test dataset
data_train <- juice(rec)
data_val <- bake(rec, testing(val_split))
data_test <- bake(rec, testing(test_split))

# quick check
head(data_train, 10)
## # A tibble: 10 × 45
##        age daily_rate distance_from_home education environment_satisfaction
##      <dbl>      <dbl>              <dbl>     <dbl>                    <dbl>
##  1 -0.130     -0.188              -0.467    -1.84                    -0.520
##  2  0.371      0.305              -0.588     0.117                    1.21 
##  3  1.47       0.240              -0.831    -0.863                   -1.38 
##  4 -0.932      1.47                1.84      0.117                   -1.38 
##  5  0.571      1.48                0.262     0.117                   -0.520
##  6 -0.731      1.66               -0.588     1.10                     0.343
##  7  0.0703    -1.15               -0.103     0.117                    1.21 
##  8  1.07      -0.264              -0.103    -0.863                    0.343
##  9  0.171      0.0836             -0.467     1.10                     0.343
## 10 -1.53       0.245              -0.467     0.117                    1.21 
## # ℹ 40 more variables: hourly_rate <dbl>, job_involvement <dbl>,
## #   job_level <dbl>, job_satisfaction <dbl>, monthly_income <dbl>,
## #   monthly_rate <dbl>, num_companies_worked <dbl>, percent_salary_hike <dbl>,
## #   performance_rating <dbl>, relationship_satisfaction <dbl>,
## #   stock_option_level <dbl>, total_working_years <dbl>,
## #   training_times_last_year <dbl>, work_life_balance <dbl>,
## #   years_at_company <dbl>, years_in_current_role <dbl>, …

One-hot encoding and converting the data into arrays

Our target variable is attrition. We will one-hot encode it because we are classifying this variable. Besides one-hot encoding the target, we also need to convert the predictors into matrices using the data.matrix() function.

# prepare train arrays
train_y <- to_categorical(as.numeric(data_train$attrition) - 1)

train_x <- data_train %>% 
  select(-attrition) %>% 
  data.matrix()

# prepare validation arrays
val_y <- to_categorical(as.numeric(data_val$attrition) - 1)

val_x <- data_val %>% 
  select(-attrition) %>% 
  data.matrix()

# prepare test arrays
test_y <- to_categorical(as.numeric(data_test$attrition) - 1)

test_x <- data_test %>% 
  select(-attrition) %>% 
  data.matrix()

# quick check
dim(train_x)
## [1] 302  44
dim(train_y)
## [1] 302   2

Model & Evaluation

Model Deep NN using neuralnet

Build The Model

# deep NN with two hidden layers (3 and 2 neurons), cross-entropy error,
# logistic activation, and linear.output = FALSE for classification
model1 <- neuralnet(formula = attrition ~ .,
                    data = data_train,
                    hidden = c(3, 2),
                    err.fct = "ce",
                    act.fct = "logistic",
                    linear.output = FALSE
                    )

plot(model1)

Predict

# predict on test
pred_1 <- as_tibble(predict(model1, test_x)) %>% 
  set_names(levels(data_train$attrition)) %>% 
  mutate(class = ifelse(yes > 0.5, "yes", "no")) %>% 
  mutate(class = factor(class, levels = levels(data_train$attrition))) %>% 
  set_names(paste0(".pred_", colnames(.)))

# combine with test dataset
pred_1 <- data_test %>% 
  select(attrition) %>% 
  bind_cols(pred_1)

# quick check
head(pred_1, 10)
## # A tibble: 10 × 4
##    attrition     .pred_yes .pred_no .pred_class
##    <fct>             <dbl>    <dbl> <fct>      
##  1 yes       0.0130          0.987  no         
##  2 yes       0.0000347       1.00   no         
##  3 no        0.937           0.0633 yes        
##  4 no        0.937           0.0633 yes        
##  5 yes       0.00000000683   1.00   no         
##  6 no        0.937           0.0632 yes        
##  7 no        0.164           0.838  no         
##  8 no        0.106           0.896  no         
##  9 yes       0.00000000683   1.00   no         
## 10 yes       0.0000000198    1.00   no

Confusion Matrix

# confusion matrix
pred_1 %>%
  conf_mat(attrition, .pred_class) %>%
  autoplot(type = "heatmap")

# metrics summary
metrics1 <- pred_1 %>%
  summarise(
    accuracy = accuracy_vec(attrition, .pred_class),
    sensitivity = sens_vec(attrition, .pred_class),
    specificity = spec_vec(attrition, .pred_class),
    precision = precision_vec(attrition, .pred_class)
  )
metrics1
## # A tibble: 1 × 4
##   accuracy sensitivity specificity precision
##      <dbl>       <dbl>       <dbl>     <dbl>
## 1    0.305        0.25       0.316    0.0663

Model using keras

Build The Model

set_random_seed(133)

# define input
input <- layer_input(name = "input", shape = ncol(train_x))

# define hidden layers
hiddens <- input %>% 
  layer_dense(name = "dense_1", units = 64) %>% 
  layer_activation_leaky_relu(name = "dense_1_act") %>% 
  layer_batch_normalization(name = "dense_1_bn") %>% 
  layer_dropout(name = "dense_1_dp", rate = 0.15) %>% 
  layer_dense(name = "dense_2", units = 32) %>% 
  layer_activation_leaky_relu(name = "dense_2_act") %>% 
  layer_batch_normalization(name = "dense_2_bn") %>% 
  layer_dropout(name = "dense_2_dp", rate = 0.15)

# define output
output <- hiddens %>% 
  layer_dense(name = "output", units = ncol(train_y)) %>% 
  layer_batch_normalization(name = "output_bn") %>%
  layer_activation(name = "output_act", activation = "sigmoid")

# define full model
model2 <- keras_model(inputs = input, outputs = output)

# compile the model
model2 %>% compile(
  optimizer = optimizer_adamax(learning_rate = 0.001),
  metrics = "accuracy",
  loss = "binary_crossentropy"
)
# model summary
summary(model2)
## Model: "model"
## ________________________________________________________________________________
##  Layer (type)                  Output Shape               Param #    Trainable  
## ================================================================================
##  input (InputLayer)            [(None, 44)]               0          Y          
##  dense_1 (Dense)               (None, 64)                 2880       Y          
##  dense_1_act (LeakyReLU)       (None, 64)                 0          Y          
##  dense_1_bn (BatchNormalizatio  (None, 64)                256        Y          
##  n)                                                                             
##  dense_1_dp (Dropout)          (None, 64)                 0          Y          
##  dense_2 (Dense)               (None, 32)                 2080       Y          
##  dense_2_act (LeakyReLU)       (None, 32)                 0          Y          
##  dense_2_bn (BatchNormalizatio  (None, 32)                128        Y          
##  n)                                                                             
##  dense_2_dp (Dropout)          (None, 32)                 0          Y          
##  output (Dense)                (None, 2)                  66         Y          
##  output_bn (BatchNormalization  (None, 2)                 8          Y          
##  )                                                                              
##  output_act (Activation)       (None, 2)                  0          Y          
## ================================================================================
## Total params: 5,418
## Trainable params: 5,222
## Non-trainable params: 196
## ________________________________________________________________________________
# visualize model
plot_model(model2)

Model Fitting

# fit the model
history <- model2 %>% fit(
  x = train_x,
  y = train_y,
  batch_size = 3000,
  epochs = 40,
  validation_data = list(val_x,val_y)
)
## Epoch 1/40
## 1/1 - 1s - loss: 0.8212 - accuracy: 0.4636 - val_loss: 0.5843 - val_accuracy: 0.7924 - 1s/epoch - 1s/step
## Epoch 2/40
## 1/1 - 0s - loss: 0.7864 - accuracy: 0.4901 - val_loss: 0.5793 - val_accuracy: 0.7966 - 43ms/epoch - 43ms/step
## Epoch 3/40
## 1/1 - 0s - loss: 0.7484 - accuracy: 0.5430 - val_loss: 0.5748 - val_accuracy: 0.8178 - 29ms/epoch - 29ms/step
## Epoch 4/40
## 1/1 - 0s - loss: 0.7386 - accuracy: 0.5894 - val_loss: 0.5705 - val_accuracy: 0.8263 - 28ms/epoch - 28ms/step
## Epoch 5/40
## 1/1 - 0s - loss: 0.7455 - accuracy: 0.5894 - val_loss: 0.5663 - val_accuracy: 0.8263 - 28ms/epoch - 28ms/step
## Epoch 6/40
## 1/1 - 0s - loss: 0.7277 - accuracy: 0.5993 - val_loss: 0.5628 - val_accuracy: 0.8093 - 29ms/epoch - 29ms/step
## Epoch 7/40
## 1/1 - 0s - loss: 0.7017 - accuracy: 0.6060 - val_loss: 0.5594 - val_accuracy: 0.8051 - 30ms/epoch - 30ms/step
## Epoch 8/40
## 1/1 - 0s - loss: 0.7026 - accuracy: 0.6258 - val_loss: 0.5566 - val_accuracy: 0.8051 - 27ms/epoch - 27ms/step
## Epoch 9/40
## 1/1 - 0s - loss: 0.6618 - accuracy: 0.6291 - val_loss: 0.5542 - val_accuracy: 0.8008 - 28ms/epoch - 28ms/step
## Epoch 10/40
## 1/1 - 0s - loss: 0.6406 - accuracy: 0.6589 - val_loss: 0.5521 - val_accuracy: 0.8008 - 30ms/epoch - 30ms/step
## Epoch 11/40
## 1/1 - 0s - loss: 0.6560 - accuracy: 0.6755 - val_loss: 0.5502 - val_accuracy: 0.7966 - 28ms/epoch - 28ms/step
## Epoch 12/40
## 1/1 - 0s - loss: 0.6603 - accuracy: 0.6722 - val_loss: 0.5482 - val_accuracy: 0.7924 - 29ms/epoch - 29ms/step
## Epoch 13/40
## 1/1 - 0s - loss: 0.6466 - accuracy: 0.6788 - val_loss: 0.5467 - val_accuracy: 0.7839 - 28ms/epoch - 28ms/step
## Epoch 14/40
## 1/1 - 0s - loss: 0.6388 - accuracy: 0.6523 - val_loss: 0.5455 - val_accuracy: 0.7712 - 28ms/epoch - 28ms/step
## Epoch 15/40
## 1/1 - 0s - loss: 0.6292 - accuracy: 0.6987 - val_loss: 0.5445 - val_accuracy: 0.7754 - 30ms/epoch - 30ms/step
## Epoch 16/40
## 1/1 - 0s - loss: 0.6186 - accuracy: 0.7020 - val_loss: 0.5438 - val_accuracy: 0.7754 - 28ms/epoch - 28ms/step
## Epoch 17/40
## 1/1 - 0s - loss: 0.6212 - accuracy: 0.7020 - val_loss: 0.5430 - val_accuracy: 0.7797 - 30ms/epoch - 30ms/step
## Epoch 18/40
## 1/1 - 0s - loss: 0.6052 - accuracy: 0.7152 - val_loss: 0.5424 - val_accuracy: 0.7712 - 28ms/epoch - 28ms/step
## Epoch 19/40
## 1/1 - 0s - loss: 0.6059 - accuracy: 0.7119 - val_loss: 0.5418 - val_accuracy: 0.7712 - 27ms/epoch - 27ms/step
## Epoch 20/40
## 1/1 - 0s - loss: 0.6093 - accuracy: 0.7152 - val_loss: 0.5414 - val_accuracy: 0.7627 - 29ms/epoch - 29ms/step
## Epoch 21/40
## 1/1 - 0s - loss: 0.5915 - accuracy: 0.7517 - val_loss: 0.5413 - val_accuracy: 0.7542 - 30ms/epoch - 30ms/step
## Epoch 22/40
## 1/1 - 0s - loss: 0.5973 - accuracy: 0.6887 - val_loss: 0.5410 - val_accuracy: 0.7500 - 28ms/epoch - 28ms/step
## Epoch 23/40
## 1/1 - 0s - loss: 0.5933 - accuracy: 0.7252 - val_loss: 0.5407 - val_accuracy: 0.7500 - 29ms/epoch - 29ms/step
## Epoch 24/40
## 1/1 - 0s - loss: 0.5951 - accuracy: 0.7119 - val_loss: 0.5404 - val_accuracy: 0.7500 - 28ms/epoch - 28ms/step
## Epoch 25/40
## 1/1 - 0s - loss: 0.5864 - accuracy: 0.7053 - val_loss: 0.5402 - val_accuracy: 0.7415 - 30ms/epoch - 30ms/step
## Epoch 26/40
## 1/1 - 0s - loss: 0.5928 - accuracy: 0.7351 - val_loss: 0.5401 - val_accuracy: 0.7415 - 28ms/epoch - 28ms/step
## Epoch 27/40
## 1/1 - 0s - loss: 0.5891 - accuracy: 0.7351 - val_loss: 0.5399 - val_accuracy: 0.7373 - 28ms/epoch - 28ms/step
## Epoch 28/40
## 1/1 - 0s - loss: 0.5860 - accuracy: 0.7252 - val_loss: 0.5399 - val_accuracy: 0.7373 - 29ms/epoch - 29ms/step
## Epoch 29/40
## 1/1 - 0s - loss: 0.5874 - accuracy: 0.7285 - val_loss: 0.5396 - val_accuracy: 0.7373 - 30ms/epoch - 30ms/step
## Epoch 30/40
## 1/1 - 0s - loss: 0.5796 - accuracy: 0.7351 - val_loss: 0.5394 - val_accuracy: 0.7415 - 35ms/epoch - 35ms/step
## Epoch 31/40
## 1/1 - 0s - loss: 0.5646 - accuracy: 0.7219 - val_loss: 0.5392 - val_accuracy: 0.7373 - 30ms/epoch - 30ms/step
## Epoch 32/40
## 1/1 - 0s - loss: 0.5651 - accuracy: 0.7351 - val_loss: 0.5391 - val_accuracy: 0.7415 - 29ms/epoch - 29ms/step
## Epoch 33/40
## 1/1 - 0s - loss: 0.5862 - accuracy: 0.7219 - val_loss: 0.5391 - val_accuracy: 0.7373 - 29ms/epoch - 29ms/step
## Epoch 34/40
## 1/1 - 0s - loss: 0.5586 - accuracy: 0.7384 - val_loss: 0.5391 - val_accuracy: 0.7373 - 29ms/epoch - 29ms/step
## Epoch 35/40
## 1/1 - 0s - loss: 0.5718 - accuracy: 0.7219 - val_loss: 0.5389 - val_accuracy: 0.7373 - 30ms/epoch - 30ms/step
## Epoch 36/40
## 1/1 - 0s - loss: 0.5519 - accuracy: 0.7550 - val_loss: 0.5389 - val_accuracy: 0.7458 - 27ms/epoch - 27ms/step
## Epoch 37/40
## 1/1 - 0s - loss: 0.5778 - accuracy: 0.7351 - val_loss: 0.5389 - val_accuracy: 0.7500 - 30ms/epoch - 30ms/step
## Epoch 38/40
## 1/1 - 0s - loss: 0.5558 - accuracy: 0.7649 - val_loss: 0.5391 - val_accuracy: 0.7500 - 28ms/epoch - 28ms/step
## Epoch 39/40
## 1/1 - 0s - loss: 0.5360 - accuracy: 0.7682 - val_loss: 0.5390 - val_accuracy: 0.7542 - 28ms/epoch - 28ms/step
## Epoch 40/40
## 1/1 - 0s - loss: 0.5538 - accuracy: 0.7384 - val_loss: 0.5391 - val_accuracy: 0.7500 - 29ms/epoch - 29ms/step
# plot history
plot(history)

Predict

# predict on test
pred_2 <- as_tibble(predict(model2, test_x)) %>% 
  set_names(levels(data_train$attrition)) %>% 
  mutate(class = ifelse(yes > 0.5, "yes", "no")) %>% 
  mutate(class = factor(class, levels = levels(data_train$attrition))) %>% 
  set_names(paste0(".pred_", colnames(.)))
## 10/10 - 0s - 116ms/epoch - 12ms/step
# combine with test dataset
pred_2 <- data_test %>% 
  select(attrition) %>% 
  bind_cols(pred_2)

# quick check
head(pred_2, 10)
## # A tibble: 10 × 4
##    attrition .pred_yes .pred_no .pred_class
##    <fct>         <dbl>    <dbl> <fct>      
##  1 yes           0.493    0.413 no         
##  2 yes           0.595    0.407 yes        
##  3 no            0.398    0.592 no         
##  4 no            0.524    0.499 yes        
##  5 yes           0.710    0.359 yes        
##  6 no            0.356    0.615 no         
##  7 no            0.373    0.465 no         
##  8 no            0.487    0.519 no         
##  9 yes           0.561    0.542 yes        
## 10 yes           0.605    0.381 yes

Confusion Matrix

# confusion matrix
pred_2 %>%
  conf_mat(attrition, .pred_class) %>%
  autoplot(type = "heatmap")

# metrics summary
metrics2 <- pred_2 %>%
  summarise(
    accuracy = accuracy_vec(attrition, .pred_class),
    sensitivity = sens_vec(attrition, .pred_class),
    specificity = spec_vec(attrition, .pred_class),
    precision = precision_vec(attrition, .pred_class)
  )
metrics2
## # A tibble: 1 × 4
##   accuracy sensitivity specificity precision
##      <dbl>       <dbl>       <dbl>     <dbl>
## 1    0.715       0.708       0.717     0.327

Conclusion

We should consider the following when selecting the optimal model for our neural network:
- Model simplicity: prefer the most basic model that does the job.
- Computation time.
- Generalization: the model should be neither overfit nor underfit, since we need it to perform well on both train and test data (see the quick check sketch below).
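
For the last point, a quick sanity check (a minimal sketch using the keras training history, not part of the original workflow) is to compare the final training and validation accuracy; a large gap would suggest overfitting:

# compare the last recorded training vs validation accuracy from the keras
# history; similar values indicate the model generalizes reasonably well
c(train      = tail(history$metrics$accuracy, 1),
  validation = tail(history$metrics$val_accuracy, 1))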

bind_rows(list(metrics1, metrics2), .id = "Model")
## # A tibble: 2 × 5
##   Model accuracy sensitivity specificity precision
##   <chr>    <dbl>       <dbl>       <dbl>     <dbl>
## 1 1        0.305       0.25        0.316    0.0663
## 2 2        0.715       0.708       0.717    0.327

Based on the comparison above, we can conclude that model2 (the keras model) performs better than model1 (the neuralnet model).