1 Introduction: LBB Project

In this LBB project, we will analyze the factors leading to employee attrition using a fictional dataset created by IBM data scientists and made available on Kaggle.

The data used for this project is a clean version derived from the above source, pre-processed by another repository on GitHub, which we renamed "ibm_attrition_data_clean.csv".

The goal is to predict, as close to the ground truth as possible, whether an employee's attrition status is "yes" or "no", based on the factors contributing to it.

2 Data Preparation

# Library setup: load the necessary packages
# data wrangling
library(dplyr)
# neural network
library(neuralnet) 
library(keras)
# cross-validation
library(rsample)
library(caret)
library(recipes)
library(tensorflow)
# set graphic theme
theme_set(theme_minimal())
options(scipen = 999)

2.1 Read Data

Before we proceed further, let us explore our dataset.

# Read dataset
ibm_raw <- read.csv("data_input/ibm_attrition_data_clean.csv")
head(ibm_raw)  # Check dataset

We can also check basic information contained in our dataset.

# Quick overview of dataset
str(ibm_raw)
#> 'data.frame':    1470 obs. of  35 variables:
#>  $ attrition                 : chr  "yes" "no" "yes" "no" ...
#>  $ age                       : int  41 49 37 33 27 32 59 30 38 36 ...
#>  $ business_travel           : chr  "travel_rarely" "travel_frequently" "travel_rarely" "travel_frequently" ...
#>  $ daily_rate                : int  1102 279 1373 1392 591 1005 1324 1358 216 1299 ...
#>  $ department                : chr  "sales" "research_development" "research_development" "research_development" ...
#>  $ distance_from_home        : int  1 8 2 3 2 2 3 24 23 27 ...
#>  $ education                 : int  2 1 2 4 1 2 3 1 3 3 ...
#>  $ education_field           : chr  "life_sciences" "life_sciences" "other" "life_sciences" ...
#>  $ employee_count            : int  1 1 1 1 1 1 1 1 1 1 ...
#>  $ employee_number           : int  1 2 4 5 7 8 10 11 12 13 ...
#>  $ environment_satisfaction  : int  2 3 4 4 1 4 3 4 4 3 ...
#>  $ gender                    : chr  "female" "male" "male" "female" ...
#>  $ hourly_rate               : int  94 61 92 56 40 79 81 67 44 94 ...
#>  $ job_involvement           : int  3 2 2 3 3 3 4 3 2 3 ...
#>  $ job_level                 : int  2 2 1 1 1 1 1 1 3 2 ...
#>  $ job_role                  : chr  "sales_executive" "research_scientist" "laboratory_technician" "research_scientist" ...
#>  $ job_satisfaction          : int  4 2 3 3 2 4 1 3 3 3 ...
#>  $ marital_status            : chr  "single" "married" "single" "married" ...
#>  $ monthly_income            : int  5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
#>  $ monthly_rate              : int  19479 24907 2396 23159 16632 11864 9964 13335 8787 16577 ...
#>  $ num_companies_worked      : int  8 1 6 1 9 0 4 1 0 6 ...
#>  $ over_18                   : chr  "y" "y" "y" "y" ...
#>  $ over_time                 : chr  "yes" "no" "yes" "yes" ...
#>  $ percent_salary_hike       : int  11 23 15 11 12 13 20 22 21 13 ...
#>  $ performance_rating        : int  3 4 3 3 3 3 4 4 4 3 ...
#>  $ relationship_satisfaction : int  1 4 2 3 4 3 1 2 2 2 ...
#>  $ standard_hours            : int  80 80 80 80 80 80 80 80 80 80 ...
#>  $ stock_option_level        : int  0 1 0 0 1 0 3 1 0 2 ...
#>  $ total_working_years       : int  8 10 7 8 6 8 12 1 10 17 ...
#>  $ training_times_last_year  : int  0 3 3 3 3 2 3 2 2 3 ...
#>  $ work_life_balance         : int  1 3 3 3 3 2 2 3 3 2 ...
#>  $ years_at_company          : int  6 10 0 8 2 7 1 1 9 7 ...
#>  $ years_in_current_role     : int  4 7 0 7 2 7 0 0 7 7 ...
#>  $ years_since_last_promotion: int  0 1 0 3 2 3 0 0 1 7 ...
#>  $ years_with_curr_manager   : int  5 7 0 0 2 6 0 0 8 7 ...

Based on the information above, our dataset contains 35 columns: the target variable attrition and 34 remaining columns that are potential contributing factors to an employee's attrition status (yes or no).

Our current dataset has two different data types, character and integer. From our observation, all character columns can be converted to factor type.
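As an illustration only (a quick sketch; the recipes pre-processing later in this section handles the conversion for us), the character columns could be converted to factors with dplyr, which is already loaded:

# Illustrative sketch: convert every character column to a factor
ibm_factored <- ibm_raw %>% 
  mutate(across(where(is.character), as.factor))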

This is a classification case with two output classes (attrition = yes/no).

2.2 Data Preprocessing

2.2.1 Remove Unnecessary Columns

We will remove the two columns employee_count and employee_number, as they do not provide relevant information for further analysis.

ibm_raw <- ibm_raw %>% 
  select(-c(employee_count, employee_number))

str(ibm_raw)
#> 'data.frame':    1470 obs. of  33 variables:
#>  $ attrition                 : chr  "yes" "no" "yes" "no" ...
#>  $ age                       : int  41 49 37 33 27 32 59 30 38 36 ...
#>  $ business_travel           : chr  "travel_rarely" "travel_frequently" "travel_rarely" "travel_frequently" ...
#>  $ daily_rate                : int  1102 279 1373 1392 591 1005 1324 1358 216 1299 ...
#>  $ department                : chr  "sales" "research_development" "research_development" "research_development" ...
#>  $ distance_from_home        : int  1 8 2 3 2 2 3 24 23 27 ...
#>  $ education                 : int  2 1 2 4 1 2 3 1 3 3 ...
#>  $ education_field           : chr  "life_sciences" "life_sciences" "other" "life_sciences" ...
#>  $ environment_satisfaction  : int  2 3 4 4 1 4 3 4 4 3 ...
#>  $ gender                    : chr  "female" "male" "male" "female" ...
#>  $ hourly_rate               : int  94 61 92 56 40 79 81 67 44 94 ...
#>  $ job_involvement           : int  3 2 2 3 3 3 4 3 2 3 ...
#>  $ job_level                 : int  2 2 1 1 1 1 1 1 3 2 ...
#>  $ job_role                  : chr  "sales_executive" "research_scientist" "laboratory_technician" "research_scientist" ...
#>  $ job_satisfaction          : int  4 2 3 3 2 4 1 3 3 3 ...
#>  $ marital_status            : chr  "single" "married" "single" "married" ...
#>  $ monthly_income            : int  5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
#>  $ monthly_rate              : int  19479 24907 2396 23159 16632 11864 9964 13335 8787 16577 ...
#>  $ num_companies_worked      : int  8 1 6 1 9 0 4 1 0 6 ...
#>  $ over_18                   : chr  "y" "y" "y" "y" ...
#>  $ over_time                 : chr  "yes" "no" "yes" "yes" ...
#>  $ percent_salary_hike       : int  11 23 15 11 12 13 20 22 21 13 ...
#>  $ performance_rating        : int  3 4 3 3 3 3 4 4 4 3 ...
#>  $ relationship_satisfaction : int  1 4 2 3 4 3 1 2 2 2 ...
#>  $ standard_hours            : int  80 80 80 80 80 80 80 80 80 80 ...
#>  $ stock_option_level        : int  0 1 0 0 1 0 3 1 0 2 ...
#>  $ total_working_years       : int  8 10 7 8 6 8 12 1 10 17 ...
#>  $ training_times_last_year  : int  0 3 3 3 3 2 3 2 2 3 ...
#>  $ work_life_balance         : int  1 3 3 3 3 2 2 3 3 2 ...
#>  $ years_at_company          : int  6 10 0 8 2 7 1 1 9 7 ...
#>  $ years_in_current_role     : int  4 7 0 7 2 7 0 0 7 7 ...
#>  $ years_since_last_promotion: int  0 1 0 3 2 3 0 0 1 7 ...
#>  $ years_with_curr_manager   : int  5 7 0 0 2 6 0 0 8 7 ...

2.2.2 Any Missing or Duplicated Values?

First, let us confirm whether our dataset has any missing values or duplicated rows.

colSums(is.na(ibm_raw))
#>                  attrition                        age            business_travel                 daily_rate                 department         distance_from_home 
#>                          0                          0                          0                          0                          0                          0 
#>                  education            education_field   environment_satisfaction                     gender                hourly_rate            job_involvement 
#>                          0                          0                          0                          0                          0                          0 
#>                  job_level                   job_role           job_satisfaction             marital_status             monthly_income               monthly_rate 
#>                          0                          0                          0                          0                          0                          0 
#>       num_companies_worked                    over_18                  over_time        percent_salary_hike         performance_rating  relationship_satisfaction 
#>                          0                          0                          0                          0                          0                          0 
#>             standard_hours         stock_option_level        total_working_years   training_times_last_year          work_life_balance           years_at_company 
#>                          0                          0                          0                          0                          0                          0 
#>      years_in_current_role years_since_last_promotion    years_with_curr_manager 
#>                          0                          0                          0
sum(duplicated(ibm_raw))
#> [1] 0

There are neither missing nor duplicated values in our dataset.

2.2.3 Train-Test Splitting

Let us split our prepared dataset into an 80:20 ratio for the train and test sets, using stratified sampling so that the class proportions of attrition are preserved in both.

set.seed(100)

index <- initial_split(data = ibm_raw, # dataset used for training
                       prop = 0.8, # 80% for training dataset
                       strata = "attrition") 

Using the recipes library, we will implement the data pre-processing steps to prepare for further analysis:

ibm_clean <- recipe(attrition ~ .,
                    data = training(index)) %>% 
  step_nzv(all_predictors()) %>% 
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  step_dummy(all_nominal(), -attrition, one_hot = FALSE) %>%
  prep()

Here we apply the prepared recipe to obtain the training and testing data:

ibm_train <- juice(ibm_clean)
ibm_test <- bake(ibm_clean, testing(index))

# Check the proportion table of Training Data
prop.table(table(ibm_train$attrition))
#> 
#>        no       yes 
#> 0.8391489 0.1608511
# Check the proportion table of Testing Data
prop.table(table(ibm_test$attrition))
#> 
#>        no       yes 
#> 0.8372881 0.1627119

Based on the above information, the training and testing datasets (ibm_train and ibm_test) preserve the original class proportions of roughly 84:16.
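For reference, the class proportion in the full raw dataset can be checked the same way (a quick sketch):

# Reference: class proportion in the full raw dataset
prop.table(table(ibm_raw$attrition))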

Next, we will start building the models for our neural networks.

2.2.4 Step 1: Separate Target and Predictor Variables and Convert to Matrices

train_x <- ibm_train %>% 
  select(-attrition) %>%  # predictor variables only in our training dataset
  data.matrix()  # change dataset into matrix type

train_y <- to_categorical(as.numeric(ibm_train$attrition) - 1)  # target variable

test_x <- ibm_test %>% 
  select(-attrition) %>%  # predictor variables only in our testing dataset
  data.matrix()  # change dataset into matrix type

test_y <- to_categorical(as.numeric(ibm_test$attrition) - 1)   # target variable
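
As an optional sanity check (a quick sketch using the objects above), we can confirm the predictor matrix dimensions and that the target has been one-hot encoded into two columns:

# Optional sanity check: predictor matrix shape and one-hot encoded target
dim(train_x)
head(train_y)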

3 Neural Network Architecture

As our problem is a classification with two output classes, the appropriate loss function is binary cross-entropy.
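As a brief illustration (a toy sketch, not part of the modelling pipeline), binary cross-entropy for a single observation with true label y (0 or 1) and predicted probability p is -(y * log(p) + (1 - y) * log(1 - p)); confident correct predictions give a small loss and confident wrong predictions a large one:

# Toy sketch: binary cross-entropy for a single prediction
bce <- function(y, p) -(y * log(p) + (1 - y) * log(1 - p))
bce(y = 1, p = 0.9)  # confident and correct -> small loss
bce(y = 1, p = 0.1)  # confident and wrong   -> large loss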

3.1 Deep Neural Network Model with the neuralnet Function

# Build a neural network with 2 hidden layers of 5 and 3 neurons
nn_ibm <- neuralnet(formula = attrition ~ .,
                    data = ibm_train,
                    hidden = c(5, 3),
                    err.fct = "ce",
                    act.fct = "logistic",
                    linear.output = FALSE
                    )

plot(nn_ibm)

3.1.1 Predicting the Output

pred_nn <- compute(x = nn_ibm, 
                   covariate = ibm_train)

pred_nn$net.result %>% head()
#>           [,1]       [,2]
#> [1,] 0.9733027 0.02669739
#> [2,] 0.9732972 0.02670290
#> [3,] 0.9732981 0.02670197
#> [4,] 0.9733026 0.02669749
#> [5,] 0.9733027 0.02669739
#> [6,] 0.9733027 0.02669739
# Convert probability into class
pred_nn_class <- ifelse(pred_nn$net.result > 0.5,
                        1, # if pred value > 0.5, then the class value is 1
                        0) # otherwise, the class value is 0

pred_nn_class %>% head()
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    1    0
#> [3,]    1    0
#> [4,]    1    0
#> [5,]    1    0
#> [6,]    1    0
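
The two columns of net.result follow the factor levels of attrition (here "no" then "yes"), so each row can be mapped to its most probable class and compared with the training labels. A minimal sketch, assuming that column ordering:

# Sketch: map each row to its most probable class and check training accuracy
pred_nn_label <- levels(ibm_train$attrition)[max.col(pred_nn$net.result)]
mean(pred_nn_label == ibm_train$attrition)  # proportion of correct predictions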

3.2 Neural Network Model with Keras

Let us first create an object input_dim to store the number of predictor columns, and an object num_class to store the number of target classes.

input_dim <- ncol(train_x)  # number of columns of predictor variables
num_class <- n_distinct(ibm_train$attrition) # number of target classes
input_dim
#> [1] 44
num_class
#> [1] 2

The size of the input layer equals the number of predictor columns, which we defined above as input_dim.

The output layer of our model performs binary classification with only two outputs, "yes" or "no"; therefore the loss function used will be binary cross-entropy.

As this is a binary classification case, the activation function used will be logistic/sigmoid.

In summary, our neural network model will use the following fixed parameters:

  • input layer = 44 predictors
  • output layer = 2 neurons
  • activation: “sigmoid”
tensorflow::set_random_seed(100)

# Create the model architecture
model1 <- keras_model_sequential(name="model1") %>% 

  # First Hidden Layer
  layer_dense(input_shape = input_dim, # number of predictors
              units = input_dim, # number of nodes in the first hidden layer
              activation = "sigmoid", 
              name = "Hidden_layer") %>% 


  # Output layer
  layer_dense(units = num_class, 
              activation = "sigmoid", 
              name = "output")


model1
#> Model: "model1"
#> ________________________________________________________________________________________________________________________________________________________________________________________
#>  Layer (type)                                                                     Output Shape                                                              Param #                     
#> ========================================================================================================================================================================================
#>  Hidden_layer (Dense)                                                             (None, 44)                                                                1980                        
#>  output (Dense)                                                                   (None, 2)                                                                 90                          
#> ========================================================================================================================================================================================
#> Total params: 2,070
#> Trainable params: 2,070
#> Non-trainable params: 0
#> ________________________________________________________________________________________________________________________________________________________________________________________

3.2.1 Model Compilation

To compile the model, we need to define our error function, optimizer, and evaluation metrics with the compile() function.

In this project, the parameters used will be:

  • Error/Loss Function: classification with two target classes, therefore binary cross-entropy
  • Optimizer: Stochastic Gradient Descent (SGD)
  • learning_rate: 0.5
  • metrics: accuracy, because we want to measure how often the predictions match their “ground-truth” labels
model1 %>% 
  compile(loss = "binary_crossentropy",
          optimizer = optimizer_sgd(learning_rate = 0.5),
          metrics = "accuracy")

model1
#> Model: "model1"
#> ________________________________________________________________________________________________________________________________________________________________________________________
#>  Layer (type)                                                                     Output Shape                                                              Param #                     
#> ========================================================================================================================================================================================
#>  Hidden_layer (Dense)                                                             (None, 44)                                                                1980                        
#>  output (Dense)                                                                   (None, 2)                                                                 90                          
#> ========================================================================================================================================================================================
#> Total params: 2,070
#> Trainable params: 2,070
#> Non-trainable params: 0
#> ________________________________________________________________________________________________________________________________________________________________________________________

3.2.2 Model Fitting

Model fitting with the fit() function will use the following parameters:

  • x: predictors
  • y: targets
  • epochs: number of iterations over the training data
  • batch_size: number of samples per gradient update
  • validation_data: unseen data (predictors and targets) used for metric evaluation while the model is training
  • verbose: whether to print training progress
nrow(train_x)
#> [1] 1175

Our training dataset has a total of 1,175 rows; let us use 5 batches per epoch, so that our batch size is 1175 / 5 = 235:

  • batch_size = 235
history <- model1 %>% fit(x = train_x,
                          y = train_y,
                          epochs = 10,
                          batch_size = 235,
                          validation_data = list(test_x, test_y),
                          verbose = 1
                          )
#> Epoch 1/10
#> 
1/5 [=====>........................] - ETA: 0s - loss: 0.9339 - accuracy: 0.1787
5/5 [==============================] - 0s 4ms/step - loss: 0.5469 - accuracy: 0.7106
#> 
5/5 [==============================] - 1s 166ms/step - loss: 0.5469 - accuracy: 0.7106 - val_loss: 0.4269 - val_accuracy: 0.8373
#> Epoch 2/10
#> 
1/5 [=====>........................] - ETA: 0s - loss: 0.4324 - accuracy: 0.8426
5/5 [==============================] - 0s 7ms/step - loss: 0.4308 - accuracy: 0.8391
#> 
5/5 [==============================] - 0s 41ms/step - loss: 0.4308 - accuracy: 0.8391 - val_loss: 0.4203 - val_accuracy: 0.8373
#> Epoch 3/10
#> 
1/5 [=====>........................] - ETA: 0s - loss: 0.4184 - accuracy: 0.8468
5/5 [==============================] - 0s 5ms/step - loss: 0.4257 - accuracy: 0.8391
#> 
5/5 [==============================] - 0s 38ms/step - loss: 0.4257 - accuracy: 0.8391 - val_loss: 0.4145 - val_accuracy: 0.8373
#> Epoch 4/10
#> 
1/5 [=====>........................] - ETA: 0s - loss: 0.4223 - accuracy: 0.8383
5/5 [==============================] - 0s 5ms/step - loss: 0.4199 - accuracy: 0.8391
#> 
5/5 [==============================] - 0s 38ms/step - loss: 0.4199 - accuracy: 0.8391 - val_loss: 0.4086 - val_accuracy: 0.8373
#> Epoch 5/10
#> 
1/5 [=====>........................] - ETA: 0s - loss: 0.4590 - accuracy: 0.8170
5/5 [==============================] - 0s 4ms/step - loss: 0.4160 - accuracy: 0.8391
#> 
5/5 [==============================] - 0s 37ms/step - loss: 0.4160 - accuracy: 0.8391 - val_loss: 0.4035 - val_accuracy: 0.8373
#> Epoch 6/10
#> 
1/5 [=====>........................] - ETA: 0s - loss: 0.4581 - accuracy: 0.8000
5/5 [==============================] - 0s 4ms/step - loss: 0.4123 - accuracy: 0.8391
#> 
5/5 [==============================] - 0s 36ms/step - loss: 0.4123 - accuracy: 0.8391 - val_loss: 0.3990 - val_accuracy: 0.8373
#> Epoch 7/10
#> 
1/5 [=====>........................] - ETA: 0s - loss: 0.3775 - accuracy: 0.8681
5/5 [==============================] - 0s 4ms/step - loss: 0.4086 - accuracy: 0.8391
#> 
5/5 [==============================] - 0s 36ms/step - loss: 0.4086 - accuracy: 0.8391 - val_loss: 0.3946 - val_accuracy: 0.8373
#> Epoch 8/10
#> 
1/5 [=====>........................] - ETA: 0s - loss: 0.4943 - accuracy: 0.7872
5/5 [==============================] - 0s 5ms/step - loss: 0.4048 - accuracy: 0.8391
#> 
5/5 [==============================] - 0s 38ms/step - loss: 0.4048 - accuracy: 0.8391 - val_loss: 0.3909 - val_accuracy: 0.8373
#> Epoch 9/10
#> 
1/5 [=====>........................] - ETA: 0s - loss: 0.4524 - accuracy: 0.8170
5/5 [==============================] - 0s 4ms/step - loss: 0.4005 - accuracy: 0.8391
#> 
5/5 [==============================] - 0s 37ms/step - loss: 0.4005 - accuracy: 0.8391 - val_loss: 0.3868 - val_accuracy: 0.8373
#> Epoch 10/10
#> 
1/5 [=====>........................] - ETA: 0s - loss: 0.4012 - accuracy: 0.8383
5/5 [==============================] - 0s 4ms/step - loss: 0.3973 - accuracy: 0.8391
#> 
5/5 [==============================] - 0s 37ms/step - loss: 0.3973 - accuracy: 0.8391 - val_loss: 0.3830 - val_accuracy: 0.8373
plot(history)

# Compute Accuracy difference between train data with test data/validation
(0.8391 - 0.8373) * 100
#> [1] 0.18

Based on the results above, our model converges quickly and reaches a fairly optimal fit, because the results show:

  • High accuracy (> 83%) on both the training data and the test/validation data
  • An accuracy difference between the training data (accuracy) and the test/validation data (val_accuracy) of only 0.18%, well below 20%

Our current model is therefore already reasonably optimal.
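
As an optional cross-check (a minimal sketch assuming model1, test_x, and test_y from above), Keras' evaluate() can recompute the loss and accuracy on the held-out data:

# Optional cross-check: recompute loss and accuracy on the held-out data
model1 %>% evaluate(x = test_x, y = test_y, verbose = 0)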

3.2.3 Model Evaluation

We will use the predict() function to generate class predictions on the test set.

model1_pred <- predict(model1,
                test_x) %>% 
  k_argmax() %>% 
  as.array() %>% 
  as.factor()
#> 10/10 - 0s - 52ms/epoch - 5ms/step
model1_pred %>% head()
#> [1] 0 0 0 0 0 0
#> Levels: 0
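
To look beyond raw accuracy, we can build a confusion matrix of the predictions against the ground-truth test labels using caret (already loaded). A minimal sketch, setting both factor levels explicitly because model1 here predicts only class 0:

# Sketch: confusion matrix of predicted vs. actual attrition on the test set
truth_test <- factor(as.numeric(ibm_test$attrition) - 1, levels = c("0", "1"))
pred_test  <- factor(model1_pred, levels = c("0", "1"))
confusionMatrix(data = pred_test, reference = truth_test, positive = "1")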

3.3 (Optional Model) Optimization Attempt

Let us create another model using a different optimizer with the following parameter tuning:

tensorflow::set_random_seed(8)

# Create the model architecture
model2 <- keras_model_sequential(name="model2") %>% 

  # First Hidden Layer
  layer_dense(input_shape = input_dim, # number of predictors
              units = input_dim, # number of nodes in the first hidden layer
              activation = "sigmoid", 
              name = "Hidden_layer") %>% 


  # Output layer
  layer_dense(units = num_class, 
              activation = "sigmoid", 
              name = "output")

model2 %>% 
  compile(loss = "binary_crossentropy",
          optimizer = optimizer_adam(learning_rate = 0.2),
          metrics = "accuracy")

model2
#> Model: "model2"
#> ________________________________________________________________________________________________________________________________________________________________________________________
#>  Layer (type)                                                                     Output Shape                                                              Param #                     
#> ========================================================================================================================================================================================
#>  Hidden_layer (Dense)                                                             (None, 44)                                                                1980                        
#>  output (Dense)                                                                   (None, 2)                                                                 90                          
#> ========================================================================================================================================================================================
#> Total params: 2,070
#> Trainable params: 2,070
#> Non-trainable params: 0
#> ________________________________________________________________________________________________________________________________________________________________________________________
history2 <- model2 %>% fit(x = train_x,
                          y = train_y,
                          epochs = 10,
                          batch_size = 200,
                          validation_data = list(test_x, test_y),
                          verbose = 1
                          )
#> Epoch 1/10
#> 
1/6 [====>.........................] - ETA: 1s - loss: 0.4732 - accuracy: 0.8300
6/6 [==============================] - 0s 12ms/step - loss: 0.5682 - accuracy: 0.8017
#> 
6/6 [==============================] - 1s 143ms/step - loss: 0.5682 - accuracy: 0.8017 - val_loss: 0.4109 - val_accuracy: 0.8373
#> Epoch 2/10
#> 
1/6 [====>.........................] - ETA: 0s - loss: 0.4342 - accuracy: 0.8450
6/6 [==============================] - 0s 8ms/step - loss: 0.4258 - accuracy: 0.8451
#> 
6/6 [==============================] - 0s 34ms/step - loss: 0.4258 - accuracy: 0.8451 - val_loss: 0.3736 - val_accuracy: 0.8373
#> Epoch 3/10
#> 
1/6 [====>.........................] - ETA: 0s - loss: 0.5125 - accuracy: 0.8050
6/6 [==============================] - 0s 6ms/step - loss: 0.3968 - accuracy: 0.8323
#> 
6/6 [==============================] - 0s 32ms/step - loss: 0.3968 - accuracy: 0.8323 - val_loss: 0.3759 - val_accuracy: 0.8339
#> Epoch 4/10
#> 
1/6 [====>.........................] - ETA: 0s - loss: 0.3050 - accuracy: 0.8700
6/6 [==============================] - 0s 6ms/step - loss: 0.3596 - accuracy: 0.8409
#> 
6/6 [==============================] - 0s 32ms/step - loss: 0.3596 - accuracy: 0.8409 - val_loss: 0.3594 - val_accuracy: 0.8305
#> Epoch 5/10
#> 
1/6 [====>.........................] - ETA: 0s - loss: 0.3825 - accuracy: 0.8150
6/6 [==============================] - 0s 6ms/step - loss: 0.3300 - accuracy: 0.8579
#> 
6/6 [==============================] - 0s 32ms/step - loss: 0.3300 - accuracy: 0.8579 - val_loss: 0.3498 - val_accuracy: 0.8407
#> Epoch 6/10
#> 
1/6 [====>.........................] - ETA: 0s - loss: 0.3232 - accuracy: 0.8600
6/6 [==============================] - 0s 5ms/step - loss: 0.2965 - accuracy: 0.8826
#> 
6/6 [==============================] - 0s 31ms/step - loss: 0.2965 - accuracy: 0.8826 - val_loss: 0.3312 - val_accuracy: 0.8678
#> Epoch 7/10
#> 
1/6 [====>.........................] - ETA: 0s - loss: 0.2825 - accuracy: 0.9100
6/6 [==============================] - 0s 6ms/step - loss: 0.2669 - accuracy: 0.9064
#> 
6/6 [==============================] - 0s 32ms/step - loss: 0.2669 - accuracy: 0.9064 - val_loss: 0.3303 - val_accuracy: 0.8712
#> Epoch 8/10
#> 
1/6 [====>.........................] - ETA: 0s - loss: 0.2013 - accuracy: 0.9200
6/6 [==============================] - 0s 6ms/step - loss: 0.2350 - accuracy: 0.9183
#> 
6/6 [==============================] - 0s 32ms/step - loss: 0.2350 - accuracy: 0.9183 - val_loss: 0.3240 - val_accuracy: 0.8712
#> Epoch 9/10
#> 
1/6 [====>.........................] - ETA: 0s - loss: 0.1915 - accuracy: 0.9400
6/6 [==============================] - 0s 7ms/step - loss: 0.2136 - accuracy: 0.9251
#> 
6/6 [==============================] - 0s 32ms/step - loss: 0.2136 - accuracy: 0.9251 - val_loss: 0.3438 - val_accuracy: 0.8780
#> Epoch 10/10
#> 
1/6 [====>.........................] - ETA: 0s - loss: 0.1626 - accuracy: 0.9400
6/6 [==============================] - 0s 6ms/step - loss: 0.1862 - accuracy: 0.9302
#> 
6/6 [==============================] - 0s 32ms/step - loss: 0.1862 - accuracy: 0.9302 - val_loss: 0.3478 - val_accuracy: 0.8746
plot(history2)

Using the Adam optimizer (with a fairly high learning rate of 0.2), the training accuracy climbs to about 93% while the validation accuracy reaches about 87%; the widening gap between training and validation accuracy suggests that this model begins to overfit, even though its validation accuracy ends up higher than model1's.
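
As a final side-by-side check (a minimal sketch assuming both models and the test matrices are still in memory), we can evaluate both Keras models on the same held-out data:

# Sketch: evaluate both Keras models on the same test set
res1 <- unlist(model1 %>% evaluate(test_x, test_y, verbose = 0))
res2 <- unlist(model2 %>% evaluate(test_x, test_y, verbose = 0))
rbind(model1 = res1, model2 = res2)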