1 Background

This is a learn-by-building project that classifies whether kyphosis is present or absent using a neural network model.

Kyphosis is an exaggerated, forward rounding of the back. It can occur at any age but is most common in older women.

Age-related kyphosis is often due to weakness in the spinal bones that causes them to compress or crack. Other types of kyphosis can appear in infants or teens due to malformation of the spine or wedging of the spinal bones over time.

Mild kyphosis causes few problems. Severe kyphosis can cause pain and be disfiguring. Treatment for kyphosis depends on your age, and the cause and effects of the curvature.

Source: Mayo Clinic, https://www.mayoclinic.org/diseases-conditions/kyphosis/symptoms-causes/syc-20374205

2 Source of Dataset

The analysis uses the dataset from https://www.kaggle.com/abbasit/kyphosis-dataset.

The data will be split into a training dataset and a testing dataset.

The training dataset will be used to build the machine learning models.

The testing dataset will be used to see how well the models perform on unseen data.

3 Library Initialization

library(tidyverse)
library(keras)
library(caret)
library(ggplot2)

4 Importing Dataset

First, we will read the kyphosis dataset.

df <- read.csv("kyphosis.csv")
str(df)

## 'data.frame':    81 obs. of  4 variables:
##  $ Kyphosis: Factor w/ 2 levels "absent","present": 1 1 2 1 1 1 1 1 1 2 ...
##  $ Age     : int  71 158 128 2 1 1 61 37 113 59 ...
##  $ Number  : int  3 3 4 5 4 2 2 3 2 6 ...
##  $ Start   : int  5 14 5 1 15 16 17 16 16 12 ...

head(df)

5 Data Wrangling

The model will be developed using variables as follows:

Target variable: Kyphosis (“1” = present, “0” = absent)
Predictor variables: Age, Number, Start

df$Kyphosis <- ifelse(df$Kyphosis == "present", "1", "0")
df$Kyphosis <- as.factor(df$Kyphosis)
df

We will use the df data frame for the train/test split that follows.

6 Cross Validation

First, we create the index used to split the data into training and testing datasets. We use 70% of the data for training and the remaining 30% for testing. Because createDataPartition() samples at random and no seed is set before it, the exact split can differ between runs.

index <- createDataPartition(df$Kyphosis, p=0.7, list=FALSE)
train_m <- df[index,]
test_m <- df[-index,]
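createDataPartition() samples within each class, so both subsets keep roughly the same share of "0" and "1" cases. The base R sketch below imitates that idea on made-up labels and fixes a seed first so the split is reproducible. The helper name stratified_index and the toy data are illustrative assumptions; this is not how caret implements the function.

```r
# Illustrative base R stand-in for a stratified 70/30 split.
# `stratified_index` and the toy labels are hypothetical.
stratified_index <- function(y, p = 0.7) {
  idx <- unlist(lapply(split(seq_along(y), y), function(rows) {
    n_take <- round(p * length(rows))          # rows to keep from this class
    rows[sample.int(length(rows), n_take)]
  }))
  sort(unname(idx))
}

set.seed(100)  # fix the seed so the same split is drawn every time
toy_y <- factor(c(rep("0", 40), rep("1", 10)))
toy_index <- stratified_index(toy_y, p = 0.7)

# Class shares in the full data and in the training part
prop.table(table(toy_y))
prop.table(table(toy_y[toy_index]))
```

With these toy labels, both tables show the same 80/20 class shares, which is the property a stratified split is meant to preserve.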

6.1 Prepare Predictors & Target

We will separate the predictors and the target in our train_m and test_m data.

# Predictor variables in `train_m`
train_x <- train_m[, -1]

# Predictor variables in `test_m`
test_x <- test_m[, -1]

# Target variable in `train_m`
train_y <- train_m[, 1]

# Target variable in `test_m`
test_y <- test_m[, 1]

6.2 Feature Scaling

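We will standardize the predictors using the scale() function. The target is not scaled. Note that train_x and test_x are each scaled with their own means and standard deviations.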

train_x.keras <- scale(train_x)
test_x.keras <- scale(test_x)
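Calling scale() separately on the two sets means the same raw value can map to different scaled values in training and testing. A common alternative, sketched below on made-up numbers, is to reuse the training centre and scale for the test data. The toy_* objects are assumptions for illustration, and this variant was not used to produce the results reported in this analysis.

```r
# Toy predictors standing in for train_x / test_x (hypothetical values)
toy_train <- data.frame(Age = c(10, 50, 90, 130), Number = c(2, 3, 4, 7))
toy_test  <- data.frame(Age = c(30, 110),         Number = c(3, 5))

# Standardize the training set; scale() stores the statistics as attributes
toy_train_scaled <- scale(toy_train)
train_center <- attr(toy_train_scaled, "scaled:center")
train_scale  <- attr(toy_train_scaled, "scaled:scale")

# Apply the training statistics to the test set
toy_test_scaled <- scale(toy_test, center = train_center, scale = train_scale)
toy_test_scaled
```

This way the network sees test inputs on the same scale it was trained on.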

To prepare the data for training, we apply one-hot encoding to the target variable (train_y) using the to_categorical() function from Keras and store the result as the train_y.keras object.

train_y.keras <- to_categorical(train_y)
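One-hot encoding turns each label into a row with a 1 in the column of its class and 0 elsewhere. As a rough base R illustration of the layout that to_categorical() is expected to return for a 0/1 target (the toy_labels vector is made up):

```r
# Hypothetical 0/1 labels: 0 = absent, 1 = present
toy_labels <- c(0, 1, 0, 0, 1)

# Row i of the 2 x 2 identity matrix is the one-hot vector for class i - 1
one_hot <- diag(2)[toy_labels + 1, ]
colnames(one_hot) <- c("absent", "present")
one_hot
```

Since train_y is a factor, converting it with as.numeric(as.character(train_y)) before encoding is one way to make the 0/1 coding explicit.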

7 Building Neural Network Model

We’ll use keras_model_sequential() to initialize the model. Then, we’ll define dense layers using the popular relu activation function.

We’ll add the output layer with the sigmoid activation function.

We’ll compile the model using the binary_crossentropy loss function because we are solving a binary classification problem.
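For reference, binary cross-entropy penalizes a predicted probability p for a true label y by -(y log p + (1 - y) log(1 - p)), averaged over the outputs. A small base R sketch of that formula on made-up probabilities follows; bce is an illustrative helper, not a Keras function.

```r
# Illustrative helper: mean binary cross-entropy for labels y and probabilities p
bce <- function(y, p, eps = 1e-7) {
  p <- pmin(pmax(p, eps), 1 - eps)  # clip to avoid log(0)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# One-hot target for a "present" case: (absent = 0, present = 1)
y_true <- c(0, 1)

bce(y_true, c(0.1, 0.9))  # confident and correct: small loss
bce(y_true, c(0.9, 0.1))  # confident and wrong: large loss
```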

We’ll use the adam optimizer for gradient descent and use accuracy for the metrics.

We’ll fit our model to the training dataset, holding out part of it for validation.

Our first model will run for 50 epochs using a batch size of 10, a learning rate of 0.001, and a 30% validation split.

Before building the architecture, we will set a seed and a seeded initializer so that the results are reproducible.

Our first model, stored as model_first, defines the nodes as follows:
- the first layer contains 3 nodes, relu activation function
- the second layer contains 3 nodes, relu activation function
- the third layer contains 2 nodes, sigmoid activation function

set.seed(100)
initializer <- initializer_random_normal(seed = 100)

model_first <- keras_model_sequential() 

model_first %>% 
  layer_dense(units = 3, activation = 'relu', input_shape = ncol(train_x),
              kernel_initializer = initializer, bias_initializer = initializer) %>% 
  
  layer_dense(units = 3, activation = 'relu',
              kernel_initializer = initializer, bias_initializer = initializer) %>%
  
  layer_dense(units = 2, activation = 'sigmoid',
              kernel_initializer = initializer, bias_initializer = initializer)

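# compile() configures the model in place; it does not return a training history
model_first %>% compile(
  loss = 'binary_crossentropy',
  optimizer = optimizer_adam(lr = 0.001),  # newer keras releases call this argument learning_rate
  metrics = c('accuracy')
)

# fit() returns the training history, so it is stored here
history <- model_first %>% fit(
  train_x.keras, train_y.keras,
  epochs = 50,
  batch_size = 10,
  validation_split = 0.3
)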

Now we take a look at our first model’s summary.

summary(model_first)

## Model: "sequential"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## dense (Dense)                       (None, 3)                       12          
## ________________________________________________________________________________
## dense_1 (Dense)                     (None, 3)                       12          
## ________________________________________________________________________________
## dense_2 (Dense)                     (None, 2)                       8           
## ================================================================================
## Total params: 32
## Trainable params: 32
## Non-trainable params: 0
## ________________________________________________________________________________
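The parameter counts in the summary follow from the layer sizes: a dense layer has one weight per input-unit pair plus one bias per unit. A quick base R check of that arithmetic (dense_params is an illustrative helper, not part of Keras):

```r
# Parameters of a dense layer = inputs * units (weights) + units (biases)
dense_params <- function(n_inputs, n_units) n_inputs * n_units + n_units

# model_first: 3 predictors -> 3 -> 3 -> 2
layer_inputs <- c(3, 3, 3)
layer_units  <- c(3, 3, 2)
params <- dense_params(layer_inputs, layer_units)
params       # 12 12 8
sum(params)  # 32
```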

8 Model Evaluation

We will check the loss and accuracy on the training data.

model_first %>% evaluate(train_x.keras, train_y.keras)

## $loss
## [1] 0.5720072
## 
## $accuracy
## [1] 0.7894737

We will move forward to make predictions using the predict_classes Keras function.

predictions <- model_first %>% predict_classes(test_x.keras)
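predict_classes() returns, for each row, the zero-based index of the output unit with the highest probability. The same step can be sketched in base R from a matrix of predicted probabilities. The toy_probs matrix below is made up; on a real model it would come from something like predict(model_first, test_x.keras), which may be the route to take on Keras versions where predict_classes() is unavailable.

```r
# Hypothetical predicted probabilities: column 1 = class 0, column 2 = class 1
toy_probs <- matrix(c(0.80, 0.30,
                      0.55, 0.40,
                      0.20, 0.70),
                    ncol = 2, byrow = TRUE)

# Index of the largest value per row, shifted to zero-based class labels
toy_classes <- max.col(toy_probs, ties.method = "first") - 1
toy_classes  # 0 0 1
```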

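We will show the confusion matrix, which tells us how many predictions are correct. The accuracy is 79.17%, but this equals the no-information rate: the model predicts “0” (absent) for all 24 test observations, so the sensitivity is 0.

We will later compare this with a second model obtained by tuning several parameters.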

confusionMatrix(as.factor(predictions), as.factor(test_y), positive = "1")

## Warning in confusionMatrix.default(as.factor(predictions), as.factor(test_y), :
## Levels are not in the same order for reference and data. Refactoring data to
## match.

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 19  5
##          1  0  0
##                                           
##                Accuracy : 0.7917          
##                  95% CI : (0.5785, 0.9287)
##     No Information Rate : 0.7917          
##     P-Value [Acc > NIR] : 0.61676         
##                                           
##                   Kappa : 0               
##                                           
##  Mcnemar's Test P-Value : 0.07364         
##                                           
##             Sensitivity : 0.0000          
##             Specificity : 1.0000          
##          Pos Pred Value :    NaN          
##          Neg Pred Value : 0.7917          
##              Prevalence : 0.2083          
##          Detection Rate : 0.0000          
##    Detection Prevalence : 0.0000          
##       Balanced Accuracy : 0.5000          
##                                           
##        'Positive' Class : 1               
##
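The headline statistics can be recomputed directly from the four cells of the confusion matrix, which makes it easier to see why the accuracy equals the no-information rate here. A base R sketch using the counts printed above (variable names are illustrative):

```r
# Cells of the confusion matrix above, with "1" (present) as the positive class
tn <- 19  # predicted 0, actual 0
fn <- 5   # predicted 0, actual 1
fp <- 0   # predicted 1, actual 0
tp <- 0   # predicted 1, actual 1
n  <- tn + fn + fp + tp

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
no_info     <- max(tp + fn, tn + fp) / n  # share of the larger class

# Cohen's kappa: agreement beyond what chance alone would give
p_chance    <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
cohen_kappa <- (accuracy - p_chance) / (1 - p_chance)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, no_info = no_info, kappa = cohen_kappa), 4)
```

Every test case is predicted as "0", so the accuracy is simply the share of absent cases and kappa is 0.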

9 Model Tuning

We will tune the first model and store the result as model_tuning, adding one hidden layer and using 6 nodes in each hidden layer, as follows:
- the first layer contains 6 nodes, relu activation function
- the second layer contains 6 nodes, relu activation function
- the third layer contains 6 nodes, relu activation function
- the fourth layer contains 2 nodes, sigmoid activation function

Our tuning model will run on 100 epochs using a batch size of 5, learning rate 0.001, and a 30% validation split.

model_tuning <- keras_model_sequential() 

model_tuning %>% 
  layer_dense(units = 6, activation = 'relu', input_shape = ncol(train_x),
              kernel_initializer = initializer, bias_initializer = initializer) %>% 
  
  layer_dense(units = 6, activation = 'relu',
              kernel_initializer = initializer, bias_initializer = initializer) %>%
  
  layer_dense(units = 6, activation = 'relu',
              kernel_initializer = initializer, bias_initializer = initializer) %>%
  
  layer_dense(units = 2, activation = 'sigmoid',
              kernel_initializer = initializer, bias_initializer = initializer)

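# compile() configures the model in place; it does not return a training history
model_tuning %>% compile(
  loss = 'binary_crossentropy',
  optimizer = optimizer_adam(lr = 0.001),  # newer keras releases call this argument learning_rate
  metrics = c('accuracy')
)

# fit() returns the training history, so it is stored here
history <- model_tuning %>% fit(
  train_x.keras, train_y.keras,
  epochs = 100,
  batch_size = 5,
  validation_split = 0.3
)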

summary(model_tuning)

## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## dense_3 (Dense)                     (None, 6)                       24          
## ________________________________________________________________________________
## dense_4 (Dense)                     (None, 6)                       42          
## ________________________________________________________________________________
## dense_5 (Dense)                     (None, 6)                       42          
## ________________________________________________________________________________
## dense_6 (Dense)                     (None, 2)                       14          
## ================================================================================
## Total params: 122
## Trainable params: 122
## Non-trainable params: 0
## ________________________________________________________________________________

predictions_tuning <- model_tuning %>% predict_classes(test_x.keras)

confusionMatrix(as.factor(predictions_tuning), as.factor(test_y), positive = "1")

## Warning in confusionMatrix.default(as.factor(predictions_tuning),
## as.factor(test_y), : Levels are not in the same order for reference and data.
## Refactoring data to match.

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 19  5
##          1  0  0
##                                           
##                Accuracy : 0.7917          
##                  95% CI : (0.5785, 0.9287)
##     No Information Rate : 0.7917          
##     P-Value [Acc > NIR] : 0.61676         
##                                           
##                   Kappa : 0               
##                                           
##  Mcnemar's Test P-Value : 0.07364         
##                                           
##             Sensitivity : 0.0000          
##             Specificity : 1.0000          
##          Pos Pred Value :    NaN          
##          Neg Pred Value : 0.7917          
##              Prevalence : 0.2083          
##          Detection Rate : 0.0000          
##    Detection Prevalence : 0.0000          
##       Balanced Accuracy : 0.5000          
##                                           
##        'Positive' Class : 1               
##

10 Summary

A neural network topology with more layers offers more opportunity for the network to extract key features and recombine them in useful nonlinear ways.

Nevertheless, because the dataset is small, with only 81 observations of 4 variables including the Kyphosis label, we cannot see any improvement from tuning the model. The test accuracy stays at 79.17%.

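For the first model, the training loss is lower than the testing loss, while the training accuracy (78.95%) is practically the same as the testing accuracy (79.17%).

For the second model, we observe that the training loss is lower than for the first model, while the training accuracy is the same as for the first model.

Both models predict “0” (absent) for every test observation, so neither has learned to detect the minority class. A falling training loss without any gain on the test data could point to overfitting, which is plausible given the small dataset, but the confusion matrices are also consistent with the models simply defaulting to the majority class.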

Convolutional Neural Network Model Analysis - Kyphosis Dataset

Laurensius Wiwiek Winarta

2020-03-26