Begin Your R Session by Installing the keras Package

Keras is a high-level neural network application programming interface (API) for deep learning. In R, the keras package uses Google's TensorFlow as its backend.

######################### download and install the keras package ###########################
library(keras)
#install_keras()

# After running the install_keras function above, you should be provided with the following instructions
# to paste into your Terminal (Mac) or an equivalent terminal program:

# $ sudo /usr/bin/easy_install pip
# $ sudo /usr/local/bin/pip install --upgrade virtualenv

# [copy text after hashes only and place into Mac terminal]

Read in the Example Dataset

At this point we can use the following example dataset, the Cardiotocographic.csv file. The original UCI dataset has 23 attributes; the CSV used here contains 22 variables (21 predictors plus the NSP outcome). Details can be found here:

https://archive.ics.uci.edu/ml/datasets/cardiotocography

The rows in the dataset correspond to 2126 fetal cardiotocograms (2126 observations). The first 21 columns contain the automated output of the diagnostic tools used to assess cardiac patterns in utero. The final column, NSP, pertains to the consensus classification of the heart trace as normal (N), suspect (S), or pathologic (P) by a group of three expert clinicians.

#################################### read in the data and normalise ######################################
setwd("/Users/matthewcourtney/Desktop/Wu.Course/Deep Learning")
data <- read.csv("Cardiotocographic.csv", header=TRUE)

# change the data to matrix
data<-as.matrix(data)

# remove the dimension names for simplicity
dimnames(data) <- NULL    # column names are removed; columns are now referred to by position (1 to 22)

# Now we can normalise the independent variables (using keras::normalize function)
data[,1:21] <- keras::normalize(data[,1:21])

# Define the last variable, NSP, as numeric
data[,22] <- as.numeric(data[,22])-1   # the minus 1 ensures values become 0,1,2

# summarise to check
summary(data)

Using the 21 independent variables (the first 21 columns), we want to predict the dependent variable (column 22): normal, suspect, or pathologic, originally coded 1, 2, 3 (now 0, 1, 2 after the recoding above).
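Before building the model, it can be helpful to check how the observations are spread across the three outcome classes (a quick check in base R, using the data object created above):

# frequency of each NSP class (0 = normal, 1 = suspect, 2 = pathologic)
table(data[, 22])
# the same counts expressed as proportions
round(prop.table(table(data[, 22])), 3)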

To do this, we create a deep learning model based on a multilayer perceptron, a neural network that includes at least three layers of nodes.

Undertake Deep Learning Analysis with the keras Package

Here we undertake the analysis of the data. First we need to partition the dataset into a training set and a test set. We can do a 70:30 split and use 70% of the observations for the training set…

#################################### Undertake analysis ######################################
# set seed so the dataset can be split in exactly the same way each time
set.seed(1234)
# use the sample function to create a sample vector
ind <- sample(2, nrow(data), replace=TRUE, prob = c(0.7,0.3))

# 2 means each draw comes from the values 1 and 2 (the two partitions)
# the sample vector is nrow(data) long, i.e. one value for each of the 2126 rows
# replace = TRUE means the values 1 and 2 can be drawn repeatedly (one draw per row)
# prob gives the sampling weights, so roughly 70% of rows are assigned a 1 and 30% a 2

# apply the vector to partition the dataset
training <- data[ind==1, 1:21] # includes all independent variables
test <- data[ind==2, 1:21]     # includes all independent variables

# also don't forget to identify the target variable (one for the training data and one for the test data)
trainingtarget <- data[ind==1, 22]  # includes the dependent variable NSP for the training data
testtarget <- data[ind==2, 22]      # includes the dependent variable NSP for the test data
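As a quick sanity check on the split (optional; it uses the objects just created), we can confirm that the two partitions have roughly the intended sizes:

# number of rows in each partition (roughly 70% and 30% of 2126)
nrow(training)
nrow(test)
# proportion of rows assigned to each partition by the sample vector
prop.table(table(ind))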

One-Hot Encoding

We also need to do what is called 'one-hot encoding', which is basically redefining the dependent variable as a set of dummy variables (one 0/1 column per class).

#################################### One-hot encode the target variable ######################################
# create the categorical variables
trainLables <- keras::to_categorical(trainingtarget)
testLables <-  keras::to_categorical(testtarget)
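To see what the encoding produces, here is a small illustration with a toy vector (not part of the analysis itself):

# a toy target vector containing the classes 0, 1, and 2
keras::to_categorical(c(0, 1, 2, 1))
# each row is one observation; the column matching its class is 1 and all other columns are 0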

Now the data are prepared for prediction modelling…

Create the Model

Now let’s move on to the next step to create the model…

########################## Store the model information in 'model' ############################
# create the first model design
model <- keras_model_sequential()
# a keras_model_sequential model consists of a linear stack of layers, added one after another

# now we use the pipe operator (%>%) to pass information from left to right, i.e., to add additional layers to 'model'
model %>%
  layer_dense(units=8, activation = 'relu', input_shape = 21) %>%     # hidden layer; input_shape = number of independent variables
  layer_dense(units=3, activation = 'softmax')                        # output layer; one node per NSP class

# layer_dense means the layers are fully connected
# we start with a small number of nodes (neurons) in the hidden layer
# relu: rectified linear unit (a popular activation function)
# input_shape is the number of independent variables, 21 in this case
# the softmax activation in the output layer converts the outputs into class probabilities between 0 and 1 that sum to 1
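For intuition, the two activation functions can be written out in plain base R (illustrative only; Keras applies them internally):

# relu returns the input where it is positive and 0 otherwise
relu <- function(x) pmax(0, x)
# softmax converts a vector of scores into probabilities that sum to 1
softmax <- function(x) exp(x) / sum(exp(x))

relu(c(-2, 0, 3))       # 0 0 3
softmax(c(1, 2, 3))     # roughly 0.09, 0.24, 0.67 (sums to 1)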

Requesting a summary of the model setup produces the output shown below.
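This is done with the summary function (the keras package provides a method for model objects):

# print layer types, output shapes, and parameter counts
summary(model)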


_________________________________________________________________
Layer (type)                  Output Shape               Param #
=================================================================
dense_3 (Dense)               (None, 8)                  176
_________________________________________________________________
dense_4 (Dense)               (None, 3)                  27
=================================================================
Total params: 203
Trainable params: 203
Non-trainable params: 0
_________________________________________________________________

First (hidden) layer

We note 176 parameters for the first hidden layer, which has 8 nodes (a small number we chose). With 21 independent variables, there are 21 × 8 = 168 weights. Adding one bias (constant) term for each of the 8 nodes gives 176 parameters in total.

Output layer

We note 27 parameters. The 3 output nodes (one per NSP class) are each connected to the 8 hidden nodes, so 8 × 3 = 24 weights. Adding one bias term for each of the 3 output nodes gives 27 parameters for the output layer.

This gives a total of 203 parameters for the model we are going to train…
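The same arithmetic can be checked directly in R:

# hidden layer: 21 inputs to each of 8 nodes, plus 8 bias terms
21 * 8 + 8               # 176
# output layer: 8 hidden nodes to each of 3 output nodes, plus 3 bias terms
8 * 3 + 3                # 27
# total number of parameters
21 * 8 + 8 + 8 * 3 + 3   # 203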

Compile Model to Configure the Learning Process

Here we configure the model for the learning process. Training will then use the training partition, which represents approximately 70% of the observations.

########################## Configure the model for the learning process ############################
model %>% keras::compile(loss='categorical_crossentropy',
                         optimizer='adam',
                         metrics='accuracy')

# categorical_crossentropy is the loss used when the outcome is categorical with more than two classes (3 here: NSP)
# for a binary outcome we would use binary_crossentropy
# adam is a commonly used optimiser
# accuracy is the proportion of observations whose predicted class matches the observed class; this is the metric reported during training
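As a rough illustration (a sketch, not part of the keras workflow), categorical cross-entropy for a single observation is minus the log of the probability the model assigns to the true class:

# one-hot encoded true class (the observation belongs to class 2 of 3)
y_true <- c(0, 1, 0)
# class probabilities predicted by the softmax output layer (hypothetical values)
y_pred <- c(0.2, 0.7, 0.1)
# categorical cross-entropy for this observation: -log(0.7), roughly 0.357
-sum(y_true * log(y_pred))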

Fit the Model

Now that the model is set up and configured correctly for training, we can fit it to the data.

########################################### Fit the model ###########################################
history <- model%>%
  fit(training, # this is the input, the first 21 independent variables
      trainLables,
      epochs=200,
      batch_size=32,
      validation_split = 0.2)

# here we fit the model we created to the training data (training)
# with the dependent variable one-hot encoded into three columns (trainLables)
# epochs = 200 means the model passes through the training data 200 times
# batch_size = 32 means the weights are updated after each batch of 32 samples
# validation_split = 0.2 holds out 20% of the training data to monitor validation loss and accuracy
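The history plots shown below can be reproduced with the plot method that keras provides for the history object (a ggplot2 version is used automatically if ggplot2 is installed):

# plot loss and accuracy across epochs for the training data and the validation split
plot(history)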

We get the following results:

[Figure: loss (top) and accuracy (bottom) across epochs for the training data and the 20% validation split]

We note that after about 30 epochs, the accuracy starts to improve. The improvement seems to taper off by about epoch 200, though.

In the figure at the top, we note the loss for the training data (blue) and the loss for the validation split (20%). These run in parallel, which is a good sign: if the validation loss were to start increasing, that would indicate we are overfitting the model.

For the lower graph, we note that after about 120 iterations, the loss and accuracy stabilize.

We note in the output that the accuracy (blue line, bottom graph) really starts to improve beyond 100 iterations. It starts at around 77% and reaches about 86% over the last few iterations. Note that the accuracy here is based on 80% of the training partition: 1523 × 0.80 ≈ 1218 observations.

Evaluate the Model with Test Data

This is the final step of the basic workflow: evaluating the trained model on the held-out test data. To run it, follow the code below…

########################################### Evaluate the model ###########################################
model%>%
  keras::evaluate(test,testLables) # we simply use the 30% 'test' data, and labels (3 categoric output nodes)

We note the output as follows:

603/603 [==============================] - 0s 13us/step
$loss
[1] 0.427521

$acc
[1] 0.8656716

Therefore, our trained model is about 87% accurate on the test data.

Look at the overall performance (confusion matrix)

Here we can see how well the model predicts each outcome category…

########################################### Confusion matrix ###########################################
prob<-model%>%
  predict_proba(test)
  
pred<-model%>%
  predict_classes(test)

table(Predicted = pred, Actual=testtarget)
##          Actual
## Predicted   0   1   2
##         0 432  37  19
##         1  25  55   8
##         2   3   2  22

In the results above, the diagonal cells of the table show the test observations that were correctly classified; the off-diagonal cells show the misclassifications.
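The overall test accuracy can also be recovered from the confusion matrix by dividing the sum of the diagonal by the table total (a quick check using the objects created above):

# proportion of test observations classified correctly
tab <- table(Predicted = pred, Actual = testtarget)
sum(diag(tab)) / sum(tab)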

Make Improvements to the Model

We can change a few parameters to see how much improvement can be made…

########################################### Improve the model ###########################################
# See the ** for changes
rm(list=ls())
setwd("/Users/matthewcourtney/Desktop/Wu.Course/Deep Learning")
data <- read.csv("Cardiotocographic.csv", header=TRUE)

# change the data to matrix
data<-as.matrix(data)

# remove the dimension names for simplicity
dimnames(data) <- NULL    # column names are removed; columns are now referred to by position (1 to 22)

# Now we can normalise the independent variables (using keras::normalize function)
data[,1:21] <- keras::normalize(data[,1:21])

# Define the last variable, NSP, as numeric
data[,22] <- as.numeric(data[,22])-1   # the minus 1 ensures values become 0,1,2

# set seed so the dataset can be split in exactly the same way each time
set.seed(1234)
# use the sample function to create a sample vector
ind <- sample(2, nrow(data), replace=TRUE, prob = c(0.7,0.3))

# 2 means each draw comes from the values 1 and 2 (the two partitions)
# the sample vector is nrow(data) long, i.e. one value for each of the 2126 rows
# replace = TRUE means the values 1 and 2 can be drawn repeatedly (one draw per row)
# prob gives the sampling weights, so roughly 70% of rows are assigned a 1 and 30% a 2

# apply the vector to partition the dataset
training <- data[ind==1, 1:21] # includes all independent variables
test <- data[ind==2, 1:21]     # includes all independent variables

# also don't forget to identify the target variable (we can use one for training and test)
trainingtarget <- data[ind==1, 22]  # includes the dependent variable NSP for the training data
testtarget <- data[ind==2, 22]      # includes the dependent variable NSP for the test data

# create the categorical variables
trainLables <- keras::to_categorical(trainingtarget)
testLables <-  keras::to_categorical(testtarget)

# create the first model design
model1 <- keras_model_sequential()
# a keras_model_sequential model consists of a linear stack of layers, added one after another

# now we use the pipe operator (%>%) to pass information from left to right, i.e., to add additional layers to 'model1'
model1 %>%
  layer_dense(units=50, activation = 'relu', input_shape = 21) %>%    #** increase the number of hidden nodes to 50
  layer_dense(units=3, activation = 'softmax')                        # output layer; one node per NSP class

model1 %>% keras::compile(loss='categorical_crossentropy',
                         optimizer='adam',
                         metrics='accuracy')

history <- model1%>%
  fit(training, # this is the input, the first 21 independent variables
      trainLables,
      epochs=200,
      batch_size=32,
      validation_split = 0.2)

model1%>%
  keras::evaluate(test,testLables) # we simply use the 30% 'test' data, and labels (3 categoric output nodes)
## $loss
## [1] 0.4064881
## 
## $acc
## [1] 0.8507463
prob<-model1%>%
  predict_proba(test)
  
pred<-model1%>%
  predict_classes(test)

table1<-table(Predicted = pred, Actual=testtarget)

print(table1)
##          Actual
## Predicted   0   1   2
##         0 419  32   6
##         1  32  55   4
##         2   9   7  39

Results are:

$loss
[1] 0.39444

$acc
[1] 0.86733

This shows only a slight improvement in accuracy: with 8 nodes in the hidden layer the test accuracy was 0.8656716, and with 50 nodes it is 0.86733. (Exact figures vary slightly between runs because the network weights are initialised randomly.)