Keras is a high-level neural network application programming interface (API) for deep learning. In R, Keras uses Google's TensorFlow as its backend.
######################### download and install the keras package ###########################
library(keras)
#install_keras()
# After running the install_keras() function above, you may be given instructions like the following
# to run in your terminal (Mac) or an equivalent terminal program:
# $ sudo /usr/bin/easy_install pip
# $ sudo /usr/local/bin/pip install --upgrade virtualenv
# [copy text after hashes only and place into Mac terminal]
At this point we can use the following example dataset, the Cardiotocographic.csv file. The dataset is multivariate: the version used here contains 21 cardiotocographic measurement variables plus a class variable, NSP. Details can be found here:
https://archive.ics.uci.edu/ml/datasets/cardiotocography
The rows in the dataset correspond to 2126 fetal cardiotocograms (2126 observations). The first 21 columns contain the automated output of diagnostic tools used to assess cardiac patterns in utero. The final (22nd) column, NSP, contains the consensus classification of each record as normal (N), suspect (S), or pathologic (P), made by a group of three expert clinicians.
#################################### read in the data and normalise ######################################
setwd("/Users/matthewcourtney/Desktop/Wu.Course/Deep Learning")
data <- read.csv("Cardiotocographic.csv", header=TRUE)
# change the data to matrix
data<-as.matrix(data)
# remove the dimension names for simplicity
dimnames(data) <- NULL # columns are now referenced by index only (1, 2, ..., 22)
# Now we can normalise the independent variables (using keras::normalize function)
data[,1:21] <- keras::normalize(data[,1:21])
# Define the last variable, NSP, as numeric
data[,22] <- as.numeric(data[,22])-1 # the minus 1 ensures values become 0,1,2
# summarise to check
summary(data)
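Since NSP has just been recoded to 0, 1, and 2, we can also tabulate column 22 to see how many observations fall into each class. This is a small additional check, not part of the original script:
# counts per class: 0 = normal, 1 = suspect, 2 = pathologic
table(data[,22])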
Using the 21 independent variables (the first 21 columns), we want to predict the dependent variable (column 22): normal, suspect, or pathologic, coded 1, 2, 3 in the raw file (0, 1, 2 after the recoding above).
To do this, we create a deep learning model based on a multi-layer perceptron neural network, which includes at least three layers of nodes.
Now we undertake the analysis of the data. First we need to partition the dataset into a training set and a test set. We can do a 70:30 split and use 70% of the observations for the training set…
#################################### Undertake analysis ######################################
# set seed so that the dataset can be split in exactly the same way each time
set.seed(1234)
# use the sample function to create a sample vector
ind <- sample(2, nrow(data), replace=TRUE, prob = c(0.7,0.3))
# 2 is the number of partitions; each row is assigned a value drawn from {1, 2}
# the sample vector is nrow(data) long, aligning with the 2126 rows in data
# replace = TRUE means the values 1 and 2 can be drawn repeatedly (one draw per row)
# prob gives the sampling weights, so roughly 70% of rows are assigned 1 (training) and 30% are assigned 2 (test)
# apply the vector to partition the dataset
training <- data[ind==1, 1:21] # includes all independent variables
test <- data[ind==2, 1:21] # includes all independent variables
# also don't forget to extract the target variable (one vector for training and one for test)
trainingtarget <- data[ind==1, 22] # includes the dependent variable NSP for the training data
testtarget <- data[ind==2, 22] # includes the dependent variable NSP for the test data
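As a quick additional check (not part of the original script), we can tabulate the indicator vector to confirm that the split is roughly 70:30; with this seed the counts should match those referenced later in the text (1523 training and 603 test observations):
# counts of 1s (training) and 2s (test), and their proportions
table(ind)
prop.table(table(ind))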
We also need to do what is called ‘one-hot encoding’, which is basically redefining the dependent variable as a set of dummy variables.
#################################### One-hot encode the target variable ######################################
# create the categorical variables
trainLables <- keras::to_categorical(trainingtarget)
testLables <- keras::to_categorical(testtarget)
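To see what to_categorical() does, here is a minimal toy example; each value in the vector becomes a row with a 1 in the column corresponding to its class (expected output shown with ##):
keras::to_categorical(c(0, 1, 2, 1))
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
## [4,]    0    1    0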
Now we have prepared our data for prediction modelling. Let's move on to the next step and create the model…
########################## Store the model information in 'model' ############################
# create the first model design
model <- keras_model_sequential()
# the keras_model_sequential consists of a linear stack of layers (in some sequential linear order)
# now we use the pipe operator (%>%) to pass info from left to right, i.e., add additional layers to 'model'
model %>%
layer_dense(units=8, activation = 'relu', input_shape = 21) %>% # this is for independent variables
layer_dense(units=3, activation = 'softmax') # this is for dependent variable
# the layer_dense means that the neural network is fully connected
# we start with a small number of nodes or neurons in the hidden layer
# relu: rectified linear units (popular method)
# input_shape is the number of independent variables, 21 in this case
# softmax activation function in the output layer helps to keep range between 0 and 1 for probabilities
We can now request a summary of the model we have set up; the output is shown below.
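The summary is obtained with the standard summary() call on the model object:
summary(model)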
_________________________________________________________________________________________________________
Layer (type)                                   Output Shape                              Param #
=========================================================================================================
dense_3 (Dense)                                (None, 8)                                 176
_________________________________________________________________________________________________________
dense_4 (Dense)                                (None, 3)                                 27
=========================================================================================================
Total params: 203
Trainable params: 203
Non-trainable params: 0
_________________________________________________________________________________________________________
We note 176 parameters for the first hidden layer, which has 8 nodes (a small number we chose). With 21 independent variables, 21 x 8 = 168 weights; adding a constant (bias) for each of the 8 nodes gives 176 parameters in total.
For the output layer we note 27 parameters. There are 3 output categories, each connected to the 8 hidden nodes, so 8 x 3 = 24 weights; adding a constant for each of the 3 output nodes gives 27 parameters.
This gives us a total of 203 parameters for the model that we are going to train…
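The same arithmetic can be checked directly in R (a trivial sketch of the parameter counts just described):
# hidden layer: 21 inputs x 8 nodes, plus 8 bias terms
21 * 8 + 8                  ## 176
# output layer: 8 hidden nodes x 3 classes, plus 3 bias terms
8 * 3 + 3                   ## 27
# total parameters
(21 * 8 + 8) + (8 * 3 + 3)  ## 203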
Here we configure the model for the learning step; it will be trained on the training data, which represents approximately 70% of the observations.
########################## Configure the model for the learning process ############################
model %>% keras::compile(loss='categorical_crossentropy',
optimizer='adam',
metrics='accuracy')
# categorical_crossentropy is used when we have a categorical outcome (3 categories here: NSP)
# for a binary outcome, we can use binary_crossentropy
# adam is a commonly used optimiser
# accuracy measures how often the predicted class matches the observed class; this is the metric we monitor
Now that the model is set up and configured correctly for training, we can fit it to the data.
########################################### Fit the model ###########################################
history <- model%>%
fit(training, # this is the input, the first 21 independent variables
trainLables,
epochs=200,
batch_size=32,
validation_split = 0.2)
# here we use the model we created to fit the training data (training)
# to fit the dependent variables (3 dummy coded), trainLabels
# and run the model for 200 epochs
# we use a batch size of 32, i.e., the number of samples per gradient update
# use 20% of the data for the validation split
We get the following results:
[Figure: results for the training partition — loss (top panel) and accuracy (bottom panel) over the training epochs, for both the training data and the 20% validation split]
We note that after about 30 epochs the accuracy starts to improve, although the improvement begins to taper off by around epoch 200.
In the figure at the top, we note the loss for the training data (blue). We also note the loss for the validation split (the 20% held out from the training data). These run in parallel, which is a good sign: if the validation loss were instead to increase while the training loss kept falling, that would indicate that we were overfitting the model.
For the lower graph, we note that after about 120 epochs the loss and accuracy stabilize.
We also note that the accuracy (blue line, bottom graph) really starts to improve beyond 100 epochs: it starts at around 77% and reaches about 86% over the last few epochs. Note that this accuracy is based on 80% of the training data, which totals 1523 × 0.80 ≈ 1218 observations.
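The loss and accuracy curves described above can be reproduced by plotting the history object returned by fit():
# plots training and validation loss (top) and accuracy (bottom) by epoch
plot(history)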
The final step is to evaluate the trained model on the held-out test data…
########################################### Evaluate the model ###########################################
model%>%
keras::evaluate(test,testLables) # we simply use the 30% 'test' data, and labels (3 categoric output nodes)
We note the output as follows:
603/603 [==============================] - 0s 13us/step
$loss
[1] 0.427521
$acc
[1] 0.8656716
Therefore, our trained model is 87% accurate for our test data.
Here we can see how well the model functions to predict the outcome…
########################################### Predict classes for the test data ###########################################
prob<-model%>%
predict_proba(test) # predicted class probabilities for each test observation
pred<-model%>%
predict_classes(test) # predicted class (0, 1, or 2) for each test observation
table(Predicted = pred, Actual=testtarget)
## Actual
## Predicted 0 1 2
## 0 432 37 19
## 1 25 55 8
## 2 3 2 22
In the results above, the diagonal cells of the table show the test observations that were correctly classified; the off-diagonal cells show misclassifications.
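The overall test accuracy can also be recovered from the confusion matrix by dividing the sum of the diagonal (the correct classifications) by the total number of test observations. This is a quick check rather than part of the original script:
tab <- table(Predicted = pred, Actual = testtarget)
sum(diag(tab)) / sum(tab) # proportion of test observations correctly classified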
We can change a few parameters to see how much improvement can be made…
########################################### Re-run the model with more hidden nodes ###########################################
# See the ** for changes
rm(list=ls())
setwd("/Users/matthewcourtney/Desktop/Wu.Course/Deep Learning")
data <- read.csv("Cardiotocographic.csv", header=TRUE)
# change the data to matrix
data<-as.matrix(data)
# remove the dimension names for simplicity
dimnames(data) <- NULL # columns are now referenced by index only (1, 2, ..., 22)
# Now we can normalise the independent variables (using keras::normalize function)
data[,1:21] <- keras::normalize(data[,1:21])
# Define the last variable, NSP, as numeric
data[,22] <- as.numeric(data[,22])-1 # the minus 1 ensures values become 0,1,2
# set seed so that the dataset can be split in exactly the same way each time
set.seed(1234)
# use the sample function to create a sample vector
ind <- sample(2, nrow(data), replace=TRUE, prob = c(0.7,0.3))
# 2 is the number of partitions; each row is assigned a value drawn from {1, 2}
# the sample vector is nrow(data) long, aligning with the 2126 rows in data
# replace = TRUE means the values 1 and 2 can be drawn repeatedly (one draw per row)
# prob gives the sampling weights, so roughly 70% of rows are assigned 1 (training) and 30% are assigned 2 (test)
# apply the vector to partition the dataset
training <- data[ind==1, 1:21] # includes all independent variables
test <- data[ind==2, 1:21] # includes all independent variables
# also don't forget to extract the target variable (one vector for training and one for test)
trainingtarget <- data[ind==1, 22] # includes the dependent variable NSP for the training data
testtarget <- data[ind==2, 22] # includes the dependent variable NSP for the test data
# create the categorical variables
trainLables <- keras::to_categorical(trainingtarget)
testLables <- keras::to_categorical(testtarget)
# create the first model design
model1 <- keras_model_sequential()
# the keras_model_sequential consists of a linear stack of layers (in some sequential linear order)
# now we use the pipe operator (%>%) to pass info from left to right, i.e., add additional layers to 'model1'
model1 %>%
layer_dense(units=50, activation = 'relu', input_shape = 21) %>% #** increase the number of hidden nodes here to 50
layer_dense(units=3, activation = 'softmax') # this is for dependent variable
model1 %>% keras::compile(loss='categorical_crossentropy',
optimizer='adam',
metrics='accuracy')
history <- model1%>%
fit(training, # this is the input, the first 21 independent variables
trainLables,
epochs=200,
batch_size=32,
validation_split = 0.2)
model1%>%
keras::evaluate(test,testLables) # we simply use the 30% 'test' data, and labels (3 categoric output nodes)
## $loss
## [1] 0.4064881
##
## $acc
## [1] 0.8507463
prob<-model1%>%
predict_proba(test)
pred<-model1%>%
predict_classes(test)
table1<-table(Predicted = pred, Actual=testtarget)
print(table1)
## Actual
## Predicted 0 1 2
## 0 419 32 6
## 1 32 55 4
## 2 9 7 39
Results are:
$loss
[1] 0.39444
$acc
[1] 0.86733
This shows just a slight improvement in accuracy: model accuracy with 8 hidden nodes was 0.8656716, whereas with 50 hidden nodes it is 0.86733.
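As a further tweak along the same lines, one could also try adding a second hidden layer. The sketch below is only a suggestion: the choice of 25 nodes for the second layer is an arbitrary assumption, and the names model2 and history2 are introduced here purely for illustration.
model2 <- keras_model_sequential()
model2 %>%
layer_dense(units=50, activation = 'relu', input_shape = 21) %>%
layer_dense(units=25, activation = 'relu') %>% # assumed second hidden layer; 25 nodes is an arbitrary choice
layer_dense(units=3, activation = 'softmax')
model2 %>% keras::compile(loss='categorical_crossentropy',
optimizer='adam',
metrics='accuracy')
history2 <- model2 %>%
fit(training, trainLables, epochs=200, batch_size=32, validation_split = 0.2)
model2 %>% keras::evaluate(test, testLables)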