library(tidyverse)
## -- Attaching packages ----------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 1.0.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts -------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(keras)
library(tensorflow)
library(datasets)
data(iris)
To use the normalize() function from the keras package, you first need to make sure that you're working with a matrix. As you may remember from earlier, the defining characteristic of a matrix is that all of its data elements are of the same basic type. In this case, you have target values of type factor, while the rest is all numeric.
This needs to change first.
You can use the as.numeric() function to convert the data to numbers:
# Recode the factor levels (1, 2, 3) to the integers 0, 1, 2
iris[, 5] <- as.numeric(iris[, 5]) - 1
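If you want to verify the recoding, a quick optional check (not part of the original workflow) looks like this:
# Optional sanity check: the recoded target should now only contain 0, 1 and 2
unique(iris[, 5])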
# Turn `iris` into a matrix
iris <- as.matrix(iris)
# Set iris `dimnames` to `NULL`
dimnames(iris) <- NULL
A numerical data frame is all right, but you'll need to convert the data to an array or a matrix if you want to make use of the keras package. You can easily do this with the as.matrix() function; don't forget to set the dimnames to NULL.
Now that you have checked the quality of your data and you know that it's not necessary to normalize it, you can continue to work with the original data and split it into training and test sets, so that you're finally ready to start building your model. By doing this, you ensure that you can make honest assessments of the performance of your model afterwards.
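If your own data did need rescaling, the keras helper is a one-liner. Here is a reference-only sketch that is not applied in the rest of this tutorial (iris_norm is just an illustrative name):
# Reference only: keras::normalize() rescales a numeric matrix
# (skipped here, since the iris features are already on comparable scales)
iris_norm <- normalize(iris[, 1:4])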
Before you split your data into training and test sets, you best first set a seed. You can easily do this with set.seed(): use this exact function and just pass a random integer to it. A seed is a number that initializes R's random number generator. The major advantage of setting a seed is that you get the same sequence of random numbers whenever you supply the same seed to the random number generator.
This is great for the reproducibility of your code!
You use the sample() function to take a sample with a size that is set to the number of rows of the Iris data set, or 150. You sample with replacement: you choose from a vector of two elements and assign either 1 or 2 to each of the 150 rows of the Iris data set. The assignment of the elements is subject to probability weights of 0.67 and 0.33.
# Set a seed so the split is reproducible (the integer itself is arbitrary)
set.seed(1234)
# Determine sample size
ind <- sample(2, nrow(iris), replace=TRUE, prob=c(0.67, 0.33))
# Split the `iris` data
iris.training <- iris[ind==1, 1:4]
iris.test <- iris[ind==2, 1:4]
# Split the class attribute
iris.trainingtarget <- iris[ind==1, 5]
iris.testtarget <- iris[ind==2, 5]
You have successfully split your data, but there is still one step that you need to go through to start building your model. Can you guess which one?
When you want to model multi-class classification problems with neural networks, it is generally good practice to transform your target attribute from a vector that contains a value for each class into a matrix with a boolean for each class, indicating whether or not a given instance belongs to that class.
This is a loose explanation of One Hot Encoding (OHE). It sounds quite complex, doesn’t it?
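To see concretely what it does, here is a tiny base-R illustration (purely illustrative; the keras helper below does all of this for you):
# Illustrative only: one hot encode the class vector c(0, 1, 2, 1) by hand
labels <- c(0, 1, 2, 1)
onehot <- diag(3)[labels + 1, ]  # row i is the indicator vector for labels[i]
print(onehot)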
Luckily, the keras package has a to_categorical() function that will do all of this for you; pass iris.trainingtarget and iris.testtarget to this function and store the results in iris.trainLabels and iris.testLabels:
# One hot encode training target values
iris.trainLabels <- to_categorical(iris.trainingtarget)
# One hot encode test target values
iris.testLabels <- to_categorical(iris.testtarget)
# Print out the iris.testLabels to double check the result
print(iris.testLabels)
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 1 0 0
## [3,] 1 0 0
## [4,] 1 0 0
## [5,] 1 0 0
## [6,] 1 0 0
## [7,] 1 0 0
## [8,] 1 0 0
## [9,] 1 0 0
## [10,] 1 0 0
## [11,] 1 0 0
## [12,] 1 0 0
## [13,] 1 0 0
## [14,] 1 0 0
## [15,] 1 0 0
## [16,] 1 0 0
## [17,] 0 1 0
## [18,] 0 1 0
## [19,] 0 1 0
## [20,] 0 1 0
## [21,] 0 1 0
## [22,] 0 1 0
## [23,] 0 1 0
## [24,] 0 1 0
## [25,] 0 1 0
## [26,] 0 1 0
## [27,] 0 1 0
## [28,] 0 1 0
## [29,] 0 1 0
## [30,] 0 1 0
## [31,] 0 1 0
## [32,] 0 1 0
## [33,] 0 1 0
## [34,] 0 1 0
## [35,] 0 1 0
## [36,] 0 1 0
## [37,] 0 1 0
## [38,] 0 1 0
## [39,] 0 1 0
## [40,] 0 0 1
## [41,] 0 0 1
## [42,] 0 0 1
## [43,] 0 0 1
## [44,] 0 0 1
## [45,] 0 0 1
## [46,] 0 0 1
## [47,] 0 0 1
## [48,] 0 0 1
## [49,] 0 0 1
To start constructing a model, you should first initialize a sequential model with the help of the keras_model_sequential() function. Then, you’re ready to start modeling.
However, before you begin, it’s a good idea to revisit your original question about this data set: can you predict the species of a certain Iris flower? It’s easier to work with numerical data, and you have preprocessed the data and one hot encoded the values of the target variable: a flower is either of type versicolor, setosa or virginica and this is reflected with binary 1 and 0 values.
A type of network that performs well on such a problem is a multi-layer perceptron. This type of neural network is often fully connected. That means that you're looking to build a relatively simple stack of fully connected layers to solve this problem. As for the activation functions, it's best to use one of the most common ones here for the purpose of getting familiar with Keras and neural networks: the relu activation function. This rectifier activation function is used in the hidden layer, which is generally speaking a good practice.
In addition, you also see that the softmax activation function is used in the output layer. You do this because you want to make sure that the output values are in the range of 0 to 1 and may be used as predicted probabilities:
# Initialize a sequential model
model <- keras_model_sequential()
# Add layers to the model
model %>%
  layer_dense(units = 8, activation = 'relu', input_shape = c(4)) %>%
  layer_dense(units = 3, activation = 'softmax')
Note how the output layer creates 3 output values, one for each Iris class (versicolor, virginica or setosa). The first layer, which contains 8 hidden nodes, has an input_shape of 4, because your training data iris.training has 4 columns.
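If you're wondering why the softmax outputs can be read as probabilities, here's a hand-rolled version in plain R (an illustrative sketch, separate from the keras workflow):
# Illustrative softmax: exponentiate, then normalize so the outputs sum to 1
softmax <- function(x) exp(x) / sum(exp(x))
softmax(c(2.0, 1.0, 0.1))  # three values between 0 and 1 that sum to 1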
You can further inspect your model with the following functions:
- The summary() function prints a summary representation of your model;
- get_config() returns a list that contains the configuration of the model;
- get_layer() returns the configuration of a single layer;
- The layers attribute retrieves a flattened list of the model's layers;
- The inputs attribute lists the input tensors; and
- The outputs attribute retrieves the output tensors.
# Print a summary of a model
summary(model)
## Model: "sequential"
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## dense (Dense) (None, 8) 40
## ___________________________________________________________________________
## dense_1 (Dense) (None, 3) 27
## ===========================================================================
## Total params: 67
## Trainable params: 67
## Non-trainable params: 0
## ___________________________________________________________________________
# Get model configuration
get_config(model)
## {'name': 'sequential', 'layers': [{'class_name': 'Dense', 'config': {'name': 'dense', 'trainable': True, 'batch_input_shape': (None, 4), 'dtype': 'float32', 'units': 8, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}}, {'class_name': 'Dense', 'config': {'name': 'dense_1', 'trainable': True, 'dtype': 'float32', 'units': 3, 'activation': 'softmax', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}}]}
# Get layer configuration
get_layer(model, index = 1)
## <tensorflow.python.keras.layers.core.Dense>
# List the model's layers
model$layers
## [[1]]
## <tensorflow.python.keras.layers.core.Dense>
##
## [[2]]
## <tensorflow.python.keras.layers.core.Dense>
# List the input tensors
model$inputs
## [[1]]
## Tensor("dense_input:0", shape=(None, 4), dtype=float32)
# List the output tensors
model$outputs
## [[1]]
## Tensor("dense_1/Identity:0", shape=(None, 3), dtype=float32)
Now that you have set up the architecture of your model, it’s time to compile and fit the model to the data. To compile your model, you configure the model with the adam optimizer and the categorical_crossentropy loss function. Additionally, you also monitor the accuracy during the training by passing ‘accuracy’ to the metrics argument.
# Compile the model
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = 'adam',
  metrics = 'accuracy'
)
The optimizer and the loss are two arguments that are required if you want to compile the model.
Some of the most popular optimization algorithms are Stochastic Gradient Descent (SGD), Adam and RMSprop. Depending on which algorithm you choose, you'll need to tune certain parameters, such as the learning rate or momentum. The choice of loss function depends on the task at hand: for a regression problem, for example, you'll usually use the Mean Squared Error (MSE).
As you see in this example, you used the categorical_crossentropy loss function for the multi-class classification problem of determining whether an iris is of type versicolor, virginica or setosa. Note, however, that if you had a binary classification problem, you would use the binary_crossentropy loss function instead.
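If you'd rather tune those parameters than rely on the defaults, you can pass a configured optimizer object to compile() instead of a string. A minimal sketch, where the learning rate value is an arbitrary example (depending on your keras version, the argument is named lr or learning_rate):
# Sketch: compile with an explicitly configured optimizer
# (the learning rate below is an arbitrary example, not a recommendation)
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_adam(lr = 0.001),
  metrics = 'accuracy'
)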
Next, you can also fit the model to your data; in this case, you train the model for 200 epochs, or iterations over all the samples in iris.training and iris.trainLabels, in batches of 5 samples.
# Store the fitting history in `history`
history <- model %>% fit(
  iris.training,
  iris.trainLabels,
  epochs = 200,
  batch_size = 5,
  validation_split = 0.2
)
# Plot the history
plot(history)
Tip: if you want, you can also specify the verbose argument in the fit() function. By setting this argument to 1, you indicate that you want to see progress bar logging.
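For instance, the same fit() call as above with the progress bar explicitly switched on (everything else unchanged):
# Same fit() call, but with progress bar logging explicitly enabled
history <- model %>% fit(
  iris.training, iris.trainLabels,
  epochs = 200, batch_size = 5,
  validation_split = 0.2,
  verbose = 1
)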
What you do with the code above is train the model for a specified number of epochs, or exposures to the training dataset. An epoch is a single pass through the entire training set, followed by an evaluation on the validation set. The batch size that you specify in the code above defines the number of samples that are going to be propagated through the network at a time. This also improves efficiency, because it makes sure that you don't load too many input patterns into memory at the same time.
Also, it's good to know that you can visualize the fitting if you assign the lines of code in the DataCamp Light chunk above to a variable. You can then pass that variable to the plot() function, as you see in this particular code chunk!
At first sight, it’s no surprise that this all looks a tad messy. You might not entirely know what you’re looking at, right?
One good thing to know is that loss and accuracy indicate the loss and accuracy of the model on the training data, while val_loss and val_accuracy are the same metrics on the validation data that validation_split holds out. (On older versions of Keras, the accuracy metrics are named acc and val_acc instead.)
But, even as you know this, it’s not easy to interpret these two graphs. Let’s try to break this up into pieces that you might understand more easily! You’ll split up these two plots and make two separate ones instead: you’ll make one for the model loss and another one for the model accuracy. Luckily, you can easily make use of the $ operator to access the data and plot it step by step.
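If you first want to see which metrics were recorded (their exact names can vary across keras versions), str() gives a quick overview:
# Inspect the structure of the recorded metrics before plotting
str(history$metrics)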
# Plot the model loss on the training data
plot(history$metrics$loss, main="Model Loss", xlab = "epoch", ylab="loss", col="blue", type="l")
# Plot the model loss on the validation data
lines(history$metrics$val_loss, col="green")
# Add legend
legend("topright", c("train", "validation"), col=c("blue", "green"), lty=c(1, 1))
In this first plot, you plotted the loss of the model on the training and validation data. Now it's time to do the same, but for the accuracy of the model:
# Plot the accuracy on the training data
# (on older versions of Keras, use history$metrics$acc instead)
plot(history$metrics$accuracy, main="Model Accuracy", xlab = "epoch", ylab="accuracy", col="blue", type="l")
# Plot the accuracy on the validation data
# (on older versions of Keras, use history$metrics$val_acc instead)
lines(history$metrics$val_accuracy, col="green")
# Add legend
legend("bottomright", c("train", "validation"), col=c("blue", "green"), lty=c(1, 1))
Some things to keep in mind here are the following:
If your training accuracy keeps improving while your validation accuracy gets worse, you are probably overfitting: your model starts to just memorize the data instead of learning from it. If the accuracy on both datasets is still rising over the last few epochs, you can clearly see that the model has not yet over-learned the training dataset.
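One common safeguard against overfitting, not used in this tutorial but good to know about, is an early-stopping callback that halts training once the validation loss stops improving. A minimal sketch (patience = 10 is an arbitrary example value):
# Sketch: stop training early when the validation loss stops improving
history <- model %>% fit(
  iris.training, iris.trainLabels,
  epochs = 200, batch_size = 5,
  validation_split = 0.2,
  callbacks = list(callback_early_stopping(monitor = "val_loss", patience = 10))
)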
Now that your model is created, compiled and fitted to the data, it's time to actually use it to predict the labels for your test set, iris.test. As you might have expected, you can use the predict_classes() function to do this. Afterwards, you can print out a confusion matrix to compare the predictions with the real labels of the iris.test data, with the help of the table() function.
# Predict the classes for the test data
classes <- model %>% predict_classes(iris.test, batch_size = 128)
# Confusion matrix
table(iris.testtarget, classes)
## classes
## iris.testtarget 0 1 2
## 0 16 0 0
## 1 0 23 0
## 2 0 1 9
Even though you already have a slight idea of how your model performed by looking at the predicted labels for iris.test, it’s still important that you take the time to evaluate your model. Use the evaluate() function to do this: pass in the test data iris.test, the test labels iris.testLabels and define the batch size. Store the result in a variable score, like in the code example below:
# Evaluate on test data and labels
score <- model %>% evaluate(iris.test, iris.testLabels, batch_size = 128)
# Print the score
print(score)
## $loss
## [1] 0.08962845
##
## $accuracy
## [1] 0.9795918
Fine-tuning your model is probably something that you’ll be doing a lot, especially in the beginning, because not all classification and regression problems are as straightforward as the one that you saw in the first part of this tutorial. As you read above, there are already two key decisions that you’ll probably want to adjust: how many layers you’re going to use and how many “hidden units” you will choose for each layer.
In the beginning, this will really be quite a journey.
Besides playing around with the number of epochs or the batch size, there are other ways in which you can tweak your model in the hopes that it will perform better: by adding layers, by increasing the number of hidden units and by passing your own optimization parameters to the compile() function. This section will go over these three options.
What would happen if you add another layer to your model? What if it would look like this?
# Initialize the sequential model
model2 <- keras_model_sequential()
# Add layers to the model
model2 %>%
  layer_dense(units = 8, activation = 'relu', input_shape = c(4)) %>%
  layer_dense(units = 5, activation = 'relu') %>%
  layer_dense(units = 3, activation = 'softmax')
# Compile the model
model2 %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = 'adam',
  metrics = 'accuracy'
)
# Fit the model to the data
model2 %>% fit(
  iris.training, iris.trainLabels,
  epochs = 200, batch_size = 5,
  validation_split = 0.2
)
# Evaluate the model
score2 <- model2 %>% evaluate(iris.test, iris.testLabels, batch_size = 128)
# Print the score
print(score2)
## $loss
## [1] 0.4383008
##
## $accuracy
## [1] 0.8979592
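Increasing the number of hidden units follows the same pattern. A sketch with an arbitrary larger unit count (28 is just an example, not a tuned value):
# Sketch: same two-layer architecture, but with more hidden units
model3 <- keras_model_sequential()
model3 %>%
  layer_dense(units = 28, activation = 'relu', input_shape = c(4)) %>%
  layer_dense(units = 3, activation = 'softmax')
# Compile, fit and evaluate model3 exactly as before to compare the scores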
There is one last step remaining in your journey with the keras package: saving or exporting your model so that you can load it back in at another moment.
Firstly, you can easily make use of the save_model_hdf5() and load_model_hdf5() functions to save and load your model into your workspace:
save_model_hdf5(model, "my_model.h5")
model <- load_model_hdf5("my_model.h5")
Additionally, you can also save and load the model weights with the save_model_weights_hdf5() and load_model_weights_hdf5() functions:
save_model_weights_hdf5(model, "my_model_weights.h5")
model %>% load_model_weights_hdf5("my_model_weights.h5")
Lastly, it's good to know that you can also export your model configuration to JSON or YAML. Here, the functions model_to_json() and model_to_yaml() will help you out. To load the configurations back into your workspace, you can just use the model_from_json() and model_from_yaml() functions:
json_string <- model_to_json(model)
model <- model_from_json(json_string)
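The YAML round trip mentioned above works the same way:
# Export and re-import the model configuration as YAML
yaml_string <- model_to_yaml(model)
model <- model_from_yaml(yaml_string)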