This problem trains a neural network to predict whether a breast tumor is benign or malignant (binary classification), using the Wisconsin Breast Cancer Database created by Dr. William H. Wolberg. More details about the dataset are available here.
The Breast Cancer data set is included in the R package mlbench. We first load this package, together with the keras package for deep learning (both must already be installed, e.g. with install.packages()).
library(keras)
library(mlbench)
# Load the data set and remove the incomplete samples.
data("BreastCancer")
BreastCancer <- as.matrix(BreastCancer[complete.cases(BreastCancer), ]) # as.matrix() turns every column into character strings
head(BreastCancer)
## Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size
## 1 "1000025" "5" "1" "1" "1" "2"
## 2 "1002945" "5" "4" "4" "5" "7"
## 3 "1015425" "3" "1" "1" "1" "2"
## 4 "1016277" "6" "8" "8" "1" "3"
## 5 "1017023" "4" "1" "1" "3" "2"
## 6 "1017122" "8" "10" "10" "8" "7"
## Bare.nuclei Bl.cromatin Normal.nucleoli Mitoses Class
## 1 "1" "3" "1" "1" "benign"
## 2 "10" "3" "2" "1" "benign"
## 3 "2" "3" "1" "1" "benign"
## 4 "4" "3" "7" "1" "benign"
## 5 "1" "3" "1" "1" "benign"
## 6 "10" "9" "7" "1" "malignant"
The data matrix BreastCancer has 683 rows and 11
columns: each row is a sample case and each column is a feature.
The first column is the case ID, which carries no predictive
information and is discarded in this problem. The 2nd to the 10th
columns are 9 different medical characteristics with integer
values from 1 to 10 (stored as character strings after as.matrix()).
The last column indicates whether the tumor is benign or malignant.
We use columns 2-10 as the predictors X and the last column as the
label y (FALSE/0 for benign and TRUE/1 for malignant).
# Separate the predictors X and the label y.
X <- BreastCancer[ ,2:10]
y <- BreastCancer[ ,11] == "malignant"
X <- apply(X, 2, as.numeric) # convert each column of X into numeric values
head(X)
## Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size Bare.nuclei
## [1,] 5 1 1 1 2 1
## [2,] 5 4 4 5 7 10
## [3,] 3 1 1 1 2 2
## [4,] 6 8 8 1 3 4
## [5,] 4 1 1 3 2 1
## [6,] 8 10 10 8 7 10
## Bl.cromatin Normal.nucleoli Mitoses
## [1,] 3 1 1
## [2,] 3 2 1
## [3,] 3 1 1
## [4,] 3 7 1
## [5,] 3 1 1
## [6,] 9 7 1
head(y)
## 1 2 3 4 5 6
## FALSE FALSE FALSE FALSE FALSE TRUE
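The apply(X, 2, as.numeric) step works because as.matrix() has already turned every column into character strings. If a column were still a factor whose levels are not in numeric order, calling as.numeric() on the factor directly would return the internal level codes instead of the values. A minimal sketch with a hypothetical three-row data frame (the values are invented, not taken from BreastCancer) shows the difference, and also the 0/1 label encoding:

```r
# Hypothetical toy data frame with factor columns (values are invented).
df <- data.frame(
  Cl.thickness = factor(c("5", "3", "10")),
  Class = factor(c("benign", "benign", "malignant"))
)

# Pitfall: as.numeric() on a factor returns level codes, not the values.
# Default levels are sorted as strings: "10", "3", "5".
codes <- as.numeric(df$Cl.thickness)          # 3 2 1

# The approach used above: as.matrix() yields character, then as.numeric().
m <- as.matrix(df)
values <- as.numeric(m[, "Cl.thickness"])     # 5 3 10

# Encode the label as 0/1 (1 = malignant), matching the logical y above.
y <- as.integer(m[, "Class"] == "malignant")  # 0 0 1
print(rbind(codes, values, y))
```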
Now we are ready to split the data. Randomly choose 70% of the samples (floor(683 × 0.7) = 478 samples) as the training set and use the remaining 205 samples as the test set.
set.seed(1234)
N <- nrow(BreastCancer)
training_id <- sample(N, size = floor(0.7 * N), replace = FALSE) # FALSE rather than F, which can be reassigned
X_train <- X[training_id, ]
X_test <- X[-training_id, ]
y_train <- y[training_id]
y_test <- y[-training_id]
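A quick sanity check of the split, written as a standalone sketch that only assumes N = 683 (the row count reported above): the training indices and the remaining test indices should have sizes 478 and 205 and must not overlap. It also confirms that negative indexing such as X[-training_id, ] selects exactly the complement of the training indices.

```r
set.seed(1234)
N <- 683  # number of complete cases reported above
training_id <- sample(N, size = floor(0.7 * N), replace = FALSE)
test_id <- setdiff(seq_len(N), training_id)

# Negative indexing selects the same positions as the complement computed above
# (both in increasing order, because setdiff() keeps the order of seq_len(N)).
idx <- seq_len(N)
stopifnot(identical(idx[-training_id], test_id))

cat("training:", length(training_id), " test:", length(test_id), "\n")
```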
Build a neural network with two fully connected layers (name it network). The first layer should have 32 hidden units and a relu activation. The second layer should have a single unit with a sigmoid activation, since this is a binary classification problem: its output lies in (0, 1) and can be read as the predicted probability that the tumor is malignant.
tensorflow::set_random_seed(1234) # Set a random seed for reproducibility
network <- keras_model_sequential() %>%
layer_dense(units = 32, activation = "relu", input_shape = c(9)) %>%
layer_dense(units = 1, activation = "sigmoid")
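To make the architecture concrete, here is a base-R sketch of the computation network performs on a batch of inputs: H = relu(X W1 + b1), then p = sigmoid(H W2 + b2). The weights are random placeholders (keras uses its own initializer and then learns them), and the fake input rows are hypothetical. The sketch also counts the trainable parameters: 9 × 32 + 32 = 320 in the first layer and 32 × 1 + 1 = 33 in the second.

```r
set.seed(1)
relu <- function(z) pmax(z, 0)          # pmax keeps the matrix dimensions
sigmoid <- function(z) 1 / (1 + exp(-z))

n_in <- 9; n_hidden <- 32
# Placeholder weights; keras initializes and trains its own.
W1 <- matrix(rnorm(n_in * n_hidden, sd = 0.1), n_in, n_hidden)
b1 <- rep(0, n_hidden)
W2 <- matrix(rnorm(n_hidden, sd = 0.1), n_hidden, 1)
b2 <- 0

forward <- function(X) {
  H <- relu(sweep(X %*% W1, 2, b1, "+"))  # (batch x 32) hidden activations
  sigmoid(H %*% W2 + b2)                  # (batch x 1) probabilities of malignancy
}

# Four fake cases with feature values in 1..10, like the real predictors
X_batch <- matrix(sample(1:10, 4 * n_in, replace = TRUE), 4, n_in)
p <- forward(X_batch)

n_params <- (n_in * n_hidden + n_hidden) + (n_hidden * 1 + 1)
print(p)
print(n_params)  # 353
```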
Next, compile the network with a suitable optimizer and loss function; here we use RMSprop and binary cross-entropy, which matches the single sigmoid output. Also, set metrics = "accuracy" to record the training accuracy in each epoch.
network %>% compile(
optimizer = "rmsprop",
loss = "binary_crossentropy",
metrics = c("accuracy")
)
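Binary cross-entropy averages -(y log p + (1 - y) log(1 - p)) over the samples, so confident correct predictions cost little and confident wrong ones cost a lot. A minimal base-R sketch, with made-up labels and probabilities; the eps clipping is our own simplification to avoid log(0), standing in for the numerical safeguards keras applies internally:

```r
bce <- function(y, p, eps = 1e-7) {
  p <- pmin(pmax(p, eps), 1 - eps)   # keep probabilities away from 0 and 1
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

y <- c(1, 0, 1, 0)
p_good <- c(0.9, 0.2, 0.8, 0.1)      # confident and correct
p_bad  <- c(0.4, 0.7, 0.5, 0.6)      # mostly wrong
bce(y, p_good)
bce(y, p_bad)
```

The first call returns a much smaller loss than the second, which is what gradient descent on this loss pushes the network toward.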
Fit the network on the training set for 20 epochs with batch size 32. Store the output in a variable named history so that the training process can be visualized.
history <- network %>% fit(
X_train, y_train,
epochs = 20, batch_size = 32)
Plot history to see the training process. Report the loss and the accuracy on the training set in the final epoch, which can be displayed by the command print(history).
plot(history)
print(history)
##
## Final epoch (plot to see history):
## loss: 0.2178
## accuracy: 0.9435
Finally, report the loss and the accuracy of your network on the test set.
test_metrics <- network %>% evaluate(X_test, y_test)
cat("Test loss:", test_metrics[[1]], "\nTest accuracy:", test_metrics[[2]], "\n")
## Test loss: 0.2054043
## Test accuracy: 0.9512195
Generate predicted probabilities for the test set with the predict() function, convert them into labels with a 0.5 threshold, and use the table() function to compare the predicted labels with the true labels. (See the MNIST example if you forget how to do it.)
probs <- network %>% predict(X_test)
predictions <- ifelse(probs > 0.5, 1, 0)
table(Predicted = predictions, True = y_test)
## True
## Predicted FALSE TRUE
## 0 137 5
## 1 5 58
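The four counts in this table support more than the overall accuracy. The sketch below re-enters the counts printed above (rows = predicted, columns = true; "positive" means malignant) and derives accuracy, sensitivity (the share of malignant cases caught), specificity, and precision.

```r
# Counts copied from the table above: rows = predicted (0, 1), columns = true (FALSE, TRUE)
cm <- matrix(c(137, 5,
               5, 58),
             nrow = 2, byrow = TRUE,
             dimnames = list(Predicted = c("0", "1"), True = c("FALSE", "TRUE")))

TN <- cm["0", "FALSE"]; FN <- cm["0", "TRUE"]
FP <- cm["1", "FALSE"]; TP <- cm["1", "TRUE"]

accuracy    <- (TP + TN) / sum(cm)   # 195 / 205
sensitivity <- TP / (TP + FN)        # 58 / 63, malignant cases detected
specificity <- TN / (TN + FP)        # 137 / 142, benign cases cleared
precision   <- TP / (TP + FP)        # 58 / 63
round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, precision = precision), 4)
```

The accuracy matches the 0.9512195 reported by evaluate(), and the 5 false negatives (malignant tumors predicted benign) are the errors that would matter most in practice.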
Now, based on the network you just built, build three different networks as follows.
1. network_deeper: Add another dense layer between the two layers of network. The newly added layer should contain 16 hidden units and a relu activation.
2. network_regularized: Add an \(\ell_2\) regularizer to the first layer of network, with regularization parameter 0.01.
3. network_dropout: Add a dropout layer between the first layer and the last layer of network, with the drop rate 0.5.
Train these networks using the same compile step and the same fit step as network. Report their accuracies on the test set.
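The two regularized variants act on training in different ways. With regularizer_l2(0.01), keras adds 0.01 times the sum of the squared first-layer weights to the loss. With layer_dropout(rate = 0.5), each hidden activation is set to 0 with probability 0.5 during training and the surviving ones are scaled by 1 / (1 - 0.5) = 2; at prediction time the layer passes values through unchanged. A base-R sketch of both ideas, using made-up weights and activations:

```r
set.seed(42)

# L2 penalty added to the loss: lambda * sum(W^2)
l2_penalty <- function(W, lambda = 0.01) lambda * sum(W^2)
W <- matrix(c(0.5, -1, 2, 0), nrow = 2)   # made-up weights
l2_penalty(W)                               # 0.01 * (0.25 + 1 + 4 + 0) = 0.0525

# Inverted dropout as applied during training
dropout_train <- function(H, rate = 0.5) {
  keep <- matrix(runif(length(H)) >= rate, nrow(H), ncol(H))
  H * keep / (1 - rate)   # dropped units become 0, kept units are scaled up
}
H <- matrix(1:8, nrow = 2)   # made-up hidden activations
H_train <- dropout_train(H)
H_train
```

Scaling the kept units during training is what lets the layer be a no-op at prediction time while keeping the expected activation unchanged.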
tensorflow::set_random_seed(1234)
# network_deeper
network_deeper <- keras_model_sequential() %>%
layer_dense(units = 32, activation = "relu", input_shape = c(9)) %>%
layer_dense(units = 16, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid")
network_deeper %>% compile(
optimizer = "rmsprop",
loss = "binary_crossentropy",
metrics = c("accuracy")
)
history_deeper <- network_deeper %>% fit(
X_train, y_train,
epochs = 20, batch_size = 32,
validation_data = list(X_test, y_test) # only reports test metrics per epoch; the weights are not fit on it
)
# network_regularized
network_regularized <- keras_model_sequential() %>%
layer_dense(units = 32, activation = "relu", input_shape = c(9), kernel_regularizer = regularizer_l2(0.01)) %>%
layer_dense(units = 1, activation = "sigmoid")
network_regularized %>% compile(
optimizer = "rmsprop",
loss = "binary_crossentropy",
metrics = c("accuracy")
)
history_regularized <- network_regularized %>% fit(
X_train, y_train,
epochs = 20, batch_size = 32,
validation_data = list(X_test, y_test)
)
# network_dropout
network_dropout <- keras_model_sequential() %>%
layer_dense(units = 32, activation = "relu", input_shape = c(9)) %>%
layer_dropout(rate = 0.5) %>%
layer_dense(units = 1, activation = "sigmoid")
network_dropout %>% compile(
optimizer = "rmsprop",
loss = "binary_crossentropy",
metrics = c("accuracy")
)
history_dropout <- network_dropout %>% fit(
X_train, y_train,
epochs = 20, batch_size = 32,
validation_data = list(X_test, y_test)
)
Based on the test accuracies, which of the three networks performs best?
# Evaluate the three networks on the test set.
test_metrics_deeper <- network_deeper %>% evaluate(X_test, y_test)
test_metrics_regularized <- network_regularized %>% evaluate(X_test, y_test)
test_metrics_dropout <- network_dropout %>% evaluate(X_test, y_test)
cat("Test accuracy (network_deeper):", test_metrics_deeper[[2]], "\n")
## Test accuracy (network_deeper): 0.9707317
cat("Test accuracy (network_regularized):", test_metrics_regularized[[2]], "\n")
## Test accuracy (network_regularized): 0.9414634
cat("Test accuracy (network_dropout):", test_metrics_dropout[[2]], "\n")
## Test accuracy (network_dropout): 0.9414634
# To identify the best network based on test accuracy
best_network <- which.max(c(test_metrics_deeper[[2]], test_metrics_regularized[[2]], test_metrics_dropout[[2]]))
network_names <- c("network_deeper", "network_regularized", "network_dropout")
cat("Best network based on test accuracy:", network_names[best_network], "\n")
## Best network based on test accuracy: network_deeper
After loading and preprocessing the dataset, I trained a simple neural network (network) with two fully connected layers, which reached a test accuracy of 0.951. To try to improve on it, I built three variants, network_deeper, network_regularized, and network_dropout, each using the same compile and fit steps as the original network. On the test set, network_deeper, which adds a dense layer with 16 hidden units and a relu activation, achieved the highest accuracy (0.971, versus 0.941 for both network_regularized and network_dropout). With only 205 test cases and a single random seed, however, this gap amounts to 6 test cases (199 versus 193 correct), so the ranking is suggestive rather than conclusive.
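Because the test set has only 205 cases, each reported accuracy corresponds to a whole number of correct predictions. The sketch below converts the accuracies printed above back to counts and attaches a rough standard error from the usual normal approximation sqrt(p(1 - p) / n); treat the standard errors as a ballpark, not a formal test.

```r
n_test <- 205
acc <- c(network             = 0.9512195,
         network_deeper      = 0.9707317,
         network_regularized = 0.9414634,
         network_dropout     = 0.9414634)

correct <- round(acc * n_test)         # number of correctly classified cases
se <- sqrt(acc * (1 - acc) / n_test)   # rough standard error of each accuracy
print(data.frame(accuracy = acc, correct = correct, se = round(se, 4)))
```

The standard errors are on the order of 0.012 to 0.016, comparable to the 0.03 gap between the best and worst variant, which is why repeating the comparison over several seeds or splits would be more convincing.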
In this problem, we train a convolutional neural network (CNN) for clothing classification using the Fashion_MNIST dataset. Fashion_MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes of clothing, such as shoes, t-shirts, dresses, and so on. See here for more details.
Load the Fashion_MNIST dataset, which is included in
keras, and then construct the training and test sets.
data <- dataset_fashion_mnist()
X_train <- data$train$x
y_train <- data$train$y
X_test <- data$test$x
y_test <- data$test$y
The structure of X and y can be shown by the str()
function in R.
str(X_train)
## int [1:60000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
str(y_train)
## int [1:60000(1d)] 9 0 0 3 0 2 7 2 5 5 ...
str(X_test)
## int [1:10000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
str(y_test)
## int [1:10000(1d)] 9 2 1 1 6 1 4 6 5 7 ...
Similar to the MNIST dataset, the images are encoded as 3D arrays, and the labels are a 1D array of categories, ranging from 0 to 9. The training images are stored in an array of 60,000 matrices of 28 × 28 integers. Each such matrix is a grayscale image, with values between 0 and 255. The first 36 images are visualized as follows.
par(mfcol=c(6, 6))
par(mar = c(0, 0, 3, 0), xaxs = 'i', yaxs = 'i') # no margins except the top; 'i' axis style stops padding around each image
for (idx in 1:36) {
im <- X_train[idx, , ]
plot(as.raster(im, max = 255)) # create a raster object (representing a bitmap image)
}
First, preprocess the data through the following steps:
1. Reshape the images into the shape that the CNN expects. The shapes of X_train and X_test should be (60000, 28, 28, 1) and (10000, 28, 28, 1), respectively.
2. Scale X_train and X_test so that their values are in the [0, 1] interval.
3. One-hot encode the labels y_train and y_test.
#1. Reshaping the Images
X_train <- array_reshape(X_train, c(60000, 28, 28, 1))
X_test <- array_reshape(X_test, c(10000, 28, 28, 1))
#2. Scale X_train and X_test
X_train <- X_train / 255
X_test <- X_test / 255
#3. One-Hot Encoding
y_train <- to_categorical(y_train)
y_test <- to_categorical(y_test)
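array_reshape() fills the new array in row-major (C) order, which is why it is preferred over dim<- for keras inputs in general. In this particular case the only change is a trailing axis of length 1, so base R's dim<- produces the same element layout. The sketch below checks this on a few fake 28 × 28 images (an assumption; the real data are not needed), scales them, and shows that one-hot encoding labels 0 to 9, as to_categorical() does, amounts to picking rows of a 10 × 10 identity matrix. The labels are the first five values of y_train printed by str() above.

```r
set.seed(7)
n <- 5
# Fake grayscale images with integer pixels in 0..255 (stand-ins for X_train)
X <- array(sample(0:255, n * 28 * 28, replace = TRUE), dim = c(n, 28, 28))
y <- c(9L, 0L, 0L, 3L, 0L)   # first five labels of y_train

# 1. Add a trailing channel axis: (n, 28, 28) -> (n, 28, 28, 1)
X4 <- X
dim(X4) <- c(dim(X), 1)

# 2. Scale pixel values into [0, 1]
X4 <- X4 / 255

# 3. One-hot encode: label k becomes row k + 1 of the 10 x 10 identity matrix
Y <- diag(10)[y + 1, ]

str(X4)
Y
```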
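Now build a CNN model that sequentially includes:
1. a 2D convolutional layer with 16 filters, a 3 × 3 kernel, "same" padding, and a "relu" activation;
2. a max pooling layer with a 2 × 2 pool size;
3. a flatten layer;
4. a dense layer with 64 hidden units and a "relu" activation;
5. a dense output layer with 10 units and a "softmax" activation.
model <- keras_model_sequential() %>%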
# The first 2D convolutional layer
layer_conv_2d(
filters = 16,
kernel_size = c(3, 3),
input_shape = c(28, 28, 1),
padding = "same",
activation = "relu") %>%
# Add a max pooling layer following the convolutional layer
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
# A flatten layer
layer_flatten() %>%
# Feed the vector into a densely connected layer with 64 hidden units
layer_dense(units = 64, activation = "relu") %>%
# The final layer with 10 outputs and a softmax activation
layer_dense(units = 10, activation = "softmax")
summary(model)
## Model: "sequential_4"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d (Conv2D) (None, 28, 28, 16) 160
## max_pooling2d (MaxPooling2D) (None, 14, 14, 16) 0
## flatten (Flatten) (None, 3136) 0
## dense_10 (Dense) (None, 64) 200768
## dense_9 (Dense) (None, 10) 650
## ================================================================================
## Total params: 201,578
## Trainable params: 201,578
## Non-trainable params: 0
## ________________________________________________________________________________
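The Param # column can be reproduced by hand. A convolution with k × k kernels over c_in input channels and f filters has (k · k · c_in + 1) · f parameters (the +1 is the bias), and a dense layer with m inputs and u units has (m + 1) · u. "same" padding keeps the 28 × 28 spatial size and 2 × 2 pooling halves it to 14 × 14, so the flatten layer outputs 14 · 14 · 16 = 3136 values. A small sketch of this arithmetic:

```r
conv_params  <- function(k, c_in, filters) (k * k * c_in + 1) * filters
dense_params <- function(m, units) (m + 1) * units

p_conv   <- conv_params(3, 1, 16)          # 160
flat_len <- (28 / 2) * (28 / 2) * 16       # 3136 values after 2 x 2 pooling
p_dense1 <- dense_params(flat_len, 64)     # 200768
p_dense2 <- dense_params(64, 10)           # 650
total <- p_conv + p_dense1 + p_dense2      # 201578
c(conv = p_conv, dense1 = p_dense1, dense2 = p_dense2, total = total)
```

Almost all parameters sit in the first dense layer, so shrinking the flattened feature map (for example with more pooling) is the main lever on model size.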
Now compile the CNN you just built with a suitable loss function
(categorical cross-entropy, since the labels are one-hot encoded), the
"rmsprop" optimizer, and the "acc" metric.
model %>% compile(
loss = "categorical_crossentropy",
optimizer = optimizer_rmsprop(learning_rate = 1e-3),
metrics = c("acc")
)
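With one-hot labels and a softmax output, categorical cross-entropy reduces to minus the log of the probability the model assigns to the true class. A small base-R sketch with made-up scores for three classes (not the model's real outputs):

```r
softmax <- function(z) {
  e <- exp(z - max(z))   # subtract the max for numerical stability
  e / sum(e)
}
categorical_crossentropy <- function(y_onehot, p) -sum(y_onehot * log(p))

z <- c(2, 1, 0)          # made-up pre-activation scores
p <- softmax(z)          # probabilities that sum to 1
y_onehot <- c(1, 0, 0)   # the true class is the first one
loss <- categorical_crossentropy(y_onehot, p)
loss                     # equals -log(p[1])
```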
Fit the model on the training set for 10 epochs with a batch size of 64, and record
the output in a history variable.
history <- model %>% fit(X_train, y_train, batch_size = 64, epochs = 10)
Plot the history and report the training accuracy in the final epoch.
plot(history)
print(history)
##
## Final epoch (plot to see history):
## loss: 0.1555
## acc: 0.9438
Evaluate the model on the test set. Report the test accuracy.
model %>% evaluate(X_test, y_test)
## loss acc
## 0.2711755 0.9054000
Now make a table to see the model’s performance on predicting each category.
y_test_pred <- model %>% predict(X_test) %>% k_argmax()
table(true = data$test$y, predicted = as.vector(y_test_pred))
## predicted
## true 0 1 2 3 4 5 6 7 8 9
## 0 849 1 9 10 1 0 123 0 7 0
## 1 0 979 0 12 3 0 5 0 1 0
## 2 30 1 813 7 50 0 98 0 1 0
## 3 21 5 8 904 26 0 36 0 0 0
## 4 2 1 50 27 845 0 75 0 0 0
## 5 0 0 0 0 0 974 1 12 0 13
## 6 84 1 32 21 42 0 815 0 5 0
## 7 0 0 0 0 0 10 0 912 0 78
## 8 6 2 2 3 0 2 6 1 978 0
## 9 0 0 0 0 0 4 1 10 0 985
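Per-class recall (each diagonal count divided by its row total) condenses this table. The sketch below re-enters the counts printed above and attaches the standard Fashion-MNIST class names (0 = T-shirt/top through 9 = Ankle boot). It also shows max.col(), a base-R alternative to k_argmax() for turning a matrix of predicted probabilities into 0-based class labels, applied here to made-up probabilities.

```r
classes <- c("T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
             "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot")

# Confusion counts copied from the table above (rows = true, columns = predicted)
cm <- matrix(c(
  849,   1,   9,  10,   1,   0, 123,   0,   7,   0,
    0, 979,   0,  12,   3,   0,   5,   0,   1,   0,
   30,   1, 813,   7,  50,   0,  98,   0,   1,   0,
   21,   5,   8, 904,  26,   0,  36,   0,   0,   0,
    2,   1,  50,  27, 845,   0,  75,   0,   0,   0,
    0,   0,   0,   0,   0, 974,   1,  12,   0,  13,
   84,   1,  32,  21,  42,   0, 815,   0,   5,   0,
    0,   0,   0,   0,   0,  10,   0, 912,   0,  78,
    6,   2,   2,   3,   0,   2,   6,   1, 978,   0,
    0,   0,   0,   0,   0,   4,   1,  10,   0, 985),
  nrow = 10, byrow = TRUE, dimnames = list(true = classes, predicted = classes))

recall <- diag(cm) / rowSums(cm)
print(round(sort(recall), 3))   # hardest classes first
cat("overall accuracy:", sum(diag(cm)) / sum(cm), "\n")

# Base-R argmax: column of the largest probability in each row, minus 1
probs <- rbind(c(0.1, 0.7, 0.2), c(0.6, 0.3, 0.1))   # made-up probabilities
max.col(probs, ties.method = "first") - 1            # 1 0
```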
I loaded the dataset as instructed in the provided .ipynb file and used its predefined split into 60,000 training images and 10,000 test images. I preprocessed the data by reshaping the images into the shape the CNN expects, scaling X_train and X_test to the [0, 1] interval, and one-hot encoding the labels y_train and y_test. I then built a CNN with the specified architecture and compiled it with the categorical cross-entropy loss, the "rmsprop" optimizer, and the "acc" metric. After fitting the model on the training set for 10 epochs and recording the output in a history variable, I plotted the history; in the final epoch the training accuracy was 0.9438 (loss 0.1555). On the test set the model reached an accuracy of 0.9054 with a loss of 0.2712, somewhat worse than on the training set, which suggests mild overfitting. The per-category table shows that classes 2 (Pullover) and 6 (Shirt) are the hardest, with 813 and 815 of 1,000 test images classified correctly, and that the most frequent single error is class 0 (T-shirt/top) predicted as class 6 (Shirt), 123 times.