Today we’ll be doing some digit prediction using a data set that you can get from here: https://www.kaggle.com/c/digit-recognizer/data?select=train.csv The data set contains information regarding images of different numbers. We’ll be using it to create a keras model.
Let’s read the data as always using read.csv.
Before we take a look at our data, let’s check if there are any missing values.
## [1] FALSE
Seems like we’re safe in this regard. Now let’s take a look using head().
Seems like our data has a whole lot of columns, 785 of them to be
exact! They are all integers as well. The label
column
seems to represent the actual number that is shown in the image. The
rest of the columns represents a pixel in the image, with the value of 0
being black and 255 being white. Let’s use the function below in order
to visualize our data.
vizTrain <- function(input){
dimmax <- sqrt(ncol(input[,-1]))
dimn <- ceiling(sqrt(nrow(input)))
par(mfrow=c(dimn, dimn), mar=c(.1, .1, .1, .1))
for (i in 1:nrow(input)){
m1 <- as.matrix(input[i,2:785])
dim(m1) <- c(28,28)
m1 <- apply(apply(m1, 1, rev), 1, t)
image(1:28, 1:28,
m1, col=grey.colors(255),
# remove axis text
xaxt = 'n', yaxt = 'n')
text(2, 20, col="white", cex=1.2, input[i, 1])
}
}
As we can see above, these images are of handwritten numbers. They
also match the
label
column that we’ve mentioned
earlier.
As we usually do, we have to split our data into training and testing data sets by using initial_split. We’ll use the usual 80:20 proportion.
We’ll store it like below, into train_data and test_data respectively.
Before we proceed, let’s check the distribution proportion of our data.
##
## 0 1 2 3 4 5 6
## 0.09838095 0.11152381 0.09945238 0.10359524 0.09695238 0.09035714 0.09850000
## 7 8 9
## 0.10478571 0.09673810 0.09971429
It seems like the data was distributed quite evenly, I see no large difference in values. Next, we’ll have to separate the label column, as it is the target variable.
In order to make the modeling process faster, we should scale our data. In this case where all the columns have the same range of values, we can simply divide by the maximum amount. I’ve already mentioned the range previously, but let’s check it using range() to be sure.
## [1] 0 255
Seems like we were correct. Now let’s simply divide by 255.
As keras only accepts array format for the predictors, we’ll have to convert our data set into an array as below.
train_x <- array_reshape(x=as.matrix(train_x),
dim=dim(train_x))
test_x <- array_reshape(x=as.matrix(test_x),
dim=dim(test_x))
And since we’re doing classification, we’ll have to do One Hot Encoding, which just means turning the target variable into a categorical value.
Now we can begin modeling. We’ll start by creating the base of the model using keras_model_sequential. We’ll then add an input and hidden layer with these values: - 784 input_shape to represent all the pixels - 256 neurons, typically decided using 2^n - relu for data sets whose target predictor falls between 0 and +inf As well as an output layer with these values: - 10 units representing the range of the target variable (0-9) - softmax for multiclass classification
set_random_seed(777)
model <- keras_model_sequential(name="keras") %>%
# hidden layer
layer_dense(input_shape=784,
units=256,
activation="relu") %>%
# output layer
layer_dense(units=10,
activation="softmax")
model
## Model: "keras"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense_1 (Dense) (None, 256) 200960
## dense (Dense) (None, 10) 2570
## ================================================================================
## Total params: 203,530
## Trainable params: 203,530
## Non-trainable params: 0
## ________________________________________________________________________________
Our next step would be to compile our model. For this we’ll be using: - categorical_crossentropy for the loss method, as our model is for multiclass classification - optimizer_adam for the optimizer, as it’s a further development from SGD. I’ll simply set the learning_rate as 0.15 - accuracy for metrics for model evaluation purposes
model %>% compile(loss="categorical_crossentropy",
optimizer= optimizer_adam(learning_rate=0.15),
metrics="accuracy")
I’ll set my batch size to 4200 and have it run for 15 epochs.
modelHistory <- model %>% fit(x=train_x,
y=train_y,
epochs=15,
batch_size=4200,
validation_data=list(test_x,test_y))
## Epoch 1/15
## 8/8 - 1s - loss: 42.1994 - accuracy: 0.3305 - val_loss: 3.9949 - val_accuracy: 0.4706 - 1s/epoch - 135ms/step
## Epoch 2/15
## 8/8 - 0s - loss: 1.8472 - accuracy: 0.5148 - val_loss: 1.4061 - val_accuracy: 0.5200 - 231ms/epoch - 29ms/step
## Epoch 3/15
## 8/8 - 0s - loss: 1.2068 - accuracy: 0.5816 - val_loss: 1.0513 - val_accuracy: 0.6694 - 246ms/epoch - 31ms/step
## Epoch 4/15
## 8/8 - 0s - loss: 0.9164 - accuracy: 0.6993 - val_loss: 0.8393 - val_accuracy: 0.7307 - 234ms/epoch - 29ms/step
## Epoch 5/15
## 8/8 - 0s - loss: 0.7416 - accuracy: 0.7467 - val_loss: 0.7214 - val_accuracy: 0.7762 - 226ms/epoch - 28ms/step
## Epoch 6/15
## 8/8 - 0s - loss: 0.6311 - accuracy: 0.7941 - val_loss: 0.6218 - val_accuracy: 0.8148 - 234ms/epoch - 29ms/step
## Epoch 7/15
## 8/8 - 0s - loss: 0.5417 - accuracy: 0.8242 - val_loss: 0.5600 - val_accuracy: 0.8381 - 229ms/epoch - 29ms/step
## Epoch 8/15
## 8/8 - 0s - loss: 0.4833 - accuracy: 0.8432 - val_loss: 0.5080 - val_accuracy: 0.8494 - 236ms/epoch - 30ms/step
## Epoch 9/15
## 8/8 - 0s - loss: 0.4451 - accuracy: 0.8580 - val_loss: 0.4785 - val_accuracy: 0.8557 - 224ms/epoch - 28ms/step
## Epoch 10/15
## 8/8 - 0s - loss: 0.4027 - accuracy: 0.8710 - val_loss: 0.4483 - val_accuracy: 0.8637 - 230ms/epoch - 29ms/step
## Epoch 11/15
## 8/8 - 0s - loss: 0.3645 - accuracy: 0.8825 - val_loss: 0.4117 - val_accuracy: 0.8759 - 225ms/epoch - 28ms/step
## Epoch 12/15
## 8/8 - 0s - loss: 0.3268 - accuracy: 0.8968 - val_loss: 0.3782 - val_accuracy: 0.8885 - 217ms/epoch - 27ms/step
## Epoch 13/15
## 8/8 - 0s - loss: 0.2972 - accuracy: 0.9056 - val_loss: 0.3581 - val_accuracy: 0.8967 - 218ms/epoch - 27ms/step
## Epoch 14/15
## 8/8 - 0s - loss: 0.2705 - accuracy: 0.9134 - val_loss: 0.3494 - val_accuracy: 0.9015 - 216ms/epoch - 27ms/step
## Epoch 15/15
## 8/8 - 0s - loss: 0.2524 - accuracy: 0.9207 - val_loss: 0.3528 - val_accuracy: 0.9045 - 225ms/epoch - 28ms/step
Before we evaluate our model, we can take a look at the visualization of the fitting process by simply plotting it like below.
We can see that our model’s accuracy continues to increase, while the
loss decreases along with the amount of epochs that was ran. In order to
evaluate our model, we have to predict the testing data set first so we
have something to validate. We’ll put it through k_argmax so that it
outputs the class (in this case, label value) with the highest
probability instead of just the probabilities of all the classes. We’ll
finally convert it into a vector.
## 263/263 - 1s - 748ms/epoch - 3ms/step
## [1] 0 8 2 9 7 5
Now we can evaluate our model using confusionMatrix and inputting the predicted data into data and the original data into reference.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 790 0 11 1 0 4 7 2 3 4
## 1 0 888 8 2 1 3 3 13 6 7
## 2 2 3 788 30 5 2 0 14 3 0
## 3 7 2 28 785 2 105 0 10 85 15
## 4 1 0 8 2 750 5 4 5 6 44
## 5 5 5 2 31 1 536 9 0 9 5
## 6 9 0 2 0 7 26 847 0 5 0
## 7 0 1 14 3 4 3 0 797 1 23
## 8 6 8 10 28 3 43 4 3 686 13
## 9 1 0 0 5 25 5 1 16 3 733
##
## Overall Statistics
##
## Accuracy : 0.9045
## 95% CI : (0.8981, 0.9107)
## No Information Rate : 0.108
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8939
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.96224 0.9791 0.90471 0.88501 0.93985 0.73224
## Specificity 0.99578 0.9943 0.99217 0.96620 0.99014 0.99126
## Pos Pred Value 0.96107 0.9538 0.93034 0.75553 0.90909 0.88889
## Neg Pred Value 0.99591 0.9975 0.98901 0.98615 0.99367 0.97487
## Prevalence 0.09771 0.1080 0.10367 0.10557 0.09498 0.08712
## Detection Rate 0.09403 0.1057 0.09379 0.09343 0.08926 0.06379
## Detection Prevalence 0.09783 0.1108 0.10081 0.12366 0.09819 0.07177
## Balanced Accuracy 0.97901 0.9867 0.94844 0.92560 0.96499 0.86175
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.9680 0.92674 0.85006 0.86848
## Specificity 0.9935 0.99350 0.98446 0.99259
## Pos Pred Value 0.9453 0.94208 0.85323 0.92902
## Neg Pred Value 0.9963 0.99166 0.98407 0.98542
## Prevalence 0.1041 0.10236 0.09605 0.10045
## Detection Rate 0.1008 0.09486 0.08165 0.08724
## Detection Prevalence 0.1066 0.10069 0.09569 0.09391
## Balanced Accuracy 0.9807 0.96012 0.91726 0.93054
Seems like our model is already quite nice, with an overall accuracy of 91%
Using keras, we can create image prediction/classification models based off the pixel data of the images. There are also a lot of components when building a keras model, such as the input, hidden, and output layers; which all have a lot of different parameters that we can tune in order to create a better model.