The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero through nine.
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel value is an integer between 0 and 255, inclusive.
The training data set (train.csv) has 785 columns. The first column, called “label”, is the digit that was drawn by the user. The remaining columns contain the pixel values of the associated image.
Each pixel column in the training set has a name like pixelx, where x is an integer between 0 and 783, inclusive. To locate this pixel on the image, decompose x as x = i * 28 + j, where i and j are integers between 0 and 27, inclusive. Then pixelx is located on row i and column j of a 28 x 28 matrix (indexing from zero).
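The decomposition x = i * 28 + j can be checked with a few lines of base R. This is only an illustrative sketch: the helper name `pixel_position` is hypothetical and not part of the dataset or of any package.

```r
# Map a pixel column index x (0..783) to its zero-based row i and column j,
# using x = i * 28 + j. The function name is made up for this illustration.
pixel_position <- function(x, width = 28) {
  stopifnot(all(x >= 0), all(x < width * width))
  list(row = x %/% width,  # integer division gives the row i
       col = x %% width)   # the remainder gives the column j
}

pos <- pixel_position(c(0, 29, 783))
pos$row  # 0 1 27
pos$col  # 0 1 27
```

For example, pixel29 sits on row 1, column 1, and pixel783 is the bottom-right corner of the image.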
This dataset is taken from Kaggle: https://www.kaggle.com/competitions/digit-recognizer/data.
Our goal is to build a deep learning model that predicts the digit shown in each image, using the pixel values as predictors.
library(keras)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(caret)
## Loading required package: ggplot2
## Loading required package: lattice
p <- keras_model_sequential()
## Loaded Tensorflow version 2.0.0
tensorflow::set_random_seed(42)
df_train <- read.csv("MNIST in CV/train.csv")
df_test <- read.csv("MNIST in CV/test.csv")
head(df_test)
dim(df_train)
## [1] 42000 785
The training data contains 785 columns and 42,000 observations (rows).
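vizTrain <- function(input){
  # side length of each square image (sqrt(784) = 28)
  dimmax <- sqrt(ncol(input[,-1]))
  # size of the plotting grid needed to fit one tile per row of input
  dimn <- ceiling(sqrt(nrow(input)))
  par(mfrow=c(dimn, dimn), mar=c(.1, .1, .1, .1))
  for (i in 1:nrow(input)){
    # reshape the 784 pixel values of row i into a 28 x 28 matrix
    m1 <- as.matrix(input[i,2:785])
    dim(m1) <- c(28,28)
    # rearrange the matrix for display with image()
    m1 <- apply(apply(m1, 1, rev), 1, t)
    image(1:28, 1:28,
          m1, col=grey.colors(255),
          # remove axis text
          xaxt = 'n', yaxt = 'n')
    # write the label (first column) on the tile
    text(2, 20, col="white", cex=1.2, input[i, 1])
  }
}
The function above draws each row of the data as an image, so that we can inspect the digits visually.
vizTrain(head(df_train, 36))
This is what the first 36 rows of the training data look like: handwritten digits from 0 to 9.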
library(rsample)
set.seed(100)
initializer <- initializer_random_normal(seed = 100)
index <- initial_split(df_train, prop=0.8, strata="label")
data_train <- training(index)
data_test <- testing(index)
prop.table(table(data_train$label))
##
## 0 1 2 3 4 5 6
## 0.09833919 0.11197095 0.09905352 0.10354783 0.09699982 0.09051134 0.09893446
## 7 8 9
## 0.10420263 0.09705935 0.09938092
We divide the data in an 80:20 proportion into training data
and test data. The training data is used to fit the model, and the
test data is used for model evaluation later.
train_x <- data_train %>% select(-label) %>% as.matrix() / 255
train_y <- data_train %>% select(label)
test_x <- data_test %>% select(-label) %>% as.matrix() / 255
test_y <- data_test%>% select(label)
range(train_x)
## [1] 0 1
The raw pixel values are on a scale of 0 to 255, so we divide them by 255 to bring them into the range 0 to 1, and we convert the result to a matrix. Neural networks generally train better on scaled inputs.
We now have two objects for the training data: one with all the predictors and one with the
label only. The same applies to the test data.
train_x <- array_reshape(train_x, dim=dim(train_x))
test_x <- array_reshape(test_x, dim=dim(test_x))
For the predictors (x), we convert the matrices to arrays.
# One-hot encoding target variable
train_y <- to_categorical(train_y$label, num_classes = 10)
test_y <- to_categorical(test_y$label, num_classes = 10)
For the target variable we apply one-hot encoding. We convert it to a categorical matrix whose number of columns matches the number of labels (0 to 9, which means 10 labels).
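To make the result of one-hot encoding concrete, here is a minimal base R sketch of the same idea. The helper `one_hot` is hypothetical and written only for illustration; it is not how `to_categorical` is implemented, it just produces the same kind of 0/1 matrix for zero-based integer labels.

```r
# One-hot encode zero-based integer labels into a 0/1 matrix.
# Hypothetical helper for illustration only.
one_hot <- function(labels, num_classes) {
  out <- matrix(0, nrow = length(labels), ncol = num_classes)
  # label k goes into column k + 1, because R indexes from 1
  out[cbind(seq_along(labels), labels + 1)] <- 1
  out
}

encoded <- one_hot(c(0, 3, 9), num_classes = 10)
dim(encoded)                   # 3 rows, 10 columns
rowSums(encoded)               # each row contains exactly one 1
which(encoded[2, ] == 1) - 1   # recovers the label 3
```

Each row has a single 1 in the column that corresponds to its digit, which is the form a softmax output layer with 10 units is compared against.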
# Build the architecture
model1 <- keras_model_sequential(name="model_keras") %>%
layer_dense(units=256, activation="relu", input_shape=784, name="hidden_1") %>%
layer_dense(units=128, activation="relu", name="hidden_2") %>%
# layer_dense(units=16, activation="relu", name="hidden_3") %>%
layer_dense(units=10, activation="softmax", name="output")
model1
## Model: "model_keras"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## hidden_1 (Dense) (None, 256) 200960
## ________________________________________________________________________________
## hidden_2 (Dense) (None, 128) 32896
## ________________________________________________________________________________
## output (Dense) (None, 10) 1290
## ================================================================================
## Total params: 235,146
## Trainable params: 235,146
## Non-trainable params: 0
## ________________________________________________________________________________
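For the model architecture, we use a sequential Keras model with 2 hidden layers
(256 and 128 units) with the ReLU activation function, and a softmax
output layer (the last layer) with 10 units, one per digit. A third hidden layer is commented out in the code and is not part of the model.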
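The parameter counts in the model summary can be reproduced by hand: a dense layer has one weight per input-output pair plus one bias per output unit. The sketch below uses a hypothetical helper, `dense_params`, written only for this check.

```r
# Number of trainable parameters in a dense layer:
# weights (n_in * n_out) plus biases (n_out).
dense_params <- function(n_in, n_out) n_in * n_out + n_out

# Layer widths: 784 inputs, two hidden layers, 10 outputs
layer_sizes <- c(784, 256, 128, 10)
params <- dense_params(head(layer_sizes, -1), tail(layer_sizes, -1))

params       # 200960 32896 1290
sum(params)  # 235146
```

These values match the Param # column and the total of 235,146 shown in the summary above.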
The next step is to determine the error function, optimizer, and metrics that will be shown during training.
model1 %>% compile(loss=loss_categorical_crossentropy(),
                   optimizer=optimizer_sgd(learning_rate=0.1),
                   metrics="accuracy")
For compiling, we use loss_categorical_crossentropy because the target is categorical with more than two classes and has been one-hot encoded.
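As a rough illustration of what categorical cross-entropy measures, the sketch below computes it in base R for two made-up observations with three classes. It follows the usual textbook definition (the mean over observations of the negative log-probability assigned to the true class). The function name and the clipping constant `eps` are assumptions for this example and are not taken from the Keras source.

```r
# Categorical cross-entropy for one-hot targets and predicted probabilities.
# Illustrative only; eps guards against log(0).
categorical_crossentropy <- function(y_true, y_pred, eps = 1e-7) {
  y_pred <- pmin(pmax(y_pred, eps), 1 - eps)
  mean(-rowSums(y_true * log(y_pred)))
}

# Two made-up observations, three classes
y_true <- rbind(c(0, 1, 0),
                c(0, 0, 1))
y_pred <- rbind(c(0.1, 0.8, 0.1),
                c(0.3, 0.3, 0.4))

loss <- categorical_crossentropy(y_true, y_pred)
loss  # mean of -log(0.8) and -log(0.4), about 0.57
```

The loss shrinks toward 0 as the probability assigned to the correct class approaches 1, which is why it suits a softmax output.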
In this step we train the model and visualize the training history, with the held-out test data serving as validation data.
history <- model1 %>% fit(x=train_x,
                          y=train_y,
                          validation_data=list(test_x, test_y),
                          batch_size=21000,
                          epochs=20)
plot(history)
A good model is one where the red line (our model on the training data) and the blue line (validation data) stay close together, as in the plot above.
We can say that our model is good enough, as the accuracy is above 0.8.
Then we make predictions using the model above.
pred <- predict(model1, test_x) %>% k_argmax() %>% as.array() %>% as.factor()
head(pred)
## [1] 1 3 9 3 7 6
## Levels: 0 1 2 3 4 5 6 7 8 9
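The prediction step turns each row of softmax probabilities into a single class by taking the position of the largest value. A base R sketch of that idea, using made-up probabilities for three classes, could look like this:

```r
# Made-up softmax outputs: one row per image, one column per class (0, 1, 2)
probs <- rbind(c(0.05, 0.90, 0.05),
               c(0.20, 0.10, 0.70))

# which.max returns a 1-based position, so subtract 1 to get zero-based labels
pred_class <- apply(probs, 1, which.max) - 1
pred_class                        # 1 2
factor(pred_class, levels = 0:2)  # as a factor, ready for comparison with the truth
```

The subtraction matters because R indexes from 1 while the digit labels start at 0.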
After that we evaluate the model. If the model performs poorly, we can tune it again.
confusionMatrix(pred, reference = as.factor(data_test$label))
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 784 0 14 5 3 9 11 10 3 9
## 1 0 900 16 6 7 13 14 20 25 13
## 2 8 9 697 33 6 18 14 12 17 2
## 3 1 3 23 712 0 80 0 2 54 14
## 4 1 0 18 2 699 35 10 12 8 54
## 5 13 6 2 47 0 539 16 5 37 4
## 6 8 1 32 7 15 19 743 1 13 1
## 7 2 0 11 13 2 8 0 789 4 43
## 8 11 3 32 39 3 25 5 10 609 12
## 9 0 0 4 8 78 8 0 39 32 697
##
## Overall Statistics
##
## Accuracy : 0.8532
## 95% CI : (0.8455, 0.8608)
## No Information Rate : 0.1097
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8369
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.94686 0.9761 0.82097 0.81651 0.85978 0.71485
## Specificity 0.99155 0.9848 0.98424 0.97649 0.98155 0.98300
## Pos Pred Value 0.92453 0.8876 0.85417 0.80090 0.83313 0.80568
## Neg Pred Value 0.99418 0.9970 0.97996 0.97870 0.98493 0.97220
## Prevalence 0.09855 0.1097 0.10105 0.10378 0.09676 0.08974
## Detection Rate 0.09331 0.1071 0.08296 0.08474 0.08319 0.06415
## Detection Prevalence 0.10093 0.1207 0.09712 0.10581 0.09986 0.07962
## Balanced Accuracy 0.96920 0.9804 0.90261 0.89650 0.92067 0.84893
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.91390 0.87667 0.75935 0.82097
## Specificity 0.98722 0.98894 0.98158 0.97762
## Pos Pred Value 0.88452 0.90482 0.81308 0.80485
## Neg Pred Value 0.99074 0.98526 0.97478 0.97983
## Prevalence 0.09676 0.10712 0.09545 0.10105
## Detection Rate 0.08843 0.09391 0.07248 0.08296
## Detection Prevalence 0.09998 0.10378 0.08915 0.10307
## Balanced Accuracy 0.95056 0.93280 0.87047 0.89930
As we can see above, our model has about 85% accuracy (0.8532) on the held-out test data, which we consider good enough.
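To show how the headline numbers are derived from a confusion matrix, here is a small base R sketch on a made-up 3-class matrix (not the results above). Rows are predictions and columns are the reference, as in the caret output.

```r
# Made-up confusion matrix: rows = predicted class, columns = true class
cm <- matrix(c(50,  2,  3,
                4, 45,  1,
                6,  3, 36),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = c("0", "1", "2"),
                             Reference  = c("0", "1", "2")))

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity per class: correct predictions over the true count of that class
sensitivity <- diag(cm) / colSums(cm)

# Positive predictive value per class: correct over all predicted as that class
pos_pred_value <- diag(cm) / rowSums(cm)

accuracy     # 131 / 150
sensitivity  # 50/60, 45/50, 36/40
```

Reading the real matrix the same way, the classes with the lowest sensitivity are the ones whose columns have the most off-diagonal counts.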
The df_test data loaded above has no label column, so it is truly unseen data.
Let's apply our model to it.
We preprocess it the same way as before: we scale the values and convert them to an array.
preprocess_x <- function(x){
  # scale pixel values to 0-1 and convert to an array, as for the training data
  x <- x %>% as.matrix() / 255
  x <- array_reshape(x, dim=dim(x))
  return(x)
}
testt_x <- preprocess_x(df_test)
We predict the data with our model above.
pred2 <- predict(model1, testt_x) %>% k_argmax() %>% as.array() %>% as.factor()
df_test$label <- pred2
df_test[,c(780:785)]
This is what the model predicts for the unseen data.
Based on the held-out test data, we expect these predictions to be about 85% accurate.
We cannot verify this with a confusion matrix, because
df_test has no labels.
🖋 Insight :
The hold-out accuracy of about 85% is our estimate of how often the model will be right on unseen data.
When choosing the best model for a neural network / deep learning task, we
should consider a few things:
- Choose the simplest model
- Time consumption
- The model should neither overfit nor underfit, because we need it to perform well
on both data sets (train & test)