Sharia's Closet, a clothing donation resource for San Diego, CA, https://www.instagram.com/shariascloset/p/DFjEBDmNTLV/?img_index=1

Introduction

Machine learning (ML) is transforming how we analyze data, make predictions, and automate decision making. One key application is classification, where a model “learns” to assign labels to data based on known features. One real-world application of classification is in the clothing industry. From sorting laundry to deciding whether an outfit is business casual, we can all relate to sorting clothing. One place I would love to see this applied is in the nonprofit sector, such as at Sharia’s Closet. Sharia’s Closet is a nonprofit organization run by volunteers who sort, manage, and curate wardrobes for anyone in need, regardless of income. Traditionally, sorting and organizing clothing items in donation centers or retail stores is a manual process, but machine learning can aid with this task. Not only could this allow volunteers to sort clothes faster, but it could also shorten turnaround time for local San Diegans who need clothes for their rapidly growing toddler or an interview outfit for a new job.

In this code through, we will build a classification model using the Fashion-MNIST data set from Kaggle.com and train it using R’s torch package, a deep learning framework inspired by PyTorch, one of the gold standard libraries for deep learning in Python. I did most of my machine learning in the past using Python, and I’m excited to dive into ML with R!


Content Overview

Don’t be intimidated! We’ll focus on these main points:

  • Briefly explain machine learning and classification models
  • Introduce the Fashion-MNIST data from Kaggle.com, where we will prep the data for ML training
  • Build a neural network using torch
  • Evaluate the model’s performance and accuracy


Why You Should Care

Classification models are widely used in a variety of industries! Diagnosing disease in healthcare, identifying fraudulent transactions in finance, and even virtual try-on technology in fashion all rely on classification. Using machine learning, we can see how a neural network learns patterns in the data to make accurate predictions.

As mentioned previously, clothing classification would be especially beneficial in community clothing banks and donation centers, where sorting clothes efficiently can help curate clothing bags for individuals in need in a timely manner, potentially helping more individuals with every volunteer hour. With a well-trained ML model, we can streamline this process by identifying clothing types automatically. With additional resources, this type of sorting could be extended to clothing style (formal/casual) or even sizing! But for now, we’ll keep it simple with just clothing types.


Learning Objectives

By the end of this code through, you should be able to:

  • Understand how classification models work
  • Load, explore, and preprocess the Fashion-MNIST data set
  • Build and train a deep learning model using torch
  • Evaluate model accuracy and interpret classification results



Understanding Machine Learning

Machine learning (ML) is a powerful tool that allows computers to “learn” from data and make predictions without being explicitly programmed. Instead of writing specific rules for every decision, ML models analyze patterns in historical data and use them to classify new information or make informed predictions. While there are three main types of machine learning, this guide focuses on classification, which falls under supervised learning. Supervised learning means that we train the model using labeled data, where each input has a known correct output.

Classification models follow a structured learning process:

  • Training Phase: The model is trained on a labeled data set, meaning the data set contains input features along with their corresponding correct classifications.
  • Pattern Learning: During training, the model analyzes examples and learns patterns that differentiate one category from another.
  • Prediction Phase: Once trained, the model can make predictions on new, unseen data.
  • Evaluation: The model’s effectiveness is measured by how accurately it assigns the correct labels to new data points.

Infographic on Supervised Machine Learning, https://goldinlocks.github.io/Basic-ML/

In step 1 of the infographic above, we see the training phase. The model is given labeled images of cats, meaning each image is explicitly tagged as “cat”. The more variety in cat pictures, the better to avoid bias! As any good statistician knows, good models are built from diverse samples.

  • Data tip: If you only train your model on pictures of your tuxedo cat, it won’t recognize grey or orange cats, leading to poor generalization. Your model might become excellent at identifying your cat, but not cats in general—which is a critical flaw to keep in mind when building classification models.

In step 2, the model enters the testing phase. Now, it is given a set of unlabeled images, some of which contain cats, and others that do not. While we recognize the non-cat images as dogs, the model was only trained on “cats”, so it will attempt to classify the new images based on what it learned, labelling the dogs as “not cats”.

Missing from the infographic is Step 3: Model Evaluation. At this stage, we assess the model’s accuracy. How often did it correctly classify a cat? How many times did it mistake a dog for a cat? If the accuracy is low, we might need to adjust the model by adding more training data or tweaking its settings, a process known as hyperparameter tuning.
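The evaluation questions above can be answered with a few lines of base R. This is only a sketch: the `actual` and `predicted` vectors are made-up labels for six hypothetical test images, not output from any model in this guide.

```r
# Hypothetical test-set results for the cat example (made-up labels for illustration)
actual    <- c("cat", "cat", "not cat", "not cat", "cat", "not cat")
predicted <- c("cat", "not cat", "not cat", "cat", "cat", "not cat")

# Accuracy: share of test images whose predicted label matches the true label
accuracy <- mean(predicted == actual)

# How many dogs ("not cat") were mistaken for cats?
false_cats <- sum(predicted == "cat" & actual == "not cat")

# How many real cats were missed?
missed_cats <- sum(predicted == "not cat" & actual == "cat")

cat(sprintf("Accuracy: %.2f | Dogs called cats: %d | Cats missed: %d\n",
            accuracy, false_cats, missed_cats))
```

Accuracy alone hides which kind of mistake the model makes, which is why the two error counts are reported separately.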


Importing Data & Preprocessing

Now that we understand the basics of machine learning and classification, it’s time to move on to preparing the Fashion-MNIST data set from Kaggle.com for training our model. Data preprocessing is a crucial step in machine learning, as raw data is often messy, contains missing values, or needs transformation before it can be used effectively. In this section we will:

  • Load Fashion-MNIST data set
  • Explore the data set, handle missing values, and prepare image transformations
  • Split the data set into our training and testing data


Load & Explore Dataset

Our goal with classification is to categorize clothing items into 10 different classes based on their image representations. The first step is to read in our data. We will also print out the dimensions and check for missing values within each data set.

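# packages used throughout this code through (assumed setup; the original library() calls are not shown)
library(readr)      # read_csv()
library(ggplot2)    # plots
library(gridExtra)  # grid.arrange()
library(magrittr)   # %>% pipe
library(torch)      # tensors, layers, optimizers

train_data <- read_csv("fashion-mnist_train.csv")
test_data <- read_csv("fashion-mnist_test.csv")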

cat("Training Data: ", nrow(train_data), "rows ×", ncol(train_data), "columns | Missing Values:", sum(is.na(train_data)), "\n")
## Training Data:  60000 rows × 785 columns | Missing Values: 0
cat("Testing Data: ", nrow(test_data), "rows ×", ncol(test_data), "columns | Missing Values:", sum(is.na(test_data)), "\n")
## Testing Data:  10000 rows × 785 columns | Missing Values: 0

Now we need to understand our data.

  • There are 70,000 samples of clothing, 60,000 are for training, 10,000 are for testing
  • There are 785 columns in each data set
  • Column 1 “label” contains the numerical value that represents our clothing “class” label
  • Columns 2-785 contain the pixel values (0-255) for the flattened grayscale picture

If you are curious to learn more about how computers store images numerically (flattened), check out Resource V under Resource Materials! But we can take a peek at the first few rows of our train_data below.

head(train_data)
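To make the idea of a flattened image concrete, here is a minimal sketch using a made-up 3 x 3 "image" in place of a 28 x 28 one. It assumes pixels are stored row by row, which is what the `byrow = TRUE` argument in the plotting function later in this guide implies.

```r
# A tiny 3 x 3 "image" standing in for a 28 x 28 one (values are made up)
img <- matrix(c(  0, 255,   0,
                255, 255, 255,
                  0, 255,   0), nrow = 3, byrow = TRUE)

# Flatten row by row, the order assumed for the pixel columns in the CSV
flat <- as.vector(t(img))

# Rebuild the image from the flat vector; byrow = TRUE undoes the row-wise flattening
rebuilt <- matrix(flat, nrow = 3, ncol = 3, byrow = TRUE)

length(flat)             # 9 values here; 28 * 28 = 784 for Fashion-MNIST
identical(img, rebuilt)  # the round trip recovers the original image
```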

The numerical labels for our clothing class categories are:

    0. T-shirt / top
    1. Trouser
    2. Pullover
    3. Dress
    4. Coat
    5. Sandals
    6. Shirt
    7. Sneaker
    8. Bag
    9. Ankle Boot

Note: Since our label index starts with 0, and R indexes start at 1, we are going to adjust the labels for our categories to start with 1 instead of zero, and apply to both our training and testing data sets.

train_data$label <- train_data$label + 1 # add 1 to every value in train data labels
test_data$label <- test_data$label + 1  # add 1 to every value in test data labels

Double check that data is 1-10 instead of 0-9.

unique(train_data$label)
##  [1]  3 10  7  1  4  5  6  9  8  2
unique(test_data$label)
##  [1]  1  2  3  4  9  7  6  5  8 10
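A small sketch of why the shift matters in R. The `class_names` vector here is a shortened, illustrative stand-in for the full list of ten categories, and the labels are made up.

```r
class_names <- c("T-shirt/top", "Trouser", "Pullover")  # shortened, illustrative list

original_labels <- c(0, 2, 1)          # 0-based labels, as stored in the CSV
shifted_labels  <- original_labels + 1 # 1-based labels, matching R's indexing

class_names[original_labels]  # index 0 is silently dropped, so only two names come back
class_names[shifted_labels]   # all three names, in the right order
```

Because R drops index 0 without an error, a missed shift would quietly misalign names and images instead of failing loudly.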

Verify our labels are correct

Here we are going to set ourselves up for success and store our class labels as a vector of strings, so the names can be displayed instead of the numeric codes for each category of clothing. Before moving on, we want to make sure that our images and labels are still lined up! We will also create a plotting function for displaying whichever image indices we want.

# define the class labels! we'll need this later in our plots
class_labels <- c("T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", 
                  "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot")

# plotting function, images are flattened 28 x 28 pixels
plot_fashion_mnist <- function(index) {
  img_matrix <- matrix(as.numeric(train_data[index, -1]), nrow = 28, ncol = 28, byrow = TRUE)
  label_index <- as.numeric(train_data$label[index])  
  label <- as.character(class_labels[label_index])    
  
  ggplot() +
    geom_raster(aes(x = rep(1:28, each = 28), y = rep(28:1, times = 28), fill = as.vector(img_matrix))) +
    scale_fill_gradient(low = "white", high = "black", guide = "none") +
    ggtitle(paste("Label:", label)) +  
    theme_void()
}

# generate the plots for the first 6 images
indices <- 1:6 ## change range to test other samples ##
plots <- lapply(indices, plot_fashion_mnist)
grid.arrange(grobs = plots, ncol = 3)

The first few items are pullover, ankle boot, shirt, t-shirt/top, dress, and coat. This matches the output for head(train_data) earlier.

Constructing the Neural Network & Model

Now it’s time for the hard work. Neural networks sound intimidating, and I highly suggest watching Resource VII from YouTuber 3Blue1Brown, who has a fascinating way of explaining the concepts of deep learning (and many other mathematical topics). Neural networks are ML models inspired by the human brain. The same way our brain forms speedy connections down pathways it travels frequently, so does the neural network of a machine learning model! It consists of layers of interconnected nodes (like our neurons) that process information and learn patterns from data.

We are going to create a 3-layer multi-layer perceptron (MLP): two hidden layers followed by one output layer, fed by the 784-pixel input. Hidden layers let the model learn more abstract features, which can improve accuracy. However, more layers do not always mean better results and can lead to overfitting. As these images are neither high resolution nor in full color, three layers should be sufficient. Once we evaluate the model’s performance, we may fine-tune it by adjusting the number of layers, neurons, or activation functions.

Define Neural Network Architecture

Basic neural network architecture consists of:

  • Input layer - this receives our raw data, such as the pixel values of an image
  • Hidden layers - these detect patterns using mathematical operations
  • Output layer - this makes our predictions, such as assigning an image to a clothing category

Below within the comments of the code, you can find:

  • Input layer will have 784 neurons, 1 per pixel in the 28 x 28 image.
  • Hidden layer 1 will have 128 neurons with ReLU activation
  • Hidden layer 2 will have 64 neurons with ReLU activation
  • Output layer will have 10 neurons (one per clothing category); it returns raw scores, and softmax is applied by the loss function during training

Notes:

  • ReLU activation: Rectified Linear Unit introduces non-linearity into the model. It helps the network learn complex patterns instead of just linear relationships. Positive values remain the same, negative values become zero.
  • Softmax activation converts the raw output scores into probabilities, ensuring the sum of all class probabilities equals 1. In torch, nn_cross_entropy_loss() applies it internally (as a log-softmax), so the forward pass below returns raw scores.
  • Since MLPs require 1D input vectors, we flatten the 28×28 image into a 784-dimensional vector.

net <- nn_module(                    # set up our network
  initialize = function() {          # defines the layers that exist in the model
    self$fc1 <- nn_linear(784, 128)  # input 784 neurons to hidden 128
    self$fc2 <- nn_linear(128, 64)   # hidden layer 128 to hidden 64
    self$output <- nn_linear(64, 10) #  hidden 64 to output of 10 categories (output layer)
  },
  
  forward = function(x) {           # defines the flow through the layers
    x <- x$view(c(x$size(1), 784))  # flatten input image since MLPs require 1D input vectors
    x <- self$fc1(x) %>% nnf_relu() # pass through fc1 + ReLU activation 
    x <- self$fc2(x) %>% nnf_relu() # pass through fc2 + ReLU activation 
    x <- self$output(x)             # final layer, returns raw scores (one per class)
    x
  }
)
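The ReLU and softmax activations described in the notes above can be sketched in base R. The helper names `relu` and `softmax` and the example scores are illustrative only; torch provides its own implementations.

```r
# ReLU: keep positive values, replace negatives with zero
relu <- function(x) pmax(x, 0)

# Softmax: turn raw scores into probabilities that sum to 1
# (subtracting the max first is a standard trick to avoid overflow)
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

relu(c(-2, 0, 3.5))

scores <- c(2.0, 1.0, 0.1)   # made-up raw outputs for three classes
probs <- softmax(scores)
sum(probs)                   # probabilities add up to 1
which.max(probs)             # the largest score also gets the largest probability
```

Since softmax preserves the ordering of the scores, picking the highest raw score and picking the highest probability give the same predicted class.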

Define Training Parameters

To train our neural network, we define three key components:

  • the model
  • the loss function
  • the optimizer

The model starts with random weights and learns through exposure to training data.

For classification, we use cross-entropy loss, which penalizes incorrect predictions by comparing the model’s predicted probabilities to the actual class labels. Minimizing this loss helps the model make more accurate classifications.
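As a rough illustration of how cross-entropy penalizes mistakes, here is a base R sketch for a single example. The `cross_entropy` helper and the scores are hypothetical; torch's nn_cross_entropy_loss() works on raw scores in the same spirit and averages over the batch.

```r
# Cross-entropy for one example: -log(probability assigned to the true class)
cross_entropy <- function(logits, true_class) {
  # log-softmax, computed in a numerically stable way
  shifted <- logits - max(logits)
  log_probs <- shifted - log(sum(exp(shifted)))
  -log_probs[true_class]
}

logits <- c(3.0, 0.5, -1.0)   # made-up raw scores for three classes

confident_right <- cross_entropy(logits, true_class = 1)  # model favors class 1, and it is correct
confident_wrong <- cross_entropy(logits, true_class = 3)  # model favors class 1, but truth is class 3

c(right = confident_right, wrong = confident_wrong)
```

The loss is small when the true class receives a high score and grows quickly when the model is confidently wrong.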

To optimize learning, we use the Adam optimizer. Adam adapts learning rates dynamically. The learning rate (lr = 0.001) controls how much weights update with each step, balancing learning speed and stability.

model <- net()
criterion <- nn_cross_entropy_loss()
optimizer <- optim_adam(model$parameters, lr = 0.001)
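To give a feel for what the Adam optimizer does on each step, here is a toy sketch in base R that minimizes a one-parameter function. The function, the larger learning rate of 0.1, and the step count are assumptions chosen for illustration; the beta and epsilon values are the commonly used defaults.

```r
# Minimize f(w) = (w - 3)^2 with Adam; the gradient is 2 * (w - 3)
grad <- function(w) 2 * (w - 3)

lr <- 0.1; beta1 <- 0.9; beta2 <- 0.999; eps <- 1e-8
w <- 0; m <- 0; v <- 0

for (t in 1:500) {
  g <- grad(w)
  m <- beta1 * m + (1 - beta1) * g      # running mean of gradients
  v <- beta2 * v + (1 - beta2) * g^2    # running mean of squared gradients
  m_hat <- m / (1 - beta1^t)            # bias correction for the early steps
  v_hat <- v / (1 - beta2^t)
  w <- w - lr * m_hat / (sqrt(v_hat) + eps)  # step size adapts to gradient history
}

w   # should end up close to the minimum at 3
```

In the real model the same update is applied to every weight at once, with each weight keeping its own running averages.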

Convert Data Set into Tensors for torch

To train the model with torch, we need to convert our data set into tensors, which are the primary data structure used in deep learning. Then we will create our data set objects, which group our inputs (x) and labels (y) for handling. Lastly, we will prepare data loaders, which organize data into mini-batches for efficient training. Here we will:

  • Convert data to tensors
  • Create data set objects with tensor_dataset()
  • Prepare data loaders with dataloader()

Notes:

  • A batch size of 64 keeps training fast and memory-efficient by processing data in smaller groups
  • Training data is shuffled so the model cannot pick up on the order of the examples. Much like humans, it will try to take the easy way out
  • Testing data is not shuffled to maintain consistency in evaluation

# convert training & testing data to torch tensors
train_x <- torch_tensor(as.matrix(train_data[, -1]), dtype = torch_float())
train_y <- torch_tensor(as.integer(train_data$label), dtype = torch_long())

test_x <- torch_tensor(as.matrix(test_data[, -1]), dtype = torch_float())
test_y <- torch_tensor(as.integer(test_data$label), dtype = torch_long())

# set up data objects
train_dataset <- tensor_dataset(train_x, train_y)
test_dataset <- tensor_dataset(test_x, test_y)

# prepare data loader
train_loader <- dataloader(train_dataset, batch_size = 64, shuffle = TRUE)
test_loader <- dataloader(test_dataset, batch_size = 64, shuffle = FALSE)
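What a data loader does with `batch_size = 64` and `shuffle = TRUE` can be approximated in base R. This is a conceptual sketch only, not how torch implements it; the seed is there just to make the shuffle reproducible.

```r
set.seed(42)                      # only so the shuffle is reproducible here
n_samples  <- 60000               # size of the Fashion-MNIST training set
batch_size <- 64

shuffled <- sample(n_samples)     # random order of row indices, like shuffle = TRUE
batches  <- split(shuffled, ceiling(seq_along(shuffled) / batch_size))

length(batches)                     # number of mini-batches per epoch
length(batches[[1]])                # a full batch
length(batches[[length(batches)]])  # the last batch holds the leftover rows
```

Since 60,000 is not a multiple of 64, the final batch is smaller than the rest, which is why averaging the loss over batches is only an approximation of the per-image average.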

Model Training

Training the neural network involves iterating over the data set multiple times (epochs), updating weights after each batch to minimize loss. Weights are the learnable parameters that determine how input data flows through the layers. We will start with 5 epochs. In each one, train_loader feeds the images to the model in batches of 64. For every batch we compute the loss, which measures how far the model’s predictions are from the actual labels, using cross-entropy. We then compute the gradients with loss$backward() and update the weights with optimizer$step(). These updates aim to minimize the error between predictions and actual labels.

  • Lower loss over epochs = model is learning effectively
  • If loss plateaus = tuning hyperparameters may be necessary

epochs <- 5 # 5 loops

for (epoch in 1:epochs) { 
  total_loss <- 0
  
  coro::loop(for (batch in train_loader) { # train loader feeds 64 images per batch
    optimizer$zero_grad()
    
    output <- model(batch[[1]])  
    loss <- criterion(output, batch[[2]])  
    
    loss$backward()    # computes gradients
    optimizer$step()   # updates weights
    
    total_loss <- total_loss + loss$item() # compute loss
  })
  
  cat(sprintf("Epoch [%d/%d] - Loss: %.4f\n", epoch, epochs, total_loss / length(train_loader)))
}
## Epoch [1/5] - Loss: 0.6872
## Epoch [2/5] - Loss: 0.4400
## Epoch [3/5] - Loss: 0.4040
## Epoch [4/5] - Loss: 0.3804
## Epoch [5/5] - Loss: 0.3614

Let’s evaluate the epochs. The model starts with a relatively high loss (0.6872) in the first epoch, and the loss decreases steadily with each epoch, suggesting the model is successfully learning from the data. You might be thinking, ‘let’s add more training epochs, since more training should mean more accuracy!’ but we should be careful. After 5 epochs, the loss dropped to 0.3614, showing good progress. However, when training continued (below) for 10 more epochs on the same model, the loss improved only modestly, reaching 0.2945, while test accuracy barely changed. This suggests the model is approaching convergence, where additional training no longer provides meaningful improvements and could even lead to overfitting. Instead of blindly adding more epochs, we should stop training when the loss stabilizes to ensure the model generalizes well rather than memorizing the training data.

epochs <- 10 # 10 more epochs, continuing from the model trained above

for (epoch in 1:epochs) { 
  total_loss <- 0
  
  coro::loop(for (batch in train_loader) { # train loader feeds 64 images per batch
    optimizer$zero_grad()
    
    output <- model(batch[[1]])  
    loss <- criterion(output, batch[[2]])  
    
    loss$backward()    # computes gradients
    optimizer$step()   # updates weights
    
    total_loss <- total_loss + loss$item() # compute loss
  })
  
  cat(sprintf("Epoch [%d/%d] - Loss: %.4f\n", epoch, epochs, total_loss / length(train_loader)))
}
## Epoch [1/10] - Loss: 0.3498
## Epoch [2/10] - Loss: 0.3450
## Epoch [3/10] - Loss: 0.3378
## Epoch [4/10] - Loss: 0.3323
## Epoch [5/10] - Loss: 0.3204
## Epoch [6/10] - Loss: 0.3202
## Epoch [7/10] - Loss: 0.3154
## Epoch [8/10] - Loss: 0.3110
## Epoch [9/10] - Loss: 0.3036
## Epoch [10/10] - Loss: 0.2945
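The idea of stopping once the loss stabilizes can be sketched as a simple rule applied to the losses printed above. The helper `early_stop_epoch` and its `min_delta` and `patience` settings are hypothetical choices for illustration. In practice, early stopping is usually based on a held-out validation loss rather than the training loss used here.

```r
# Training losses copied from the two runs above (5 epochs, then 10 more)
losses <- c(0.6872, 0.4400, 0.4040, 0.3804, 0.3614,
            0.3498, 0.3450, 0.3378, 0.3323, 0.3204,
            0.3202, 0.3154, 0.3110, 0.3036, 0.2945)

# Hypothetical helper: returns the epoch at which training would stop,
# or NA if the rule never triggers
early_stop_epoch <- function(losses, min_delta = 0.01, patience = 3) {
  best <- Inf
  waited <- 0
  for (epoch in seq_along(losses)) {
    if (losses[epoch] < best - min_delta) {
      best <- losses[epoch]   # meaningful improvement: reset the counter
      waited <- 0
    } else {
      waited <- waited + 1    # no meaningful improvement this epoch
      if (waited >= patience) return(epoch)
    }
  }
  NA_integer_
}

stop_at <- early_stop_epoch(losses)
stop_at
```

A stricter `min_delta` or a shorter `patience` stops earlier. Both are hyperparameters that would need tuning for a real project.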

Evaluate the Model

Once training is complete, we evaluate the model’s performance by testing it on unseen data. The test data set is processed in batches, with each batch passed through the trained model to generate predictions. Using torch_argmax(), we select the class with the highest score for each image. These predictions are then compared to the actual labels to count the number of correct classifications. Finally, accuracy is computed as the percentage of correct predictions out of the total test samples.

correct <- 0
total <- 0

coro::loop(for (batch in test_loader) { # process in batches
  output <- model(batch[[1]])  
  predicted <- torch_argmax(output, dim = 2)  # index of the highest score in each row = predicted class

  correct <- correct + sum(predicted == batch[[2]])  # count predictions that match the labels
  total <- total + batch[[2]]$size(1)  
})

accuracy <- correct$item() / total  
cat(sprintf("Test Accuracy: %.2f%%\n", accuracy * 100)) # accuracy computed
## Test Accuracy: 86.92%
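The argmax-then-compare logic used above can be mirrored in base R on a small made-up example. The `logits` matrix and `labels` vector are illustrative values, not model output.

```r
# Made-up raw scores for 4 images (rows) across 3 classes (columns)
logits <- matrix(c( 2.1, 0.3, -1.0,
                    0.2, 1.7,  0.5,
                   -0.4, 0.1,  2.2,
                    1.5, 1.9,  0.0), nrow = 4, byrow = TRUE)
labels <- c(1, 2, 3, 1)   # true classes, 1-based like the shifted labels

# Row-wise argmax: the column with the highest score is the predicted class
predicted <- max.col(logits, ties.method = "first")

correct  <- sum(predicted == labels)
accuracy <- correct / length(labels)
cat(sprintf("Accuracy: %.2f%%\n", accuracy * 100))
```

Here the fourth image is scored slightly higher for class 2 than for its true class 1, so three of the four predictions are correct.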


Interpret classification results

Now that the model has been trained and evaluated, we need to understand how well it classifies clothing items. Interpretation of classification results involves analyzing misclassifications with a confusion matrix, and visualizing predictions by plotting test images.

predictions <- c()
actual_labels <- c()

coro::loop(for (batch in test_loader) {
  output <- model(batch[[1]])
  predicted <- torch_argmax(output, dim = 2)$to(device = "cpu")  
  predictions <- c(predictions, as.numeric(predicted))
  actual_labels <- c(actual_labels, as.numeric(batch[[2]]$to(device = "cpu")))
})
predictions_named <- factor(predictions, levels = 1:10, labels = class_labels)
actual_labels_named <- factor(actual_labels, levels = 1:10, labels = class_labels)

# create confusion matrix
conf_matrix <- table(Predicted = predictions_named, Actual = actual_labels_named)

# turn into heatmap
conf_matrix_melted <- as.data.frame(as.table(conf_matrix))

# plot
ggplot(conf_matrix_melted, aes(x = Actual, y = Predicted, fill = Freq)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "white", high = "red") +
  geom_text(aes(label = Freq), color = "black", size = 4) +
  labs(title = "Confusion Matrix", x = "Actual Label", y = "Predicted Label") +
  theme_minimal()

With our heat map / confusion matrix, colors indicate frequency: darker red = more images in that cell. Cells on the diagonal are correct classifications, so a strong diagonal means the model performs well in most categories, while red cells off the diagonal are misclassifications.

The heat map reveals that while the model performs well overall, there are certain categories that are frequently misclassified. T-shirts and shirts are most likely confused due to similar shapes, especially in grayscale at 28×28 pixels. Dresses and pullovers also show overlap, likely because some long-sleeved dresses resemble sweaters at such a low resolution. Coats and shirts are another challenge, suggesting that texture and material differences are not well captured. On the other hand, footwear categories (sneakers, sandals, ankle boots) are classified with high accuracy, likely due to their distinct shapes. These misclassifications highlight the model’s reliance on shape rather than finer details, indicating areas for improvement.
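One way to put numbers on these observations is to compute per-class recall and precision from the confusion matrix. The sketch below uses a toy 3 x 3 matrix with made-up counts, laid out like `conf_matrix` above (rows = predicted, columns = actual).

```r
# Toy confusion matrix with made-up counts; rows = predicted, columns = actual
classes <- c("T-shirt/top", "Shirt", "Sneaker")
conf <- matrix(c(80, 15,   0,
                 20, 85,   0,
                  0,  0, 100),
               nrow = 3, byrow = TRUE,
               dimnames = list(Predicted = classes, Actual = classes))

# Recall: correct predictions divided by how many items truly belong to the class
recall <- diag(conf) / colSums(conf)

# Precision: correct predictions divided by how many times the class was predicted
precision <- diag(conf) / rowSums(conf)

round(recall, 2)
round(precision, 2)
```

Applied to the real matrix, the same two lines would show which categories lose the most items to look-alike classes.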

plot_prediction <- function(index) {
  # Convert test image to 28x28 matrix
  img_matrix <- matrix(as.numeric(test_x[index, ]), nrow = 28, ncol = 28, byrow = TRUE)
  
  # Ensure indices are valid within class_labels
  actual_index <- as.numeric(actual_labels[index])  # Convert to numeric index
  predicted_index <- as.numeric(predictions[index]) 
  
  # Retrieve corresponding labels
  actual <- ifelse(actual_index >= 1 & actual_index <= length(class_labels), 
                   class_labels[actual_index], "Unknown")

  predicted <- ifelse(predicted_index >= 1 & predicted_index <= length(class_labels), 
                      class_labels[predicted_index], "Unknown")

  # Generate plot
  ggplot() +
    geom_raster(aes(x = rep(1:28, each = 28), y = rep(28:1, times = 28), fill = as.vector(img_matrix))) +
    scale_fill_gradient(low = "white", high = "black", guide = "none") +
    ggtitle(paste0("Actual: ", actual, "\nPredicted: ", predicted)) +  
    theme_void() +
    theme(
      plot.title = element_text(hjust = 0.5, size = 12, face = "bold"),  
      plot.margin = margin(5, 5, 5, 5)  
    )
}

indices <- 7:12 ## change range to test other samples ##
plots <- lapply(indices, plot_prediction)
grid.arrange(grobs = plots, ncol = 3)

Here we can see actual vs predicted for the next six items. Notice the second item is miscategorized! However, at this low resolution I think most of us would have a hard time telling if that was a shirt or a coat.

How could we improve the model?

  • Switching from an MLP to a CNN (convolutional neural network) could potentially enhance accuracy by preserving spatial relationships and detecting finer image details.
  • Data augmentation, such as rotations and brightness adjustments to the images, could increase variety in training images, improving generalization
  • Dropout regularization can reduce overfitting by forcing the model to learn broader patterns rather than memorizing training data.
  • Hyperparameter tuning, including adjusting batch size, learning rate, and adding early stopping to training, would further refine model efficiency.

These improvements would help address misclassifications and make the model more effective for real world applications, such as automated clothing sorting in donation centers.
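Of the ideas listed above, dropout is the easiest to illustrate without a deep learning library. The sketch below shows so-called inverted dropout in base R. The `dropout` helper is hypothetical and only conveys the idea; in a torch model one would typically use the built-in nn_dropout() layer instead.

```r
set.seed(1)   # only for a reproducible mask

# Hypothetical helper: inverted dropout on a vector of activations
dropout <- function(x, p = 0.5, training = TRUE) {
  if (!training) return(x)                        # no-op at evaluation time
  keep <- rbinom(length(x), size = 1, prob = 1 - p)
  x * keep / (1 - p)                              # rescale so the expected value is unchanged
}

activations <- rep(1, 10000)
dropped <- dropout(activations, p = 0.5)

mean(dropped == 0)   # roughly half the units are zeroed
mean(dropped)        # the average stays close to 1 thanks to the rescaling
```

Because a different random subset of units is silenced on every batch, the network cannot rely on any single unit, which is what pushes it toward broader patterns.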


Conclusion

This project explored image classification in R with torch, categorizing clothing items from the Fashion-MNIST data set. The data set consisted of 60,000 training and 10,000 test grayscale images (28×28 pixels) labeled into 10 clothing categories. After preprocessing the data and adjusting labels for R’s 1-based indexing, we built a 3-layer neural network with ReLU activation and trained it using cross-entropy loss and the Adam optimizer. The model reached 86.92% test accuracy, but the confusion matrix revealed misclassifications, particularly between T-shirts and shirts and between dresses and pullovers, while footwear categories performed well. To improve performance, data augmentation, CNN architectures, or hyperparameter tuning could enhance feature extraction and reduce errors. This project demonstrated how deep learning can effectively classify fashion images, but further refinements could make the model even more comprehensive and applicable to more specific types of clothing! Maybe one day it could even take measurements and estimate sizing.

Bonus Model

Just for fun, let’s try the CNN and see if we can’t get a more accurate model. While our initial model used a simple multi-layer perceptron (MLP), we can improve classification performance by introducing a convolutional neural network. CNNs are specifically designed for image processing, as they use convolutional layers to extract spatial features, such as edges, textures, and patterns, making them highly effective for visual tasks like clothing classification.

Unlike MLPs, which flatten images into 1D vectors, CNNs preserve spatial relationships by processing images as 2D feature maps. This allows the network to learn important local features, such as the structure of a sneaker versus a sandal, leading to more accurate classifications.

While implementing a CNN requires additional computational resources, it often results in higher accuracy and better generalization, reducing misclassifications observed in our MLP model. In future iterations, we could experiment with deeper architectures or further enhance model performance with feature tuning. As we see from the model below, we were able to achieve 90% accuracy!

But was an improvement of roughly 3.4 percentage points (86.92% to 90.29%) worth the extra processing load on our computers? Could we have achieved similar results with hyperparameter tuning instead? All great questions to consider while you are building your first machine learning model! Play around with extra layers, adjust the model architecture, and compare model performance to see which model works best for your data set. Browse the Resources tab to find the torch documentation as well as great supplemental videos on the vast world of machine learning!

# build the cnn model
cnn_model <- nn_module(
  initialize = function() {
    self$conv1 <- nn_conv2d(in_channels = 1, out_channels = 32, 
                            kernel_size = 3, stride = 1, padding = 1)
    self$conv2 <- nn_conv2d(in_channels = 32, out_channels = 64, 
                            kernel_size = 3, stride = 1, padding = 1)
    self$pool <- nn_max_pool2d(kernel_size = 2, stride = 2)
    self$fc1 <- nn_linear(64 * 7 * 7, 128)
    self$fc2 <- nn_linear(128, 10)
  },
  
  forward = function(x) {
    x <- x$view(c(-1, 1, 28, 28))   # reshape
    x <- self$pool(nnf_relu(self$conv1(x)))
    x <- self$pool(nnf_relu(self$conv2(x)))
    x <- x$view(c(x$size(1), -1))   
    x <- nnf_relu(self$fc1(x))
    x <- self$fc2(x)
    x
  }
)
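The `64 * 7 * 7` input size of `fc1` in the CNN above follows from how each layer changes the image size. A short base R sketch of that arithmetic, using the standard output-size formula for convolution and pooling layers (the helper name `out_size` is illustrative):

```r
# Output size of a convolution or pooling layer along one spatial dimension
out_size <- function(n, kernel, stride = 1, padding = 0) {
  floor((n + 2 * padding - kernel) / stride) + 1
}

size <- 28
size <- out_size(size, kernel = 3, stride = 1, padding = 1)  # conv1: 28 -> 28
size <- out_size(size, kernel = 2, stride = 2)               # pool:  28 -> 14
size <- out_size(size, kernel = 3, stride = 1, padding = 1)  # conv2: 14 -> 14
size <- out_size(size, kernel = 2, stride = 2)               # pool:  14 -> 7

flattened <- 64 * size * size   # 64 channels after conv2
flattened
```

If the kernel size, padding, or number of pooling steps changes, the first argument of `nn_linear()` for `fc1` has to change with it.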

# train the CNN
model <- cnn_model()
criterion <- nn_cross_entropy_loss()
optimizer <- optim_adam(model$parameters, lr = 0.001)

epochs <- 5

for (epoch in 1:epochs) { 
  total_loss <- 0
  
  coro::loop(for (batch in train_loader) { 
    optimizer$zero_grad()
    
    output <- model(batch[[1]])  
    loss <- criterion(output, batch[[2]])  
    
    loss$backward()
    optimizer$step()
    
    total_loss <- total_loss + loss$item()
  })
  
  cat(sprintf("Epoch [%d/%d] - Loss: %.4f\n", epoch, epochs, total_loss / length(train_loader)))
}
## Epoch [1/5] - Loss: 0.5837
## Epoch [2/5] - Loss: 0.2965
## Epoch [3/5] - Loss: 0.2535
## Epoch [4/5] - Loss: 0.2293
## Epoch [5/5] - Loss: 0.2051

# test accuracy
correct <- 0
total <- 0

coro::loop(for (batch in test_loader) { 
  output <- model(batch[[1]])  
  predicted <- torch_argmax(output, dim = 2)

  correct <- correct + sum(predicted == batch[[2]])  
  total <- total + batch[[2]]$size(1)  
})

accuracy <- correct$item() / total  
cat(sprintf("CNN Test Accuracy: %.2f%%\n", accuracy * 100))
## CNN Test Accuracy: 90.29%


And we’ll take a look at visualizing the CNN model’s predictions. This time, it will pick 6 random images every time the code chunk is run.

plot_cnn_prediction <- function(index) {
  img_matrix <- matrix(as.numeric(test_x[index, ]), nrow = 28, ncol = 28, byrow = TRUE)
  
  actual_index <- as.numeric(actual_labels[index])  
  predicted_index <- as.numeric(predictions[index]) 
  
  actual <- ifelse(actual_index >= 1 & actual_index <= length(class_labels), 
                   class_labels[actual_index], "Unknown")

  predicted <- ifelse(predicted_index >= 1 & predicted_index <= length(class_labels), 
                      class_labels[predicted_index], "Unknown")

  ggplot() +
    geom_raster(aes(x = rep(1:28, each = 28), y = rep(28:1, times = 28), fill = as.vector(img_matrix))) +
    scale_fill_gradient(low = "white", high = "black", guide = "none") +
    ggtitle(paste0("Actual: ", actual, "\nPredicted: ", predicted)) +  
    theme_void() +
    theme(
      plot.title = element_text(hjust = 0.5, size = 12, face = "bold"),  
      plot.margin = margin(5, 5, 5, 5)  
    )
}

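# recompute predictions with the CNN; otherwise `predictions` still holds the MLP's output from earlier
predictions <- c()
actual_labels <- c()
coro::loop(for (batch in test_loader) {
  predicted <- torch_argmax(model(batch[[1]]), dim = 2)
  predictions <- c(predictions, as.numeric(predicted))
  actual_labels <- c(actual_labels, as.numeric(batch[[2]]))
})

random_indices <- sample(1:nrow(test_x), 6) # get 6 random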

plots <- lapply(random_indices, plot_cnn_prediction)
grid.arrange(grobs = plots, ncol = 3)