Sharia’s Closet, a clothing donation resource for San Diego, CA, https://www.instagram.com/shariascloset/p/DFjEBDmNTLV/?img_index=1
Machine learning (ML) is transforming how we analyze data, make predictions, and automate decision-making. One key application is classification, where a model “learns” to assign labels to data based on known features. One real-world application of classification is in the clothing industry. From sorting your laundry to deciding if an outfit is business-casual, we can all relate to sorting clothing. One place I would love to see this applied is in the nonprofit sector, such as at Sharia’s Closet. Sharia’s Closet is a nonprofit organization run by volunteers who sort, manage, and curate wardrobes for anyone in need, regardless of income. Traditionally, sorting and organizing clothing items in donation centers or retail stores is a manual process, but machine learning can aid with this task. Not only could this allow volunteers to sort clothes faster, it could also shorten turnaround time for local San Diegans who need clothes for a rapidly growing toddler, or interview outfits for a new job.
In this code through, we will build a classification model using the
Fashion-MNIST data set from Kaggle.com and
train it using R’s torch package, a deep learning framework
inspired by PyTorch, one of the gold standard libraries for deep
learning in Python. I did most of my machine learning in the past using
Python, and I’m excited to dive into ML with R!
Don’t be intimidated! We’ll focus on these main points:
- The Fashion-MNIST data set from Kaggle.com, where we will prep the data for ML training
- Building and training a classification model with torch

Classification models are widely used in a variety of industries! From diagnosing disease in healthcare and identifying fraudulent transactions in finance, to applications like virtual try-on technology in fashion, all of these rely on classification. Using machine learning, we can see how a neural network learns patterns in the data to make accurate predictions.
As mentioned previously, an industry where clothing classification would be beneficial is community clothing banks and donation centers, where sorting clothes efficiently can help curate clothing bags for individuals in need in a timely manner, potentially helping more people per volunteer hour. With a well-trained ML model, we can streamline this process by identifying clothing types automatically. With additional resources, this kind of sorting could be refined further, with models trained on clothing style (formal vs. casual) or even sizing! But for now, we’ll keep it simple with just clothing types.
By the end of this code through, you should be able to:

- Explain how supervised classification works
- Prepare the Fashion-MNIST data for modeling
- Build and train a neural network with torch
- Evaluate and interpret the model’s predictions
Machine learning (ML) is a powerful tool that allows computers to “learn” from data and make predictions without being explicitly programmed. Instead of writing specific rules for every decision, ML models analyze patterns in historical data and use them to classify new information or make informed predictions. While there are three main types of machine learning, this guide focuses on classification, which falls under supervised learning. Supervised learning means that we train the model using labeled data, where each input has a known correct output.
Classification models follow a structured learning process:
Infographic on Supervised Machine Learning, https://goldinlocks.github.io/Basic-ML/
In step 1 of the infographic above, we see the training phase. The model is given labeled images of cats, meaning each image is explicitly tagged as “cat”. The more variety in cat pictures, the better, to avoid bias! As any good statistician knows, good models are built from diverse samples.
In step 2, the model enters the testing phase. Now it is given a set of unlabeled images, some of which contain cats and others that do not. While we recognize the non-cat images as dogs, the model was only trained on “cats”, so it will attempt to classify the new images based on what it learned, labeling the dogs as “not cats”.
Missing from the infographic is step 3: model evaluation. At this stage, we assess the model’s accuracy. How often did it correctly classify a cat? How many times did it mistake a dog for a cat? If the accuracy is low, we might need to adjust the model by adding more training data or tweaking its parameters, a process known as hyperparameter tuning.
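To make these three steps concrete before we tackle images, here is a tiny toy example in base R (using the built-in iris data rather than clothing, and a simple logistic regression rather than a neural network) that trains on labeled rows, predicts on held-out rows, and checks accuracy. The 100-row split and the 0.5 cutoff are just illustrative choices.

# toy illustration of train / predict / evaluate using base R only
set.seed(42)
iris_bin <- transform(iris, is_setosa = as.integer(Species == "setosa")) # label each flower
train_rows <- sample(nrow(iris_bin), 100)                 # step 1: split the labeled data
train_set <- iris_bin[train_rows, ]
test_set <- iris_bin[-train_rows, ]
fit <- glm(is_setosa ~ Sepal.Length + Sepal.Width,        # train on the labeled examples
           data = train_set, family = binomial)
probs <- predict(fit, test_set, type = "response")        # step 2: predict on unseen data
preds <- as.integer(probs > 0.5)
mean(preds == test_set$is_setosa)                         # step 3: evaluate accuracy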
Now that we understand the basics of machine learning and
classification, it’s time to move on to preparing the
Fashion-MNIST data set from Kaggle.com for
training our model. Data preprocessing is a crucial step in machine
learning, as raw data is often messy, contains missing values, or needs
transformation before it can be used effectively. In this section we
will:

- Load the training and testing data and check their dimensions and missing values
- Adjust the class labels for R’s 1-based indexing
- Visualize a few sample images to confirm the images and labels line up

Our goal with classification is to categorize clothing items into 10 different classes based on their image representations. The first step is to read in our data. We will also print out the dimensions and check for missing values within each data set.
train_data <- read_csv("fashion-mnist_train.csv")
test_data <- read_csv("fashion-mnist_test.csv")
cat("Training Data: ", nrow(train_data), "rows ×", ncol(train_data), "columns | Missing Values:", sum(is.na(train_data)), "\n")## Training Data: 60000 rows × 785 columns | Missing Values: 0
cat("Testing Data: ", nrow(test_data), "rows ×", ncol(test_data), "columns | Missing Values:", sum(is.na(test_data)), "\n")## Testing Data: 10000 rows × 785 columns | Missing Values: 0
Now we need to understand our data.
If you are curious to learn more about how computers store images
numerically (flattened), check out Resource V under
Resource Materials! But we can take a peek at the first few rows of our
train_data below.
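The printed preview is not reproduced here, but if you are following along you can run head() yourself (with 785 columns, the tibble print will truncate to the first handful of pixel columns):

head(train_data) # first six rows: the label column plus the leading pixel columns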
The numerical labels for our clothing class categories are:

- 0: T-shirt/top
- 1: Trouser
- 2: Pullover
- 3: Dress
- 4: Coat
- 5: Sandal
- 6: Shirt
- 7: Sneaker
- 8: Bag
- 9: Ankle boot

Note: Since our labels start at 0 and R indexing starts at 1, we are going to shift the labels for our categories to start at 1 instead of 0, applying the change to both our training and testing data sets.
train_data$label <- train_data$label + 1 # add 1 to every value in train data labels
test_data$label <- test_data$label + 1 # add 1 to every value in test data labels

Double check that the labels are 1-10 instead of 0-9.
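A quick way to verify the shift (and likely what produced the output below) is to list the unique label values in each set:

unique(train_data$label) # should contain only the values 1 through 10
unique(test_data$label)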
## [1] 3 10 7 1 4 5 6 9 8 2
## [1] 1 2 3 4 9 7 6 5 8 10
Here we are going to set ourselves up for success and assign a vector of class label strings, so the category names can be displayed instead of their numeric codes. Before moving on, we want to make sure that our images and labels are still lined up! We will also create a plotting function that displays whichever image indices we want.
# define the class labels! we'll need this later in our plots
class_labels <- c("T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
"Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot")
# plotting function, images are flattened 28 x 28 pixels
plot_fashion_mnist <- function(index) {
img_matrix <- matrix(as.numeric(train_data[index, -1]), nrow = 28, ncol = 28, byrow = TRUE)
label_index <- as.numeric(train_data$label[index])
label <- as.character(class_labels[label_index])
ggplot() +
geom_raster(aes(x = rep(1:28, each = 28), y = rep(28:1, times = 28), fill = as.vector(img_matrix))) +
scale_fill_gradient(low = "white", high = "black", guide = "none") +
ggtitle(paste("Label:", label)) +
theme_void()
}
# generate the plots for the first 6 images
indices <- 1:6 ## change range to test other samples ##
plots <- lapply(indices, plot_fashion_mnist)
grid.arrange(grobs = plots, ncol = 3)

The first few items are pullover, ankle boot, shirt, t-shirt/top,
dress, and coat. This matches the output for
head(train_data) earlier.
Now it’s time for the hard work. Neural network sounds intimidating, and I highly suggest watching Resource VIII from YouTuber 3Blue1Brown, who has a fascinating way of explaining the concepts of deep learning (and many other mathematical topics). Neural networks are ML models inspired by the human brain. Much like our brains form speedy connections down pathways they travel frequently, a neural network strengthens the connections that lead to good predictions. It consists of layers of interconnected nodes (like our neurons) that process information and learn patterns from data.
We are going to create a 3-layer multi-layer perceptron (MLP), starting with this balance of an input layer, two hidden layers, and one output layer. Hidden layers allow the model to learn more abstract features, which can improve accuracy. However, more layers are not always better and can lead to overfitting. As these images are neither high resolution nor in full color, three layers should be sufficient. Once we evaluate the model’s performance, we may fine-tune it by adjusting the number of layers, neurons, or activation functions.
Basic neural network architecture consists of an input layer, one or more hidden layers, and an output layer. Within the comments of the code below, you can find what each layer does and how data flows through the network.
net <- nn_module( # set up our network
initialize = function() { # defines the layers that exist in the model
self$fc1 <- nn_linear(784, 128) # input 784 neurons to hidden 128
self$fc2 <- nn_linear(128, 64) # hidden layer 128 to hidden 64
self$output <- nn_linear(64, 10) # hidden 64 to output of 10 categories (output layer)
},
forward = function(x) { # defines the flow through the layers
x <- x$view(c(x$size(1), 784)) # flatten input image since MLPs require 1D input vectors
x <- self$fc1(x) %>% nnf_relu() # pass through fc1 + ReLU activation
x <- self$fc2(x) %>% nnf_relu() # pass through fc2 + ReLU activation
x <- self$output(x) # final layer
x
}
)

To train our neural network, we define three key components:

- The model, which starts with random weights and learns through exposure to the training data.
- The loss function: for classification, we use cross-entropy loss, which penalizes incorrect predictions by comparing the model’s predicted probabilities to the actual class labels. Minimizing this loss helps the model make more accurate classifications.
- The optimizer: to optimize learning, we use the Adam optimizer, which adapts learning rates dynamically. The learning rate (lr = 0.001) controls how much the weights update with each step, balancing learning speed and stability.
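If you are coding along, these three pieces can be created from the net module we just defined. The exact chunk isn’t reproduced above, so here is a minimal setup that mirrors the CNN setup used later in this post:

model <- net()                                        # instantiate the MLP defined above
criterion <- nn_cross_entropy_loss()                  # cross-entropy loss for multi-class labels
optimizer <- optim_adam(model$parameters, lr = 0.001) # Adam optimizer, learning rate 0.001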
To train the model with torch, we need to convert our data set into tensors, the primary data structure used in deep learning. Then we will create our data set objects, which group each input (x) with its label (y) for handling. Lastly we will prepare data loaders, which organize the data into mini-batches for efficient training. Here we will:

- Convert the data frames to tensors with torch_tensor()
- Pair inputs and labels with tensor_dataset()
- Create batch loaders with dataloader()
# convert training & testing data as torch
train_x <- torch_tensor(as.matrix(train_data[, -1]), dtype = torch_float())
train_y <- torch_tensor(as.integer(train_data$label), dtype = torch_long())
test_x <- torch_tensor(as.matrix(test_data[, -1]), dtype = torch_float())
test_y <- torch_tensor(as.integer(test_data$label), dtype = torch_long())
# set up data objects
train_dataset <- tensor_dataset(train_x, train_y)
test_dataset <- tensor_dataset(test_x, test_y)
# prepare data loader
train_loader <- dataloader(train_dataset, batch_size = 64, shuffle = TRUE)
test_loader <- dataloader(test_dataset, batch_size = 64, shuffle = FALSE)

Training the neural network involves iterating over the data set
multiple times (epochs), updating weights after each
batch to minimize loss. Weights are the learnable
parameters that determine how input data flows through the
layers. We will start with 5 training loops (epochs). We will load our images with train_loader, which feeds them in manageable batches of 64. Next we will compute the loss, which compares the model’s predictions to the actual labels using cross-entropy loss. We also compute the gradients with loss$backward() and update our weights with
optimizer$step(). These updates during training aim to
minimize the error between predictions and actual labels.
epochs <- 5 # 5 loops
for (epoch in 1:epochs) {
total_loss <- 0
coro::loop(for (batch in train_loader) { # train loader feeds 64 images per batch
optimizer$zero_grad()
output <- model(batch[[1]])
loss <- criterion(output, batch[[2]])
loss$backward() # computes gradients
optimizer$step() # updates weights
total_loss <- total_loss + loss$item() # compute loss
})
cat(sprintf("Epoch [%d/%d] - Loss: %.4f\n", epoch, epochs, total_loss / length(train_loader)))
}
## Epoch [1/5] - Loss: 0.6872
## Epoch [2/5] - Loss: 0.4400
## Epoch [3/5] - Loss: 0.4040
## Epoch [4/5] - Loss: 0.3804
## Epoch [5/5] - Loss: 0.3614
Let’s evaluate the epochs. Here we see the model starts with a relatively high loss (0.6872) in the first epoch, but with each epoch the loss steadily decreases, suggesting the model is successfully learning the data. You might be thinking, ‘let’s add more training epochs; more training may mean more accuracy!’ but we should be careful. After 5 epochs, loss dropped to 0.3614, showing good progress. However, when training continued (below) for 10 more epochs, the loss improved only slightly, reaching 0.2945, while test accuracy barely changed. This suggests the model is approaching convergence, where additional training no longer provides meaningful improvements and could even lead to overfitting. Instead of blindly adding more epochs, we should stop training when the loss stabilizes to ensure the model generalizes well rather than memorizing the training data.
epochs <- 10 # 10 loops
for (epoch in 1:epochs) {
total_loss <- 0
coro::loop(for (batch in train_loader) { # train loader feeds 64 images per batch
optimizer$zero_grad()
output <- model(batch[[1]])
loss <- criterion(output, batch[[2]])
loss$backward() # computes gradients
optimizer$step() # updates weights
total_loss <- total_loss + loss$item() # compute loss
})
cat(sprintf("Epoch [%d/%d] - Loss: %.4f\n", epoch, epochs, total_loss / length(train_loader)))
}
## Epoch [1/10] - Loss: 0.3498
## Epoch [2/10] - Loss: 0.3450
## Epoch [3/10] - Loss: 0.3378
## Epoch [4/10] - Loss: 0.3323
## Epoch [5/10] - Loss: 0.3204
## Epoch [6/10] - Loss: 0.3202
## Epoch [7/10] - Loss: 0.3154
## Epoch [8/10] - Loss: 0.3110
## Epoch [9/10] - Loss: 0.3036
## Epoch [10/10] - Loss: 0.2945
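Rather than picking the number of epochs by hand, a simple early-stopping rule can halt training once the per-epoch loss stops improving. Here is a rough sketch; the 20-epoch cap and the 0.005 tolerance are arbitrary illustrative choices you would tune for your own data.

prev_loss <- Inf
for (epoch in 1:20) {                        # cap on total epochs
  total_loss <- 0
  coro::loop(for (batch in train_loader) {
    optimizer$zero_grad()
    output <- model(batch[[1]])
    loss <- criterion(output, batch[[2]])
    loss$backward()
    optimizer$step()
    total_loss <- total_loss + loss$item()
  })
  avg_loss <- total_loss / length(train_loader)
  cat(sprintf("Epoch %d - Loss: %.4f\n", epoch, avg_loss))
  if (prev_loss - avg_loss < 0.005) {        # improvement below tolerance: stop
    cat("Loss has stabilized; stopping early.\n")
    break
  }
  prev_loss <- avg_loss
}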
Once training is complete, we evaluate the model’s performance by
testing it on unseen data. The test data set is processed in batches,
with each batch passed through the trained model to generate
predictions. Using torch_argmax(), the model selects the
class with the highest probability for each image. These predictions are
then compared to the actual labels to count the number of correct
classifications. Finally, accuracy is computed as the percentage of
correct predictions out of the total test samples.
correct <- 0
total <- 0
coro::loop(for (batch in test_loader) { # process in batches
output <- model(batch[[1]])
predicted <- torch_argmax(output, dim = 2) # torch_argmax selects the class with the highest probability
correct <- correct + sum(predicted == batch[[2]]) # count correct predictions
total <- total + batch[[2]]$size(1)
})
accuracy <- correct$item() / total
cat(sprintf("Test Accuracy: %.2f%%\n", accuracy * 100)) # accuracy computed## Test Accuracy: 86.92%
Now that the model has been trained and evaluated, we need to understand how well it classifies clothing items. Interpretation of classification results involves analyzing misclassifications with a confusion matrix, and visualizing predictions by plotting test images.
predictions <- c()
actual_labels <- c()
coro::loop(for (batch in test_loader) {
output <- model(batch[[1]])
predicted <- torch_argmax(output, dim = 2)$to(device = "cpu")
predictions <- c(predictions, as.numeric(predicted))
actual_labels <- c(actual_labels, as.numeric(batch[[2]]$to(device = "cpu")))
})
predictions_named <- factor(predictions, levels = 1:10, labels = class_labels)
actual_labels_named <- factor(actual_labels, levels = 1:10, labels = class_labels)
# create confusion matrix
conf_matrix <- table(Predicted = predictions_named, Actual = actual_labels_named)
# turn into heatmap
conf_matrix_melted <- as.data.frame(as.table(conf_matrix))
# plot
ggplot(conf_matrix_melted, aes(x = Actual, y = Predicted, fill = Freq)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "white", high = "red") +
geom_text(aes(label = Freq), color = "black", size = 4) +
labs(title = "Confusion Matrix", x = "Actual Label", y = "Predicted Label") +
theme_minimal()
In our heat map / confusion matrix, color indicates frequency: the darker the red, the more images fall in that cell. Dark cells off the diagonal mark frequent misclassifications, while a strong diagonal means the model performs well in most categories.
The heat map reveals that while the model performs well overall, certain categories are frequently misclassified. T-shirts and shirts are most likely confused due to similar shapes, especially in grayscale at 28×28 pixels. Dresses and pullovers also show overlap, likely because some long-sleeved dresses resemble sweaters at such a low resolution. Coats and shirts are another challenge, suggesting that texture and material differences are not well captured. On the other hand, footwear categories (sneakers, sandals, ankle boots) are classified with high accuracy, likely due to their distinct shapes. These misclassifications highlight the model’s reliance on shape rather than finer details, indicating areas for improvement.
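To put numbers on which categories struggle, we can also compute per-class accuracy directly from the confusion matrix: the diagonal holds the correct predictions, and the column sums hold how many test images each actual class has.

per_class_accuracy <- diag(conf_matrix) / colSums(conf_matrix) # correct / total for each actual class
round(sort(per_class_accuracy), 3)                             # lowest-accuracy classes listed first

With those numbers in hand, let’s also visualize a few individual predictions.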
plot_prediction <- function(index) {
# Convert test image to 28x28 matrix
img_matrix <- matrix(as.numeric(test_x[index, ]), nrow = 28, ncol = 28, byrow = TRUE)
# Ensure indices are valid within class_labels
actual_index <- as.numeric(actual_labels[index]) # Convert to numeric index
predicted_index <- as.numeric(predictions[index])
# Retrieve corresponding labels
actual <- ifelse(actual_index >= 1 & actual_index <= length(class_labels),
class_labels[actual_index], "Unknown")
predicted <- ifelse(predicted_index >= 1 & predicted_index <= length(class_labels),
class_labels[predicted_index], "Unknown")
# Generate plot
ggplot() +
geom_raster(aes(x = rep(1:28, each = 28), y = rep(28:1, times = 28), fill = as.vector(img_matrix))) +
scale_fill_gradient(low = "white", high = "black", guide = "none") +
ggtitle(paste0("Actual: ", actual, "\nPredicted: ", predicted)) +
theme_void() +
theme(
plot.title = element_text(hjust = 0.5, size = 12, face = "bold"),
plot.margin = margin(5, 5, 5, 5)
)
}
indices <- 7:12 ## change range to test other samples ##
plots <- lapply(indices, plot_prediction)
grid.arrange(grobs = plots, ncol = 3)
Here we can see actual vs. predicted for the next six items. Notice the second item is miscategorized! However, at this low resolution I think most of us would have a hard time telling whether that was a shirt or a coat.
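If you want to zero in on the mistakes specifically, you can filter for the indices where the prediction and the actual label disagree and reuse the plot_prediction() function from above:

misclassified <- which(predictions != actual_labels) # indices where the model was wrong
plots <- lapply(head(misclassified, 6), plot_prediction)
grid.arrange(grobs = plots, ncol = 3)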
Improvements such as data augmentation, a convolutional architecture, or hyperparameter tuning would help address these misclassifications and make the model more effective for real-world applications, such as automated clothing sorting in donation centers.
# Conclusion

This project explored image classification using a machine learning model built in R with torch
to categorize clothing items from the Fashion-MNIST data set. The data
set consisted of 60,000 training and 10,000 test grayscale images (28×28
pixels) labeled into 10 clothing categories. After preprocessing the
data and adjusting labels for R’s 1-based indexing, we built a 3-layer
neural network with ReLU activation and trained it using cross-entropy
loss and the Adam optimizer. The model showed high accuracy, but the
confusion matrix revealed misclassifications, particularly between
T-shirts and shirts and dresses and pullovers, while footwear categories
performed well. To improve performance, data augmentation, CNN
architectures, or hyperparameter tuning could enhance feature extraction
and reduce errors. This project demonstrated how deep learning can
effectively classify fashion images, but further refinements could make
the model even more comprehensive and applicable to more specific types
of clothing! Maybe one day it could even take measurements and estimate
sizing.
Just for fun, let’s try the CNN and see if we can’t get a more accurate model. While our initial model used a simple multi-layer perceptron (MLP), we can improve classification performance by introducing a convolutional neural network. CNNs are specifically designed for image processing, as they use convolutional layers to extract spatial features, such as edges, textures, and patterns, making them highly effective for visual tasks like clothing classification.
Unlike MLPs, which flatten images into 1D vectors, CNNs preserve spatial relationships by processing images as 2D feature maps. This allows the network to learn important local features, such as the structure of a sneaker versus a sandal, leading to more accurate classifications.
While implementing a CNN requires additional computational resources, it often results in higher accuracy and better generalization, reducing misclassifications observed in our MLP model. In future iterations, we could experiment with deeper architectures or further enhance model performance with feature tuning. As we see from the model below, we were able to achieve 90% accuracy!
But was a roughly 3-point gain in accuracy worth the extra processing load on our computers? Could we have achieved similar results with hyperparameter tuning instead? All great questions to consider
while you are trying out building your first machine learning model!
Play around with extra layers, adjusting the model architecture, and
comparing model performance to see which model works best for your data
set. Browse the Resources tab to find the
torch documentation as well as great supplemental videos
all on the vast world of machine learning!
# build the cnn model
cnn_model <- nn_module(
initialize = function() {
self$conv1 <- nn_conv2d(in_channels = 1, out_channels = 32,
kernel_size = 3, stride = 1, padding = 1)
self$conv2 <- nn_conv2d(in_channels = 32, out_channels = 64,
kernel_size = 3, stride = 1, padding = 1)
self$pool <- nn_max_pool2d(kernel_size = 2, stride = 2)
self$fc1 <- nn_linear(64 * 7 * 7, 128)
self$fc2 <- nn_linear(128, 10)
},
forward = function(x) {
x <- x$view(c(-1, 1, 28, 28)) # reshape
x <- self$pool(nnf_relu(self$conv1(x)))
x <- self$pool(nnf_relu(self$conv2(x)))
x <- x$view(c(x$size(1), -1))
x <- nnf_relu(self$fc1(x))
x <- self$fc2(x)
x
}
)
# train the CNN
model <- cnn_model()
criterion <- nn_cross_entropy_loss()
optimizer <- optim_adam(model$parameters, lr = 0.001)
epochs <- 5
for (epoch in 1:epochs) {
total_loss <- 0
coro::loop(for (batch in train_loader) {
optimizer$zero_grad()
output <- model(batch[[1]])
loss <- criterion(output, batch[[2]])
loss$backward()
optimizer$step()
total_loss <- total_loss + loss$item()
})
cat(sprintf("Epoch [%d/%d] - Loss: %.4f\n", epoch, epochs, total_loss / length(train_loader)))
}
## Epoch [1/5] - Loss: 0.5837
## Epoch [2/5] - Loss: 0.2965
## Epoch [3/5] - Loss: 0.2535
## Epoch [4/5] - Loss: 0.2293
## Epoch [5/5] - Loss: 0.2051
# test accuracy
correct <- 0
total <- 0
coro::loop(for (batch in test_loader) {
output <- model(batch[[1]])
predicted <- torch_argmax(output, dim = 2)
correct <- correct + sum(predicted == batch[[2]])
total <- total + batch[[2]]$size(1)
})
accuracy <- correct$item() / total
cat(sprintf("CNN Test Accuracy: %.2f%%\n", accuracy * 100))## CNN Test Accuracy: 90.29%
And we’ll take a look at visualizing the CNN model’s predictions. This time, it will pick a random 6 images every time the code chunk is run.
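One note before plotting: the predictions and actual_labels vectors were last filled during the MLP’s evaluation loop, so they would otherwise show the MLP’s guesses. Since model now holds the trained CNN, refresh them with the same loop we used earlier:

predictions <- c()
actual_labels <- c()
coro::loop(for (batch in test_loader) {
  output <- model(batch[[1]])                   # model is now the trained CNN
  predicted <- torch_argmax(output, dim = 2)$to(device = "cpu")
  predictions <- c(predictions, as.numeric(predicted))
  actual_labels <- c(actual_labels, as.numeric(batch[[2]]$to(device = "cpu")))
})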
plot_cnn_prediction <- function(index) {
img_matrix <- matrix(as.numeric(test_x[index, ]), nrow = 28, ncol = 28, byrow = TRUE)
actual_index <- as.numeric(actual_labels[index])
predicted_index <- as.numeric(predictions[index])
actual <- ifelse(actual_index >= 1 & actual_index <= length(class_labels),
class_labels[actual_index], "Unknown")
predicted <- ifelse(predicted_index >= 1 & predicted_index <= length(class_labels),
class_labels[predicted_index], "Unknown")
ggplot() +
geom_raster(aes(x = rep(1:28, each = 28), y = rep(28:1, times = 28), fill = as.vector(img_matrix))) +
scale_fill_gradient(low = "white", high = "black", guide = "none") +
ggtitle(paste0("Actual: ", actual, "\nPredicted: ", predicted)) +
theme_void() +
theme(
plot.title = element_text(hjust = 0.5, size = 12, face = "bold"),
plot.margin = margin(5, 5, 5, 5)
)
}
random_indices <- sample(1:nrow(test_x), 6) # get 6 random
plots <- lapply(random_indices, plot_cnn_prediction)
grid.arrange(grobs = plots, ncol = 3)

Learn more about torch, the Fashion-MNIST data set, or machine learning with the following:
Resource I torch for R
Resource II Fashion-MNIST - Kaggle
Resource III How Machine Learning Works - Codecademy via YouTube
Resource IV Machine Learning in R - Data Professor via YouTube
Resource V From Image to Algorithm - EasyTechSci via YouTube
Resource VI Deep Learning and Scientific Computing with R torch
Resource VII Sharia’s Closet - Emergency Clothing with Dignity and Respect
Resource VIII 3Blue1Brown - Neural Network Playlist