Machine Learning Methods for Modeling and Classification of Fashion-MNIST

David Blumenstiel, Bonnie Cooper, Robert Welk, Leo Yi
"2021-12-10"

Project Goal: apply dimensionality reduction techniques to the Fashion-MNIST dataset and evaluate how well the reduced feature sets support classification across a variety of machine learning algorithms.

Fashion-MNIST

  • dataset of clothing images labeled by item category
  • 60k training images, 10k test images
  • images: grayscale, 28x28 pixels
  • a more challenging drop-in replacement for MNIST (loading sketch below)
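
The code sketches in this deck are illustrative Python stand-ins, not our actual analysis code. As a minimal example, the dataset can be loaded via the Keras datasets API (one of several common access points):

    # Illustrative sketch: load Fashion-MNIST via the Keras datasets API.
    from tensorflow.keras.datasets import fashion_mnist

    (X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
    print(X_train.shape)  # (60000, 28, 28) grayscale training images
    print(X_test.shape)   # (10000, 28, 28) grayscale test images

    # Flatten each 28x28 image into a 784-element pixel vector for the
    # classical models used later in this deck.
    X_train = X_train.reshape(len(X_train), -1)
    X_test = X_test.reshape(len(X_test), -1)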


Our Approach


  • Two approaches to dimensionality reduction:
    • Feature Engineering
    • Principal Component Analysis (PCA)
  • Compare the performance of models trained with and without dimensionality reduction
  • Evaluate the performance of a variety of models trained on the reduced feature sets

Why Dimensionality Reduction?

Not all pixels are informative

  • Many pixels in the periphery have consistently low values
  • Some pixels in the center have consistently high values
  • This redundancy suggests we can reduce the feature space to represent the data more efficiently (see the sketch below)
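
One quick way to see this, sketched below using the flattened pixel arrays from earlier (illustrative, not our original code):

    # Sketch: rank pixels by their variance across the training set.
    # Near-constant peripheral pixels carry little class information.
    import numpy as np

    pixel_var = X_train.var(axis=0)                  # variance of each of 784 pixels
    near_zero = (pixel_var < 0.05 * pixel_var.mean()).sum()
    print(f"{near_zero} of 784 pixels are nearly constant")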


Dimensionality Reduction: Feature Engineering


  • We observed category-specific patterns in the pixel values
  • We engineered new features based on these patterns
  • Reduced the number of features from 784 → ~80 (illustrative sketch below)
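
This deck does not spell out the engineered features, so the sketch below is purely hypothetical: row means, column means, and two global statistics stand in for a reduction of similar scale.

    # Hypothetical stand-in for the engineered features (the actual
    # features are not specified here): row means, column means, and
    # global brightness/contrast, ~58 features per image.
    import numpy as np

    def engineer_features(X):                  # X: (n, 784) flattened images
        imgs = X.reshape(len(X), 28, 28).astype(float)
        row_means = imgs.mean(axis=2)          # 28 row features
        col_means = imgs.mean(axis=1)          # 28 column features
        extras = np.column_stack([
            imgs.mean(axis=(1, 2)),            # overall brightness
            imgs.std(axis=(1, 2)),             # overall contrast
        ])
        return np.hstack([row_means, col_means, extras])

    X_train_fe = engineer_features(X_train)    # (60000, 58)
    X_test_fe = engineer_features(X_test)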


Modeling Fashion-MNIST with the Feature Engineered Dataset


We trained multiple machine learning models for classification on the engineered feature set:

  • Random Forest
  • Support Vector Machine
  • k-Nearest Neighbors (kNN)
  • Multinomial Logistic Regression
  • Naive Bayes

We found that the radial-kernel SVM performed best; however, Random Forest achieved comparable accuracy and trained much faster (see table below).

Model Type         Training Duration   Test Accuracy
SVM (radial)                    19.0           0.856
Mult. Log. Reg.                  9.3           0.824
Random Forest                    5.5           0.855
kNN                              2.6           0.795
Naive Bayes                      0.2           0.713
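
A scikit-learn sketch of this comparison follows; it is illustrative only (the durations and accuracies above come from our actual runs, and engineer_features is the hypothetical helper defined earlier):

    # Sketch: fit each model on the engineered features and time it.
    import time
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    models = {
        "SVM (radial)": SVC(kernel="rbf"),
        "Mult. Log. Reg.": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=200),
        "kNN": KNeighborsClassifier(),
        "Naive Bayes": GaussianNB(),
    }
    for name, model in models.items():
        start = time.time()
        model.fit(X_train_fe, y_train)
        elapsed = time.time() - start
        acc = model.score(X_test_fe, y_test)
        print(f"{name}: trained in {elapsed:.1f}s, test accuracy {acc:.3f}")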

Dimensionality Reduction: PCA

  • We performed PCA on the 784 pixel-value features from the original dataset (sketch below)
  • The scree plot (left) shows that the first ~12 components explain most of the variance in the data
  • The cumulative explained variance plot (right) shows that the first 187 components explain 95% of the variance
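
A minimal PCA sketch, assuming scikit-learn (illustrative, not our original code):

    # Sketch: fit PCA on the raw pixels and keep enough components
    # to explain 95% of the variance (187 components in our runs).
    from sklearn.decomposition import PCA

    pca = PCA(n_components=0.95)
    X_train_pca = pca.fit_transform(X_train)
    X_test_pca = pca.transform(X_test)
    print(pca.n_components_)                         # number of retained components
    print(pca.explained_variance_ratio_[:12].sum())  # share held by first ~12 (scree plot)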

kNN with & without PCA dimensionality reduction

  • We trained kNN models on the original and the PCA feature sets (comparison sketch below)
  • Hyperparameter tuning curves for the original (top) & PCA (bottom) feature sets
  • The PCA kNN model performs slightly better (82.35% vs 81.23% overall accuracy; a significant difference at α = 0.05)
  • We hypothesize that PCA performs better because it increases class separability
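
This deck does not name the significance test used; the sketch below uses McNemar's test on the paired test-set predictions as one reasonable choice (assumes statsmodels; n_neighbors=5 is a placeholder, not our tuned value):

    # Sketch: compare kNN on raw pixels vs. PCA features, then test
    # whether the paired test-set predictions differ significantly.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from statsmodels.stats.contingency_tables import mcnemar

    knn_raw = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    knn_pca = KNeighborsClassifier(n_neighbors=5).fit(X_train_pca, y_train)

    ok_raw = knn_raw.predict(X_test) == y_test
    ok_pca = knn_pca.predict(X_test_pca) == y_test

    # 2x2 agreement table: rows = raw correct/incorrect, cols = PCA
    table = np.array([[np.sum(ok_raw & ok_pca), np.sum(ok_raw & ~ok_pca)],
                      [np.sum(~ok_raw & ok_pca), np.sum(~ok_raw & ~ok_pca)]])
    print(mcnemar(table).pvalue)   # compare against alpha = 0.05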

Modeling Fashion-MNIST using PCA: Additional Machine Learning Models

  • To further evaluate classification with the PCA feature set, we trained the following models:
    • Stochastic Gradient Boosting
    • Random Forest
    • a Neural Network
  • Based on Accuracy & Kappa, Random Forest had the best overall performance on the test data (top panel)
  • Based on per-category Accuracy & Sensitivity, Random Forest also performed best (metrics sketch below)
  • All models struggled with the same categories (e.g. 'Shirt')
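
A sketch of the evaluation metrics, assuming scikit-learn (Random Forest shown; kappa and per-class sensitivity generalize to the other models):

    # Sketch: overall Accuracy and Cohen's Kappa, plus per-class
    # sensitivity (recall) to reveal hard categories like 'Shirt'.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, cohen_kappa_score, recall_score

    rf = RandomForestClassifier(n_estimators=200).fit(X_train_pca, y_train)
    pred = rf.predict(X_test_pca)

    print("Accuracy:", accuracy_score(y_test, pred))
    print("Kappa:   ", cohen_kappa_score(y_test, pred))
    print("Sensitivity by class:", recall_score(y_test, pred, average=None))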


Summary & Conclusions

  • We used a feature engineering approach to reduce the dimensionality of Fashion-MNIST from 784 to ~80 features. The resulting dataset trained faster and yielded acceptable classification accuracy. The radial SVM performed best, but Random Forest achieved similar results with a shorter training duration.
  • We also used PCA to reduce the dimensionality of Fashion-MNIST from 784 to 187 features. For kNN, we found a slight but significant improvement in classification accuracy with PCA. We then evaluated additional machine learning models on the PCA feature set; Random Forest performed best.
  • In conclusion, we find dimensionality reduction to be a powerful way to facilitate classification tasks on large, high-dimensional datasets.




Thank you for your attention