Introduction

In this project we use a Convolutional Neural Network to solve the following problem:

Problem Statement
How can we develop a model that accurately classifies images?

Image classification has become one of the most influential innovations in Computer Vision since the first digital image scanner. Models that can classify images have driven tremendous advances in the way people interact (social media, search engines and image processing), and in retail (both in person and online), marketing, theatre and the performing arts, government, surveillance, law enforcement, and more. Thanks to image classification algorithms we can receive notifications on social media when someone posts a picture that may look like us, and self-driving cars can recognize objects! The idea of a program that can identify meaningful objects in an image and judge what each one is, what it is connected with and where it belongs, based only on the information found in the image, has endless applications.

Convolutional Neural Network

In this project we explore image classification via a Convolutional Neural Network (CNN), which has become the “gold standard” for solving image classification problems. A CNN is a class of deep learning neural networks that uses a series of filters to extract features from a data set while keeping the number of parameters relatively low.


Neural networks are superior to hand-crafted feature detection algorithms such as SIFT, FAST, SURF, BRIEF, etc. because they solve the problems associated with feature detectors being too simple or too complex to generalize across categories. A neural network can learn the features of particular objects (regardless of abstraction), so a feature learning system can classify images without relying on explicitly defined features. Traditional neural networks such as the Multilayer Perceptron (MLP) are not a robust solution because each node takes a single input value, so the input layer becomes too large (one node per pixel, multiplied by 3 for color), the weights become unmanageable and the chance of overfitting increases, and the network is not translation invariant (you lose spatial information, i.e. where an object is located in an image).

Convolutional Neural Networks (CNNs), on the other hand, address all of these issues. CNNs analyze pixels in groups with their neighbors by sliding filters (or convolving filters) across the pixels of an image. Each filter's purpose is to detect a particular pattern within images. For example, one filter can contribute to detecting eyes in a facial recognition model; another may be responsible for detecting a nose or a mouth. Each filter essentially executes an operation on pixel data and indicates how strongly a particular feature appears in an image, where it is located and its frequency. This process reduces the number of parameters the CNN must learn compared to an MLP, and does not lose spatial information.
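To make the sliding filter idea concrete, here is a minimal sketch in base R. The function `convolve2d`, the toy image and the hand-picked edge filter are all made up for illustration; a trained CNN learns its own filter values, and keras does this work internally across many channels and filters at once.

```r
# Slide a small filter over a single-channel image ("valid" mode, stride 1).
# Like Keras, this computes a cross-correlation: the filter is not flipped.
convolve2d <- function(image, filter) {
  out_rows <- nrow(image) - nrow(filter) + 1
  out_cols <- ncol(image) - ncol(filter) + 1
  out <- matrix(0, out_rows, out_cols)
  for (i in seq_len(out_rows)) {
    for (j in seq_len(out_cols)) {
      patch <- image[i:(i + nrow(filter) - 1), j:(j + ncol(filter) - 1)]
      out[i, j] <- sum(patch * filter)
    }
  }
  out
}

# Toy 5 x 5 image: two dark columns (0) followed by three bright columns (1)
image <- matrix(rep(c(0, 0, 1, 1, 1), each = 5), nrow = 5)

# Hypothetical vertical-edge filter: each row reads -1, 0, 1
edge_filter <- matrix(c(-1, -1, -1,
                         0,  0,  0,
                         1,  1,  1), nrow = 3)

# 3 x 3 feature map: large values where the dark-to-bright edge sits
feature_map <- convolve2d(image, edge_filter)
feature_map
```

The feature map is large where the filter overlaps the edge and 0 over the uniform bright region, which is the sense in which a filter reports where a feature is and how strongly it appears. The same 9 weights are reused at every position, so the parameter count does not grow with the image size.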



Filters change in response to training and therefore begin with arbitrary values. Essentially, what is being trained are these filters, which are responsible for identifying the distinguishing features of each image or image category. Each filter generates a feature map for each image, and the feature map is passed to an activation function at the node, which determines whether a feature is present at a given location. This process is repeated across multiple layers of the CNN. You can view the following resources for more in-depth information on CNNs: Lecture describing concept behind CNN layers & filters and CNN article on Medium.


Keras

In this project, the keras package is used to construct the model. Keras is a high-level deep learning library; here it is used to build and train a CNN, ending in a fully connected classifier, that recognizes images falling into one of three categories. Specifically, the functions that I will be using rely on the TensorFlow library. TensorFlow represents data as tensors (multi-dimensional arrays), which makes it easy to create the structure of the neural network that I will be using. TensorFlow is actually a Python package; however, keras uses Miniconda to access these powerful Python tools in R. For more information refer to the Keras Documentation and Guide to Sequential Models in R.



Data

In this project seven sample images in each of three categories are imported from the internet and used to create testing and training data.

  • Motorcycles
  • Cars
  • Bicycles

Import Data

We can import images located in the current directory using the readImage() function from the EBImage library. Since we have multiple images, we can save them all in a list.

library(EBImage) #Load Library

#Save Image names in a vector
pics <- c("moto1.jpg","moto2.jpg","moto3.jpg","moto4.jpg","moto5.jpg", "moto6.jpg","moto7.jpg", "car1.jpg", "car2.jpg",  "car3.jpg",  "car4.jpg",  "car5.jpg",  "car6.jpg","car7.jpg", "bike1.jpg", "bike2.jpg", "bike3.jpg", "bike4.jpg", "bike5.jpg", "bike6.jpg", "bike7.jpg")

#Create list to save each image
mypic <- list()

#Load files into list using a for loop
for(x in 1:length(pics)){
  mypic[[x]] <- readImage(pics[x])
}


Train/Test Split

After the images are loaded into a list, we can split our training and testing data. We know that our images are organized in the list mypic according to the following labels:

  • Motorcycles are the first 7 images
  • Cars are the next 7 images (specifically Sedans)
  • Bikes are the next 7 images

Considering the organization of the images, we are going to select the first 5 images in each category to be our training data set and the last 2 to be our testing set. This gives a split of roughly 71/29, which falls between the 70/30 and 80/20 splits that are standard for training CNN models. To do this we simply create separate lists by iterating across the list of images.

#training set of 15 images
trainX <- vector("list", 15)

#select first 5 images of each group
for(x in 1:5){
  trainX[[x]] <- mypic[[x]] #motorcycles
  trainX[[x+5]] <- mypic[[x+7]] #cars
  trainX[[x+10]] <-mypic[[x+14]] #bikes
}


#test set of 6 images
testX <- vector("list", 6)

#select last two images in each group
for(x in 1:2){
  testX[[x]] <- mypic[[x+5]] #motorcycles
  testX[[x+2]] <- mypic[[x+12]] #cars
  testX[[x+4]] <- mypic[[x+19]] #bikes
}
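The index arithmetic in the two loops above can also be written out directly, which makes it easy to check which images land in which set. This is only a sketch of the same selection; the names `n_per_class`, `n_train`, `offsets`, `train_idx` and `test_idx` are introduced here for illustration.

```r
n_per_class <- 7             # images per category: motorcycles, cars, bikes
n_train     <- 5             # the first five of each category go to training
offsets     <- c(0, 7, 14)   # where each category starts in the image list

# Add every within-category position to every category offset
train_idx <- as.vector(outer(1:n_train, offsets, "+"))
test_idx  <- as.vector(outer((n_train + 1):n_per_class, offsets, "+"))

train_idx  # 1 2 3 4 5 8 9 10 11 12 15 16 17 18 19
test_idx   # 6 7 13 14 20 21

# Fraction of all images used for training: 15 / 21, about 0.71
train_fraction <- length(train_idx) / (length(train_idx) + length(test_idx))
train_fraction
```

With these index vectors, subsetting the image list with `mypic[train_idx]` and `mypic[test_idx]` would give the same two lists as the loops.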


Prepare Image Data

Let's take a look at the structure of our training and testing data. For demonstration purposes I have only included the structure of the test data set.

#Inspect the structure for resizing
#str(trainX)
str(testX)
## List of 6
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:728, 1:485, 1:3] 0.796 0.796 0.796 0.796 0.796 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:800, 1:533, 1:3] 0 0 0 0 0 0 0 0 0 0 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:480, 1:300, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:2000, 1:1421, 1:3] 0.455 0.435 0.427 0.431 0.435 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:1000, 1:663, 1:3] 0.702 0.706 0.714 0.718 0.722 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:800, 1:533, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..@ colormode: int 2


We can see that each image is a 3 dimensional array, but the widths and heights vary. Looking at the .Data slot in the structure of our list of images, we can see that the first two numbers (width and height) are different for each image, and that all images are in color (the “3” is the number of channels in RGB format, which indicates a color image).

To prepare the data for training and testing we want to keep all image dimensions consistent. To do this we resize all images so that they are 166 by 166 pixels using resize() from the EBImage library. I chose 166 by 166 because it is smaller than the smallest height of all the images in the imported data. Images must also have equal width and height (square dimensions) because we want to end up with square matrices.

#Resizing Images (Train)
for(x in 1:length(trainX)){
  trainX[[x]] <- resize(trainX[[x]], 166, 166)
}

#Resizing Images (test)
for(x in 1:length(testX)){
  testX[[x]] <- resize(testX[[x]], 166, 166)
}

#Display new dimensions
#str(trainX)
str(testX)
## List of 6
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:166, 1:166, 1:3] 0.796 0.796 0.796 0.796 0.796 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:166, 1:166, 1:3] 0 0 0 0 0 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:166, 1:166, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:166, 1:166, 1:3] 0.415 0.43 0.43 0.431 0.426 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:166, 1:166, 1:3] 0.731 0.715 0.724 0.72 0.731 ...
##   .. ..@ colormode: int 2
##  $ :Formal class 'Image' [package "EBImage"] with 2 slots
##   .. ..@ .Data    : num [1:166, 1:166, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..@ colormode: int 2
#Display Resized Images (train)
trainX <- combine(trainX)
dis <- tile(trainX, 5)
display(dis, title = "Pictures")

#Display Resized Images (test)
testX <- combine(testX)
dis2 <- tile(testX, 2)
display(dis2, title = "Pictures")


We can see from the display above that we have 15 images in our training set and 6 images in our testing set, all with the same dimensions.

Before we can train and test our model, we need to rearrange the image data into the dimension order the CNN expects as input. Looking at the structure, we see that instead of a list of images we now have one array with dimensions (166 x 166 x 3 x 6). The additional 6 comes from the 6 images we combined above in our test set (for the training set the 4th dimension is 15, which corresponds to the number of images in the training set). We can reorder the dimensions to (number of images, width, height, color) using the base R function aperm().

#Display Before permutation
#str(trainX)
str(testX)
## Formal class 'Image' [package "EBImage"] with 2 slots
##   ..@ .Data    : num [1:166, 1:166, 1:3, 1:6] 0.796 0.796 0.796 0.796 0.796 ...
##   ..@ colormode: int 2
#permute the dimensions
testX <- aperm(testX, c(4,1,2,3))
trainX <- aperm(trainX, c(4, 1, 2, 3))

#Display the change
#str(trainX)
str(testX)
##  num [1:6, 1:166, 1:166, 1:3] 0.796 0 1 0.415 0.731 ...
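To see what aperm() does without loading any images, here is a small sketch on a made-up array. The array `x` and its tiny dimensions are stand-ins chosen for illustration, not the project's data.

```r
# Stand-in for the combined image array: width 2, height 3, 3 channels, 4 images
x <- array(seq_len(2 * 3 * 3 * 4), dim = c(2, 3, 3, 4))

# Move the image index (the 4th dimension) to the front
y <- aperm(x, c(4, 1, 2, 3))

dim(x)  # 2 3 3 4
dim(y)  # 4 2 3 3

# The same value is reachable in both layouts; only the index order changes
x[2, 3, 1, 4] == y[4, 2, 3, 1]
```

No values are altered by the permutation; the call only changes which position in the index refers to the image, the width, the height and the color channel.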


Create Labels

To create the labels for our data we gave each category a number between 0 and 2:

  • Motorcycles are designated as 0
  • Cars are designated as 1
  • Bicycles are designated as 2


#Response Variable for the three categories  
trainY <- c(rep(0, 5), rep(1, 5), rep(2,5))
testY <- c(0,0,1,1,2,2)


One Hot Encoding

One-hot encoding is a method used to label data sets that have multiple categories where order does not matter. The labels above indicate that a motorcycle is category 0, a car is 1, and a bike is 2; however, the model we are using may interpret these categories in an ordinal way. The fact that 0 comes before 1, which comes before 2 on a number line, could affect the model's ability to predict categories. One-hot encoding transforms the labels into a binary matrix where 0 means that a particular image is not in a category and 1 means that it is. To do this we use the to_categorical() function from the keras library.

library(keras)
library(kableExtra) #library for displaying tables

#One Hot Encoding
trainLabels <- to_categorical(trainY)
testLabels <- to_categorical(testY)

#Display matrix
kable(testLabels) %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "200px")
1 0 0
1 0 0
0 1 0
0 1 0
0 0 1
0 0 1


We can see that in the matrix above of test labels, the first two images are of motorcycles, the next two are of cars and the last two are of bikes. The same encoding was also done for training labels.
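For readers who want to see what the encoding does under the hood, here is a base R sketch that produces the same kind of matrix. The helper `one_hot` is written for this illustration and is not part of keras.

```r
# Turn 0-based integer labels into a binary indicator matrix
one_hot <- function(labels, num_classes = max(labels) + 1) {
  encoded <- matrix(0, nrow = length(labels), ncol = num_classes)
  # Labels start at 0, so class k is marked in column k + 1
  encoded[cbind(seq_along(labels), labels + 1)] <- 1
  encoded
}

testY <- c(0, 0, 1, 1, 2, 2)
encoded_test <- one_hot(testY)
encoded_test
```

Each row contains exactly one 1, and its column position identifies the category, so no category is treated as larger or smaller than another.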

Model

In 2014 a CNN architecture called VGG16 was among the top entries in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), winning the localization task and placing second in classification, and it is considered to be one of the top neural network architectures for image classification. The model implemented here is based on the VGG16 architecture as seen in the figure below.


Layers

This model is an abbreviated version of that architecture; five different types of layers are used (the number of each is given in parentheses):

  • Convolutional layers (4): The first 2 convolutional layers use 32 filters with 3 x 3 dimensions. The last 2 convolutional layers use 64 filters of the same size. The primary function of these layers is to extract features from each image. The first 2 layers extract initial features; the last 2 layers use more filters to build on them.
  • Pooling layers (2): Pooling layers follow convolutional layers in the VGG architecture. These layers shrink each feature map by keeping only the maximum value in each 2 x 2 window (max pooling), which reduces the number of values passed on and therefore the number of parameters in later layers. This saves memory and abstracts features to help avoid overfitting. Each pooling layer is placed after 2 convolutional layers.
  • Dropout layers (3): The function of these layers is also to avoid overfitting. In this project a dropout rate of 0.25 is used; that is, during training a randomly chosen 25% of the input units from the previous layer are set to 0, so the network cannot rely too heavily on any single unit. A dropout layer follows each pooling layer and the first dense layer. For more information see Paper Explaining Dropout
  • Flattening layer (1): This layer “flattens” the stack of feature maps from the preceding layer into a single vector that can be fed into the next layer (a fully connected neural network classifier).
  • Dense layers (2): These are the fully connected neural network classifiers. The first has 256 neurons and the last has 3 neurons.
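The pooling and dropout operations described in the list above can be sketched in a few lines of base R. The functions `max_pool_2x2` and `dropout` and the 4 x 4 feature map are illustrative only; keras applies the same ideas to every feature map in a batch.

```r
# 2 x 2 max pooling with stride 2 on a single feature map.
# Trailing rows or columns that do not fill a window are dropped.
max_pool_2x2 <- function(fmap) {
  out_rows <- nrow(fmap) %/% 2
  out_cols <- ncol(fmap) %/% 2
  out <- matrix(0, out_rows, out_cols)
  for (i in seq_len(out_rows)) {
    for (j in seq_len(out_cols)) {
      out[i, j] <- max(fmap[(2 * i - 1):(2 * i), (2 * j - 1):(2 * j)])
    }
  }
  out
}

# Dropout at training time: zero a random subset of units and rescale
# the survivors so the expected total activation is unchanged
dropout <- function(x, rate) {
  keep <- matrix(runif(length(x)) >= rate, nrow = nrow(x))
  x * keep / (1 - rate)
}

fmap <- matrix(1:16, nrow = 4)

pooled <- max_pool_2x2(fmap)
pooled  # first row 6 14, second row 8 16

set.seed(1)
dropped <- dropout(fmap, rate = 0.25)
dropped
```

Pooling has no weights of its own; it quarters the number of values handed to the next layer. Dropout also adds no weights and removes none: it only silences random units while training, and it is switched off when the model is used for prediction.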

Activation functions

A Rectified Linear Unit (ReLU) is used as the activation function in the convolutional layers and in the first fully connected layer. A softmax activation function is used for the final fully connected layer to produce a probability distribution as output, as per the VGG16 architecture.
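Both activation functions are simple enough to write by hand. The sketch below uses base R and made-up input values; the names `relu` and `softmax` are defined here for illustration.

```r
# ReLU: negative values become 0, positive values pass through
relu <- function(x) pmax(x, 0)

# Softmax: turn raw scores into probabilities that sum to 1
softmax <- function(z) {
  e <- exp(z - max(z))  # subtracting the max avoids overflow
  e / sum(e)
}

relu(c(-2, 0, 3))  # 0 0 3

scores <- c(2, 1, 0.1)          # hypothetical raw outputs for the 3 classes
probs  <- softmax(scores)
probs
sum(probs)                      # 1
predicted <- which.max(probs) - 1  # 0-based class label, here 0
predicted
```

The class with the highest probability is taken as the prediction, which is how the three output neurons map onto the labels 0, 1 and 2.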

Model Architecture

Layer One - Input Layer
Convolutional layer with 32 3x3 filters, ReLU activation function and input dimensions of 166 x 166 x 3.

Layer Two
Convolutional layer with 32 3x3 filters and ReLU activation function.

Layer Three
Max pooling layer with a 2 x 2 filter

Layer Four
Dropout layer with rate of 25%

Layers Five & Six
Convolutional layers with 64 3x3 filters and ReLU activation function.

Layer Seven
Max pooling layer with a 2 x 2 filter

Layer Eight
Dropout layer with rate of 25%

Layer Nine
Flattening layer transforms matrix into vector for fully connected layer

Layer Ten
Fully connected layer with 256 neurons and ReLU activation function

Layer Eleven
Dropout layer with rate of 25%

Layer Twelve - Output Layer
Fully connected layer with 3 neurons (because we have 3 categories) and a softmax activation function for probability output.




Hyperparameters

Using the compile() function we can configure the CNN and specify the following parameters:

Loss Function - categorical_crossentropy is used because each image can only belong to one category. See this source for more information.

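Optimizer - optimizer_sgd() is a stochastic gradient descent optimizer, a widely used choice for computer vision problems; see this source for more information. There are four hyperparameters that this function takes which can be changed to optimize the model, listed here with the values used in the code below:

  • lr = 0.001: the learning rate, i.e. the step size of each weight update
  • momentum = 0.9: how much of the previous update is carried into the next one
  • decay = 1e-6: how quickly the learning rate shrinks over successive updates
  • nesterov = TRUE: whether Nesterov momentum is applied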


Metrics - here we specify that we want the model evaluated for accuracy of categorization.

#Model with a linear stack of layers
model <- keras_model_sequential()

#Layers within the model(as listed above)
model %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu', input_shape = c(166, 166, 3)) %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3) , activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu')%>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = 'relu') %>%
  layer_dropout(rate = 0.25) %>%
  layer_dense(units = 3, activation = "softmax") %>%
  compile(loss = "categorical_crossentropy",
          optimizer = optimizer_sgd(lr = 0.001, momentum = 0.9, decay = 1e-6, nesterov = TRUE),
          metrics = c('accuracy'))
  
#View the model
summary(model)
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## conv2d (Conv2D)                     (None, 164, 164, 32)            896         
## ________________________________________________________________________________
## conv2d_1 (Conv2D)                   (None, 162, 162, 32)            9248        
## ________________________________________________________________________________
## max_pooling2d (MaxPooling2D)        (None, 81, 81, 32)              0           
## ________________________________________________________________________________
## dropout (Dropout)                   (None, 81, 81, 32)              0           
## ________________________________________________________________________________
## conv2d_2 (Conv2D)                   (None, 79, 79, 64)              18496       
## ________________________________________________________________________________
## conv2d_3 (Conv2D)                   (None, 77, 77, 64)              36928       
## ________________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D)      (None, 38, 38, 64)              0           
## ________________________________________________________________________________
## dropout_1 (Dropout)                 (None, 38, 38, 64)              0           
## ________________________________________________________________________________
## flatten (Flatten)                   (None, 92416)                   0           
## ________________________________________________________________________________
## dense (Dense)                       (None, 256)                     23658752    
## ________________________________________________________________________________
## dropout_2 (Dropout)                 (None, 256)                     0           
## ________________________________________________________________________________
## dense_1 (Dense)                     (None, 3)                       771         
## ================================================================================
## Total params: 23,725,091
## Trainable params: 23,725,091
## Non-trainable params: 0
## ________________________________________________________________________________


From the above summary we can see each layer in our model and the number of parameters that each layer introduces. We can see that with each layer the shape of our output changes until we have an output with 3 units. In total we have 23,725,091 parameters.
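The numbers in the summary can be reproduced by hand, which is a useful check on how the layers fit together. The sketch below assumes the standard counting rules (one bias per filter or neuron, “valid” convolutions that trim one pixel from each border, and pooling that rounds down); the helper names are made up for this example.

```r
# Weights plus one bias per filter / per neuron
conv_params  <- function(k, channels_in, filters) (k * k * channels_in + 1) * filters
dense_params <- function(units_in, units_out) (units_in + 1) * units_out

# Spatial size through the network
size <- 166
size <- size - 2     # conv2d:          164
size <- size - 2     # conv2d_1:        162
size <- size %/% 2   # max_pooling2d:    81
size <- size - 2     # conv2d_2:         79
size <- size - 2     # conv2d_3:         77
size <- size %/% 2   # max_pooling2d_1:  38
flat <- size * size * 64   # flatten: 92416

params <- c(
  conv2d   = conv_params(3, 3, 32),     # 896
  conv2d_1 = conv_params(3, 32, 32),    # 9248
  conv2d_2 = conv_params(3, 32, 64),    # 18496
  conv2d_3 = conv_params(3, 64, 64),    # 36928
  dense    = dense_params(flat, 256),   # 23658752
  dense_1  = dense_params(256, 3)       # 771
)

total <- sum(params)
total                      # 23725091
params["dense"] / total    # share held by the first dense layer, about 0.997
```

Almost all of the parameters sit in the first dense layer, because it connects every one of the 92,416 flattened values to each of its 256 neurons. The four convolutional layers together account for fewer than 66,000.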

Fit the Model

Now that we have built the architecture of the model and processed the image data, we are ready to bring the two together. To do this we use the fit() function from keras with the training data and training labels. Some hyperparameters that are defined are:
  • epochs = 60 indicates the number of times the data gets passed through the CNN for optimization (this can be changed to optimize the model)
  • batch_size = 32 indicates the number of images that are passed through the CNN at one time
  • validation_split = 0.2 is the fraction of the training data held out for validation (source)
  • validation_data = list(testX, testLabels) would use the test data as validation data (when this argument is supplied, validation_split is ignored); it is not used in the call below

For more information about the significance of these hyperparameters view this source.
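It helps to work out what these settings mean for a data set this small. The sketch below is plain arithmetic in base R; it relies on the documented Keras behaviour that validation_split holds out the last samples of the training data before any shuffling, and the variable names are introduced here for illustration.

```r
n_samples        <- 15
validation_split <- 0.2
batch_size       <- 32
epochs           <- 60

# Number of images held out for validation and left for fitting
n_val <- round(n_samples * validation_split)  # 3
n_fit <- n_samples - n_val                    # 12

# The held-out images are the LAST ones in the training array
val_idx <- (n_fit + 1):n_samples              # 13 14 15

trainY <- c(rep(0, 5), rep(1, 5), rep(2, 5))
val_labels <- trainY[val_idx]
val_labels                                    # 2 2 2

# With 12 images and a batch size of 32, each epoch is a single batch
batches_per_epoch <- ceiling(n_fit / batch_size)  # 1
total_updates     <- batches_per_epoch * epochs   # 60
total_updates
```

Because the training images are ordered by category, the three validation images are all bicycles, and only two bicycle images are left for fitting. Shuffling the training images and labels together before calling fit() would be one way to avoid this.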

#Fit the model to the training set
history <- model %>%
  fit(trainX, trainLabels, epochs = 60, batch_size = 32, validation_split = 0.2)

#Plot the epochs
plot(history)


From the plot above we can see that with each iteration the loss decreases and the accuracy increases.

Evaluate & Predict

Now that we have trained our model we can evaluate it, and then adjust hyperparameters and other factors to optimize the accuracy.

Train Data

We can use evaluate() to calculate loss and accuracy. This particular model had to be run more than once to get the highest accuracy. On the third try the model had a 93.333% accuracy for predicting the training set, with a loss of 0.3016.

# Loss/Accuracy
evTrain <- model %>%
  evaluate(trainX, trainLabels)


We can call predict_classes() to use the model to make a prediction of the training set and create a confusion matrix.

#make a prediction of the classes
pred <- model %>%
  predict_classes(trainX)

#Create the confusion matrix
table(Predicted = pred, Actual = trainY)
##          Actual
## Predicted 0 1 2
##         0 5 0 0
##         1 0 5 1
##         2 0 0 4
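The accuracy quoted earlier can be read straight off this confusion matrix. The sketch below re-enters the matrix by hand so it runs on its own; `conf_mat`, `accuracy` and `recall` are names chosen for this example.

```r
# Confusion matrix copied from the output above (rows = predicted, columns = actual)
conf_mat <- matrix(c(5, 0, 0,
                     0, 5, 1,
                     0, 0, 4),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(Predicted = c("0", "1", "2"),
                                   Actual    = c("0", "1", "2")))

# Correct predictions lie on the diagonal
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy   # 14 / 15, about 0.933

# Share of each actual class that was predicted correctly
recall <- diag(conf_mat) / colSums(conf_mat)
recall     # 1.0 1.0 0.8
```

The per-class view shows that the single error is a bicycle predicted as a car, which the overall accuracy alone does not reveal.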


We can further evaluate the model by looking at the probability the model assigned to each image for each category. The first column indicates the probability for motorcycle, the second for car, and the third for bicycle. The last two columns compare the predicted and actual categories for each image.

#calculate the probabilities of each category (train)
prob <- model %>%
  predict_proba(trainX)

cbind(prob, Predicted_class = pred, Actual = trainY)
##                                              Predicted_class Actual
##  [1,] 9.999985e-01 5.645261e-07 9.784324e-07               0      0
##  [2,] 9.999856e-01 1.294885e-05 1.470450e-06               0      0
##  [3,] 9.999607e-01 2.929269e-05 1.003435e-05               0      0
##  [4,] 9.964097e-01 3.076105e-03 5.141911e-04               0      0
##  [5,] 9.994032e-01 1.490755e-04 4.477720e-04               0      0
##  [6,] 3.117176e-04 9.996881e-01 1.587317e-07               1      1
##  [7,] 1.058510e-07 9.999214e-01 7.847859e-05               1      1
##  [8,] 2.270291e-04 9.996426e-01 1.303163e-04               1      1
##  [9,] 1.607486e-06 9.999963e-01 2.103938e-06               1      1
## [10,] 4.785108e-04 9.994412e-01 8.026355e-05               1      1
## [11,] 3.016197e-03 8.402288e-04 9.961436e-01               2      2
## [12,] 4.595066e-03 5.544942e-04 9.948505e-01               2      2
## [13,] 1.502816e-02 4.483731e-02 9.401345e-01               2      2
## [14,] 2.514178e-01 1.940434e-01 5.545388e-01               2      2
## [15,] 1.141389e-01 7.238862e-01 1.619749e-01               1      2

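We can see from the above probability matrix that the model is having issues predicting images with bicycles correctly: image 15 is classified as a car, and image 14 is assigned to the bicycle category with a probability of only about 55%. We can use this matrix to inform adjustments made to the model.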

Test Data

Repeating the steps above for the test data, we can perform a similar analysis.

evTest <- model %>%
  evaluate(testX, testLabels)
evTest
## $loss
## [1] 1.47326
## 
## $accuracy
## [1] 0.8333333
predTest <- model %>%
  predict_classes(testX)

#Create the confusion matrix
table(Predicted = predTest, Actual = testY)
##          Actual
## Predicted 0 1 2
##         0 2 1 0
##         1 0 1 0
##         2 0 0 2
probTest <- model %>%
  predict_proba(testX)

cbind(probTest, Predicted_class = predTest, Actual = testY)
##                                          Predicted_class Actual
## [1,] 0.9985579 1.050861e-03 0.0003912439               0      0
## [2,] 0.7808689 1.174439e-02 0.2073866576               0      0
## [3,] 0.9991227 6.932112e-04 0.0001841404               0      1
## [4,] 0.3921849 5.946720e-01 0.0131430719               1      1
## [5,] 0.2320748 2.602178e-01 0.5077074170               2      2
## [6,] 0.1121178 7.501818e-05 0.8878071904               2      2
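The reported test loss can be reproduced from this probability table, which shows what categorical cross-entropy actually measures. The sketch below copies, for each test image, the probability the model gave to the true category; it assumes the loss is the plain average of the per-image values, and the variable names are made up for this example.

```r
# Probability assigned to the TRUE class of each test image, read off the
# table above (rows 1-2: column 1, rows 3-4: column 2, rows 5-6: column 3)
p_true <- c(0.9985579, 0.7808689, 6.932112e-04,
            0.5946720, 0.5077074170, 0.8878071904)

# Cross-entropy per image is the negative log of that probability
per_image_loss <- -log(p_true)
round(per_image_loss, 3)

# Average over the six images
test_loss <- mean(per_image_loss)
test_loss  # about 1.473
```

Nearly all of the loss comes from the third image, a car that the model labelled a motorcycle with very high confidence. This is why the test loss is so much larger than the training loss even though only one of the six images is wrong.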


For the test data our accuracy decreased, possibly due to variations in the images and their backgrounds, and because five training images and two test images per group is far too small a sample. In addition, accuracies fluctuate every time the model runs.

library(ggplot2)
ggplot() +
  geom_col(aes(x = c("Training", "Testing"), y = c(evTrain$accuracy, evTest$accuracy)), fill = c("pink", "purple")) +
  geom_text(aes(x = c("Training", "Testing"), y = c(evTrain$accuracy + 0.1 , evTest$accuracy + 0.1), label = c(round(evTrain$accuracy, 2), round(evTest$accuracy, 2)))) +
  labs(y = "Accuracy", x ="Data", title = "Accuracy of Train & Test Data ") +
   theme(plot.title = element_text(hjust = 0.5), legend.position = "top")


Conclusions

As shown in this project, a CNN for classifying arbitrary images of cars, bikes and motorcycles can be implemented with only a few lines of code. Keras provides a simple way to implement multiple types of CNN architectures and makes it easy to fine-tune hyperparameters so that models can be optimized.

As we can see from the difference in accuracy between the training and test data sets, this model predicts the training set well but is less accurate on the test set. Further research can be conducted to optimize accuracy and minimize loss.

To build this model and achieve a higher accuracy we can do a few things:

  • Increase the number of images in our training and testing set. This would also require a GPU (or it just might take forever to run)
  • Add more diverse images with varying colors, tones and backgrounds. To do this I would like to explore automated photo selection methods to increase abstraction
  • We can decrease loss by adding more convolutional layers and fully connected layers
  • We can decrease the learning rate
  • A pre-trained neural network can be used (keras has a vgg16 function for this reason) to increase accuracy and may take care of the issue of small image sample size


During the creation of this R Markdown file the accuracy of my model fluctuated every time I ran the model. Accuracies ranged from 30% to 93% when the learning rate was set to 0.01. When I decreased the learning rate to 0.001 the loss decreased drastically and the model improved as well. This is because, with the larger learning rate, the model was trading optimization quality for a faster training time. When the learning rate was decreased, the model took longer to train but the accuracy did not fluctuate every time I ran the program. According to this source, “At extremes, a learning rate that is too large will result in weight updates that will be too large and the performance of the model (such as its loss on the training dataset) will oscillate over training epochs. Oscillating performance is said to be caused by weights that diverge (are divergent). A learning rate that is too small may never converge or may get stuck on a suboptimal solution.” Click for more information about improving accuracy in CNN models. This would explain why the accuracy had such a wide range when the learning rate was set to 0.01 as opposed to 0.001.
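The effect described in the quote can be seen on a toy problem. The sketch below runs plain gradient descent on the one-variable function f(w) = w^2; it is not the CNN from this project, and the two learning rates are chosen only to make the contrast obvious, so they do not correspond to the 0.01 and 0.001 used above.

```r
# Gradient descent on f(w) = w^2, whose gradient is 2 * w and minimum is at w = 0
descend <- function(lr, steps = 20, w = 1) {
  path <- numeric(steps)
  for (s in seq_len(steps)) {
    w <- w - lr * 2 * w   # one weight update
    path[s] <- w
  }
  path
}

small <- descend(lr = 0.1)  # each step multiplies w by 0.8
large <- descend(lr = 1.1)  # each step multiplies w by -1.2

abs(small[20])  # about 0.012: steadily approaching the minimum
abs(large[20])  # about 38: overshooting further on every step
head(large)     # the sign flips each step, i.e. the weight oscillates
```

With the small rate the weight settles towards the minimum, while with the large rate every update jumps past it and lands further away than before, which is the oscillating, divergent behaviour the quote describes.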

Further Research

Although using neural networks to classify images and objects within images is an effective way of abstracting features for thousands of categories, it is far from perfect. As we can already see from this simple walkthrough, the features “discovered” and identified by the model are only as good as the data that is provided to it. For example, if the next model of car looks more like an airplane than a car (let's say with wings, jet engines or something else that is super ridiculous), then this model will not be able to identify it correctly. This brings up the bigger issue of implicit bias in algorithms associated with image classification (or object classification for that matter) in the field of Computer Vision.

Joy Buolamwini talks about algorithmic bias in this talk. She explains how some algorithms are not able to detect certain objects because of the data used to train the models and the algorithm architecture itself. Understanding how these algorithms work, and the data that is used to train these models, is the first step towards decreasing algorithmic bias.