Introduction
It is hard to imagine modern medicine without the development and harnessing of technology. X-rays, MRIs and other solutions have been widely utilized across medical fields to improve diagnostic accuracy and efficiency while minimizing human error. One example of such technologies is the AI image recognition model. Image recognition models can assist in the diagnosis of various conditions: they can be trained and deployed to scan images from MRI or X-ray machines, as well as other visual outputs, to detect, locate and flag medical abnormalities they have been trained to identify.
The resulting information can help doctors provide a timely and accurate diagnosis, which may in turn shape the patient's course of treatment. Prompt and accurate diagnosis improves the efficiency of medical services and providers, decreases the time spent on screening and rescreening, and strengthens doctors' ability to devise an early and accurate treatment plan.
Such benefits would be especially crucial for doctors dealing with large-scale and unusual events that often put high pressure on emergency rooms (e.g. natural disasters, wars, etc.). Additionally, image recognition models can be trained to one centralized standard and deployed across a wide range of hospitals and clinics, helping to standardize the diagnosis process across regions.
In this report, one subset of image recognition, image classification, will be performed to differentiate medical images such as CT scans and MRI scans. The process will include: Data Preprocessing, Model Building, Model Fitting, Model Evaluation, Model Tuning and Conclusion. The objective of this report is to classify medical images from the Medical MNIST dataset using a Convolutional Neural Network (CNN) model.
Source: Article
Dataset
This dataset was developed in 2017 by Arturo Polanco Lozano. It is also known as the MedNIST dataset for radiology and medical imaging. For its preparation, images were gathered from several sources, namely TCIA, the RSNA Bone Age Challenge, and the NIH Chest X-ray dataset. The dataset contains 58,954 medical images belonging to 6 categories: AbdomenCT (10,000 images), BreastMRI (8,954 images), ChestCT (10,000 images), CXR (10,000 images), Hand (10,000 images), and HeadCT (10,000 images).
Library and Setup
# Library Import
library(tidyverse)
library(imager)
library(keras)
library(caret)
options(scipen = 999)
# Using conda environment
use_condaenv("tf_image")
Exploratory Data Analysis
Before building the model, exploratory data analysis will be performed. In this section, the images in the dataset will be further investigated, and the distribution of the image dimensions (height and width) will be explored.
# Checking file names
folder_list <- list.files("data/")
folder_list
## [1] "AbdomenCT" "BreastMRI" "ChestCT" "CXR" "Hand" "HeadCT"
# Combining the folder name with the path
folder_path <- paste0("data/", folder_list, "/")
folder_path
## [1] "data/AbdomenCT/" "data/BreastMRI/" "data/ChestCT/" "data/CXR/"
## [5] "data/Hand/" "data/HeadCT/"
# Getting file name
file_name <- map(folder_path,
function(x) paste0(x, list.files(x))
) %>%
unlist()
# The first 6 file names
head(file_name)
## [1] "data/AbdomenCT/000000.jpeg" "data/AbdomenCT/000001.jpeg"
## [3] "data/AbdomenCT/000002.jpeg" "data/AbdomenCT/000003.jpeg"
## [5] "data/AbdomenCT/000004.jpeg" "data/AbdomenCT/000005.jpeg"
# Total images
length(file_name)
## [1] 58954
In the data used, there is a total of 58,954 images divided into 6 categories: AbdomenCT, BreastMRI, ChestCT, CXR, Hand, and HeadCT.
# Randomly select image
set.seed(123)
sample_image <- sample(file_name, 6)
# Load image into R
img <- map(sample_image, load.image)
# Plot image
par(mfrow = c(2, 3))
map(img, plot)
## [[1]]
## Image. Width: 64 pix Height: 64 pix Depth: 1 Colour channels: 1
##
## [[2]]
## Image. Width: 64 pix Height: 64 pix Depth: 1 Colour channels: 1
##
## [[3]]
## Image. Width: 64 pix Height: 64 pix Depth: 1 Colour channels: 1
##
## [[4]]
## Image. Width: 64 pix Height: 64 pix Depth: 1 Colour channels: 1
##
## [[5]]
## Image. Width: 64 pix Height: 64 pix Depth: 1 Colour channels: 1
##
## [[6]]
## Image. Width: 64 pix Height: 64 pix Depth: 1 Colour channels: 1
One of the important aspects of image classification is understanding the dimensions of the input images. To define a proper input dimension for the deep learning model, the distribution of the image dimensions must be known.
# Full Image Description
img <- load.image(file_name[1])
img
## Image. Width: 64 pix Height: 64 pix Depth: 1 Colour channels: 1
# Image Dimension
dim(img)
## [1] 64 64 1 1
# Function for acquiring width and height of an image
get_dim <- function(x){
img <- load.image(x)
df_img <- data.frame(height = height(img),
width = width(img),
filename = x
)
return(df_img)
}
get_dim(file_name[1])
## height width filename
## 1 64 64 data/AbdomenCT/000000.jpeg
# Randomly get 1000 sample images
set.seed(123)
sample_file <- sample(file_name, 1000)
# Run the get_dim() function for each image
file_dim <- map_df(sample_file, get_dim)
head(file_dim, 10)
## height width filename
## 1 64 64 data/HeadCT/002708.jpeg
## 2 64 64 data/HeadCT/008915.jpeg
## 3 64 64 data/AbdomenCT/002985.jpeg
## 4 64 64 data/HeadCT/009531.jpeg
## 5 64 64 data/CXR/000970.jpeg
## 6 64 64 data/CXR/000755.jpeg
## 7 64 64 data/CXR/008574.jpeg
## 8 64 64 data/AbdomenCT/002756.jpeg
## 9 64 64 data/Hand/006449.jpeg
## 10 64 64 data/Hand/007480.jpeg
summary(file_dim)
## height width filename
## Min. :64 Min. :64 Length:1000
## 1st Qu.:64 1st Qu.:64 Class :character
## Median :64 Median :64 Mode :character
## Mean :64 Mean :64
## 3rd Qu.:64 3rd Qu.:64
## Max. :64 Max. :64
As can be seen from the summary above, all of the images in our dataset are 64 pixels in both width and height. Since all of the images already have the same size, no further image resizing is needed.
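If the images had come in different sizes, a resizing step would be required before modeling. As a hypothetical sketch (not needed for this dataset), imager's resize() could standardize the dimensions:
# Hypothetical example only: force an image to 64 x 64 pixels
img_resized <- resize(img, size_x = 64, size_y = 64)
dim(img_resized)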
Data Preprocessing
To enrich the dataset, image augmentation will be performed. With this method, modified versions of the original images (obtained here by flipping and rotating them) are generated on the fly during training, which helps create a more robust model.
# Height and width of images
target_size <- c(64, 64)
# Batch size for training the model
batch_size <- 32
# Image Generator
train_data_gen <- image_data_generator(rescale = 1/255,
horizontal_flip = T,
vertical_flip = T,
rotation_range = 45,
validation_split = 0.2)
Next, the data will be split for validation purposes. The (augmented) dataset will be divided into two parts, a training dataset and a validation dataset, with a proportion of 80% training and 20% validation.
# Training Dataset
train_image_array_gen <- flow_images_from_directory(directory = "data/",
target_size = target_size,
color_mode = "grayscale",
batch_size = batch_size ,
seed = 123,
subset = "training",
generator = train_data_gen)
# Validation Dataset
val_image_array_gen <- flow_images_from_directory(directory = "data/",
target_size = target_size,
color_mode = "grayscale",
batch_size = batch_size ,
seed = 123,
subset = "validation",
generator = train_data_gen)
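As a quick sanity check on the generators, a single batch can be drawn. This is a sketch assuming keras's generator_next(), which returns a list holding the image array and the one-hot labels:
# Draw one augmented batch (this advances the generator's iterator)
batch <- generator_next(train_image_array_gen)
dim(batch[[1]]) # 32 images of 64 x 64 x 1
dim(batch[[2]]) # 32 one-hot label vectors of length 6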
# Number of training samples
train_samples <- train_image_array_gen$n
# Number of validation samples
valid_samples <- val_image_array_gen$n
# Number of target classes/categories
output_n <- n_distinct(train_image_array_gen$classes)
# Get the class proportion
table("\nFrequency" = factor(train_image_array_gen$classes)
) %>%
prop.table()
##
## Frequency
## 0 1 2 3 4 5
## 0.1696209 0.1518955 0.1696209 0.1696209 0.1696209 0.1696209
Based on the proportion table above, it can be seen that our dataset is already fairly balanced, with each class making up roughly 15-17% of the data.
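To see which folder each integer code 0-5 refers to, the generator's class_indices attribute can be inspected (a quick sketch assuming the keras generator interface). Note that the indices follow Python's string ordering, in which uppercase letters sort before lowercase ones, so CXR receives a lower index than ChestCT:
# Mapping from class name to integer index
train_image_array_gen$class_indices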
Model Building
The image classification will be done using a CNN. A Convolutional Neural Network (CNN) is a class of deep learning model designed to process grid-like data such as images; it is widely used in machine vision tasks including image and video recognition, and related architectures also appear in recommender systems and natural language processing (NLP).
In this section, we will build a simple model first with the following layers:
- Convolutional layer to extract features from the 2D image, with relu activation function
- Max pooling layer to downsample the image features
- Flattening layer to flatten data from a 2D array to a 1D array
- Dense layer to capture more information
- Dense layer for output, with softmax activation function
tensorflow::tf$random$set_seed(123)
model <- keras_model_sequential(name = "simple_model") %>%
# Convolution Layer
layer_conv_2d(filters = 16,
kernel_size = c(3,3),
padding = "same",
activation = "relu",
input_shape = c(target_size, 1)) %>%
# Max Pooling Layer
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# Flattening Layer
layer_flatten() %>%
# Dense Layer
layer_dense(units = 16,
activation = "relu") %>%
# Output Layer
layer_dense(units = 6,
activation = "softmax",
name = "Output")
model
## Model: "simple_model"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d (Conv2D) (None, 64, 64, 16) 160
## ________________________________________________________________________________
## max_pooling2d (MaxPooling2D) (None, 32, 32, 16) 0
## ________________________________________________________________________________
## flatten (Flatten) (None, 16384) 0
## ________________________________________________________________________________
## dense (Dense) (None, 16) 262160
## ________________________________________________________________________________
## Output (Dense) (None, 6) 102
## ================================================================================
## Total params: 262,422
## Trainable params: 262,422
## Non-trainable params: 0
## ________________________________________________________________________________
Model Fitting
After creating the model, next we have to train it on the training dataset. For that purpose, we will use categorical_crossentropy as the loss function, optimizer_adam with a learning rate of 0.01, and accuracy as the metric.
model %>%
compile(
loss = "categorical_crossentropy",
optimizer = optimizer_adam(learning_rate = 0.01),
metrics = "accuracy")
# Fit data into model
history <- model %>%
fit(train_image_array_gen,
steps_per_epoch = as.integer(train_samples / batch_size),
epochs = 15)
# Save and load model
# model %>% save_model_hdf5("model.hdf5")
model <- load_model_hdf5(filepath = "model.hdf5")
# saveRDS(history, "history.RDS")
history <- readRDS("history.RDS")
plot(history)
## `geom_smooth()` using formula 'y ~ x'
history
##
## Final epoch (plot to see history):
## loss: 0.0198
## accuracy: 0.9952
After training, the result shows that the model reached 99.5% accuracy on the training data, which is very good. However, further evaluation on the validation dataset is still necessary to check whether or not the model is overfitting.
Model Evaluation
In this section, we will evaluate the model using the validation dataset.
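Before the detailed per-class breakdown below, a quick aggregate score can be obtained with keras's evaluate() on the validation generator (a sketch; it reports only the overall loss and accuracy):
# Aggregate loss and accuracy on the validation generator
model %>% evaluate(val_image_array_gen)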
val_data <- data.frame(file_name = paste0("data/", val_image_array_gen$filenames)) %>%
mutate(class = str_extract(file_name, "AbdomenCT|BreastMRI|ChestCT|CXR|Hand|HeadCT"))
head(val_data, 10)
## file_name class
## 1 data/AbdomenCT\\000000.jpeg AbdomenCT
## 2 data/AbdomenCT\\000001.jpeg AbdomenCT
## 3 data/AbdomenCT\\000002.jpeg AbdomenCT
## 4 data/AbdomenCT\\000003.jpeg AbdomenCT
## 5 data/AbdomenCT\\000004.jpeg AbdomenCT
## 6 data/AbdomenCT\\000005.jpeg AbdomenCT
## 7 data/AbdomenCT\\000006.jpeg AbdomenCT
## 8 data/AbdomenCT\\000007.jpeg AbdomenCT
## 9 data/AbdomenCT\\000008.jpeg AbdomenCT
## 10 data/AbdomenCT\\000009.jpeg AbdomenCT
# Function to convert image to array
image_prep <- function(x) {
arrays <- lapply(x, function(path) {
img <- image_load(path, target_size = target_size,
grayscale = TRUE)
x <- image_to_array(img)
x <- array_reshape(x, c(1, dim(x)))
x <- x/255
})
do.call(abind::abind, c(arrays, list(along = 1)))
}
test_x <- image_prep(val_data$file_name)
# Check dimension of testing data set
dim(test_x)
## [1] 11790 64 64 1
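# Note: predict_classes() comes from older keras releases; with newer versions,
# an equivalent is max.col(predict(model, test_x)) - 1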
pred_test <- predict_classes(model, test_x)
head(pred_test, 10)
## [1] 0 0 0 0 0 0 0 0 0 0
# Convert encoding to label
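# Note: the indices follow Python's sort order (uppercase before lowercase),
# which is why CXR = 2 comes before ChestCT = 3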
decode <- function(x){
case_when(x == 0 ~ "AbdomenCT",
x == 1 ~ "BreastMRI",
x == 2 ~ "CXR",
x == 3 ~ "ChestCT",
x == 4 ~ "Hand",
x == 5 ~ "HeadCT"
)
}
pred_test <- sapply(pred_test, decode)
head(pred_test, 10)
## [1] "AbdomenCT" "AbdomenCT" "AbdomenCT" "AbdomenCT" "AbdomenCT" "AbdomenCT"
## [7] "AbdomenCT" "AbdomenCT" "AbdomenCT" "AbdomenCT"
# Confusion matrix
confusionMatrix(as.factor(pred_test),
as.factor(val_data$class)
)
## Confusion Matrix and Statistics
##
## Reference
## Prediction AbdomenCT BreastMRI ChestCT CXR Hand HeadCT
## AbdomenCT 2000 0 10 0 0 0
## BreastMRI 0 1790 0 0 0 0
## ChestCT 0 0 1979 0 0 0
## CXR 0 0 11 1969 4 0
## Hand 0 0 0 30 1989 0
## HeadCT 0 0 0 1 7 2000
##
## Overall Statistics
##
## Accuracy : 0.9947
## 95% CI : (0.9932, 0.9959)
## No Information Rate : 0.1696
## P-Value [Acc > NIR] : < 0.00000000000000022
##
## Kappa : 0.9936
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: AbdomenCT Class: BreastMRI Class: ChestCT
## Sensitivity 1.0000 1.0000 0.9895
## Specificity 0.9990 1.0000 1.0000
## Pos Pred Value 0.9950 1.0000 1.0000
## Neg Pred Value 1.0000 1.0000 0.9979
## Prevalence 0.1696 0.1518 0.1696
## Detection Rate 0.1696 0.1518 0.1679
## Detection Prevalence 0.1705 0.1518 0.1679
## Balanced Accuracy 0.9995 1.0000 0.9948
## Class: CXR Class: Hand Class: HeadCT
## Sensitivity 0.9845 0.9945 1.0000
## Specificity 0.9985 0.9969 0.9992
## Pos Pred Value 0.9924 0.9851 0.9960
## Neg Pred Value 0.9968 0.9989 1.0000
## Prevalence 0.1696 0.1696 0.1696
## Detection Rate 0.1670 0.1687 0.1696
## Detection Prevalence 0.1683 0.1712 0.1703
## Balanced Accuracy 0.9915 0.9957 0.9996
Based on the confusion matrix above, the validation accuracy is 99.5%, which indicates that the model performs very well and is not overfitting, even though its architecture is quite simple (only one convolutional layer).
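To inspect where the few errors occur, the misclassified validation images can be listed directly from the objects defined above (a short sketch):
# File names of the validation images the model got wrong
misclassified <- val_data$file_name[pred_test != val_data$class]
length(misclassified)
head(misclassified)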
Model Tuning
Even though the previous model already performs very well, model tuning can be performed if the metrics are to be improved further. In this section, we will build a model with more convolutional layers than the previous one to try to improve the results.
The following is our improved model architecture:
- 1st Convolutional layer to extract features from 2D image with relu activation function
- 2nd Convolutional layer to extract features from 2D image with relu activation function
- Max pooling layer
- 3rd Convolutional layer to extract features from 2D image with relu activation function
- Max pooling layer
- 4th Convolutional layer to extract features from 2D image with relu activation function
- Max pooling layer
- Flattening layer from 2D array to 1D array
- Dense layer to capture more information
- Dense output layer with softmax activation function
tensorflow::tf$random$set_seed(123)
model_big <- keras_model_sequential() %>%
# First convolutional layer
layer_conv_2d(filters = 64,
kernel_size = c(5,5),
padding = "same",
activation = "relu",
input_shape = c(target_size, 1)) %>%
# Max pooling layer
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# Second convolutional layer
layer_conv_2d(filters = 64,
kernel_size = c(3,3),
padding = "same",
activation = "relu") %>%
# Max pooling layer
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# Third convolutional layer
layer_conv_2d(filters = 32,
kernel_size = c(3,3),
padding = "same",
activation = "relu") %>%
# Max pooling layer
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# Fourth convolutional layer
layer_conv_2d(filters = 256,
kernel_size = c(3,3),
padding = "same",
activation = "relu") %>%
# Max pooling layer
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# Flattening layer
layer_flatten() %>%
# Dense layer
layer_dense(units = 128,
activation = "relu") %>%
# Output layer
layer_dense(name = "Output",
units = 6,
activation = "softmax")
model_big
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d_4 (Conv2D) (None, 64, 64, 64) 1664
## ________________________________________________________________________________
## max_pooling2d_4 (MaxPooling2D) (None, 32, 32, 64) 0
## ________________________________________________________________________________
## conv2d_3 (Conv2D) (None, 32, 32, 64) 36928
## ________________________________________________________________________________
## max_pooling2d_3 (MaxPooling2D) (None, 16, 16, 64) 0
## ________________________________________________________________________________
## conv2d_2 (Conv2D) (None, 16, 16, 32) 18464
## ________________________________________________________________________________
## max_pooling2d_2 (MaxPooling2D) (None, 8, 8, 32) 0
## ________________________________________________________________________________
## conv2d_1 (Conv2D) (None, 8, 8, 256) 73984
## ________________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D) (None, 4, 4, 256) 0
## ________________________________________________________________________________
## flatten_1 (Flatten) (None, 4096) 0
## ________________________________________________________________________________
## dense_1 (Dense) (None, 128) 524416
## ________________________________________________________________________________
## Output (Dense) (None, 6) 774
## ================================================================================
## Total params: 656,230
## Trainable params: 656,230
## Non-trainable params: 0
## ________________________________________________________________________________
model_big %>%
compile(
loss = "categorical_crossentropy",
optimizer = optimizer_adam(learning_rate = 0.001),
metrics = "accuracy")
history_big <- model_big %>%
fit(train_image_array_gen,
steps_per_epoch = as.integer(train_samples / batch_size),
epochs = 15)
# Save and load model
# model_big %>% save_model_hdf5("model_big.hdf5")
model_big <- load_model_hdf5(filepath = "model_big.hdf5")
# saveRDS(history_big, "history_big.RDS")
history_big <- readRDS("history_big.RDS")
plot(history_big)
## `geom_smooth()` using formula 'y ~ x'
history_big
##
## Final epoch (plot to see history):
## loss: 0.0198
## accuracy: 0.9952
After training, the final epoch again shows roughly 99.5% accuracy on the training data, which is excellent. Next, further model evaluation using the validation dataset will be done.
pred_test_big <- predict_classes(model_big, test_x)
head(pred_test_big, 10)
## [1] 0 0 0 0 0 0 0 0 0 0
# Convert encoding to label
pred_test_big <- sapply(pred_test_big, decode)
head(pred_test_big, 10)
## [1] "AbdomenCT" "AbdomenCT" "AbdomenCT" "AbdomenCT" "AbdomenCT" "AbdomenCT"
## [7] "AbdomenCT" "AbdomenCT" "AbdomenCT" "AbdomenCT"
# Confusion matrix
confusionMatrix(as.factor(pred_test_big),
as.factor(val_data$class)
)
## Confusion Matrix and Statistics
##
## Reference
## Prediction AbdomenCT BreastMRI ChestCT CXR Hand HeadCT
## AbdomenCT 2000 0 5 0 0 0
## BreastMRI 0 1790 0 0 0 0
## ChestCT 0 0 1995 0 0 0
## CXR 0 0 0 1994 3 0
## Hand 0 0 0 6 1996 1
## HeadCT 0 0 0 0 1 1999
##
## Overall Statistics
##
## Accuracy : 0.9986
## 95% CI : (0.9978, 0.9992)
## No Information Rate : 0.1696
## P-Value [Acc > NIR] : < 0.00000000000000022
##
## Kappa : 0.9984
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: AbdomenCT Class: BreastMRI Class: ChestCT
## Sensitivity 1.0000 1.0000 0.9975
## Specificity 0.9995 1.0000 1.0000
## Pos Pred Value 0.9975 1.0000 1.0000
## Neg Pred Value 1.0000 1.0000 0.9995
## Prevalence 0.1696 0.1518 0.1696
## Detection Rate 0.1696 0.1518 0.1692
## Detection Prevalence 0.1701 0.1518 0.1692
## Balanced Accuracy 0.9997 1.0000 0.9988
## Class: CXR Class: Hand Class: HeadCT
## Sensitivity 0.9970 0.9980 0.9995
## Specificity 0.9997 0.9993 0.9999
## Pos Pred Value 0.9985 0.9965 0.9995
## Neg Pred Value 0.9994 0.9996 0.9999
## Prevalence 0.1696 0.1696 0.1696
## Detection Rate 0.1691 0.1693 0.1696
## Detection Prevalence 0.1694 0.1699 0.1696
## Balanced Accuracy 0.9983 0.9986 0.9997
Based on the confusion matrix above, the validation accuracy is 99.9%, which indicates that the model is excellent and is not overfitting.
Conclusion
Based on the data used in this report and the image classification process performed with CNN models, we can conclude that:
Based on the evaluation of the first model, the accuracy obtained is 99.5%, which indicates that the model performs very well and does not overfit, even though it is quite simple (only one convolutional layer).
Meanwhile, based on the evaluation of the second model, the accuracy obtained is 99.9%, which indicates that the model is excellent and does not overfit either. However, it is important to note that the second model is much more time-consuming to train, since it has four convolutional layers. Therefore, unless the simpler model genuinely needs improvement (due to low accuracy or overfitting), this kind of tuning is not necessary, as this case illustrates.
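As a rough, sketch-level proxy for this extra cost, the trainable parameter counts of the two models can be compared with keras's count_params():
# Trainable parameters of each model (cf. the summaries above)
count_params(model) # 262,422
count_params(model_big) # 656,230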