INTRODUCTION

Dimensionality reduction is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally with a dimension close to its intrinsic dimension (the number of variables needed in a minimal representation of the data).

In this analysis, we conduct PCA-based image compression. Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables. One image is used for the analysis. PCA can be computed by two methods, the Covariance Matrix Method and Singular Value Decomposition (SVD); the prcomp function used below is based on SVD. Building an efficient machine learning model often requires a lot of data, but large data sets can be difficult to handle. PCA gives data scientists a way to manage this problem by reducing the number of variables while keeping most of the information.
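
As a small illustration of the two computation routes named above, the sketch below runs both on a made-up matrix and checks that they agree up to the sign of each component. The toy data and all variable names are assumptions for illustration and are not part of the image analysis.

```r
set.seed(1)
# Toy data: 50 observations of 3 variables, two of them correlated (illustrative only)
z <- rnorm(50)
X <- cbind(a = z + rnorm(50, sd = 0.3),
           b = 2 * z + rnorm(50, sd = 0.3),
           c = rnorm(50))
Xc <- scale(X, center = TRUE, scale = FALSE)

# Route 1: eigendecomposition of the covariance matrix
eig <- eigen(cov(Xc))

# Route 2: singular value decomposition of the centered data
sv <- svd(Xc)

# Both routes give the same variance along each component
var.eig <- eig$values
var.svd <- sv$d^2 / (nrow(Xc) - 1)

# ... and the same directions, up to sign
same.directions <- all(abs(abs(eig$vectors) - abs(sv$v)) < 1e-8)

# The projected variables (scores) are uncorrelated
scores <- Xc %*% sv$v
score.cor <- cor(scores)
```

The correlation matrix of the scores is the identity up to rounding error, which is the "uncorrelated variables" property described above.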

An image is a grid of pixels arranged in rows, placed one after the other to form a single picture. This analysis focuses on working with images using principal component analysis.

We will be using the packages loaded below:

library(magick)
## Linking to ImageMagick 6.9.12.3
## Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fftw, ghostscript, x11
library(jpeg)
library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(gridExtra)
library(ggplot2)
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tibble))

IMAGE SOURCE

First, we import a colour photo of Chicago obtained from https://pixabay.com/get/gf511dd53bafa8367b4ce7f80db3e37c5ec9d0c40a246c19ef128e15c762b6157fe37e153314b9a640d0327a01a75054b7dcc599e0b09dfbc3bf05d9e40b3ef072577f395590c0f13b6cac414ad0c9690_1920.jpg

chicago<-readJPEG("~/Downloads/chicago.jpeg")
ncol(chicago)
## [1] 1280
nrow(chicago)
## [1] 853

The Chicago image is read as an 853 x 1280 x 3 array, that is, three matrices of 853 rows by 1280 columns, one for each channel of the RGB color scheme. We will extract the individual color value matrices to perform PCA on each.

r <- chicago[, , 1]
g <- chicago[, , 2]
b <- chicago[, , 3]
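
The same indexing can be tried on a tiny made-up array to see how the colour planes are laid out. The array img and the .toy names below are assumptions for illustration only.

```r
# Toy 2 x 3 image with 3 colour planes (values are arbitrary)
img <- array(seq(0, 1, length.out = 18), dim = c(2, 3, 3))
dim(img)

# Each plane is an ordinary rows-by-columns matrix
r.toy <- img[, , 1]
g.toy <- img[, , 2]
b.toy <- img[, , 3]

# Stacking the planes again restores the original array
img.again <- array(c(r.toy, g.toy, b.toy), dim = dim(img))
identical(img, img.again)
```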

PRINCIPAL COMPONENT ANALYSIS

PCA is performed on each color value matrix. Because this analysis is focused on image compression rather than on describing or interpreting the variables, centering is not needed, so we set the center argument to FALSE. With the default (center = TRUE), prcomp subtracts each column's mean before computing the components, and an image rebuilt from the scores and rotation alone would have the wrong RGB values unless those means were added back.
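
The effect of the center argument can be checked on a small made-up matrix. The matrix m below is an assumption for illustration, standing in for one colour channel. Without centering, the product of the scores and the transposed rotation returns the original values directly. With centering, the column means stored in the center element have to be added back.

```r
set.seed(42)
# Stand-in for one colour channel: values in [0, 1]
m <- matrix(runif(6 * 4), nrow = 6, ncol = 4)

# No centering: scores times transposed rotation gives back m
p0 <- prcomp(m, center = FALSE)
rec0 <- p0$x %*% t(p0$rotation)

# Default centering: the same product gives m minus its column means
p1 <- prcomp(m)
rec1 <- p1$x %*% t(p1$rotation)
rec1.fixed <- sweep(rec1, 2, p1$center, "+")

max(abs(rec0 - m))        # effectively zero
max(abs(rec1 - m))        # clearly non-zero
max(abs(rec1.fixed - m))  # effectively zero again
```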

chicago.r.pca <- prcomp(r, center = FALSE)
chicago.g.pca <- prcomp(g, center = FALSE)
chicago.b.pca <- prcomp(b, center = FALSE)

Then we will put the PCA objects into a list.

rgb.pca <- list(chicago.r.pca, chicago.g.pca, chicago.b.pca)

Now that we have found the principal components for each color value matrix, we can compress the image. The components give new dimensions that describe the original data, in this case the pixels. For each color matrix, the pixel values are projected onto these new dimensions (the x element of each PCA object).

The loop below reconstructs the original image from these projections, using an increasing number of principal components. With each round, the reconstruction becomes more representative of the original image, because more of the variance (information) is described as more principal components are used. The first few principal components produce the most drastic change in quality, while the last few make little, if any, difference.
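
A minimal sketch of this behaviour on synthetic data is given here. The matrix m, the helper reconstruct(), and the chosen values of k are assumptions for illustration; with the real image, a channel's PCA object such as chicago.r.pca would take the place of p.

```r
set.seed(7)
# Synthetic "channel": a smooth pattern plus a little noise
rows <- seq(0, 1, length.out = 40)
cols <- seq(0, 1, length.out = 60)
m <- outer(rows, cols, function(y, x) 0.5 + 0.3 * sin(4 * x) * cos(3 * y)) +
  matrix(rnorm(40 * 60, sd = 0.02), nrow = 40)

p <- prcomp(m, center = FALSE)

# Cumulative share of the total sum of squares captured by the first k components
cum.share <- cumsum(p$sdev^2) / sum(p$sdev^2)

# Hypothetical helper: rebuild the matrix from the first k components
reconstruct <- function(pca, k) {
  pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
}

# Root-mean-square reconstruction error for an increasing number of components
ks <- c(1, 2, 5, 10, 40)
rmse <- sapply(ks, function(k) sqrt(mean((m - reconstruct(p, k))^2)))
round(cum.share[ks], 4)
rmse
```

The error never increases as k grows, and it reaches zero (up to rounding) once all 40 components are used.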

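# Round the component counts up front so that the number of components
# used always matches the number in the file name
for (i in round(seq(3, nrow(chicago) - 5, length.out = 5))) {
  # Rebuild each color matrix from its first i principal components
  pca.img <- sapply(rgb.pca, function(j) {
    j$x[, 1:i] %*% t(j$rotation[, 1:i])
  }, simplify = 'array')
  writeJPEG(pca.img, paste('photo', i, '_components.jpg', sep = ''))
}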

After the 5 iterations, we compare the file size of each compressed image with that of the original.

# Size in kB of the same file that was read with readJPEG above
original <- file.info('~/Downloads/chicago.jpeg')$size / 1000
# The loop above writes its output to the working directory
imgs <- list.files(pattern = '^photo[0-9]+_components\\.jpg$')

for (i in imgs) {
  size <- file.info(i)$size / 1000
  print(paste(i, ' size: ', size, ' original: ', original, ' % diff: ', round((size - original) / original * 100, 2), '%', sep = ''))
}

[Figures: Photo3, Photo214, Photo426, Photo637, Photo848 (reconstructions from 3, 214, 426, 637, and 848 principal components)]

RESULTS AND COMPARISON

The results from the iterations are as follows:

  1. “photo3_components.jpg size: 123.058 original: 326.149 % diff: -62.27%”
  2. “photo214_components.jpg size: 283.292 original: 326.149 % diff: -13.14%”
  3. “photo426_components.jpg size: 289.751 original: 326.149 % diff: -11.16%”
  4. “photo637_components.jpg size: 273.708 original: 326.149 % diff: -16.08%”
  5. “photo848_components.jpg size: 265.230 original: 326.149 % diff: -18.68%”

Using PCA with 848 components, we were able to reduce the file size of the original photo by 18.68% with minimal loss in image quality.
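
The file sizes above also depend on the JPEG encoding applied when each image is written, so a complementary check is to count how many numbers a truncated PCA representation has to store. The sketch below does this arithmetic for the image dimensions reported earlier. The storage model (k columns of scores plus k columns of rotation per channel) is an assumption for illustration, not something the analysis itself measures.

```r
n <- 853    # rows of each colour matrix (from nrow(chicago))
p <- 1280   # columns of each colour matrix (from ncol(chicago))
k <- c(3, 214, 426, 637, 848)

# Values stored per channel: n x k scores plus p x k rotation columns
stored <- k * (n + p)
ratio <- stored / (n * p)
data.frame(components = k, stored = stored, ratio = round(ratio, 3))

# Largest k for which the truncated representation is smaller than the raw matrix
break.even <- floor(n * p / (n + p))
break.even
```

Under this storage model, a ratio below 1 means the truncated representation holds fewer numbers than the raw channel matrix, which for this image size is the case only up to 511 components.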

SUMMARY

The aim of the analysis was to conduct PCA-based image compression with R. The study used one image and a single script. PCA can be computed by two slightly different methods, the Covariance Matrix Method and Singular Value Decomposition; the prcomp function used here relies on the latter.

CONCLUSION

Image compression with principal component analysis is a useful and relatively straightforward application of the technique: the image is treated as a matrix of pixel color values. There are many other real-world applications of PCA, including face and handwriting recognition, and other situations that involve many variables, such as gene expression experiments.