INTRODUCTION

Principal component analysis (PCA) is a technique for reducing the dimensionality of large datasets, increasing interpretability while at the same time minimizing information loss. Put differently, PCA is a dimension-reduction technique that transforms a large set of variables into a smaller one without losing the major information in the original dataset.
PCA makes data much easier to explore and visualize than working with the full, high-dimensional dataset. In this paper, however, I discuss how PCA can be used to reduce the size of an image by keeping only a small number of components of that image, without having to sacrifice its quality.
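As a minimal illustration of this idea (on a small synthetic dataset, not the image data used later in this paper), R's prcomp() summarizes several correlated variables with just a few components:
set.seed(1)
x   <- rnorm(100)
toy <- data.frame(a = x,
                  b = x + rnorm(100, sd = 0.1),
                  c = 2 * x + rnorm(100, sd = 0.1),
                  d = rnorm(100),
                  e = rnorm(100))
toy.pca <- prcomp(toy, center = TRUE, scale. = TRUE)
summary(toy.pca)  # the first two or three components carry most of the variance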

THE NEED FOR IMAGE COMPRESSION

An uncompressed 1024 pixel x 1024 pixel x 24 bit image requires about 3 MB of storage and roughly 7 minutes for transmission over a high-speed 64 Kbit/s ISDN line. If the image is compressed at a 10:1 compression ratio, the storage requirement falls to about 300 KB and the transmission time drops to roughly 40 seconds. Compressed images can therefore be transferred to a storage device or over a network in a significantly shorter time.
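These figures can be checked with a few lines of R arithmetic (assuming the 64 Kbit/s line speed quoted above):
bits <- 1024 * 1024 * 24      # uncompressed image size in bits
bits / 8 / 1024^2             # storage: 3 megabytes
bits / 64000 / 60             # transmission at 64 Kbit/s: about 6.6 minutes
(bits / 10) / 64000           # at a 10:1 compression ratio: roughly 39 seconds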
In a distributed environment, large image files remain a major bottleneck within systems. Compression is an important part of the solutions available for creating files of manageable and transmittable size. Increasing the bandwidth is another option, but its cost sometimes makes it a less attractive solution. Platform portability and performance also matter when selecting the compression/decompression technique to be employed. The simplest way to reduce the size of an image file is to reduce the size of the image itself: by shrinking the image, fewer pixels need to be stored and the file consequently takes less time to load, as the sketch below illustrates.
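As a sketch of that simple approach, the magick package (also loaded later in this paper) can downscale an image; the file name is the rock photo used below:
library("magick")
img   <- image_read("paintrock.jpg")        # read the image from disk
small <- image_scale(img, "50%")            # halve the width and height
image_write(small, "paintrock_small.jpg")   # fewer pixels, noticeably smaller file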
An average good-quality image that fills a standard A4 sheet is between 1 and 1.5 megabytes. Do the mathematics on that and your guess is as good as mine as to where the tech companies are storing all this information, more specifically the videos and pictures. Let's leave that question for another day and consider the advantages of compressing these images while maintaining their quality. Doing so goes a long way toward saving storage space, both on our personal devices and in cloud storage. This is what we seek to achieve using PCA.

READING THE IMAGE

The image used in this paper is from a photo shoot of painted rocks created by a group of "kid enthusiasts" taking part in rock-painting activities to create beautiful patterns and painting projects. More pictures can be found at the link in the references. Let's go ahead and import and read the photo with the help of the jpeg and magick libraries.
library("jpeg")
library("magick")
paint_origin<-readJPEG("paintrock.jpg")
plot(1, type="n") # plotting the rasterImage – colour photo
rasterImage(paint_origin, 0.6, 0.6, 1.4, 1.4)

The jpeg package also converts the image into a matrix (array) representation when it is read with readJPEG(). Let's now take a look at the dimensions of this representation. Displaying the whole matrix would make this paper bulky and lengthy even if only the head or tail were shown, but that could be done with the head(paint_origin) or tail(paint_origin) functions.
dim (paint_origin)
## [1] 473 710   3
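For instance, a small corner of the array can be inspected without printing the whole thing:
paint_origin[1:3, 1:3, 1]  # top-left 3 x 3 corner of the first (red) channel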

PCA

What we need to do now is represent the image as an array of three 473 x 710 matrices, one matrix for each channel of the RGB color scheme, and then extract the individual color matrices to perform PCA on each of them.
red   <- paint_origin[,,1]  # red channel matrix
green <- paint_origin[,,2]  # green channel matrix
blue  <- paint_origin[,,3]  # blue channel matrix
PCA targets features with higher variance, so whether we center and scale the data matters. For image compression, scaling and centering are not necessary here: if we scaled a feature by a coefficient greater than 1, it would gain more influence than it had before, while a coefficient smaller than 1 would give it less. In other words, as long as the variables are of the same order of magnitude, as pixel intensities are, centering and scaling can be skipped.
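As a quick check, all three color channels already share the same 0-to-1 intensity scale:
sapply(list(red = red, green = green, blue = blue), range)  # all values lie in [0, 1]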
red.pca <- prcomp(red, center=FALSE, scale.=FALSE)
green.pca <- prcomp(green, center=FALSE, scale.=FALSE)
blue.pca <- prcomp(blue, center=FALSE, scale.=FALSE)
We now collect the three results in a list so that they can later be recombined into the three-dimensional RGB array.
list.rock_origin.pca <- list(red.pca, green.pca, blue.pca)
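Before plotting the eigenvalues with factoextra in the next section, the explained variance can also be read directly from the prcomp objects; for example, for the red channel:
summary(red.pca)$importance[, 1:5]  # standard deviation, proportion and cumulative proportion of variance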

PRINCIPAL COMPONENTS REPRESENTATION

The colors in the plots below indicate the eigenvalues of the principal components for each of the RGB channels. In this case only the first seven (7) principal components are displayed.
library("factoextra")
library("gridExtra")
library("ggplot2")
f1 <- fviz_eig(red.pca, choice = 'eigenvalue', main = "Red", barfill = "red", ncp = 7, addlabels = TRUE)
f2 <- fviz_eig(green.pca, choice = 'eigenvalue', main = "Green", barfill = "green", ncp = 7, addlabels = TRUE)
f3 <- fviz_eig(blue.pca, choice = 'eigenvalue', main = "Blue", barfill = "blue", ncp = 7, addlabels = TRUE)

grid.arrange(f1, f2, f3, ncol=3)

Let us now take a look at the percentage of variance explained by these principal components.
f11 <- fviz_eig(red.pca, main = "Red", barfill = "red", ncp = 7)
f22 <- fviz_eig(green.pca, main = "Green", barfill = "green", ncp = 7)
f33 <- fviz_eig(blue.pca, main = "Blue", barfill = "blue", ncp = 7)

grid.arrange(f11, f22, f33, ncol = 3)

It can be seen from the scree plots above that the first principal component of each color channel explains the majority of the variance in that channel, roughly 80%. The second principal components still contribute, while the remaining ones are largely negligible since they explain very little or no variance at all.
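As a rough guide, we can also compute how many components are needed to reach a given share of the variance in each channel (the 99% threshold below is only illustrative):
cum_var <- function(pca) cumsum(pca$sdev^2) / sum(pca$sdev^2)
sapply(list(red = red.pca, green = green.pca, blue = blue.pca),
       function(p) which(cum_var(p) >= 0.99)[1])  # components needed for 99% of the variance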

COMPRESSION OF THE IMAGE

Here we take a look at what the image looks like after compressing it with different numbers of principal components. We will notice that as the number of principal components increases, the reconstructed images resemble the original more and more closely. This progressive improvement in quality occurs because, as more principal components are included in the compression, more of the variance (information) is retained in the output.
library(abind)    # abind() stacks the three channel matrices back into an array
library(ggplot2)

for (i in c(10,30,60,100,150,200,250,300)) {
  # Reconstruct each channel from its first i principal components
  new_image <- abind(red.pca$x[,1:i] %*% t(red.pca$rotation[,1:i]),
                     green.pca$x[,1:i] %*% t(green.pca$rotation[,1:i]),
                     blue.pca$x[,1:i] %*% t(blue.pca$rotation[,1:i]),
                     along = 3)
  # Reconstructed intensities can fall slightly outside [0, 1]; clamp before writing
  new_image <- pmin(pmax(new_image, 0), 1)
  writeJPEG(new_image, paste0('Compressed_image_with_', i, '_components.jpg'))
}

image_plot <- function(path, plot_name) {
  require('jpeg')
  img <- readJPEG(path)
  d <- dim(img)  # d[1] = height in pixels, d[2] = width in pixels
  plot(0, 0, xlim = c(0, d[2]), ylim = c(0, d[1]),
       xaxt = 'n', yaxt = 'n', xlab = '', ylab = '', bty = 'n')
  title(plot_name, line = -0.5)
  rasterImage(img, 0, 0, d[2], d[1])  # draw the image at its native aspect ratio
}

par(mfrow = c(1,2), mar = c(0,0,1,1))
for (i in c(10,30,60,100,150,200,250,300)) {
  image_plot(paste0('Compressed_image_with_',i, '_components.jpg'), 
             paste0(round(i,0), ' Components'))
}

As can be seen from the images above, the quality of the reconstructions keeps increasing as the number of principal components increases. It is also worth noting that the last image uses only 300 principal components of the original image, yet its quality is almost identical to the original.
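The visual impression can be backed up with a simple error measure; the sketch below computes the root-mean-square error between the original and, say, a 60-component reconstruction (60 is just an illustrative choice):
k <- 60  # illustrative number of components
reconstruct <- function(pca, k) pca$x[, 1:k] %*% t(pca$rotation[, 1:k])
approx_img <- abind(reconstruct(red.pca, k),
                    reconstruct(green.pca, k),
                    reconstruct(blue.pca, k), along = 3)
sqrt(mean((paint_origin - approx_img)^2))  # root-mean-square reconstruction error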
Now, let's compare the sizes in kilobytes to see whether there is a significant change in the sizes of these images, which would support the point that PCA-based image compression saves disk space without sacrificing quality, since it only discards the less significant components.
library(knitr)

# Build a comparison table: one row per compressed image plus the original
table <- matrix(0, 9, 3)
colnames(table) <- c("Number of components", "Image size (kilobytes)", "Saved Disk Space (kilobytes)")
table[,1] <- c(10, 30, 60, 100, 150, 200, 250, 300, "Original Rock image")
# File size of the original image in kilobytes (it saves 0 KB relative to itself)
table[9, 2:3] <- round(c(file.info('paintrock.jpg')$size/1024, 0), 2)
for (i in c(1:8)) {
  path <- paste0('Compressed_image_with_', table[i,1], '_components.jpg')
  table[i,2] <- round(file.info(path)$size/1024, 2)   # compressed file size
  table[i,3] <- round(as.numeric(table[9,2]) - as.numeric(table[i,2]), 2)  # space saved
}

kable(table)
Number of components    Image size (kilobytes)    Saved Disk Space (kilobytes)
--------------------    ----------------------    ----------------------------
10                      32.68                     117.60
30                      42.67                     107.61
60                      46.17                     104.11
100                     47.50                     102.78
150                     47.30                     102.98
200                     46.59                     103.69
250                     45.65                     104.63
300                     43.86                     106.42
Original Rock image     150.28                    0.00
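To relate these figures to the savings quoted in the conclusion, the percentage of disk space saved by, for example, the 300-component image can be computed directly from the table values:
round((150.28 - 43.86) / 150.28 * 100, 1)  # about 70.8% of the disk space saved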

CONCLUSION

Image compression with principal component analysis saved about 70% of the disk space with little or no loss in image quality. Not only does it save disk space, it also makes it easier and more efficient to transmit these images between different locations.
Other fields where image compression using PCA can be applied include pattern recognition and the processing of digital images in medicine, among others.
REFERENCES

https://royalsocietypublishing.org/doi/10.1098/rsta.2015.0202
https://www.uniassignment.com/essay-samples/information-technology/the-need-for-image-compression-inforrmation-technology-essay.php?vref=1
https://kidsactivitiesblog.com/112656/rock-painting-ideas/?utm_source=pocket_mylist