Dimensionality Reduction for Image Compression

Introduction - Why Image Compression?

In today’s digital era, almost every website or application contains images: posts on social media, website backgrounds, online advertisements, and so on. At the same time, photography keeps changing, and even pictures taken with smartphones are now of very high quality. Images are crucial. However, they can also be a major contributor to storage requirements and website load times. One way to deal with this issue is image compression.

Image compression is about “minimizing (image) size in bytes without degrading image quality below an acceptable threshold” (source: https://www.techtarget.com/whatis/definition/image-compression).

There are many ways to do so. In this project I focus on applying Principal Component Analysis (PCA) to compress an image and on analyzing the advantages and downsides of this approach.

PCA process for Image Compression

Initial Image Processing

Naturally, I start by loading the necessary packages. Then I import the photo I’m going to compress.

#install.packages("jpeg")
#install.packages("factoextra")
#install.packages("gridExtra")
#install.packages("ggplot2")
library(jpeg)
library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(gridExtra)
library(ggplot2)
library(magick)
## Linking to ImageMagick 6.9.12.98
## Enabled features: cairo, freetype, fftw, ghostscript, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fontconfig, x11
library(png)
library(Metrics)
library(knitr)
library(imgpalr)
library(abind)

photo<-readJPEG("first_man.jpg")
plot(1, type="n") # plotting the rasterImage – colour photo
rasterImage(photo, 0.6, 0.6, 1.4, 1.4)

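The loaded image is a three-dimensional array: height by width by three colour channels. Each channel holds the amount of red, green or blue in every pixel. Note that readJPEG returns these intensities on a scale from 0 to 1, rather than the 0 to 255 scale used in 8-bit files. Together the channels make up the RGB format. For PCA image compression, we extract a separate matrix for each RGB channel.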

r<-photo[,,1]
g<-photo[,,2]
b<-photo[,,3]

There are a few interesting functions we can use to play with the image colors. For example, we can create a custom color palette based on the dominant shades of the image.

library(imgpalr)

myPalette<-image_pal("first_man.jpg", n=8, type="div", saturation=c(0.75, 1), brightness=c(0.75, 1), plot=TRUE)

Another thing we can do with the RGB format is convert the image to grayscale. This can be done by summing the three channels and dividing the result by its maximum, so that all values fall between 0 and 1. This step can be useful for PCA image compression as well.

photo.sum<-photo[,,1]+photo[,,2]+photo[,,3]
photo.bw<-photo.sum/max(photo.sum)
plot(1, type="n")
rasterImage(photo.bw, 0.6, 0.6, 1.4, 1.4)
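
The grayscale step can also be tried without the photo file. The sketch below is only an illustration: it uses a small random array (`toy`, a made-up stand-in for `photo`) and additionally shows a plain channel average, which is an alternative to dividing by the maximum and not part of the original workflow.

```r
# Stand-in for `photo`: a 4 x 5 RGB array with values in [0, 1] (illustrative data only)
set.seed(1)
toy <- array(runif(4 * 5 * 3), dim = c(4, 5, 3))

# Same approach as above: sum the three channels, then divide by the maximum
toy.sum <- toy[,,1] + toy[,,2] + toy[,,3]
toy.bw <- toy.sum / max(toy.sum)

# Alternative (assumption, not used in this project): average the channels,
# which keeps the original brightness scale instead of stretching it to 1
toy.mean <- toy.sum / 3

dim(toy.bw)    # a single 4 x 5 matrix instead of a 4 x 5 x 3 array
range(toy.bw)  # the largest value is exactly 1 after rescaling
```

Dividing by the maximum stretches the brightest pixel to pure white, so a dark photo becomes brighter; the average keeps the relative brightness of the original.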

Principal Component Analysis (PCA)

PCA transforms the initial dataset into a new set of uncorrelated variables, the principal components. In image processing, PCA lets us represent an image as a linear combination of principal components. Keeping the components that explain the greatest amount of variation in the data minimizes the loss of quality (information).

To start, we run PCA separately on each RGB channel. As a next step, we collect the three results in one list.

r.pca<-prcomp(r, center=FALSE, scale.=FALSE)
g.pca<-prcomp(g, center=FALSE, scale.=FALSE)
b.pca<-prcomp(b, center=FALSE, scale.=FALSE)
rgb.pca<-list(r.pca, g.pca, b.pca)
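
To see what these objects contain, the following minimal sketch runs the same `prcomp` call on a small random matrix (`m`, illustrative data only) and rebuilds it from its components. The helper name `reconstruct` is hypothetical and only used here.

```r
# Toy "channel": a 6 x 4 matrix standing in for r, g or b (illustrative data only)
set.seed(42)
m <- matrix(runif(24), nrow = 6, ncol = 4)

# Same call as in the text: no centering, no scaling
m.pca <- prcomp(m, center = FALSE, scale. = FALSE)

# Hypothetical helper: rebuild the matrix from the first k components
# (scores in $x times the transposed loadings in $rotation)
reconstruct <- function(pca, k) {
  pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
}

full <- reconstruct(m.pca, ncol(m))  # all 4 components
low  <- reconstruct(m.pca, 2)        # only the first 2 components

max(abs(full - m))  # numerically zero: nothing is lost with all components
mean((low - m)^2)   # greater than zero: the price of dropping components
```

Because centering is switched off, the product of scores and transposed loadings returns the original matrix directly; with `center=TRUE` the column means would have to be added back.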

Below we look at the percentage of variance explained by the leading principal components. For each of the three colour channels, the first principal component explains more than 80% of the variance.

f1<-fviz_eig(r.pca, main="Red", barfill="red", ncp=5, addlabels=TRUE)
f2<-fviz_eig(g.pca, main="Green", barfill="green", ncp=5, addlabels=TRUE)
f3<-fviz_eig(b.pca, main="Blue", barfill="blue", ncp=5, addlabels=TRUE)
grid.arrange(f1, f2, f3, ncol=3)
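
The percentages shown in such scree plots can also be computed by hand from the `sdev` element of a `prcomp` result. The sketch below uses a random matrix as illustrative data, and the 95% threshold is an arbitrary assumption rather than a value used in this project.

```r
# Illustrative data only: a 20 x 10 matrix standing in for one colour channel
set.seed(7)
m <- matrix(runif(200), nrow = 20, ncol = 10)
m.pca <- prcomp(m, center = FALSE, scale. = FALSE)

# Share of variance per component: squared standard deviations over their sum
var.share <- m.pca$sdev^2 / sum(m.pca$sdev^2)
cum.share <- cumsum(var.share)

# Smallest number of components whose cumulative share reaches 95% (assumed threshold)
k95 <- which(cum.share >= 0.95)[1]

round(100 * var.share[1:5], 1)  # percentages for the first five components
k95
```

A cumulative threshold like this is one simple way to choose the number of components instead of picking it by eye.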

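Now it’s time to perform the compression and check the results. Using a loop, we create 9 versions of the photo. The numbers of principal components are nine evenly spaced values between 3 and the number of pixel rows in the photo. For each colour channel we multiply the first i columns of the scores (x) by the transpose of the first i columns of the rotation matrix, which gives an approximation of that channel on the original pixel grid.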

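# Nine evenly spaced component counts, rounded up front so that
# the columns selected by 1:i match the number used in the file names
vec<-round(seq(3, nrow(photo), length.out=9))
for(i in vec){
  # Rebuild each colour channel from its first i principal components
  photo.pca<-sapply(rgb.pca, function(j) {
    j$x[,1:i] %*% t(j$rotation[,1:i])}, simplify="array")
  assign(paste("photo_", i, sep=""), photo.pca)
  writeJPEG(photo.pca, paste("photo_", i, "_princ_comp.jpg", sep=""))
}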

par(mfrow=c(3,3))
par(mar=c(1,1,1,1))

# Plot the nine reconstructions in a 3 x 3 grid
for(k in seq_along(vec)){
  plot(image_read(get(paste("photo_", round(vec[k],0), sep=""))))
}
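
Independently of the files written in the loop, it helps to count how many numbers a PCA representation needs. Keeping k components of an n x p channel means storing n x k scores and p x k loadings instead of n x p pixel values. The sketch below uses a hypothetical helper, `pca_storage_ratio`, and an assumed image size of 600 x 800 (the real width of the photo is not stated here).

```r
# Hypothetical helper: numbers stored for k components (scores + loadings)
# divided by the numbers stored for the full n x p channel matrix
pca_storage_ratio <- function(n, p, k) {
  (k * (n + p)) / (n * p)
}

# Illustrative dimensions only (assumption)
n <- 600
p <- 800
k <- c(3, 78, 152, 300, 600)

round(pca_storage_ratio(n, p, k), 3)

# Above this number of components the PCA representation is larger than the raw matrix
n * p / (n + p)
```

Under these assumptions, only a small number of components gives a real saving; keeping all 600 would need more numbers than the raw channel itself.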

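A first look at the pictures above gives an overview of how PCA image compression works. During the process we saved the compressed images to the working directory, so that we can compare image quality with file size. The table below reports, for each number of principal components, the file size in bytes, the compression ratio (file size relative to the 600-component file) and the mean squared error (MSE) against the original photo. The MSE rises as the number of principal components falls, which reflects the loss of image quality.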

#install.packages("Metrics")
library(Metrics)

# One row per reconstructed photo
sizes<-matrix(0, nrow=9, ncol=4)
colnames(sizes)<-c("Number of PC", "Photo size", "Compression ratio", "MSE-Mean Squared Error")
sizes[,1]<-round(vec,0)
for(i in 1:9) {
  path<-paste("photo_", round(vec[i],0), "_princ_comp.jpg", sep="")
  sizes[i,2]<-file.info(path)$size   # file size in bytes
  photo_mse<-readJPEG(path)
  sizes[i,4]<-mse(photo, photo_mse)  # error against the original photo
}
# File size relative to the file with the most components (row 9)
sizes[,3]<-round(sizes[,2]/sizes[9,2],3)

#install.packages("knitr")
library(knitr)
kable(sizes)
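| Number of PC | Photo size | Compression ratio | MSE-Mean Squared Error |
|-------------:|-----------:|------------------:|-----------------------:|
|            3 |      36044 |             0.536 |              0.0212402 |
|           78 |      72887 |             1.084 |              0.0027280 |
|          152 |      73956 |             1.099 |              0.0017773 |
|          227 |      71095 |             1.057 |              0.0013443 |
|          302 |      69590 |             1.035 |              0.0008806 |
|          376 |      68662 |             1.021 |              0.0005074 |
|          451 |      67584 |             1.005 |              0.0003551 |
|          525 |      67271 |             1.000 |              0.0003404 |
|          600 |      67268 |             1.000 |              0.0003403 |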
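
The MSE column is the average squared difference between corresponding values of two images. As a minimal sketch in base R, the example below computes it by hand on made-up data; `mse_manual` is a hypothetical helper name, and the clamping step is an optional extra (an assumption, not part of the workflow above) for reconstructions that drift slightly outside the valid range.

```r
# Toy original and a noisy "reconstruction" (illustrative data only)
set.seed(3)
original <- array(runif(4 * 4 * 3), dim = c(4, 4, 3))
approx <- original + rnorm(length(original), sd = 0.05)

# Intensities outside [0, 1] are not valid, so clamp them to that range
approx.clamped <- pmin(pmax(approx, 0), 1)

# Hypothetical helper: mean of the squared per-pixel, per-channel differences
mse_manual <- function(actual, predicted) mean((actual - predicted)^2)

mse_manual(original, original)        # 0 for identical images
mse_manual(original, approx.clamped)  # small positive value
```

Because the intensities lie between 0 and 1, MSE values are small numbers; what matters is how they change from one row of the table to the next.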

Conclusion

Principal Component Analysis offers several advantages in image compression, such as reducing storage requirements and improving website speed and the overall user experience. At the same time, we must be aware of its limitations. These include the potential loss of image quality during compression, which depends directly on the chosen number of principal components. PCA performs well with images featuring smooth and continuous structures but may be less effective for those with sharp edges or high-frequency content. Other disadvantages of PCA include high computational requirements (especially for larger images), the challenge of selecting the optimal number of principal components, and limited applicability to certain types of images (it can be effective for photographs but potentially less so for logos or text).

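Whether PCA compares favourably with other compression techniques (e.g. JPEG, PNG) in terms of compression, image quality and speed depends on the image and on the number of components retained. In this experiment only the 3-component version was clearly smaller on disk (compression ratio 0.536), and it also had the highest MSE, while the intermediate versions were slightly larger than the 600-component file. It’s important to remember that each technique has its own strengths and weaknesses. The choice of technique should depend on the specification and requirements of the application or website.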