Introduction

Libraries

library(jpeg)
library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(magick)
## Linking to ImageMagick 6.9.12.3
## Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fftw, ghostscript, x11
library(gridExtra)
library(ggplot2)
library(Metrics)
library(pdp)

Image

I chose a photo of a muscari flower from Google Images.

img<-readJPEG("img.jpg")
plot(1, type="n")   # plotting the rasterImage – colour photo
rasterImage(img, 0.6, 0.6, 1.4, 1.4)

The rasterImage function draws the picture above onto an existing plot as a raster image. A raster image, also known as a bitmap, is a digital image made up of a grid of pixels.

dim(img)
## [1] 1220 1313    3

Plotting each RGB color separately. The example below shows the blue channel (index 3).

plot(1, type="n")
rasterImage(img[,,3],0.6, 0.6, 1.4, 1.4)

writeJPEG(img[,,1], "photo_bw.jpg")

Converting to grayscale

Since a color photo is represented by three matrices, each holding one component of the RGB colors, we can convert it to a grayscale photo by summing the RGB values for each pixel and dividing by the largest sum, which yields values between 0 and 1.

img.sum<-img[,,1]+img[,,2]+img[,,3] 
img.bw<-img.sum/max(img.sum)
plot(1, type="n")
rasterImage(img.bw, 0.6, 0.6, 1.4, 1.4)

writeJPEG(img.bw, "photo_bw.jpg")
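
The same conversion can be checked on a small synthetic array, without needing img.jpg. This is only a minimal sketch: the array toy and its pixel values are made up for illustration.

```r
# Hypothetical 2 x 2 RGB image: toy[,,1] = red, toy[,,2] = green, toy[,,3] = blue
toy <- array(c(1.0, 0.0, 0.5, 0.2,   # red channel
               0.0, 1.0, 0.5, 0.2,   # green channel
               0.0, 0.0, 0.5, 0.2),  # blue channel
             dim = c(2, 2, 3))

# Sum the three channels pixel by pixel, then rescale by the largest sum
toy.sum <- toy[,,1] + toy[,,2] + toy[,,3]
toy.bw  <- toy.sum / max(toy.sum)

range(toy.bw)   # all values lie in [0, 1]
```

Because the three channels are weighted equally, a pure red pixel and a pure green pixel end up with the same gray value in this sketch.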

Principal Component Analysis (PCA)

PCA is a data preprocessing technique used to reduce the dimensionality of a dataset. It transforms high-dimensional data into a lower-dimensional representation while keeping as much information as possible. PCA copes well with large datasets that have a large number of features per observation, so it is widely used in image processing and genome research. The algorithm looks as follows:

  1. Standardization of the initial variables.
  2. Discovery of correlations by calculating the covariance matrix.
  3. Determination of the principal components by computing eigenvectors and eigenvalues.
  4. Decision of which principal components to keep by establishing a feature vector.
  5. Reshaping the data along the principal component axes.
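
The five steps above can be sketched by hand in base R and compared with prcomp. This is an illustrative sketch on made-up data (the matrix X and the choice k = 2 are assumptions), not the exact procedure applied to the photo: the prcomp calls used on the image in this article skip the standardization step.

```r
set.seed(1)
# Hypothetical data: 50 observations of 3 variables, two of them correlated
X <- matrix(rnorm(150), ncol = 3)
X[, 2] <- X[, 1] + 0.3 * X[, 2]

Z      <- scale(X)                 # 1. standardize the variables
C      <- cov(Z)                   # 2. covariance matrix
eig    <- eigen(C)                 # 3. eigenvectors and eigenvalues
k      <- 2                        # 4. keep the first k components ...
W      <- eig$vectors[, 1:k]       #    ... as the feature vector
scores <- Z %*% W                  # 5. project the data onto the new axes

# Cross-check against prcomp(): same variances, same scores up to sign
p <- prcomp(X, center = TRUE, scale. = TRUE)
all.equal(eig$values, p$sdev^2)
all.equal(abs(unname(scores)), abs(unname(p$x[, 1:k])))
```

The absolute values are compared because the sign of each eigenvector is arbitrary, so the two approaches may return mirrored axes.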

When applied to an image, PCA decomposes each of the R, G and B channels into a set of orthogonal basis vectors, known as eigenvectors. These eigenvectors represent the principal components of the image, which are the directions of greatest variance in the image data.

Generating three matrices, each representing one color of the RGB scale.

r<-img[,,1] 
g<-img[,,2]
b<-img[,,3]

Performing PCA on each matrix separately and then merging the results into one list.

r.pca<-prcomp(r, center=FALSE, scale.=FALSE)    
g.pca<-prcomp(g, center=FALSE, scale.=FALSE)
b.pca<-prcomp(b, center=FALSE, scale.=FALSE)
rgb.pca<-list(r.pca, g.pca, b.pca)  

Visualization of the variables (here, the pixel columns of each channel) in the plane of the first two principal components

R
fviz_pca_var(r.pca, col.var = "red")

G
fviz_pca_var(g.pca, col.var = "green")

B
fviz_pca_var(b.pca, col.var = "blue")

Visualization of the contribution of each variable to the first six principal components

R
PC1 <- fviz_contrib(r.pca, choice = "var", axes = 1)
PC2 <- fviz_contrib(r.pca, choice = "var", axes = 2)
PC3 <- fviz_contrib(r.pca, choice = "var", axes = 3)
PC4 <- fviz_contrib(r.pca, choice = "var", axes = 4)
PC5 <- fviz_contrib(r.pca, choice = "var", axes = 5)
PC6 <- fviz_contrib(r.pca, choice = "var", axes = 6)
grid.arrange(PC1, PC2, PC3, PC4, PC5, PC6)

G
PC1 <- fviz_contrib(g.pca, choice = "var", axes = 1)
PC2 <- fviz_contrib(g.pca, choice = "var", axes = 2)
PC3 <- fviz_contrib(g.pca, choice = "var", axes = 3)
PC4 <- fviz_contrib(g.pca, choice = "var", axes = 4)
PC5 <- fviz_contrib(g.pca, choice = "var", axes = 5)
PC6 <- fviz_contrib(g.pca, choice = "var", axes = 6)
grid.arrange(PC1, PC2, PC3, PC4, PC5, PC6)

B
PC1 <- fviz_contrib(b.pca, choice = "var", axes = 1)
PC2 <- fviz_contrib(b.pca, choice = "var", axes = 2)
PC3 <- fviz_contrib(b.pca, choice = "var", axes = 3)
PC4 <- fviz_contrib(b.pca, choice = "var", axes = 4)
PC5 <- fviz_contrib(b.pca, choice = "var", axes = 5)
PC6 <- fviz_contrib(b.pca, choice = "var", axes = 6)
grid.arrange(PC1, PC2, PC3, PC4, PC5, PC6)

Visualizing the results of PCA: scree plots of the variance explained by the first five components of each channel.

f1<-fviz_eig(r.pca, main="Red", barfill="red", ncp=5, addlabels=TRUE)
f2<-fviz_eig(g.pca, main="Green", barfill="green", ncp=5, addlabels=TRUE)
f3<-fviz_eig(b.pca, main="Blue", barfill="blue", ncp=5, addlabels=TRUE)
grid.arrange(f1, f2, f3, ncol=3)
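
The percentages shown in a scree plot can also be derived directly from the sdev element of a prcomp result. Below is a minimal sketch on a synthetic matrix; m is a made-up stand-in for one color channel, and the 95% threshold is an arbitrary choice for illustration. Note that with center=FALSE the "variance" is measured around zero rather than around the column means.

```r
set.seed(42)
# Hypothetical stand-in for one colour channel: a 40 x 30 matrix with values in [0, 1]
m <- matrix(runif(40 * 30), nrow = 40)
m.pca <- prcomp(m, center = FALSE, scale. = FALSE)

# Share of the total variance carried by each component, and the running total
var.share <- m.pca$sdev^2 / sum(m.pca$sdev^2)
cum.share <- cumsum(var.share)

# Smallest number of components whose cumulative share reaches 95%
n.comp <- which(cum.share >= 0.95)[1]
n.comp
```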

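Nine photos reconstructed with different numbers of principal components. The number of components ranges from a minimum of 3 to the maximum n (the number of rows in the image). For each count, we multiply the scores x by the transposed rotation matrix, which maps the result back onto the original pixel grid.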

vec<-round(seq.int(3, nrow(img), length.out=9))   # integer component counts, from 3 to nrow(img)
for(i in vec){
  # rebuild each channel from its first i components: scores %*% t(loadings)
  photo.pca<-sapply(rgb.pca, function(j) {
    j$x[,1:i] %*% t(j$rotation[,1:i])}, simplify="array")
  assign(paste("photo_", i, sep=""), photo.pca)   # keep the array as photo_<i>
  writeJPEG(photo.pca, paste("photo_", i, "_princ_comp.jpg", sep=""))
}
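
To see how the reconstruction improves as components are added, the error can be measured on a small synthetic matrix. This is a sketch under stated assumptions: m is a made-up stand-in for one color channel, reconstruct is a hypothetical helper, and the error measure (root-mean-square error) is chosen only for illustration.

```r
set.seed(7)
# Hypothetical stand-in for one colour channel
m <- matrix(runif(60 * 40), nrow = 60)
m.pca <- prcomp(m, center = FALSE, scale. = FALSE)

# Hypothetical helper: rebuild the matrix from the first k components
reconstruct <- function(pca, k) {
  pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
}

# Root-mean-square error of the reconstruction for increasing k
ks  <- c(1, 5, 10, 20, 40)
err <- sapply(ks, function(k) sqrt(mean((m - reconstruct(m.pca, k))^2)))
round(err, 4)
```

The error shrinks with every added component and is zero up to floating-point precision once all 40 components are used. Because the PCA was run with center=FALSE, no column means have to be added back after the multiplication; with centering, the values in pca$center would need to be added to each column.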

plot(image_read(photo_3))
round(vec,0)
## [1]    3  155  307  459  612  764  916 1068 1220
# plotting 9 images
par(mfrow=c(3,3)) 
par(mar=c(1,1,1,1))
for(k in vec){
  plot(image_read(get(paste("photo_", round(k,0), sep=""))))
}

With each iteration, the images become clearer. However, the difference in sharpness most visible to the human eye is between the first image, with 3 principal components, and the second image, with 155 principal components.
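
A rough way to relate the number of components to the amount of data kept is to count the numbers that must be stored per channel: k columns of scores plus k columns of loadings, compared with the full pixel matrix. The sketch below is simple arithmetic on the dimensions reported by dim(img) and the component counts listed above. It says nothing about the size of the JPEG files on disk, which depends on the JPEG encoder.

```r
# Image dimensions reported by dim(img) above
n.row <- 1220
n.col <- 1313

# Component counts used for the nine photos
k <- c(3, 155, 307, 459, 612, 764, 916, 1068, 1220)

# Numbers stored per channel: k score columns (n.row values each)
# plus k loading columns (n.col values each)
stored <- k * (n.row + n.col)
full   <- n.row * n.col

data.frame(components = k, fraction.of.full = round(stored / full, 3))

# Largest k for which the truncated form stores fewer numbers than the raw matrix
floor(full / (n.row + n.col))
```

Under this count, keeping the scores and loadings only saves space below about 632 components; beyond that, the truncated representation holds more numbers than the original pixel matrix.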

Summary

In this article we used Principal Component Analysis to reduce the dimensionality of a muscari image. Images often contain a high degree of redundancy, so PCA can be used to identify the most important features, or principal components, in the data and discard the less important ones.