This nice example was taken and re-worked from João Neto’s website here. Let’s begin by loading the necessary packages. In order to process image data we will make use of the EBImage package from the Bioconductor repository, meaning we should first run the commands source(“https://bioconductor.org/biocLite.R”) and biocLite(“EBImage”).

library(EBImage)

Data import and exploration

Now download the image we will be playing around with and take a look at the object that is defined by the readImage function, that stores images as arrays containing 0-1 pixel intensities.

tmp <- readImage(
  'https://s-media-cache-ak0.pinimg.com/736x/5d/73/47/5d73475a51b41e1ef32ca98ecbdb7628.jpg')

tmp

## Image 
##   colorMode    : Color 
##   storage.mode : double 
##   dim          : 736 419 3 
##   frames.total : 3 
##   frames.render: 1 
## 
## imageData(object)[1:5,1:6,1]
##            [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
## [1,] 0.05490196 0.05490196 0.05490196 0.05490196 0.05490196 0.05490196
## [2,] 0.05490196 0.05490196 0.05490196 0.05490196 0.05490196 0.05490196
## [3,] 0.05490196 0.05490196 0.05490196 0.05490196 0.05490196 0.05490196
## [4,] 0.05490196 0.05490196 0.05490196 0.05490196 0.05490196 0.05490196
## [5,] 0.05490196 0.05490196 0.05490196 0.05490196 0.05490196 0.05490196

str(tmp)

## Formal class 'Image' [package "EBImage"] with 2 slots
##   ..@ .Data    : num [1:736, 1:419, 1:3] 0.0549 0.0549 0.0549 0.0549 0.0549 ...
##   ..@ colormode: int 2

The \(tmp\) object, containing the image, has a physical dimension of approximately 7Mb, as can be immediately seen from

object.size(tmp)

## 7402000 bytes

Let’s now display the pixels on each third-dimension slice of the \(tmp\) array in a gray scale, corresponding to one of red, green and blue pixel intensities for the RGB representation system.

par(mfrow=c(1,3))
image(imageData(tmp)[,,1],col = grey(seq(0, 1, length = 256)),
      main='red')
image(imageData(tmp)[,,2],col = grey(seq(0, 1, length = 256)),
      main='green')
image(imageData(tmp)[,,3],col = grey(seq(0, 1, length = 256)),
      main='blue')

par(mfrow=c(1,1))

A suitable combination of these three images/slices can be obtain with the following code, that outputs the original image (nevermind the image flip):

display(tmp,method='raster')

Data transform

To obtain the properly computed luminance for monitors having phosphors that were contemporary at the introduction of NTSC television in 1953 (this is just a standard, we could as well have picked another one but this works well with gray scales), we employ the coefficients 0.299, 0.587 and 0.114 to weigh the intensities on each slice of the array as follows:

red.weight <- .299 
green.weight <- .587
blue.weight <- .114
m <- red.weight * imageData(tmp)[,,1] + 
  green.weight * imageData(tmp)[,,2] + 
  blue.weight  * imageData(tmp)[,,3]

The result is

image(m, col = grey(seq(0, 1, length = 256)),axes=F)

Dimensionality reduction

We are now going to use Principal Component Analysis (PCA) to demonstrate the possibility of reducing the physical weight of the \(tmp\) object (hence, of the image itself) in spite of the image quality.

Let us first extract the principal components from the matrix \(m\) defined a few lines ago, and construct a plot showing the percentage of explained variance as a function of the number of components.

pca.m <- prcomp(m,scale=TRUE)
plot(summary(pca.m)$importance[3,], type="l", 
     ylab="%variance explained", 
     xlab="nth component (decreasing order)")
abline(h=0.99,v=75,col="red",lty=3)

To get an idea, this plot suggests that the first 75 (out of 419) components are enough to explain approximately \(99%\) of the variance. The following code extracts them and constructs the compact version of the original matrix \(m\) using the partial information.

c.comp <- 1:75
feature.vector <- pca.m$rotation[,c.comp]

compact.data <- t(feature.vector) %*% t(m)
dim(compact.data)

## [1]  75 736

The representation is indeed parsimonious (much less columns!) How much do these components weigh?

object.size(feature.vector)

## 256008 bytes

We now build the actual approximation of \(m\) using \(compact.data\) and \(feature.vector\). Note that these two objects alone are enough to reconstruct (an approximated version of) \(m\), and jointly weigh 702216 bytes, as opposed to the weight of \(m\), which is 2467272 bytes.

approx.m <- t(feature.vector %*% compact.data) 
dim(approx.m)

## [1] 736 419

Let’s show the approximation

par(mfrow=c(1,2))
image(m, col = grey(seq(0, 1, length = 256)),
      main='original')
image(approx.m, col = grey(seq(0, 1, length = 256)),
      main='approximation')

par(mfrow=c(1,1))

A simple application of linear PCA to image processing: data compression

Data import and exploration

Data transform

Dimensionality reduction