Eigenfaces are a set of eigenvectors used in the computer-vision problem of human face recognition. The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images. Every face image can be approximated as a weighted combination of eigenfaces, so the dimension of the data can be reduced by keeping only a small set of them. The reconstructed (projected) faces can then be used for classification of images.
The implementation involves the following steps (a compact cross-check with prcomp() is sketched after the list):

* Preparing the training set of images
* Subtracting the mean image from every image
* Calculating the covariance matrix of a data frame holding all images
* Calculating the eigenvectors and eigenvalues of the covariance matrix
* Choosing the principal components (those which carry at least the threshold amount of information)
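As a compact cross-check of these steps (not part of the original pipeline), base R's prcomp() performs the centering and the eigendecomposition in one call. The sketch below assumes the images are already loaded as rows of dogfaces, as in the CSV read in later; the names pca, cum.var.pca, k and scores are my own.

pca <- prcomp(dogfaces, center = TRUE)                # subtract the mean image, then decompose
cum.var.pca <- cumsum(pca$sdev^2) / sum(pca$sdev^2)   # cumulative share of variance explained
k <- min(which(cum.var.pca > 0.95))                   # number of components carrying 95% of the variance
scores <- pca$x[, 1:k]                                # images projected onto the first k eigenfaces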
This paper replicates the usual eigenfaces approach on images of my three dogs: ‘KC’, ‘Soyka’ and ‘Astra’. For simplicity, I took 10 images of each dog for the training set, and a few other images for testing.
All steps are described below.
requiredPackages <- c("RSpectra","bmp","stats","dplyr") # list of required packages
for(i in requiredPackages){ if(!require(i, character.only = TRUE)) install.packages(i) } # install missing packages
for(i in requiredPackages){ library(i, character.only = TRUE) } # load all required packages
The images were first converted to 64x64 binary image data; this makes the subsequent calculations faster, and image quality is not important here. The images were then stored in a .csv file, one image per row.
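The conversion itself is not shown here; the sketch below indicates how it could be done with the bmp package loaded above. The file locations, the helper name bmp_to_row and the nearest-neighbour resizing are assumptions for illustration, not the preprocessing actually used.

# Hypothetical helper: read a BMP, keep one channel, resize to 64x64 by
# nearest-neighbour sampling and return the image flattened into one row.
bmp_to_row <- function(path, size = 64) {
  img <- bmp::read.bmp(path)                      # height x width (x channels) array
  if (length(dim(img)) == 3) img <- img[, , 1]    # keep a single channel
  ri <- round(seq(1, nrow(img), length.out = size))
  ci <- round(seq(1, ncol(img), length.out = size))
  as.numeric(img[ri, ci] / 255)                   # rescale pixel values to [0, 1]
}
# dogfaces <- t(sapply(list.files("Photos", pattern = "\\.bmp$", full.names = TRUE), bmp_to_row))
# write.csv(dogfaces, "Dog_faces_rotated.csv", row.names = FALSE)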
dogfaces<-read.csv("C:\\Users\\tesfa\\OneDrive\\Documents\\CoursesDataScince\\Unsupervised learning\\Project\\Dimension reduction\\Photos\\Dog_faces_rotated.csv",header = T) %>% as.data.frame()
dim(dogfaces) # a row is an image
## [1] 30 4096
30 images of 3 dogs (10 images each) were used as training images.
# a function for plotting a 64x64 image in grayscale
plt_img <- function(x){ image(x, col=grey(seq(0, 1, length=256)))}
par(mfrow=c(3,10))
par(mar=c(0.1,0.1,0.1,0.1))
for (i in seq(1,30)) {
if(i<11){
plt_img(matrix(as.numeric(dogfaces[i, ]), nrow=64, byrow=T)) # the first dog (images 1-10)
}
else if(i<21){
plt_img(matrix(as.numeric(dogfaces[i, ]), nrow=64, byrow=T)) # the second dog (images 11-20)
}
else{
plt_img(matrix(as.numeric(dogfaces[i, ]), nrow=64, byrow=T)) # the third dog (images 21-30)
}
}
The average face has to be subtracted from every image; this is the first step in normalizing the data.
par(mfrow=c(1,1))
average_face=colMeans(dogfaces) # the average image (mean of each pixel column)
dogfaces_scaled <- scale(data.matrix(dogfaces)) # center and scale the pixel columns
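Note that scale() both subtracts the column means (the average face) and divides by the column standard deviations. If only the mean subtraction described above were wanted, a centered-only matrix could be built as follows (a small sketch; dogfaces_centered is not used later in the pipeline):

# Subtract the average face from every image without rescaling the pixels.
dogfaces_centered <- sweep(data.matrix(dogfaces), 2, average_face, "-")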
The eigenvalues of the covariance matrix represent the magnitude of the variance, because each eigenvector points along a direction in which the data set has maximal variance.
covar <- cov(dogfaces_scaled)
I assume that keeping 95% of the variance is enough to identify a dog; therefore, only the first few eigenvectors will be used in the following steps. This reduces the dimension of the data from 4096 to only 21.
eig <- eigen(covar)
eigenvalues <- eig$values
cum.var <- cumsum(eigenvalues) / sum(eigenvalues) # cumulative share of the total variance
thres <- min(which(cum.var > .95)) # first index where the cumulative share exceeds 95%
thres
## [1] 21
eigenvalues_95=eigenvalues[1:thres] # only threshold number of eigenvalues
par(mfrow=c(1,1))
par(mar=c(2.5,2.5,2.5,2.5))
plot(1:thres, eigenvalues_95, log = "y", type="o",main="Magnitude of the 21 biggest eigenvalues", xlab="Eigenvalue #", ylab="Magnitude")
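The eigen() call above decomposes the full 4096x4096 covariance matrix even though only the leading components are kept. RSpectra, loaded at the start but otherwise unused, could compute just the top eigenpairs instead. A sketch, assuming k = 30 (with 30 training images the centered covariance matrix has at most 29 nonzero eigenvalues, so 30 is more than enough); eig_part is my own name.

# Partial eigendecomposition: only the 30 largest eigenvalues/eigenvectors.
eig_part <- RSpectra::eigs_sym(covar, k = 30)
head(eig_part$values) # should agree with head(eigenvalues)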
The principal components are the eigenvectors, ordered by decreasing eigenvalue.
eigenvectors <- eig$vectors[,1:thres] # keep only the first 21 eigenvectors (95% of the variance)
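As a quick check of the earlier claim that eigenvalues measure the variance along their eigenvectors, the variance of the scaled data projected onto the first eigenvector can be compared with the first eigenvalue (a small sketch, not part of the pipeline; proj1 is my own name).

# The variance along the first eigenvector should equal the first eigenvalue
# (up to numerical error), since var(Xv) = v' cov(X) v = lambda for an eigenvector v.
proj1 <- dogfaces_scaled %*% eig$vectors[, 1]
var(as.numeric(proj1)) # approximately eigenvalues[1]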
This involves projecting the images into the eigenspace and comparing them to the training images.

### Projection of a photo into the eigenvector space
# project the 1st dog into the eigenspace and reconstruct it
Eigen_image1 <- data.matrix(dogfaces[1,]) %*% eigenvectors
Reco_image1 <- Eigen_image1 %*% t(eigenvectors)
# add the average face (computed above) back to the reconstruction
AV_Face=matrix(average_face,nrow=1,byrow=T)
reconstructed=Reco_image1+AV_Face
plt_img(matrix(as.numeric(reconstructed),nrow=64,byrow=T))
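To judge how much detail survives the reduction to 21 components, the original image can be plotted next to its reconstruction (a small added sketch, not part of the original code).

# Original image (left) versus its eigenspace reconstruction (right).
par(mfrow=c(1,2))
plt_img(matrix(as.numeric(dogfaces[1, ]), nrow=64, byrow=T))
plt_img(matrix(as.numeric(reconstructed), nrow=64, byrow=T))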
## Testing

To test whether images projected into the eigenspace can be used for identification, I created 5 test images, the first 3 of which are other (non-training) images of the same 3 dogs. The other 2 are images of random dogs from the internet, zoomed and resized for this purpose. The steps for preparing the images are not relevant to this topic.
Here the input image is projected into the eigenspace and then compared to the projections of all training images. The ‘face’ with the smallest difference in terms of squared distance is taken as the nearest image. If the minimum difference is greater than the set threshold, the function reports that the test image is unknown.
PFall <- data.matrix(dogfaces) %*% eigenvectors # projections of all training images into the eigenspace
Detector <- function(image){
PF1 <- data.matrix(image) %*% eigenvectors # project the test image into the eigenspace
test <- matrix(rep(1,30),nrow=30,byrow=T) # column of ones used to replicate the projection
test_PF1 <- test %*% PF1 # 30 identical rows, one per training image
Diff <- PFall-test_PF1 # difference to every training projection
y <- rowSums(Diff^2) # squared Euclidean distance to each training image
# Find the training image with the smallest squared distance
x=c(1:30)
newdf=data.frame(cbind(x,y))
if(min(newdf$y) < 0.5*max(newdf$y)) # ad-hoc threshold: accept only a clearly close match
{
the_number = newdf$x[newdf$y == min(newdf$y)]
par(mfrow=c(1,1))
cat("The most similar image is image: ", the_number)
plt_img(matrix(as.numeric(dogfaces[the_number, ]), nrow=64, byrow=T))
}
else{
print("No matching image")
}
}
Image 2 of the third dog (row 22 of the training set) was tested. The detector should find the exact image.
# Testing on images from the training set
Detector(dogfaces[22,]) # testing an image from the training set
## The most similar image is image: 22
The other test images were then checked: the first 3 rows are non-training images of the same 3 dogs, and the last 2 rows are random dogs from the internet, zoomed and resized for this purpose. Below, another image of KC and one of the random dogs are tested.
# Testing other images
testimages<-read.csv("C:\\Users\\tesfa\\OneDrive\\Documents\\CoursesDataScince\\Unsupervised learning\\Project\\Dimension reduction\\Photos\\testphotos\\testimages.csv",header = F, sep = ",") %>% as.data.frame()
rotate <- function(x) t(apply(x, -2, rev)) # rotate images
KC = rotate(data.matrix(testimages[1,]))
Dog1 = rotate(data.matrix(testimages[4,]))
par(mfrow=c(1,2))
plt_img(t(matrix(KC,nrow=64,byrow=T))) # another image of KC
plt_img(t(matrix(Dog1,nrow=64,byrow=T))) # a random dog
Detector(KC) # another image of KC
## [1] "No matching image"
Detector(Dog1) # a random dog
## [1] "No matching image"
The method is very good at finding images that are in the training set; however, when other images are introduced, it is easily fooled: even new photos of the same dogs were not matched, so this simple approach does not generalize beyond the training images.