Eigenfaces are a set of eigenvectors used in the computer-vision problem of human face recognition. The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images. Every face image can be approximated as a weighted combination of eigenfaces, so the dimension of the data can be reduced by keeping only a small set of them. The reconstructed (projected) faces can then be used for classification of images.
The implementation involves the following steps (a compact cross-check with prcomp() is sketched after the list):

* Preparing the training set of images
* Subtracting the mean image from every image
* Calculating the covariance matrix of a data frame holding all images
* Calculating the eigenvectors and eigenvalues of the covariance matrix
* Choosing the principal components (those which carry at least the threshold amount of information)
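As a compact cross-check of these steps (not part of the original pipeline), base R's prcomp() performs the centering and the eigendecomposition in one call. The sketch below assumes the images are already loaded as rows of dogfaces, as in the CSV read in later; the names pca, cum.var.pca, k and scores are my own.

pca <- prcomp(dogfaces, center = TRUE)                # subtract the mean image, then decompose
cum.var.pca <- cumsum(pca$sdev^2) / sum(pca$sdev^2)   # cumulative share of variance explained
k <- min(which(cum.var.pca > 0.95))                   # number of components carrying 95% of the variance
scores <- pca$x[, 1:k]                                # images projected onto the first k eigenfaces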
This paper replicates the usual eigenfaces approach on images of my three dogs: ‘KC’, ‘Soyka’ and ‘Astra’. For simplicity, I took 10 images of each dog for the training set, and a few other images for testing.
All steps are described below.
requiredPackages <- c("RSpectra","bmp","stats","dplyr") # list of required packages
for(i in requiredPackages){ if(!require(i, character.only = TRUE)) install.packages(i) } # install missing packages
for(i in requiredPackages){ library(i, character.only = TRUE) } # load all required packages
The images were first converted to 64x64 binary image data; this makes the subsequent calculations faster, and image quality is not important here. The images were then stored in a .csv file, one image per row.
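The conversion itself is not shown here; the sketch below indicates how it could be done with the bmp package loaded above. The file locations, the helper name bmp_to_row and the nearest-neighbour resizing are assumptions for illustration, not the preprocessing actually used.

# Hypothetical helper: read a BMP, keep one channel, resize to 64x64 by
# nearest-neighbour sampling and return the image flattened into one row.
bmp_to_row <- function(path, size = 64) {
  img <- bmp::read.bmp(path)                      # height x width (x channels) array
  if (length(dim(img)) == 3) img <- img[, , 1]    # keep a single channel
  ri <- round(seq(1, nrow(img), length.out = size))
  ci <- round(seq(1, ncol(img), length.out = size))
  as.numeric(img[ri, ci] / 255)                   # rescale pixel values to [0, 1]
}
# dogfaces <- t(sapply(list.files("Photos", pattern = "\\.bmp$", full.names = TRUE), bmp_to_row))
# write.csv(dogfaces, "Dog_faces_rotated.csv", row.names = FALSE)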
dogfaces<-read.csv("C:\\Users\\tesfa\\OneDrive\\Documents\\CoursesDataScince\\Unsupervised learning\\Project\\Dimension reduction\\Photos\\Dog_faces_rotated.csv",header = T) %>% as.data.frame()
dim(dogfaces) # a row is an image
## [1] 30 4096
30 images of 3 dogs (10 images each) were used as training images.
# a function for plotting a 64x64 image in grayscale
plt_img <- function(x){ image(x, col=grey(seq(0, 1, length=256)))}
par(mfrow=c(3,10))
par(mar=c(0.1,0.1,0.1,0.1))
for (i in seq(1,30)) {
if(i<11){
plt_img(matrix(as.numeric(dogfaces[i, ]), nrow=64, byrow=T)) # the first dog (images 1-10)
}
else if(i<21){
plt_img(matrix(as.numeric(dogfaces[i, ]), nrow=64, byrow=T)) # the second dog (images 11-20)
}
else{
plt_img(matrix(as.numeric(dogfaces[i, ]), nrow=64, byrow=T)) # the third dog (images 21-30)
}
}
The average face has to be subtracted from every image; this is the first step in normalizing the data.
par(mfrow=c(1,1))
average_face=colMeans(dogfaces) # the average image (mean of each pixel column)
dogfaces_scaled <- scale(data.matrix(dogfaces)) # center and scale the pixel columns
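Note that scale() both subtracts the column means (the average face) and divides by the column standard deviations. If only the mean subtraction described above were wanted, a centered-only matrix could be built as follows (a small sketch; dogfaces_centered is not used later in the pipeline):

# Subtract the average face from every image without rescaling the pixels.
dogfaces_centered <- sweep(data.matrix(dogfaces), 2, average_face, "-")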
The eigenvalues of the covariance matrix represent the magnitude of the variance, because each eigenvector points along a direction in which the data set has maximal variance.
covar <- cov(dogfaces_scaled)
I assume that keeping 95% of the variance is enough to identify a dog; therefore, only the first few eigenvectors will be used in the following steps. This reduces the dimension of the data from 4096 to only 21.
eig <- eigen(covar)
eigenvalues <- eig$values
cum.var <- cumsum(eigenvalues) / sum(eigenvalues) # cumulative share of the total variance
thres <- min(which(cum.var > .95)) # first index where the cumulative share exceeds 95%
thres
## [1] 21
eigenvalues_95=eigenvalues[1:thres] # only threshold number of eigenvalues
par(mfrow=c(1,1))
par(mar=c(2.5,2.5,2.5,2.5))
plot(1:thres, eigenvalues_95, log = "y", type="o",main="Magnitude of the 21 biggest eigenvalues", xlab="Eigenvalue #", ylab="Magnitude")
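The eigen() call above decomposes the full 4096x4096 covariance matrix even though only the leading components are kept. RSpectra, loaded at the start but otherwise unused, could compute just the top eigenpairs instead. A sketch, assuming k = 30 (with 30 training images the centered covariance matrix has at most 29 nonzero eigenvalues, so 30 is more than enough); eig_part is my own name.

# Partial eigendecomposition: only the 30 largest eigenvalues/eigenvectors.
eig_part <- RSpectra::eigs_sym(covar, k = 30)
head(eig_part$values) # should agree with head(eigenvalues)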
The principal components are the eigenvectors, ordered by decreasing eigenvalue.
eigenvectors <- eig$vectors[,1:thres] # keep only the first 21 eigenvectors (95% of the variance)
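As a quick check of the earlier claim that eigenvalues measure the variance along their eigenvectors, the variance of the scaled data projected onto the first eigenvector can be compared with the first eigenvalue (a small sketch, not part of the pipeline; proj1 is my own name).

# The variance along the first eigenvector should equal the first eigenvalue
# (up to numerical error), since var(Xv) = v' cov(X) v = lambda for an eigenvector v.
proj1 <- dogfaces_scaled %*% eig$vectors[, 1]
var(as.numeric(proj1)) # approximately eigenvalues[1]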
This involves projecting the images into the eigenspace and comparing them to the training images.

### Projection of a photo into the eigenvector space
# project the 1st dog into the eigenspace and reconstruct it
Eigen_image1 <- data.matrix(dogfaces[1,]) %*% eigenvectors
Reco_image1 <- Eigen_image1 %*% t(eigenvectors)
# add the average face (computed above) back to the reconstruction
AV_Face=matrix(average_face,nrow=1,byrow=T)
reconstructed=Reco_image1+AV_Face
plt_img(matrix(as.numeric(reconstructed),nrow=64,byrow=T))
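To judge how much detail survives the reduction to 21 components, the original image can be plotted next to its reconstruction (a small added sketch, not part of the original code).

# Original image (left) versus its eigenspace reconstruction (right).
par(mfrow=c(1,2))
plt_img(matrix(as.numeric(dogfaces[1, ]), nrow=64, byrow=T))
plt_img(matrix(as.numeric(reconstructed), nrow=64, byrow=T))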
## Testing

To test whether images projected into the eigenspace can be used for identification, I created 5 test images, the first 3 of which are other (non-training) images of the same 3 dogs. The other 2 are images of random dogs from the internet, zoomed and resized for this purpose. The steps for preparing the images are not relevant to this topic.
Here the input image is projected into the eigenspace and then compared to the projections of all training images. The ‘face’ with the smallest difference in terms of squared distance is taken as the nearest image. If the minimum difference is greater than the set threshold, the function reports that the test image is unknown.
PFall <- data.matrix(dogfaces) %*% eigenvectors # projections of all training images into the eigenspace
Detector <- function(image){
PF1 <- data.matrix(image) %*% eigenvectors # project the test image into the eigenspace
test <- matrix(rep(1,30),nrow=30,byrow=T) # column of ones used to replicate the projection
test_PF1 <- test %*% PF1 # 30 identical rows, one per training image
Diff <- PFall-test_PF1 # difference to every training projection
y <- rowSums(Diff^2) # squared Euclidean distance to each training image
# Find the training image with the smallest squared distance
x=c(1:30)
newdf=data.frame(cbind(x,y))
if(min(newdf$y) < 0.5*max(newdf$y)) # ad-hoc threshold: accept only a clearly close match
{
the_number = newdf$x[newdf$y == min(newdf$y)]
par(mfrow=c(1,1))
cat("The most similar image is image: ", the_number)
plt_img(matrix(as.numeric(dogfaces[the_number, ]), nrow=64, byrow=T))
}
else{
print("No matching image")
}
}
Image 2 of the third dog (row 22 of the training set) was tested. The detector should find the exact image.
# Testing on images from the training set
Detector(dogfaces[22,]) # testing an image from the training set
## The most similar image is image: 22
The other test images were then checked: the first 3 rows are non-training images of the same 3 dogs, and the last 2 rows are random dogs from the internet, zoomed and resized for this purpose. Below, another image of KC and one of the random dogs are tested.
# Testing other images
testimages<-read.csv("C:\\Users\\tesfa\\OneDrive\\Documents\\CoursesDataScince\\Unsupervised learning\\Project\\Dimension reduction\\Photos\\testphotos\\testimages.csv",header = F, sep = ",") %>% as.data.frame()
rotate <- function(x) t(apply(x, -2, rev)) # rotate images
KC = rotate(data.matrix(testimages[1,]))
Dog1 = rotate(data.matrix(testimages[4,]))
par(mfrow=c(1,2))
plt_img(t(matrix(KC,nrow=64,byrow=T))) # another image of KC
plt_img(t(matrix(Dog1,nrow=64,byrow=T))) # a random dog
Detector(KC) # another image of KC
## [1] "No matching image"
Detector(Dog1) # a random dog
## [1] "No matching image"
The method is very good at finding images that are in the training set; however, when other images are introduced, it is easily fooled: even new photos of the same dogs were not matched, so this simple approach does not generalize beyond the training images.