With the attached data file, build and visualize eigenimagery that accounts for 80% of the variability. Provide full R code and discussion.
For this assignment, we take a set of image files and find the eigenimages that account for ~80% of the variability across the whole set. The methodology is based on Principal Component Analysis, using the eigenvalues and eigenvectors of the correlation matrix between the images.
I consulted a number of resources while working on this assignment; they are referenced at the bottom of the file.
The first step is to retrieve the image files that are stored in the jpg folder within the current working directory. Additionally, we set a variable num_images that will represent the number of images that are included in the folder.
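library(jpeg)        # provides readJPEG()
library(OpenImageR)  # assumed source of imageShow(), used at the end
path = './jpg'
files = list.files(path, pattern = "\\.jpg$")  # anchor the pattern so only names ending in .jpg match
num_images = length(files)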
Next we create an image matrix that holds the data for each of the 17 images in the folder. We start by reading a sample image to identify the dimensions needed. The matrix has 17 rows (one per image) and one column per pixel value, which is the product of the image dimensions: 1200 × 2500 × 3 = 9,000,000. For each image, we take the red, green, and blue channel data, flatten it into a single vector, and store it as one row of the matrix. Finally, we transpose the matrix into a data frame so that each column represents one image.
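# Read one image to learn the dimensions (height x width x 3 channels)
sample_image = readJPEG(file.path(path, files[1]))
# One row per image, one column per pixel value
image_matrix = matrix(0, num_images, prod(dim(sample_image)))
for (i in 1:num_images) {
  img = readJPEG(file.path(path, files[i]))
  # Flatten each color channel into a vector
  r = as.vector(img[,,1])
  g = as.vector(img[,,2])
  b = as.vector(img[,,3])
  image_matrix[i,] = t(c(r, g, b))
}
# Transpose so that each column holds one image
shoes = as.data.frame(t(image_matrix))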
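To see how the flattening step lays out the pixel data, the following minimal sketch uses a small synthetic array in place of a real JPEG. The 2 × 3 × 3 dimensions and the name toy_img are made up for illustration. It suggests that concatenating the three channels matches R's own column-major ordering, which is why array() can later rebuild an image from a single flat vector.

```r
# Synthetic stand-in for readJPEG() output: height 2, width 3, 3 channels.
# (toy_img is a hypothetical name; no image file is needed.)
toy_img <- array(seq_len(18) / 18, dim = c(2, 3, 3))

# Flatten channel by channel, as in the loop above.
r <- as.vector(toy_img[, , 1])
g <- as.vector(toy_img[, , 2])
b <- as.vector(toy_img[, , 3])
flat <- c(r, g, b)

# Rebuild the image from the flat vector using the original dimensions.
rebuilt <- array(flat, dim(toy_img))

# Both checks are stored so they can be inspected or printed.
round_trip_ok <- identical(rebuilt, toy_img)
same_as_as_vector <- identical(flat, as.vector(toy_img))
print(c(round_trip_ok, same_as_as_vector))
```

If both values print as TRUE, the per-channel flattening is equivalent to calling as.vector() on the whole image array.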
Next we take our image data frame and rescale it so that each column (each image) has a mean of 0 and a standard deviation of 1.
scaled=scale(shoes, center = TRUE, scale = TRUE)
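As a quick check of what scale() does with center = TRUE and scale = TRUE, here is a small sketch on random stand-in data. The matrix toy and its 100 × 3 size are assumptions for illustration only, not the shoe data.

```r
set.seed(1)
# Hypothetical data: 100 "pixels" (rows) for 3 "images" (columns).
toy <- matrix(runif(300), nrow = 100, ncol = 3)

toy_scaled <- scale(toy, center = TRUE, scale = TRUE)

# After scaling, each column should have mean ~0 and standard deviation 1.
col_means <- colMeans(toy_scaled)
col_sds <- apply(toy_scaled, 2, sd)

print(round(col_means, 10))
print(round(col_sds, 10))
```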
Next we take the scaled data frame and compute the correlation matrix between the images (a 17 × 17 matrix), which we will use to calculate our eigenvalues and eigenvectors.
Sigma_=cor(scaled)
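The sketch below, again on random stand-in data (toy is a hypothetical matrix), illustrates two properties that seem relevant here: for standardized columns the correlation matrix equals the cross-product divided by n - 1, and scaling beforehand does not change the correlations themselves.

```r
set.seed(2)
# Hypothetical data: 200 rows ("pixels") and 4 columns ("images").
toy <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)
toy_scaled <- scale(toy, center = TRUE, scale = TRUE)
n <- nrow(toy_scaled)

corr_builtin <- cor(toy_scaled)
# For standardized columns, t(X) %*% X / (n - 1) reproduces the correlation matrix.
corr_manual <- crossprod(toy_scaled) / (n - 1)

same_as_manual <- isTRUE(all.equal(corr_builtin, corr_manual,
                                   check.attributes = FALSE))
# Correlation is unaffected by centering and scaling the columns.
same_as_raw <- isTRUE(all.equal(cor(toy), corr_builtin,
                                check.attributes = FALSE))
print(c(same_as_manual, same_as_raw))
```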
We then calculate the eigenvectors and eigenvalues of this correlation matrix and use the eigenvalues to compute the cumulative share of the variance explained by each successive component, with the goal of identifying the number of components required to account for ~80% of the overall variance in the images. Based on our cumulative variance vector, variance_pct, the cumulative variance first exceeds 80% at component 3. Therefore, we only need 3 eigenimages to account for ~80% of the variance in our shoe images.
myeigen=eigen(Sigma_)
variance_pct = cumsum(myeigen$values) / sum(myeigen$values)
min_threshold = min(which(variance_pct >=.80))
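To make the threshold calculation concrete, here is a small worked example with made-up eigenvalues. The vector toy_values is hypothetical and is not the result from the shoe images.

```r
# Hypothetical eigenvalues, sorted in decreasing order as eigen() returns them.
toy_values <- c(4, 3, 2, 1)

# Cumulative share of total variance after each component.
toy_pct <- cumsum(toy_values) / sum(toy_values)
print(toy_pct)        # 0.4 0.7 0.9 1.0

# First component index at which the cumulative share reaches 80%.
toy_threshold <- min(which(toy_pct >= .80))
print(toy_threshold)  # 3
```

Here the first two components explain only 70% of the variance, so the third is the first to push the cumulative share past 80%.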
I did not fully comprehend this next step, but I observed it in several other solutions from individuals in the class. We compute a scaling matrix from the retained eigenvalues, then create eigen_image by projecting the scaled data onto the 3 eigenvectors we identified as responsible for ~80% of the variability and applying that scaling. Finally, we reshape the last of these eigenimages to the original image dimensions and output the image.
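# Scaling factors: 1 / sqrt(eigenvalue) for each retained component, divided by sqrt(n - 1)
scaling=diag(myeigen$values[1:min_threshold]^(-1/2)) / (sqrt(nrow(scaled)-1))
# Project the scaled data onto the retained eigenvectors and apply the scaling
eigen_image=scaled%*%myeigen$vectors[,1:min_threshold]%*%scaling
# Reshape eigenimage number min_threshold back to the original image dimensions
image = array(eigen_image[,min_threshold], dim(sample_image))
imageShow(image)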
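One possible reading of the scaling step, offered as a sketch rather than a definitive account: because the columns of scaled are standardized, cor(scaled) equals t(scaled) %*% scaled / (n - 1), so projecting onto the k-th eigenvector gives a vector whose squared length is (n - 1) times the k-th eigenvalue. Dividing by the square root of that eigenvalue and by sqrt(n - 1) would then leave each eigenimage with unit length. The example below checks this on random stand-in data; toy, k, and the other toy_ names are hypothetical.

```r
set.seed(3)
# Hypothetical data: 500 rows ("pixels") and 5 columns ("images").
toy <- matrix(rnorm(500 * 5), nrow = 500, ncol = 5)
toy_scaled <- scale(toy, center = TRUE, scale = TRUE)

toy_eigen <- eigen(cor(toy_scaled))
k <- 3  # number of components kept, chosen arbitrarily for this sketch

# Same construction as above: diag(lambda^(-1/2)) / sqrt(n - 1)
toy_scaling <- diag(toy_eigen$values[1:k]^(-1/2)) / sqrt(nrow(toy_scaled) - 1)
toy_eigen_image <- toy_scaled %*% toy_eigen$vectors[, 1:k] %*% toy_scaling

# If the columns are orthonormal, their cross-product is the k x k identity.
gram <- crossprod(toy_eigen_image)
print(round(gram, 8))
```

If the printed matrix is the identity, the scaled projections are orthogonal, unit-length vectors, which is consistent with reading the scaling as a normalization of the eigenimages.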
This assignment incorporated several topics in linear algebra as well as principal component analysis. As mentioned in the first resource listed below, eigenfaces are used in facial recognition as well as in similar areas such as handwriting recognition, lip reading, and voice recognition. While I did not fully understand everything that went into solving this problem, it was a very useful introduction and hands-on exercise to help me better understand eigenfaces.