In this document, we explore eigenimagery: the use of principal component analysis (PCA) to find and visualize the dominant patterns of variation in a set of images. Specifically, we use R to read in a set of images, perform PCA on them, and generate eigenimages (called eigenfaces when the images are faces) that capture most of the variance in the original images.
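The first step is to load the required packages and read the images from a directory.
We then flatten each image array into a column vector, bind the columns into a matrix with one column per image and one row per pixel value, and convert the matrix to a data frame.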
# load required packages:
library(jpeg)
library(ggplot2)
library(OpenImageR)
# set the path to the directory containing the jpg files:
path <- file.path("imgs")
# get the list of jpg files (matching .jpg or .jpeg, in any case):
img_list <- list.files(path, pattern = "\\.jpe?g$", ignore.case = TRUE, full.names = TRUE)
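# load the images; readJPEG() returns values in [0, 1] as a height x width x
# channels array (or as a height x width matrix for grayscale files):
images <- lapply(img_list, readJPEG)
# get dimensions of one image:
img_dims <- dim(images[[1]])
# all images must share the same dimensions to be stacked:
stopifnot(all(vapply(images, function(img) identical(dim(img), img_dims), logical(1))))
# flatten each image into a column vector (one row per pixel value, one column
# per image), bind the columns into a matrix, and convert it to a data frame:
image_df <- do.call(cbind, lapply(images, as.vector)) |> as.data.frame()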
# write the df to a file because of memory issues:
saveRDS(image_df, file = "image_df.RDS")
# also write img_dims:
saveRDS(img_dims, "img_dims")
# clear workspace:
rm(list = ls(all.names = TRUE))
# clear unused memory:
gc()
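The reshaping step relies on R storing arrays in column-major order, so flattening an image with as.vector() and restoring it with array(..., img_dims) is lossless. A minimal sketch on a synthetic 2 × 3 RGB array (made-up values, no image files needed) also shows why each image is flattened explicitly: cbind() keeps 2-D matrices, such as grayscale JPEGs, as matrices instead of turning them into columns.

```r
# synthetic 2 x 3 RGB "image" (values made up for illustration)
img <- array(seq(0, 1, length.out = 18), dim = c(2, 3, 3))
# flatten in column-major order, as done for each image above
v <- as.vector(img)
length(v)  # 18 = 2 * 3 * 3 values
# restore the image from its column using the stored dimensions
img_back <- array(v, dim(img))
identical(img_back, img)  # TRUE
# a grayscale image loads as a matrix; cbind() binds matrices side by side
gray <- matrix(0, nrow = 2, ncol = 3)
dim(cbind(gray, gray))                                    # 2 6
# flattening first gives one column per image, as intended
dim(do.call(cbind, lapply(list(gray, gray), as.vector)))  # 6 2
```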
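We then standardize the data with the scale() function. Because each column of image_df holds one image, scale() centers and scales each image to mean 0 and standard deviation 1, so that differences in overall brightness and contrast between images do not dominate the analysis. Note that this differs from the classical eigenface method, which instead subtracts the mean image (that is, centers each pixel across images).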
# read in `image_df`:
image_df <- readRDS("image_df.RDS")
# standardize each column (i.e. each image) to mean 0 and standard deviation 1:
scaled_images <- scale(image_df, center = TRUE, scale = TRUE)
# write the matrix to a file because of memory issues:
saveRDS(scaled_images, file = "scaled_images.RDS")
# clear workspace:
rm(list = ls(all.names = TRUE))
# clear unused memory:
gc()
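The following sketch, on a made-up 4 × 2 matrix rather than the real images, shows what scale() does to each column and how the attributes it stores can undo the transformation later (for example, to map reconstructed images back to pixel values).

```r
# made-up data: 4 pixel values (rows) x 2 images (columns)
x <- cbind(img1 = c(0.1, 0.2, 0.3, 0.4), img2 = c(0.5, 0.9, 0.7, 0.3))
s <- scale(x, center = TRUE, scale = TRUE)
colMeans(s)      # both essentially 0: each image is centered on its own mean
apply(s, 2, sd)  # both 1: each image now has unit standard deviation
# scale() records the column means and SDs it used, so the step can be undone
x_back <- sweep(s, 2, attr(s, "scaled:scale"), "*")
x_back <- sweep(x_back, 2, attr(s, "scaled:center"), "+")
all.equal(x_back, x, check.attributes = FALSE)  # TRUE
```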
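We then compute the covariance matrix with the cov() function. Because cov() treats columns as variables and each column is an image, the result is a small n × n matrix (n = 17 images here) rather than a matrix with one row and column per pixel value, which keeps the eigendecomposition tractable. We then compute its eigenvalues and eigenvectors with the eigen() function. Dividing the cumulative sum of the eigenvalues by their total gives the proportion of variance explained by the first k components, from which we determine how many eigenfaces are needed to explain a given share of the variance. Here we look for the smallest number that explains more than 80%.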
# read in `scaled_images.RDS`:
scaled_images <- readRDS("scaled_images.RDS")
# compute the covariance matrix (n_images x n_images, since the columns are images):
sigma <- cov(scaled_images)
# compute the eigenvalues and eigenvectors:
eig <- eigen(sigma)
eigenvalues <- eig$values
eigenvectors <- eig$vectors
# compute the cumulative variance:
cum_var <- cumsum(eigenvalues) / sum(eigenvalues)
cum_var
## [1] 0.6833138 0.7824740 0.8353528 0.8629269 0.8825039 0.8996099 0.9144723
## [8] 0.9271856 0.9374462 0.9472859 0.9561859 0.9647964 0.9732571 0.9804242
## [15] 0.9874038 0.9941511 1.0000000
# find the number of eigenfaces needed to explain 80% of the variance:
threshold <- min(which(cum_var > .80))
threshold
## [1] 3
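# plot the cumulative variance, with the 80% target as a dashed line:
cum_var_df <- data.frame(n_components = seq_along(cum_var), cum_var = cum_var)
ggplot(cum_var_df, aes(x = n_components, y = cum_var)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0.80, linetype = "dashed") +
  xlab("Number of Eigenfaces") +
  ylab("Cumulative Variance") +
  ggtitle("Cumulative Variance")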
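As a sketch of why the small n × n covariance matrix is enough, the following uses a synthetic matrix X (random values, with hypothetical sizes p and n) in place of scaled_images. The eigenvalues of cov(X) equal the squared singular values of X divided by p − 1, which are also the nonzero eigenvalues of the p × p matrix X Xᵀ/(p − 1), so working with the small matrix loses no variance information.

```r
set.seed(1)
p <- 500  # hypothetical number of pixel values per image
n <- 6    # hypothetical number of images
# synthetic stand-in for scaled_images: one standardized column per "image"
X <- scale(matrix(runif(p * n), nrow = p, ncol = n))
# small n x n covariance matrix, as computed in the tutorial
ev <- eigen(cov(X))$values
# squared singular values of X, rescaled by p - 1
d <- svd(X)$d
all.equal(ev, d^2 / (p - 1))  # TRUE: same variance information
# cumulative proportion of variance and the 80% cut-off, as above
cum_var <- cumsum(ev) / sum(ev)
threshold <- which(cum_var > 0.80)[1]
```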
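We then compute the eigenfaces, the principal components of the images expressed in pixel space. Each eigenface is obtained by projecting the scaled images onto one of the leading eigenvectors and normalizing the result to unit length. Together, the first few eigenfaces span the subspace that best approximates the scaled images in the least-squares sense, so weighted sums of them give approximate reconstructions of the original images. We display the third eigenface using the imageShow() function from the OpenImageR package.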
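The normalization in the eigenface code can be derived as follows. Let X be the p × n scaled matrix (p pixel values, n images). Because scale() already centered each column, cov() returns Σ below. Each unit eigenvector v_i of this small matrix maps to a unit-length eigenvector u_i of the large p × p matrix X Xᵀ, and these u_i are the eigenfaces. V_k holds the first k eigenvectors, corresponding to eigenvectors[, seq_len(threshold)] and the scaling matrix in the code.

```latex
\Sigma = \frac{X^\top X}{p-1}, \qquad \Sigma v_i = \lambda_i v_i, \qquad \lVert v_i \rVert = 1

\lVert X v_i \rVert^2 = v_i^\top X^\top X v_i = (p-1)\,\lambda_i

u_i = \frac{X v_i}{\sqrt{(p-1)\,\lambda_i}}, \qquad
X X^\top u_i = (p-1)\,\lambda_i\, u_i, \qquad
u_i^\top u_j = \delta_{ij}

U_k = X\, V_k \,\operatorname{diag}\!\left(\lambda_1^{-1/2}, \dots, \lambda_k^{-1/2}\right) \Big/ \sqrt{p-1}
```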
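# compute the eigenfaces: project the scaled images onto the leading eigenvectors,
# then divide column i by sqrt((n_pixels - 1) * lambda_i) so it has unit length
# (nrow = threshold stops diag() from misreading a single eigenvalue as a matrix size):
scaling <- diag(eigenvalues[seq_len(threshold)]^(-1/2), nrow = threshold) / sqrt(nrow(scaled_images) - 1)
eigenfaces <- scaled_images %*% eigenvectors[, seq_len(threshold)] %*% scaling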
# read in `img_dims`:
img_dims <- readRDS("img_dims")
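# reshape the third eigenface back to the image dimensions:
eigenimage <- array(eigenfaces[, 3], img_dims)
# eigenface values are small and can be negative, so rescale them to [0, 1] for display:
eigenimage <- (eigenimage - min(eigenimage)) / (max(eigenimage) - min(eigenimage))
# display the third eigenface:
imageShow(eigenimage, clear_viewer = TRUE)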
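A small numerical check of the eigenface formula and of reconstruction, again on synthetic data (random values; p, n and k are hypothetical). It confirms that the columns of U are orthonormal, that they agree with the left singular vectors of X up to sign, and that projecting onto the first k eigenfaces and back gives an approximation whose squared error equals the sum of the discarded squared singular values.

```r
set.seed(42)
p <- 400  # hypothetical number of pixel values per image
n <- 5    # hypothetical number of images
k <- 3    # hypothetical number of eigenfaces to keep
# synthetic stand-in for scaled_images
X <- scale(matrix(runif(p * n), nrow = p, ncol = n))
eig <- eigen(cov(X))
# eigenfaces via the same formula as above
scaling <- diag(eig$values[seq_len(k)]^(-1/2), nrow = k) / sqrt(nrow(X) - 1)
U <- X %*% eig$vectors[, seq_len(k)] %*% scaling
# the eigenfaces are orthonormal ...
all.equal(crossprod(U), diag(k))                # TRUE
# ... and equal the left singular vectors of X up to sign
all.equal(abs(U), abs(svd(X)$u[, seq_len(k)]))  # TRUE
# rank-k reconstruction: weights of each image on the eigenfaces, then back
weights <- crossprod(U, X)  # k x n matrix of eigenface weights
X_hat <- U %*% weights      # p x n approximation of the scaled images
# squared error equals the sum of the discarded squared singular values
all.equal(sum((X - X_hat)^2), sum(svd(X)$d[-seq_len(k)]^2))  # TRUE
# undo scale() to return to the original pixel scale
X_orig_hat <- sweep(X_hat, 2, attr(X, "scaled:scale"), "*")
X_orig_hat <- sweep(X_orig_hat, 2, attr(X, "scaled:center"), "+")
```

With the tutorial's objects, substituting scaled_images for X and eigenfaces for U, then reshaping a column of the unscaled result with array(..., img_dims), should give an approximate version of the corresponding original image.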
In this document, we have explored eigenimagery and shown how to use R to perform PCA on a set of images and generate eigenimages. Eigenimagery gives a compact summary of an image collection: a handful of eigenimages can account for most of the variance (here, three explain over 80%), and the same basis underlies applications such as eigenface-based face recognition and image compression.