library(jpeg)
library(OpenImageR)
To replicate, place the jpg folder of images in your working directory. First, a list of all image files is created. Then one image is loaded (via the jpeg library) to obtain the image dimensions, which are 1200 x 2500 x 3. Finally, a 17 by 9,000,000 matrix is created, where 17 is the number of images and 9,000,000 is the product of the image dimensions (1200 x 2500 x 3).
# Path of images
path = file.path(getwd(), 'jpg/')
# List all images
images <- list.files(path = path, pattern = "\\.jpg$")
# Load first image (to inform the dimensions)
image <- readJPEG(paste0(path, images[1]))
# Create 17 (number of images) by 9,000,000 (product of dimensions of images) matrix
data <- matrix(0, length(images), prod(dim(image)))
Here all images are looped through and converted to vectors of R, G, B components. The transpose is then taken, so the result is a matrix with 9,000,000 rows and 17 columns.
# Loop through images and convert to R, G, B vectors
for (i in seq_along(images)) {
  im <- readJPEG(paste0(path, images[i]))
  r <- as.vector(im[,,1])
  g <- as.vector(im[,,2])
  b <- as.vector(im[,,3])
  # Concat and take transpose
  data[i,] <- t(c(r, g, b))
}
df_sneakers <- as.data.frame(t(data))
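The flattening step can be illustrated on a small synthetic array. This is only a sketch: `toy`, `flat`, and `restored` are hypothetical names, and the 2 x 3 x 3 array merely stands in for a 1200 x 2500 x 3 image. It shows that concatenating the three channels gives the same vector as flattening the whole array, and that `array()` with the original dimensions reverses the operation.

```r
# Toy stand-in for an image: 2 rows x 3 columns x 3 channels (R, G, B)
toy <- array(seq_len(18) / 18, dim = c(2, 3, 3))

# Flatten channel by channel, mirroring the loop above
flat <- c(as.vector(toy[, , 1]), as.vector(toy[, , 2]), as.vector(toy[, , 3]))

# R stores arrays in column-major order, so this matches as.vector(toy)
stopifnot(isTRUE(all.equal(flat, as.vector(toy))))

# array() with the original dimensions reverses the flattening
restored <- array(flat, dim = dim(toy))
stopifnot(isTRUE(all.equal(restored, toy)))
```

The same round trip is what later allows a length-9,000,000 column to be reshaped back into an image with `array(..., dim(image))`.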
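Center the data by subtracting the mean of each column. Because scale = FALSE, the data are not divided by the standard deviation. The covariance matrix of the centered data is then computed; with one column per image, it is 17 x 17.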
scaled_sneakers <- scale(df_sneakers, center = TRUE, scale = FALSE)
sigma <- cov(scaled_sneakers)
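As a small check of what the centering and covariance steps do, the sketch below uses random toy data (the names `X`, `Xc`, and `manual_cov` are hypothetical and not part of the analysis). It assumes only base R and confirms that centering makes the column means zero and that, for centered data, the sample covariance equals the cross-product divided by n - 1.

```r
set.seed(1)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)  # 5 observations, 4 variables

# center = TRUE subtracts each column mean; scale = FALSE leaves the spread untouched
Xc <- scale(X, center = TRUE, scale = FALSE)
stopifnot(all(abs(colMeans(Xc)) < 1e-12))

# For centered data, the sample covariance is t(Xc) %*% Xc / (n - 1)
manual_cov <- crossprod(Xc) / (nrow(Xc) - 1)
stopifnot(isTRUE(all.equal(manual_cov, cov(X), check.attributes = FALSE)))
```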
After calculating the eigenvalues and eigenvectors, the cumulative variance for the number of components included is reviewed. The results show that including the top 2 components will give us an image that accounts for ~81% of the variability.
# Calculate eigenvalues and eigenvectors
eig <- eigen(sigma)
# Cumulative variance for the number of components
cum_var <- cumsum(eig$values) / sum(eig$values)
# Number of components to include in eigenimage for 81% variability
num_comp <- 2
cum_var
## [1] 0.7149925 0.8104477 0.8478518 0.8723797 0.8916808 0.9072236 0.9205413
## [8] 0.9331625 0.9448626 0.9543871 0.9636185 0.9714413 0.9784834 0.9851190
## [15] 0.9913699 0.9968936 1.0000000
Again, including only the two most informative components yields an image accounting for over 80% of the variability, and 10 of the 17 components are enough to exceed 95%, which illustrates how much this dimensionality reduction technique can compress the data.
plot(1:17, cum_var, xlab="Number of Components", ylab="Proportion of Variance")
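Rather than reading the thresholds off the printed vector, the number of components can be looked up programmatically. The following is a minimal sketch: `components_needed` is a hypothetical helper, and `cum_var_printed` simply copies the rounded values from the output above so the block runs on its own.

```r
# Cumulative proportions copied from the printed output above
cum_var_printed <- c(0.7149925, 0.8104477, 0.8478518, 0.8723797, 0.8916808,
                     0.9072236, 0.9205413, 0.9331625, 0.9448626, 0.9543871,
                     0.9636185, 0.9714413, 0.9784834, 0.9851190, 0.9913699,
                     0.9968936, 1.0000000)

# Hypothetical helper: smallest number of components reaching a target proportion
components_needed <- function(cum_var, target) which(cum_var >= target)[1]

components_needed(cum_var_printed, 0.80)  # 2
components_needed(cum_var_printed, 0.95)  # 10
```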
Per Diego Herrero, a scaling factor is created by forming a diagonal matrix from the eigenvalues of the top 2 components raised to the -0.5 power, and dividing it by the square root of m - 1, where m is the number of rows. The centered data (scaled_sneakers) is then multiplied by the top 2 eigenvectors and by the scaling factor. Lastly, the result is reshaped to the dimensions of the original images and one eigenimage is selected for display.
# Compute scaling factor
scaling_sneakers <- diag(eig$values[1:num_comp]^(-1/2)) / (sqrt(nrow(scaled_sneakers)-1))
# Multiply centered matrix by top 2 components, then by scaling factor
eigensneakers <- scaled_sneakers %*% eig$vectors[,1:num_comp] %*% scaling_sneakers
# Convert to dimensions of original images, and select one image to display
eigenimage <- array(eigensneakers[,2], dim(image))
The top 2 components together account for ~81% of the variability; the image displayed is the second of the two.
imageShow(eigenimage)
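One way to see what the scaling factor accomplishes is to repeat the construction on toy data and inspect the result. The sketch below is an illustration under stated assumptions, not part of the original analysis: the random matrix `X` stands in for the centered pixel-by-image data, and `eig_toy`, `scaling`, and `U` are hypothetical names. With this scaling, the projected columns come out with unit length and mutually orthogonal.

```r
set.seed(42)
# Toy centered data: 50 "pixels" (rows) by 4 "images" (columns)
X <- scale(matrix(rnorm(200), nrow = 50, ncol = 4), center = TRUE, scale = FALSE)
eig_toy <- eigen(cov(X))
k <- 2

# Same construction as above: eigenvalues^(-1/2) on the diagonal, divided by sqrt(m - 1)
scaling <- diag(eig_toy$values[1:k]^(-1/2)) / sqrt(nrow(X) - 1)
U <- X %*% eig_toy$vectors[, 1:k] %*% scaling

# crossprod(U) is t(U) %*% U; it equals the identity when the columns are orthonormal
stopifnot(isTRUE(all.equal(crossprod(U), diag(k), check.attributes = FALSE)))
```

This follows because the cross-product of the centered data equals (m - 1) times the covariance matrix, so the eigenvalues and the m - 1 factor cancel against the scaling.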