With the attached data file, build and visualize eigen imagery that accounts for 80% of the variability. Provide full R code and discussion.
library(jpeg)        # provides readJPEG
library(OpenImageR)  # provides resizeImage and imageShow
test <- readJPEG("/Users/williamaiken/Downloads/jpg/RC_2500x1200_2014_us_53446.jpg")
imageShow(test)
Not a golfer
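path <- "/Users/williamaiken/Downloads/jpg/"
# pattern is a regular expression, not a glob, so match the extension explicitly
filenames <- list.files(path = path, pattern = "\\.jpg$")
# Each image is resized to 60 x 125 x 3 below, so one row holds 60 * 125 * 3 values
data <- matrix(0, length(filenames), 60 * 125 * 3)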
for (i in seq_along(filenames)) {
  # Shrink each 1200 x 2500 image by a factor of 20 to 60 rows x 125 columns
  im <- resizeImage(
    readJPEG(paste0(path, filenames[i])),
    1200/20,
    2500/20
  )
  # Flatten the three colour channels into one long vector per image
  r <- as.vector(im[,,1])
  g <- as.vector(im[,,2])
  b <- as.vector(im[,,3])
  data[i,] <- c(r, g, b)
}
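The loop stores each image as one long vector: the red plane, then green, then blue, each read column by column. As a minimal sketch of why a later `array(..., c(60,125,3))` call can recover the picture, the toy example below uses a made-up 2 x 3 x 3 array (not one of the shoe images) and checks the round trip.

```r
# Toy stand-in for an image: 2 rows, 3 columns, 3 colour channels (hypothetical data)
im <- array(seq_len(2 * 3 * 3) / 18, dim = c(2, 3, 3))

# Flatten the same way as the loop: red plane, then green, then blue
flat <- c(as.vector(im[, , 1]), as.vector(im[, , 2]), as.vector(im[, , 3]))

# array() fills in column-major order, so the original dimensions restore the image
restored <- array(flat, dim = dim(im))
identical(restored, im)
```

The last line should evaluate to `TRUE`, which is why the eigenimages can be reshaped with the same dimensions used when flattening.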
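# Transpose so that each column is one image and each row is one pixel value
shoes <- data.frame(t(data))
# Center and standardize every image (column)
scaled <- scale(shoes, center = TRUE, scale = TRUE)
# Correlation matrix between images (one row and column per image)
Sigma_ <- cor(scaled)
eig <- eigen(Sigma_)
eigenvalues <- eig$values
eigenvectors <- eig$vectors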
cumsum(eigenvalues) / sum(eigenvalues)
## [1] 0.6833796 0.7836119 0.8350410 0.8629807 0.8827157 0.8996343 0.9143339
## [8] 0.9269998 0.9375131 0.9474672 0.9565156 0.9650405 0.9734219 0.9805532
## [15] 0.9875810 0.9943436 1.0000000
prop.var <- eigenvalues / sum(eigenvalues)
cum.var <- cumsum(eigenvalues) / sum(eigenvalues)
thres <- min(which(cum.var > .80))
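To make the threshold step concrete, the sketch below re-enters the cumulative proportions printed above (rounded to four decimals) and finds the first component at which the running total exceeds a target. The helper name `components_for` is hypothetical and introduced only for illustration.

```r
# Cumulative proportions copied from the printed output (rounded to 4 decimals)
cum.var.printed <- c(0.6834, 0.7836, 0.8350, 0.8630, 0.8827, 0.8996, 0.9143,
                     0.9270, 0.9375, 0.9475, 0.9565, 0.9650, 0.9734, 0.9806,
                     0.9876, 0.9943, 1.0000)

# Hypothetical helper: smallest number of components whose cumulative share exceeds `target`
components_for <- function(cum_share, target) which(cum_share > target)[1]

components_for(cum.var.printed, 0.80)  # 3: two components reach only 78.4%
components_for(cum.var.printed, 0.90)  # 7
```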
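# Rescale by 1/sqrt(eigenvalue) and 1/sqrt(n - 1) so each eigenimage has unit length
scaling <- diag(eig$values[1:thres]^(-1/2)) / (sqrt(nrow(scaled) - 1))
# Project the standardized images onto the leading eigenvectors to get the eigenimages
eigshoes <- scaled %*% eig$vectors[, 1:thres] %*% scaling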
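The scaling step is what turns the projected data into unit-length eigenimages: for standardized columns, `cor(scaled)` equals `t(scaled) %*% scaled / (n - 1)`, so dividing by `sqrt(eigenvalue)` and `sqrt(n - 1)` should leave the columns orthonormal. A quick sketch on random stand-in data (not the shoe images; the names `X`, `U`, and `k` are assumptions for illustration) checks this.

```r
set.seed(1)
n <- 200; p <- 5                          # 200 "pixels", 5 "images" (synthetic)
X <- scale(matrix(rnorm(n * p), n, p))    # standardized columns, as in `scaled`

e <- eigen(cor(X))
k <- 3                                    # keep the first three components
scaling <- diag(e$values[1:k]^(-1/2)) / sqrt(n - 1)
U <- X %*% e$vectors[, 1:k] %*% scaling

# Columns of U should be orthonormal: t(U) %*% U is numerically the identity
max(abs(crossprod(U) - diag(k)))
```

The final value should be close to machine precision, which suggests the same construction yields orthonormal eigenimages for the real data.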
par(mfrow=c(2,3))
imageShow(array(eigshoes[,1], c(60,125,3)))
imageShow(array(eigshoes[,2], c(60,125,3)))
imageShow(array(eigshoes[,3], c(60,125,3)))
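Eigenimage values are centered near zero and can be negative, whereas many image display routines assume intensities in [0, 1]. Assuming that matters for the display function in use, one option is a min-max rescale before plotting. The helper `rescale01` below is hypothetical, and the vector `v` is synthetic stand-in data rather than a real eigenimage.

```r
# Hypothetical helper: map a numeric vector linearly onto [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

set.seed(2)
v <- rnorm(60 * 125 * 3, sd = 0.01)       # stand-in for one eigenimage column
img <- array(rescale01(v), c(60, 125, 3)) # same shape as the resized images
range(img)
```

With the real data, `array(rescale01(eigshoes[,1]), c(60,125,3))` could be passed to `imageShow` in place of the raw column if the unscaled display looks washed out.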
It’s amazing how few eigenvectors are required: just 3 of the 17 components account for more than 80% of the variation in the initial data set (about 83.5%), and the first component alone accounts for about 68%.
This assignment was incredibly challenging and could not have been accomplished without reviewing the work of two other data scientists.