Using PCA to Detect Outliers in Images

In this article, Principal Component Analysis (PCA) will be used to find outliers in images. PCA can be interpreted in the following ways:

  1. The principal components found by PCA capture the directions of highest variance in the data (they maximize the variance of the projection along each component).
  2. The principal components minimize the reconstruction error (i.e., the squared distance between the original data and its estimate, obtained by projecting the data onto the first few principal components).
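
The two interpretations above describe the same quantity from two sides: for centered data, the variance kept by the first k components plus the mean squared reconstruction error equals the total variance. A minimal sketch of this, assuming Python with NumPy (the article's own code is not shown) and a synthetic 3-column data set; the function name `pca_variance_split` is hypothetical:

```python
import numpy as np

def pca_variance_split(X, k):
    """Return (variance kept by the first k PCs, mean squared reconstruction error)."""
    Xc = X - X.mean(axis=0)                      # center the data
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                                 # (d, k) basis of the top-k subspace
    Z = Xc @ W                                   # projections onto the top-k PCs
    X_hat = Z @ W.T                              # reconstruction from k components
    kept = np.sum(Z ** 2) / len(X)               # variance captured by the projection
    lost = np.sum((Xc - X_hat) ** 2) / len(X)    # mean squared reconstruction error
    return kept, lost

# Synthetic data (an assumption for illustration, not the article's images).
rng = np.random.default_rng(0)
mix = np.array([[3.0, 1.0, 0.5], [0.0, 1.0, 0.2], [0.0, 0.0, 0.1]])
X = rng.normal(size=(500, 3)) @ mix
total = np.sum((X - X.mean(axis=0)) ** 2) / len(X)

for k in (1, 2, 3):
    kept, lost = pca_variance_split(X, k)
    # The two fractions sum to 1 for every k.
    print(k, round(kept / total, 4), round(lost / total, 4))
```

Maximizing the first number and minimizing the second are therefore the same optimization, which is why either interpretation can be used to motivate the outlier score.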

Since the first few principal components usually explain almost all of the variance in the data, these interpretations suggest that the data points not explained well by the first few principal components (i.e., those with a large reconstruction error) are probably the noisy ones, and hence the outlier candidates.
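
One way to turn this intuition into a score is to measure, for each pixel, the squared distance between its color vector and its reconstruction from the first k components. The sketch below is an assumption about how such a score could be computed, not the article's actual code: it uses Python with NumPy, a hypothetical function `pca_outlier_scores`, and a synthetic stand-in for an image (gray pixels with one planted red pixel).

```python
import numpy as np

def pca_outlier_scores(pixels, k=1):
    """Score each row by its squared reconstruction error from the first k PCs.

    Returns (scores, cumulative % of variance explained by 1..d components).
    """
    X = np.asarray(pixels, dtype=float)
    Xc = X - X.mean(axis=0)                          # center each channel
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = 100.0 * np.cumsum(s ** 2) / np.sum(s ** 2)
    W = Vt[:k].T                                     # top-k principal directions
    residual = Xc - (Xc @ W) @ W.T                   # part not explained by k PCs
    scores = np.sum(residual ** 2, axis=1)           # per-pixel squared error
    return scores, explained

# Hypothetical stand-in for an image: 400 pixels x 3 channels, near-gray colors.
rng = np.random.default_rng(1)
n_pixels = 20 * 20
gray = rng.uniform(0.2, 0.8, size=(n_pixels, 1))
pixels = gray @ np.ones((1, 3)) + rng.normal(scale=0.01, size=(n_pixels, 3))
pixels[7] = [1.0, 0.0, 0.0]                          # planted outlier: pure red

scores, explained = pca_outlier_scores(pixels, k=1)
print(np.round(explained, 2))                        # cumulative % variance for k = 1, 2, 3
print(int(np.argmax(scores)))                        # index of the worst-explained pixel
```

For a real image, `pixels` would be the image array reshaped to one row per pixel and one column per channel, and the scores could be reshaped back to the image size to view them as a map.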

## [1] "------------------------------------------"
## [1] "% Variance explained by upto k (1,2,3) PCs"
## [1] "------------------------------------------"
## [1]  94.79  98.85 100.00

## [1] "----------------------------------------------"
## [1] "85%, 90% and 95% quantile values of the scores"
## [1] "----------------------------------------------"
##       85%       90%       95% 
## 0.1311154 0.1864878 0.2558950

## [1] "------------------------------------------"
## [1] "% Variance explained by upto k (1,2,3) PCs"
## [1] "------------------------------------------"
## [1]  99.47  99.99 100.00

## [1] "----------------------------------------------"
## [1] "85%, 90% and 95% quantile values of the scores"
## [1] "----------------------------------------------"
##         85%         90%         95% 
## 0.006053905 0.010789753 0.017195377

## [1] "------------------------------------------"
## [1] "% Variance explained by upto k (1,2,3) PCs"
## [1] "------------------------------------------"
## [1]  99.48  99.90 100.00

## [1] "----------------------------------------------"
## [1] "85%, 90% and 95% quantile values of the scores"
## [1] "----------------------------------------------"
##        85%        90%        95% 
## 0.03542360 0.04108505 0.05129503
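
The quantile values reported above can serve as cut-offs: points whose score exceeds, say, the 95% quantile are flagged as outliers. How the article computed its scores is not shown, so the following is only a sketch under the assumption that a score is a per-point reconstruction error; the function `flag_outliers` and the toy score values are hypothetical and written in Python with NumPy.

```python
import numpy as np

def flag_outliers(scores, q=0.95):
    """Return (boolean mask of scores above the q-quantile, the threshold itself)."""
    scores = np.asarray(scores, dtype=float)
    threshold = np.quantile(scores, q)   # default linear interpolation
    return scores > threshold, threshold

# Toy scores for illustration only (not taken from the article's images).
scores = np.array([0.01, 0.02, 0.015, 0.03, 0.9, 0.012, 0.018, 0.025, 0.011, 0.02])

for q in (0.85, 0.90, 0.95):
    mask, thr = flag_outliers(scores, q)
    print(f"{int(q * 100)}% quantile = {thr:.4f}, flagged = {int(mask.sum())}")
```

A higher quantile gives a stricter threshold and fewer flagged points; which level is appropriate depends on how many noisy pixels are expected in the image.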