In this article, the handwritten digits dataset (mnist_train) is used to demonstrate how Principal Component Analysis (PCA) can represent the digits in a low-dimensional feature space as a linear combination of the principal components, which form an orthonormal set of basis vectors.
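As a minimal sketch of how the cumulative percentage of variance explained (printed below) might be computed: assume the digit-\(8\) images from mnist_train have been flattened into a numeric matrix `images` with one \(784\)-pixel row per image (the variable names are assumptions, and centering/scaling are skipped here so that the exact expansion \(I = \sum_j w_j e_j\) used later holds; the exact call in the original analysis may differ).

```r
# Assumption: `images` is an n x 784 numeric matrix, one flattened 28x28
# digit-8 image per row (pixel intensities 0..255), built from mnist_train.
pca <- prcomp(images, center = FALSE, scale. = FALSE)  # PCA on the raw pixels

# Percentage of variance explained by each principal component
var.explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Cumulative percentage of variance explained by the first 75 components
round(cumsum(var.explained)[1:75], 2)
```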
## [1] 57.87 63.76 66.76 69.38 71.32 72.95 74.32 75.55 76.69 77.79 78.87
## [12] 79.87 80.72 81.50 82.25 82.95 83.60 84.18 84.74 85.26 85.77 86.26
## [23] 86.73 87.16 87.55 87.92 88.29 88.64 88.97 89.31 89.62 89.92 90.20
## [34] 90.47 90.72 90.97 91.20 91.43 91.64 91.84 92.03 92.22 92.41 92.59
## [45] 92.76 92.93 93.08 93.24 93.39 93.54 93.69 93.83 93.97 94.10 94.22
## [56] 94.35 94.47 94.58 94.69 94.81 94.91 95.02 95.12 95.22 95.31 95.41
## [67] 95.50 95.59 95.67 95.76 95.84 95.92 96.00 96.08 96.15
The principal components (the eigenvectors) are then visualized. Since they (particularly the dominant ones) look like the digit \(8\), we can call these orthonormal basis vectors in the rotated space eigen-digits. They typically capture different features of the handwritten digit \(8\), corresponding to different writing styles.
The first \(210\) eigen-digits are visualized below (stacked in 15 rows and 14 columns) in column-major order: the first eigenvector is at the top left, the second is directly below it in the same column, and so on. As can be seen, the first few dominant eigenvectors clearly represent the digit \(8\), hence the name.
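A rough sketch of how such a grid of eigen-digits could be drawn, assuming the `pca` object from the sketch above (the 15 by 14 layout and the column-major fill follow the description in the text):

```r
# Assumption: `pca` is the prcomp object from the earlier sketch.
# Each column of pca$rotation is a 784-dimensional eigenvector (eigen-digit).
par(mfcol = c(15, 14), mar = c(0, 0, 0, 0))  # mfcol fills the grid column by column
for (j in 1:210) {
  eigen.digit <- matrix(pca$rotation[, j], nrow = 28, ncol = 28)
  image(eigen.digit, col = gray.colors(256), axes = FALSE)
}
```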
Next, let’s focus on a single data point (one particular handwritten digit image) and see how it can be represented in the eigen-digit space as a linear combination of the eigen-digits. The next figure shows the digit we shall try to represent in the feature space.
Since the eigenvectors \(e_j\) returned by PCA form an orthonormal basis of the transformed space, any image \(I\) can be expressed exactly as \(I = \sum_{j=1}^{28\times 28} w_j e_j\), where the weights \(w_j=\langle I, e_j \rangle = I^{T} e_j\) are used in the linear combination of the eigenvectors to represent the digit image in the feature space.
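A small sketch of this weight computation, assuming `I` is the chosen digit image flattened into a length-\(784\) vector and `pca` is the prcomp object from the earlier sketch (both names are assumptions):

```r
# Assumption: `I` is the chosen digit image flattened into a length-784
# numeric vector, and `pca` is the prcomp object from the earlier sketch.
E <- pca$rotation             # 784 x p matrix with the eigenvectors as columns
w <- as.numeric(t(E) %*% I)   # w_j = <I, e_j> = I^T e_j, one weight per eigenvector
```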
Also, we don’t need to use all \(784\) basis vectors: the first few principal components already explain most of the variance (as computed above, the first \(62\) of them explain more than \(95\%\)), so we shall use only the first \(75\) of them and represent the original image approximately in the transformed feature space as \(\hat{I} = \sum_{j=1}^{k} w_j e_j\), for \(k=1,2,\ldots,75\).
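A sketch of this truncated reconstruction, reusing the `E` and `w` objects from the sketch above (the helper name is made up for illustration):

```r
# Assumption: `E` (eigenvectors in columns) and `w` (weights) come from the
# previous sketch. Reconstruct the image using only the first k eigen-digits.
reconstruct <- function(E, w, k) {
  I.hat <- E[, 1:k, drop = FALSE] %*% w[1:k]   # I.hat = sum_{j=1}^{k} w_j * e_j
  matrix(I.hat, nrow = 28, ncol = 28)          # reshape back to a 28 x 28 image
}

I.hat.75 <- reconstruct(E, w, 75)   # approximation using the first 75 eigen-digits
```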
The figure below shows how the magnitudes of the weights decrease as we move away from the dominant eigenvectors. This is expected, since the dominant eigenvectors are the ones that contribute most to the representation of the image in the feature space.
## [1] "Weights for the first 20 basis vectors for the digit"
## [1] -1905.408055 1355.563894 564.485519 -78.098594 747.401692
## [6] 116.314375 -293.701813 249.330045 158.920772 -258.179455
## [11] -659.294321 -322.247876 215.188713 -341.871715 4.276662
## [16] 279.882382 301.395575 123.411382 67.083178 -285.374166
The figures below show how the digit image looks when it is represented as a linear combination of only the first \(k\) eigenvectors (eigen-digits), for \(k=1,2,\ldots,75\), in the same column-major order.
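One way such a panel of partial reconstructions could be produced, assuming the `reconstruct` helper sketched above (the grid dimensions are arbitrary, chosen only to hold 75 panels in column-major order):

```r
# Assumption: `E`, `w`, and reconstruct() come from the earlier sketches.
# The 8 x 10 grid is arbitrary; it just needs at least 75 panels.
par(mfcol = c(8, 10), mar = c(0, 0, 0, 0))
for (k in 1:75) {
  image(reconstruct(E, w, k), col = gray.colors(256), axes = FALSE)
}
```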
As expected, the approximation starts to look like the original digit shown above once it is expressed as a linear combination of the first few eigenvectors with the computed weights, and it resembles the original more and more closely as additional basis vectors are included in the representation.
Note that the images were not normalized before doing the PCA; if they were normalized, the weights would be smaller, but the idea remains the same.
The next figure shows how the error (computed as the Frobenius norm of the difference between the original digit image and its approximation as a linear combination of the first few eigenvectors in the PCA space) varies with the number of eigenvectors used. As expected, the error decreases very quickly.
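A sketch of how this error curve might be computed, again assuming the objects from the earlier sketches (for a single image, the Frobenius norm of the difference reduces to the Euclidean norm of the flattened pixel difference):

```r
# Assumption: `I` (original 784-vector), `E`, `w`, and reconstruct() come from
# the earlier sketches.
errors <- sapply(1:75, function(k) {
  I.hat <- as.numeric(reconstruct(E, w, k))
  sqrt(sum((I - I.hat)^2))      # Frobenius norm of the difference image
})
plot(1:75, errors, type = "b",
     xlab = "number of eigen-digits (k)",
     ylab = "Frobenius norm of the reconstruction error")
```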