This work implements Hebbian-based PCA as a simple feedforward one-layer neural network. The network has m inputs, corresponding to the number of features in the original data, and l linear output neurons, corresponding to the number of components we want to extract.
I chose the iris dataset because it is small, which matters given the limitations of my personal computer. Since we want to recover all the components, the algorithm needs on the order of 10^(number of components) epochs to converge; even with only 4 dimensions, iris already consumes a significant amount of CPU time (10^4 epochs).
The preprocessing step consists of centering the data. It is also reasonable to additionally encode the data on the hypersphere; in this implementation that means rescaling each feature to the (-1, 1) interval. Either way, the algorithm converges to values very close to the principal components obtained using classical PCA.
normalize <- function(M) {
  # center the data: subtract each column's mean
  means = apply(M, 2, mean)
  Xnorm = t(apply(M, 1, function(x) { x - means }))
  Xnorm
}
encode <- function(M) {
  # put the data on the hypersphere: rescale each feature to the (-1, 1) interval
  mins   = apply(M, 2, min)
  maxs   = apply(M, 2, max)
  ranges = maxs - mins
  Xnorm  = t(apply(M, 1, function(x) { 2 * (x - mins) / ranges - 1 }))
  Xnorm
}
# the first four columns of iris hold the numeric features
X     = as.matrix(iris[, 1:4])
n_obs = nrow(X)
Xnorm = normalize(X)
# Save a copy of the normalized data set
X_original = Xnorm
Unlike classical Oja’s rule, which extracts only one component at a time, the generalized Hebbian algorithm (GHA) ‘works’ on all the components simultaneously. Essentially, for every component j with j != 1, the input x used in the update rule is deflated as follows: \[\mathbf{x'} = \mathbf{x}-\sum_{k=1}^{j-1} \mathbf{w_k}(n)y_k(n)\]
The whole algorithm can therefore be formulated as follows: \[y_j(n)=\sum_{i=1}^{m}w_{ji}(n)x_i(n)\] \[\Delta w_{ji}(n) = \eta \left( y_j(n)x_i(n)-y_j(n) \sum_{k=1}^{j} w_{ki}(n) y_k(n) \right)\]
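To make the connection between the two formulations explicit, here is a minimal numeric check (with made-up weights and a made-up sample, not part of the original code) showing that the combined update above is simply Oja’s rule applied to the deflated input \(\mathbf{x'}\):
set.seed(1)
x0   = runif(4)                                 # a made-up input sample
W0   = matrix(runif(3 * 4, -1, 1), nrow = 3)    # 3 components, 4 features
y0   = as.vector(W0 %*% x0)                     # y_j = w_j . x
j    = 3
eta0 = 0.01
# combined (Sanger) form: eta * y_j * (x - sum_{k<=j} y_k w_k)
dw_sanger = eta0 * y0[j] * (x0 - as.vector(y0[1:j] %*% W0[1:j, ]))
# deflation form: Oja's rule applied to x' = x - sum_{k<j} y_k w_k
x_defl = x0 - as.vector(y0[1:(j - 1)] %*% W0[1:(j - 1), ])
dw_oja = eta0 * y0[j] * (x_defl - y0[j] * W0[j, ])
all.equal(dw_sanger, dw_oja)                    # TRUE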
trainNetwork <- function(Xnorm, n_components, eta) {
  # initialize
  n_features = ncol(Xnorm)
  n_obs      = nrow(Xnorm)
  epochs     = 10^n_components
  z          = epochs * n_obs          # total number of single-sample updates
  W = matrix(data = runif(n_features * n_components, -1, 1),
             nrow = n_components,
             ncol = n_features)
  W = normalize(W)
  Y = matrix(data = 0, nrow = n_obs, ncol = n_components)
  # train: cycle through the observations one at a time
  for (n in 1:z) {
    ind = n %% n_obs + 1
    for (j in 1:n_components) {
      # output of the j-th neuron for the current sample
      Y[ind, j] = W[j, ] %*% Xnorm[ind, ]
      # generalized Hebbian (Sanger) update for the j-th weight vector
      W[j, ] = W[j, ] + eta * (Y[ind, j] * Xnorm[ind, ] -
                               Y[ind, j] * (Y[ind, 1:j] %*% W[1:j, , drop = FALSE]))
    }
  }
  list(data = Y, pc = W)
}
Now all we have to do is train our simple network. We will use the learning rate \(\eta = \frac{1}{\text{number of observations}} \approx 0.0066667\). Ideally we should have used a decaying schedule of the form \[\eta(n) = \frac{\text{const}_1}{n+\text{const}_2}\] which satisfies the usual conditions \[\lim_{n \to \infty} \eta(n) = 0\] \[\sum_{n=0}^{\infty} \eta(n) = \infty\] However, finding good constants can be quite time consuming, so here we simply keep the learning rate constant, which already gives rather good convergence. We will try both versions of data pre-processing: the first experiment uses merely centered data, the second uses data rescaled to the (-1, 1) interval.
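For reference, such a decaying schedule could look like the hypothetical helper below (the function name and constants are illustrative only and would need tuning); in what follows we stick with a constant \(\eta\).
# hypothetical decaying learning-rate schedule: eta(n) = c1 / (n + c2)
decayingEta <- function(n, c1 = 1, c2 = 100) {
  c1 / (n + c2)
}
# inside the training loop one would use eta_n = decayingEta(n) instead of a fixed eta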
eta = 1/n_obs
# use centered data
results = trainNetwork(Xnorm,4,eta)
# use normalized data
Xnorm2 = encode(Xnorm)
results2 = trainNetwork(Xnorm2,4,eta)
W = results$pc
W2 = results2$pc
Y = results$data
Y2 = results2$data
And we obtain the following principal components:
W
## [,1] [,2] [,3] [,4]
## [1,] 0.3868362 -0.03370618 0.8417560 0.37670935
## [2,] 0.6365878 0.71272462 -0.1696528 -0.05774605
## [3,] 0.5986174 -0.54297189 -0.1008112 -0.54043736
## [4,] -0.2621777 0.36708316 0.4533275 -0.70828539
W2
## [,1] [,2] [,3] [,4]
## [1,] -0.4349662 0.1248902 -0.61242855 -0.64859483
## [2,] -0.4298506 -0.8961971 0.09053084 0.03808553
## [3,] 0.7020452 -0.3530506 0.06045431 -0.61216302
## [4,] -0.3503979 0.2245385 0.78301676 -0.45535006
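A quick sanity check (not part of the original write-up): at convergence the rows of W should be approximately orthonormal, since they estimate orthogonal unit-length principal directions.
# row norms should be close to 1 and the Gram matrix roughly the identity
apply(W, 1, function(w) sqrt(sum(w^2)))
round(W %*% t(W), 2)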
Reconstructing the (preprocessed) data from the extracted components is as simple as:
\[\mathbf{\hat{X}}=\mathbf{Y}\mathbf{W}\]
Xhat = Y%*%W
Xhat2 = Y2%*%W2
We can now evaluate the difference between the original dataset and the reconstructed one. We’ll use the matrix 2-norm (the spectral norm, which is what norm(..., type = "2") computes in R):
error = Xnorm-Xhat
error2 = Xnorm2-Xhat2
norm(error, type="2")
## [1] 1.878206
norm(error2, type="2")
## [1] 0.1473655
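The absolute numbers are hard to interpret on their own; a quick way to put them on a common scale (an addition to the original analysis) is to divide by the norm of the corresponding data matrix.
# relative reconstruction error for each preprocessing variant
norm(error,  type = "2") / norm(Xnorm,  type = "2")
norm(error2, type = "2") / norm(Xnorm2, type = "2")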
It is interesting to note that the rescaled version shows better reconstruction than the merely centered one.
The last step is to compare our results to the built-in batch PCA (prcomp). Let’s calculate it for both types of preprocessing:
tPCA1 = t(prcomp(Xnorm)$rotation)
tPCA2 = t(prcomp(Xnorm2)$rotation)
tPCA1
## [,1] [,2] [,3] [,4]
## PC1 0.3613866 -0.08452251 0.85667061 0.35828920
## PC2 -0.6565888 -0.73016143 0.17337266 0.07548102
## PC3 0.5820299 -0.59791083 -0.07623608 -0.54583143
## PC4 0.3154872 -0.31972310 -0.47983899 0.75365743
tPCA2
## [,1] [,2] [,3] [,4]
## PC1 0.4249421 -0.1507482 0.61626702 0.645688876
## PC2 -0.4232027 -0.9039671 0.06038308 0.009839255
## PC3 0.7135724 -0.3363160 0.06590030 -0.611034506
## PC4 0.3621300 -0.2168178 -0.78244872 0.457849207
Let’s compare the difference between the obtained components, using the matrix 2-norm again.
norm(tPCA1-W,type="2")
## [1] 1.994868
norm(tPCA2-W2,type="2")
## [1] 2.001197
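These norms look large mostly because each principal component is determined only up to its sign (compare, for instance, the second rows of W and tPCA1 above). A fairer comparison (an extra step, not in the original) flips the learned rows to match prcomp’s signs first:
# align the sign of each learned component with the corresponding prcomp component
signs  = sign(diag(W %*% t(tPCA1)))   # +1 or -1 per component
W_flip = W * signs                    # flips row i when signs[i] == -1
norm(tPCA1 - W_flip, type = "2")      # should now be much smaller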
As we can see, the two preprocessing variants agree with batch PCA almost equally well, and once the sign ambiguity is accounted for, the learned components are nearly identical to the batch ones. The first approach (centering only) is nevertheless more practical: it requires fewer operations and works better in an online setting, where the mean can be maintained as a moving average over the stream.
For small datasets, classical PCA based on the Karhunen–Loève transformation is preferable simply because it is faster; GHA takes a long time to converge. For online learning, however, GHA wins because it can process a data stream. One improvement can be suggested to this implementation: instead of initializing the weights with random numbers, we can take a small initial batch of observations (say, the first k, where k is the desired number of components), perform classical PCA on it, and then run the online version, which keeps correcting the synaptic weights as more examples arrive.
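A minimal sketch of that suggestion (the function name and batch size are hypothetical, and this is not part of the code above): classical PCA on an initial batch provides the starting weights, and the same per-sample GHA update then refines them as further observations arrive.
warmStartGHA <- function(Xnorm, n_components, eta, batch_size = 20) {
  # batch phase: classical PCA on the first batch_size observations
  W = t(prcomp(Xnorm[1:batch_size, ])$rotation)[1:n_components, , drop = FALSE]
  # online phase: a single GHA pass over the remaining observations
  for (ind in (batch_size + 1):nrow(Xnorm)) {
    y = numeric(n_components)
    for (j in 1:n_components) {
      y[j]   = W[j, ] %*% Xnorm[ind, ]
      W[j, ] = W[j, ] + eta * y[j] *
               (Xnorm[ind, ] - as.vector(y[1:j] %*% W[1:j, , drop = FALSE]))
    }
  }
  W
}
# e.g. W_warm = warmStartGHA(Xnorm, 4, eta)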