Forward propagation is the process of propagating the signal from the input layer to the output layer. Consider this three-layer network:
The desired output of \(N_{2,0}\) is 1.0. The inputs are [1, 1]. The learning rate is \(\beta = 0.45\) and the momentum is \(\alpha = 0.9\).
beta <- 0.45
alpha <- 0.9
input <- N0 <- matrix(c(1,1))
w0 <- matrix(c(.4,-.1,.1,-.1), nrow=2)
print(input)
## [,1]
## [1,] 1
## [2,] 1
print(w0)
## [,1] [,2]
## [1,] 0.4 0.1
## [2,] -0.1 -0.1
Each of the hidden and output neurons is a logistic neuron, meaning that it applies the logistic function
\(\sigma(t) = \frac{1}{1 + e^{-t}}\)
sigma <- function(t) 1/(1+exp(-t))
to the input before returning the response.
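As a quick sanity check, \(\sigma\) maps 0 to 0.5 and saturates toward 0 and 1 for large negative and positive inputs:
sigma(c(-10, 0, 10))
## [1] 4.539787e-05 5.000000e-01 9.999546e-01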
Calculate the response of the hidden layer: the weighted inputs passed through \(\sigma\).
N1 <- sigma(w0 %*% input)
print(N1)
## [,1]
## [1,] 0.6224593
## [2,] 0.4501660
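Written out, with \(\sigma\) applied element-wise:
\[
N_1 = \sigma\!\left( \begin{bmatrix} 0.4 & 0.1 \\ -0.1 & -0.1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) = \sigma\!\left( \begin{bmatrix} 0.5 \\ -0.2 \end{bmatrix} \right) = \begin{bmatrix} 0.6224593 \\ 0.4501660 \end{bmatrix}
\]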
Calculate the response of the output layer in the same way.
w1 <- matrix(c(0.06, -0.4), nrow=1)
print(w1)
## [,1] [,2]
## [1,] 0.06 -0.4
N2 <- sigma(w1 %*% N1)
The output is 0.4643807.
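Written out, the weighted sum and activation are:
\[
N_{2,0} = \sigma(0.06 \times 0.6224593 - 0.4 \times 0.4501660) = \sigma(-0.1427188) \approx 0.4643807
\]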
First, calculate the error at \(N_{2,0}\) as \(N_{2,0}\,(1 - N_{2,0})\,(\text{target} - N_{2,0})\), where the target output is 1.0. Substitute:
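With the output computed above and a target of 1.0:
\[
0.4643807 \times (1 - 0.4643807) \times (1 - 0.4643807) \approx 0.1332253
\]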
N2.0.error <- N2 * (1-N2) * (1-N2)
print(N2.0.error)
## [,1]
## [1,] 0.1332253
Once the error is known, it can be used for backward propagation and weight adjustment.
Calculate the rate of change for the two weights feeding the output node with the equation \(\Delta w_1 = \beta \times \text{error at } N_{2,0} \times N_1^{\top}\):
w1.Rate <- (beta * N2.0.error[1,1]) * t(N1)
print(w1.Rate)
## [,1] [,2]
## [1,] 0.03731729 0.02698807
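Numerically:
\[
\Delta w_1 = 0.45 \times 0.1332253 \times \begin{bmatrix} 0.6224593 & 0.4501660 \end{bmatrix} = \begin{bmatrix} 0.03731729 & 0.02698807 \end{bmatrix}
\]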
print(w1)
## [,1] [,2]
## [1,] 0.06 -0.4
Calculate the new weights. Here t is the iteration of the back propagation. Obviously, the first time through the momentum \(\alpha\) is not applied, but on subsequent iterations multiples of the momentum are added at each iteration.
t <- 1
w1.new <- w1 + w1.Rate + alpha*(t-1)
print(w1.new)
## [,1] [,2]
## [1,] 0.09731729 -0.3730119
Propagate the error back to the hidden layer by weighting it with the updated output-layer weights:
N1.0.error <- N2.0.error %*% w1.new
Calculate the rate of change for the weights between the input and hidden layer.
w0.Rate <- t(beta * N1.0.error) * N0
print(w0.Rate)
## [,1]
## [1,] 0.005834304
## [2,] -0.022362575
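Numerically (the element-wise multiplication by N0 changes nothing here because both inputs are 1):
\[
\Delta w_0 = 0.45 \times \begin{bmatrix} 0.1332253 \times 0.09731729 \\ 0.1332253 \times (-0.3730119) \end{bmatrix} = \begin{bmatrix} 0.005834304 \\ -0.022362575 \end{bmatrix}
\]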
Note: w0.Rate[1] applies to the first row of w0 (the weights feeding the first hidden neuron) and w0.Rate[2] applies to the second row (the weights feeding the second hidden neuron), so replicate the column across both columns of the weight matrix:
w0.Rate <- matrix(c(w0.Rate[1,1], w0.Rate[2,1], w0.Rate[1,1], w0.Rate[2,1]), nrow=2)
print(w0.Rate)
## [,1] [,2]
## [1,] 0.005834304 0.005834304
## [2,] -0.022362575 -0.022362575
Calculate the new weights between the input and hidden layer.
w0.new <- w0 + w0.Rate + alpha*(t-1)
print(w0.new)
## [,1] [,2]
## [1,] 0.4058343 0.1058343
## [2,] -0.1223626 -0.1223626
Run forward propagation again to see if we improve.
w0 <- w0.new
N1 <- sigma(w0 %*% input)
w1 <- w1.new
N2 <- sigma(w1 %*% N1)
print(N2)
## [,1]
## [1,] 0.4742839
N2.1.error <- N2 * (1-N2) * (1-N2)
print(N2.1.error)
## [,1]
## [1,] 0.1310814
The error improves from 0.1332253 to 0.1310814. This isn't much of an improvement, but as noted above, the momentum term grows on subsequent iterations and speeds convergence toward the proper weights.
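To keep training, the forward and backward passes above can be wrapped in a loop. The sketch below is one way to do that with the quantities already defined (sigma, beta, alpha, input, N0, and the updated w0 and w1); it applies the momentum in the common form of \(\alpha\) times the previous weight change rather than the \(\alpha(t-1)\) term used above, and the choice of 10 iterations is arbitrary.
# One possible training loop: repeat the forward and backward passes,
# carrying the previous weight changes for the momentum term.
w1.prev <- matrix(0, nrow=1, ncol=2)   # previous change to w1
w0.prev <- matrix(0, nrow=2, ncol=2)   # previous change to w0
for (iter in 1:10) {
  # forward pass
  N1 <- sigma(w0 %*% input)
  N2 <- sigma(w1 %*% N1)
  # error at the output node (target = 1.0)
  N2.error <- N2 * (1 - N2) * (1 - N2)
  # weight change for w1: learning-rate step plus momentum
  # (alpha times the previous change, a common formulation assumed here)
  w1.delta <- (beta * N2.error[1,1]) * t(N1) + alpha * w1.prev
  # share of the error attributed to each hidden node
  N1.error <- N2.error %*% (w1 + w1.delta)
  # weight change for w0, replicated across both columns as above
  w0.col   <- t(beta * N1.error) * N0
  w0.delta <- cbind(w0.col, w0.col) + alpha * w0.prev
  # update the weights and remember the changes
  w1 <- w1 + w1.delta
  w0 <- w0 + w0.delta
  w1.prev <- w1.delta
  w0.prev <- w0.delta
}
print(N2)   # the output should have moved closer to the target of 1.0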
Cilimkovic, Mirza, Neural Networks and Back Propagation Algorithm, http://www.dataminingmasters.com/uploads/studentProjects/NeuralNetworks.pdf.