A story of a toy perceptron

M.
November 16th, 2018

Basic idea

The idea of a neural network is quite simple and straightforward: take the input, excite the neurons, and produce an output. After that it comes down to reducing noise (optimization) and improving speed (optimization again, so the network converges). Most business problems do not require a CNN or an RNN; they can be solved with a vanilla neural network with a single or double hidden layer.

y = wx + b + ε   (weights w, bias b, noise ε)
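To make that concrete, here is a minimal sketch of a single neuron in R (the function name and the numbers are made up for illustration, not part of the training script below): it applies the weights and bias to the inputs and squashes the result with tanh.

# one artificial neuron: weighted sum of the inputs plus a bias, squashed by tanh
neuron <- function(x, w, b) tanh(sum(w * x) + b)

neuron(x = c(5.1, 3.5), w = c(0.2, -0.1), b = 0.05)   # made-up weights on one iris-like input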

Train a model on iris. To keep it simple, it is a two-class prediction using the first 100 observations.

Species ~ Sepal.Length + Sepal.Width
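The training loop below assumes a design matrix x and a 0/1 label vector y. One way to build them (an assumption, this prep step isn't shown in the post) is:

x <- as.matrix(iris[1:100, c("Sepal.Length", "Sepal.Width")])   # two features, first 100 rows
y <- as.numeric(iris$Species[1:100] == "versicolor")            # setosa = 0, versicolor = 1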

Train the model. It's not really useful, for demonstration only.

n.obs  <- nrow(x)                               # number of observations
n.parm <- ncol(x)                               # number of features (weights to learn)

# epoch and learning rate
# prediction threshold and epsilon
EPOCH  <- 100    
alpha  <- .01                                                  
cutoff <- .5                                     
eps    <- 1e-15                                

# initialize weight, bias and predictions

y.hat  <- rep(0, n.obs)                         # hard 0/1 predictions
prob   <- rep(0, n.obs)                         # predicted probabilities
loss   <- rep(0, EPOCH)                         # one loss value per epoch
w      <- matrix(rnorm(n.parm) * .01, nrow = n.parm)   # small random initial weights
b      <- rnorm(1) * .01                        # small random initial bias

for(e in 1:EPOCH){ 
    p     <- tanh(x %*% w + b)                  # activations in (-1, 1)

    prob  <- ifelse(p > 0, p, 1 + p)            # fold the tanh output into (0, 1)
    prob  <- pmin(pmax(prob, eps), 1 - eps)     # keep away from exactly 0 or 1 so log() is finite
    y.hat <- prob > cutoff                      # hard 0/1 prediction

    #eta  <- y - y.hat
    eta   <- y - prob                           # error: label minus predicted probability
    loss[e] <- -(mean(y * log(prob) + (1 - y) * log(1 - prob)))   # binary cross-entropy

    # ... this sucker DOES work 
    w <- w + t(t(eta) %*% x) * alpha            # weight step: alpha * X^T (y - prob)
    b <- b + alpha * sum(eta)                   # bias step: alpha * sum(y - prob)

}
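
After the loop finishes, a quick sanity check (just a sketch, not part of the original script): look at the loss curve and the training accuracy on the same 100 rows.

plot(loss, type = "l", xlab = "epoch", ylab = "cross-entropy loss")   # should trend downward
mean(y.hat == y)                                                      # training accuracy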