November 16th, 2018
The idea of a neural network is quite simple and straightforward: take the input, excite the neurons, and produce an output signal. Then it comes down to reducing the noise (optimization) and improving the speed at which the network converges (optimization again). Most business problems don't require a CNN or an RNN; they can be solved by a vanilla neural network with a single or double hidden layer.
y = wx + b + ε
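For intuition, a single neuron is just this linear score pushed through an activation. A toy computation (the numbers are made up, not from the post):
z <- 0.8 * 5 + 0.1   # wx + b with illustrative values w = 0.8, x = 5, b = 0.1
(tanh(z) + 1) / 2    # squash the score into (0, 1), as the training loop below does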
Train a model on iris; to keep it simple, a two-class prediction using the first 100 observations:
Species ~ Sepal.Length + Sepal.Width
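The data-prep step isn't shown, so here is a minimal sketch consistent with that formula; the 0/1 encoding of y is my assumption:
# the first 100 rows of iris are setosa and versicolor: a clean two-class subset
x <- as.matrix(iris[1:100, c("Sepal.Length", "Sepal.Width")])
y <- as.numeric(iris$Species[1:100] == "versicolor")  # assumed encoding: versicolor = 1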
n.obs <- nrow(x)
n.parm <- ncol(x)
# epoch and learning rate
# prediction threshold and epsilon
EPOCH <- 100
alpha <- .01
cutoff <- .5
eps <- 1e-15
# initialize weight, bias and predictions
y.hat <- rep(0, n.obs)
prob <- rep(0, n.obs)
loss <- rep(0, EPOCH)  # one loss value per epoch
w <- matrix(rnorm(n.parm) * .01, nrow = n.parm)  # small random initial weights
b <- rnorm(1) * .01                              # small random initial bias
for(e in 1:EPOCH){
p <- tanh(x %*% w + b)                  # forward pass: linear score squashed by tanh
prob <- (p + 1) / 2                     # remap (-1, 1) to (0, 1); same as a sigmoid of twice the score
prob <- pmin(pmax(prob, eps), 1 - eps)  # clip so log() never sees exactly 0 or 1
y.hat <- prob > cutoff                  # hard 0/1 prediction at the threshold
# eta <- y - y.hat
eta <- y - prob  # error signal: negative gradient of the loss wrt the linear score (up to a constant)
loss[e] <- -mean(y * log(prob) + (1 - y) * log(1 - prob))  # binary cross-entropy
# ... this sucker DOES work
w <- w + alpha * t(x) %*% eta  # gradient step on the weights
b <- b + alpha * sum(eta)      # gradient step on the bias
}
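A quick sanity check after training (my addition, not from the original post): accuracy should be near 1 on this easy, nearly separable subset, and the loss curve should trend downward. Comparing against glm's logistic fit is a rough cross-check, though glm may warn about fitted probabilities of 0 or 1 on data this separable:
mean(y.hat == y)                                       # training accuracy
plot(loss, type = "l", xlab = "epoch", ylab = "loss")  # should decrease over epochs
coef(glm(y ~ x, family = binomial))                    # direction should roughly match w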