How do neural networks work?

Lecture 219 https://www.udemy.com/machinelearning/learn/lecture/6760386
Outline of Neural Network Activation Function use

knitr::include_graphics("ANN_OutlineOfProcess.png")
ANN process outline, from lecture

How do Neural Networks learn?

The Cost Function measures the difference between the prediction y^ and the actual value y (commonly the squared error C = 1/2 * (y^ - y)^2), so the lower the Cost Function, the higher the accuracy of the NN. Feeding the error back through the network and adjusting the weights to reduce it is referred to as back propagation.
Lecture 220 https://www.udemy.com/machinelearning/learn/lecture/6760388
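
To make the Cost Function concrete, here is a minimal R sketch of the squared-error cost mentioned above (the y and y^ values are made-up illustration numbers, not lecture data):

# Toy example: squared-error cost C = 1/2 * sum((y_hat - y)^2)
y     = c(1, 0, 1, 1)         # actual values (hypothetical)
y_hat = c(0.9, 0.2, 0.7, 1.0) # network predictions (hypothetical)
0.5 * sum((y_hat - y)^2)      # the lower this is, the better the network fits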

knitr::include_graphics("ANN_HowDoTheyLearn.png")
How ANNs learn, from lecture

knitr::include_graphics("ANN_HowDoTheyLearn2.png")
How ANNs learn (continued), from lecture

knitr::include_graphics("ANN_MultipleHiddenLayers.png")
Multiple hidden layers, from lecture

Nicely done post: https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications

Gradient Descent, the Curse of Dimensionality, and nuances of weight adjustment

Gradient Descent requires the Cost Function to be convex (a single global minimum). Because each weight update uses the entire dataset at once, it is also called batch gradient descent.
Lecture 221 https://www.udemy.com/machinelearning/learn/lecture/6760390
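
As a minimal sketch (my own illustration, not the lecture's code), here is one weight being fit by batch gradient descent on the cost C = 1/2 * sum((w*x - y)^2), with a made-up learning rate:

# Batch gradient descent: each update uses the WHOLE dataset
x = c(1, 2, 3); y = c(2, 4, 6)  # toy data where y = 2x
w = 0                           # initial weight
lr = 0.01                       # learning rate (hypothetical choice)
for (step in 1:100) {
  grad = sum((w * x - y) * x)   # dC/dw summed over every row
  w = w - lr * grad             # one update per pass over the data
}
w # converges toward 2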

Stochastic Gradient Descent

This does not require a convex Cost Function. It is NOT a batch process: the weights are updated after each row is run through the network, one row at a time. The added randomness helps the process find the global minimum of the Cost Function even when the relationship between the weights and C is not convex.
Lecture 222 https://www.udemy.com/machinelearning/learn/lecture/6760392
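
A sketch of the stochastic variant, reusing the toy x and y from above: the weight is updated after every single row rather than once per pass, which adds the randomness that helps escape local minima:

# Stochastic gradient descent: one update PER ROW
w = 0
lr = 0.01
for (epoch in 1:100) {
  for (i in sample(length(x))) {    # visit the rows in random order
    grad = (w * x[i] - y[i]) * x[i] # gradient from a single row
    w = w - lr * grad               # update immediately
  }
}
w # also converges toward 2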

Backpropagation

Lecture 223 https://www.udemy.com/machinelearning/learn/lecture/6760394
Good post: http://neuralnetworksanddeeplearning.com/chap2.html#the_four_fundamental_equations_behind_backpropagation
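
The linked chapter derives the four fundamental backpropagation equations. As a minimal illustration (my own sketch, not lecture code), here is one chain-rule weight update for a single sigmoid neuron:

# One backpropagation step for a single sigmoid neuron (illustrative values)
sigmoid = function(z) 1 / (1 + exp(-z))
x = 0.5; y = 1                      # one input and its target (hypothetical)
w = 0.1; b = 0                      # initial weight and bias
a = sigmoid(w * x + b)              # forward pass: the prediction y^
# chain rule for C = 1/2 * (a - y)^2, using sigmoid'(z) = a * (1 - a)
dC_dw = (a - y) * a * (1 - a) * x   # gradient with respect to the weight
dC_db = (a - y) * a * (1 - a)       # gradient with respect to the bias
lr = 0.5                            # learning rate (hypothetical)
w = w - lr * dC_dw; b = b - lr * dC_db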

ANN in R Lectures
Lecture 237 https://www.udemy.com/machinelearning/learn/lecture/6142016
Lecture 238 https://www.udemy.com/machinelearning/learn/lecture/6145488
Lecture 239 https://www.udemy.com/machinelearning/learn/lecture/6210372
Lecture 240 https://www.udemy.com/machinelearning/learn/lecture/6147564

Check the working directory with getwd().

Importing the dataset

dataset = read.csv('Churn_Modelling.csv')

Let’s have a look at the variables (Python view of the dataset below). We have 10 independent variables, columns 4 to 13, and the dependent variable is column 14, ‘Exited’ (indexes in R start at 1). So we’ll strip the dataset down to just the independent and dependent variables, which makes ‘Exited’ column 11.

dataset = dataset[4:14]
knitr::include_graphics("BankCustomerData.png")
Python view of the dataset

Encoding the categorical variables as factors

Geography (country) and Gender need to be encoded as factors and then converted to numeric so they can be scaled along with the other columns; our deep learning package requires this processing. Our dependent variable is already binary and needs no adjustment.

dataset$Geography = as.numeric(factor(dataset$Geography,
                                      levels = c('France', 'Spain', 'Germany'),
                                      labels = c(1, 2, 3)))
dataset$Gender = as.numeric(factor(dataset$Gender,
                                   levels = c('Female', 'Male'),
                                   labels = c(1, 2)))

Splitting the dataset into the Training set and Test set

We’ll train on the Training set and test on the Test set.

# install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Exited, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

Feature Scaling

Feature scaling eases the intensive calculations we’ll be doing and is required by the package we’ll be using. The -11 means we scale every column except column 11, ‘Exited’, the dependent variable.

training_set[-11] = scale(training_set[-11])
test_set[-11] = scale(test_set[-11])

Fitting ANN to the Training set - h2o package

h2o package - https://cran.r-project.org/web/packages/h2o/h2o.pdf
Getting H2O going.

# install.packages('h2o')
library(h2o)
## 
## ----------------------------------------------------------------------
## 
## Your next step is to start H2O:
##     > h2o.init()
## 
## For H2O package documentation, ask for help:
##     > ??h2o
## 
## After starting H2O, you can use the Web UI at http://localhost:54321
## For more information visit http://docs.h2o.ai
## 
## ----------------------------------------------------------------------
## 
## Attaching package: 'h2o'
## The following objects are masked from 'package:stats':
## 
##     cor, sd, var
## The following objects are masked from 'package:base':
## 
##     &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
##     colnames<-, ifelse, is.character, is.factor, is.numeric, log,
##     log10, log1p, log2, round, signif, trunc
h2o.init(nthreads = -1) # -1 uses all the available cores on your machine
## 
## H2O is not running yet, starting it now...
## 
## Note:  In case of errors look at the following log files:
##     /var/folders/9_/hq3lk39145g0n42f5wy7722h0000gn/T//Rtmp9F9veW/h2o_markloessi_started_from_r.out
##     /var/folders/9_/hq3lk39145g0n42f5wy7722h0000gn/T//Rtmp9F9veW/h2o_markloessi_started_from_r.err
## 
## 
## Starting H2O JVM and connecting: .. Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         3 seconds 822 milliseconds 
##     H2O cluster timezone:       America/Boise 
##     H2O data parsing timezone:  UTC 
##     H2O cluster version:        3.22.1.1 
##     H2O cluster version age:    5 months and 18 days !!! 
##     H2O cluster name:           H2O_started_from_R_markloessi_iiw313 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   0.88 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
##     R Version:                  R version 3.5.2 (2018-12-20)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is too old (5 months and 18 days)!
## Please download and install the latest version from http://h2o.ai/download/

Train the model, creating the classifier

classifier = h2o.deeplearning(y = 'Exited', # y is the name of the dependent variable
                         # h2o needs an H2OFrame, so convert our plain data frame
                         training_frame = as.h2o(training_set),
                         activation = 'Rectifier', # the activation function (ReLU)
                         hidden = c(6, 6), # a vector: two hidden layers of 6 nodes each
                         # rule of thumb for node count: (10 inputs + 1 output) / 2 = 5.5,
                         # rounded up to 6
                         epochs = 100, # how many times the dataset is iterated over
                         train_samples_per_iteration = -2) # -2 means auto-tune

Predicting the Test set results

This returns an H2OFrame of predicted probabilities (it shows up as an ‘environment’ in RStudio).

prob_pred = h2o.predict(classifier, newdata = as.h2o(test_set[-11])) # -11 drops the dependent variable; again H2O needs an H2OFrame

We need to turn the probabilities provided by predict into 0/1 class predictions.

# the 0.5 is a threshold
# y_pred = ifelse(prob_pred > 0.5, 1, 0) # this can be written more simply as
y_pred = (prob_pred > 0.5)

Still an H2OFrame, so convert back to a vector

y_pred = as.vector(y_pred)

Making the Confusion Matrix

cm = table(test_set[, 11], y_pred) # 11 is the index of the dependent variable
cm
##    y_pred
##        0    1
##   0 1532   61
##   1  211  196
knitr::include_graphics("Confusion_Matrix_Explained.png")
Confusion Matrix explained, from lecture

Accuracy of our Model

TP = 196  # True Positives  (actual 1, predicted 1)
TN = 1532 # True Negatives  (actual 0, predicted 0)
FP = 61   # False Positives (actual 0, predicted 1)
FN = 211  # False Negatives (actual 1, predicted 0)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Accuracy
## [1] 0.864

Shutdown of the H2O server:

h2o.shutdown()
## Are you sure you want to shutdown the H2O instance running at http://localhost:54321/ (Y/N)?
## [1] TRUE

=========================
Github files; https://github.com/ghettocounselor

Useful PDF for common questions in Lectures;
https://github.com/ghettocounselor/Machine_Learning/blob/master/Machine-Learning-A-Z-Q-A.pdf