About KDnuggets

KDnuggets is an interesting site on Data Science, Machine Learning, AI and Analytics.

One accepted meaning of the word “nugget” is “a small compact portion or unit: nuggets of information” (as described in thefreedictionary.com).

So, we can assume that KDnuggets articles are not intended to be exhaustive in depth.

About the article and the author

“A Beginner’s Guide to Neural Networks with R!” is a high-level showcase of a neural network built in the statistical environment R.

The author, Jose Portilla, has some courses on Udemy where he teaches related content.

About the work

Portilla uses the College data set from the R package ISLR, one of several R packages that bundle data sets for exploring and learning statistics and analytics concepts.

The data set is a collection of college attributes, also called features or “columns”. Which term is used depends on the analyst’s background.

Moving on to the data preprocessing stage, the author says that it is important to normalize data before training a neural network. Here, normalization means rescaling each numeric feature to a common range, such as 0 to 1. In computer science, however, the discipline of database management systems uses “normalization” for a very different concept: organizing data to minimize unnecessary duplication, so that keeping a record consistent throughout its lifespan in an information system takes as little overhead as possible.

It is important not to confuse these two meanings of normalization.

Nugget: the normalization principle in database management systems

“The less the data is duplicated, the better for maintaining changes on it.”
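
To make the database sense of the word concrete, here is a minimal sketch in R. The tables and column names (`denormalized`, `colleges`, `enrollments`, `college_id`) and all values are hypothetical; they only illustrate the idea of storing each fact once and recombining it with a join.

```r
# Denormalized: the college name and state are repeated on every row
denormalized <- data.frame(
  student = c("Ana", "Ben", "Cai"),
  college = c("College A", "College A", "College B"),
  state   = c("S1", "S1", "S2"),
  stringsAsFactors = FALSE
)

# Normalized: each college is stored once and referenced by a key
colleges <- data.frame(
  college_id = c(1, 2),
  college    = c("College A", "College B"),
  state      = c("S1", "S2"),
  stringsAsFactors = FALSE
)
enrollments <- data.frame(
  student    = c("Ana", "Ben", "Cai"),
  college_id = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# A join rebuilds the wide view whenever it is needed
rebuilt <- merge(enrollments, colleges, by = "college_id")
```

If a college changes its name, the normalized layout needs one update in `colleges`, while the denormalized layout needs one update per student row.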

Loading the ISLR R package

# Install ISLR only if it is missing, then always load it
if (!("ISLR" %in% rownames(installed.packages()))) {
  install.packages('ISLR')
}
library(ISLR)
## Warning: package 'ISLR' was built under R version 4.1.1

Taking a look at some data

print(head(College,30))
##                                         Private  Apps Accept Enroll Top10perc
## Abilene Christian University                Yes  1660   1232    721        23
## Adelphi University                          Yes  2186   1924    512        16
## Adrian College                              Yes  1428   1097    336        22
## Agnes Scott College                         Yes   417    349    137        60
## Alaska Pacific University                   Yes   193    146     55        16
## Albertson College                           Yes   587    479    158        38
## Albertus Magnus College                     Yes   353    340    103        17
## Albion College                              Yes  1899   1720    489        37
## Albright College                            Yes  1038    839    227        30
## Alderson-Broaddus College                   Yes   582    498    172        21
## Alfred University                           Yes  1732   1425    472        37
## Allegheny College                           Yes  2652   1900    484        44
## Allentown Coll. of St. Francis de Sales     Yes  1179    780    290        38
## Alma College                                Yes  1267   1080    385        44
## Alverno College                             Yes   494    313    157        23
## American International College              Yes  1420   1093    220         9
## Amherst College                             Yes  4302    992    418        83
## Anderson University                         Yes  1216    908    423        19
## Andrews University                          Yes  1130    704    322        14
## Angelo State University                      No  3540   2001   1016        24
## Antioch University                          Yes   713    661    252        25
## Appalachian State University                 No  7313   4664   1910        20
## Aquinas College                             Yes   619    516    219        20
## Arizona State University Main campus         No 12809  10308   3761        24
## Arkansas College (Lyon College)             Yes   708    334    166        46
## Arkansas Tech University                     No  1734   1729    951        12
## Assumption College                          Yes  2135   1700    491        23
## Auburn University-Main Campus                No  7548   6791   3070        25
## Augsburg College                            Yes   662    513    257        12
## Augustana College IL                        Yes  1879   1658    497        36
##                                         Top25perc F.Undergrad P.Undergrad
## Abilene Christian University                   52        2885         537
## Adelphi University                             29        2683        1227
## Adrian College                                 50        1036          99
## Agnes Scott College                            89         510          63
## Alaska Pacific University                      44         249         869
## Albertson College                              62         678          41
## Albertus Magnus College                        45         416         230
## Albion College                                 68        1594          32
## Albright College                               63         973         306
## Alderson-Broaddus College                      44         799          78
## Alfred University                              75        1830         110
## Allegheny College                              77        1707          44
## Allentown Coll. of St. Francis de Sales        64        1130         638
## Alma College                                   73        1306          28
## Alverno College                                46        1317        1235
## American International College                 22        1018         287
## Amherst College                                96        1593           5
## Anderson University                            40        1819         281
## Andrews University                             23        1586         326
## Angelo State University                        54        4190        1512
## Antioch University                             44         712          23
## Appalachian State University                   63        9940        1035
## Aquinas College                                51        1251         767
## Arizona State University Main campus           49       22593        7585
## Arkansas College (Lyon College)                74         530         182
## Arkansas Tech University                       52        3602         939
## Assumption College                             59        1708         689
## Auburn University-Main Campus                  57       16262        1716
## Augsburg College                               30        2074         726
## Augustana College IL                           69        1950          38
##                                         Outstate Room.Board Books Personal PhD
## Abilene Christian University                7440       3300   450     2200  70
## Adelphi University                         12280       6450   750     1500  29
## Adrian College                             11250       3750   400     1165  53
## Agnes Scott College                        12960       5450   450      875  92
## Alaska Pacific University                   7560       4120   800     1500  76
## Albertson College                          13500       3335   500      675  67
## Albertus Magnus College                    13290       5720   500     1500  90
## Albion College                             13868       4826   450      850  89
## Albright College                           15595       4400   300      500  79
## Alderson-Broaddus College                  10468       3380   660     1800  40
## Alfred University                          16548       5406   500      600  82
## Allegheny College                          17080       4440   400      600  73
## Allentown Coll. of St. Francis de Sales     9690       4785   600     1000  60
## Alma College                               12572       4552   400      400  79
## Alverno College                             8352       3640   650     2449  36
## American International College              8700       4780   450     1400  78
## Amherst College                            19760       5300   660     1598  93
## Anderson University                        10100       3520   550     1100  48
## Andrews University                          9996       3090   900     1320  62
## Angelo State University                     5130       3592   500     2000  60
## Antioch University                         15476       3336   400     1100  69
## Appalachian State University                6806       2540    96     2000  83
## Aquinas College                            11208       4124   350     1615  55
## Arizona State University Main campus        7434       4850   700     2100  88
## Arkansas College (Lyon College)             8644       3922   500      800  79
## Arkansas Tech University                    3460       2650   450     1000  57
## Assumption College                         12000       5920   500      500  93
## Auburn University-Main Campus               6300       3933   600     1908  85
## Augsburg College                           11902       4372   540      950  65
## Augustana College IL                       13353       4173   540      821  78
##                                         Terminal S.F.Ratio perc.alumni Expend
## Abilene Christian University                  78      18.1          12   7041
## Adelphi University                            30      12.2          16  10527
## Adrian College                                66      12.9          30   8735
## Agnes Scott College                           97       7.7          37  19016
## Alaska Pacific University                     72      11.9           2  10922
## Albertson College                             73       9.4          11   9727
## Albertus Magnus College                       93      11.5          26   8861
## Albion College                               100      13.7          37  11487
## Albright College                              84      11.3          23  11644
## Alderson-Broaddus College                     41      11.5          15   8991
## Alfred University                             88      11.3          31  10932
## Allegheny College                             91       9.9          41  11711
## Allentown Coll. of St. Francis de Sales       84      13.3          21   7940
## Alma College                                  87      15.3          32   9305
## Alverno College                               69      11.1          26   8127
## American International College                84      14.7          19   7355
## Amherst College                               98       8.4          63  21424
## Anderson University                           61      12.1          14   7994
## Andrews University                            66      11.5          18  10908
## Angelo State University                       62      23.1           5   4010
## Antioch University                            82      11.3          35  42926
## Appalachian State University                  96      18.3          14   5854
## Aquinas College                               65      12.7          25   6584
## Arizona State University Main campus          93      18.9           5   4602
## Arkansas College (Lyon College)               88      12.6          24  14579
## Arkansas Tech University                      60      19.6           5   4739
## Assumption College                            93      13.8          30   7100
## Auburn University-Main Campus                 91      16.7          18   6642
## Augsburg College                              65      12.8          31   7836
## Augustana College IL                          83      12.7          40   9220
##                                         Grad.Rate
## Abilene Christian University                   60
## Adelphi University                             56
## Adrian College                                 54
## Agnes Scott College                            59
## Alaska Pacific University                      15
## Albertson College                              55
## Albertus Magnus College                        63
## Albion College                                 73
## Albright College                               80
## Alderson-Broaddus College                      52
## Alfred University                              73
## Allegheny College                              76
## Allentown Coll. of St. Francis de Sales        74
## Alma College                                   68
## Alverno College                                55
## American International College                 69
## Amherst College                               100
## Anderson University                            59
## Andrews University                             46
## Angelo State University                        34
## Antioch University                             48
## Appalachian State University                   70
## Aquinas College                                65
## Arizona State University Main campus           48
## Arkansas College (Lyon College)                54
## Arkansas Tech University                       48
## Assumption College                             88
## Auburn University-Main Campus                  69
## Augsburg College                               58
## Augustana College IL                           71

Preprocess the data

# Create Vector of Column Max and Min Values
maxs <- apply(College[,2:18], 2, max)
mins <- apply(College[,2:18], 2, min)

# Use scale() and convert the resulting matrix to a data frame
scaled.data <- as.data.frame(scale(College[,2:18],center = mins, scale = maxs - mins))

# Check out results
print(head(scaled.data,2))
##                                    Apps     Accept     Enroll Top10perc
## Abilene Christian University 0.03288693 0.04417701 0.10791254 0.2315789
## Adelphi University           0.04384229 0.07053089 0.07503539 0.1578947
##                              Top25perc F.Undergrad P.Undergrad  Outstate
## Abilene Christian University 0.4725275  0.08716353  0.02454774 0.2634298
## Adelphi University           0.2197802  0.08075165  0.05614839 0.5134298
##                              Room.Board     Books  Personal       PhD
## Abilene Christian University  0.2395965 0.1577540 0.2977099 0.6526316
## Adelphi University            0.7361286 0.2914439 0.1908397 0.2210526
##                                Terminal S.F.Ratio perc.alumni    Expend
## Abilene Christian University 0.71052632 0.4182306      0.1875 0.0726714
## Adelphi University           0.07894737 0.2600536      0.2500 0.1383867
##                              Grad.Rate
## Abilene Christian University 0.4629630
## Adelphi University           0.4259259
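
The call to `scale()` above amounts to min-max normalization, x' = (x - min) / (max - min), applied column by column. As a small check, here is a sketch on a made-up data frame (`toy`, `toy_mins` and `toy_maxs` are hypothetical names, not part of the College data) showing that both routes give the same values.

```r
toy <- data.frame(a = c(10, 20, 40), b = c(1, 3, 2))

toy_mins <- apply(toy, 2, min)
toy_maxs <- apply(toy, 2, max)

# Route 1: scale() subtracts `center` and divides by `scale`, per column
via_scale <- as.data.frame(scale(toy, center = toy_mins, scale = toy_maxs - toy_mins))

# Route 2: the min-max formula written out by hand
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
by_hand <- as.data.frame(lapply(toy, min_max))

# Both routes agree, and every column now lies in [0, 1]
all.equal(via_scale, by_hand)
```

Note that a constant column would make `max - min` zero and break the division, so such columns would need to be handled separately.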

One-hot encoding

There is a technique known as one-hot encoding, which translates categorical values into numeric indicator columns of 0s and 1s, one per category. For a variable with only two categories, such as “Yes” and “No”, a single 0/1 column is enough.

Such a conversion is mandatory, since the linear algebra that will be applied later requires numeric inputs to perform computations over the data.

# Convert Private column from Yes/No to 1/0
Private = as.numeric(College$Private)-1
data = cbind(Private,scaled.data)
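
For a two-level factor, subtracting 1 from the integer codes is enough, as done above. With three or more levels, one indicator column per level is usually wanted. The sketch below uses base R's `model.matrix()` for that; the `region` factor and the `toy` data frame are made-up examples, not columns of the College data.

```r
# Two levels: factor levels are sorted (No = 1, Yes = 2),
# so subtracting 1 maps No -> 0 and Yes -> 1
private <- factor(c("Yes", "No", "Yes", "No"))
private_01 <- as.numeric(private) - 1

# Three levels: build one 0/1 indicator column per level.
# The "- 1" drops the intercept so that no level is left out.
toy <- data.frame(region = factor(c("north", "south", "west", "north")))
one_hot <- model.matrix(~ region - 1, data = toy)
colnames(one_hot)
```

Each row of `one_hot` contains exactly one 1, in the column of that row's category.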

# Install caTools only if it is missing, then always load it
if (!("caTools" %in% rownames(installed.packages()))) {
  install.packages('caTools')
}
library(caTools)
## Warning: package 'caTools' was built under R version 4.1.1

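Reproducibility: getting the same results in your own environment

Set a seed for the random number generator, so that the random steps that follow (such as the train / test split) give the same result on every run.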

set.seed(101)
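
A quick way to see what the seed does: resetting it to the same value makes R's random number generator replay the same sequence. This is a small standalone illustration, independent of the College data; `first_draw` and `second_draw` are names made up for it.

```r
set.seed(101)
first_draw <- sample(1:10)

set.seed(101)
second_draw <- sample(1:10)

# Same seed, same permutation
identical(first_draw, second_draw)
```

If this is run in the same session as the rest of the walkthrough, call `set.seed(101)` again afterwards, because each draw advances the generator state.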

Split train / test subsets

# Create Split (any column is fine)
split = sample.split(data$Private, SplitRatio = 0.70)

# Split based off of split Boolean Vector
train = subset(data, split == TRUE)
test = subset(data, split == FALSE)
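
`sample.split()` is given the label column so that the proportion of each class stays roughly the same in both subsets. The following standalone base-R sketch illustrates that idea on made-up labels. It is not how caTools implements the split, and `labels`, `train_idx` and `in_train` are hypothetical names.

```r
set.seed(101)
labels <- c(rep(0, 30), rep(1, 70))   # made-up labels: 30% zeros, 70% ones

# Within each class, pick 70% of the row indices for training
train_idx <- unlist(lapply(split(seq_along(labels), labels), function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}))
in_train <- seq_along(labels) %in% train_idx

# The share of ones is the same in both subsets
mean(labels[in_train])
mean(labels[!in_train])
```

Sampling within each class, rather than over all rows at once, is what keeps a rare class from ending up almost entirely in one subset.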

“We will run our neural network on the training set and then see how well it performed on the test set.”

Defining the relationship between the input and the output (defining the math model)

feats <- names(scaled.data)

# Concatenate strings
f <- paste(feats,collapse=' + ')
f <- paste('Private ~',f)

# Convert to formula
f <- as.formula(f)

f
## Private ~ Apps + Accept + Enroll + Top10perc + Top25perc + F.Undergrad + 
##     P.Undergrad + Outstate + Room.Board + Books + Personal + 
##     PhD + Terminal + S.F.Ratio + perc.alumni + Expend + Grad.Rate
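
As an aside, base R also provides `reformulate()`, which builds the same kind of formula without pasting strings. A minimal sketch with three of the feature names (`feats_demo`, `f_paste` and `f_reform` are names made up for this example):

```r
feats_demo <- c("Apps", "Accept", "Enroll")

# String-pasting route, as used above
f_paste <- as.formula(paste("Private ~", paste(feats_demo, collapse = " + ")))

# reformulate() takes the term labels and the response name directly
f_reform <- reformulate(feats_demo, response = "Private")

# Both routes describe the same model
identical(deparse(f_paste), deparse(f_reform))
```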

Loading the neural net package

# Install neuralnet only if it is missing, then always load it
if (!("neuralnet" %in% rownames(installed.packages()))) {
  install.packages('neuralnet')
}
library(neuralnet)
## Warning: package 'neuralnet' was built under R version 4.1.1

Training the model

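It is in this step that we let the computer figure out the parameters of the mathematical model, that is, the weights and biases of the network. The hyperparameters, such as the three hidden layers of ten neurons each given by hidden=c(10,10,10), are chosen by the analyst beforehand.

The output of the “neuralnet” function call is a fitted statistical model, which is stored in the variable nn for later use on data outside of the training subset.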

nn <- neuralnet(f,train,hidden=c(10,10,10),linear.output=FALSE)
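
To give a feel for what such a fitted model computes, here is a hand-written forward pass through a tiny network with one hidden layer and logistic (sigmoid) activations. All weights and inputs are made up for illustration. The real values are whatever `neuralnet` learned and stored in `nn`, and the network trained above has three hidden layers of ten units rather than one layer of two.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

x  <- c(0.2, 0.7)                                # two hypothetical scaled inputs
# Hypothetical weights, filled column by column: one column per hidden unit
W1 <- matrix(c(0.5, -0.3, 0.8, 0.1), nrow = 2)
b1 <- c(0.1, -0.2)                               # hypothetical hidden biases
w2 <- c(1.2, -0.7)                               # hypothetical hidden-to-output weights
b2 <- 0.05                                       # hypothetical output bias

# Each layer is a weighted sum plus a bias, passed through the activation
hidden <- sigmoid(as.vector(x %*% W1) + b1)
output <- sigmoid(sum(hidden * w2) + b2)         # strictly between 0 and 1
```

Because the output also passes through the sigmoid, it lies between 0 and 1, so it can be rounded to a class label of 0 or 1.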

Predictive Analytics

Predict values and show the results.

# Compute Predictions off Test Set
predicted.nn.values <- compute(nn,test[2:18])

# Check out net.result
print(head(predicted.nn.values$net.result))
##                                         [,1]
## Adrian College                             1
## Alfred University                          1
## Allegheny College                          1
## Allentown Coll. of St. Francis de Sales    1
## Alma College                               1
## Amherst College                            1
predicted.nn.values$net.result <- sapply(predicted.nn.values$net.result,round,digits=0)

Confusion matrix

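In the table below, rows are the actual values of Private in the test set and columns are the predicted values. Taking class 0 (not private) as the “positive” class:

Main diagonal, cell 1,1: true positive

Main diagonal, cell 2,2: true negative

Secondary diagonal, cell 1,2: false negative

Secondary diagonal, cell 2,1: false positive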

table(test$Private,predicted.nn.values$net.result)
##    
##       0   1
##   0  55   9
##   1   6 163
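
From the table above, a few summary metrics can be derived. The sketch below re-enters the four counts by hand (rows are actual values, columns are predictions) and computes the overall accuracy plus the share of each class that was correctly identified. The names `cm`, `accuracy`, `recall_private` and `recall_public` are made up for this example.

```r
# Counts copied from the table; matrix() fills column by column
cm <- matrix(c(55, 6, 9, 163), nrow = 2,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

accuracy       <- sum(diag(cm)) / sum(cm)        # share of correct predictions
recall_private <- cm["1", "1"] / sum(cm["1", ])  # private colleges correctly found
recall_public  <- cm["0", "0"] / sum(cm["0", ])  # non-private colleges correctly found

round(c(accuracy, recall_private, recall_public), 3)
```

With these counts, 218 of the 233 test colleges are classified correctly, an accuracy of about 0.936.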

Plot of the Neural Network

plot(nn, rep="best")

Conclusion

With that, we have used a subset of the main data, the train subset, to train a neural network and obtain a mathematical model that can now be used to predict the categorical variable “Private” for colleges outside of the data set we have at hand.

This is known as a supervised learning technique.

References