Kuhn and Johnson Chapter 7

7.3

Here we simulate the training data; the first six rows of the predictors and the response are shown below.

##         x.1       x.2        x.3        x.4        x.5        x.6
## 1 0.5337724 0.6478064 0.85078526 0.18159957 0.92903976 0.36179060
## 2 0.5837650 0.4381528 0.67272659 0.66924914 0.16379784 0.45305931
## 3 0.5895783 0.5879065 0.40967108 0.33812728 0.89409334 0.02681911
## 4 0.6910399 0.2259548 0.03335447 0.06691274 0.63744519 0.52500637
## 5 0.6673315 0.8188985 0.71676079 0.80324287 0.08306864 0.22344157
## 6 0.8392937 0.3862983 0.64618857 0.86105431 0.63038947 0.43703891
##         x.7       x.8        x.9      x.10        y
## 1 0.8266609 0.4214081 0.59111440 0.5886216 18.46398
## 2 0.6489601 0.8446239 0.92819306 0.7584008 16.09836
## 3 0.1785614 0.3495908 0.01759542 0.4441185 17.76165
## 4 0.5133614 0.7970260 0.68986918 0.4450716 13.78730
## 5 0.6644906 0.9038919 0.39696995 0.5500808 18.42984
## 6 0.3360117 0.6489177 0.53116033 0.9066182 20.85817

Here I generate the test data.

testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)

Here I build a KNN regression model (the outcome y is numeric, so caret fits a regressor rather than a classifier).

knnModel <- train(x = trainingData$x,
                  y = trainingData$y,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10)
plot(knnModel$results$RMSE)

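The RMSE is minimized at around the 6th value in the tuning grid, although the 5th and 7th give similar results. Note that the plot's x-axis is the row index of knnModel$results rather than k itself, and that k counts neighbors, not clusters.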
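Because plot(knnModel$results$RMSE) plots against the row index, the position of the minimum has to be mapped back to k. A minimal sketch of that lookup, assuming the tuning grid is the odd values 5, 7, ..., 23 (which is what caret's default knn grid is expected to produce for tuneLength = 10) and using invented RMSE values in place of the real knnModel$results:

```r
# Hypothetical stand-in for knnModel$results; the RMSE values are invented
# for illustration and are not the values from the fitted model.
results <- data.frame(
  k    = seq(5, 23, by = 2),
  RMSE = c(3.40, 3.31, 3.25, 3.21, 3.19, 3.18, 3.19, 3.21, 3.24, 3.27)
)

# Row index of the smallest RMSE (what the index plot shows on its x-axis)
best_row <- which.min(results$RMSE)

# The corresponding number of neighbors
best_k <- results$k[best_row]

print(best_row)
print(best_k)

# Plotting against k directly avoids the ambiguity
plot(results$k, results$RMSE, type = "b",
     xlab = "Number of neighbors (k)", ylab = "RMSE")
```

On the fitted model itself, knnModel$bestTune reports the selected k directly.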

mars <- earth(trainingData$x, trainingData$y)
mars

## Selected 12 of 18 terms, and 6 of 10 predictors
## Termination condition: Reached nk 21
## Importance: X1, X4, X2, X5, X3, X6, X7-unused, X8-unused, X9-unused, ...
## Number of terms at each degree of interaction: 1 11 (additive model)
## GCV 2.540556    RSS 397.9654    GRSq 0.8968524    RSq 0.9183982

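MARS selects six of the ten predictors, X1 through X6. That covers all five informative predictors (X1-X5) plus the uninformative X6, which ranks last in importance.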
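For context on which predictors are informative: the Friedman benchmark that mlbench.friedman1 simulates uses only the first five of ten uniform predictors. A rough base-R sketch of that generating process follows. The function name sim_friedman1 is hypothetical, and this is a re-implementation for illustration, not mlbench's own code.

```r
# Sketch of the Friedman 1 generating process:
# y = 10*sin(pi*x1*x2) + 20*(x3 - 0.5)^2 + 10*x4 + 5*x5 + noise
sim_friedman1 <- function(n, sd = 1) {
  x <- matrix(runif(n * 10), ncol = 10)
  colnames(x) <- paste0("X", 1:10)
  y <- 10 * sin(pi * x[, 1] * x[, 2]) +
    20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] +
    5 * x[, 5] +
    rnorm(n, sd = sd)
  list(x = as.data.frame(x), y = y)
}

set.seed(1)  # arbitrary seed, only to make the sketch repeatable
sim <- sim_friedman1(2000)

# Correlation of each predictor with y: X6..X10 should sit near zero.
# X3 enters symmetrically around 0.5, so its linear correlation is also small.
cors <- round(cor(sim$x, sim$y)[, 1], 2)
print(cors)
```

X6 through X10 never enter the formula, so any model that picks one of them is fitting noise.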

7.5

Below are the first six rows of the data used for modelling.

##   Yield BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
## 1 38.00                 6.25                49.58                56.97
## 2 42.44                 8.01                60.97                67.48
## 3 42.03                 8.01                60.97                67.48
## 4 41.42                 8.01                60.97                67.48
## 5 42.49                 7.47                63.33                72.25
## 6 43.57                 6.12                58.36                65.31
##   BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
## 1                12.74                19.51                43.73
## 2                14.65                19.36                53.14
## 3                14.65                19.36                53.14
## 4                14.65                19.36                53.14
## 5                14.02                17.91                54.66
## 6                15.17                21.79                51.23
##   BiologicalMaterial07 BiologicalMaterial08 BiologicalMaterial09
## 1                  100                16.66                11.44
## 2                  100                19.04                12.55
## 3                  100                19.04                12.55
## 4                  100                19.04                12.55
## 5                  100                18.22                12.80
## 6                  100                18.30                12.13
##   BiologicalMaterial10 BiologicalMaterial11 BiologicalMaterial12
## 1                 3.46               138.09                18.83
## 2                 3.46               153.67                21.05
## 3                 3.46               153.67                21.05
## 4                 3.46               153.67                21.05
## 5                 3.05               147.61                21.05
## 6                 3.78               151.88                20.76
##   ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03
## 1                     NA                     NA                     NA
## 2                    0.0                      0                     NA
## 3                    0.0                      0                     NA
## 4                    0.0                      0                     NA
## 5                   10.7                      0                     NA
## 6                   12.0                      0                     NA
##   ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06
## 1                     NA                     NA                     NA
## 2                    917                 1032.2                  210.0
## 3                    912                 1003.6                  207.1
## 4                    911                 1014.6                  213.3
## 5                    918                 1027.5                  205.7
## 6                    924                 1016.8                  208.9
##   ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09
## 1                     NA                     NA                  43.00
## 2                    177                    178                  46.57
## 3                    178                    178                  45.07
## 4                    177                    177                  44.92
## 5                    178                    178                  44.96
## 6                    178                    178                  45.32
##   ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12
## 1                     NA                     NA                     NA
## 2                     NA                     NA                      0
## 3                     NA                     NA                      0
## 4                     NA                     NA                      0
## 5                     NA                     NA                      0
## 6                     NA                     NA                      0
##   ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15
## 1                   35.5                   4898                   6108
## 2                   34.0                   4869                   6095
## 3                   34.8                   4878                   6087
## 4                   34.8                   4897                   6102
## 5                   34.6                   4992                   6233
## 6                   34.0                   4985                   6222
##   ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18
## 1                   4682                   35.5                   4865
## 2                   4617                   34.0                   4867
## 3                   4617                   34.8                   4877
## 4                   4635                   34.8                   4872
## 5                   4733                   33.9                   4886
## 6                   4786                   33.4                   4862
##   ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21
## 1                   6049                   4665                    0.0
## 2                   6097                   4621                    0.0
## 3                   6078                   4621                    0.0
## 4                   6073                   4611                    0.0
## 5                   6102                   4659                   -0.7
## 6                   6115                   4696                   -0.6
##   ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24
## 1                     NA                     NA                     NA
## 2                      3                      0                      3
## 3                      4                      1                      4
## 4                      5                      2                      5
## 5                      8                      4                     18
## 6                      9                      1                      1
##   ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27
## 1                   4873                   6074                   4685
## 2                   4869                   6107                   4630
## 3                   4897                   6116                   4637
## 4                   4892                   6111                   4630
## 5                   4930                   6151                   4684
## 6                   4871                   6128                   4687
##   ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30
## 1                   10.7                   21.0                    9.9
## 2                   11.2                   21.4                    9.9
## 3                   11.1                   21.3                    9.4
## 4                   11.1                   21.3                    9.4
## 5                   11.3                   21.6                    9.0
## 6                   11.4                   21.7                   10.1
##   ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33
## 1                   69.1                    156                     66
## 2                   68.7                    169                     66
## 3                   69.3                    173                     66
## 4                   69.3                    171                     68
## 5                   69.4                    171                     70
## 6                   68.2                    173                     70
##   ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
## 1                    2.4                    486                  0.019
## 2                    2.6                    508                  0.019
## 3                    2.6                    509                  0.018
## 4                    2.5                    496                  0.018
## 5                    2.5                    468                  0.017
## 6                    2.5                    490                  0.018
##   ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
## 1                    0.5                      3                    7.2
## 2                    2.0                      2                    7.2
## 3                    0.7                      2                    7.2
## 4                    1.2                      2                    7.2
## 5                    0.2                      2                    7.3
## 6                    0.4                      2                    7.2
##   ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42
## 1                     NA                     NA                   11.6
## 2                    0.1                   0.15                   11.1
## 3                    0.0                   0.00                   12.0
## 4                    0.0                   0.00                   10.6
## 5                    0.0                   0.00                   11.0
## 6                    0.0                   0.00                   11.5
##   ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45
## 1                    3.0                    1.8                    2.4
## 2                    0.9                    1.9                    2.2
## 3                    1.0                    1.8                    2.3
## 4                    1.1                    1.8                    2.1
## 5                    1.1                    1.7                    2.1
## 6                    2.2                    1.8                    2.0

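Here, I replace each NA with the mean of its column and then center and scale all of the data. Because Yield is standardized along with the predictors, the errors reported later are in standard-deviation units of Yield.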

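# Replace missing values in a vector with the mean of its observed values
na_to_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
# Apply the imputation to every column while keeping df a data frame
df <- replace(df, TRUE, lapply(df, na_to_mean))
# Estimate the centering and scaling parameters, then apply them
pre <- preProcess(df, method = c("center", "scale"))
data <- predict(pre, df)
head(data)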

##        Yield BiologicalMaterial01 BiologicalMaterial02
## 1 -1.1792673           -0.2261036           -1.5140979
## 2  1.2263678            2.2391498            1.3089960
## 3  1.0042258            2.2391498            1.3089960
## 4  0.6737219            2.2391498            1.3089960
## 5  1.2534583            1.4827653            1.8939391
## 6  1.8386128           -0.4081962            0.6620886
##   BiologicalMaterial03 BiologicalMaterial04 BiologicalMaterial05
## 1          -2.68303622            0.2201765            0.4941942
## 2          -0.05623504            1.2964386            0.4128555
## 3          -0.05623504            1.2964386            0.4128555
## 4          -0.05623504            1.2964386            0.4128555
## 5           1.13594780            0.9414412           -0.3734185
## 6          -0.59859075            1.5894524            1.7305423
##   BiologicalMaterial06 BiologicalMaterial07 BiologicalMaterial08
## 1           -1.3828880           -0.1313107            -1.233131
## 2            1.1290767           -0.1313107             2.282619
## 3            1.1290767           -0.1313107             2.282619
## 4            1.1290767           -0.1313107             2.282619
## 5            1.5348350           -0.1313107             1.071310
## 6            0.6192092           -0.1313107             1.189487
##   BiologicalMaterial09 BiologicalMaterial10 BiologicalMaterial11
## 1           -3.3962895            1.1005296            -1.838655
## 2           -0.7227225            1.1005296             1.393395
## 3           -0.7227225            1.1005296             1.393395
## 4           -0.7227225            1.1005296             1.393395
## 5           -0.1205678            0.4162193             0.136256
## 6           -1.7343424            1.6346255             1.022062
##   BiologicalMaterial12 ManufacturingProcess01 ManufacturingProcess02
## 1           -1.7709224              0.0000000               0.000000
## 2            1.0989855             -6.1673490              -1.986352
## 3            1.0989855             -6.1673490              -1.986352
## 4            1.0989855             -6.1673490              -1.986352
## 5            1.0989855             -0.2792335              -1.986352
## 6            0.7240877              0.4361451              -1.986352
##   ManufacturingProcess03 ManufacturingProcess04 ManufacturingProcess05
## 1                      0               0.000000             0.00000000
## 2                      0              -2.373764             1.00220071
## 3                      0              -3.172935             0.06264341
## 4                      0              -3.332769             0.42401160
## 5                      0              -2.213930             0.84779794
## 6                      0              -1.254926             0.49628524
##   ManufacturingProcess06 ManufacturingProcess07 ManufacturingProcess08
## 1              0.0000000              0.0000000              0.0000000
## 2              0.9680861             -0.9607689              0.8967295
## 3             -0.1124188              1.0408330              0.8967295
## 4              2.1976262             -0.9607689             -1.1151636
## 5             -0.6340418              1.0408330              0.8967295
## 6              0.5582394              1.0408330              0.8967295
##   ManufacturingProcess09 ManufacturingProcess10 ManufacturingProcess11
## 1             -1.7201524                      0                      0
## 2              0.5883746                      0                      0
## 3             -0.3815947                      0                      0
## 4             -0.4785917                      0                      0
## 5             -0.4527258                      0                      0
## 6             -0.2199332                      0                      0
##   ManufacturingProcess12 ManufacturingProcess13 ManufacturingProcess14
## 1               0.000000             0.97711512              0.8117224
## 2              -0.482073            -0.50030980              0.2783168
## 3              -0.482073             0.28765016              0.4438565
## 4              -0.482073             0.28765016              0.7933291
## 5              -0.482073             0.09066017              2.5406922
## 6              -0.482073            -0.50030980              2.4119391
##   ManufacturingProcess15 ManufacturingProcess16 ManufacturingProcess17
## 1              1.1846438              0.3303945              0.9263296
## 2              0.9617071              0.1455765             -0.2753953
## 3              0.8245152              0.1455765              0.3655246
## 4              1.0817499              0.1967569              0.3655246
## 5              3.3282665              0.4754056             -0.3555103
## 6              3.1396277              0.6261033             -0.7560852
##   ManufacturingProcess18 ManufacturingProcess19 ManufacturingProcess20
## 1              0.1505348              0.4563798              0.3109942
## 2              0.1559773              1.5095063              0.1849230
## 3              0.1831898              1.0926437              0.1849230
## 4              0.1695836              0.9829430              0.1562704
## 5              0.2076811              1.6192070              0.2938027
## 6              0.1423710              1.9044287              0.3998171
##   ManufacturingProcess21 ManufacturingProcess22 ManufacturingProcess23
## 1              0.2109804              0.0000000              0.0000000
## 2              0.2109804             -0.7243735             -1.8199757
## 3              0.2109804             -0.4232681             -1.2167640
## 4              0.2109804             -0.1221628             -0.6135524
## 5             -0.6884239              0.7811534              0.5928709
## 6             -0.5599376              1.0822588             -1.2167640
##   ManufacturingProcess24 ManufacturingProcess25 ManufacturingProcess26
## 1              0.0000000              0.1217705              0.1274689
## 2             -1.0088982              0.1109041              0.1994933
## 3             -0.8359725              0.1869689              0.2191363
## 4             -0.6630467              0.1733859              0.2082235
## 5              1.5849880              0.2766168              0.2955257
## 6             -1.3547497              0.1163373              0.2453269
##   ManufacturingProcess27 ManufacturingProcess28 ManufacturingProcess29
## 1              0.3510871              0.7940899              0.6030010
## 2              0.1934449              0.8907371              0.8469115
## 3              0.2135084              0.8714077              0.7859338
## 4              0.1934449              0.8714077              0.7859338
## 5              0.3482209              0.9100666              0.9688667
## 6              0.3568195              0.9293960              1.0298443
##   ManufacturingProcess30 ManufacturingProcess31 ManufacturingProcess32
## 1              0.7677420             -0.1981058             -0.4568829
## 2              0.7677420             -0.2711540              1.9517531
## 3              0.2480117             -0.1615817              2.6928719
## 4              0.2480117             -0.1615817              2.3223125
## 5             -0.1677726             -0.1433197              2.3223125
## 6              0.9756342             -0.3624642              2.6928719
##   ManufacturingProcess33 ManufacturingProcess34 ManufacturingProcess35
## 1               1.003470             -1.7453870            -0.89989600
## 2               1.003470              1.9853777             1.16311970
## 3               1.003470              1.9853777             1.25689314
## 4               1.820581              0.1199954             0.03783841
## 5               2.637692              0.1199954            -2.58781794
## 6               2.637692              0.1199954            -0.52480224
##   ManufacturingProcess36 ManufacturingProcess37 ManufacturingProcess38
## 1             -0.6653513             -1.1540243              0.7174727
## 2             -0.6653513              2.2161351             -0.8224687
## 3             -1.8263214             -0.7046697             -0.8224687
## 4             -1.8263214              0.4187168             -0.8224687
## 5             -2.9872915             -1.8280562             -0.8224687
## 6             -1.8263214             -1.3787016             -0.8224687
##   ManufacturingProcess39 ManufacturingProcess40 ManufacturingProcess41
## 1              0.2317270              0.0000000              0.0000000
## 2              0.2317270              2.1552636              2.3529953
## 3              0.2317270             -0.4639804             -0.4418521
## 4              0.2317270             -0.4639804             -0.4418521
## 5              0.2981503             -0.4639804             -0.4418521
## 6              0.2317270             -0.4639804             -0.4418521
##   ManufacturingProcess42 ManufacturingProcess43 ManufacturingProcess44
## 1             0.20279570             2.40564734            -0.01588055
## 2            -0.05472265            -0.01374656             0.29467248
## 3             0.40881037             0.10146268            -0.01588055
## 4            -0.31224099             0.21667191            -0.01588055
## 5            -0.10622632             0.21667191            -0.32643359
## 6             0.15129203             1.48397347            -0.01588055
##   ManufacturingProcess45
## 1             0.64371849
## 2             0.15220242
## 3             0.39796046
## 4            -0.09355562
## 5            -0.09355562
## 6            -0.33931365
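The imputation and scaling steps can be checked on a toy data frame. This is a minimal sketch that uses base R's scale() as a stand-in for caret's preProcess, on the assumption that both apply the usual mean and standard-deviation standardization. The toy values are made up.

```r
# Mean imputation as defined above
na_to_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Toy data frame (made up) with one missing value per column
toy <- data.frame(a = c(1, 2, NA, 5), b = c(10, NA, 30, 50))

# Impute column by column, keeping the data.frame structure
toy[] <- lapply(toy, na_to_mean)

# Center and scale with base R; a stand-in for caret's preProcess()
scaled <- as.data.frame(scale(toy))

print(toy)
print(round(colMeans(scaled), 10))
print(apply(scaled, 2, sd))
```

An imputed cell equals its column mean, so it lands at exactly 0 after centering. That is why the rows that were NA in the raw data show 0.0000000 in the output above.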

Here, I create my training and test sets as well as label sets.

all_indexes <- 1:176
training_index <- sample(all_indexes, size = 140)
test_index <- setdiff(all_indexes, training_index)
training <- data[training_index, ]
testing <- data[test_index, ]
# Keep the outcome as one-column data frames
test_labels <- testing['Yield']
train_labels <- training['Yield']
# Predictors only: drop the outcome column
features <- training
features['Yield'] <- NULL
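The split above draws a fresh random sample on every run. A small sketch of the same 140/36 split made repeatable with a seed, on a toy stand-in for the data (the seed value and the toy columns are assumptions for illustration):

```r
# Toy stand-in for the 176-row data set (values are arbitrary)
toy <- data.frame(Yield = rnorm(176), x1 = rnorm(176))

set.seed(42)  # assumption: any fixed seed makes the split repeatable
n <- nrow(toy)
training_index <- sample(seq_len(n), size = 140)
test_index <- setdiff(seq_len(n), training_index)

training <- toy[training_index, ]
testing  <- toy[test_index, ]

# Sanity checks: sizes add up and no row is in both sets
print(c(train = nrow(training), test = nrow(testing)))
print(length(intersect(training_index, test_index)))
```

Using nrow() instead of a hard-coded 176 keeps the split valid if rows are added or dropped upstream.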

Below we build a neural net.

library(neuralnet, quietly = TRUE)

model1 <- neuralnet(Yield ~ ., data = training, hidden = c(5, 3))
#plot(model1)
tmp <- compute(model1, testing)
pred1 <- tmp$net.result
# Note: this is the sum of absolute errors over the test rows, not a root mean squared error
rmse1 <- sum(((pred1 - test_labels)^2)^.5)

Then, we build a KNN model. Here we find that the optimal number of neighbors is 5.

model2 <- train(x = features,
                y = train_labels$Yield,
                method = "knn",
                preProc = c("center", "scale"),
                tuneLength = 10)
plot(model2$results$RMSE)
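With a numeric outcome such as Yield, KNN works as a regressor: the prediction is the average outcome of the k closest training rows. A bare-bones base-R sketch of that idea follows. knn_predict is a hypothetical helper written for illustration and is not how caret implements it.

```r
# Minimal k-nearest-neighbors regression for illustration only.
# knn_predict is a hypothetical helper, not part of caret.
knn_predict <- function(train_x, train_y, new_x, k) {
  train_x <- as.matrix(train_x)
  new_x <- as.matrix(new_x)
  apply(new_x, 1, function(row) {
    # Euclidean distance from this point to every training row
    d <- sqrt(rowSums(sweep(train_x, 2, row)^2))
    # Average the outcomes of the k closest training rows
    mean(train_y[order(d)[seq_len(k)]])
  })
}

# Toy data: y equals x, so predictions should be close to the query points
train_x <- data.frame(x = 1:10)
train_y <- as.numeric(1:10)
pred <- knn_predict(train_x, train_y, data.frame(x = c(2, 9)), k = 3)
print(pred)
```

Larger k averages over more neighbors, which smooths the predictions at the cost of local detail. That trade-off is what the tuning grid explores.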

Model 3 was a MARS model.

model3 <- earth(x  = features , y = train_labels$Yield)
pred3 <- predict(model3, testing)
rmse3 <- sum(((pred3 - test_labels)^2)^.5)
rmse3

## [1] 21.57898
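The value printed above comes from sum(((pred - obs)^2)^.5), which adds up absolute errors rather than taking a root mean square. With 36 test rows (176 minus 140), 21.57898 corresponds to a mean absolute error of about 0.60 on the standardized Yield scale. A small sketch contrasting the three quantities on made-up numbers (the helper names rmse and mae are illustrative):

```r
# Root mean squared error and mean absolute error
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
mae  <- function(pred, obs) mean(abs(pred - obs))

obs  <- c(1, 2, 3, 4)
pred <- c(1, 2, 3, 8)   # one large error of 4

sum_abs <- sum(((pred - obs)^2)^.5)  # sum of absolute errors, as in the code above
print(sum_abs)
print(mae(pred, obs))
print(rmse(pred, obs))
```

The sum grows with the number of test rows, so it is only comparable across models scored on the same test set. RMSE also weights large errors more heavily than the absolute-error measures do.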

Below is an SVM, using the polynomial kernel.

library(e1071, quietly = TRUE)

model4 <- svm(Yield ~ ., data = training, kernel = "polynomial")
pred4 <- predict(model4, testing)
rmse4 <- sum(((pred4 - test_labels)^2)^.5)
rmse4

## [1] 23.71598

Below we compare the results of three of the models: the neural network, MARS, and the SVM.

barplot(c(rmse1, rmse3, rmse4), names.arg = c('NN', 'MARS', 'SVM'), main = "RMSE for Various Models")

KNN is not shown because no test-set error was computed for it; caret fits KNN as a regression model when the outcome is numeric, so it could be scored the same way. Both MARS and the NN perform well with the data, but the NN wins slightly.

simplymathematics

April 7, 2019
