DATA 624 Homework 8

Question 7.2

Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:

\[y = 10 \sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)\]

where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates 5 other non-informative variables). The package mlbench contains a function called mlbench.friedman1 that simulates these data:

library(mlbench)
library(caret)  ## provides featurePlot(), used below

set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
## We convert the 'x' data from a matrix to a data frame
## One reason is that this will give the columns names.
trainingData$x <- data.frame(trainingData$x)
## Look at the data using
featurePlot(trainingData$x, trainingData$y)

## or other methods.

## This creates a list with a vector 'y' and a matrix
## of predictors 'x'.  Also simulate a large test set to
## estimate the true error rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)
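
To make the simulation equation concrete, here is a minimal base-R sketch that generates data of the same form by hand. The helper name `simulate_friedman` and its arguments are hypothetical (not part of mlbench), and it is written directly from the equation above rather than copied from the package, so it will not reproduce the exact values used in this homework.

```r
## Hand-rolled sketch of the Friedman (1991) benchmark equation.
## `simulate_friedman` is a hypothetical helper, not an mlbench function.
simulate_friedman <- function(n, sd = 1, n_noise = 5) {
  p <- 5 + n_noise                          # 5 informative + non-informative columns
  x <- matrix(runif(n * p), ncol = p)       # every predictor is uniform on [0, 1]
  colnames(x) <- paste0("X", seq_len(p))
  signal <- 10 * sin(pi * x[, 1] * x[, 2]) +
    20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] +
    5 * x[, 5]
  ## Only X1-X5 enter the signal; the remaining columns are pure noise predictors
  list(x = data.frame(x), y = signal + rnorm(n, sd = sd))
}

set.seed(200)
sim <- simulate_friedman(200, sd = 1)
dim(sim$x)
summary(sim$y)
```

Because X6 through X10 never appear in the signal, a model that performs sensible feature selection should give them little or no importance.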

Tune several models on these data. For example:

KNN MODEL

library(caret)
knnModel <- train(x = trainingData$x, 
                  y = trainingData$y,
                  method = "knn",
                  preProcess = c("center", "scale"),
                  tuneLength = 10)
knnModel

## k-Nearest Neighbors 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  3.466085  0.5121775  2.816838
##    7  3.349428  0.5452823  2.727410
##    9  3.264276  0.5785990  2.660026
##   11  3.214216  0.6024244  2.603767
##   13  3.196510  0.6176570  2.591935
##   15  3.184173  0.6305506  2.577482
##   17  3.183130  0.6425367  2.567787
##   19  3.198752  0.6483184  2.592683
##   21  3.188993  0.6611428  2.588787
##   23  3.200458  0.6638353  2.604529
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 17.

knnPred <- predict(knnModel, newdata = testData$x)
## The function 'postResample' can be used to get the test set
## performance values
postResample(pred = knnPred, obs = testData$y)

##      RMSE  Rsquared       MAE 
## 3.2040595 0.6819919 2.5683461
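
For reference, the three numbers reported above can be computed by hand. The sketch below is a hypothetical helper (`manual_post_resample` is not a caret function) that assumes Rsquared is the squared correlation between observed and predicted values, which is how caret reports it by default for regression; the toy vectors are made up for illustration.

```r
## Hypothetical helper that mirrors the three regression metrics reported above.
## Assumption: Rsquared is the squared Pearson correlation of obs and pred.
manual_post_resample <- function(pred, obs) {
  c(RMSE     = sqrt(mean((obs - pred)^2)),   # root mean squared error
    Rsquared = cor(obs, pred)^2,             # squared correlation
    MAE      = mean(abs(obs - pred)))        # mean absolute error
}

## Toy values for illustration only (not from the homework data)
toy_obs  <- c(3, 5, 7, 9)
toy_pred <- c(2, 5, 8, 9)
toy_metrics <- manual_post_resample(toy_pred, toy_obs)
toy_metrics
```

RMSE penalizes large errors more heavily than MAE, which is why the two can rank models differently when a model makes a few large misses.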

Which models appear to give the best performance? Does MARS select the informative predictors (those named X1-X5)?

MARS MODEL

library(earth)

## tuneGrid fully specifies the candidate models, so tuneLength is not needed
MARS_grid <- expand.grid(.degree = 1:2, .nprune = 2:15)
MARS_model <- train(x = trainingData$x, 
                    y = trainingData$y,
                    method = "earth",
                    tuneGrid = MARS_grid,
                    preProcess = c("center", "scale"))
MARS_model

## Multivariate Adaptive Regression Spline 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE     
##   1        2      4.383438  0.2405683  3.597961
##   1        3      3.645469  0.4745962  2.930453
##   1        4      2.727602  0.7035031  2.184240
##   1        5      2.449243  0.7611230  1.939231
##   1        6      2.331605  0.7835496  1.833420
##   1        7      1.976830  0.8421599  1.562591
##   1        8      1.870959  0.8585503  1.464551
##   1        9      1.804342  0.8683110  1.410395
##   1       10      1.787676  0.8711960  1.386944
##   1       11      1.790700  0.8707740  1.393076
##   1       12      1.821005  0.8670619  1.419893
##   1       13      1.858688  0.8617344  1.445459
##   1       14      1.862343  0.8623072  1.446050
##   1       15      1.871033  0.8607099  1.457618
##   2        2      4.383438  0.2405683  3.597961
##   2        3      3.644919  0.4742570  2.929647
##   2        4      2.730222  0.7028372  2.183075
##   2        5      2.481291  0.7545789  1.965749
##   2        6      2.338369  0.7827873  1.825542
##   2        7      2.030065  0.8328250  1.602024
##   2        8      1.890997  0.8551326  1.477422
##   2        9      1.742626  0.8757904  1.371910
##   2       10      1.608221  0.8943432  1.255416
##   2       11      1.474325  0.9111463  1.157848
##   2       12      1.437483  0.9157967  1.120977
##   2       13      1.439395  0.9164721  1.128309
##   2       14      1.428565  0.9184503  1.118634
##   2       15      1.434093  0.9182413  1.121622
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 14 and degree = 2.

Look at the Performance of the MARS Model

MARS_predictions <- predict(MARS_model, newdata = testData$x)
postResample(pred = MARS_predictions, obs = testData$y)

##      RMSE  Rsquared       MAE 
## 1.2779993 0.9338365 1.0147070

Look at Variable Importance

varImp(MARS_model)

## earth variable importance
## 
##     Overall
## X1   100.00
## X4    84.98
## X2    68.87
## X5    48.55
## X3    38.96
## X7     0.00
## X9     0.00
## X6     0.00
## X10    0.00
## X8     0.00
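
The question of whether MARS selected the informative predictors can be checked directly from the importance table. The sketch below simply retypes the values printed above into a named vector (the object names are illustrative) and compares the predictors with nonzero importance against X1 through X5.

```r
## Importance values copied from the varImp(MARS_model) output above
mars_importance <- c(X1 = 100.00, X4 = 84.98, X2 = 68.87, X5 = 48.55, X3 = 38.96,
                     X7 = 0, X9 = 0, X6 = 0, X10 = 0, X8 = 0)

## Predictors the final MARS model actually uses (nonzero importance)
selected <- names(mars_importance)[mars_importance > 0]

## The five predictors that appear in the simulation equation
informative <- paste0("X", 1:5)

selected
setequal(selected, informative)  # TRUE: exactly X1-X5 were selected
```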

SVM MODEL

library(kernlab)

SVM_model <- train(x = trainingData$x,
                   y = trainingData$y,
                   method = "svmRadial",
                   preProcess = c("center", "scale"),
                   tuneLength = 10,
                   trControl = trainControl(method = "cv"))
SVM_model

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   C       RMSE      Rsquared   MAE     
##     0.25  2.534047  0.7923747  2.033670
##     0.50  2.288685  0.8061041  1.833890
##     1.00  2.140818  0.8219731  1.702844
##     2.00  2.065252  0.8337042  1.632423
##     4.00  1.977030  0.8470946  1.576586
##     8.00  1.919041  0.8548963  1.533203
##    16.00  1.915333  0.8556587  1.532406
##    32.00  1.915333  0.8556587  1.532406
##    64.00  1.915333  0.8556587  1.532406
##   128.00  1.915333  0.8556587  1.532406
## 
## Tuning parameter 'sigma' was held constant at a value of 0.06670077
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06670077 and C = 16.

Results of the SVM Model and Variable Importance (caret's model-independent loess R-squared filter, not an SVM-specific measure)

SVM_predictions <- predict(SVM_model, newdata = testData$x)
postResample(pred = SVM_predictions, obs = testData$y)

##      RMSE  Rsquared       MAE 
## 2.0829707 0.8242096 1.5826017

varImp(SVM_model)

## loess r-squared variable importance
## 
##      Overall
## X4  100.0000
## X1   95.5047
## X2   89.6186
## X5   45.2170
## X3   29.9330
## X9    6.3299
## X10   5.5182
## X8    3.2527
## X6    0.8884
## X7    0.0000

Neural Network Model

nnet_grid <- expand.grid(.decay = c(0, 0.01, .1), .size = c(1:10), .bag = FALSE)
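## A network with h hidden units and p predictors needs h * (p + 1) + h + 1 weights.
## Note: this limit is sized for 5 hidden units, so grid sizes 6-10 exceed it and fail to fit.
nnet_maxnwts <- 5 * (ncol(trainingData$x) + 1) + 5 + 1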
nnet_model <- train(x = trainingData$x,
                    y = trainingData$y,
                    method = "avNNet",
                    preProcess = c("center", "scale"),
                    tuneGrid = nnet_grid,
                    trControl = trainControl(method = "cv"),
                    linout = TRUE,
                    trace = FALSE,
                    MaxNWts = nnet_maxnwts,
                    maxit = 500)

## Warning: executing %dopar% sequentially: no parallel backend registered

## Warning: model fit failed for Fold01: decay=0.00, size= 6, bag=FALSE Error in { : task 1 failed - "too many (73) weights"

## [The same warning was repeated for each of Fold01 through Fold10, for every decay value (0.00, 0.01, 0.10) and every size from 6 to 10, reporting "too many" weights of 73, 85, 97, 109, and 121 respectively.]

## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
## There were missing values in resampled performance measures.

## Warning in train.default(x = trainingData$x, y = trainingData$y, method =
## "avNNet", : missing values found in aggregated results

nnet_model

## Model Averaged Neural Network 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE      Rsquared   MAE     
##   0.00    1    2.434946  0.7705241  1.890180
##   0.00    2    2.464028  0.7636460  1.954264
##   0.00    3    1.952687  0.8503745  1.569414
##   0.00    4    1.993307  0.8353695  1.564593
##   0.00    5    2.163580  0.8268061  1.687171
##   0.00    6         NaN        NaN       NaN
##   0.00    7         NaN        NaN       NaN
##   0.00    8         NaN        NaN       NaN
##   0.00    9         NaN        NaN       NaN
##   0.00   10         NaN        NaN       NaN
##   0.01    1    2.430323  0.7698516  1.889068
##   0.01    2    2.443860  0.7685031  1.905649
##   0.01    3    2.128012  0.8220171  1.660445
##   0.01    4    2.087196  0.8291606  1.598100
##   0.01    5    2.033927  0.8350220  1.596082
##   0.01    6         NaN        NaN       NaN
##   0.01    7         NaN        NaN       NaN
##   0.01    8         NaN        NaN       NaN
##   0.01    9         NaN        NaN       NaN
##   0.01   10         NaN        NaN       NaN
##   0.10    1    2.437162  0.7674718  1.890319
##   0.10    2    2.397189  0.7716524  1.858708
##   0.10    3    2.108429  0.8240924  1.655143
##   0.10    4    2.084801  0.8300933  1.582811
##   0.10    5    2.078994  0.8321068  1.643078
##   0.10    6         NaN        NaN       NaN
##   0.10    7         NaN        NaN       NaN
##   0.10    8         NaN        NaN       NaN
##   0.10    9         NaN        NaN       NaN
##   0.10   10         NaN        NaN       NaN
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 3, decay = 0 and bag = FALSE.
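
The NaN rows for sizes 6 to 10 follow from the weight limit passed as MaxNWts. The sketch below reproduces the weight counts quoted in the warnings, assuming a single hidden layer, one linear output, and no skip-layer connections; `count_weights` is a hypothetical helper, not an nnet function.

```r
## Weights in a single-hidden-layer network with p inputs and h hidden units:
## h * (p + 1) into the hidden layer (including biases) plus h + 1 into the output.
## `count_weights` is a hypothetical helper for illustration.
count_weights <- function(h, p = 10) h * (p + 1) + h + 1

sizes <- 1:10
weights_needed <- count_weights(sizes)

## The limit used in the train() call above was computed for 5 hidden units
limit <- count_weights(5)

data.frame(size    = sizes,
           weights = weights_needed,
           fits    = weights_needed <= limit)
```

Sizes 6 through 10 need 73, 85, 97, 109, and 121 weights, all above the limit of 61, so only sizes 1 to 5 were actually evaluated. Raising the limit to `count_weights(10)` would let the whole grid fit, although the resampling results and the selected model could then differ from those shown here.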

Results of the Neural Net Model and Variable Importance (the model-independent loess R-squared filter, which is why the ranking is identical to the one in the SVM section)

nnet_predictions <- predict(nnet_model, newdata = testData$x)
postResample(pred = nnet_predictions, obs = testData$y)

##      RMSE  Rsquared       MAE 
## 1.8234934 0.8683951 1.3971626

varImp(nnet_model)

## loess r-squared variable importance
## 
##      Overall
## X4  100.0000
## X1   95.5047
## X2   89.6186
## X5   45.2170
## X3   29.9330
## X9    6.3299
## X10   5.5182
## X8    3.2527
## X6    0.8884
## X7    0.0000

Random Forest Model

## 10-fold cross-validation, repeated 3 times
control <- trainControl(method = "repeatedcv",
                        number = 10,
                        repeats = 3)
## This is a regression problem, so models are compared on RMSE
metric <- "RMSE"
set.seed(123)
## mtry is the number of predictors randomly sampled at each split;
## try every value from 1 to 10
tunegrid <- expand.grid(.mtry = 1:10)
rf_default <- train(x = trainingData$x,
                    y = trainingData$y,
                    method = "rf",
                    metric = metric,
                    tuneGrid = tunegrid,
                    trControl = control)
rf_default

## Random Forest 
## 
## 200 samples
##  10 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE      Rsquared   MAE     
##    1    3.285238  0.7900593  2.700433
##    2    2.804792  0.8246672  2.321143
##    3    2.577537  0.8258097  2.132428
##    4    2.487705  0.8169925  2.049686
##    5    2.415135  0.8174696  1.995772
##    6    2.398431  0.8098909  1.976713
##    7    2.390828  0.8031119  1.966884
##    8    2.378295  0.8035940  1.953366
##    9    2.375660  0.8014050  1.945577
##   10    2.372410  0.7990464  1.934024
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 10.

rf_predictions <- predict(rf_default, newdata = testData$x)
postResample(pred = rf_predictions , obs = testData$y)

##      RMSE  Rsquared       MAE 
## 2.3902120 0.7910494 1.8855316

Look at the Results of the Models

library(dplyr)  ## provides %>%, mutate(), bind_rows(), select(), arrange()
results <- data.frame(t(postResample(pred = knnPred, obs = testData$y))) %>% 
  mutate("Model" = "KNN")

results <- data.frame(t(postResample(pred = MARS_predictions, obs = testData$y))) %>%
  mutate("Model"= "MARS") %>%
  bind_rows(results)

results <- data.frame(t(postResample(pred = SVM_predictions, obs = testData$y))) %>%
  mutate("Model"= "SVM") %>%
  bind_rows(results)

results <- data.frame(t(postResample(pred = nnet_predictions, obs = testData$y))) %>%
  mutate("Model"= "Neural Network") %>%
  bind_rows(results)


results <- data.frame(t(postResample(pred = rf_predictions, obs = testData$y))) %>%
  mutate("Model"= "Random Forest") %>%
  bind_rows(results)

results %>%
  select(Model, RMSE, Rsquared, MAE) %>%
  arrange(RMSE)

##            Model     RMSE  Rsquared      MAE
## 1           MARS 1.277999 0.9338365 1.014707
## 2 Neural Network 1.823493 0.8683951 1.397163
## 3            SVM 2.082971 0.8242096 1.582602
## 4  Random Forest 2.390212 0.7910494 1.885532
## 5            KNN 3.204059 0.6819919 2.568346

rm(results)

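MARS was the top-performing model on the test set, with the lowest RMSE (1.28) and the highest R-squared (0.93), followed by the neural network, SVM, random forest, and KNN.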
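
The three columns in the comparison table can be reproduced by hand, which makes it clearer what is being ranked. The sketch below is a minimal base R version of the quantities that postResample reports for a numeric outcome; `regression_metrics` is a hypothetical helper name, and the observed and predicted values are toy numbers made up for illustration, not output from any model above.

```r
# Hypothetical helper (not part of caret): RMSE, squared correlation, and MAE
regression_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),  # root mean squared error
    Rsquared = cor(pred, obs)^2,            # squared Pearson correlation
    MAE      = mean(abs(pred - obs)))       # mean absolute error
}

# Toy values, invented for illustration only
obs  <- c(10, 12, 14, 16)
pred <- c(11, 11, 15, 15)

m <- regression_metrics(pred, obs)
print(m)
```

Because Rsquared here is a squared correlation, a model can score well on it while still being biased, so RMSE is the safer column to rank on.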

Question 7.5

Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.

Pre-Processing from last Homework

library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)
library(RANN)

## Warning: package 'RANN' was built under R version 3.6.3

knn_model <- preProcess(ChemicalManufacturingProcess, "knnImpute")
df <- predict(knn_model, ChemicalManufacturingProcess)


df <- df %>%
  select_at(vars(-one_of(nearZeroVar(., names = TRUE))))

in_train <- createDataPartition(df$Yield, times = 1, p = 0.8, list = FALSE)
train_df <- df[in_train, ]
test_df <- df[-in_train, ]
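
The split above is drawn without a seed, so train_df and test_df will differ from run to run. The following is a minimal sketch of a reproducible 80/20 split in base R. It is a simple random split, not the stratified sampling that createDataPartition performs; `split_rows` is a hypothetical helper, the seed value is arbitrary, and `toy` is a stand-in data frame rather than the chemical manufacturing data.

```r
# Hypothetical helper: return the row indices of a reproducible training set
split_rows <- function(n, p = 0.8, seed = 624) {
  set.seed(seed)                        # fixing the seed makes the split repeatable
  sort(sample(seq_len(n), size = floor(p * n)))
}

# Stand-in data frame, invented for illustration only
toy <- data.frame(Yield = rnorm(50), x1 = runif(50))

idx       <- split_rows(nrow(toy))
toy_train <- toy[idx, ]
toy_test  <- toy[-idx, ]

# Calling the helper again with the same seed returns the same rows
idx_again <- split_rows(nrow(toy))
```

With caret, the equivalent change would be a set.seed() call immediately before createDataPartition.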

(a) Which nonlinear regression model gives the optimal resampling and test set performance?

Partial Least Squares

pls_model <- train(
  Yield ~ ., data = train_df, method = "pls",
  center = TRUE,
  scale = TRUE,
  trControl = trainControl("cv", number = 10),
  tuneLength = 25
)

pls_model

## Partial Least Squares 
## 
## 144 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 130, 129, 129, 129, 129, 130, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE       Rsquared   MAE      
##    1     0.8576804  0.4131837  0.6794185
##    2     0.8714590  0.5433573  0.6255461
##    3     0.7401377  0.6075969  0.5586644
##    4     0.8679051  0.5975357  0.5941941
##    5     1.0289739  0.5701991  0.6415899
##    6     1.1090917  0.5439065  0.6818868
##    7     1.1852897  0.5239455  0.7239929
##    8     1.3302896  0.5122051  0.7729815
##    9     1.4834766  0.5015878  0.8180053
##   10     1.6424755  0.4951390  0.8668867
##   11     1.7779462  0.4881766  0.9190981
##   12     1.8586315  0.4853637  0.9600555
##   13     1.9480507  0.4705111  0.9979606
##   14     1.9960252  0.4683392  1.0132388
##   15     2.0690262  0.4642428  1.0292437
##   16     2.1146237  0.4645513  1.0354379
##   17     2.1263805  0.4695277  1.0267929
##   18     2.1287293  0.4682780  1.0240099
##   19     2.1602221  0.4736577  1.0352071
##   20     2.2157738  0.4741899  1.0563007
##   21     2.2317479  0.4790634  1.0623177
##   22     2.2444048  0.4820692  1.0632880
##   23     2.2729578  0.4821992  1.0708513
##   24     2.3069273  0.4837630  1.0781972
##   25     2.3083085  0.4881668  1.0818909
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was ncomp = 3.

pls_predictions <- predict(pls_model, test_df)

results <- data.frame(t(postResample(pred = pls_predictions, obs = test_df$Yield))) %>%
  mutate("Model"= "PLS")

KNN Model

knn_model <- train(
  Yield ~ ., data = train_df, method = "knn",
  center = TRUE,
  scale = TRUE,
  trControl = trainControl("cv", number = 10),
  tuneLength = 25
)
knn_model

## k-Nearest Neighbors 
## 
## 144 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 130, 128, 129, 130, 131, 128, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    5  0.6692539  0.5938108  0.5308364
##    7  0.7283808  0.5290278  0.5930450
##    9  0.7417540  0.5207871  0.6080960
##   11  0.7430073  0.5286696  0.6087160
##   13  0.7627158  0.5055015  0.6285991
##   15  0.7762147  0.4889522  0.6399524
##   17  0.7893038  0.4699167  0.6512046
##   19  0.7979458  0.4569047  0.6564169
##   21  0.8060648  0.4594142  0.6602083
##   23  0.8137100  0.4545218  0.6691173
##   25  0.8190099  0.4447262  0.6695808
##   27  0.8244290  0.4426863  0.6732239
##   29  0.8350223  0.4234077  0.6810915
##   31  0.8386291  0.4216373  0.6887168
##   33  0.8413266  0.4275678  0.6899586
##   35  0.8469036  0.4239683  0.6940296
##   37  0.8534702  0.4101826  0.6974884
##   39  0.8579448  0.3973243  0.7022286
##   41  0.8607191  0.3965386  0.7027618
##   43  0.8656928  0.3870763  0.7067856
##   45  0.8699861  0.3833272  0.7106253
##   47  0.8738008  0.3776011  0.7128646
##   49  0.8777531  0.3721661  0.7168320
##   51  0.8803736  0.3651827  0.7191199
##   53  0.8821089  0.3716089  0.7223133
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.

knn_predictions <- predict(knn_model, test_df)

results <- data.frame(t(postResample(pred = knn_predictions, obs = test_df$Yield))) %>%
  mutate("Model"= "KNN") %>% rbind(results)

ggplot(knn_model, highlight = TRUE) + 
  labs(title = paste0("Tuning profile: ", knn_model$modelInfo$label))

MARS Model

MARS_grid <- expand.grid(.degree = 1:2, .nprune = 2:15)

MARS_model <- train(
  Yield ~ ., data = train_df, method = "earth",
  tuneGrid = MARS_grid,
  # center/scale are left commented out: uncommenting them throws an error
  # center = TRUE,
  # scale = TRUE,
  trControl = trainControl("cv", number = 10)
  # tuneLength is omitted because it is ignored when tuneGrid is supplied
)
MARS_model

## Multivariate Adaptive Regression Spline 
## 
## 144 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 128, 130, 128, 132, 129, 131, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE       Rsquared   MAE      
##   1        2      0.7833357  0.4136435  0.6272108
##   1        3      0.6598706  0.5945532  0.5310295
##   1        4      0.6426884  0.6206480  0.5152161
##   1        5      0.6597011  0.6079388  0.5322105
##   1        6      0.6692290  0.5904944  0.5468852
##   1        7      0.6591737  0.6129292  0.5396127
##   1        8      0.6366928  0.6372710  0.5231687
##   1        9      0.6415472  0.6233287  0.5261385
##   1       10      0.6547744  0.6022351  0.5331004
##   1       11      0.6707619  0.6029028  0.5408217
##   1       12      0.7092721  0.5769486  0.5682464
##   1       13      0.7094381  0.5813564  0.5658792
##   1       14      0.7123360  0.5839616  0.5682336
##   1       15      0.7103324  0.5835317  0.5687466
##   2        2      0.8310258  0.3617201  0.6600760
##   2        3      0.7093194  0.5379718  0.5688216
##   2        4      0.6724856  0.5666371  0.5503680
##   2        5      0.6853900  0.5633715  0.5515835
##   2        6      0.6826577  0.5684930  0.5621065
##   2        7      0.6997120  0.5390960  0.5776795
##   2        8      0.6994303  0.5484022  0.5731253
##   2        9      0.7468681  0.5218637  0.5932661
##   2       10      0.7599665  0.5201282  0.6012816
##   2       11      0.7499884  0.5302305  0.5978698
##   2       12      0.7376165  0.5447203  0.6009713
##   2       13      0.7529652  0.5490325  0.6067416
##   2       14      0.7903431  0.5217560  0.6423770
##   2       15      0.8007415  0.5162239  0.6373339
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 8 and degree = 1.

MARS_predictions <- predict(MARS_model, test_df)

results <- data.frame(t(postResample(pred = MARS_predictions, obs=test_df$Yield))) %>%
  mutate("Model"= "MARS") %>% rbind(results)

ggplot(MARS_model, highlight = TRUE) + 
  labs(title = paste0("Tuning profile: ", MARS_model$modelInfo$label))

# Variable Importance
ggplot(varImp(MARS_model), top = 20) + 
  labs(title = paste0("Variable importance: ", MARS_model$modelInfo$label))

Random Forest Model

# 10-fold cross-validation, repeated 3 times
control <- trainControl(method = "repeatedcv",
                        number = 10,
                        repeats = 3)
set.seed(123)
# mtry is the number of predictors randomly sampled at each split;
# try every value from 1 to 10 and keep the one with the lowest RMSE
tunegrid <- expand.grid(.mtry = 1:10)
rf_default <- train(Yield ~ ., data = train_df,
                    method = "rf",
                    metric = "RMSE",
                    tuneGrid = tunegrid,
                    trControl = control)
rf_default

## Random Forest 
## 
## 144 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 128, 129, 129, 130, 128, 131, ... 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE       Rsquared   MAE      
##    1    0.7500279  0.5894811  0.6083494
##    2    0.6942100  0.6324708  0.5612511
##    3    0.6738387  0.6449603  0.5443797
##    4    0.6610700  0.6565940  0.5319181
##    5    0.6561592  0.6545893  0.5274493
##    6    0.6546707  0.6529865  0.5231429
##    7    0.6513045  0.6560895  0.5211297
##    8    0.6476475  0.6562952  0.5170142
##    9    0.6495370  0.6492577  0.5154299
##   10    0.6455195  0.6529822  0.5146515
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 10.

RF_predictions <- predict(rf_default, test_df)

results <- data.frame(t(postResample(pred = RF_predictions, obs=test_df$Yield))) %>%
  mutate("Model"= "RF") %>% rbind(results)


ggplot(rf_default, highlight = TRUE) + 
  labs(title = paste0("Tuning profile: ", rf_default$modelInfo$label))

# Variable Importance
ggplot(varImp(rf_default), top = 20) + 
  labs(title = paste0("Variable importance: ", rf_default$modelInfo$label))

Neural Net Model

nnetGrid <- expand.grid(decay = c(0, 0.01, 0.1), size = 1:10, bag = FALSE)
prep <- c("center", "scale", "nzv")
ctrl <- trainControl(method = "cv", number = 10)
nnetModel <- train(Yield ~ ., data = train_df,
                   preProcess = prep, 
                   trControl = ctrl, 
                   method = "avNNet", 
                   linout = TRUE, trace = FALSE, 
                   tuneGrid = nnetGrid)
nnetModel

## Model Averaged Neural Network 
## 
## 144 samples
##  56 predictor
## 
## Pre-processing: centered (56), scaled (56) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 130, 129, 130, 130, 131, 128, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE       Rsquared   MAE      
##   0.00    1    0.7301770  0.5091075  0.5762919
##   0.00    2    0.7096863  0.5482899  0.5851779
##   0.00    3    0.6828039  0.5905919  0.5541068
##   0.00    4    0.6990509  0.5537490  0.5770173
##   0.00    5    0.7320894  0.5311838  0.5687643
##   0.00    6    0.6904937  0.5533814  0.5390123
##   0.00    7    0.7331318  0.5301216  0.5873645
##   0.00    8    0.6919375  0.5810917  0.5583915
##   0.00    9    0.6784288  0.5765275  0.5461986
##   0.00   10    0.6835410  0.5918399  0.5307809
##   0.01    1    0.7240042  0.5191925  0.5758404
##   0.01    2    0.7246437  0.5211805  0.6014279
##   0.01    3    0.7056601  0.5520487  0.5566007
##   0.01    4    0.7100460  0.5531643  0.5332127
##   0.01    5    0.7256409  0.5327324  0.5670537
##   0.01    6    0.6989776  0.5631084  0.5434745
##   0.01    7    0.6521814  0.6015217  0.5122805
##   0.01    8    0.6759450  0.5904515  0.5538348
##   0.01    9    0.6369953  0.6344644  0.5115720
##   0.01   10    0.6576972  0.6108285  0.5124666
##   0.10    1    0.7174985  0.5458021  0.5711438
##   0.10    2    0.7197452  0.5327101  0.5762454
##   0.10    3    0.6586336  0.6025226  0.5176173
##   0.10    4    0.7148784  0.5492912  0.5774686
##   0.10    5    0.6862122  0.5758267  0.5351433
##   0.10    6    0.6798097  0.5872547  0.5407973
##   0.10    7    0.6714346  0.5956588  0.5330426
##   0.10    8    0.6547782  0.6135584  0.5306272
##   0.10    9    0.6768946  0.5967825  0.5287798
##   0.10   10    0.6644649  0.6034206  0.5297647
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 9, decay = 0.01 and bag = FALSE.

NN_predictions <- predict(nnetModel, test_df)

results <- data.frame(t(postResample(pred = NN_predictions, obs=test_df$Yield))) %>%
  mutate("Model"= "Neural Net") %>% rbind(results)

ggplot(nnetModel, highlight = TRUE) + 
  labs(title = paste0("Tuning profile: ", nnetModel$modelInfo$label))

K Nearest Neighbors

knn_model <- train(
  Yield ~ ., data = train_df, method = "knn",
  center = TRUE,
  scale = TRUE,
  trControl = trainControl("cv", number = 10),
  tuneLength = 25
)
knn_model

## k-Nearest Neighbors 
## 
## 144 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 131, 131, 129, 130, 129, 129, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    5  0.6738513  0.5936204  0.5403976
##    7  0.7099843  0.5554000  0.5785518
##    9  0.7371194  0.5114338  0.6126272
##   11  0.7334601  0.5339671  0.5994098
##   13  0.7417565  0.5238572  0.6041291
##   15  0.7602849  0.4916716  0.6255913
##   17  0.7771461  0.4751727  0.6391590
##   19  0.7882919  0.4729021  0.6452486
##   21  0.7958795  0.4625282  0.6533769
##   23  0.8044015  0.4549176  0.6552781
##   25  0.8181549  0.4300606  0.6657827
##   27  0.8213276  0.4311949  0.6682049
##   29  0.8306874  0.4166684  0.6800271
##   31  0.8350849  0.4151758  0.6859423
##   33  0.8427047  0.4083959  0.6880047
##   35  0.8466887  0.4083938  0.6891310
##   37  0.8507499  0.4022125  0.6906788
##   39  0.8556134  0.3934145  0.6940113
##   41  0.8593488  0.3921115  0.6985115
##   43  0.8646777  0.3785678  0.7030986
##   45  0.8658290  0.3792486  0.7033885
##   47  0.8713258  0.3700940  0.7058445
##   49  0.8766391  0.3549698  0.7101961
##   51  0.8801130  0.3460556  0.7158283
##   53  0.8826976  0.3510717  0.7180841
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.

ggplot(knn_model, highlight = TRUE) + 
  labs(title = paste0("Tuning profile: ", knn_model$modelInfo$label))

knn_predictions <- predict(knn_model, test_df)

results <- data.frame(t(postResample(pred = knn_predictions, obs = test_df$Yield))) %>%
  mutate("Model"= "KNN") %>% rbind(results)

Results Summary

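The random forest was the best-performing model on the test set, with the lowest RMSE (0.435) and the highest R-squared (0.857). Resampling performance was much closer: the best cross-validated RMSE was about 0.637 for both MARS and the neural network, 0.646 for the random forest, 0.669 to 0.674 for KNN, and 0.740 for PLS. KNN appears twice in the table because it was fit and added to the results twice.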

results %>%
  select(Model, RMSE, Rsquared, MAE) %>%
  arrange(RMSE)

##        Model      RMSE  Rsquared       MAE
## 1         RF 0.4351280 0.8569379 0.3587283
## 2        KNN 0.6610492 0.5545215 0.5260295
## 3        KNN 0.6610492 0.5545215 0.5260295
## 4 Neural Net 0.7254229 0.5570892 0.6166974
## 5       MARS 0.7545343 0.5262237 0.6248569
## 6        PLS 0.7952574 0.4735648 0.6712068
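
Building the table by repeatedly binding onto `results` makes it easy to append the same model twice. One alternative is to keep the predictions in a named list and build the table in a single pass. The sketch below uses base R only; `regression_metrics` is a hypothetical helper, and the model names and numbers are toy values invented for illustration.

```r
# Hypothetical helper (not part of caret): RMSE, squared correlation, and MAE
regression_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),
    Rsquared = cor(pred, obs)^2,
    MAE      = mean(abs(pred - obs)))
}

# Toy observations and predictions, invented for illustration only
obs   <- c(1.0, 2.0, 3.0, 4.0)
preds <- list(
  ModelA = c(1.1, 1.9, 3.2, 3.8),
  ModelB = c(0.5, 2.5, 2.5, 4.5)
)

# One row per list entry, so each model name can appear only once
rows <- lapply(names(preds), function(nm) {
  data.frame(Model = nm,
             t(regression_metrics(preds[[nm]], obs)),
             stringsAsFactors = FALSE)
})
summary_tbl <- do.call(rbind, rows)
summary_tbl <- summary_tbl[order(summary_tbl$RMSE), ]  # best model first
print(summary_tbl)
```

Since list names are the only source of the Model column, refitting a model overwrites its entry instead of adding a second row.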

Part B

Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?

# Variable Importance
ggplot(varImp(knn_model), top = 20) + 
  labs(title = paste0("Variable importance: ", knn_model$modelInfo$label))

# Variable Importance
ggplot(varImp(nnetModel), top = 20) + 
  labs(title = paste0("Variable importance: ", nnetModel$modelInfo$label))

# Variable Importance
ggplot(varImp(pls_model), top = 20) + 
  labs(title = paste0("Variable importance: ", pls_model$modelInfo$label))

# Variable Importance
ggplot(varImp(MARS_model), top = 20) + 
  labs(title = paste0("Variable importance: ", MARS_model$modelInfo$label))
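
The plots show each ranking separately, but the question asks how the top-ten lists compare. A minimal sketch of that comparison follows, assuming the importance scores are available as named numeric vectors (with caret they could be taken from the `importance` data frame inside the object returned by varImp). `top_n_names` is a hypothetical helper, and the scores are placeholders invented for illustration, not results from the models above.

```r
# Hypothetical helper: names of the n highest-scoring predictors
top_n_names <- function(importance, n = 10) {
  ranked <- names(sort(importance, decreasing = TRUE))
  ranked[seq_len(min(n, length(ranked)))]
}

# Placeholder scores, invented for illustration only
imp_nonlinear <- c(ManufacturingProcess01 = 100, ManufacturingProcess02 = 80,
                   BiologicalMaterial01 = 60, BiologicalMaterial02 = 10)
imp_linear    <- c(ManufacturingProcess01 = 90, BiologicalMaterial01 = 70,
                   BiologicalMaterial03 = 50, ManufacturingProcess02 = 5)

top_nonlinear <- top_n_names(imp_nonlinear, n = 3)
top_linear    <- top_n_names(imp_linear, n = 3)

shared         <- intersect(top_nonlinear, top_linear)  # in both lists
only_nonlinear <- setdiff(top_nonlinear, top_linear)    # unique to the nonlinear model

# Count process versus biological predictors in the nonlinear list
n_process    <- sum(grepl("^ManufacturingProcess", top_nonlinear))
n_biological <- sum(grepl("^BiologicalMaterial", top_nonlinear))
```

The predictors in `only_nonlinear` are the ones Part C asks about, so the same vector can drive the scatterplots there.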

Part C

Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?

The variables below are all strongly correlated with the response, and each ranked relatively high in variable importance across the different models.

ggplot(train_df, aes(BiologicalMaterial12, Yield)) +
  geom_point()

ggplot(train_df, aes(BiologicalMaterial06, Yield)) +
  geom_point()

ggplot(train_df, aes(BiologicalMaterial03, Yield)) +
  geom_point()

ggplot(train_df, aes(ManufacturingProcess13, Yield)) +
  geom_point()

ggplot(train_df, aes(ManufacturingProcess17, Yield)) +
  geom_point()

ggplot(train_df, aes(ManufacturingProcess32, Yield)) +
  geom_point()
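
The scatterplots can be backed up with a number for each predictor. The sketch below computes the correlation of every predictor with the response and sorts by absolute value. It is a base R illustration on a simulated stand-in data frame; `toy`, `PredictorA`, and `PredictorB` are invented names, and the relationships are constructed by hand rather than taken from train_df.

```r
set.seed(1)  # arbitrary seed so the simulated data are repeatable

# Simulated stand-in data: one positively and one negatively related predictor
toy <- data.frame(Yield = rnorm(30))
toy$PredictorA <-  toy$Yield + rnorm(30, sd = 0.5)
toy$PredictorB <- -toy$Yield + rnorm(30, sd = 0.5)

# Correlation of each predictor with the response
predictors <- setdiff(names(toy), "Yield")
cors <- sapply(predictors, function(p) cor(toy[[p]], toy$Yield))

# Strongest relationships first, regardless of sign
cors_sorted <- cors[order(abs(cors), decreasing = TRUE)]
print(round(cors_sorted, 2))
```

The sign of each correlation gives the direction of the relationship, which is the intuition the question is after. A correlation only captures the linear part of it, so the plots remain useful for spotting curvature.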