library(tidyverse)
library(caret)
library(plotmo)
library(earth)
library(kernlab)
library(forecast)
library(ipred)
library(mlbench)
library(AppliedPredictiveModeling)

7.2

Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:

\[y = 10 \sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0,\sigma^2)\]

where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates five other non-informative predictors). The package mlbench contains a function called mlbench.friedman1 that simulates these data:

set.seed(200) 
trainingData <- mlbench.friedman1(200, sd = 1)  
# Convert the 'x' data from a matrix to a data frame so the columns have names
trainingData$x <- data.frame(trainingData$x) 
# Visualize each predictor against the response
featurePlot(trainingData$x, trainingData$y) 

# Create a large test set to estimate the true error rate with good precision
testData <- mlbench.friedman1(5000, sd = 1) 
testData$x <- data.frame(testData$x) 
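For reference, the noise-free part of the simulated signal can also be written out directly; this is a minimal sketch (mlbench.friedman1 above generates this same signal plus N(0, sd^2) noise and the five non-informative predictors X6-X10):

# Noise-free Friedman (1991) signal, written out for reference
friedman_true <- function(x) {
  10 * sin(pi * x[, 1] * x[, 2]) + 20 * (x[, 3] - 0.5)^2 + 10 * x[, 4] + 5 * x[, 5]
}
head(friedman_true(trainingData$x))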

Tune several models on these data.

Which models appear to give the best performance?

Build, Tune and Explore Models

Multivariate Adaptive Regression Splines (MARS)

# Define the candidate models to test 
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38) 
set.seed(100) 

marsTuned <- train(trainingData$x, trainingData$y, 
                   method = "earth", 
                   tuneGrid = marsGrid, 
                   trControl = trainControl(method = "cv", number = 10))
 marsTuned
## Multivariate Adaptive Regression Spline 
## 
## 200 samples
##  10 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE     
##   1        2      4.327937  0.2544880  3.600474
##   1        3      3.572450  0.4912720  2.895811
##   1        4      2.596841  0.7183600  2.106341
##   1        5      2.370161  0.7659777  1.918669
##   1        6      2.276141  0.7881481  1.810001
##   1        7      1.766728  0.8751831  1.390215
##   1        8      1.780946  0.8723243  1.401345
##   1        9      1.665091  0.8819775  1.325515
##   1       10      1.663804  0.8821283  1.327657
##   1       11      1.657738  0.8822967  1.331730
##   1       12      1.653784  0.8827903  1.331504
##   1       13      1.648496  0.8823663  1.316407
##   1       14      1.639073  0.8841742  1.312833
##   1       15      1.639073  0.8841742  1.312833
##   1       16      1.639073  0.8841742  1.312833
##   1       17      1.639073  0.8841742  1.312833
##   1       18      1.639073  0.8841742  1.312833
##   1       19      1.639073  0.8841742  1.312833
##   1       20      1.639073  0.8841742  1.312833
##   1       21      1.639073  0.8841742  1.312833
##   1       22      1.639073  0.8841742  1.312833
##   1       23      1.639073  0.8841742  1.312833
##   1       24      1.639073  0.8841742  1.312833
##   1       25      1.639073  0.8841742  1.312833
##   1       26      1.639073  0.8841742  1.312833
##   1       27      1.639073  0.8841742  1.312833
##   1       28      1.639073  0.8841742  1.312833
##   1       29      1.639073  0.8841742  1.312833
##   1       30      1.639073  0.8841742  1.312833
##   1       31      1.639073  0.8841742  1.312833
##   1       32      1.639073  0.8841742  1.312833
##   1       33      1.639073  0.8841742  1.312833
##   1       34      1.639073  0.8841742  1.312833
##   1       35      1.639073  0.8841742  1.312833
##   1       36      1.639073  0.8841742  1.312833
##   1       37      1.639073  0.8841742  1.312833
##   1       38      1.639073  0.8841742  1.312833
##   2        2      4.327937  0.2544880  3.600474
##   2        3      3.572450  0.4912720  2.895811
##   2        4      2.661826  0.7070510  2.173471
##   2        5      2.404015  0.7578971  1.975387
##   2        6      2.243927  0.7914805  1.783072
##   2        7      1.856336  0.8605482  1.435682
##   2        8      1.754607  0.8763186  1.396841
##   2        9      1.603578  0.8938666  1.261361
##   2       10      1.492421  0.9084998  1.168700
##   2       11      1.317350  0.9292504  1.033926
##   2       12      1.304327  0.9320133  1.019108
##   2       13      1.277510  0.9323681  1.002927
##   2       14      1.269626  0.9350024  1.003346
##   2       15      1.266217  0.9359400  1.013893
##   2       16      1.268470  0.9354868  1.011414
##   2       17      1.268470  0.9354868  1.011414
##   2       18      1.268470  0.9354868  1.011414
##   2       19      1.268470  0.9354868  1.011414
##   2       20      1.268470  0.9354868  1.011414
##   2       21      1.268470  0.9354868  1.011414
##   2       22      1.268470  0.9354868  1.011414
##   2       23      1.268470  0.9354868  1.011414
##   2       24      1.268470  0.9354868  1.011414
##   2       25      1.268470  0.9354868  1.011414
##   2       26      1.268470  0.9354868  1.011414
##   2       27      1.268470  0.9354868  1.011414
##   2       28      1.268470  0.9354868  1.011414
##   2       29      1.268470  0.9354868  1.011414
##   2       30      1.268470  0.9354868  1.011414
##   2       31      1.268470  0.9354868  1.011414
##   2       32      1.268470  0.9354868  1.011414
##   2       33      1.268470  0.9354868  1.011414
##   2       34      1.268470  0.9354868  1.011414
##   2       35      1.268470  0.9354868  1.011414
##   2       36      1.268470  0.9354868  1.011414
##   2       37      1.268470  0.9354868  1.011414
##   2       38      1.268470  0.9354868  1.011414
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 15 and degree = 2.
marsTuned$bestTune
##    nprune degree
## 51     15      2
ggplot(marsTuned)

With MARS, the optimal model retains 15 terms and includes up to second-degree interactions. This is confirmed below:

marsTuned$finalModel
## Selected 15 of 18 terms, and 5 of 10 predictors
## Termination condition: Reached nk 21
## Importance: X1, X4, X2, X5, X3, X6-unused, X7-unused, X8-unused, X9-unused, ...
## Number of terms at each degree of interaction: 1 10 4
## GCV 1.618197    RSS 217.6151    GRSq 0.9343005    RSq 0.9553786
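For the individual hinge functions and their fitted coefficients, the earth object's summary() method lists each selected basis term (output omitted here):

summary(marsTuned$finalModel)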
varImp(marsTuned)
## earth variable importance
## 
##     Overall
## X1   100.00
## X4    85.14
## X2    69.24
## X5    49.31
## X3    40.00
## X9     0.00
## X6     0.00
## X8     0.00
## X7     0.00
## X10    0.00
plot(varImp(marsTuned))

The plotmo function plots regression surfaces for a model: it creates a separate plot for each predictor showing how the predicted response changes as that predictor varies. Further details can be found in the plotmo package documentation.

plotmo(marsTuned)
##  plotmo grid:    X1        X2       X3        X4        X5        X6        X7
##           0.5139349 0.5106664 0.537307 0.4445841 0.5343299 0.4975981 0.4688035
##        X8        X9       X10
##  0.497961 0.5288716 0.5359218

Variable Importance

Looking at the plotmo output above, only five predictors were actually plotted. The variable importance output agrees: only X1, X4, X2, X5, and X3 are considered important in the model, and the non-informative predictors X6–X10 receive zero importance.
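As a cross-check on caret's varImp(), the earth package's own evimp() function reports importance from the pruning pass (number of subsets, GCV, and RSS criteria). A minimal sketch using the fitted model above:

# earth's built-in importance measure for the tuned MARS model
evimp(marsTuned$finalModel)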

Neural Networks (NNET)

## Create a specific candidate set of models to evaluate: 
nnetGrid <- expand.grid(decay = c(0, 0.01, .1), size = c(1:10), bag = FALSE) 

set.seed(100) 
nnetTuned <- train(trainingData$x, trainingData$y,  
                  method = "avNNet",  
                  tuneGrid = nnetGrid,  
                  trControl = trainControl(method = "cv", number = 10),
                  preProc = c("center", "scale"),  
                  linout = TRUE,  trace = FALSE,  
                  MaxNWts = 10 * (ncol(trainingData$x) + 1) + 10 + 1, 
                  maxit = 500)
## Warning: executing %dopar% sequentially: no parallel backend registered
nnetTuned
## Model Averaged Neural Network 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE      Rsquared   MAE     
##   0.00    1    2.392711  0.7610354  1.897330
##   0.00    2    2.410532  0.7567109  1.907478
##   0.00    3    2.043957  0.8224281  1.630751
##   0.00    4    2.289347  0.8130639  1.749187
##   0.00    5    2.445600  0.7709399  1.824446
##   0.00    6    2.898295  0.7388800  2.052725
##   0.00    7    3.351563  0.6644147  2.460366
##   0.00    8    6.513566  0.4418645  3.563297
##   0.00    9    4.484215  0.5644107  2.877950
##   0.00   10    3.422545  0.6247430  2.439739
##   0.01    1    2.385381  0.7602926  1.887906
##   0.01    2    2.425125  0.7510903  1.935991
##   0.01    3    2.151209  0.8016018  1.701951
##   0.01    4    2.091925  0.8154383  1.676653
##   0.01    5    2.169742  0.7999255  1.738715
##   0.01    6    2.262032  0.8056619  1.817195
##   0.01    7    2.318301  0.7861811  1.856908
##   0.01    8    2.413847  0.7772629  1.938009
##   0.01    9    2.317190  0.7847500  1.857641
##   0.01   10    2.480407  0.7408505  1.995656
##   0.10    1    2.393965  0.7596431  1.894191
##   0.10    2    2.423612  0.7525959  1.935872
##   0.10    3    2.169914  0.7982380  1.726854
##   0.10    4    2.059080  0.8224160  1.648610
##   0.10    5    1.975656  0.8394000  1.578979
##   0.10    6    2.152198  0.8098015  1.693056
##   0.10    7    2.161512  0.8163011  1.693526
##   0.10    8    2.273716  0.7922525  1.822713
##   0.10    9    2.315333  0.7811273  1.785409
##   0.10   10    2.334803  0.7692182  1.872733
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 5, decay = 0.1 and bag = FALSE.
nnetTuned$bestTune
##    size decay   bag
## 25    5   0.1 FALSE
nnetTuned$finalModel
## Model Averaged Neural Network with 5 Repeats  
## 
## a 10-5-1 network with 61 weights
## options were - linear output units  decay=0.1
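The MaxNWts bound used in the train() call comes from the weight count of a single-hidden-layer network: H(p + 1) weights into the hidden layer plus H + 1 weights into the output unit. A quick check for the selected 10-5-1 network:

# weights in a p-input, H-hidden-unit, single-output network
p <- 10; H <- 5
H * (p + 1) + H + 1  # 61, matching the "10-5-1 network with 61 weights" above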
ggplot(nnetTuned)

varImp(nnetTuned)
## loess r-squared variable importance
## 
##      Overall
## X4  100.0000
## X1   95.5047
## X2   89.6186
## X5   45.2170
## X3   29.9330
## X9    6.3299
## X10   5.5182
## X8    3.2527
## X6    0.8884
## X7    0.0000
plotmo(nnetTuned)
##  plotmo grid:    X1        X2       X3        X4        X5        X6        X7
##           0.5139349 0.5106664 0.537307 0.4445841 0.5343299 0.4975981 0.4688035
##        X8        X9       X10
##  0.497961 0.5288716 0.5359218

With NNET, the importance scores are loess R-squared values computed for each of the 10 predictors; only X1–X5 show meaningful importance, with X4 ranked highest.

Support Vector Machines (SVM)

svmTuned <- train(trainingData$x, trainingData$y, 
                     method = "svmRadial", 
                     preProc = c("center", "scale"),
                     tuneLength = 14, 
                     trControl = trainControl(method = "cv"))
svmTuned
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   C        RMSE      Rsquared   MAE     
##      0.25  2.485823  0.8027195  1.994537
##      0.50  2.229138  0.8161918  1.785385
##      1.00  2.054083  0.8349587  1.631458
##      2.00  1.947733  0.8487502  1.535577
##      4.00  1.900736  0.8551255  1.492755
##      8.00  1.870409  0.8595887  1.478525
##     16.00  1.863295  0.8606827  1.477106
##     32.00  1.863295  0.8606827  1.477106
##     64.00  1.863295  0.8606827  1.477106
##    128.00  1.863295  0.8606827  1.477106
##    256.00  1.863295  0.8606827  1.477106
##    512.00  1.863295  0.8606827  1.477106
##   1024.00  1.863295  0.8606827  1.477106
##   2048.00  1.863295  0.8606827  1.477106
## 
## Tuning parameter 'sigma' was held constant at a value of 0.06219643
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06219643 and C = 16.
svmTuned$bestTune
##        sigma  C
## 7 0.06219643 16
svmTuned$finalModel
## Support Vector Machine object of class "ksvm" 
## 
## SV type: eps-svr  (regression) 
##  parameter : epsilon = 0.1  cost C = 16 
## 
## Gaussian Radial Basis kernel function. 
##  Hyperparameter : sigma =  0.0621964330848378 
## 
## Number of Support Vectors : 152 
## 
## Objective Function Value : -74.9011 
## Training error : 0.008485
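The sigma value that caret holds constant is not tuned over the grid; it is estimated once from the training predictors using kernlab::sigest(), which returns low/median/high quantile estimates of the inverse kernel width, and the tuned value is taken from that range. A sketch of the underlying call (sigest scales the predictors internally by default):

set.seed(100)
sigest(as.matrix(trainingData$x))  # the fixed sigma falls in this range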
ggplot(svmTuned)

varImp(svmTuned)
## loess r-squared variable importance
## 
##      Overall
## X4  100.0000
## X1   95.5047
## X2   89.6186
## X5   45.2170
## X3   29.9330
## X9    6.3299
## X10   5.5182
## X8    3.2527
## X6    0.8884
## X7    0.0000
plot(varImp(svmTuned))

plotmo(svmTuned)
##  plotmo grid:    X1        X2       X3        X4        X5        X6        X7
##           0.5139349 0.5106664 0.537307 0.4445841 0.5343299 0.4975981 0.4688035
##        X8        X9       X10
##  0.497961 0.5288716 0.5359218

K-Nearest Neighbors (KNN)

knnTune <- train(trainingData$x, 
                 trainingData$y,
                 method = "knn",
                 preProc = c("center", "scale"), 
                 tuneGrid = data.frame(.k = 1:20),
                 trControl = trainControl(method = "cv"))
knnTune
## k-Nearest Neighbors 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    1  4.155101  0.3952878  3.426372
##    2  3.329600  0.5487936  2.725990
##    3  3.340273  0.5474137  2.754800
##    4  3.303963  0.5572073  2.716737
##    5  3.220465  0.5929498  2.676269
##    6  3.183393  0.6075960  2.630516
##    7  3.143601  0.6303101  2.584373
##    8  3.138957  0.6428108  2.536260
##    9  3.131241  0.6525861  2.544453
##   10  3.088974  0.6791399  2.511649
##   11  3.092000  0.6866965  2.517350
##   12  3.088752  0.6885886  2.502131
##   13  3.077457  0.6980979  2.484329
##   14  3.119033  0.6876853  2.528248
##   15  3.103129  0.6979160  2.520581
##   16  3.094172  0.7068200  2.516521
##   17  3.129654  0.7112779  2.547965
##   18  3.150793  0.7057773  2.575678
##   19  3.140874  0.7140958  2.549941
##   20  3.160866  0.7178340  2.568251
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 13.
knnTune$bestTune
##     k
## 13 13
knnTune$finalModel
## 13-nearest neighbor regression model
ggplot(knnTune)

varImp(knnTune)
## loess r-squared variable importance
## 
##      Overall
## X4  100.0000
## X1   95.5047
## X2   89.6186
## X5   45.2170
## X3   29.9330
## X9    6.3299
## X10   5.5182
## X8    3.2527
## X6    0.8884
## X7    0.0000
plot(varImp(knnTune))

plotmo(knnTune)
##  plotmo grid:    X1        X2       X3        X4        X5        X6        X7
##           0.5139349 0.5106664 0.537307 0.4445841 0.5343299 0.4975981 0.4688035
##        X8        X9       X10
##  0.497961 0.5288716 0.5359218

Evaluation

# MARS
marspred <- predict(marsTuned, newdata = testData$x)
marspv <- postResample(pred = marspred, obs = testData$y) #performance values

# NNET
nnpred <- predict(nnetTuned, newdata = testData$x)
nnpv <- postResample(pred = nnpred, obs = testData$y) 

# SVM
svmpred <- predict(svmTuned, newdata = testData$x)
svmpv <- postResample(pred = svmpred, obs = testData$y) 

#KNN
knnpred <- predict(knnTune, newdata = testData$x)
knnpv <- postResample(pred = knnpred, obs = testData$y)

data.frame(marspv, nnpv, svmpv, knnpv) %>% kableExtra::kable() %>% kableExtra::kable_styling(bootstrap_options = "striped")
             marspv      nnpv     svmpv     knnpv
RMSE      1.1589948 2.1113956 2.0718255 3.1481557
Rsquared  0.9460418 0.8277556 0.8259563 0.6747755
MAE       0.9250230 1.5739011 1.5737503 2.5236041

The MARS model performed best on the test data: it has the lowest RMSE and MAE and the highest R-squared in the table above.
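A predicted-versus-observed plot is a quick visual confirmation of these test-set metrics; a minimal sketch for the MARS predictions computed above:

# predicted vs. observed for MARS on the 5000-point test set
plot(testData$y, marspred, xlab = "Observed", ylab = "Predicted",
     main = "MARS test-set predictions")
abline(0, 1, col = "grey", lwd = 2)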

Does MARS select the informative predictors (those named X1–X5)?

Yes, and only those five predictors receive nonzero importance. MARS ranks X1 as its most important predictor, while the other models rank X4 at the top.

7.5

Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.

Preprocessing

data("ChemicalManufacturingProcess")

preprocessing <- preProcess(ChemicalManufacturingProcess[,-1], method = c("center", "scale", "knnImpute", "corr", "nzv"))
Xpreprocess <- predict(preprocessing, ChemicalManufacturingProcess[,-1])

yield <- as.matrix(ChemicalManufacturingProcess$Yield)

set.seed(789)
split2 <- yield %>%
  createDataPartition(p = 0.8, list = FALSE, times = 1)

Xtrain.data  <- Xpreprocess[split2, ] #chem train
xtest.data <- Xpreprocess[-split2, ] #chem test
Ytrain.data  <- yield[split2, ] #yield train
ytest.data <- yield[-split2, ] #yield test
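A couple of quick sanity checks on the preprocessed data (a minimal sketch that just confirms the imputation removed all missing predictor values and that the 80/20 split behaved as expected):

sum(is.na(Xpreprocess))                   # knnImpute should leave no missing values
dim(Xtrain.data); dim(xtest.data)         # predictor rows/columns in each split
length(Ytrain.data); length(ytest.data)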

(a) Which nonlinear regression model gives the optimal resampling and test set performance?

NNET

nnetGrid <- expand.grid(decay = c(0, 0.01, .1), size = c(1:10), bag = FALSE)

set.seed(200)
chem_nnet_tuned <- train(Xtrain.data, Ytrain.data,
                  method = "avNNet",
                  tuneGrid = nnetGrid,
                  trControl = trainControl(method = "cv", number = 10),
                  linout = TRUE,  trace = FALSE,
                  MaxNWts = 10 * (ncol(Xtrain.data) + 1) + 10 + 1,
                  maxit = 500)

MARS

marsGrid <- expand.grid(.degree = 1:3, .nprune = 2:100)
set.seed(200)

chem_mars_tuned <- train(Xtrain.data, Ytrain.data,
                   method = "earth",
                   tuneGrid = marsGrid,
                   trControl = trainControl(method = "cv", number = 10))

SVM

set.seed(200)
chem_svm_tuned <- train(Xtrain.data, Ytrain.data,
                     method = "svmRadial",
                     preProc = c("center", "scale"),
                     tuneLength = 14,
                     trControl = trainControl(method = "cv"))

KNN

set.seed(200)
chem_knn_tuned <- train(Xtrain.data,
                        Ytrain.data,
                        method = "knn",
                        tuneGrid = data.frame(.k = 1:20),
                        trControl = trainControl(method = "cv"))

Evaluation

nnpred2 <- predict(chem_nnet_tuned, newdata = xtest.data)
nnpv2 <- postResample(pred = nnpred2, obs = ytest.data)

marspred2 <- predict(chem_mars_tuned, newdata = xtest.data)
marspv2 <- postResample(pred = marspred2, obs = ytest.data)

svmpred2 <- predict(chem_svm_tuned, newdata = xtest.data)
svmpv2 <- postResample(pred = svmpred2, obs = ytest.data)

knnpred2 <- predict(chem_knn_tuned, newdata = xtest.data)
knnpv2 <- postResample(pred = knnpred2, obs = ytest.data)

data.frame(nnpv2, marspv2, svmpv2, knnpv2) %>% kableExtra::kable() %>% kableExtra::kable_styling(bootstrap_options = "striped")
              nnpv    marspv     svmpv     knnpv
RMSE     2.1113956 1.1589948 2.0718255 1.3461979
Rsquared 0.8277556 0.9460418 0.8259563 0.3373993
MAE      1.5739011 0.9250230 1.5737503 1.1138750

MARS again appears to give the best performance.
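Because all four chemical models were tuned with set.seed(200) immediately before train(), their cross-validation folds should line up, so caret's resamples() can be used to compare the resampling distributions directly (a sketch):

# side-by-side cross-validation performance for the four nonlinear models
rs <- resamples(list(NNET = chem_nnet_tuned, MARS = chem_mars_tuned,
                     SVM = chem_svm_tuned, KNN = chem_knn_tuned))
summary(rs)
bwplot(rs, metric = "RMSE")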

Output from MARS Model

chem_mars_tuned
## Multivariate Adaptive Regression Spline 
## 
## 144 samples
##  56 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 130, 129, 129, 130, 131, 131, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE      
##   1         2     1.436075  0.4577748  1.1408022
##   1         3     1.193294  0.6049739  0.9687530
##   1         4     1.193574  0.5759799  0.9516904
##   1         5     1.220425  0.5652127  0.9877018
##   1         6     1.233390  0.5544421  0.9929996
##   1         7     1.286913  0.5331624  1.0363345
##   1         8     1.257745  0.5563799  1.0121255
##   1         9     1.321295  0.5343454  1.0427887
##   1        10     1.297365  0.5482386  1.0330644
##   1        11     1.322958  0.5465284  1.0297670
##   1        12     1.335926  0.5363627  1.0406954
##   1        13     1.300053  0.5491050  1.0060914
##   1        14     1.311556  0.5461103  1.0154344
##   1        15     1.315674  0.5459365  1.0156035
##   1        16     1.306160  0.5491764  0.9961855
##   1        17     1.311054  0.5446757  0.9949384
##   1        18     1.311054  0.5446757  0.9949384
##   1        19     1.311054  0.5446757  0.9949384
##   1        20     1.311054  0.5446757  0.9949384
##   1        21     1.311054  0.5446757  0.9949384
##   1        22     1.311054  0.5446757  0.9949384
##   1        23     1.311054  0.5446757  0.9949384
##   1        24     1.311054  0.5446757  0.9949384
##   1        25     1.311054  0.5446757  0.9949384
##   1        26     1.311054  0.5446757  0.9949384
##   1        27     1.311054  0.5446757  0.9949384
##   1        28     1.311054  0.5446757  0.9949384
##   1        29     1.311054  0.5446757  0.9949384
##   1        30     1.311054  0.5446757  0.9949384
##   1        31     1.311054  0.5446757  0.9949384
##   1        32     1.311054  0.5446757  0.9949384
##   1        33     1.311054  0.5446757  0.9949384
##   1        34     1.311054  0.5446757  0.9949384
##   1        35     1.311054  0.5446757  0.9949384
##   1        36     1.311054  0.5446757  0.9949384
##   1        37     1.311054  0.5446757  0.9949384
##   1        38     1.311054  0.5446757  0.9949384
##   1        39     1.311054  0.5446757  0.9949384
##   1        40     1.311054  0.5446757  0.9949384
##   1        41     1.311054  0.5446757  0.9949384
##   1        42     1.311054  0.5446757  0.9949384
##   1        43     1.311054  0.5446757  0.9949384
##   1        44     1.311054  0.5446757  0.9949384
##   1        45     1.311054  0.5446757  0.9949384
##   1        46     1.311054  0.5446757  0.9949384
##   1        47     1.311054  0.5446757  0.9949384
##   1        48     1.311054  0.5446757  0.9949384
##   1        49     1.311054  0.5446757  0.9949384
##   1        50     1.311054  0.5446757  0.9949384
##   1        51     1.311054  0.5446757  0.9949384
##   1        52     1.311054  0.5446757  0.9949384
##   1        53     1.311054  0.5446757  0.9949384
##   1        54     1.311054  0.5446757  0.9949384
##   1        55     1.311054  0.5446757  0.9949384
##   1        56     1.311054  0.5446757  0.9949384
##   1        57     1.311054  0.5446757  0.9949384
##   1        58     1.311054  0.5446757  0.9949384
##   1        59     1.311054  0.5446757  0.9949384
##   1        60     1.311054  0.5446757  0.9949384
##   1        61     1.311054  0.5446757  0.9949384
##   1        62     1.311054  0.5446757  0.9949384
##   1        63     1.311054  0.5446757  0.9949384
##   1        64     1.311054  0.5446757  0.9949384
##   1        65     1.311054  0.5446757  0.9949384
##   1        66     1.311054  0.5446757  0.9949384
##   1        67     1.311054  0.5446757  0.9949384
##   1        68     1.311054  0.5446757  0.9949384
##   1        69     1.311054  0.5446757  0.9949384
##   1        70     1.311054  0.5446757  0.9949384
##   1        71     1.311054  0.5446757  0.9949384
##   1        72     1.311054  0.5446757  0.9949384
##   1        73     1.311054  0.5446757  0.9949384
##   1        74     1.311054  0.5446757  0.9949384
##   1        75     1.311054  0.5446757  0.9949384
##   1        76     1.311054  0.5446757  0.9949384
##   1        77     1.311054  0.5446757  0.9949384
##   1        78     1.311054  0.5446757  0.9949384
##   1        79     1.311054  0.5446757  0.9949384
##   1        80     1.311054  0.5446757  0.9949384
##   1        81     1.311054  0.5446757  0.9949384
##   1        82     1.311054  0.5446757  0.9949384
##   1        83     1.311054  0.5446757  0.9949384
##   1        84     1.311054  0.5446757  0.9949384
##   1        85     1.311054  0.5446757  0.9949384
##   1        86     1.311054  0.5446757  0.9949384
##   1        87     1.311054  0.5446757  0.9949384
##   1        88     1.311054  0.5446757  0.9949384
##   1        89     1.311054  0.5446757  0.9949384
##   1        90     1.311054  0.5446757  0.9949384
##   1        91     1.311054  0.5446757  0.9949384
##   1        92     1.311054  0.5446757  0.9949384
##   1        93     1.311054  0.5446757  0.9949384
##   1        94     1.311054  0.5446757  0.9949384
##   1        95     1.311054  0.5446757  0.9949384
##   1        96     1.311054  0.5446757  0.9949384
##   1        97     1.311054  0.5446757  0.9949384
##   1        98     1.311054  0.5446757  0.9949384
##   1        99     1.311054  0.5446757  0.9949384
##   1       100     1.311054  0.5446757  0.9949384
##   2         2     1.436075  0.4577748  1.1408022
##   2         3     1.208087  0.5969342  0.9862939
##   2         4     1.172703  0.5917301  0.9470231
##   2         5     1.174369  0.5826173  0.9531073
##   2         6     1.220553  0.5562528  0.9801425
##   2         7     1.239515  0.5637542  0.9767852
##   2         8     1.227054  0.5592244  0.9723114
##   2         9     1.224300  0.5785571  0.9752012
##   2        10     1.235363  0.5566674  0.9961565
##   2        11     1.252294  0.5527595  1.0096506
##   2        12     1.224109  0.5791192  0.9812716
##   2        13     1.482177  0.5807626  1.0659295
##   2        14     1.517563  0.5774507  1.0878785
##   2        15     1.562717  0.5644419  1.1173774
##   2        16     1.585096  0.5553231  1.1395206
##   2        17     1.691484  0.5585704  1.1843662
##   2        18     1.701229  0.5588211  1.1878379
##   2        19     1.707018  0.5682174  1.1889003
##   2        20     1.741695  0.5635167  1.2296257
##   2        21     1.798015  0.5624517  1.2531234
##   2        22     1.796379  0.5632940  1.2566458
##   2        23     2.062025  0.5620930  1.3718895
##   2        24     2.093221  0.5621082  1.3896171
##   2        25     2.158256  0.5620288  1.4209758
##   2        26     2.158256  0.5620288  1.4209758
##   2        27     2.179565  0.5619943  1.4329294
##   2        28     2.179565  0.5619943  1.4329294
##   2        29     2.179565  0.5619943  1.4329294
##   2        30     2.179565  0.5619943  1.4329294
##   2        31     2.179565  0.5619943  1.4329294
##   2        32     2.179565  0.5619943  1.4329294
##   2        33     2.179565  0.5619943  1.4329294
##   2        34     2.179565  0.5619943  1.4329294
##   2        35     2.179565  0.5619943  1.4329294
##   2        36     2.179565  0.5619943  1.4329294
##   2        37     2.179565  0.5619943  1.4329294
##   2        38     2.179565  0.5619943  1.4329294
##   2        39     2.179565  0.5619943  1.4329294
##   2        40     2.179565  0.5619943  1.4329294
##   2        41     2.179565  0.5619943  1.4329294
##   2        42     2.179565  0.5619943  1.4329294
##   2        43     2.179565  0.5619943  1.4329294
##   2        44     2.179565  0.5619943  1.4329294
##   2        45     2.179565  0.5619943  1.4329294
##   2        46     2.179565  0.5619943  1.4329294
##   2        47     2.179565  0.5619943  1.4329294
##   2        48     2.179565  0.5619943  1.4329294
##   2        49     2.179565  0.5619943  1.4329294
##   2        50     2.179565  0.5619943  1.4329294
##   2        51     2.179565  0.5619943  1.4329294
##   2        52     2.179565  0.5619943  1.4329294
##   2        53     2.179565  0.5619943  1.4329294
##   2        54     2.179565  0.5619943  1.4329294
##   2        55     2.179565  0.5619943  1.4329294
##   2        56     2.179565  0.5619943  1.4329294
##   2        57     2.179565  0.5619943  1.4329294
##   2        58     2.179565  0.5619943  1.4329294
##   2        59     2.179565  0.5619943  1.4329294
##   2        60     2.179565  0.5619943  1.4329294
##   2        61     2.179565  0.5619943  1.4329294
##   2        62     2.179565  0.5619943  1.4329294
##   2        63     2.179565  0.5619943  1.4329294
##   2        64     2.179565  0.5619943  1.4329294
##   2        65     2.179565  0.5619943  1.4329294
##   2        66     2.179565  0.5619943  1.4329294
##   2        67     2.179565  0.5619943  1.4329294
##   2        68     2.179565  0.5619943  1.4329294
##   2        69     2.179565  0.5619943  1.4329294
##   2        70     2.179565  0.5619943  1.4329294
##   2        71     2.179565  0.5619943  1.4329294
##   2        72     2.179565  0.5619943  1.4329294
##   2        73     2.179565  0.5619943  1.4329294
##   2        74     2.179565  0.5619943  1.4329294
##   2        75     2.179565  0.5619943  1.4329294
##   2        76     2.179565  0.5619943  1.4329294
##   2        77     2.179565  0.5619943  1.4329294
##   2        78     2.179565  0.5619943  1.4329294
##   2        79     2.179565  0.5619943  1.4329294
##   2        80     2.179565  0.5619943  1.4329294
##   2        81     2.179565  0.5619943  1.4329294
##   2        82     2.179565  0.5619943  1.4329294
##   2        83     2.179565  0.5619943  1.4329294
##   2        84     2.179565  0.5619943  1.4329294
##   2        85     2.179565  0.5619943  1.4329294
##   2        86     2.179565  0.5619943  1.4329294
##   2        87     2.179565  0.5619943  1.4329294
##   2        88     2.179565  0.5619943  1.4329294
##   2        89     2.179565  0.5619943  1.4329294
##   2        90     2.179565  0.5619943  1.4329294
##   2        91     2.179565  0.5619943  1.4329294
##   2        92     2.179565  0.5619943  1.4329294
##   2        93     2.179565  0.5619943  1.4329294
##   2        94     2.179565  0.5619943  1.4329294
##   2        95     2.179565  0.5619943  1.4329294
##   2        96     2.179565  0.5619943  1.4329294
##   2        97     2.179565  0.5619943  1.4329294
##   2        98     2.179565  0.5619943  1.4329294
##   2        99     2.179565  0.5619943  1.4329294
##   2       100     2.179565  0.5619943  1.4329294
##   3         2     1.436075  0.4577748  1.1408022
##   3         3     1.357914  0.5399819  1.0828529
##   3         4     1.324153  0.5406578  1.0577133
##   3         5     1.208242  0.6015042  0.9650823
##   3         6     1.245846  0.5631733  0.9926445
##   3         7     1.305641  0.5445680  1.0421780
##   3         8     1.293331  0.5598225  1.0159859
##   3         9     1.257642  0.5746593  0.9988200
##   3        10     1.302785  0.5647584  1.0042240
##   3        11     1.316448  0.5752579  1.0229778
##   3        12     1.358234  0.5565035  1.0296341
##   3        13     1.379716  0.5475452  1.0326861
##   3        14     1.393334  0.5474919  1.0527433
##   3        15     1.311340  0.5899059  1.0041141
##   3        16     1.319323  0.5807196  1.0151112
##   3        17     1.338858  0.5773579  1.0245406
##   3        18     1.396909  0.5552912  1.0534804
##   3        19     1.414198  0.5488446  1.0660197
##   3        20     1.434678  0.5366410  1.0664162
##   3        21     4.311283  0.4639792  1.8833911
##   3        22     4.616280  0.4532189  1.9875788
##   3        23     6.328470  0.4483235  2.4796212
##   3        24     6.436223  0.4474803  2.5422822
##   3        25     6.418235  0.4551389  2.5373112
##   3        26     6.398090  0.4639987  2.5237233
##   3        27     6.381858  0.4666206  2.5122745
##   3        28     6.387109  0.4655397  2.5182825
##   3        29     6.387109  0.4655397  2.5182825
##   3        30     6.387109  0.4655397  2.5182825
##   3        31     6.387109  0.4655397  2.5182825
##   3        32     6.387109  0.4655397  2.5182825
##   3        33     6.387109  0.4655397  2.5182825
##   3        34     6.387109  0.4655397  2.5182825
##   3        35     6.387109  0.4655397  2.5182825
##   3        36     6.387109  0.4655397  2.5182825
##   3        37     6.387109  0.4655397  2.5182825
##   3        38     6.387109  0.4655397  2.5182825
##   3        39     6.387109  0.4655397  2.5182825
##   3        40     6.387109  0.4655397  2.5182825
##   3        41     6.387109  0.4655397  2.5182825
##   3        42     6.387109  0.4655397  2.5182825
##   3        43     6.387109  0.4655397  2.5182825
##   3        44     6.387109  0.4655397  2.5182825
##   3        45     6.387109  0.4655397  2.5182825
##   3        46     6.387109  0.4655397  2.5182825
##   3        47     6.387109  0.4655397  2.5182825
##   3        48     6.387109  0.4655397  2.5182825
##   3        49     6.387109  0.4655397  2.5182825
##   3        50     6.387109  0.4655397  2.5182825
##   3        51     6.387109  0.4655397  2.5182825
##   3        52     6.387109  0.4655397  2.5182825
##   3        53     6.387109  0.4655397  2.5182825
##   3        54     6.387109  0.4655397  2.5182825
##   3        55     6.387109  0.4655397  2.5182825
##   3        56     6.387109  0.4655397  2.5182825
##   3        57     6.387109  0.4655397  2.5182825
##   3        58     6.387109  0.4655397  2.5182825
##   3        59     6.387109  0.4655397  2.5182825
##   3        60     6.387109  0.4655397  2.5182825
##   3        61     6.387109  0.4655397  2.5182825
##   3        62     6.387109  0.4655397  2.5182825
##   3        63     6.387109  0.4655397  2.5182825
##   3        64     6.387109  0.4655397  2.5182825
##   3        65     6.387109  0.4655397  2.5182825
##   3        66     6.387109  0.4655397  2.5182825
##   3        67     6.387109  0.4655397  2.5182825
##   3        68     6.387109  0.4655397  2.5182825
##   3        69     6.387109  0.4655397  2.5182825
##   3        70     6.387109  0.4655397  2.5182825
##   3        71     6.387109  0.4655397  2.5182825
##   3        72     6.387109  0.4655397  2.5182825
##   3        73     6.387109  0.4655397  2.5182825
##   3        74     6.387109  0.4655397  2.5182825
##   3        75     6.387109  0.4655397  2.5182825
##   3        76     6.387109  0.4655397  2.5182825
##   3        77     6.387109  0.4655397  2.5182825
##   3        78     6.387109  0.4655397  2.5182825
##   3        79     6.387109  0.4655397  2.5182825
##   3        80     6.387109  0.4655397  2.5182825
##   3        81     6.387109  0.4655397  2.5182825
##   3        82     6.387109  0.4655397  2.5182825
##   3        83     6.387109  0.4655397  2.5182825
##   3        84     6.387109  0.4655397  2.5182825
##   3        85     6.387109  0.4655397  2.5182825
##   3        86     6.387109  0.4655397  2.5182825
##   3        87     6.387109  0.4655397  2.5182825
##   3        88     6.387109  0.4655397  2.5182825
##   3        89     6.387109  0.4655397  2.5182825
##   3        90     6.387109  0.4655397  2.5182825
##   3        91     6.387109  0.4655397  2.5182825
##   3        92     6.387109  0.4655397  2.5182825
##   3        93     6.387109  0.4655397  2.5182825
##   3        94     6.387109  0.4655397  2.5182825
##   3        95     6.387109  0.4655397  2.5182825
##   3        96     6.387109  0.4655397  2.5182825
##   3        97     6.387109  0.4655397  2.5182825
##   3        98     6.387109  0.4655397  2.5182825
##   3        99     6.387109  0.4655397  2.5182825
##   3       100     6.387109  0.4655397  2.5182825
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 4 and degree = 2.
chem_mars_tuned$bestTune
##     nprune degree
## 102      4      2
chem_mars_tuned$finalModel
## Selected 4 of 47 terms, and 2 of 56 predictors
## Termination condition: RSq changed by less than 0.001 at 47 terms
## Importance: ManufacturingProcess32, ManufacturingProcess13, ...
## Number of terms at each degree of interaction: 1 3 (additive model)
## GCV 1.47066    RSS 187.5118    GRSq 0.5928938    RSq 0.6344774
ggplot(chem_mars_tuned)

plotmo(chem_mars_tuned)
##  plotmo grid:    BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
##                            -0.1070431          -0.04306519           -0.1062217
##  BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
##           -0.07283723         -0.004683137          -0.09620684
##  BiologicalMaterial08 BiologicalMaterial09 BiologicalMaterial10
##               0.06681          -0.04830923           -0.1178766
##  BiologicalMaterial11 BiologicalMaterial12 ManufacturingProcess01
##            -0.1012727          -0.03863564              0.1056672
##  ManufacturingProcess02 ManufacturingProcess03 ManufacturingProcess04
##               0.5096271              0.1087038              0.3424324
##  ManufacturingProcess05 ManufacturingProcess06 ManufacturingProcess07
##             -0.06529069             -0.2229103              0.4390925
##  ManufacturingProcess08 ManufacturingProcess09 ManufacturingProcess10
##               0.8941637              0.1066231             -0.1030952
##  ManufacturingProcess11 ManufacturingProcess12 ManufacturingProcess13
##              0.02020002             -0.4806937           -0.007834829
##  ManufacturingProcess14 ManufacturingProcess15 ManufacturingProcess16
##              0.04826216            -0.09295527             0.06169755
##  ManufacturingProcess17 ManufacturingProcess18 ManufacturingProcess19
##             0.005007187             0.06617593             -0.1360039
##  ManufacturingProcess20 ManufacturingProcess21 ManufacturingProcess22
##               0.0688801             -0.1744786             -0.1218132
##  ManufacturingProcess23 ManufacturingProcess24 ManufacturingProcess25
##             -0.01031118             -0.1438567              0.0651293
##  ManufacturingProcess26 ManufacturingProcess27 ManufacturingProcess28
##              0.06432695             0.06918722              0.7255096
##  ManufacturingProcess29 ManufacturingProcess30 ManufacturingProcess31
##              -0.0066778             0.03954225             0.09273307
##  ManufacturingProcess32 ManufacturingProcess33 ManufacturingProcess34
##             -0.08632349              0.1836771              0.1182687
##  ManufacturingProcess35 ManufacturingProcess36 ManufacturingProcess37
##             -0.05513017              0.4884872            -0.03063781
##  ManufacturingProcess38 ManufacturingProcess39 ManufacturingProcess40
##               0.7174727               0.231727             -0.4626528
##  ManufacturingProcess41 ManufacturingProcess42 ManufacturingProcess43
##              -0.4405878              0.2027957             -0.1289558
##  ManufacturingProcess44 ManufacturingProcess45
##               0.2946725              0.1522024

(b) Which predictors are most important in the optimal nonlinear regression model?

Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?

varImp(chem_mars_tuned)
## earth variable importance
## 
##   only 20 most important variables shown (out of 56)
## 
##                        Overall
## ManufacturingProcess32   100.0
## ManufacturingProcess13    58.9
## BiologicalMaterial02       0.0
## BiologicalMaterial11       0.0
## ManufacturingProcess31     0.0
## ManufacturingProcess44     0.0
## ManufacturingProcess25     0.0
## BiologicalMaterial09       0.0
## ManufacturingProcess39     0.0
## ManufacturingProcess42     0.0
## ManufacturingProcess30     0.0
## BiologicalMaterial12       0.0
## ManufacturingProcess14     0.0
## BiologicalMaterial04       0.0
## ManufacturingProcess45     0.0
## ManufacturingProcess35     0.0
## BiologicalMaterial10       0.0
## BiologicalMaterial06       0.0
## ManufacturingProcess06     0.0
## ManufacturingProcess43     0.0
plot(varImp(chem_mars_tuned))

ridgeGrid <- data.frame(.lambda = seq(0, .1, length = 15))
set.seed(101)

ridgeRegFit <- train(Xtrain.data, Ytrain.data, method = "ridge", tuneGrid = ridgeGrid, trControl = trainControl(method = "cv", number = 10))

varImp(ridgeRegFit)
## loess r-squared variable importance
## 
##   only 20 most important variables shown (out of 56)
## 
##                        Overall
## ManufacturingProcess13  100.00
## ManufacturingProcess32   97.82
## ManufacturingProcess17   86.84
## BiologicalMaterial06     83.64
## BiologicalMaterial03     78.13
## ManufacturingProcess09   72.37
## BiologicalMaterial12     72.20
## ManufacturingProcess36   70.51
## BiologicalMaterial02     63.10
## ManufacturingProcess06   61.88
## ManufacturingProcess31   58.39
## BiologicalMaterial11     56.85
## ManufacturingProcess33   47.06
## ManufacturingProcess11   45.94
## BiologicalMaterial04     45.43
## ManufacturingProcess29   44.71
## BiologicalMaterial08     44.36
## ManufacturingProcess12   38.22
## BiologicalMaterial01     35.56
## BiologicalMaterial09     33.79
predictions <- ridgeRegFit %>% predict(xtest.data)

cbind(
  RMSE = RMSE(predictions, ytest.data),
  R_squared = caret::R2(predictions, ytest.data)
)
##         RMSE R_squared
## [1,] 1.06489 0.5657402
plot(varImp(ridgeRegFit))

Both the optimal nonlinear and linear models rank manufacturing predictors at the top. The MARS model, however, only treats manufacturing process variables as important, with ManufacturingProcess32 first and ManufacturingProcess13 second. The ridge (linear) model ranks those same two predictors highest, but in the reverse order, and also includes several biological predictors in its top ten.
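To make that comparison concrete, the top-ten lists can be pulled out of varImp() for each model and intersected (a minimal sketch):

# top 10 predictors from each model, ranked by the Overall importance score
imp_mars  <- varImp(chem_mars_tuned)$importance
imp_ridge <- varImp(ridgeRegFit)$importance
top_mars  <- rownames(imp_mars)[order(imp_mars$Overall, decreasing = TRUE)][1:10]
top_ridge <- rownames(imp_ridge)[order(imp_ridge$Overall, decreasing = TRUE)][1:10]
intersect(top_mars, top_ridge)     # predictors common to both top-ten lists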

(c) Explore the relationships between the top predictors and the response

for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?

cor(yield, ChemicalManufacturingProcess$ManufacturingProcess32)
##           [,1]
## [1,] 0.6083321
cor(yield, ChemicalManufacturingProcess$ManufacturingProcess13)
##            [,1]
## [1,] -0.5036797

Manufacturing processes are presumably the steps taken to create the end product. Since only manufacturing process variables appear as important in this model, their correlations with yield are the natural thing to inspect: ManufacturingProcess32 has a positive correlation with yield (about 0.61), so higher values of that process variable are associated with higher yield, while ManufacturingProcess13 has a negative correlation (about -0.50), so higher values of that variable are associated with lower yield.
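Scatter plots of yield against these two process variables make the signs of those correlations visible (a minimal sketch):

yld <- ChemicalManufacturingProcess$Yield
par(mfrow = c(1, 2))
plot(ChemicalManufacturingProcess$ManufacturingProcess32, yld,
     xlab = "ManufacturingProcess32", ylab = "Yield")
plot(ChemicalManufacturingProcess$ManufacturingProcess13, yld,
     xlab = "ManufacturingProcess13", ylab = "Yield")
par(mfrow = c(1, 1))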