HW8DATA624

Author

Semyon Toybis

Assignment

We are required to complete questions 7.2 and 7.5 from chapter 7 of “Applied Predictive Modeling” by Max Kuhn and Kjell Johnson.

7.2

We are tasked with tuning models on simulated data generated by the mlbench.friedman1 function from the mlbench package. Below, I use the code provided in the book to create the data and tune the models.

The code below creates 200 data points with 10 predictors (x) and one response (y):

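library(mlbench)  # provides mlbench.friedman1
library(caret)    # provides featurePlot, train, and postResample
set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
# Convert the predictor matrix to a data frame so caret can use column names
trainingData$x <- data.frame(trainingData$x)
featurePlot(trainingData$x, trainingData$y)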
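For reference, the response in this benchmark depends only on the first five predictors; the other five are pure noise. The base R sketch below reproduces the generating formula as I understand it from the mlbench documentation. The helper name friedman1_sketch is hypothetical, and because it draws random numbers in its own order it will not reproduce the exact values from mlbench.friedman1, even with the same seed.

```r
# Minimal sketch of the Friedman 1 benchmark (hypothetical helper, base R only).
# y = 10*sin(pi*x1*x2) + 20*(x3 - 0.5)^2 + 10*x4 + 5*x5 + N(0, sd^2)
friedman1_sketch <- function(n, sd = 1) {
  x <- matrix(runif(n * 10), ncol = 10)   # 10 predictors, uniform on [0, 1]
  signal <- 10 * sin(pi * x[, 1] * x[, 2]) +
    20 * (x[, 3] - 0.5)^2 + 10 * x[, 4] + 5 * x[, 5]
  list(x = x, y = signal + rnorm(n, sd = sd))   # columns 6-10 never enter y
}

set.seed(1)
sim <- friedman1_sketch(200, sd = 1)
dim(sim$x)      # number of rows and columns of the predictor matrix
length(sim$y)   # number of responses
```

Since X6 through X10 play no role in the formula, a well-tuned model should ideally ignore them.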

Next, we create the test data:

testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)

Next, we tune models on the data:

knnModel <- train(x = trainingData$x,
                  y = trainingData$y,
                  method = 'knn',
                  preProcess = c('center', 'scale'),
                  tuneLength = 10)

knnModel
k-Nearest Neighbors 

200 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
Resampling results across tuning parameters:

  k   RMSE      Rsquared   MAE     
   5  3.466085  0.5121775  2.816838
   7  3.349428  0.5452823  2.727410
   9  3.264276  0.5785990  2.660026
  11  3.214216  0.6024244  2.603767
  13  3.196510  0.6176570  2.591935
  15  3.184173  0.6305506  2.577482
  17  3.183130  0.6425367  2.567787
  19  3.198752  0.6483184  2.592683
  21  3.188993  0.6611428  2.588787
  23  3.200458  0.6638353  2.604529

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 17.
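To make the tuning parameter concrete, the sketch below shows the idea behind KNN regression in base R: center and scale the predictors with the training statistics, find the k closest training rows by Euclidean distance, and average their responses. The helper knn_predict_sketch is hypothetical and is not the code that caret runs internally.

```r
# Minimal base R sketch of KNN regression (hypothetical helper, not caret's code).
knn_predict_sketch <- function(train_x, train_y, new_x, k) {
  train_x <- as.matrix(train_x)
  new_x <- as.matrix(new_x)
  # Center and scale with the TRAINING means and standard deviations only.
  mu <- colMeans(train_x)
  sdev <- apply(train_x, 2, sd)
  tr <- scale(train_x, center = mu, scale = sdev)
  nw <- scale(new_x, center = mu, scale = sdev)
  apply(nw, 1, function(row) {
    d <- sqrt(colSums((t(tr) - row)^2))   # Euclidean distance to each training row
    mean(train_y[order(d)[seq_len(k)]])   # average response of the k nearest rows
  })
}

# Toy check: with k = 1, each training point predicts its own response.
set.seed(1)
toy_x <- matrix(runif(40), ncol = 2)
toy_y <- rnorm(20)
all.equal(knn_predict_sketch(toy_x, toy_y, toy_x, k = 1), toy_y)
```

A small k follows the training data closely, while a large k averages over many neighbors and smooths the predictions, which is the trade-off the resampling table explores.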

Next, we use the final model (with k = 17) to predict data in the test set:

knnPred <- predict(knnModel, newdata = testData$x)
postResample(pred = knnPred, obs = testData$y)
     RMSE  Rsquared       MAE 
3.2040595 0.6819919 2.5683461 
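The three numbers above can be reproduced by hand. The sketch below computes them in base R, assuming the default behavior in which R squared is the squared correlation between observed and predicted values; the helper name metrics_sketch is hypothetical.

```r
# Minimal sketch of the three metrics reported by postResample (base R only).
metrics_sketch <- function(pred, obs) {
  c(RMSE = sqrt(mean((obs - pred)^2)),   # root mean squared error
    Rsquared = cor(obs, pred)^2,         # squared correlation of obs and pred
    MAE = mean(abs(obs - pred)))         # mean absolute error
}

# Toy example: three perfect predictions and one that is off by 1.
metrics_sketch(pred = c(1, 2, 3, 5), obs = c(1, 2, 3, 4))
```

Because this R squared is a squared correlation, it measures how well the predictions track the observations rather than how close they are in absolute terms, so it is worth reading alongside RMSE.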

MARS Model

Next, I try fitting a MARS model on the data.

set.seed(100)
marsGrid <- expand.grid(.degree = 1:3, .nprune = 2:50)
marsModel <- train(x = trainingData$x,
                   y = trainingData$y,
                   method = 'earth',
                   preProcess = c('center', 'scale'),
                   tuneGrid = marsGrid,
                   trControl = trainControl(method = "cv", number = 10))

marsModel
Multivariate Adaptive Regression Spline 

200 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
Resampling results across tuning parameters:

  degree  nprune  RMSE      Rsquared   MAE      
  1        2      4.327937  0.2544880  3.6004742
  1        3      3.572450  0.4912720  2.8958113
  1        4      2.596841  0.7183600  2.1063410
  1        5      2.370161  0.7659777  1.9186686
  1        6      2.276141  0.7881481  1.8100006
  1        7      1.766728  0.8751831  1.3902146
  1        8      1.780946  0.8723243  1.4013449
  1        9      1.665091  0.8819775  1.3255147
  1       10      1.663804  0.8821283  1.3276573
  1       11      1.657738  0.8822967  1.3317299
  1       12      1.653784  0.8827903  1.3315041
  1       13      1.648496  0.8823663  1.3164065
  1       14      1.639073  0.8841742  1.3128329
  1       15      1.639073  0.8841742  1.3128329
  1       16      1.639073  0.8841742  1.3128329
  1       17      1.639073  0.8841742  1.3128329
  1       18      1.639073  0.8841742  1.3128329
  1       19      1.639073  0.8841742  1.3128329
  1       20      1.639073  0.8841742  1.3128329
  1       21      1.639073  0.8841742  1.3128329
  1       22      1.639073  0.8841742  1.3128329
  1       23      1.639073  0.8841742  1.3128329
  1       24      1.639073  0.8841742  1.3128329
  1       25      1.639073  0.8841742  1.3128329
  1       26      1.639073  0.8841742  1.3128329
  1       27      1.639073  0.8841742  1.3128329
  1       28      1.639073  0.8841742  1.3128329
  1       29      1.639073  0.8841742  1.3128329
  1       30      1.639073  0.8841742  1.3128329
  1       31      1.639073  0.8841742  1.3128329
  1       32      1.639073  0.8841742  1.3128329
  1       33      1.639073  0.8841742  1.3128329
  1       34      1.639073  0.8841742  1.3128329
  1       35      1.639073  0.8841742  1.3128329
  1       36      1.639073  0.8841742  1.3128329
  1       37      1.639073  0.8841742  1.3128329
  1       38      1.639073  0.8841742  1.3128329
  1       39      1.639073  0.8841742  1.3128329
  1       40      1.639073  0.8841742  1.3128329
  1       41      1.639073  0.8841742  1.3128329
  1       42      1.639073  0.8841742  1.3128329
  1       43      1.639073  0.8841742  1.3128329
  1       44      1.639073  0.8841742  1.3128329
  1       45      1.639073  0.8841742  1.3128329
  1       46      1.639073  0.8841742  1.3128329
  1       47      1.639073  0.8841742  1.3128329
  1       48      1.639073  0.8841742  1.3128329
  1       49      1.639073  0.8841742  1.3128329
  1       50      1.639073  0.8841742  1.3128329
  2        2      4.327937  0.2544880  3.6004742
  2        3      3.572450  0.4912720  2.8958113
  2        4      2.661826  0.7070510  2.1734709
  2        5      2.404015  0.7578971  1.9753867
  2        6      2.243927  0.7914805  1.7830717
  2        7      1.856336  0.8605482  1.4356822
  2        8      1.754607  0.8763186  1.3968406
  2        9      1.653859  0.8870129  1.2813884
  2       10      1.434159  0.9166537  1.1339203
  2       11      1.320482  0.9289120  1.0347278
  2       12      1.317547  0.9306879  1.0359899
  2       13      1.296910  0.9306902  1.0146112
  2       14      1.221407  0.9395223  0.9631486
  2       15      1.230516  0.9390469  0.9761484
  2       16      1.236911  0.9387407  0.9745362
  2       17      1.236911  0.9387407  0.9745362
  2       18      1.236911  0.9387407  0.9745362
  2       19      1.236911  0.9387407  0.9745362
  2       20      1.236911  0.9387407  0.9745362
  2       21      1.236911  0.9387407  0.9745362
  2       22      1.236911  0.9387407  0.9745362
  2       23      1.236911  0.9387407  0.9745362
  2       24      1.236911  0.9387407  0.9745362
  2       25      1.236911  0.9387407  0.9745362
  2       26      1.236911  0.9387407  0.9745362
  2       27      1.236911  0.9387407  0.9745362
  2       28      1.236911  0.9387407  0.9745362
  2       29      1.236911  0.9387407  0.9745362
  2       30      1.236911  0.9387407  0.9745362
  2       31      1.236911  0.9387407  0.9745362
  2       32      1.236911  0.9387407  0.9745362
  2       33      1.236911  0.9387407  0.9745362
  2       34      1.236911  0.9387407  0.9745362
  2       35      1.236911  0.9387407  0.9745362
  2       36      1.236911  0.9387407  0.9745362
  2       37      1.236911  0.9387407  0.9745362
  2       38      1.236911  0.9387407  0.9745362
  2       39      1.236911  0.9387407  0.9745362
  2       40      1.236911  0.9387407  0.9745362
  2       41      1.236911  0.9387407  0.9745362
  2       42      1.236911  0.9387407  0.9745362
  2       43      1.236911  0.9387407  0.9745362
  2       44      1.236911  0.9387407  0.9745362
  2       45      1.236911  0.9387407  0.9745362
  2       46      1.236911  0.9387407  0.9745362
  2       47      1.236911  0.9387407  0.9745362
  2       48      1.236911  0.9387407  0.9745362
  2       49      1.236911  0.9387407  0.9745362
  2       50      1.236911  0.9387407  0.9745362
  3        2      4.327937  0.2544880  3.6004742
  3        3      3.572450  0.4912720  2.8958113
  3        4      2.661826  0.7070510  2.1734709
  3        5      2.404015  0.7578971  1.9753867
  3        6      2.258530  0.7888892  1.7954652
  3        7      1.850728  0.8620159  1.4273124
  3        8      1.751759  0.8768118  1.3917401
  3        9      1.659166  0.8866163  1.2790604
  3       10      1.443606  0.9158100  1.1243194
  3       11      1.339761  0.9276771  1.0405949
  3       12      1.320350  0.9307765  1.0276609
  3       13      1.301929  0.9303287  1.0151967
  3       14      1.253136  0.9346362  0.9746054
  3       15      1.267729  0.9329718  0.9811453
  3       16      1.274882  0.9327201  0.9851629
  3       17      1.280571  0.9324050  0.9937222
  3       18      1.280823  0.9322527  0.9964978
  3       19      1.283637  0.9318523  1.0049435
  3       20      1.283637  0.9318523  1.0049435
  3       21      1.283637  0.9318523  1.0049435
  3       22      1.283637  0.9318523  1.0049435
  3       23      1.283637  0.9318523  1.0049435
  3       24      1.283637  0.9318523  1.0049435
  3       25      1.283637  0.9318523  1.0049435
  3       26      1.283637  0.9318523  1.0049435
  3       27      1.283637  0.9318523  1.0049435
  3       28      1.283637  0.9318523  1.0049435
  3       29      1.283637  0.9318523  1.0049435
  3       30      1.283637  0.9318523  1.0049435
  3       31      1.283637  0.9318523  1.0049435
  3       32      1.283637  0.9318523  1.0049435
  3       33      1.283637  0.9318523  1.0049435
  3       34      1.283637  0.9318523  1.0049435
  3       35      1.283637  0.9318523  1.0049435
  3       36      1.283637  0.9318523  1.0049435
  3       37      1.283637  0.9318523  1.0049435
  3       38      1.283637  0.9318523  1.0049435
  3       39      1.283637  0.9318523  1.0049435
  3       40      1.283637  0.9318523  1.0049435
  3       41      1.283637  0.9318523  1.0049435
  3       42      1.283637  0.9318523  1.0049435
  3       43      1.283637  0.9318523  1.0049435
  3       44      1.283637  0.9318523  1.0049435
  3       45      1.283637  0.9318523  1.0049435
  3       46      1.283637  0.9318523  1.0049435
  3       47      1.283637  0.9318523  1.0049435
  3       48      1.283637  0.9318523  1.0049435
  3       49      1.283637  0.9318523  1.0049435
  3       50      1.283637  0.9318523  1.0049435

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 14 and degree = 2.

The model with the lowest cross-validated RMSE (1.22) uses degree 2 (two-way interactions) and 14 terms.

ggplot(marsModel)

summary(marsModel)
Call: earth(x=data.frame[200,10], y=c(18.46,16.1,17...), keepxy=TRUE, degree=2,
            nprune=14)

                                  coefficients
(Intercept)                         22.0339290
h(0.507267-X1)                      -4.5157241
h(X1-0.507267)                       2.6841329
h(0.325504-X2)                      -5.4438679
h(-0.216741-X3)                      3.3683600
h(X3- -0.216741)                     2.0371575
h(0.953812-X4)                      -2.7853388
h(X4-0.953812)                       2.7366442
h(1.17878-X5)                       -1.5636213
h(X1- -0.951872) * h(X2-0.325504)   -0.7790969
h(X1-0.507267) * h(X2- -0.798188)   -2.6276789
h(0.606835-X1) * h(0.325504-X2)      2.1773145
h(0.325504-X2) * h(X3-0.795427)      1.7739671
h(X2-0.325504) * h(X3- -0.917499)    0.5726623

Selected 14 of 21 terms, and 5 of 10 predictors (nprune=14)
Termination condition: Reached nk 21
Importance: X1, X4, X2, X5, X3, X6-unused, X7-unused, X8-unused, X9-unused, ...
Number of terms at each degree of interaction: 1 8 5
GCV 1.841667    RSS 255.2757    GRSq 0.9252276    RSq 0.9476564
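Each h(...) term in the summary is a hinge function, h(z) = max(0, z), so every predictor's effect is piecewise linear with a bend at the knot. As a small illustration, the sketch below evaluates only the two X4 terms using the coefficients printed above. The function name x4_effect is hypothetical, and the inputs are assumed to be on the centered and scaled X4 axis, since the model was fit after preprocessing.

```r
# Hinge function used in the MARS terms above: h(z) = max(0, z).
h <- function(z) pmax(0, z)

# Contribution of the two X4 terms from the summary (X4 centered and scaled).
x4_effect <- function(x4) {
  -2.7853388 * h(0.953812 - x4) + 2.7366442 * h(x4 - 0.953812)
}

# Below the knot, at the knot, and above the knot.
x4_effect(c(-1, 0.953812, 2))
```

The slopes on either side of the knot (about 2.79 and 2.74) are nearly equal, which is consistent with X4 entering the simulated response linearly.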

Next, I try predicting the test data with the MARS model:

MARSPred <- predict(marsModel, newdata = testData$x)
postResample(pred = MARSPred, obs = testData$y)
     RMSE  Rsquared       MAE 
1.2779993 0.9338365 1.0147070 

SVM

Next, I try fitting an SVM model:

set.seed(225)
svmModel <- train(x = trainingData$x,
                   y = trainingData$y,
                   method = 'svmRadial',
                   preProcess = c('center', 'scale'),
                   tuneLength =  14,
                   trControl = trainControl(method = "cv", number = 10))

svmModel
Support Vector Machines with Radial Basis Function Kernel 

200 samples
 10 predictor

Pre-processing: centered (10), scaled (10) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
Resampling results across tuning parameters:

  C        RMSE      Rsquared   MAE     
     0.25  2.499701  0.7970965  1.996951
     0.50  2.238097  0.8115233  1.792365
     1.00  2.049451  0.8324583  1.639456
     2.00  1.961412  0.8444204  1.559239
     4.00  1.870907  0.8571727  1.498361
     8.00  1.836778  0.8629510  1.472100
    16.00  1.849663  0.8613122  1.482504
    32.00  1.849663  0.8613122  1.482504
    64.00  1.849663  0.8613122  1.482504
   128.00  1.849663  0.8613122  1.482504
   256.00  1.849663  0.8613122  1.482504
   512.00  1.849663  0.8613122  1.482504
  1024.00  1.849663  0.8613122  1.482504
  2048.00  1.849663  0.8613122  1.482504

Tuning parameter 'sigma' was held constant at a value of 0.06254786
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.06254786 and C = 8.
plot(svmModel)
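Two details of the output above can be illustrated in base R. The radial basis kernel, in the parameterization I believe kernlab uses, is K(x, x') = exp(-sigma * ||x - x'||^2), and the 14 cost values in the table are consecutive powers of two. The helper rbf_kernel is hypothetical; to my understanding caret estimates sigma from the data and then tunes only C, which matches the note that sigma was held constant.

```r
# Radial basis function kernel: K(x, x') = exp(-sigma * ||x - x'||^2)
rbf_kernel <- function(x1, x2, sigma) exp(-sigma * sum((x1 - x2)^2))

sigma_hat <- 0.06254786                   # value reported in the output above
rbf_kernel(c(0, 0), c(0, 0), sigma_hat)   # identical points give similarity 1
rbf_kernel(c(0, 0), c(3, 4), sigma_hat)   # similarity decays with distance

# The 14 cost values in the table are the powers of two from 2^-2 to 2^11.
cost_grid <- 2^(-2:11)
cost_grid
```

Larger values of C penalize residuals outside the margin more heavily, so the flat RMSE from C = 16 onward suggests that the fit stops changing beyond that point.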

Next, I try predicting with the SVM model:

svmPred <- predict(svmModel, newdata = testData$x)
postResample(pred = svmPred, obs = testData$y)
     RMSE  Rsquared       MAE 
2.0523643 0.8293154 1.5571606 

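On the test set, the MARS model outperforms both the KNN and SVM models: it has the lowest RMSE (1.28, versus 2.05 for SVM and 3.20 for KNN) and the highest R squared (0.93, versus 0.83 and 0.68). The MARS model also selected only the informative predictors (X1-X5); the remaining five predictors are unused.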

Below is a plot of the variable importance:

plot(varImp(marsModel), main = 'Variable Importance')

7.5

We are tasked with fitting non-linear models on the chemical manufacturing data.

library(AppliedPredictiveModeling)  # provides the ChemicalManufacturingProcess data
data("ChemicalManufacturingProcess")

There are NAs in the data set:

sum(is.na(ChemicalManufacturingProcess))
[1] 106

Below, I impute the missing values using the k-nearest neighbors approach, via the kNN function from the VIM package. By default, the function appends indicator columns to the right of the original data that flag which values were imputed. I drop these columns from the data frame.

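library(VIM)  # provides kNN imputation
chem_data <- kNN(ChemicalManufacturingProcess)
# Keep only the original columns; the appended indicator columns are dropped
chem_data <- chem_data[, 1:ncol(ChemicalManufacturingProcess)]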
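To show the idea behind this step, here is a deliberately simplified base R sketch of nearest-neighbor imputation for numeric data. It is not the VIM algorithm: the hypothetical helper knn_impute_sketch uses Euclidean distance on standardized columns that both rows have observed, and fills each gap with the median of the k nearest donors.

```r
# Simplified KNN imputation for numeric data (hypothetical helper, base R only).
# NOT the VIM algorithm; distance and donor rules here are my own assumptions.
knn_impute_sketch <- function(df, k = 5) {
  m <- as.matrix(df)
  z <- scale(m)                        # standardize; NAs are skipped in the stats
  out <- m
  for (i in which(rowSums(is.na(m)) > 0)) {
    for (j in which(is.na(m[i, ]))) {
      donors <- setdiff(which(!is.na(m[, j])), i)   # rows with column j observed
      d <- sapply(donors, function(r) {
        shared <- !is.na(z[i, ]) & !is.na(z[r, ])   # columns observed in both rows
        shared[j] <- FALSE
        sqrt(mean((z[i, shared] - z[r, shared])^2))
      })
      nearest <- donors[order(d)[seq_len(min(k, length(donors)))]]
      out[i, j] <- median(m[nearest, j])            # fill with the donor median
    }
  }
  as.data.frame(out)
}

# Toy usage: one missing value in a small simulated data frame.
set.seed(2)
toy <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
toy$b[3] <- NA
filled <- knn_impute_sketch(toy, k = 3)
anyNA(filled)
```

In VIM itself, passing imp_var = FALSE to kNN should return the data without the indicator columns, which would make the column subsetting unnecessary.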

Next, I split the data into a training and test set:

set.seed(10)
chem_trainIndex <- createDataPartition(chem_data$Yield, p = 0.8, list = FALSE)

chem_trainData <- chem_data[chem_trainIndex,]
chem_testData <- chem_data[-chem_trainIndex,]
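For a numeric outcome, createDataPartition samples within groups defined by percentiles of the outcome, so the training and test sets have similar Yield distributions. The base R sketch below illustrates that idea on simulated data; the helper stratified_split_sketch and the toy outcome are hypothetical, and the details are assumptions rather than a copy of caret's implementation.

```r
# Minimal sketch of a stratified 80/20 split on a numeric outcome (base R only).
# Assumption: strata are quantile bins of y, and each bin is sampled separately.
stratified_split_sketch <- function(y, p = 0.8, groups = 5) {
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  idx <- unlist(lapply(split(seq_along(y), bins), function(rows) {
    rows[sample.int(length(rows), size = ceiling(p * length(rows)))]
  }))
  sort(unname(idx))   # row indices selected for the training set
}

set.seed(10)
y_toy <- rnorm(200, mean = 40, sd = 2)   # simulated stand-in for Yield
train_rows <- stratified_split_sketch(y_toy)
length(train_rows) / length(y_toy)       # share of rows in the training set
```

Rounding up within each bin means the training share can land slightly above the requested 80%.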

KNN model

First, I will tune a KNN model on the data:

set.seed(210)
knnChemModel <- train(x = chem_trainData[,-1],
                  y = chem_trainData$Yield,
                  method = 'knn',
                  preProcess = c('center', 'scale'),
                  tuneGrid = data.frame(.k=1:20),
                  trControl = trainControl(method = "cv", number = 10))

knnChemModel
k-Nearest Neighbors 

144 samples
 57 predictor

Pre-processing: centered (57), scaled (57) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 128, 130, 130, 129, 130, 130, ... 
Resampling results across tuning parameters:

  k   RMSE      Rsquared   MAE     
   1  1.465714  0.4465654  1.116089
   2  1.228056  0.5717160  1.013607
   3  1.277520  0.5193685  1.035558
   4  1.277025  0.5402453  1.023845
   5  1.286623  0.5393261  1.032174
   6  1.306640  0.5267678  1.044836
   7  1.326741  0.5246410  1.062796
   8  1.336258  0.5251640  1.078861
   9  1.358468  0.5042863  1.092338
  10  1.342232  0.5279743  1.077845
  11  1.346397  0.5282144  1.094812
  12  1.349239  0.5271784  1.092292
  13  1.362739  0.5157005  1.100105
  14  1.372961  0.5046815  1.101857
  15  1.399120  0.4806478  1.112901
  16  1.398594  0.4840723  1.114808
  17  1.412197  0.4758792  1.126821
  18  1.427435  0.4660139  1.140383
  19  1.428054  0.4671728  1.135843
  20  1.435325  0.4611414  1.145407

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 2.
plot(knnChemModel)
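The RMSE values in the table above come from 10-fold cross-validation: the training set is split into 10 parts, each part is held out once while the model is fit on the other nine, and the held-out errors are averaged. The sketch below shows that loop in base R with a linear model on simulated data. The helper cv_rmse_sketch and the toy data are hypothetical, and this is not caret's internal code.

```r
# Minimal base R sketch of 10-fold cross-validated RMSE (not caret's internals).
cv_rmse_sketch <- function(x, y, fit_fun, pred_fun, folds = 10) {
  fold_id <- sample(rep(seq_len(folds), length.out = length(y)))  # random fold labels
  fold_rmse <- sapply(seq_len(folds), function(f) {
    held_out <- fold_id == f
    model <- fit_fun(x[!held_out, , drop = FALSE], y[!held_out])
    pred <- pred_fun(model, x[held_out, , drop = FALSE])
    sqrt(mean((y[held_out] - pred)^2))   # RMSE on the held-out fold
  })
  mean(fold_rmse)                        # average of the per-fold RMSE values
}

# Toy usage with a linear model on simulated data (noise standard deviation 1).
set.seed(210)
x_toy <- data.frame(a = rnorm(144), b = rnorm(144))
y_toy <- 2 * x_toy$a - x_toy$b + rnorm(144)
cv_result <- cv_rmse_sketch(
  x_toy, y_toy,
  fit_fun = function(x, y) lm(y ~ ., data = cbind(x, y = y)),
  pred_fun = function(model, newx) predict(model, newdata = newx)
)
cv_result
```

With 144 training rows, each fold fits on roughly 130 rows, which matches the sample sizes listed in the output.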

Next, I try predicting the test set with the KNN model:

knnChemPred <- predict(knnChemModel, newdata = chem_testData[,-1])
postResample(pred = knnChemPred, obs = chem_testData$Yield)
     RMSE  Rsquared       MAE 
1.5622837 0.2797031 1.2695312 

MARS model

Next, I tune a MARS model:

set.seed(215)
marsChemGrid <- expand.grid(.degree = 1:3, .nprune = 2:100)
marsChemModel <- train(x = chem_trainData[,-1],
                   y = chem_trainData$Yield,
                   method = 'earth',
                   preProcess = c('center', 'scale'),
                   tuneGrid = marsChemGrid,
                   trControl = trainControl(method = "cv", number = 10))
marsChemModel
Multivariate Adaptive Regression Spline 

144 samples
 57 predictor

Pre-processing: centered (57), scaled (57) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 129, 129, 128, 131, 130, 129, ... 
Resampling results across tuning parameters:

  degree  nprune  RMSE      Rsquared   MAE      
  1         2     1.397656  0.4796494  1.0945946
  1         3     1.175481  0.6259761  0.9526671
  1         4     1.211015  0.6036202  0.9718720
  1         5     1.250996  0.5842997  1.0087666
  1         6     2.596217  0.4764189  1.4077280
  1         7     2.645785  0.4428147  1.4283295
  1         8     2.912071  0.4227154  1.5171069
  1         9     2.769404  0.4223613  1.5086877
  1        10     2.719705  0.4170912  1.5176289
  1        11     2.747755  0.4140453  1.5183338
  1        12     2.432505  0.4160484  1.4308117
  1        13     2.395924  0.4136836  1.4295424
  1        14     2.372032  0.4223269  1.4183484
  1        15     2.375673  0.4197655  1.4198212
  1        16     2.377044  0.4187515  1.4287127
  1        17     2.377044  0.4187515  1.4287127
  1        18     2.377044  0.4187515  1.4287127
  1        19     2.377044  0.4187515  1.4287127
  1        20     2.377044  0.4187515  1.4287127
  1        21     2.377044  0.4187515  1.4287127
  1        22     2.377044  0.4187515  1.4287127
  1        23     2.377044  0.4187515  1.4287127
  1        24     2.377044  0.4187515  1.4287127
  1        25     2.377044  0.4187515  1.4287127
  1        26     2.377044  0.4187515  1.4287127
  1        27     2.377044  0.4187515  1.4287127
  1        28     2.377044  0.4187515  1.4287127
  1        29     2.377044  0.4187515  1.4287127
  1        30     2.377044  0.4187515  1.4287127
  1        31     2.377044  0.4187515  1.4287127
  1        32     2.377044  0.4187515  1.4287127
  1        33     2.377044  0.4187515  1.4287127
  1        34     2.377044  0.4187515  1.4287127
  1        35     2.377044  0.4187515  1.4287127
  1        36     2.377044  0.4187515  1.4287127
  1        37     2.377044  0.4187515  1.4287127
  1        38     2.377044  0.4187515  1.4287127
  1        39     2.377044  0.4187515  1.4287127
  1        40     2.377044  0.4187515  1.4287127
  1        41     2.377044  0.4187515  1.4287127
  1        42     2.377044  0.4187515  1.4287127
  1        43     2.377044  0.4187515  1.4287127
  1        44     2.377044  0.4187515  1.4287127
  1        45     2.377044  0.4187515  1.4287127
  1        46     2.377044  0.4187515  1.4287127
  1        47     2.377044  0.4187515  1.4287127
  1        48     2.377044  0.4187515  1.4287127
  1        49     2.377044  0.4187515  1.4287127
  1        50     2.377044  0.4187515  1.4287127
  1        51     2.377044  0.4187515  1.4287127
  1        52     2.377044  0.4187515  1.4287127
  1        53     2.377044  0.4187515  1.4287127
  1        54     2.377044  0.4187515  1.4287127
  1        55     2.377044  0.4187515  1.4287127
  1        56     2.377044  0.4187515  1.4287127
  1        57     2.377044  0.4187515  1.4287127
  1        58     2.377044  0.4187515  1.4287127
  1        59     2.377044  0.4187515  1.4287127
  1        60     2.377044  0.4187515  1.4287127
  1        61     2.377044  0.4187515  1.4287127
  1        62     2.377044  0.4187515  1.4287127
  1        63     2.377044  0.4187515  1.4287127
  1        64     2.377044  0.4187515  1.4287127
  1        65     2.377044  0.4187515  1.4287127
  1        66     2.377044  0.4187515  1.4287127
  1        67     2.377044  0.4187515  1.4287127
  1        68     2.377044  0.4187515  1.4287127
  1        69     2.377044  0.4187515  1.4287127
  1        70     2.377044  0.4187515  1.4287127
  1        71     2.377044  0.4187515  1.4287127
  1        72     2.377044  0.4187515  1.4287127
  1        73     2.377044  0.4187515  1.4287127
  1        74     2.377044  0.4187515  1.4287127
  1        75     2.377044  0.4187515  1.4287127
  1        76     2.377044  0.4187515  1.4287127
  1        77     2.377044  0.4187515  1.4287127
  1        78     2.377044  0.4187515  1.4287127
  1        79     2.377044  0.4187515  1.4287127
  1        80     2.377044  0.4187515  1.4287127
  1        81     2.377044  0.4187515  1.4287127
  1        82     2.377044  0.4187515  1.4287127
  1        83     2.377044  0.4187515  1.4287127
  1        84     2.377044  0.4187515  1.4287127
  1        85     2.377044  0.4187515  1.4287127
  1        86     2.377044  0.4187515  1.4287127
  1        87     2.377044  0.4187515  1.4287127
  1        88     2.377044  0.4187515  1.4287127
  1        89     2.377044  0.4187515  1.4287127
  1        90     2.377044  0.4187515  1.4287127
  1        91     2.377044  0.4187515  1.4287127
  1        92     2.377044  0.4187515  1.4287127
  1        93     2.377044  0.4187515  1.4287127
  1        94     2.377044  0.4187515  1.4287127
  1        95     2.377044  0.4187515  1.4287127
  1        96     2.377044  0.4187515  1.4287127
  1        97     2.377044  0.4187515  1.4287127
  1        98     2.377044  0.4187515  1.4287127
  1        99     2.377044  0.4187515  1.4287127
  1       100     2.377044  0.4187515  1.4287127
  2         2     1.397656  0.4796494  1.0945946
  2         3     1.340348  0.5209794  1.0631839
  2         4     1.266138  0.5662456  1.0098808
  2         5     1.249357  0.5916629  0.9692782
  2         6     1.289490  0.5910860  0.9982445
  2         7     1.308424  0.5886979  1.0096864
  2         8     1.292601  0.6021639  1.0138807
  2         9     1.351036  0.5880971  1.0486129
  2        10     1.517781  0.5523609  1.1008885
  2        11     1.513767  0.5611558  1.0919718
  2        12     1.573704  0.5268401  1.1442936
  2        13     1.527645  0.5537503  1.1066168
  2        14     1.592421  0.5664257  1.1111685
  2        15     1.560927  0.5768300  1.0956702
  2        16     1.721506  0.5571156  1.1260366
  2        17     1.816464  0.5702275  1.1714685
  2        18     1.797507  0.5638597  1.1788806
  2        19     1.792913  0.5605208  1.1756709
  2        20     1.752868  0.5752170  1.1511954
  2        21     1.769578  0.5748549  1.1496129
  2        22     1.770170  0.5733124  1.1550279
  2        23     1.774019  0.5675947  1.1584076
  2        24     1.769759  0.5672906  1.1540906
  2        25     1.783348  0.5639166  1.1623984
  2        26     1.804384  0.5577609  1.1796524
  2        27     1.808440  0.5554705  1.1864492
  2        28     1.808440  0.5554705  1.1864492
  2        29     1.808440  0.5554705  1.1864492
  2        30     1.808440  0.5554705  1.1864492
  2        31     1.808440  0.5554705  1.1864492
  2        32     1.808440  0.5554705  1.1864492
  2        33     1.808440  0.5554705  1.1864492
  2        34     1.808440  0.5554705  1.1864492
  2        35     1.808440  0.5554705  1.1864492
  2        36     1.808440  0.5554705  1.1864492
  2        37     1.808440  0.5554705  1.1864492
  2        38     1.808440  0.5554705  1.1864492
  2        39     1.808440  0.5554705  1.1864492
  2        40     1.808440  0.5554705  1.1864492
  2        41     1.808440  0.5554705  1.1864492
  2        42     1.808440  0.5554705  1.1864492
  2        43     1.808440  0.5554705  1.1864492
  2        44     1.808440  0.5554705  1.1864492
  2        45     1.808440  0.5554705  1.1864492
  2        46     1.808440  0.5554705  1.1864492
  2        47     1.808440  0.5554705  1.1864492
  2        48     1.808440  0.5554705  1.1864492
  2        49     1.808440  0.5554705  1.1864492
  2        50     1.808440  0.5554705  1.1864492
  2        51     1.808440  0.5554705  1.1864492
  2        52     1.808440  0.5554705  1.1864492
  2        53     1.808440  0.5554705  1.1864492
  2        54     1.808440  0.5554705  1.1864492
  2        55     1.808440  0.5554705  1.1864492
  2        56     1.808440  0.5554705  1.1864492
  2        57     1.808440  0.5554705  1.1864492
  2        58     1.808440  0.5554705  1.1864492
  2        59     1.808440  0.5554705  1.1864492
  2        60     1.808440  0.5554705  1.1864492
  2        61     1.808440  0.5554705  1.1864492
  2        62     1.808440  0.5554705  1.1864492
  2        63     1.808440  0.5554705  1.1864492
  2        64     1.808440  0.5554705  1.1864492
  2        65     1.808440  0.5554705  1.1864492
  2        66     1.808440  0.5554705  1.1864492
  2        67     1.808440  0.5554705  1.1864492
  2        68     1.808440  0.5554705  1.1864492
  2        69     1.808440  0.5554705  1.1864492
  2        70     1.808440  0.5554705  1.1864492
  2        71     1.808440  0.5554705  1.1864492
  2        72     1.808440  0.5554705  1.1864492
  2        73     1.808440  0.5554705  1.1864492
  2        74     1.808440  0.5554705  1.1864492
  2        75     1.808440  0.5554705  1.1864492
  2        76     1.808440  0.5554705  1.1864492
  2        77     1.808440  0.5554705  1.1864492
  2        78     1.808440  0.5554705  1.1864492
  2        79     1.808440  0.5554705  1.1864492
  2        80     1.808440  0.5554705  1.1864492
  2        81     1.808440  0.5554705  1.1864492
  2        82     1.808440  0.5554705  1.1864492
  2        83     1.808440  0.5554705  1.1864492
  2        84     1.808440  0.5554705  1.1864492
  2        85     1.808440  0.5554705  1.1864492
  2        86     1.808440  0.5554705  1.1864492
  2        87     1.808440  0.5554705  1.1864492
  2        88     1.808440  0.5554705  1.1864492
  2        89     1.808440  0.5554705  1.1864492
  2        90     1.808440  0.5554705  1.1864492
  2        91     1.808440  0.5554705  1.1864492
  2        92     1.808440  0.5554705  1.1864492
  2        93     1.808440  0.5554705  1.1864492
  2        94     1.808440  0.5554705  1.1864492
  2        95     1.808440  0.5554705  1.1864492
  2        96     1.808440  0.5554705  1.1864492
  2        97     1.808440  0.5554705  1.1864492
  2        98     1.808440  0.5554705  1.1864492
  2        99     1.808440  0.5554705  1.1864492
  2       100     1.808440  0.5554705  1.1864492
  3         2     1.397656  0.4796494  1.0945946
  3         3     1.253384  0.5900284  1.0217768
  3         4     1.223552  0.6241221  0.9930089
  3         5     1.222308  0.6389274  0.9839698
  3         6     1.300949  0.6094165  1.0303986
  3         7     1.422793  0.5593827  1.1082234
  3         8     1.714140  0.5081902  1.1985890
  3         9     1.781525  0.4865014  1.2305941
  3        10     1.764976  0.4850417  1.2225069
  3        11     1.786215  0.4893484  1.2423560
  3        12     1.783904  0.4838812  1.2567143
  3        13     1.790925  0.4836727  1.2443353
  3        14     1.760681  0.4968505  1.2329811
  3        15     1.811700  0.4703532  1.2635783
  3        16     1.800513  0.4816581  1.2519393
  3        17     1.863410  0.4640128  1.2807891
  3        18     1.848220  0.4684821  1.2741869
  3        19     1.849942  0.4543157  1.2756488
  3        20     1.870374  0.4624069  1.2816319
  3        21     1.850034  0.4670269  1.2518775
  3        22     1.868776  0.4726532  1.2452719
  3        23     1.899235  0.4727350  1.2535101
  3        24     1.913370  0.4671947  1.2616897
  3        25     1.928114  0.4585725  1.2780399
  3        26     1.918545  0.4599720  1.2768248
  3        27     1.911157  0.4657486  1.2683624
  3        28     1.911157  0.4657486  1.2683624
  3        29     1.911157  0.4657486  1.2683624
  3        30     1.673410  0.5193550  1.2156738
  3        31     1.673410  0.5193550  1.2156738
  3        32     1.673410  0.5193550  1.2156738
  3        33     1.673410  0.5193550  1.2156738
  3        34     1.673410  0.5193550  1.2156738
  3        35     1.673410  0.5193550  1.2156738
  3        36     1.673410  0.5193550  1.2156738
  3        37     1.673410  0.5193550  1.2156738
  3        38     1.673410  0.5193550  1.2156738
  3        39     1.673410  0.5193550  1.2156738
  3        40     1.673410  0.5193550  1.2156738
  3        41     1.673410  0.5193550  1.2156738
  3        42     1.673410  0.5193550  1.2156738
  3        43     1.673410  0.5193550  1.2156738
  3        44     1.673410  0.5193550  1.2156738
  3        45     1.673410  0.5193550  1.2156738
  3        46     1.673410  0.5193550  1.2156738
  3        47     1.673410  0.5193550  1.2156738
  3        48     1.673410  0.5193550  1.2156738
  3        49     1.673410  0.5193550  1.2156738
  3        50     1.673410  0.5193550  1.2156738
  3        51     1.673410  0.5193550  1.2156738
  3        52     1.673410  0.5193550  1.2156738
  3        53     1.673410  0.5193550  1.2156738
  3        54     1.673410  0.5193550  1.2156738
  3        55     1.673410  0.5193550  1.2156738
  3        56     1.673410  0.5193550  1.2156738
  3        57     1.673410  0.5193550  1.2156738
  3        58     1.673410  0.5193550  1.2156738
  3        59     1.673410  0.5193550  1.2156738
  3        60     1.673410  0.5193550  1.2156738
  3        61     1.673410  0.5193550  1.2156738
  3        62     1.673410  0.5193550  1.2156738
  3        63     1.673410  0.5193550  1.2156738
  3        64     1.673410  0.5193550  1.2156738
  3        65     1.673410  0.5193550  1.2156738
  3        66     1.673410  0.5193550  1.2156738
  3        67     1.673410  0.5193550  1.2156738
  3        68     1.673410  0.5193550  1.2156738
  3        69     1.673410  0.5193550  1.2156738
  3        70     1.673410  0.5193550  1.2156738
  3        71     1.673410  0.5193550  1.2156738
  3        72     1.673410  0.5193550  1.2156738
  3        73     1.673410  0.5193550  1.2156738
  3        74     1.673410  0.5193550  1.2156738
  3        75     1.673410  0.5193550  1.2156738
  3        76     1.673410  0.5193550  1.2156738
  3        77     1.673410  0.5193550  1.2156738
  3        78     1.673410  0.5193550  1.2156738
  3        79     1.673410  0.5193550  1.2156738
  3        80     1.673410  0.5193550  1.2156738
  3        81     1.673410  0.5193550  1.2156738
  3        82     1.673410  0.5193550  1.2156738
  3        83     1.673410  0.5193550  1.2156738
  3        84     1.673410  0.5193550  1.2156738
  3        85     1.673410  0.5193550  1.2156738
  3        86     1.673410  0.5193550  1.2156738
  3        87     1.673410  0.5193550  1.2156738
  3        88     1.673410  0.5193550  1.2156738
  3        89     1.673410  0.5193550  1.2156738
  3        90     1.673410  0.5193550  1.2156738
  3        91     1.673410  0.5193550  1.2156738
  3        92     1.673410  0.5193550  1.2156738
  3        93     1.673410  0.5193550  1.2156738
  3        94     1.673410  0.5193550  1.2156738
  3        95     1.673410  0.5193550  1.2156738
  3        96     1.673410  0.5193550  1.2156738
  3        97     1.673410  0.5193550  1.2156738
  3        98     1.673410  0.5193550  1.2156738
  3        99     1.673410  0.5193550  1.2156738
  3       100     1.673410  0.5193550  1.2156738

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 3 and degree = 1.

The model with the lowest cross-validated RMSE (1.18) is additive (degree 1) with 3 terms.

ggplot(marsChemModel)

summary(marsChemModel)
Call: earth(x=data.frame[144,57], y=c(38,42.44,42.0...), keepxy=TRUE, degree=1,
            nprune=3)

                                    coefficients
(Intercept)                            39.357846
h(0.641807-ManufacturingProcess09)     -1.011735
h(ManufacturingProcess32- -1.17911)     1.285230

Selected 3 of 21 terms, and 2 of 57 predictors (nprune=3)
Termination condition: RSq changed by less than 0.001 at 21 terms
Importance: ManufacturingProcess32, ManufacturingProcess09, ...
Number of terms at each degree of interaction: 1 2 (additive model)
GCV 1.362344    RSS 182.7907    GRSq 0.6218734    RSq 0.6427314
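
To make the summary concrete, the fitted model can be written out by hand. The sketch below copies the intercept, coefficients, and knots from the summary above and assumes the inputs are the centered and scaled predictor values (per the preProcess step). The helper names h and mars_yield are hypothetical and only for illustration.

```r
# Hinge function used by MARS: zero on one side of the knot, linear on the other
h <- function(x) pmax(0, x)

# Hand-written version of the pruned model (coefficients from the summary above).
# mp09 and mp32 are assumed to be centered and scaled predictor values.
mars_yield <- function(mp09, mp32) {
  39.357846 -
    1.011735 * h(0.641807 - mp09) +
    1.285230 * h(mp32 - (-1.17911))
}

# At the two knots both hinge terms are zero, so only the intercept remains
at_knots <- mars_yield(mp09 = 0.641807, mp32 = -1.17911)

# Raising ManufacturingProcess32 above its knot increases the predicted yield
above_knot <- mars_yield(mp09 = 0.641807, mp32 = 0)
```

Read this way, the model predicts lower yield as ManufacturingProcess09 falls below its knot and higher yield as ManufacturingProcess32 rises above its knot.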

Next, I try predicting the test set data with the MARS model:

MARSChemPred <- predict(marsChemModel, newdata = chem_testData[,-1])
postResample(pred = MARSChemPred, obs = chem_testData$Yield)
     RMSE  Rsquared       MAE 
1.2554338 0.4264021 1.0514768 
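
For reference, the three numbers reported by postResample can be reproduced in base R. The sketch below is a minimal re-implementation under the usual definitions (RMSE, squared correlation between predictions and observations, and mean absolute error); the helper name resample_metrics and the toy vectors are assumptions for illustration, not values from the chemical data.

```r
# Minimal base R version of the three metrics reported by postResample
resample_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),  # root mean squared error
    Rsquared = cor(pred, obs)^2,            # squared correlation of pred and obs
    MAE      = mean(abs(pred - obs)))       # mean absolute error
}

# Toy vectors, for illustration only
m <- resample_metrics(pred = c(1, 2, 3), obs = c(1, 2, 4))
m
```

Because the R-squared here is a squared correlation, it measures how closely the predictions track the observations but does not penalize a constant bias, which is one reason to read it alongside RMSE.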

SVM

Below I tune an SVM model:

set.seed(230)
svmChemModel <- train(x = chem_trainData[,-1],
                   y = chem_trainData$Yield,
                   method = 'svmRadial',
                   preProcess = c('center', 'scale'),
                   tuneLength =  14,
                   trControl = trainControl(method = "cv", number = 10))
svmChemModel
Support Vector Machines with Radial Basis Function Kernel 

144 samples
 57 predictor

Pre-processing: centered (57), scaled (57) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 129, 130, 130, 129, 131, 129, ... 
Resampling results across tuning parameters:

  C        RMSE      Rsquared   MAE      
     0.25  1.415883  0.5336956  1.1612506
     0.50  1.281674  0.5964956  1.0546869
     1.00  1.200240  0.6241479  0.9774171
     2.00  1.181807  0.6332608  0.9536772
     4.00  1.190212  0.6356245  0.9511234
     8.00  1.162745  0.6559100  0.9465857
    16.00  1.160748  0.6567212  0.9440564
    32.00  1.160748  0.6567212  0.9440564
    64.00  1.160748  0.6567212  0.9440564
   128.00  1.160748  0.6567212  0.9440564
   256.00  1.160748  0.6567212  0.9440564
   512.00  1.160748  0.6567212  0.9440564
  1024.00  1.160748  0.6567212  0.9440564
  2048.00  1.160748  0.6567212  0.9440564

Tuning parameter 'sigma' was held constant at a value of 0.01320701
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.01320701 and C = 16.
ggplot(svmChemModel)
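
The cost values in the table above are successive powers of two. The short sketch below reproduces them from tuneLength = 14; the formula is my reading of the grid shown in the output (and, as far as I know, of caret's default svmRadial grid), so treat it as an assumption rather than a statement of caret's internals.

```r
# Reproduce the 14 cost values from the tuning table: 2^-2, 2^-1, ..., 2^11
tune_length <- 14
cost_grid <- 2^(seq_len(tune_length) - 3)
cost_grid
```

Since the resampled RMSE is identical for every C from 16 upward, a shorter grid would likely have selected the same model.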

Next, I try predicting the test set data with the SVM model:

SVMChemPred <- predict(svmChemModel, newdata = chem_testData[,-1])
postResample(pred = SVMChemPred, obs = chem_testData$Yield)
     RMSE  Rsquared       MAE 
1.0995312 0.5563240 0.9233152 

A

We are asked to determine which model gives the optimal re-sampling and test set performance.

Below is the KNN model:

knnChemModel
k-Nearest Neighbors 

144 samples
 57 predictor

Pre-processing: centered (57), scaled (57) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 128, 130, 130, 129, 130, 130, ... 
Resampling results across tuning parameters:

  k   RMSE      Rsquared   MAE     
   1  1.465714  0.4465654  1.116089
   2  1.228056  0.5717160  1.013607
   3  1.277520  0.5193685  1.035558
   4  1.277025  0.5402453  1.023845
   5  1.286623  0.5393261  1.032174
   6  1.306640  0.5267678  1.044836
   7  1.326741  0.5246410  1.062796
   8  1.336258  0.5251640  1.078861
   9  1.358468  0.5042863  1.092338
  10  1.342232  0.5279743  1.077845
  11  1.346397  0.5282144  1.094812
  12  1.349239  0.5271784  1.092292
  13  1.362739  0.5157005  1.100105
  14  1.372961  0.5046815  1.101857
  15  1.399120  0.4806478  1.112901
  16  1.398594  0.4840723  1.114808
  17  1.412197  0.4758792  1.126821
  18  1.427435  0.4660139  1.140383
  19  1.428054  0.4671728  1.135843
  20  1.435325  0.4611414  1.145407

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 2.

The RMSE in re-sampling was 1.228056 (at k = 2).

The RMSE for the test set is:

postResample(pred = knnChemPred, obs = chem_testData$Yield)
     RMSE  Rsquared       MAE 
1.5622837 0.2797031 1.2695312 

The test set RMSE is 1.5622837, which is noticeably higher than the re-sampling estimate.

Below is the MARS model:

marsChemModel
Multivariate Adaptive Regression Spline 

144 samples
 57 predictor

Pre-processing: centered (57), scaled (57) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 129, 129, 128, 131, 130, 129, ... 
Resampling results across tuning parameters:

  degree  nprune  RMSE      Rsquared   MAE      
  1         2     1.397656  0.4796494  1.0945946
  1         3     1.175481  0.6259761  0.9526671
  1         4     1.211015  0.6036202  0.9718720
  1         5     1.250996  0.5842997  1.0087666
  1         6     2.596217  0.4764189  1.4077280
  1         7     2.645785  0.4428147  1.4283295
  1         8     2.912071  0.4227154  1.5171069
  1         9     2.769404  0.4223613  1.5086877
  1        10     2.719705  0.4170912  1.5176289
  1        11     2.747755  0.4140453  1.5183338
  1        12     2.432505  0.4160484  1.4308117
  1        13     2.395924  0.4136836  1.4295424
  1        14     2.372032  0.4223269  1.4183484
  1        15     2.375673  0.4197655  1.4198212
  1        16     2.377044  0.4187515  1.4287127
  1        17     2.377044  0.4187515  1.4287127
  1        18     2.377044  0.4187515  1.4287127
  1        19     2.377044  0.4187515  1.4287127
  1        20     2.377044  0.4187515  1.4287127
  1        21     2.377044  0.4187515  1.4287127
  1        22     2.377044  0.4187515  1.4287127
  1        23     2.377044  0.4187515  1.4287127
  1        24     2.377044  0.4187515  1.4287127
  1        25     2.377044  0.4187515  1.4287127
  1        26     2.377044  0.4187515  1.4287127
  1        27     2.377044  0.4187515  1.4287127
  1        28     2.377044  0.4187515  1.4287127
  1        29     2.377044  0.4187515  1.4287127
  1        30     2.377044  0.4187515  1.4287127
  1        31     2.377044  0.4187515  1.4287127
  1        32     2.377044  0.4187515  1.4287127
  1        33     2.377044  0.4187515  1.4287127
  1        34     2.377044  0.4187515  1.4287127
  1        35     2.377044  0.4187515  1.4287127
  1        36     2.377044  0.4187515  1.4287127
  1        37     2.377044  0.4187515  1.4287127
  1        38     2.377044  0.4187515  1.4287127
  1        39     2.377044  0.4187515  1.4287127
  1        40     2.377044  0.4187515  1.4287127
  1        41     2.377044  0.4187515  1.4287127
  1        42     2.377044  0.4187515  1.4287127
  1        43     2.377044  0.4187515  1.4287127
  1        44     2.377044  0.4187515  1.4287127
  1        45     2.377044  0.4187515  1.4287127
  1        46     2.377044  0.4187515  1.4287127
  1        47     2.377044  0.4187515  1.4287127
  1        48     2.377044  0.4187515  1.4287127
  1        49     2.377044  0.4187515  1.4287127
  1        50     2.377044  0.4187515  1.4287127
  1        51     2.377044  0.4187515  1.4287127
  1        52     2.377044  0.4187515  1.4287127
  1        53     2.377044  0.4187515  1.4287127
  1        54     2.377044  0.4187515  1.4287127
  1        55     2.377044  0.4187515  1.4287127
  1        56     2.377044  0.4187515  1.4287127
  1        57     2.377044  0.4187515  1.4287127
  1        58     2.377044  0.4187515  1.4287127
  1        59     2.377044  0.4187515  1.4287127
  1        60     2.377044  0.4187515  1.4287127
  1        61     2.377044  0.4187515  1.4287127
  1        62     2.377044  0.4187515  1.4287127
  1        63     2.377044  0.4187515  1.4287127
  1        64     2.377044  0.4187515  1.4287127
  1        65     2.377044  0.4187515  1.4287127
  1        66     2.377044  0.4187515  1.4287127
  1        67     2.377044  0.4187515  1.4287127
  1        68     2.377044  0.4187515  1.4287127
  1        69     2.377044  0.4187515  1.4287127
  1        70     2.377044  0.4187515  1.4287127
  1        71     2.377044  0.4187515  1.4287127
  1        72     2.377044  0.4187515  1.4287127
  1        73     2.377044  0.4187515  1.4287127
  1        74     2.377044  0.4187515  1.4287127
  1        75     2.377044  0.4187515  1.4287127
  1        76     2.377044  0.4187515  1.4287127
  1        77     2.377044  0.4187515  1.4287127
  1        78     2.377044  0.4187515  1.4287127
  1        79     2.377044  0.4187515  1.4287127
  1        80     2.377044  0.4187515  1.4287127
  1        81     2.377044  0.4187515  1.4287127
  1        82     2.377044  0.4187515  1.4287127
  1        83     2.377044  0.4187515  1.4287127
  1        84     2.377044  0.4187515  1.4287127
  1        85     2.377044  0.4187515  1.4287127
  1        86     2.377044  0.4187515  1.4287127
  1        87     2.377044  0.4187515  1.4287127
  1        88     2.377044  0.4187515  1.4287127
  1        89     2.377044  0.4187515  1.4287127
  1        90     2.377044  0.4187515  1.4287127
  1        91     2.377044  0.4187515  1.4287127
  1        92     2.377044  0.4187515  1.4287127
  1        93     2.377044  0.4187515  1.4287127
  1        94     2.377044  0.4187515  1.4287127
  1        95     2.377044  0.4187515  1.4287127
  1        96     2.377044  0.4187515  1.4287127
  1        97     2.377044  0.4187515  1.4287127
  1        98     2.377044  0.4187515  1.4287127
  1        99     2.377044  0.4187515  1.4287127
  1       100     2.377044  0.4187515  1.4287127
  2         2     1.397656  0.4796494  1.0945946
  2         3     1.340348  0.5209794  1.0631839
  2         4     1.266138  0.5662456  1.0098808
  2         5     1.249357  0.5916629  0.9692782
  2         6     1.289490  0.5910860  0.9982445
  2         7     1.308424  0.5886979  1.0096864
  2         8     1.292601  0.6021639  1.0138807
  2         9     1.351036  0.5880971  1.0486129
  2        10     1.517781  0.5523609  1.1008885
  2        11     1.513767  0.5611558  1.0919718
  2        12     1.573704  0.5268401  1.1442936
  2        13     1.527645  0.5537503  1.1066168
  2        14     1.592421  0.5664257  1.1111685
  2        15     1.560927  0.5768300  1.0956702
  2        16     1.721506  0.5571156  1.1260366
  2        17     1.816464  0.5702275  1.1714685
  2        18     1.797507  0.5638597  1.1788806
  2        19     1.792913  0.5605208  1.1756709
  2        20     1.752868  0.5752170  1.1511954
  2        21     1.769578  0.5748549  1.1496129
  2        22     1.770170  0.5733124  1.1550279
  2        23     1.774019  0.5675947  1.1584076
  2        24     1.769759  0.5672906  1.1540906
  2        25     1.783348  0.5639166  1.1623984
  2        26     1.804384  0.5577609  1.1796524
  2        27     1.808440  0.5554705  1.1864492
  2        28     1.808440  0.5554705  1.1864492
  2        29     1.808440  0.5554705  1.1864492
  2        30     1.808440  0.5554705  1.1864492
  2        31     1.808440  0.5554705  1.1864492
  2        32     1.808440  0.5554705  1.1864492
  2        33     1.808440  0.5554705  1.1864492
  2        34     1.808440  0.5554705  1.1864492
  2        35     1.808440  0.5554705  1.1864492
  2        36     1.808440  0.5554705  1.1864492
  2        37     1.808440  0.5554705  1.1864492
  2        38     1.808440  0.5554705  1.1864492
  2        39     1.808440  0.5554705  1.1864492
  2        40     1.808440  0.5554705  1.1864492
  2        41     1.808440  0.5554705  1.1864492
  2        42     1.808440  0.5554705  1.1864492
  2        43     1.808440  0.5554705  1.1864492
  2        44     1.808440  0.5554705  1.1864492
  2        45     1.808440  0.5554705  1.1864492
  2        46     1.808440  0.5554705  1.1864492
  2        47     1.808440  0.5554705  1.1864492
  2        48     1.808440  0.5554705  1.1864492
  2        49     1.808440  0.5554705  1.1864492
  2        50     1.808440  0.5554705  1.1864492
  2        51     1.808440  0.5554705  1.1864492
  2        52     1.808440  0.5554705  1.1864492
  2        53     1.808440  0.5554705  1.1864492
  2        54     1.808440  0.5554705  1.1864492
  2        55     1.808440  0.5554705  1.1864492
  2        56     1.808440  0.5554705  1.1864492
  2        57     1.808440  0.5554705  1.1864492
  2        58     1.808440  0.5554705  1.1864492
  2        59     1.808440  0.5554705  1.1864492
  2        60     1.808440  0.5554705  1.1864492
  2        61     1.808440  0.5554705  1.1864492
  2        62     1.808440  0.5554705  1.1864492
  2        63     1.808440  0.5554705  1.1864492
  2        64     1.808440  0.5554705  1.1864492
  2        65     1.808440  0.5554705  1.1864492
  2        66     1.808440  0.5554705  1.1864492
  2        67     1.808440  0.5554705  1.1864492
  2        68     1.808440  0.5554705  1.1864492
  2        69     1.808440  0.5554705  1.1864492
  2        70     1.808440  0.5554705  1.1864492
  2        71     1.808440  0.5554705  1.1864492
  2        72     1.808440  0.5554705  1.1864492
  2        73     1.808440  0.5554705  1.1864492
  2        74     1.808440  0.5554705  1.1864492
  2        75     1.808440  0.5554705  1.1864492
  2        76     1.808440  0.5554705  1.1864492
  2        77     1.808440  0.5554705  1.1864492
  2        78     1.808440  0.5554705  1.1864492
  2        79     1.808440  0.5554705  1.1864492
  2        80     1.808440  0.5554705  1.1864492
  2        81     1.808440  0.5554705  1.1864492
  2        82     1.808440  0.5554705  1.1864492
  2        83     1.808440  0.5554705  1.1864492
  2        84     1.808440  0.5554705  1.1864492
  2        85     1.808440  0.5554705  1.1864492
  2        86     1.808440  0.5554705  1.1864492
  2        87     1.808440  0.5554705  1.1864492
  2        88     1.808440  0.5554705  1.1864492
  2        89     1.808440  0.5554705  1.1864492
  2        90     1.808440  0.5554705  1.1864492
  2        91     1.808440  0.5554705  1.1864492
  2        92     1.808440  0.5554705  1.1864492
  2        93     1.808440  0.5554705  1.1864492
  2        94     1.808440  0.5554705  1.1864492
  2        95     1.808440  0.5554705  1.1864492
  2        96     1.808440  0.5554705  1.1864492
  2        97     1.808440  0.5554705  1.1864492
  2        98     1.808440  0.5554705  1.1864492
  2        99     1.808440  0.5554705  1.1864492
  2       100     1.808440  0.5554705  1.1864492
  3         2     1.397656  0.4796494  1.0945946
  3         3     1.253384  0.5900284  1.0217768
  3         4     1.223552  0.6241221  0.9930089
  3         5     1.222308  0.6389274  0.9839698
  3         6     1.300949  0.6094165  1.0303986
  3         7     1.422793  0.5593827  1.1082234
  3         8     1.714140  0.5081902  1.1985890
  3         9     1.781525  0.4865014  1.2305941
  3        10     1.764976  0.4850417  1.2225069
  3        11     1.786215  0.4893484  1.2423560
  3        12     1.783904  0.4838812  1.2567143
  3        13     1.790925  0.4836727  1.2443353
  3        14     1.760681  0.4968505  1.2329811
  3        15     1.811700  0.4703532  1.2635783
  3        16     1.800513  0.4816581  1.2519393
  3        17     1.863410  0.4640128  1.2807891
  3        18     1.848220  0.4684821  1.2741869
  3        19     1.849942  0.4543157  1.2756488
  3        20     1.870374  0.4624069  1.2816319
  3        21     1.850034  0.4670269  1.2518775
  3        22     1.868776  0.4726532  1.2452719
  3        23     1.899235  0.4727350  1.2535101
  3        24     1.913370  0.4671947  1.2616897
  3        25     1.928114  0.4585725  1.2780399
  3        26     1.918545  0.4599720  1.2768248
  3        27     1.911157  0.4657486  1.2683624
  3        28     1.911157  0.4657486  1.2683624
  3        29     1.911157  0.4657486  1.2683624
  3        30     1.673410  0.5193550  1.2156738
  3        31     1.673410  0.5193550  1.2156738
  3        32     1.673410  0.5193550  1.2156738
  3        33     1.673410  0.5193550  1.2156738
  3        34     1.673410  0.5193550  1.2156738
  3        35     1.673410  0.5193550  1.2156738
  3        36     1.673410  0.5193550  1.2156738
  3        37     1.673410  0.5193550  1.2156738
  3        38     1.673410  0.5193550  1.2156738
  3        39     1.673410  0.5193550  1.2156738
  3        40     1.673410  0.5193550  1.2156738
  3        41     1.673410  0.5193550  1.2156738
  3        42     1.673410  0.5193550  1.2156738
  3        43     1.673410  0.5193550  1.2156738
  3        44     1.673410  0.5193550  1.2156738
  3        45     1.673410  0.5193550  1.2156738
  3        46     1.673410  0.5193550  1.2156738
  3        47     1.673410  0.5193550  1.2156738
  3        48     1.673410  0.5193550  1.2156738
  3        49     1.673410  0.5193550  1.2156738
  3        50     1.673410  0.5193550  1.2156738
  3        51     1.673410  0.5193550  1.2156738
  3        52     1.673410  0.5193550  1.2156738
  3        53     1.673410  0.5193550  1.2156738
  3        54     1.673410  0.5193550  1.2156738
  3        55     1.673410  0.5193550  1.2156738
  3        56     1.673410  0.5193550  1.2156738
  3        57     1.673410  0.5193550  1.2156738
  3        58     1.673410  0.5193550  1.2156738
  3        59     1.673410  0.5193550  1.2156738
  3        60     1.673410  0.5193550  1.2156738
  3        61     1.673410  0.5193550  1.2156738
  3        62     1.673410  0.5193550  1.2156738
  3        63     1.673410  0.5193550  1.2156738
  3        64     1.673410  0.5193550  1.2156738
  3        65     1.673410  0.5193550  1.2156738
  3        66     1.673410  0.5193550  1.2156738
  3        67     1.673410  0.5193550  1.2156738
  3        68     1.673410  0.5193550  1.2156738
  3        69     1.673410  0.5193550  1.2156738
  3        70     1.673410  0.5193550  1.2156738
  3        71     1.673410  0.5193550  1.2156738
  3        72     1.673410  0.5193550  1.2156738
  3        73     1.673410  0.5193550  1.2156738
  3        74     1.673410  0.5193550  1.2156738
  3        75     1.673410  0.5193550  1.2156738
  3        76     1.673410  0.5193550  1.2156738
  3        77     1.673410  0.5193550  1.2156738
  3        78     1.673410  0.5193550  1.2156738
  3        79     1.673410  0.5193550  1.2156738
  3        80     1.673410  0.5193550  1.2156738
  3        81     1.673410  0.5193550  1.2156738
  3        82     1.673410  0.5193550  1.2156738
  3        83     1.673410  0.5193550  1.2156738
  3        84     1.673410  0.5193550  1.2156738
  3        85     1.673410  0.5193550  1.2156738
  3        86     1.673410  0.5193550  1.2156738
  3        87     1.673410  0.5193550  1.2156738
  3        88     1.673410  0.5193550  1.2156738
  3        89     1.673410  0.5193550  1.2156738
  3        90     1.673410  0.5193550  1.2156738
  3        91     1.673410  0.5193550  1.2156738
  3        92     1.673410  0.5193550  1.2156738
  3        93     1.673410  0.5193550  1.2156738
  3        94     1.673410  0.5193550  1.2156738
  3        95     1.673410  0.5193550  1.2156738
  3        96     1.673410  0.5193550  1.2156738
  3        97     1.673410  0.5193550  1.2156738
  3        98     1.673410  0.5193550  1.2156738
  3        99     1.673410  0.5193550  1.2156738
  3       100     1.673410  0.5193550  1.2156738

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nprune = 3 and degree = 1.

The RMSE in re-sampling was 1.175481 (at degree = 1 and nprune = 3).

The RMSE for the test set is:

postResample(pred = MARSChemPred, obs = chem_testData$Yield)
     RMSE  Rsquared       MAE 
1.2554338 0.4264021 1.0514768 

Below is the SVM model:

svmChemModel
Support Vector Machines with Radial Basis Function Kernel 

144 samples
 57 predictor

Pre-processing: centered (57), scaled (57) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 129, 130, 130, 129, 131, 129, ... 
Resampling results across tuning parameters:

  C        RMSE      Rsquared   MAE      
     0.25  1.415883  0.5336956  1.1612506
     0.50  1.281674  0.5964956  1.0546869
     1.00  1.200240  0.6241479  0.9774171
     2.00  1.181807  0.6332608  0.9536772
     4.00  1.190212  0.6356245  0.9511234
     8.00  1.162745  0.6559100  0.9465857
    16.00  1.160748  0.6567212  0.9440564
    32.00  1.160748  0.6567212  0.9440564
    64.00  1.160748  0.6567212  0.9440564
   128.00  1.160748  0.6567212  0.9440564
   256.00  1.160748  0.6567212  0.9440564
   512.00  1.160748  0.6567212  0.9440564
  1024.00  1.160748  0.6567212  0.9440564
  2048.00  1.160748  0.6567212  0.9440564

Tuning parameter 'sigma' was held constant at a value of 0.01320701
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were sigma = 0.01320701 and C = 16.

The RMSE in re-sampling was 1.160748 (at C = 16).

The RMSE for the test set is:

postResample(pred = SVMChemPred, obs = chem_testData$Yield)
     RMSE  Rsquared       MAE 
1.0995312 0.5563240 0.9233152 

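The SVM had the lowest RMSE both in re-sampling (1.160748) and on the test set (1.0995312), and it also had the highest test set R squared (though the test set R squared of 0.56 is still somewhat low). MARS ranked second on both measures and KNN ranked last; KNN also showed the largest gap between its re-sampling and test set RMSE. Thus, I would recommend the SVM model ahead of the MARS and KNN models.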
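
To put the numbers side by side, the sketch below collects the re-sampling and test set RMSE values reported in the output above into one table. The data frame name results and its column names are my own; the values are copied from the printed results at the selected tuning parameters.

```r
# RMSE values copied from the model output above (selected tuning parameters)
results <- data.frame(
  model         = c("KNN", "MARS", "SVM"),
  resample_rmse = c(1.228056, 1.175481, 1.160748),
  test_rmse     = c(1.5622837, 1.2554338, 1.0995312)
)

# Positive gap: the test set error is worse than the re-sampling estimate
results$gap <- results$test_rmse - results$resample_rmse

# Sort from best to worst test set RMSE
results <- results[order(results$test_rmse), ]
results
```

Sorting by test set RMSE gives the same ordering as sorting by re-sampling RMSE, so both criteria point to the same model.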

B

Below are the top predictors for the SVM model:

plot(varImp(svmChemModel), main = 'Variable Importance')

Based on the above, Manufacturing Process 32 and 13 have the highest importance scores. While Biological Materials appear in the top ten, it seems that the Manufacturing Processes have more impact on the model.
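
One caveat on how to read these scores: as far as I know, caret has no SVM-specific importance measure for svmRadial, so varImp is likely falling back on a model-free filter that scores each predictor by the R squared of a loess fit against the outcome. The sketch below illustrates that idea on simulated data. The data, the variable names a and b, and the helper loess_r2 are all assumptions for illustration, and this is not caret's actual implementation.

```r
set.seed(1)
n <- 100

# Simulated predictors: only `a` is related to the outcome
x <- data.frame(a = runif(n), b = runif(n))
y <- sin(2 * pi * x$a) + rnorm(n, sd = 0.2)

# R squared of a univariate loess fit, relative to an intercept-only model
loess_r2 <- function(pred, y) {
  fit <- loess(y ~ pred)
  1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
}

# One score per predictor; higher means a stronger univariate relationship
scores <- sapply(x, loess_r2, y = y)
scores
```

If the importance scores are computed this way, they describe each predictor's individual relationship with Yield rather than its role inside the fitted SVM.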

The Lasso model is different in nature from the SVM model; however, the Lasso model likewise seemed to be influenced more by the Manufacturing Processes.

C

Below I explore the relationship between Yield and the ten most important variables for the SVM model:

# Extract the variable importance scores into a data frame
final_variables <- varImp(svmChemModel)
final_variables <- as.data.frame(final_variables$importance)
final_variables$variable <- rownames(final_variables)

# Keep the ten predictors with the highest importance scores
top_ten <- final_variables[order(-final_variables$Overall),][1:10,]

top_ten_names <- top_ten$variable

# Pairwise plots and correlations of Yield with the top ten predictors (ggpairs is from GGally)
chem_data |> dplyr::select(Yield, all_of(top_ten_names)) |> ggpairs()

From the above, we can see that some manufacturing processes are positively correlated with Yield while others are negatively correlated. The Biological Material variables are positively correlated with Yield. This suggests that higher biological material measurements tend to correspond with better yields, while a manufacturing process can correspond with yield either positively or negatively, depending on the process.