Data 624 Week 11

7.2 Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:

$$y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)$$

where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates 5 other, non-informative variables). The package mlbench contains a function mlbench.friedman1 that simulates these data.
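The equation itself is easy to simulate in base R. The sketch below is only an illustration of the formula, not the mlbench.friedman1 source; the helper name simulate_friedman1 is hypothetical, and we assume noise standard deviation sd = 1 and 10 predictors, of which only the first five enter the formula.

```r
# Hypothetical helper: simulate data from the Friedman (1991) equation.
# Assumes 10 uniform(0, 1) predictors; only X1-X5 affect y.
simulate_friedman1 <- function(n, sd = 1, p = 10) {
  x <- matrix(runif(n * p), nrow = n, ncol = p)
  colnames(x) <- paste0("X", seq_len(p))
  # Noise-free signal: uses only the first five predictors
  signal <- 10 * sin(pi * x[, 1] * x[, 2]) +
    20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] +
    5 * x[, 5]
  list(x = as.data.frame(x), y = signal + rnorm(n, sd = sd), signal = signal)
}

set.seed(1)
sim <- simulate_friedman1(200)
dim(sim$x)  # 200 rows, 10 predictor columns
```

Because X6–X10 never enter `signal`, any importance a model assigns to them reflects noise, which is what the MARS question below tests.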

Question: Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?

library(caret)  # train(), postResample(), varImp(); attaches lattice and ggplot2
## Loading required package: lattice
## Loading required package: ggplot2

Tune several models on these data. Let's try a k-nearest neighbors (KNN) model first.

knnModel <- train(x = trainingData$x,
                  y = trainingData$y,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10)

knnModel
## k-Nearest Neighbors 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  3.466085  0.5121775  2.816838
##    7  3.349428  0.5452823  2.727410
##    9  3.264276  0.5785990  2.660026
##   11  3.214216  0.6024244  2.603767
##   13  3.196510  0.6176570  2.591935
##   15  3.184173  0.6305506  2.577482
##   17  3.183130  0.6425367  2.567787
##   19  3.198752  0.6483184  2.592683
##   21  3.188993  0.6611428  2.588787
##   23  3.200458  0.6638353  2.604529
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 17.

Resampling selected k = 17, which has the lowest bootstrap RMSE (3.18). For regression, KNN predicts a new sample's outcome as the average outcome of its 17 nearest training samples, with distance measured on the centered and scaled predictors. Next, we predict the test set with this model.
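The prediction rule described above can be sketched in a few lines of base R. This is a minimal illustration, not caret's implementation; the helper name knn_regress is hypothetical, and it assumes the predictors have already been centered and scaled.

```r
# Minimal KNN regression sketch: predict each new point as the mean outcome
# of its k nearest training points (Euclidean distance).
knn_regress <- function(train_x, train_y, new_x, k) {
  train_x <- as.matrix(train_x)
  new_x <- as.matrix(new_x)
  apply(new_x, 1, function(pt) {
    # Squared Euclidean distance from this point to every training row
    d <- colSums((t(train_x) - pt)^2)
    nearest <- order(d)[seq_len(k)]
    mean(train_y[nearest])
  })
}

# Toy example: one predictor, outcome = 2 * x
train_x <- matrix(1:10, ncol = 1)
train_y <- 2 * (1:10)
knn_regress(train_x, train_y, matrix(5.2, ncol = 1), k = 3)
```

For x = 5.2 the three nearest training points are 5, 6 and 4, so the prediction is the mean of 10, 12 and 8, i.e. 10. Larger k averages over more neighbors, which smooths predictions at the cost of bias.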

knnPred <- predict(knnModel, newdata = testData$x)
postResample(pred = knnPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 3.2040595 0.6819919 2.5683461
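postResample() reports three summary metrics. The sketch below reproduces them in base R under the assumption that caret's default Rsquared is the squared correlation between predictions and observations (not 1 - SSE/SST); the helper name regression_metrics is hypothetical.

```r
# Sketch of RMSE, Rsquared (squared correlation) and MAE
regression_metrics <- function(pred, obs) {
  c(RMSE = sqrt(mean((pred - obs)^2)),
    Rsquared = cor(pred, obs)^2,
    MAE = mean(abs(pred - obs)))
}

obs <- c(1, 2, 3, 4)
pred <- c(1.5, 2, 2.5, 4)
regression_metrics(pred, obs)
```

Here the errors are 0.5, 0, -0.5 and 0, so RMSE is sqrt(0.125) and MAE is 0.25; RMSE penalizes large errors more heavily than MAE.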

SVM

Let's try a support vector machine (SVM) with a radial basis function kernel.

# install.packages("kernlab")  # method = "svmRadial" uses kernlab
svmModel <- train(x = trainingData$x,
                  y = trainingData$y,
                  method = "svmRadial",
                  tuneLength = 10,
                  preProc = c("center", "scale"))
svmModel
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   C       RMSE      Rsquared   MAE     
##     0.25  2.545335  0.7804647  2.015121
##     0.50  2.319786  0.7965148  1.830009
##     1.00  2.188349  0.8119636  1.726027
##     2.00  2.103655  0.8241314  1.655842
##     4.00  2.066879  0.8294322  1.631051
##     8.00  2.052681  0.8313929  1.623550
##    16.00  2.049867  0.8318312  1.621820
##    32.00  2.049867  0.8318312  1.621820
##    64.00  2.049867  0.8318312  1.621820
##   128.00  2.049867  0.8318312  1.621820
## 
## Tuning parameter 'sigma' was held constant at a value of 0.06802164
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06802164 and C = 16.
svmModelpred <- predict(svmModel, newdata = testData$x)
postResample(pred = svmModelpred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 2.0864652 0.8236735 1.5854649

On the test set, the SVM has lower error than the KNN model (RMSE 2.09 vs. 3.20, MAE 1.59 vs. 2.57), and R² improves from 0.68 to 0.82.
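In the SVM tuning output, only the cost C was tuned; sigma was held constant at 0.068. Sigma is the width parameter of the radial basis function kernel. The sketch below assumes kernlab's parameterization, k(x, z) = exp(-sigma * ||x - z||^2), so a larger sigma makes similarity decay faster with distance. The helper name rbf_kernel is hypothetical.

```r
# RBF (Gaussian) kernel in the form we assume kernlab uses:
# k(x, z) = exp(-sigma * ||x - z||^2)
rbf_kernel <- function(x, z, sigma) {
  exp(-sigma * sum((x - z)^2))
}

x <- c(0, 0)
z <- c(3, 4)  # squared distance from x is 25
rbf_kernel(x, x, sigma = 0.068)  # identical points give similarity 1
rbf_kernel(x, z, sigma = 0.068)  # exp(-0.068 * 25) = exp(-1.7)
```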

MARS

Let's try multivariate adaptive regression splines (MARS) and compare the accuracy.

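# degree: 1 = additive hinge terms only, 2 = also allows pairwise interactions
# nprune: maximum number of terms kept in the pruned model
marsGrid <- expand.grid(.degree = 1:2,
                        .nprune = 2:20)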

marsModel <- train(x = trainingData$x,
                   y = trainingData$y,
                   method = "earth",
                   tuneGrid = marsGrid,
                   preProc = c("center", "scale"))
marsModel
## Multivariate Adaptive Regression Spline 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE     
##   1        2      4.447386  0.2254125  3.620675
##   1        3      3.790305  0.4344625  3.058704
##   1        4      2.801182  0.6884819  2.233531
##   1        5      2.551283  0.7412626  2.051644
##   1        6      2.493135  0.7492201  1.986528
##   1        7      2.089713  0.8239588  1.645996
##   1        8      1.889475  0.8565881  1.484798
##   1        9      1.816053  0.8673608  1.420333
##   1       10      1.819611  0.8674028  1.417343
##   1       11      1.819783  0.8670556  1.415058
##   1       12      1.832487  0.8651613  1.426371
##   1       13      1.845943  0.8632112  1.436005
##   1       14      1.855353  0.8613778  1.452115
##   1       15      1.854557  0.8617322  1.452920
##   1       16      1.856173  0.8616879  1.455393
##   1       17      1.856989  0.8615480  1.456862
##   1       18      1.856989  0.8615480  1.456862
##   1       19      1.856989  0.8615480  1.456862
##   1       20      1.856989  0.8615480  1.456862
##   2        2      4.434592  0.2241213  3.616685
##   2        3      3.799538  0.4319047  3.064845
##   2        4      2.806374  0.6871266  2.237911
##   2        5      2.524002  0.7462965  2.023657
##   2        6      2.446243  0.7602514  1.931404
##   2        7      2.147529  0.8127597  1.682839
##   2        8      1.977186  0.8393569  1.557609
##   2        9      1.831267  0.8635192  1.428370
##   2       10      1.639428  0.8902850  1.280510
##   2       11      1.545708  0.9019039  1.213559
##   2       12      1.499558  0.9081641  1.171249
##   2       13      1.494111  0.9087340  1.161702
##   2       14      1.492700  0.9102980  1.160345
##   2       15      1.484444  0.9116520  1.153052
##   2       16      1.487065  0.9109633  1.151057
##   2       17      1.496021  0.9098876  1.156630
##   2       18      1.487296  0.9111035  1.150491
##   2       19      1.486280  0.9113126  1.149198
##   2       20      1.486280  0.9113126  1.149198
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 15 and degree = 2.
marsModelpred <- predict(marsModel, newdata = testData$x)
postResample(pred = marsModelpred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 1.1908806 0.9428866 0.9496858
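MARS builds its model from hinge functions max(0, x - c) and their mirror images max(0, c - x). With degree = 1 every term is a single hinge (an additive model); degree = 2 also allows products of two hinges, i.e. pairwise interactions such as the x1·x2 term inside sin(πx1x2), which may help explain why degree = 2 wins here. The sketch below only evaluates such basis functions; the knot at 0.5 is a hypothetical choice for illustration, not one that earth selected.

```r
# Hinge (truncated linear) basis functions used by MARS
hinge_pos <- function(x, knot) pmax(0, x - knot)  # max(0, x - c)
hinge_neg <- function(x, knot) pmax(0, knot - x)  # max(0, c - x)

x1 <- c(0.2, 0.5, 0.8)
x2 <- c(0.9, 0.1, 0.6)
knot <- 0.5  # hypothetical knot, for illustration only

hinge_pos(x1, knot)  # zero at or below the knot, linear above it
hinge_neg(x1, knot)  # mirror image
# A degree-2 term: product of hinges in two different predictors
hinge_pos(x1, knot) * hinge_pos(x2, knot)
```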

Question: Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?

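Of the three models, MARS performs best. Its test-set RMSE (1.19) and MAE (0.95) are the lowest and its R² (0.94) is the closest to 1; it also has the lowest resampled RMSE (1.48 with degree = 2 and nprune = 15, versus 2.05 for the SVM and 3.18 for KNN). MARS also appears to select the informative predictors: only X1–X5 appear in the variable importance output below, so the final model does not appear to use any of the non-informative X6–X10. X3 ranks last with a score of 0, but caret rescales importances to a 0–100 range by default, so 0 marks the least important listed predictor rather than an unused one.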

varImp(marsModel)
## earth variable importance
## 
##    Overall
## X1  100.00
## X4   75.31
## X2   48.86
## X5   15.61
## X3    0.00
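The 0–100 scores above come from rescaling raw importances. Below is a minimal sketch of min–max rescaling, which we assume mirrors caret's default varImp(scale = TRUE) behaviour; the raw scores and the helper name rescale_0_100 are hypothetical.

```r
# Min-max rescaling of importance scores to the range 0-100
rescale_0_100 <- function(imp) {
  shifted <- imp - min(imp)
  100 * shifted / max(shifted)
}

raw <- c(X1 = 9.0, X4 = 7.0, X2 = 5.0, X5 = 2.5, X3 = 1.2)  # hypothetical
round(rescale_0_100(raw), 2)
```

The largest raw score always maps to 100 and the smallest to 0, even when the smallest is well above zero.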

7.5 Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.

  1. Which nonlinear regression model gives the optimal resampling and test set performance?
# Load the chemical manufacturing data
#install.packages('AppliedPredictiveModeling')
library(AppliedPredictiveModeling)
data("ChemicalManufacturingProcess")

Let's look at the first few rows of the data frame. There are 58 columns: the outcome Yield, 12 BiologicalMaterial predictors, and 45 ManufacturingProcess predictors.

head(ChemicalManufacturingProcess)
##   Yield BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
## 1 38.00                 6.25                49.58                56.97
## 2 42.44                 8.01                60.97                67.48
## 3 42.03                 8.01                60.97                67.48
## 4 41.42                 8.01                60.97                67.48
## 5 42.49                 7.47                63.33                72.25
## 6 43.57                 6.12                58.36                65.31
##   BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
## 1                12.74                19.51                43.73
## 2                14.65                19.36                53.14
## 3                14.65                19.36                53.14
## 4                14.65                19.36                53.14
## 5                14.02                17.91                54.66
## 6                15.17                21.79                51.23
##   BiologicalMaterial07 BiologicalMaterial08 BiologicalMaterial09
## 1                  100                16.66                11.44
## 2                  100                19.04                12.55
## 3                  100                19.04                12.55
## 4                  100                19.04                12.55
## 5                  100                18.22                12.80
## 6                  100                18.30                12.13
##   BiologicalMaterial10 BiologicalMaterial11 BiologicalMaterial12
## 1                 3.46               138.09                18.83
## 2                 3.46               153.67                21.05
## 3                 3.46               153.67                21.05
## 4                 3.46               153.67                21.05
## 5                 3.05               147.61                21.05
## 6                 3.78               151.88                20.76
##   ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03
## 1                     NA                     NA                     NA
## 2                    0.0                      0                     NA
## 3                    0.0                      0                     NA
## 4                    0.0                      0                     NA
## 5                   10.7                      0                     NA
## 6                   12.0                      0                     NA
##   ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06
## 1                     NA                     NA                     NA
## 2                    917                 1032.2                  210.0
## 3                    912                 1003.6                  207.1
## 4                    911                 1014.6                  213.3
## 5                    918                 1027.5                  205.7
## 6                    924                 1016.8                  208.9
##   ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09
## 1                     NA                     NA                  43.00
## 2                    177                    178                  46.57
## 3                    178                    178                  45.07
## 4                    177                    177                  44.92
## 5                    178                    178                  44.96
## 6                    178                    178                  45.32
##   ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12
## 1                     NA                     NA                     NA
## 2                     NA                     NA                      0
## 3                     NA                     NA                      0
## 4                     NA                     NA                      0
## 5                     NA                     NA                      0
## 6                     NA                     NA                      0
##   ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15
## 1                   35.5                   4898                   6108
## 2                   34.0                   4869                   6095
## 3                   34.8                   4878                   6087
## 4                   34.8                   4897                   6102
## 5                   34.6                   4992                   6233
## 6                   34.0                   4985                   6222
##   ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18
## 1                   4682                   35.5                   4865
## 2                   4617                   34.0                   4867
## 3                   4617                   34.8                   4877
## 4                   4635                   34.8                   4872
## 5                   4733                   33.9                   4886
## 6                   4786                   33.4                   4862
##   ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21
## 1                   6049                   4665                    0.0
## 2                   6097                   4621                    0.0
## 3                   6078                   4621                    0.0
## 4                   6073                   4611                    0.0
## 5                   6102                   4659                   -0.7
## 6                   6115                   4696                   -0.6
##   ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24
## 1                     NA                     NA                     NA
## 2                      3                      0                      3
## 3                      4                      1                      4
## 4                      5                      2                      5
## 5                      8                      4                     18
## 6                      9                      1                      1
##   ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27
## 1                   4873                   6074                   4685
## 2                   4869                   6107                   4630
## 3                   4897                   6116                   4637
## 4                   4892                   6111                   4630
## 5                   4930                   6151                   4684
## 6                   4871                   6128                   4687
##   ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30
## 1                   10.7                   21.0                    9.9
## 2                   11.2                   21.4                    9.9
## 3                   11.1                   21.3                    9.4
## 4                   11.1                   21.3                    9.4
## 5                   11.3                   21.6                    9.0
## 6                   11.4                   21.7                   10.1
##   ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33
## 1                   69.1                    156                     66
## 2                   68.7                    169                     66
## 3                   69.3                    173                     66
## 4                   69.3                    171                     68
## 5                   69.4                    171                     70
## 6                   68.2                    173                     70
##   ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
## 1                    2.4                    486                  0.019
## 2                    2.6                    508                  0.019
## 3                    2.6                    509                  0.018
## 4                    2.5                    496                  0.018
## 5                    2.5                    468                  0.017
## 6                    2.5                    490                  0.018
##   ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
## 1                    0.5                      3                    7.2
## 2                    2.0                      2                    7.2
## 3                    0.7                      2                    7.2
## 4                    1.2                      2                    7.2
## 5                    0.2                      2                    7.3
## 6                    0.4                      2                    7.2
##   ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42
## 1                     NA                     NA                   11.6
## 2                    0.1                   0.15                   11.1
## 3                    0.0                   0.00                   12.0
## 4                    0.0                   0.00                   10.6
## 5                    0.0                   0.00                   11.0
## 6                    0.0                   0.00                   11.5
##   ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45
## 1                    3.0                    1.8                    2.4
## 2                    0.9                    1.9                    2.2
## 3                    1.0                    1.8                    2.3
## 4                    1.1                    1.8                    2.1
## 5                    1.1                    1.7                    2.1
## 6                    2.2                    1.8                    2.0
ncol(ChemicalManufacturingProcess)
## [1] 58

Before modeling, we clean the data, starting with a check for missing values.

The data set has 176 rows, 24 of which contain at least one NA, and there are 106 missing values in total.

Total number of observations:

nrow(ChemicalManufacturingProcess)
## [1] 176

Total number of NAs:

length(which(is.na(ChemicalManufacturingProcess)))
## [1] 106

Number of rows with at least one NA:

length(which(!complete.cases(ChemicalManufacturingProcess)))
## [1] 24
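The same counts can be written more compactly with sum() instead of length(which(...)), and colSums(is.na(...)) gives a per-column breakdown. A minimal sketch on a small hypothetical data frame:

```r
# Toy data frame with a few missing values (hypothetical, for illustration)
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c(NA, NA, 6, 8),
                 c = c(1, 2, 3, 4))

sum(is.na(df))             # total missing cells: 3
colSums(is.na(df))         # missing per column: a = 1, b = 2, c = 0
sum(!complete.cases(df))   # rows with at least one NA: 2
mean(!complete.cases(df))  # fraction of incomplete rows: 0.5
```

Applied to ChemicalManufacturingProcess, the last line would give 24/176, about 0.136.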

Let's summarize the data.

summary(ChemicalManufacturingProcess)
##      Yield       BiologicalMaterial01 BiologicalMaterial02 BiologicalMaterial03
##  Min.   :35.25   Min.   :4.580        Min.   :46.87        Min.   :56.97       
##  1st Qu.:38.75   1st Qu.:5.978        1st Qu.:52.68        1st Qu.:64.98       
##  Median :39.97   Median :6.305        Median :55.09        Median :67.22       
##  Mean   :40.18   Mean   :6.411        Mean   :55.69        Mean   :67.70       
##  3rd Qu.:41.48   3rd Qu.:6.870        3rd Qu.:58.74        3rd Qu.:70.43       
##  Max.   :46.34   Max.   :8.810        Max.   :64.75        Max.   :78.25       
##                                                                                
##  BiologicalMaterial04 BiologicalMaterial05 BiologicalMaterial06
##  Min.   : 9.38        Min.   :13.24        Min.   :40.60       
##  1st Qu.:11.24        1st Qu.:17.23        1st Qu.:46.05       
##  Median :12.10        Median :18.49        Median :48.46       
##  Mean   :12.35        Mean   :18.60        Mean   :48.91       
##  3rd Qu.:13.22        3rd Qu.:19.90        3rd Qu.:51.34       
##  Max.   :23.09        Max.   :24.85        Max.   :59.38       
##                                                                
##  BiologicalMaterial07 BiologicalMaterial08 BiologicalMaterial09
##  Min.   :100.0        Min.   :15.88        Min.   :11.44       
##  1st Qu.:100.0        1st Qu.:17.06        1st Qu.:12.60       
##  Median :100.0        Median :17.51        Median :12.84       
##  Mean   :100.0        Mean   :17.49        Mean   :12.85       
##  3rd Qu.:100.0        3rd Qu.:17.88        3rd Qu.:13.13       
##  Max.   :100.8        Max.   :19.14        Max.   :14.08       
##                                                                
##  BiologicalMaterial10 BiologicalMaterial11 BiologicalMaterial12
##  Min.   :1.770        Min.   :135.8        Min.   :18.35       
##  1st Qu.:2.460        1st Qu.:143.8        1st Qu.:19.73       
##  Median :2.710        Median :146.1        Median :20.12       
##  Mean   :2.801        Mean   :147.0        Mean   :20.20       
##  3rd Qu.:2.990        3rd Qu.:149.6        3rd Qu.:20.75       
##  Max.   :6.870        Max.   :158.7        Max.   :22.21       
##                                                                
##  ManufacturingProcess01 ManufacturingProcess02 ManufacturingProcess03
##  Min.   : 0.00          Min.   : 0.00          Min.   :1.47          
##  1st Qu.:10.80          1st Qu.:19.30          1st Qu.:1.53          
##  Median :11.40          Median :21.00          Median :1.54          
##  Mean   :11.21          Mean   :16.68          Mean   :1.54          
##  3rd Qu.:12.15          3rd Qu.:21.50          3rd Qu.:1.55          
##  Max.   :14.10          Max.   :22.50          Max.   :1.60          
##  NA's   :1              NA's   :3              NA's   :15            
##  ManufacturingProcess04 ManufacturingProcess05 ManufacturingProcess06
##  Min.   :911.0          Min.   : 923.0         Min.   :203.0         
##  1st Qu.:928.0          1st Qu.: 986.8         1st Qu.:205.7         
##  Median :934.0          Median : 999.2         Median :206.8         
##  Mean   :931.9          Mean   :1001.7         Mean   :207.4         
##  3rd Qu.:936.0          3rd Qu.:1008.9         3rd Qu.:208.7         
##  Max.   :946.0          Max.   :1175.3         Max.   :227.4         
##  NA's   :1              NA's   :1              NA's   :2             
##  ManufacturingProcess07 ManufacturingProcess08 ManufacturingProcess09
##  Min.   :177.0          Min.   :177.0          Min.   :38.89         
##  1st Qu.:177.0          1st Qu.:177.0          1st Qu.:44.89         
##  Median :177.0          Median :178.0          Median :45.73         
##  Mean   :177.5          Mean   :177.6          Mean   :45.66         
##  3rd Qu.:178.0          3rd Qu.:178.0          3rd Qu.:46.52         
##  Max.   :178.0          Max.   :178.0          Max.   :49.36         
##  NA's   :1              NA's   :1                                    
##  ManufacturingProcess10 ManufacturingProcess11 ManufacturingProcess12
##  Min.   : 7.500         Min.   : 7.500         Min.   :   0.0        
##  1st Qu.: 8.700         1st Qu.: 9.000         1st Qu.:   0.0        
##  Median : 9.100         Median : 9.400         Median :   0.0        
##  Mean   : 9.179         Mean   : 9.386         Mean   : 857.8        
##  3rd Qu.: 9.550         3rd Qu.: 9.900         3rd Qu.:   0.0        
##  Max.   :11.600         Max.   :11.500         Max.   :4549.0        
##  NA's   :9              NA's   :10             NA's   :1             
##  ManufacturingProcess13 ManufacturingProcess14 ManufacturingProcess15
##  Min.   :32.10          Min.   :4701           Min.   :5904          
##  1st Qu.:33.90          1st Qu.:4828           1st Qu.:6010          
##  Median :34.60          Median :4856           Median :6032          
##  Mean   :34.51          Mean   :4854           Mean   :6039          
##  3rd Qu.:35.20          3rd Qu.:4882           3rd Qu.:6061          
##  Max.   :38.60          Max.   :5055           Max.   :6233          
##                         NA's   :1                                    
##  ManufacturingProcess16 ManufacturingProcess17 ManufacturingProcess18
##  Min.   :   0           Min.   :31.30          Min.   :   0          
##  1st Qu.:4561           1st Qu.:33.50          1st Qu.:4813          
##  Median :4588           Median :34.40          Median :4835          
##  Mean   :4566           Mean   :34.34          Mean   :4810          
##  3rd Qu.:4619           3rd Qu.:35.10          3rd Qu.:4862          
##  Max.   :4852           Max.   :40.00          Max.   :4971          
##                                                                      
##  ManufacturingProcess19 ManufacturingProcess20 ManufacturingProcess21
##  Min.   :5890           Min.   :   0           Min.   :-1.8000       
##  1st Qu.:6001           1st Qu.:4553           1st Qu.:-0.6000       
##  Median :6022           Median :4582           Median :-0.3000       
##  Mean   :6028           Mean   :4556           Mean   :-0.1642       
##  3rd Qu.:6050           3rd Qu.:4610           3rd Qu.: 0.0000       
##  Max.   :6146           Max.   :4759           Max.   : 3.6000       
##                                                                      
##  ManufacturingProcess22 ManufacturingProcess23 ManufacturingProcess24
##  Min.   : 0.000         Min.   :0.000          Min.   : 0.000        
##  1st Qu.: 3.000         1st Qu.:2.000          1st Qu.: 4.000        
##  Median : 5.000         Median :3.000          Median : 8.000        
##  Mean   : 5.406         Mean   :3.017          Mean   : 8.834        
##  3rd Qu.: 8.000         3rd Qu.:4.000          3rd Qu.:14.000        
##  Max.   :12.000         Max.   :6.000          Max.   :23.000        
##  NA's   :1              NA's   :1              NA's   :1             
##  ManufacturingProcess25 ManufacturingProcess26 ManufacturingProcess27
##  Min.   :   0           Min.   :   0           Min.   :   0          
##  1st Qu.:4832           1st Qu.:6020           1st Qu.:4560          
##  Median :4855           Median :6047           Median :4587          
##  Mean   :4828           Mean   :6016           Mean   :4563          
##  3rd Qu.:4877           3rd Qu.:6070           3rd Qu.:4609          
##  Max.   :4990           Max.   :6161           Max.   :4710          
##  NA's   :5              NA's   :5              NA's   :5             
##  ManufacturingProcess28 ManufacturingProcess29 ManufacturingProcess30
##  Min.   : 0.000         Min.   : 0.00          Min.   : 0.000        
##  1st Qu.: 0.000         1st Qu.:19.70          1st Qu.: 8.800        
##  Median :10.400         Median :19.90          Median : 9.100        
##  Mean   : 6.592         Mean   :20.01          Mean   : 9.161        
##  3rd Qu.:10.750         3rd Qu.:20.40          3rd Qu.: 9.700        
##  Max.   :11.500         Max.   :22.00          Max.   :11.200        
##  NA's   :5              NA's   :5              NA's   :5             
##  ManufacturingProcess31 ManufacturingProcess32 ManufacturingProcess33
##  Min.   : 0.00          Min.   :143.0          Min.   :56.00         
##  1st Qu.:70.10          1st Qu.:155.0          1st Qu.:62.00         
##  Median :70.80          Median :158.0          Median :64.00         
##  Mean   :70.18          Mean   :158.5          Mean   :63.54         
##  3rd Qu.:71.40          3rd Qu.:162.0          3rd Qu.:65.00         
##  Max.   :72.50          Max.   :173.0          Max.   :70.00         
##  NA's   :5                                     NA's   :5             
##  ManufacturingProcess34 ManufacturingProcess35 ManufacturingProcess36
##  Min.   :2.300          Min.   :463.0          Min.   :0.01700       
##  1st Qu.:2.500          1st Qu.:490.0          1st Qu.:0.01900       
##  Median :2.500          Median :495.0          Median :0.02000       
##  Mean   :2.494          Mean   :495.6          Mean   :0.01957       
##  3rd Qu.:2.500          3rd Qu.:501.5          3rd Qu.:0.02000       
##  Max.   :2.600          Max.   :522.0          Max.   :0.02200       
##  NA's   :5              NA's   :5              NA's   :5             
##  ManufacturingProcess37 ManufacturingProcess38 ManufacturingProcess39
##  Min.   :0.000          Min.   :0.000          Min.   :0.000         
##  1st Qu.:0.700          1st Qu.:2.000          1st Qu.:7.100         
##  Median :1.000          Median :3.000          Median :7.200         
##  Mean   :1.014          Mean   :2.534          Mean   :6.851         
##  3rd Qu.:1.300          3rd Qu.:3.000          3rd Qu.:7.300         
##  Max.   :2.300          Max.   :3.000          Max.   :7.500         
##                                                                      
##  ManufacturingProcess40 ManufacturingProcess41 ManufacturingProcess42
##  Min.   :0.00000        Min.   :0.00000        Min.   : 0.00         
##  1st Qu.:0.00000        1st Qu.:0.00000        1st Qu.:11.40         
##  Median :0.00000        Median :0.00000        Median :11.60         
##  Mean   :0.01771        Mean   :0.02371        Mean   :11.21         
##  3rd Qu.:0.00000        3rd Qu.:0.00000        3rd Qu.:11.70         
##  Max.   :0.10000        Max.   :0.20000        Max.   :12.10         
##  NA's   :1              NA's   :1                                    
##  ManufacturingProcess43 ManufacturingProcess44 ManufacturingProcess45
##  Min.   : 0.0000        Min.   :0.000          Min.   :0.000         
##  1st Qu.: 0.6000        1st Qu.:1.800          1st Qu.:2.100         
##  Median : 0.8000        Median :1.900          Median :2.200         
##  Mean   : 0.9119        Mean   :1.805          Mean   :2.138         
##  3rd Qu.: 1.0250        3rd Qu.:1.900          3rd Qu.:2.300         
##  Max.   :11.0000        Max.   :2.100          Max.   :2.600         
## 

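Many of the ManufacturingProcess variables have missing values (ManufacturingProcess03 has the most, with 15), while Yield and all of the BiologicalMaterial variables are complete. The missingness plot produced by VIM's aggr() does not show the feature names on the x axis, so the printed summary is easier to read. About 13.6% of the rows (24 of 176) have at least one missing value, so we can either drop these rows or impute the missing values.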

library(VIM)
summary(aggr(ChemicalManufacturingProcess))

## 
##  Missings per variable: 
##                Variable Count
##                   Yield     0
##    BiologicalMaterial01     0
##    BiologicalMaterial02     0
##    BiologicalMaterial03     0
##    BiologicalMaterial04     0
##    BiologicalMaterial05     0
##    BiologicalMaterial06     0
##    BiologicalMaterial07     0
##    BiologicalMaterial08     0
##    BiologicalMaterial09     0
##    BiologicalMaterial10     0
##    BiologicalMaterial11     0
##    BiologicalMaterial12     0
##  ManufacturingProcess01     1
##  ManufacturingProcess02     3
##  ManufacturingProcess03    15
##  ManufacturingProcess04     1
##  ManufacturingProcess05     1
##  ManufacturingProcess06     2
##  ManufacturingProcess07     1
##  ManufacturingProcess08     1
##  ManufacturingProcess09     0
##  ManufacturingProcess10     9
##  ManufacturingProcess11    10
##  ManufacturingProcess12     1
##  ManufacturingProcess13     0
##  ManufacturingProcess14     1
##  ManufacturingProcess15     0
##  ManufacturingProcess16     0
##  ManufacturingProcess17     0
##  ManufacturingProcess18     0
##  ManufacturingProcess19     0
##  ManufacturingProcess20     0
##  ManufacturingProcess21     0
##  ManufacturingProcess22     1
##  ManufacturingProcess23     1
##  ManufacturingProcess24     1
##  ManufacturingProcess25     5
##  ManufacturingProcess26     5
##  ManufacturingProcess27     5
##  ManufacturingProcess28     5
##  ManufacturingProcess29     5
##  ManufacturingProcess30     5
##  ManufacturingProcess31     5
##  ManufacturingProcess32     0
##  ManufacturingProcess33     5
##  ManufacturingProcess34     5
##  ManufacturingProcess35     5
##  ManufacturingProcess36     5
##  ManufacturingProcess37     0
##  ManufacturingProcess38     0
##  ManufacturingProcess39     0
##  ManufacturingProcess40     1
##  ManufacturingProcess41     1
##  ManufacturingProcess42     0
##  ManufacturingProcess43     0
##  ManufacturingProcess44     0
##  ManufacturingProcess45     0
## 
##  Missings in combinations of variables: 
##                                                                                                         Combinations Count    Percent
##  0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0   152 86.3636364
##  0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:1:1:1:1:1:1:1:0:1:1:1:1:0:0:0:0:0:0:0:0:0     5  2.8409091
##  0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:1:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0     1  0.5681818
##  0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:1:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0     1  0.5681818
##  0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:1:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0     6  3.4090909
##  0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:1:0:0:0:0:0:0:1:1:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0     7  3.9772727
##  0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:1:0:0:0:0:0:0:1:1:0:0:1:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0     1  0.5681818
##  0:0:0:0:0:0:0:0:0:0:0:0:0:0:1:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0     2  1.1363636
##  0:0:0:0:0:0:0:0:0:0:0:0:0:1:1:1:1:1:1:1:1:0:1:1:1:0:0:0:0:0:0:0:0:0:1:1:1:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:1:1:0:0:0:0     1  0.5681818

Reading each 0/1 flag in the column order of the table above (Yield, BiologicalMaterial01 to 12, ManufacturingProcess01 to 45): 152 of the 176 rows (86.4%) are complete; the most common missing pattern (7 rows) is ManufacturingProcess03, 10 and 11 missing together; 6 rows are missing only ManufacturingProcess03; and 5 rows are missing ManufacturingProcess25 to 31 and 33 to 36 together.
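For reference, the per-variable counts and the 0/1 combination strings can be approximated in base R without VIM. This is a minimal sketch on a small hypothetical data frame (`toy`, not the real data); the pattern encoding mirrors how `aggr()` prints its combinations, but the ordering of its output may differ.

```r
# Hypothetical toy data frame standing in for ChemicalManufacturingProcess
toy <- data.frame(
  Yield  = c(38.0, 42.4, 39.1, 40.2, 41.5),
  Proc01 = c(1.1, NA, 2.3, NA, 1.8),
  Proc02 = c(NA, NA, 0.4, 0.7, 0.9)
)

# Missing values per variable (compare "Missings per variable")
na_counts <- colSums(is.na(toy))
na_counts

# One 0/1 string per row, in column order (compare "Combinations")
patterns <- apply(is.na(toy), 1, function(r) paste(as.integer(r), collapse = ":"))
pattern_counts <- table(patterns)
pattern_counts

# Percentage of rows in each pattern, and the share of complete rows
100 * prop.table(pattern_counts)
mean(complete.cases(toy))
```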

Imputing and Splitting
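The imputation and splitting code is not shown here. The model summaries below report 132 training rows out of 176, which matches a 75/25 split (floor(0.75 * 176) = 132). A minimal sketch of one way to do this, assuming median imputation and a simple random split; the toy data and seed are hypothetical, and the method actually used may differ (caret's `preProcess(method = "knnImpute")` and the stratified `createDataPartition()` are common alternatives).

```r
set.seed(624)  # arbitrary seed so the toy split is reproducible

# Hypothetical toy predictors (with missing values) and response
X <- data.frame(
  P1 = c(1.2, NA, 3.1, 2.2, 1.9, NA, 2.8, 3.0),
  P2 = c(10, 12, NA, 11, 13, 12, 14, NA)
)
y <- c(40.1, 41.3, 39.8, 42.0, 40.7, 41.9, 43.2, 42.5)

# Assumed method: replace each NA with its column median
X_imp <- as.data.frame(lapply(X, function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
}))

# 75/25 random split of the rows
train_idx <- sample(seq_len(nrow(X_imp)), size = floor(0.75 * nrow(X_imp)))
X.train <- X_imp[train_idx, , drop = FALSE]
y.train <- y[train_idx]
X.test  <- X_imp[-train_idx, , drop = FALSE]
y.test  <- y[-train_idx]
```

Imputing before splitting lets the test rows influence the medians; computing the medians on the training rows only and applying them to the test rows avoids that leak.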

KNN

Let's start with a k-nearest neighbors (KNN) model, centering and scaling the predictors and trying 10 values of k.
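KNN regression predicts a new sample as the average response of its k closest training samples, so the predictors are centered and scaled first to keep any single variable from dominating the distance. A minimal base R sketch of that idea (Euclidean distance, unweighted mean); the function name and toy data are hypothetical, not caret's internals.

```r
# Minimal kNN regression: predict each new row as the mean response
# of its k nearest training rows (Euclidean distance on standardized data)
knn_regress <- function(X_train, y_train, X_new, k) {
  X_train <- as.matrix(X_train)
  X_new <- as.matrix(X_new)
  # Center and scale with training-set statistics, like preProc = c("center", "scale")
  mu <- colMeans(X_train)
  s <- apply(X_train, 2, sd)
  Z_train <- scale(X_train, center = mu, scale = s)
  Z_new <- scale(X_new, center = mu, scale = s)
  preds <- apply(Z_new, 1, function(z) {
    # Distance from z to every training row (each column of t(Z_train))
    d <- sqrt(colSums((t(Z_train) - z)^2))
    mean(y_train[order(d)[seq_len(k)]])
  })
  unname(preds)
}

# Toy usage (hypothetical data)
X_train <- matrix(c(1, 2, 3, 4, 5,
                    2, 1, 4, 3, 5), ncol = 2)
y_train <- c(10, 12, 14, 16, 18)
x_new <- matrix(c(1, 2), ncol = 2)
knn_regress(X_train, y_train, x_new, k = 1)  # 10: the new point equals training row 1
knn_regress(X_train, y_train, x_new, k = 5)  # 14: mean of all five responses
```

If a predictor is constant, its standard deviation is 0 and scaling produces NaN, which is why zero-variance predictors cause trouble for this model.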

knnModel <- train(x = X.train,
                  y = y.train,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10)
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: BiologicalMaterial07

knnModel
## k-Nearest Neighbors 
## 
## 132 samples
##  57 predictor
## 
## Pre-processing: centered (57), scaled (57) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  1.386818  0.4069868  1.130305
##    7  1.370576  0.4193097  1.120586
##    9  1.384592  0.4058596  1.130505
##   11  1.398555  0.3989628  1.144835
##   13  1.410754  0.3902433  1.154346
##   15  1.416930  0.3904867  1.160173
##   17  1.417252  0.3977882  1.161859
##   19  1.425860  0.3955884  1.167947
##   21  1.435216  0.3926678  1.175034
##   23  1.447417  0.3907068  1.186565
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 7.
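The repeated warnings say that BiologicalMaterial07 has zero variance in the data being pre-processed, so scaling it divides by zero. One option is to drop such columns before calling `train()`; caret's `nearZeroVar()` does this with frequency-based thresholds, while the sketch below uses only base R and removes exactly-constant columns. The helper name and toy data are hypothetical, and whether dropping BiologicalMaterial07 changes the results here is untested.

```r
# Drop predictors whose variance is zero (constant columns)
drop_constant_columns <- function(X) {
  v <- vapply(X, function(col) var(col, na.rm = TRUE), numeric(1))
  keep <- !is.na(v) & v > 0
  X[, keep, drop = FALSE]
}

# Toy usage: column B is constant and is removed
toy <- data.frame(A = c(1, 2, 3), B = c(5, 5, 5), C = c(0.1, 0.4, 0.2))
cleaned <- drop_constant_columns(toy)
names(cleaned)  # "A" "C"
```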

Next, predict on the test set and compute RMSE, R-squared and MAE.

knnPred <- predict(knnModel, newdata = X.test)
postResample(pred = knnPred, obs = y.test)
##      RMSE  Rsquared       MAE 
## 1.4967980 0.5121008 1.1812662
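The three numbers from `postResample()` can be computed by hand. A minimal sketch, assuming (as I understand caret's default) that R-squared is the squared Pearson correlation between predictions and observations; the function name and toy vectors are hypothetical.

```r
# RMSE, R-squared and MAE for numeric predictions, mirroring the three
# values printed by caret::postResample() for regression
regression_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),
    Rsquared = cor(pred, obs)^2,   # squared Pearson correlation
    MAE      = mean(abs(pred - obs)))
}

m <- regression_metrics(pred = c(2, 4, 6), obs = c(1, 4, 7))
m  # RMSE = sqrt(2/3), about 0.816; Rsquared = 1; MAE = 2/3, about 0.667
```

Because R-squared here is a correlation, it can equal 1 while RMSE and MAE are non-zero, as in the toy example; the error metrics are the better guide to accuracy.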

SVM

Next, let's try a support vector machine (SVM) with a radial basis function kernel.

svmModel <- train(x = X.train,
                  y = y.train,
                  method = "svmRadial",
                  preProc = c("center", "scale"),
                  tuneLength = 10)
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: BiologicalMaterial07
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
svmModel
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 132 samples
##  57 predictor
## 
## Pre-processing: centered (57), scaled (57) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ... 
## Resampling results across tuning parameters:
## 
##   C       RMSE      Rsquared   MAE      
##     0.25  1.445287  0.4618584  1.1834154
##     0.50  1.340407  0.5008550  1.0926548
##     1.00  1.264424  0.5370274  1.0281439
##     2.00  1.223382  0.5561368  0.9872089
##     4.00  1.212827  0.5576232  0.9710348
##     8.00  1.215786  0.5571255  0.9705215
##    16.00  1.216463  0.5570912  0.9711200
##    32.00  1.216463  0.5570912  0.9711200
##    64.00  1.216463  0.5570912  0.9711200
##   128.00  1.216463  0.5570912  0.9711200
## 
## Tuning parameter 'sigma' was held constant at a value of 0.01386174
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.01386174 and C = 4.
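The `svmRadial` method uses kernlab's Gaussian radial basis kernel, K(x, x') = exp(-sigma * ||x - x'||^2). caret held sigma fixed at 0.01386174 (to my understanding, estimated from the data with `kernlab::sigest()`) and tuned only the cost C; the identical rows from C = 16 upward suggest that larger costs no longer change the fit. A minimal base R sketch of the kernel matrix; the function name and points are hypothetical, and only sigma comes from the output above.

```r
# Gaussian RBF kernel matrix between the rows of A and the rows of B,
# using kernlab's parameterization exp(-sigma * squared distance)
rbf_kernel <- function(A, B, sigma) {
  sq_dist <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
  exp(-sigma * pmax(sq_dist, 0))  # pmax guards against tiny negative rounding
}

A <- matrix(c(0, 0,
              1, 0), ncol = 2, byrow = TRUE)
K <- rbf_kernel(A, A, sigma = 0.01386174)
K  # 1 on the diagonal; off-diagonal exp(-0.01386174), about 0.986
```

A small sigma like this one makes the kernel decay slowly with distance, so the fitted function is fairly smooth over the 57 scaled predictors.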

Predict on the test set.

svmPred <- predict(svmModel, newdata = X.test)
postResample(pred = svmPred, obs = y.test)
##      RMSE  Rsquared       MAE 
## 1.2585773 0.6251842 1.0072735

MARS

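Our last model is multivariate adaptive regression splines (MARS), fit with the earth package over a grid of interaction degree (1 or 2) and number of retained terms (nprune, 2 to 20).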

marsGrid <- expand.grid(.degree=1:2,
                        .nprune=2:20)

marsModel <- train(x = X.train,
                   y = y.train,
                   method = "earth",
                   tuneGrid = marsGrid,
                   preProc = c("center", "scale"))
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: BiologicalMaterial07

marsModel
## Multivariate Adaptive Regression Spline 
## 
## 132 samples
##  57 predictor
## 
## Pre-processing: centered (57), scaled (57) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE     
##   1        2      1.535570  0.3375963  1.187483
##   1        3      1.340388  0.4919964  1.049814
##   1        4      3.932410  0.4895318  1.393907
##   1        5      3.443236  0.4770059  1.325244
##   1        6      3.813132  0.4446017  1.396980
##   1        7      4.875685  0.3940506  1.585389
##   1        8      4.140970  0.4097375  1.466265
##   1        9      5.218548  0.3875712  1.642331
##   1       10      5.355814  0.3723563  1.683616
##   1       11      5.510000  0.3261928  1.738215
##   1       12      5.337506  0.3145538  1.727836
##   1       13      6.157829  0.3109134  1.888466
##   1       14      6.702187  0.2993218  1.987325
##   1       15      6.831920  0.2997222  2.034952
##   1       16      6.564210  0.3027182  1.991806
##   1       17      6.548614  0.3019879  1.995204
##   1       18      6.533720  0.3032393  1.990802
##   1       19      6.537509  0.3022043  1.994733
##   1       20      6.535477  0.3033874  1.994230
##   2        2      1.533899  0.3395688  1.190380
##   2        3      1.386810  0.4557288  1.090908
##   2        4      1.934649  0.4611534  1.155901
##   2        5      1.695142  0.4757415  1.118137
##   2        6      2.101742  0.4556661  1.186778
##   2        7      2.287001  0.4452221  1.226043
##   2        8      2.229628  0.4527732  1.214881
##   2        9      2.241617  0.4342188  1.231014
##   2       10      2.260622  0.4340273  1.237285
##   2       11      2.473603  0.4086686  1.288932
##   2       12      2.429886  0.3959948  1.307211
##   2       13      2.443104  0.4019047  1.312503
##   2       14      2.401061  0.3988870  1.305754
##   2       15      2.534867  0.3969389  1.328302
##   2       16      2.673429  0.3762920  1.376511
##   2       17      2.865160  0.3688746  1.411509
##   2       18      2.696567  0.3724343  1.390933
##   2       19      2.854104  0.3589748  1.439246
##   2       20      2.893790  0.3584368  1.440659
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 3 and degree = 1.
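MARS builds its model from hinge functions max(0, x - c) and their mirror images max(0, c - x). In earth, nprune is the maximum number of terms including the intercept, so the selected nprune = 3 with degree = 1 means an additive model with an intercept and at most two hinge terms, chosen from 38 grid combinations (2 degrees times 19 nprune values). For nprune of 4 or more, the resampled RMSE jumps (up to 6.83 for degree 1) while MAE stays between about 1.1 and 2.0, which suggests a few very large errors in some resamples rather than uniformly worse fits. A minimal sketch of a three-term model; the knot, coefficients and the use of a mirrored pair on a single predictor are hypothetical (the fitted terms could involve two different predictors).

```r
# Hinge function used by MARS: max(0, x - knot), vectorized over x
hinge <- function(x, knot) pmax(0, x - knot)

# Hypothetical degree-1 MARS model with 3 terms (intercept + 2 hinges)
mars3_predict <- function(x, b0, b1, b2, knot) {
  b0 + b1 * hinge(x, knot) + b2 * pmax(0, knot - x)  # second term is the mirrored hinge
}

x <- c(-1, 0, 2)
mars3_predict(x, b0 = 40, b1 = 1.5, b2 = -0.5, knot = 0)  # 39.5 40.0 43.0
```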

Predict on the test set.

marsModelpred <- predict(marsModel, newdata = X.test)
postResample(pred = marsModelpred, obs = y.test)
##      RMSE  Rsquared       MAE 
## 1.4956756 0.4415462 1.1910414

Compare the resampling results of the three models side by side:

resamp <- resamples(list(KNN=knnModel, MARS=marsModel, SVM=svmModel))
summary(resamp)
## 
## Call:
## summary.resamples(object = resamp)
## 
## Models: KNN, MARS, SVM 
## Number of resamples: 25 
## 
## MAE 
##           Min.   1st Qu.    Median      Mean  3rd Qu.     Max. NA's
## KNN  0.9466563 1.0526108 1.1242598 1.1205860 1.184430 1.288532    0
## MARS 0.8329489 0.9640332 1.0187049 1.0498143 1.116347 1.457561    0
## SVM  0.7748748 0.9059040 0.9316888 0.9710348 1.070985 1.156674    0
## 
## RMSE 
##           Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
## KNN  1.1145218 1.305401 1.376347 1.370576 1.440231 1.530900    0
## MARS 1.0329034 1.200956 1.313639 1.340388 1.456943 1.868650    0
## SVM  0.9890417 1.116357 1.154228 1.212827 1.327700 1.441459    0
## 
## Rsquared 
##           Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## KNN  0.2742013 0.3331019 0.4318388 0.4193097 0.4978750 0.5836994    0
## MARS 0.1500053 0.4257317 0.5111191 0.4919964 0.5902840 0.6738815    0
## SVM  0.3867768 0.4820051 0.5613091 0.5576232 0.6431642 0.6893192    0
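`summary(resamp)` reports, for each model, the distribution of each metric over the 25 bootstrap resamples; the means match the values `train()` reported for the selected tuning parameters (for example, SVM RMSE 1.212827 at C = 4). A base R sketch of the same kind of summary, using a hypothetical matrix of per-resample RMSE values, plus paired differences. The paired comparison is only meaningful if all models were fit on identical resamples (for example via a shared `trainControl(index = ...)`), which this section does not show.

```r
# Hypothetical per-resample RMSE values for two models (rows = resamples)
rmse <- cbind(
  KNN = c(1.31, 1.44, 1.38, 1.29, 1.42),
  SVM = c(1.12, 1.25, 1.18, 1.10, 1.27)
)

# Min, quartiles, median, mean and max per model, like summary(resamp)
rmse_summary <- t(apply(rmse, 2, summary))
print(rmse_summary)

# Paired per-resample differences (only valid if resamples are shared)
diffs <- rmse[, "KNN"] - rmse[, "SVM"]
mean(diffs)
```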

(a) Which nonlinear regression model gives the optimal resampling and test set performance?

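Of the three models, SVM performs best on both resampling and the test set. Across the 25 bootstrap resamples it has the lowest mean RMSE (1.213) and MAE (0.971) and the highest mean R-squared (0.558). On the test set it again has the lowest RMSE (1.259) and MAE (1.007) and the highest R-squared (0.625), versus an RMSE of 1.497 and R-squared of 0.512 for KNN and an RMSE of 1.496 and R-squared of 0.442 for MARS.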
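The test-set comparison can be made explicit by tabulating the `postResample()` results reported above and picking the best model for each metric. The numbers are copied from the outputs in this section; only the table layout is new.

```r
# Test-set results copied from the postResample() outputs above
test_metrics <- data.frame(
  model    = c("KNN", "SVM", "MARS"),
  RMSE     = c(1.4967980, 1.2585773, 1.4956756),
  Rsquared = c(0.5121008, 0.6251842, 0.4415462),
  MAE      = c(1.1812662, 1.0072735, 1.1910414),
  stringsAsFactors = FALSE
)

# Best model per metric: lowest error, highest R-squared
best <- c(RMSE     = test_metrics$model[which.min(test_metrics$RMSE)],
          Rsquared = test_metrics$model[which.max(test_metrics$Rsquared)],
          MAE      = test_metrics$model[which.min(test_metrics$MAE)])
best  # SVM for all three metrics
```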

(b) Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?

varImp(svmModel)
## loess r-squared variable importance
## 
##   only 20 most important variables shown (out of 57)
## 
##                        Overall
## ManufacturingProcess32  100.00
## ManufacturingProcess13   94.37
## BiologicalMaterial03     86.71
## BiologicalMaterial06     86.61
## ManufacturingProcess09   73.49
## ManufacturingProcess17   70.57
## BiologicalMaterial12     63.83
## ManufacturingProcess36   63.76
## ManufacturingProcess06   58.47
## ManufacturingProcess31   54.85
## BiologicalMaterial02     54.22
## ManufacturingProcess11   48.59
## BiologicalMaterial11     45.41
## ManufacturingProcess33   45.09
## ManufacturingProcess30   43.00
## BiologicalMaterial04     42.53
## ManufacturingProcess20   40.85
## ManufacturingProcess12   40.53
## ManufacturingProcess29   37.11
## ManufacturingProcess02   36.01
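The header "loess r-squared variable importance" indicates that caret appears to have fallen back to a model-free filter for the radial SVM: each predictor is scored by how well a loess smooth of that predictor alone explains yield, and the scores are rescaled so the top one is 100. A minimal sketch of that idea on hypothetical simulated data; using the squared correlation between fitted and observed values as the R-squared, and min-max scaling to 0 to 100, are my assumptions about caret's details.

```r
set.seed(1)  # arbitrary seed for the simulated data
n <- 100
x1 <- runif(n)                                # informative predictor
x2 <- runif(n)                                # noise predictor
y <- sin(2 * pi * x1) + rnorm(n, sd = 0.2)

# R-squared of a loess smooth of y on a single predictor
loess_r2 <- function(x, y) {
  fit <- loess(y ~ x)
  cor(fitted(fit), y)^2
}

imp <- c(x1 = loess_r2(x1, y), x2 = loess_r2(x2, y))

# Rescale so the least important predictor is 0 and the most important is 100
scaled_imp <- (imp - min(imp)) / (max(imp) - min(imp)) * 100
round(scaled_imp, 2)
```

Because each predictor is scored on its own, this ranking ignores interactions the SVM may exploit, so it describes marginal association with yield rather than what the fitted model relies on.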

The top-ranked predictor is ManufacturingProcess32. Of the top 10 predictors, 7 are manufacturing process variables and 3 are biological materials (BiologicalMaterial03, 06 and 12); among all 20 shown, 14 are process variables and 6 are biological. The process variables therefore dominate the list. Note that the output header indicates a loess R-squared filter, which scores each predictor against yield on its own rather than through the fitted SVM.

(c) Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?

Below are scatter plots of yield against ten of the predictors (several of them, such as BiologicalMaterial01, 08 and 09, are not among the 20 most important variables listed above). Viewed individually, none of these predictors shows a clear relationship with yield.

ManufacturingProcess33 v/s Yield

plot(x = X.train$ManufacturingProcess33, y.train)

ManufacturingProcess31 v/s Yield

plot(x = X.train$ManufacturingProcess31, y.train)

BiologicalMaterial11 v/s Yield

plot(x = X.train$BiologicalMaterial11, y.train)

BiologicalMaterial04 v/s Yield

plot(x = X.train$BiologicalMaterial04, y.train)

ManufacturingProcess29 v/s Yield

plot(x = X.train$ManufacturingProcess29, y.train)

ManufacturingProcess11 v/s Yield

plot(x = X.train$ManufacturingProcess11, y.train)

ManufacturingProcess12 v/s Yield

plot(x = X.train$ManufacturingProcess12, y.train)

BiologicalMaterial08 v/s Yield

plot(x = X.train$BiologicalMaterial08, y.train)

BiologicalMaterial09 v/s Yield

plot(x = X.train$BiologicalMaterial09, y.train)

BiologicalMaterial01 v/s Yield

plot(x = X.train$BiologicalMaterial01, y.train)
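Because visual inspection is subjective, a quick numeric complement is to rank the plotted predictors by the strength of their monotonic association with yield. A minimal base R sketch using Spearman correlation on hypothetical toy data; in the real analysis `X` would be `X.train` restricted to the plotted columns and `y` would be `y.train`. Spearman correlation only captures monotonic trends, so a curved relationship could still be missed; base R's `scatter.smooth()` adds a loess curve to a scatter plot for a visual check.

```r
# Rank predictors by absolute Spearman correlation with the response
rank_by_spearman <- function(X, y) {
  rho <- vapply(X, function(col) cor(col, y, method = "spearman",
                                     use = "complete.obs"),
                numeric(1))
  rho[order(abs(rho), decreasing = TRUE)]
}

# Toy usage (hypothetical data)
X <- data.frame(A = 1:6, B = c(3, 1, 4, 1, 5, 9), C = 6:1)
y <- c(2, 4, 6, 8, 10, 12)
res <- rank_by_spearman(X, y)
res  # A and C (|rho| = 1) first, then B
```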