7.2
Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations uses the following nonlinear equation to create the data: \[ y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)\] where the x values are random variables uniformly distributed on [0, 1]. The mlbench package contains a function called mlbench.friedman1 that simulates these data:
Tune several models on these data. For example:
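The code chunks are not echoed in this report, so here is a sketch of how the data can be simulated and the k-NN model tuned, following the exercise text; the seed, the 5,000-row test set, and the object names are assumptions.

```r
library(mlbench)
library(caret)

# Simulate the Friedman (1991) benchmark data (seed and sample sizes assumed)
set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
trainingData$x <- data.frame(trainingData$x)  # convert the predictor matrix to a data frame

testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)

# Tune k-NN over 10 values of k with centering/scaling and bootstrap resampling
knnModel <- train(x = trainingData$x, y = trainingData$y,
                  method = "knn",
                  preProc = c("center", "scale"),
                  tuneLength = 10)
knnModel

# Test set performance
knnPred <- predict(knnModel, newdata = testData$x)
postResample(pred = knnPred, obs = testData$y)
```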
## k-Nearest Neighbors
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 3.617103 0.4815238 2.932357
## 7 3.469391 0.5268823 2.827466
## 9 3.404273 0.5542161 2.764704
## 11 3.367277 0.5745575 2.726791
## 13 3.313918 0.6022923 2.681996
## 15 3.310264 0.6142757 2.687478
## 17 3.308316 0.6266659 2.686031
## 19 3.306431 0.6392056 2.690283
## 21 3.317481 0.6435421 2.700244
## 23 3.323616 0.6521553 2.708896
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 19.
## RMSE Rsquared MAE
## 3.2286834 0.6871735 2.5939727
On the test set, the k-NN model gives an RMSE of 3.23, an $R^2$ of 0.69, and an MAE of 2.59. Next we will fit and tune some MARS models and compare.
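The MARS output below comes from a default earth fit, along the lines of this sketch; the object name marsFit is an assumption, though the call itself matches the Call line shown in the summary further down.

```r
library(earth)

# Fit a MARS model with the default settings
marsFit <- earth(x = trainingData$x, y = trainingData$y)
marsFit
```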
## Selected 12 of 18 terms, and 6 of 10 predictors
## Termination condition: Reached nk 21
## Importance: X1, X4, X2, X5, X3, X6, X7-unused, X8-unused, X9-unused, ...
## Number of terms at each degree of interaction: 1 11 (additive model)
## GCV 2.540556 RSS 397.9654 GRSq 0.8968524 RSq 0.9183982
We see that 12 of 18 terms and 6 of 10 predictors are selected. The importance ranking is X1, X4, X2, X5, X3, X6, while X7 through X10 are unused. Let's take a look at the summary to get more extensive output.
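A minimal sketch of the call that produces the extended output (object name assumed):

```r
# Detailed output, including the selected hinge functions and their coefficients
summary(marsFit)
```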
## Call: earth(x=trainingData$x, y=trainingData$y)
##
## coefficients
## (Intercept) 18.451984
## h(0.621722-X1) -11.074396
## h(0.601063-X2) -10.744225
## h(X3-0.281766) 20.607853
## h(0.447442-X3) 17.880232
## h(X3-0.447442) -23.282007
## h(X3-0.636458) 15.150350
## h(0.734892-X4) -10.027487
## h(X4-0.734892) 9.092045
## h(0.850094-X5) -4.723407
## h(X5-0.850094) 10.832932
## h(X6-0.361791) -1.956821
##
## Selected 12 of 18 terms, and 6 of 10 predictors
## Termination condition: Reached nk 21
## Importance: X1, X4, X2, X5, X3, X6, X7-unused, X8-unused, X9-unused, ...
## Number of terms at each degree of interaction: 1 11 (additive model)
## GCV 2.540556 RSS 397.9654 GRSq 0.8968524 RSq 0.9183982
The summary shows the intercept and the coefficient for each selected hinge function. For example, the hinge function involving X1 is h(0.62 - X1). Now, let's tune the model using external resampling with the train function.
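A sketch of the tuning call, assuming a grid that matches the degree and nprune values shown in the resampling results below and an arbitrary seed:

```r
# Candidate models: additive (degree 1) vs. two-way interactions (degree 2),
# pruned to between 2 and 14 retained terms
marsGrid <- expand.grid(degree = 1:2, nprune = seq(2, 14, by = 2))

set.seed(200)
marsTuned <- train(x = trainingData$x, y = trainingData$y,
                   method = "earth",
                   preProc = c("center", "scale"),
                   tuneGrid = marsGrid)
marsTuned

# Test set performance and variable importance for the tuned model
postResample(pred = predict(marsTuned, newdata = testData$x), obs = testData$y)
varImp(marsTuned)
```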
## Multivariate Adaptive Regression Spline
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 1 2 4.461735 0.2126864 3.697085
## 1 4 2.839751 0.6758087 2.283042
## 1 6 2.372350 0.7734818 1.883979
## 1 8 1.866813 0.8598841 1.454385
## 1 10 1.819863 0.8665983 1.399155
## 1 12 1.823581 0.8663811 1.397191
## 1 14 1.841520 0.8636975 1.410815
## 2 2 4.461735 0.2126864 3.697085
## 2 4 2.886416 0.6657159 2.323079
## 2 6 2.347399 0.7772242 1.839180
## 2 8 1.933750 0.8493698 1.504360
## 2 10 1.646725 0.8893854 1.279194
## 2 12 1.534422 0.9047396 1.197307
## 2 14 1.508731 0.9083735 1.174498
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 14 and degree = 2.
## RMSE Rsquared MAE
## 1.2779993 0.9338365 1.0147070
## earth variable importance
##
## Overall
## X1 100.00
## X4 84.98
## X2 68.87
## X5 48.55
## X3 38.96
## X8 0.00
## X6 0.00
## X7 0.00
## X10 0.00
## X9 0.00
We were able to tune the MARS model and check variable importance. The tuned model ranks X1 as most important, followed by X4, X2, X5, and X3. On the test set, the RMSE is 1.28, $R^2$ is 0.93, and MAE is 1.01. This RMSE is much better than the 3.23 we got with k-NN. I will try an SVM model next.
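A sketch of the SVM fit, assuming caret's svmRadial method with 14 candidate cost values and 10-fold cross-validation, which is consistent with the output below; the seed and object name are assumptions.

```r
# Radial basis function SVM; sigma is estimated by kernlab and held constant
set.seed(200)
svmRTuned <- train(x = trainingData$x, y = trainingData$y,
                   method = "svmRadial",
                   preProc = c("center", "scale"),
                   tuneLength = 14,
                   trControl = trainControl(method = "cv"))
svmRTuned
```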
## Support Vector Machines with Radial Basis Function Kernel
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## C RMSE Rsquared MAE
## 0.25 2.488560 0.8070561 1.986379
## 0.50 2.248461 0.8168043 1.796747
## 1.00 2.061844 0.8397372 1.644829
## 2.00 1.941198 0.8547226 1.525840
## 4.00 1.885180 0.8632244 1.491907
## 8.00 1.873598 0.8642778 1.492104
## 16.00 1.879064 0.8632513 1.498301
## 32.00 1.879064 0.8632513 1.498301
## 64.00 1.879064 0.8632513 1.498301
## 128.00 1.879064 0.8632513 1.498301
## 256.00 1.879064 0.8632513 1.498301
## 512.00 1.879064 0.8632513 1.498301
## 1024.00 1.879064 0.8632513 1.498301
## 2048.00 1.879064 0.8632513 1.498301
##
## Tuning parameter 'sigma' was held constant at a value of 0.07022076
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.07022076 and C = 8.
Let’s take a look at the finalModel, which contains the model created by the ksvm function.
## Support Vector Machine object of class "ksvm"
##
## SV type: eps-svr (regression)
## parameter : epsilon = 0.1 cost C = 8
##
## Gaussian Radial Basis kernel function.
## Hyperparameter : sigma = 0.0702207565619088
##
## Number of Support Vectors : 155
##
## Objective Function Value : -63.1839
## Training error : 0.008775
The model uses 155 of the training set data points as support vectors. We will now make our test set predictions, then look at variable importance.
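A sketch of the prediction and importance calls, assuming the object names from the earlier sketches; for an SVM, caret's varImp falls back to a model-free loess measure, which is what the ranking below reports.

```r
# Test set predictions and a model-free variable importance ranking
svmPred <- predict(svmRTuned, newdata = testData$x)
postResample(pred = svmPred, obs = testData$y)
varImp(svmRTuned)
```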
## RMSE Rsquared MAE
## 2.0856664 0.8239632 1.5849823
The RMSE is 2.08, $R^2$ is 0.82, and MAE is 1.58. This RMSE is better than k-NN's but not as good as the MARS model's.
## loess r-squared variable importance
##
## Overall
## X4 100.0000
## X1 95.5047
## X2 89.6186
## X5 45.2170
## X3 29.9330
## X9 6.3299
## X10 5.5182
## X8 3.2527
## X6 0.8884
## X7 0.0000
Variable importance is ranked X4, X1, X2, X5, X3, X9, X10, X8, X6, X7.
Overall, the MARS model performed best, with the lowest test set RMSE. The MARS model did, in fact, select the informative variables X1 through X5 as important.
7.5
Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputations, data splitting, and pre-processing steps as before and train several non-linear regression models.
I first subset my data into a data frame of the predictor variables and another for the response (yield), then split into training and test sets. Previously, I used the mice package to handle imputation, but there is a quicker way to take care of all of the preprocessing: the preProcess function from the caret package allows me to impute, scale, center, and perform a Box-Cox transformation in a single step. I also removed highly correlated predictors and near-zero-variance predictors (a sketch of this pipeline is shown below). Now that the preprocessing and splitting are complete, we will fit k-NN, neural network, MARS, and SVM models.
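A rough sketch of this pipeline, assuming a knn-based imputation, a 0.90 correlation cutoff, a 75/25 split, and an arbitrary seed; the exact choices in the original code may differ.

```r
library(AppliedPredictiveModeling)
library(caret)
data(ChemicalManufacturingProcess)

# Separate the response (Yield, the first column) from the predictors
yield      <- ChemicalManufacturingProcess$Yield
predictors <- ChemicalManufacturingProcess[, -1]

# Drop near-zero-variance and highly correlated predictors
nzv <- nearZeroVar(predictors)
if (length(nzv) > 0) predictors <- predictors[, -nzv]
highCorr <- findCorrelation(cor(predictors, use = "pairwise.complete.obs"),
                            cutoff = 0.90)               # cutoff assumed
if (length(highCorr) > 0) predictors <- predictors[, -highCorr]

# Impute, Box-Cox transform, center, and scale in one preProcess call
pp <- preProcess(predictors,
                 method = c("knnImpute", "BoxCox", "center", "scale"))
predictors <- predict(pp, predictors)

# Training/test split (proportion assumed)
set.seed(200)
inTrain <- createDataPartition(yield, p = 0.75, list = FALSE)
trainX <- predictors[inTrain, ];  trainY <- yield[inTrain]
testX  <- predictors[-inTrain, ]; testY  <- yield[-inTrain]

# k-NN fit; no further pre-processing is needed inside train()
set.seed(200)
knnFit <- train(x = trainX, y = trainY, method = "knn", tuneLength = 10)
knnFit
```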
## k-Nearest Neighbors
##
## 132 samples
## 47 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 1.471155 0.3970451 1.185740
## 7 1.473831 0.3952187 1.188562
## 9 1.466505 0.3984768 1.180142
## 11 1.468380 0.3982967 1.185868
## 13 1.468671 0.3995290 1.186408
## 15 1.458793 0.4119094 1.179862
## 17 1.448414 0.4230017 1.168782
## 19 1.448308 0.4284784 1.166523
## 21 1.452348 0.4301752 1.172608
## 23 1.452411 0.4374085 1.170903
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 19.
From the output, the best RMSE for k-NN occurs at k = 19.
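A sketch of the test set evaluation and the model-free importance ranking shown below (object names assumed):

```r
# Test set performance and a model-free (loess R^2) importance ranking
knnPred <- predict(knnFit, newdata = testX)
postResample(pred = knnPred, obs = testY)
varImp(knnFit)
```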
## RMSE Rsquared MAE
## 1.3398772 0.3645254 1.0810766
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 47)
##
## Overall
## ManufacturingProcess17 100.00
## ManufacturingProcess13 96.37
## ManufacturingProcess32 88.50
## ManufacturingProcess09 79.09
## BiologicalMaterial06 75.17
## ManufacturingProcess36 70.24
## BiologicalMaterial03 69.76
## ManufacturingProcess06 64.01
## ManufacturingProcess11 60.76
## BiologicalMaterial11 53.96
## BiologicalMaterial08 50.38
## BiologicalMaterial04 50.15
## ManufacturingProcess30 47.47
## ManufacturingProcess33 47.33
## ManufacturingProcess12 39.60
## BiologicalMaterial01 39.21
## BiologicalMaterial09 35.53
## BiologicalMaterial10 27.33
## ManufacturingProcess15 26.57
## ManufacturingProcess26 23.82
The RMSE value is 1.34, $R^2$ is 0.36, and MAE is 1.08. Next we fit a basic neural network model, since the data have already been centered and scaled.
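A minimal sketch of a single-hidden-layer network fit through caret; the tuning grid and iteration limit are assumptions, and the model-free importance call mirrors the ranking shown below.

```r
# Small network sizes and a few weight-decay values (grid assumed)
nnetGrid <- expand.grid(size = 1:5, decay = c(0, 0.01, 0.1))

set.seed(200)
nnetFit <- train(x = trainX, y = trainY,
                 method = "nnet",
                 tuneGrid = nnetGrid,
                 linout = TRUE,    # linear output unit for regression
                 trace = FALSE,
                 maxit = 500)

# Test set performance and model-free importance (matches the loess R^2 ranking)
postResample(pred = predict(nnetFit, newdata = testX), obs = testY)
varImp(nnetFit, useModel = FALSE)
```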
## RMSE Rsquared MAE
## 2.4705065 0.1318926 2.0825607
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 47)
##
## Overall
## ManufacturingProcess17 100.00
## ManufacturingProcess13 96.37
## ManufacturingProcess32 88.50
## ManufacturingProcess09 79.09
## BiologicalMaterial06 75.17
## ManufacturingProcess36 70.24
## BiologicalMaterial03 69.76
## ManufacturingProcess06 64.01
## ManufacturingProcess11 60.76
## BiologicalMaterial11 53.96
## BiologicalMaterial08 50.38
## BiologicalMaterial04 50.15
## ManufacturingProcess30 47.47
## ManufacturingProcess33 47.33
## ManufacturingProcess12 39.60
## BiologicalMaterial01 39.21
## BiologicalMaterial09 35.53
## BiologicalMaterial10 27.33
## ManufacturingProcess15 26.57
## ManufacturingProcess26 23.82
The metrics for the nnet model were worse than for the k-NN model, with an RMSE of 2.47, an $R^2$ of 0.13, and an MAE of 2.08. Let's take a look at our MARS model next, then the SVM.