This is homework eight of the Fall 2024 edition of DATA 624. The assignment covers questions 7.2 and 7.5 from the exercises in Chapter 7 of Applied Predictive Modeling by Max Kuhn and Kjell Johnson.
First, load the required libraries.
library(tidyverse)
library(ggplot2)
library(gridExtra)
library(AppliedPredictiveModeling)
library(mlbench)
library(caret)
library(earth)
library(kernlab)
library(RANN)
set.seed(04101)
The exercise states: “Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:
\[ y = 10 \sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2) \] where the x values are random variables uniformly distributed between [0, 1] (there are also 5 other non-informative variables also created in the simulation). The package mlbench contains a function called mlbench.friedman1 that simulates these data”
The following code snippet was also provided:
trainingData <- mlbench.friedman1(200, sd = 1)
trainingData$x <- data.frame(trainingData$x)
featurePlot(trainingData$x, trainingData$y)
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)
knnModel <- train(x = trainingData$x,
y = trainingData$y,
method = "knn",
preProc = c("center", "scale"),
tuneLength = 10)
knnModel
## k-Nearest Neighbors
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
## Resampling results across tuning parameters:
##
## k RMSE Rsquared MAE
## 5 3.719126 0.4938384 2.959920
## 7 3.596913 0.5402103 2.837807
## 9 3.604542 0.5501806 2.850472
## 11 3.612526 0.5576494 2.867785
## 13 3.606714 0.5717275 2.882308
## 15 3.625217 0.5791756 2.909228
## 17 3.610499 0.5969979 2.901709
## 19 3.606544 0.6127054 2.903831
## 21 3.611499 0.6263947 2.912113
## 23 3.629909 0.6367596 2.934772
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 7.
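For completeness, the kNN baseline can also be scored on the held-out test set the same way the other models are evaluated below; a minimal version:
knnPred <- predict(knnModel, newdata = testData$x)
postResample(pred = knnPred, obs = testData$y)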
I must now tune “several models on these data” and then provide commentary. I have chosen the following models: MARS, a support vector machine (SVM), and an averaged neural network.
I’m starting with MARS because it’s what I’m presenting during my team’s presentation. MARS is a nonparametric regression technique that models relationships without assuming a particular functional form at the start. It builds a piecewise linear model from basis functions (“hinge functions”) that cover different segments of the data; the breakpoints where adjacent segments meet are called “knots.”
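As a quick illustration (my own sketch, not part of the assignment), a mirrored pair of hinge functions at a single knot produces a piecewise linear fit with one bend; the knot location and coefficients below are arbitrary:
# Each MARS basis function is a hinge: zero on one side of a knot, linear on the other
x_demo <- seq(0, 1, length.out = 100)
knot <- 0.5                                            # illustrative knot location
y_demo <- 2 + 3 * pmax(0, x_demo - knot) - 1.5 * pmax(0, knot - x_demo)
plot(x_demo, y_demo, type = "l", main = "Mirrored hinge pair with a knot at 0.5")
With that picture in mind, on to the tuning: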
mc <- trainControl(method = "repeatedcv", number = 3)      # 3-fold CV, one repeat
mexpand_g <- expand.grid(.degree = 2:3, .nprune = 4:45)    # interaction degree and pruning grid
life_on_mars <- train(trainingData$x, trainingData$y,
method = "earth",
tuneGrid = mexpand_g,                # a supplied tuneGrid makes tuneLength redundant, so it is dropped
preProcess = c("center", "scale", "knnImpute"),
trControl = mc)
life_on_mars
## Multivariate Adaptive Regression Spline
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10), nearest neighbor imputation (10)
## Resampling: Cross-Validated (3 fold, repeated 1 times)
## Summary of sample sizes: 134, 133, 133
## Resampling results across tuning parameters:
##
## degree nprune RMSE Rsquared MAE
## 2 4 3.311659 0.6019001 2.719880
## 2 5 3.132180 0.6448154 2.545251
## 2 6 2.912083 0.6896125 2.360018
## 2 7 2.444828 0.7788718 1.902028
## 2 8 2.272775 0.8110137 1.744256
## 2 9 1.944770 0.8591212 1.464891
## 2 10 1.665242 0.8985534 1.324536
## 2 11 1.635008 0.9012180 1.314050
## 2 12 1.427742 0.9253497 1.153250
## 2 13 1.522163 0.9154320 1.216036
## 2 14 1.562438 0.9109944 1.238524
## 2 15 1.552673 0.9114834 1.250201
## 2 16 1.535044 0.9133345 1.235728
## 2 17 1.535044 0.9133345 1.235728
## 2 18 1.535044 0.9133345 1.235728
## 2 19 1.535044 0.9133345 1.235728
## 2 20 1.535044 0.9133345 1.235728
## 2 21 1.535044 0.9133345 1.235728
## 2 22 1.535044 0.9133345 1.235728
## 2 23 1.535044 0.9133345 1.235728
## 2 24 1.535044 0.9133345 1.235728
## 2 25 1.535044 0.9133345 1.235728
## 2 26 1.535044 0.9133345 1.235728
## 2 27 1.535044 0.9133345 1.235728
## 2 28 1.535044 0.9133345 1.235728
## 2 29 1.535044 0.9133345 1.235728
## 2 30 1.535044 0.9133345 1.235728
## 2 31 1.535044 0.9133345 1.235728
## 2 32 1.535044 0.9133345 1.235728
## 2 33 1.535044 0.9133345 1.235728
## 2 34 1.535044 0.9133345 1.235728
## 2 35 1.535044 0.9133345 1.235728
## 2 36 1.535044 0.9133345 1.235728
## 2 37 1.535044 0.9133345 1.235728
## 2 38 1.535044 0.9133345 1.235728
## 2 39 1.535044 0.9133345 1.235728
## 2 40 1.535044 0.9133345 1.235728
## 2 41 1.535044 0.9133345 1.235728
## 2 42 1.535044 0.9133345 1.235728
## 2 43 1.535044 0.9133345 1.235728
## 2 44 1.535044 0.9133345 1.235728
## 2 45 1.535044 0.9133345 1.235728
## 3 4 3.311659 0.6019001 2.719880
## 3 5 3.132180 0.6448154 2.545251
## 3 6 2.912083 0.6896125 2.360018
## 3 7 2.444828 0.7788718 1.902028
## 3 8 2.272775 0.8110137 1.744256
## 3 9 1.944770 0.8591212 1.464891
## 3 10 1.665242 0.8985534 1.324536
## 3 11 1.635008 0.9012180 1.314050
## 3 12 1.427742 0.9253497 1.153250
## 3 13 1.522163 0.9154320 1.216036
## 3 14 1.562438 0.9109944 1.238524
## 3 15 1.552673 0.9114834 1.250201
## 3 16 1.535044 0.9133345 1.235728
## 3 17 1.535044 0.9133345 1.235728
## 3 18 1.535044 0.9133345 1.235728
## 3 19 1.535044 0.9133345 1.235728
## 3 20 1.535044 0.9133345 1.235728
## 3 21 1.535044 0.9133345 1.235728
## 3 22 1.535044 0.9133345 1.235728
## 3 23 1.535044 0.9133345 1.235728
## 3 24 1.535044 0.9133345 1.235728
## 3 25 1.535044 0.9133345 1.235728
## 3 26 1.535044 0.9133345 1.235728
## 3 27 1.535044 0.9133345 1.235728
## 3 28 1.535044 0.9133345 1.235728
## 3 29 1.535044 0.9133345 1.235728
## 3 30 1.535044 0.9133345 1.235728
## 3 31 1.535044 0.9133345 1.235728
## 3 32 1.535044 0.9133345 1.235728
## 3 33 1.535044 0.9133345 1.235728
## 3 34 1.535044 0.9133345 1.235728
## 3 35 1.535044 0.9133345 1.235728
## 3 36 1.535044 0.9133345 1.235728
## 3 37 1.535044 0.9133345 1.235728
## 3 38 1.535044 0.9133345 1.235728
## 3 39 1.535044 0.9133345 1.235728
## 3 40 1.535044 0.9133345 1.235728
## 3 41 1.535044 0.9133345 1.235728
## 3 42 1.535044 0.9133345 1.235728
## 3 43 1.535044 0.9133345 1.235728
## 3 44 1.535044 0.9133345 1.235728
## 3 45 1.535044 0.9133345 1.235728
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 12 and degree = 2.
m_predict <- predict(life_on_mars, newdata = testData$x)
predict_track <- data.frame(matrix(vector(), 0, 4, # Increased to 4 for the model name
dimnames = list(NULL, c("RMSE", "r2", "MAE", "model"))),
stringsAsFactors = FALSE)
m_metrics <- as.data.frame(t(postResample(pred = m_predict, obs = testData$y)))
m_metrics$model <- "MARS"
predict_track <- rbind(predict_track, m_metrics)
For regression, a support vector machine fits a function that keeps as many points as possible inside an ε-wide tube around the predictions, penalizing only the residuals that fall outside it; this carries the margin idea from classification over to regression. A radial basis function kernel lets it handle nonlinear data as well.
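A minimal sketch of that ε-insensitive loss (the ε value here is just an illustration):
# epsilon-insensitive loss: zero inside the tube, linear outside it
eps_loss <- function(resid, epsilon = 0.1) pmax(0, abs(resid) - epsilon)
eps_loss(c(-0.05, 0.08, 0.30))   # returns 0.0 0.0 0.2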
svc <- trainControl(method = "repeatedcv", number = 5, repeats = 3)
svc_t <- expand.grid(.C = seq(0.1, 1, by = 0.1), .sigma = seq(0.05, 0.2, by = 0.05))
svm_m <- train(x = trainingData$x,
y = trainingData$y,
method = "svmRadial",
preProcess = c("center", "scale"),
tuneGrid = svc_t,
trControl = svc)
svm_m
## Support Vector Machines with Radial Basis Function Kernel
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (5 fold, repeated 3 times)
## Summary of sample sizes: 160, 160, 160, 160, 160, 160, ...
## Resampling results across tuning parameters:
##
## C sigma RMSE Rsquared MAE
## 0.1 0.05 3.723903 0.7106557 3.026526
## 0.1 0.10 3.917585 0.7009434 3.174456
## 0.1 0.15 4.250298 0.6803521 3.453626
## 0.1 0.20 4.530993 0.6483326 3.702109
## 0.2 0.05 3.094990 0.7290384 2.453913
## 0.2 0.10 3.232816 0.7272018 2.559308
## 0.2 0.15 3.630465 0.7016895 2.899013
## 0.2 0.20 4.035662 0.6664100 3.248765
## 0.3 0.05 2.850083 0.7420247 2.197422
## 0.3 0.10 2.956281 0.7441759 2.276565
## 0.3 0.15 3.288211 0.7165915 2.572111
## 0.3 0.20 3.683004 0.6820470 2.921419
## 0.4 0.05 2.735546 0.7503461 2.082240
## 0.4 0.10 2.819071 0.7504706 2.136744
## 0.4 0.15 3.121262 0.7226468 2.400362
## 0.4 0.20 3.481744 0.6898263 2.726998
## 0.5 0.05 2.655809 0.7570843 2.006147
## 0.5 0.10 2.723461 0.7574490 2.048870
## 0.5 0.15 3.017949 0.7254601 2.301119
## 0.5 0.20 3.368566 0.6912825 2.610398
## 0.6 0.05 2.598904 0.7623278 1.959341
## 0.6 0.10 2.661616 0.7623015 2.000087
## 0.6 0.15 2.946295 0.7281582 2.238830
## 0.6 0.20 3.288601 0.6911002 2.527406
## 0.7 0.05 2.549073 0.7682990 1.920879
## 0.7 0.10 2.618822 0.7654494 1.968101
## 0.7 0.15 2.893491 0.7306077 2.198658
## 0.7 0.20 3.227760 0.6925924 2.473100
## 0.8 0.05 2.507235 0.7738224 1.891699
## 0.8 0.10 2.586948 0.7678633 1.945064
## 0.8 0.15 2.855315 0.7331353 2.169950
## 0.8 0.20 3.182047 0.6940005 2.435306
## 0.9 0.05 2.475813 0.7780461 1.870435
## 0.9 0.10 2.561451 0.7701869 1.923824
## 0.9 0.15 2.825430 0.7358589 2.145652
## 0.9 0.20 3.148122 0.6953027 2.409639
## 1.0 0.05 2.449445 0.7815835 1.850789
## 1.0 0.10 2.541630 0.7724449 1.907116
## 1.0 0.15 2.799129 0.7386363 2.124024
## 1.0 0.20 3.120230 0.6971721 2.386707
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.05 and C = 1.
svm_p <- predict(svm_m, newdata = testData$x)
svm_metrics <- as.data.frame(t(postResample(pred = svm_p, obs = testData$y)))
svm_metrics$model <- "SVMs"
predict_track <- rbind(predict_track, svm_metrics)
Thus far, the MARS model is ahead of the SVM, with a lower test-set RMSE of ~1.23 and a higher R-squared of ~0.938. For the SVM, increasing C improved results, but the best value landed at C = 1, the upper edge of the tuning grid, so a wider grid might improve it slightly. Both models are strong, but MARS looks like the better pick here for capturing the data’s variance.
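As an aside, the comparison above uses the test set. To compare resampling distributions instead, both models would need the same resampling scheme; a sketch of how that could look with caret’s resamples(), refitting both under one shared trainControl with a seed reset before each call so the folds match (the seed value is arbitrary):
shared_ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 3)
set.seed(100)                                   # same folds for both fits
mars_cmp <- train(trainingData$x, trainingData$y, method = "earth",
tuneGrid = mexpand_g, trControl = shared_ctrl)
set.seed(100)
svm_cmp <- train(trainingData$x, trainingData$y, method = "svmRadial",
preProcess = c("center", "scale"),
tuneGrid = svc_t, trControl = shared_ctrl)
summary(resamples(list(MARS = mars_cmp, SVM = svm_cmp)))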
This code trains an averaged neural network using the avNNet method, centering and scaling the predictors. It tunes over weight decay and hidden-layer size with repeated cross-validation, and caps the total weight count and iteration limit.
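One note before the results (my addition): a single-hidden-layer regression network with p inputs needs size*(p + 1) + size + 1 weights, so the MaxNWts = 150 cap rules out sizes 13 through 15, which is why those rows come back NaN below.
# Weight count per hidden-layer size: size*(p + 1) input-to-hidden weights
# (including biases) plus size + 1 hidden-to-output weights
p <- ncol(trainingData$x)                  # 10 predictors
size <- 1:15
n_wts <- size * (p + 1) + size + 1
data.frame(size, n_wts, under_cap = n_wts <= 150)
# sizes 13:15 need 157, 169, and 181 weights, all over MaxNWts = 150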
nnet_grid <- expand.grid(.decay = c(0, 0.001, 0.01, 0.05, 0.1), .size = 1:15, .bag = FALSE)
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
nnet_maxnwts <- 150                  # weight cap; sizes 13-15 exceed it (see note above)
nnet_model <- train(x = trainingData$x,
y = trainingData$y,
method = "avNNet",
preProcess = c("center", "scale"),
tuneGrid = nnet_grid,
trControl = control,
linout = TRUE,                       # linear output units for regression
trace = FALSE,
MaxNWts = nnet_maxnwts,
maxit = 300)
nnet_model
## Model Averaged Neural Network
##
## 200 samples
## 10 predictor
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.000 1 2.802094 0.7190798 2.204010
## 0.000 2 2.523775 0.7718422 2.000498
## 0.000 3 2.121372 0.8332285 1.695636
## 0.000 4 2.013886 0.8546191 1.567829
## 0.000 5 2.367556 0.8071777 1.787363
## 0.000 6 3.162169 0.7048747 2.247950
## 0.000 7 4.834655 0.5205897 3.024356
## 0.000 8 5.474489 0.5289465 3.301518
## 0.000 9 4.923984 0.5868559 2.905196
## 0.000 10 3.849287 0.6118177 2.563422
## 0.000 11 3.334532 0.6533816 2.500506
## 0.000 12 3.461783 0.6910991 2.596698
## 0.000 13 NaN NaN NaN
## 0.000 14 NaN NaN NaN
## 0.000 15 NaN NaN NaN
## 0.001 1 2.778055 0.7207142 2.177187
## 0.001 2 2.539277 0.7694598 2.012517
## 0.001 3 2.201545 0.8238616 1.732125
## 0.001 4 2.038469 0.8522219 1.612823
## 0.001 5 2.222230 0.8204783 1.728492
## 0.001 6 2.461068 0.7799747 1.910654
## 0.001 7 2.766955 0.7389167 2.167112
## 0.001 8 3.009032 0.7142539 2.245309
## 0.001 9 3.368694 0.6685557 2.542988
## 0.001 10 2.924293 0.7198047 2.257227
## 0.001 11 2.878914 0.7230730 2.244031
## 0.001 12 2.859150 0.7165688 2.268190
## 0.001 13 NaN NaN NaN
## 0.001 14 NaN NaN NaN
## 0.001 15 NaN NaN NaN
## 0.010 1 2.756020 0.7256414 2.160956
## 0.010 2 2.532561 0.7679450 2.032686
## 0.010 3 2.219390 0.8185134 1.753755
## 0.010 4 2.098196 0.8435576 1.643065
## 0.010 5 2.165545 0.8311878 1.676269
## 0.010 6 2.367029 0.7981274 1.837282
## 0.010 7 2.526360 0.7748758 1.962880
## 0.010 8 2.743666 0.7396926 2.116238
## 0.010 9 2.838719 0.7198539 2.190079
## 0.010 10 2.751071 0.7282082 2.156551
## 0.010 11 2.867661 0.7150112 2.305827
## 0.010 12 2.946496 0.6983125 2.324035
## 0.010 13 NaN NaN NaN
## 0.010 14 NaN NaN NaN
## 0.010 15 NaN NaN NaN
## 0.050 1 2.728565 0.7295766 2.136572
## 0.050 2 2.550234 0.7664020 2.020062
## 0.050 3 2.243660 0.8151612 1.767740
## 0.050 4 2.066508 0.8456286 1.623204
## 0.050 5 2.195013 0.8246665 1.705472
## 0.050 6 2.385456 0.7988194 1.871967
## 0.050 7 2.516904 0.7742450 1.939992
## 0.050 8 2.612743 0.7627249 2.030529
## 0.050 9 2.704107 0.7438953 2.113998
## 0.050 10 2.621658 0.7584304 2.045466
## 0.050 11 2.730381 0.7400997 2.144502
## 0.050 12 2.788834 0.7317214 2.178176
## 0.050 13 NaN NaN NaN
## 0.050 14 NaN NaN NaN
## 0.050 15 NaN NaN NaN
## 0.100 1 2.736295 0.7278178 2.142121
## 0.100 2 2.527389 0.7711798 2.028058
## 0.100 3 2.168856 0.8232641 1.733253
## 0.100 4 2.155233 0.8333829 1.701868
## 0.100 5 2.172664 0.8277510 1.698866
## 0.100 6 2.291981 0.8124337 1.796571
## 0.100 7 2.508576 0.7766792 1.973119
## 0.100 8 2.534458 0.7659478 1.964556
## 0.100 9 2.573739 0.7640563 2.003362
## 0.100 10 2.610022 0.7561825 2.029116
## 0.100 11 2.522858 0.7767753 1.991092
## 0.100 12 2.674768 0.7477552 2.086116
## 0.100 13 NaN NaN NaN
## 0.100 14 NaN NaN NaN
## 0.100 15 NaN NaN NaN
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 4, decay = 0 and bag = FALSE.
nn_p <- predict(nnet_model, newdata = testData$x)
nn_metrics <- as.data.frame(t(postResample(pred = nn_p, obs = testData$y)))
nn_metrics$model <- "Neural Networks"
predict_track <- rbind(predict_track, nn_metrics)
predict_track
## RMSE Rsquared MAE model
## 1 1.233097 0.9382421 0.9869724 MARS
## 2 2.296046 0.7887541 1.7584783 SVMs
## 3 1.931904 0.8493991 1.4735077 Neural Networks
The final part of the question states: “Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?”
Answer:
Based on the above, the MARS model came out ahead. It did the best job of capturing the underlying patterns in the data. While the SVM and neural network showed decent performance, their higher test RMSE values suggest they may not predict as accurately as MARS for this particular dataset.
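The question also asks whether MARS selects the informative predictors X1–X5. The fitted object can answer that directly; a quick check (one would expect the retained terms to be built almost entirely from X1–X5):
varImp(life_on_mars)               # importance scores from the earth fit
summary(life_on_mars$finalModel)   # lists the retained hinge terms and their predictors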
The exercise states: “Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.”
I’ll first bring in the data from 6.3.
data("ChemicalManufacturingProcess")
chem_ct <- trainControl(method = "repeatedcv", number = 3)
mexpand_gt <- expand.grid(.degree = 2:3, .nprune = 4:45)
chem_pre <- preProcess(ChemicalManufacturingProcess,
method = c("BoxCox", "knnImpute", "center", "scale"))
chem_pred <- predict(chem_pre, ChemicalManufacturingProcess)
chem_pred$Yield = ChemicalManufacturingProcess$Yield
chem_i <- sample(seq_len(nrow(chem_pred)), size = floor(0.85 * nrow(chem_pred)))
ctr <- chem_pred[chem_i, ]
ctt <- chem_pred[-chem_i, ]
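A couple of quick sanity checks on the prepared data (my additions, not required by the exercise):
sum(is.na(chem_pred))   # expect 0 after knnImpute
dim(ctr)                # roughly 85% of the 176 rows (149 for training)
dim(ctt)                # the held-out remainder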
Now I can run all the models again, using the same order as earlier.
knnm <- train(Yield ~ ., data = ctr,
method = "knn",
preProcess = c("center", "scale"),
tuneLength = 10)
knnpt <- predict(knnm, newdata = ctt)
nnet_g <- expand.grid(.decay = c(0, 0.001, 0.01), .size = 1:1, .bag = FALSE)
nnet_mw <- 5 * (ncol(ctr) + 1) + 5 + 1
nnmt <- train(Yield ~ ., data = ctr,
method = "avNNet",
tuneGrid = nnet_g,
trControl = chem_ct,
linout = TRUE,
trace = FALSE,
MaxNWts = nnet_mw,
maxit = 130)
nnpt <- predict(nnmt, newdata = ctt)
marst <- train(Yield ~ ., data = ctr,
method = "earth",
tuneGrid = mexpand_gt,
trControl = chem_ct)
marspt <- predict(marst, newdata = ctt)
svmtt <- train(Yield ~ ., data = ctr,
method = "svmRadial",
tuneLength = 8,
trControl = chem_ct)
svmpt <- predict(svmtt, newdata = ctt)
chem_r <- data.frame(Model = character(),
RMSE = numeric(),
R_squared = numeric(),
MAE = numeric(),
stringsAsFactors = FALSE)
knn_r <- postResample(pred = knnpt, obs = ctt$Yield)
chem_r <- rbind(chem_r, data.frame(Model = "KNN",
RMSE = knn_r[1],
R_squared = knn_r[2],
MAE = knn_r[3]))
nn_r <- postResample(pred = nnpt, obs = ctt$Yield)
chem_r <- rbind(chem_r, data.frame(Model = "Averaged Neural Network",
RMSE = nn_r[1],
R_squared = nn_r[2],
MAE = nn_r[3]))
mars_r <- postResample(pred = marspt, obs = ctt$Yield)
chem_r <- rbind(chem_r, data.frame(Model = "MARS",
RMSE = mars_r[1],
R_squared = mars_r[2],
MAE = mars_r[3]))
svm_results <- postResample(pred = svmpt, obs = ctt$Yield)
chem_r <- rbind(chem_r, data.frame(Model = "SVM",
RMSE = svm_results[1],
R_squared = svm_results[2],
MAE = svm_results[3]))
chem_r
## Model RMSE R_squared MAE
## RMSE KNN 1.222443 0.4035379 0.9609687
## RMSE1 Averaged Neural Network 1.204325 0.3933972 0.9887849
## RMSE2 MARS 1.247842 0.3866294 0.9904590
## RMSE3 SVM 1.159326 0.4430202 0.9447833
Answer: SVM has the lowest RMSE (~1.16) and the highest R-squared (~0.44) among all four models. This shows it has the best predictive accuracy here while also accounting for the most variance in the data. It also has the lowest MAE (~0.94), which means it’s making the smallest average prediction errors as well.
Based on the above, I would say SVM provided the best test set performance.
perfs <- as.data.frame(rbind(
"KNN" = postResample(pred = knnpt, obs = ctt$Yield),
"MARS" = postResample(pred = marspt, obs = ctt$Yield),
"SVM" = postResample(pred = svmpt, obs = ctt$Yield),
"Averaged Neural Network" = postResample(pred = nnpt, obs = ctt$Yield)
))
perfs <- cbind(Model = rownames(perfs), perfs)
rownames(perfs) <- NULL
perfs <- perfs %>% arrange(RMSE)
perfs
## Model RMSE Rsquared MAE
## 1 SVM 1.159326 0.4430202 0.9447833
## 2 Averaged Neural Network 1.204325 0.3933972 0.9887849
## 3 KNN 1.222443 0.4035379 0.9609687
## 4 MARS 1.247842 0.3866294 0.9904590
The next component question states:
“Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?”
Answer: Manufacturing process variables lead the list this time: seven of the top ten are process variables, with three biological materials mixed in. Across the top twenty the split is closer to even (eleven process, nine biological), much like what we saw the last time we worked with this dataset.
varImp(svmtt)
## loess r-squared variable importance
##
## only 20 most important variables shown (out of 57)
##
## Overall
## ManufacturingProcess32 100.00
## ManufacturingProcess13 89.95
## BiologicalMaterial06 82.97
## BiologicalMaterial03 78.17
## ManufacturingProcess17 75.61
## ManufacturingProcess36 69.03
## ManufacturingProcess09 66.81
## BiologicalMaterial12 66.17
## ManufacturingProcess31 65.78
## ManufacturingProcess06 63.53
## BiologicalMaterial02 61.89
## ManufacturingProcess33 52.91
## ManufacturingProcess29 49.55
## BiologicalMaterial01 47.04
## BiologicalMaterial04 46.91
## BiologicalMaterial11 43.99
## ManufacturingProcess11 42.92
## BiologicalMaterial08 42.42
## BiologicalMaterial09 40.69
## ManufacturingProcess12 37.30
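To put numbers on that split, one can tally the two predictor families among the top ten (a small sketch built on the importance table above):
# Count biological vs. process variables in the top 10 by importance
imp <- varImp(svmtt)$importance
top10 <- rownames(imp)[order(-imp$Overall)][1:10]
table(ifelse(grepl("^Biological", top10), "Biological", "Process"))   # from the table above: 3 biological, 7 process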
The final component question states:
“Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?”
Answer:
There are a lot of scattered patterns here, but some strong ones too. ManufacturingProcess36 has a very distinctive distribution against yield; I would be curious to know what is driving that. My intuition now is that there will usually be some sort of meaningful relationship, though it does seem to break down along biological/process lines.
# Pull the ten most important predictors from the SVM fit
top_vars <- varImp(svmtt)$importance %>%
arrange(desc(Overall)) %>%
head(10) %>%
rownames()
# Scatter each top predictor against Yield on the held-out test set
plotter <- lapply(top_vars, function(predictor) {
ggplot(ctt, aes(x = .data[[predictor]], y = Yield)) +
geom_point() +
labs(x = predictor, y = "Yield", title = paste(predictor, "vs Yield")) +
theme(plot.title = element_text(size = 10))
})
grid.arrange(grobs = plotter)