Do problems 7.2 and 7.5 in Kuhn and Johnson.

Problem 7.2

Prompt

Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data:

\(y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)\)

where the x values are random variables uniformly distributed on [0, 1] (the simulation also creates five other non-informative variables). The package mlbench contains a function called mlbench.friedman1 that simulates these data:

library(mlbench)
set.seed(200)
trainingData <- mlbench.friedman1(200, sd = 1)
## We convert the 'x' data from a matrix to a data frame
## One reason is that this will give the columns names.
trainingData$x <- data.frame(trainingData$x)
## Look at the data using
featurePlot(trainingData$x, trainingData$y)

## or other methods.

## This creates a list with a vector 'y' and a matrix
## of predictors 'x'. Also simulate a large test set to
## estimate the true error rate with good precision:
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- data.frame(testData$x)

Question

Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?

(starting with KNN)

KNN (Given)

Given: Tune several models on these data. For example:

set.seed(200)
library(caret)

# I altered the given KNN code here to reuse the same trainControl object for this and the other models in this exercise
ctrl_friedman <- trainControl(method = "boot", number = 25)

knnModel <- train(x = trainingData$x, y = trainingData$y, method = "knn", 
                  preProc = c("center", "scale"), 
                  tuneLength = 10,
                  trControl = ctrl_friedman
                  )
knnModel
## k-Nearest Neighbors 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  3.654912  0.4779838  2.958475
##    7  3.529432  0.5118581  2.861742
##    9  3.446330  0.5425096  2.780756
##   11  3.378049  0.5723793  2.719410
##   13  3.332339  0.5953773  2.692863
##   15  3.309235  0.6111389  2.663046
##   17  3.317408  0.6201421  2.678898
##   19  3.311667  0.6333800  2.682098
##   21  3.316340  0.6407537  2.688887
##   23  3.326040  0.6491480  2.705915
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 15.
set.seed(200)
knnPred <- predict(knnModel, newdata = testData$x)
## The function 'postResample' can be used to get the test set
## performance values
postResample(pred = knnPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 3.1750657 0.6785946 2.5443169

End of Given.

varImp(knnModel)
## loess r-squared variable importance
## 
##      Overall
## X4  100.0000
## X1   95.5047
## X2   89.6186
## X5   45.2170
## X3   29.9330
## X9    6.3299
## X10   5.5182
## X8    3.2527
## X6    0.8884
## X7    0.0000

The prompt instructs me to “Tune several models on these data” and hands me the KNN code. I’ll also fit a neural network, MARS, and an SVM.

Neural Network

set.seed(200)

# 1 to 10 hidden units (.size)
# 3 levels of weight decay (.decay)
nnetGrid <- expand.grid(.decay = c(0, 0.01, 0.1),
                        .size = c(1:10),
                        .bag = FALSE)

# Calc the max number of weights for the network
# formula = H * (P + 1) + H + 1.
# H = max hidden units (10), P = number of predictors.
nnet_maxnwts <- 10 * (ncol(trainingData$x) + 1) + 10 + 1

nnetModel <- train(x = trainingData$x, 
                   y = trainingData$y,
                   method = "avNNet", #nnet", # Orig. I was using nnet, but after re-referencing the testbook I opted for avg
                   tuneGrid = nnetGrid,
                   preProc = c("center", "scale"),  
                   linout = TRUE,                  
                   trace = FALSE,                
                   maxit = 500,                    
                   MaxNWts = nnet_maxnwts,
                   trControl = ctrl_friedman)          
## Warning: executing %dopar% sequentially: no parallel backend registered
nnetModel
## Model Averaged Neural Network 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   decay  size  RMSE      Rsquared   MAE     
##   0.00    1    2.637077  0.7332729  2.066757
##   0.00    2    2.640423  0.7279464  2.087866
##   0.00    3    2.555596  0.7550349  1.965609
##   0.00    4    2.959883  0.6920833  2.145818
##   0.00    5    4.052500  0.5635305  2.739857
##   0.00    6    4.738139  0.4979767  3.113756
##   0.00    7    6.384782  0.4000771  3.861728
##   0.00    8    5.162968  0.4502105  3.330547
##   0.00    9    2.980724  0.6819868  2.272371
##   0.00   10    3.116202  0.6617214  2.343142
##   0.01    1    2.594008  0.7376449  2.019703
##   0.01    2    2.653338  0.7253689  2.086926
##   0.01    3    2.375266  0.7801808  1.855903
##   0.01    4    2.383978  0.7806968  1.874352
##   0.01    5    2.603854  0.7430974  2.042697
##   0.01    6    2.764637  0.7195195  2.174964
##   0.01    7    2.734356  0.7254255  2.183603
##   0.01    8    2.695449  0.7283454  2.113154
##   0.01    9    2.688533  0.7227853  2.140330
##   0.01   10    2.751950  0.7118557  2.172988
##   0.10    1    2.607491  0.7347228  2.025417
##   0.10    2    2.627592  0.7300560  2.075675
##   0.10    3    2.390372  0.7776017  1.889988
##   0.10    4    2.425665  0.7715648  1.903991
##   0.10    5    2.453306  0.7665766  1.916573
##   0.10    6    2.547994  0.7528392  1.999993
##   0.10    7    2.591949  0.7473735  2.019356
##   0.10    8    2.525499  0.7543117  1.975703
##   0.10    9    2.485213  0.7579695  1.964294
##   0.10   10    2.595296  0.7423659  2.060157
## 
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 3, decay = 0.01 and bag = FALSE.
# predict + eval
nnetPred <- predict(nnetModel, newdata = testData$x) 
postResample(pred = nnetPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 2.1852336 0.8096827 1.6280847
varImp(nnetModel)
## loess r-squared variable importance
## 
##      Overall
## X4  100.0000
## X1   95.5047
## X2   89.6186
## X5   45.2170
## X3   29.9330
## X9    6.3299
## X10   5.5182
## X8    3.2527
## X6    0.8884
## X7    0.0000

MARS

set.seed(200)
# Train MARS model
marsModel <- train(x = trainingData$x, 
                   y = trainingData$y,
                   method = "earth",
                   tuneGrid = expand.grid(degree = 1:2, nprune = 2:20),
                   preProc = c("center", "scale"),
                   tuneLength = 10,
                   trControl = ctrl_friedman)

marsModel
## Multivariate Adaptive Regression Spline 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE     
##   1        2      4.447045  0.2249607  3.650128
##   1        3      3.744821  0.4546610  3.019175
##   1        4      2.828643  0.6892908  2.244131
##   1        5      2.524326  0.7516356  2.027435
##   1        6      2.406670  0.7747079  1.906733
##   1        7      2.027113  0.8375721  1.594956
##   1        8      1.874633  0.8618476  1.474219
##   1        9      1.800794  0.8728377  1.411703
##   1       10      1.810047  0.8721377  1.412023
##   1       11      1.821314  0.8714221  1.427124
##   1       12      1.831608  0.8700790  1.430044
##   1       13      1.839717  0.8686550  1.440537
##   1       14      1.849381  0.8672327  1.450876
##   1       15      1.856211  0.8663787  1.452430
##   1       16      1.857086  0.8661612  1.454255
##   1       17      1.853742  0.8667095  1.452920
##   1       18      1.853742  0.8667095  1.452920
##   1       19      1.853742  0.8667095  1.452920
##   1       20      1.853742  0.8667095  1.452920
##   2        2      4.447780  0.2248695  3.650597
##   2        3      3.737891  0.4543357  3.018103
##   2        4      2.854288  0.6832049  2.259488
##   2        5      2.513582  0.7550084  2.004730
##   2        6      2.387478  0.7799585  1.889787
##   2        7      2.044028  0.8354683  1.615415
##   2        8      1.910896  0.8568917  1.500375
##   2        9      1.810765  0.8703004  1.404288
##   2       10      1.677078  0.8885385  1.321634
##   2       11      1.561012  0.9045745  1.234778
##   2       12      1.503867  0.9112625  1.183593
##   2       13      1.507992  0.9112557  1.172444
##   2       14      1.505298  0.9114749  1.171595
##   2       15      1.527789  0.9091635  1.188885
##   2       16      1.532851  0.9082380  1.188192
##   2       17      1.551046  0.9061881  1.199236
##   2       18      1.566120  0.9047152  1.213000
##   2       19      1.574242  0.9036555  1.217297
##   2       20      1.574242  0.9036555  1.217297
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 12 and degree = 2.
# Predict + eval
marsPred <- predict(marsModel, newdata = testData$x)
postResample(pred = marsPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 1.3227340 0.9291489 1.0524686
varImp(marsModel)
## earth variable importance
## 
##    Overall
## X1  100.00
## X4   75.40
## X2   49.00
## X5   15.72
## X3    0.00

SVM

set.seed(200)
# Train SVM model 
svmModel <- train(x = trainingData$x, 
                  y = trainingData$y,
                  method = "svmRadial",
                  preProc = c("center", "scale"),
                  tuneLength = 10,
                  trControl = ctrl_friedman)

svmModel
## Support Vector Machines with Radial Basis Function Kernel 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
## Resampling results across tuning parameters:
## 
##   C       RMSE      Rsquared   MAE     
##     0.25  2.635010  0.7685188  2.074977
##     0.50  2.423373  0.7839086  1.902162
##     1.00  2.284133  0.8001542  1.791776
##     2.00  2.196624  0.8126474  1.713560
##     4.00  2.143035  0.8209820  1.668024
##     8.00  2.119156  0.8246312  1.649384
##    16.00  2.117438  0.8248677  1.648568
##    32.00  2.117438  0.8248677  1.648568
##    64.00  2.117438  0.8248677  1.648568
##   128.00  2.117438  0.8248677  1.648568
## 
## Tuning parameter 'sigma' was held constant at a value of 0.06299324
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06299324 and C = 16.
# Predict + eval
svmPred <- predict(svmModel, newdata = testData$x)
postResample(pred = svmPred, obs = testData$y)
##      RMSE  Rsquared       MAE 
## 2.0736997 0.8256573 1.5751967
varImp(svmModel)
## loess r-squared variable importance
## 
##      Overall
## X4  100.0000
## X1   95.5047
## X2   89.6186
## X5   45.2170
## X3   29.9330
## X9    6.3299
## X10   5.5182
## X8    3.2527
## X6    0.8884
## X7    0.0000

Question Reiterated: Which models appear to give the best performance? Does MARS select the informative predictors (those named X1–X5)?

Recapping the models I tested:

library(dplyr)  # bind_rows, arrange, select, left_join, %>%
library(knitr)  # kable

# Calculate test set performance for all models
knn_test  <- postResample(pred = knnPred, obs = testData$y)
nnet_test <- postResample(pred = nnetPred, obs = testData$y)
mars_test <- postResample(pred = marsPred, obs = testData$y)
svm_test  <- postResample(pred = svmPred, obs = testData$y)

# Bind the test set performance metrics together
model_comparison <- bind_rows(
  "MARS" = mars_test,
  "Neural Network" = nnet_test,
  "SVM" = svm_test,
  "KNN" = knn_test,
  .id = "Model"
) %>%
  arrange(RMSE) # Sort by lowest Test RMSE

# Display table
kable(model_comparison, 
      digits = 3, 
      caption = "Final Test Set Model Performance Comparison",
      align = 'lccc')
Final Test Set Model Performance Comparison

Model            RMSE    Rsquared   MAE
MARS             1.323   0.929      1.052
SVM              2.074   0.826      1.575
Neural Network   2.185   0.810      1.628
KNN              3.175   0.679      2.544

It looks like the MARS model gave the best performance since it has the lowest RMSE and highest R-squared.

I also looked at the important predictors of each model as I went, and MARS selected only the informative predictors (X1-X5), dropping the noise variables X6-X10 entirely.

When I first saw this, MARS performing this much better than the others almost seemed concerning, but it actually makes sense: MARS builds piecewise-linear hinge functions of individual predictors (plus low-order interactions), which is essentially tailor-made for the additive, mildly nonlinear structure of the equation used to generate these data.
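As a quick check on that intuition, the summary of the final earth object (already fit above) lists the hinge terms the model kept; a minimal sketch:

# Inspect the basis functions retained by the final MARS fit
summary(marsModel$finalModel)
# The retained terms should be hinge functions of X1-X5 (plus an X1/X2 interaction
# at degree 2), echoing the sin(pi*x1*x2), (x3 - 0.5)^2, x4, and x5 pieces of the
# generating equation above.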

Problem 7.5

Exercise 6.3 describes data for a chemical manufacturing process. Use the same data imputation, data splitting, and pre-processing steps as before and train several nonlinear regression models.

Setup

library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)

dim(ChemicalManufacturingProcess) # 176, 58
## [1] 176  58
# split train v. test
set.seed(200) 
train_indices <- createDataPartition(ChemicalManufacturingProcess$Yield, p = 0.8, list = FALSE)

train_data <- ChemicalManufacturingProcess[train_indices, ]
test_data  <- ChemicalManufacturingProcess[-train_indices, ]

# Split Predictors (x) and Target (y)
x_train <- train_data[, -1]
y_train <- train_data$Yield
x_test  <- test_data[, -1]
y_test  <- test_data$Yield

# Note: Imputation and transformations are handled inside the train() 
# function to prevent data leakage across CV folds.

I set up my pre-processing parameters inside the cross-validation later on, which handles imputation and Yeo-Johnson transformations (instead of Box-Cox, due to the presence of zeros in the data) on the predictors where necessary. I want to avoid data leakage overall, which is why I am doing it this way. I admittedly went back and forth on whether the imputation should still happen prior to the 10-fold cross-validation, since running it both ways did seem to affect my “best” performing model. I finally landed on this approach because it was the cleanest in terms of data leakage.
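For reference, the alternative I decided against (estimating the imputation and transformations once on the full training set, before resampling) would look roughly like the sketch below; the pp_chem / x_train_once / x_test_once names are illustrative only and are not used downstream.

# Not used downstream: fit the pre-processing once up front, then reuse it on the test set.
# The drawback is that every CV fold would then "see" information from the held-out folds.
pp_chem      <- preProcess(x_train, method = c("nzv", "knnImpute", "center", "scale", "YeoJohnson"))
x_train_once <- predict(pp_chem, x_train)
x_test_once  <- predict(pp_chem, x_test)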

# 10-fold cross-validation
ctrl_chem <- trainControl(method = "cv", number = 10)
# I used boot for the prev exercise, but I'm opting to use cross validation for this one

I used bootstrapping for the previous exercise’s control parameter. However, I’m using 10-fold cross-validation for this exercise because there is less data (176 rows), and with that little, retaining 90% of the data per fold seemed much more valuable. I didn’t want these models to have too little data to work with effectively.
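As a rough sanity check of that fold-size argument (just arithmetic on the training split created above):

# Approximate number of rows each of the 10 CV folds trains on
nrow(x_train)               # roughly 140+ rows after the 80/20 split of 176
floor(0.9 * nrow(x_train))  # about 90% of those are available for fitting within each fold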

Model Training

I’m training the four primary nonlinear models covered in Chapter 7 (the same ones I used in the previous exercise):

  • KNN
  • Neural Networks (the model-averaged version)
  • MARS
  • SVM (radial)
# Repeating similar model work from the previous exercise,
# but combining the modeling step into one chunk for all models
# to keep the chunk breakdown a little less lengthy.

# K-Nearest Neighbors
set.seed(200)
knnModel_chem <- train(x = x_train, y = y_train, 
                       method = "knn",  
                       preProc = c("nzv", "knnImpute", "center", "scale", "YeoJohnson"),
                       tuneLength = 10, 
                       trControl = ctrl_chem)

# Neural Network (Model Averaged)
set.seed(200)
nnetModel_chem <- train(x = x_train, y = y_train, 
                        method = "avNNet",  
                        preProc = c("nzv", "knnImpute", "center", "scale", "YeoJohnson"),
                        tuneLength = 5,
                        linout = TRUE, 
                        trace = FALSE, 
                        maxit = 500,
                        trControl = ctrl_chem)

# MARS 
set.seed(200)
marsGrid_chem <- expand.grid(degree = 1:2, nprune = 2:20)
marsModel_chem <- train(x = x_train, y = y_train, 
                        method = "earth",   
                        preProc = c("nzv", "knnImpute", "center", "scale", "YeoJohnson"),
                        tuneGrid = marsGrid_chem, 
                        trControl = ctrl_chem)

# SVM (Support Vector Machine, radial)
set.seed(200)
svmModel_chem <- train(x = x_train, y = y_train, 
                       method = "svmRadial", 
                       preProc = c("nzv", "knnImpute", "center", "scale", "YeoJohnson"),  
                       tuneLength = 10, 
                       trControl = ctrl_chem)

I’ll actually look at the performance of these models in the sections below.

a

  1. Which nonlinear regression model gives the optimal resampling and test set performance?
# Predict on test set using raw x_test
knnPred_chem  <- predict(knnModel_chem, newdata = x_test)
nnetPred_chem <- predict(nnetModel_chem, newdata = x_test)
marsPred_chem <- predict(marsModel_chem, newdata = x_test)
svmPred_chem  <- predict(svmModel_chem, newdata = x_test)

# Calc performance metrics
knn_test  <- postResample(pred = knnPred_chem, obs = y_test)
nnet_test <- postResample(pred = nnetPred_chem, obs = y_test)
mars_test <- postResample(pred = marsPred_chem, obs = y_test)
svm_test  <- postResample(pred = svmPred_chem, obs = y_test)

# Pull the optimal training metrics
train_perf <- bind_rows(
  "KNN"  = getTrainPerf(knnModel_chem),
  "NNET" = getTrainPerf(nnetModel_chem),
  "MARS" = getTrainPerf(marsModel_chem),
  "SVM"  = getTrainPerf(svmModel_chem),
  .id = "Model"
) %>%
  select(Model, Train_RMSE = TrainRMSE, Train_Rsquared = TrainRsquared)

# Combine metrics for display
test_perf <- bind_rows(
  "KNN"  = knn_test,
  "NNET" = nnet_test,
  "MARS" = mars_test,
  "SVM"  = svm_test,
  .id = "Model"
) %>%
  select(Model, Test_RMSE = RMSE, Test_Rsquared = Rsquared)

# Merge and display
final_chem_comparison <- left_join(train_perf, test_perf, by = "Model") %>%
  arrange(Test_RMSE)

kable(final_chem_comparison, 
      digits = 3, 
      caption = "Chemical Manufacturing: Resampling vs. Test Set Performance",
      align = 'lcccc')
Chemical Manufacturing: Resampling vs. Test Set Performance

Model   Train_RMSE   Train_Rsquared   Test_RMSE   Test_Rsquared
MARS    1.199        0.586            1.279       0.594
SVM     1.112        0.650            1.314       0.645
NNET    1.349        0.453            1.502       0.494
KNN     1.253        0.540            1.586       0.455

To be perfectly honest, I was hoping for better results. That aside, looking at the final performance table, the MARS model performs best, with the lowest test RMSE and the second-highest test R-squared (which actually improved from training to testing). SVM is a close contender, with a slightly higher test RMSE but the highest test R-squared. However, SVM’s R-squared worsened slightly from training to test, which might hint at some overfitting.

I have some opinions about which model was actually the best, which I go into in part b as well.
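Before moving on, a side-by-side look at the resampling distributions is also easy to get with caret’s resamples(); this sketch just reuses the four fitted models (they share the same folds because each train() call used set.seed(200) and ctrl_chem):

# Collect fold-by-fold resampling results for all four models
chem_resamples <- resamples(list(KNN  = knnModel_chem,
                                 NNET = nnetModel_chem,
                                 MARS = marsModel_chem,
                                 SVM  = svmModel_chem))
summary(chem_resamples)
bwplot(chem_resamples, metric = "RMSE")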

b

  1. Which predictors are most important in the optimal nonlinear regression model? Do either the biological or process variables dominate the list? How do the top ten important predictors compare to the top ten predictors from the optimal linear model?
# variable importance for the SVM model
svm_importance <- varImp(svmModel_chem)
svm_importance 
## loess r-squared variable importance
## 
##   only 20 most important variables shown (out of 57)
## 
##                        Overall
## ManufacturingProcess32  100.00
## BiologicalMaterial06     87.80
## ManufacturingProcess13   78.22
## BiologicalMaterial03     76.44
## ManufacturingProcess36   72.00
## ManufacturingProcess31   70.73
## BiologicalMaterial12     69.15
## ManufacturingProcess17   68.43
## ManufacturingProcess11   65.12
## ManufacturingProcess09   64.10
## ManufacturingProcess06   62.46
## BiologicalMaterial02     53.48
## ManufacturingProcess33   50.50
## BiologicalMaterial11     48.64
## ManufacturingProcess30   47.71
## ManufacturingProcess29   44.85
## ManufacturingProcess10   42.81
## ManufacturingProcess12   38.46
## BiologicalMaterial09     38.31
## BiologicalMaterial04     37.86
# plot
plot(svm_importance, top = 10, main = "Top 10 Predictors (Close to Optimal SVM Model)")

# variable importance for the MARS model
mars_importance <- varImp(marsModel_chem)
mars_importance 
## earth variable importance
## 
##                        Overall
## ManufacturingProcess32     100
## ManufacturingProcess09       0
# plot
plot(mars_importance, top = 10, main = "Top 10 Predictors (Optimal MARS Model)")

Based on the variable importance computed for the SVM model (caret reports a model-free loess R-squared importance here, as the output header notes, since SVMs have no built-in importance measure), the most important predictor is ManufacturingProcess32, with BiologicalMaterial06 in second place.

Neither category completely dominates the list: biological materials are relatively well represented near the top (two of the top five), while manufacturing process variables make up the bulk of the top ten (seven of ten).
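A quick tally of the top ten by predictor family backs that up; this sketch just reorders the svm_importance object computed above and counts name prefixes:

# Count Biological vs. Manufacturing predictors among the 10 largest importance scores
imp_df <- svm_importance$importance
imp_df <- imp_df[order(-imp_df$Overall), , drop = FALSE]
top10  <- rownames(imp_df)[1:10]
table(ifelse(grepl("^Biological", top10), "Biological", "Process"))
# Per the listing above, this should show 3 Biological vs. 7 Process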

This seems like a shift worth mentioning from the optimal linear model (PLS) in a previous homework: there, manufacturing process variables heavily dominated the top tier of predictors, while biological variables were pushed further down the list. The SVM model’s rankings suggest that the raw biological materials have complex, nonlinear relationships with the final yield that the linear model was perhaps unable to capture.

I also want to call out that for MARS, the most important predictor is again ManufacturingProcess32, and the only other predictor that survived pruning is ManufacturingProcess09 (with a scaled importance of essentially zero). These are the only two surviving predictors for MARS. (I have genuinely gone over this many times trying to figure out whether I did something wrong. Please let me know if I have.) I believe that MARS, which can trim predictors pretty ruthlessly, pruned the model down to just these two. Based on this and on the model performance comparison above, I personally take SVM to be the better model, because from a domain perspective it seems highly improbable that a complex chemical manufacturing process with 57 biological and process variables is driven solely by two factors.
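One way to double-check that is caret’s predictors() helper, which lists the predictors the final model actually uses; a minimal sketch on the already-fitted object:

# List the predictors retained by the final MARS model
predictors(marsModel_chem)
# Per the varImp output above, this should return only ManufacturingProcess32
# and ManufacturingProcess09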

c

  1. Explore the relationships between the top predictors and the response for the predictors that are unique to the optimal nonlinear regression model. Do these plots reveal intuition about the biological or process predictors and their relationship with yield?

I went back to my previous homework and pulled these lists:

  • PLS Top 10 (linear, previous homework): MP32, MP36, MP09, MP13, MP17, BM02, BM06, MP06, BM03, MP11
  • SVM Top 10 (nonlinear): MP32, BM06, MP13, BM03, MP36, MP31, BM12, MP17, MP11, MP09

The two predictors unique to the optimal nonlinear model’s top ten are BiologicalMaterial12 and ManufacturingProcess31.
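A small set-difference check confirms this; the two vectors below are hand-transcribed from the lists above (using the full predictor names rather than the abbreviations):

# Top-10 lists from the PLS (previous homework) and SVM (above) importance rankings
pls_top10 <- c("ManufacturingProcess32", "ManufacturingProcess36", "ManufacturingProcess09",
               "ManufacturingProcess13", "ManufacturingProcess17", "BiologicalMaterial02",
               "BiologicalMaterial06", "ManufacturingProcess06", "BiologicalMaterial03",
               "ManufacturingProcess11")
svm_top10 <- c("ManufacturingProcess32", "BiologicalMaterial06", "ManufacturingProcess13",
               "BiologicalMaterial03", "ManufacturingProcess36", "ManufacturingProcess31",
               "BiologicalMaterial12", "ManufacturingProcess17", "ManufacturingProcess11",
               "ManufacturingProcess09")
setdiff(svm_top10, pls_top10)  # expect BiologicalMaterial12 and ManufacturingProcess31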

library(gridExtra)  # for grid.arrange below (ggplot2 is already attached via caret)

# Extracting the unique predictors identified in the comparison
p1 <- ggplot(ChemicalManufacturingProcess, aes(x = BiologicalMaterial12, y = Yield)) +
  geom_point(alpha = 0.5, color = "darkgray") +
  geom_smooth(method = "loess", color = "steelblue", formula = y ~ x) +
  theme_minimal() +
  labs(title = "Yield vs. BiologicalMaterial12",
       x = "Biological Material 12",
       y = "Yield")

p2 <- ggplot(ChemicalManufacturingProcess, aes(x = ManufacturingProcess31, y = Yield)) +
  geom_point(alpha = 0.5, color = "darkgray") +
  geom_smooth(method = "loess", color = "darkred", formula = y ~ x) +
  theme_minimal() +
  coord_cartesian(xlim = c(50, 80)) +  # restrict the x-axis range for readability
  labs(title = "Yield vs. ManufacturingProcess31",
       x = "Manufacturing Process 31",
       y = "Yield")

# Display plots  
grid.arrange(p1, p2, ncol = 2)
## Warning: Removed 5 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_point()`).

By isolating the predictors unique to the nonlinear model (BiologicalMaterial12 and ManufacturingProcess31), I can see that these relationships are definitely not straight lines and would be hard for a linear model to capture.

The loess curve over BiologicalMaterial12 is distinctly nonlinear, but the underlying scatter of points looks surprisingly noisy on its own.

ManufacturingProcess31 has a tight cluster of data points and a pretty clear maximum value for the predictor. Some machine settings probably have hard caps like this, and the nonlinear model can potentially recognize these operational limits and thresholds, which may be why it prioritized this variable to gain some predictive performance.