Exercise 7.2

This exercise uses the Friedman simulation data. The response is generated from a nonlinear equation where only the first five predictors are truly informative. The remaining predictors are noise variables. The goal is to tune several nonlinear regression models, compare their performance, and determine whether MARS selects the truly informative predictors.
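For reference, the data-generating process can be sketched in base R. This is a minimal re-implementation for illustration only, assuming the standard Friedman 1 formula, y = 10 sin(pi x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 + e, with e drawn from N(0, sd^2) and ten independent uniform predictors. The function name simulate_friedman1 is hypothetical, and its random draws will not reproduce the mlbench output used below.

```r
# Hypothetical helper: a base-R sketch of the Friedman 1 benchmark.
# Only V1-V5 enter the formula; V6-V10 are pure noise.
simulate_friedman1 <- function(n, sd = 1) {
  x <- matrix(runif(n * 10), nrow = n, ncol = 10)
  colnames(x) <- paste0("V", 1:10)
  signal <- 10 * sin(pi * x[, 1] * x[, 2]) +
    20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] +
    5 * x[, 5]
  list(x = as.data.frame(x), y = signal + rnorm(n, sd = sd))
}

set.seed(1)  # arbitrary seed for the illustration
sim <- simulate_friedman1(200)
dim(sim$x)

# Marginal (linear) correlation of each predictor with the response.
round(cor(as.matrix(sim$x), sim$y)[, 1], 2)
```

One thing to keep in mind when reading importance scores later: V3 enters only through the symmetric quadratic term 20 (x3 - 0.5)^2, so its linear correlation with y is expected to be close to zero even though it is informative.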

Simulate the training and test data

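# Packages used throughout (these match the attached packages listed under Session Information).
library(caret)
library(mlbench)
library(earth)
library(plotmo)
library(kernlab)
library(dplyr)
library(knitr)
library(ggplot2)
library(AppliedPredictiveModeling)

set.seed(200)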
trainingData <- mlbench.friedman1(200, sd = 1)
trainingData$x <- as.data.frame(trainingData$x)

set.seed(201)
testData <- mlbench.friedman1(5000, sd = 1)
testData$x <- as.data.frame(testData$x)

str(trainingData$x)

## 'data.frame':    200 obs. of  10 variables:
##  $ V1 : num  0.534 0.584 0.59 0.691 0.667 ...
##  $ V2 : num  0.648 0.438 0.588 0.226 0.819 ...
##  $ V3 : num  0.8508 0.6727 0.4097 0.0334 0.7168 ...
##  $ V4 : num  0.1816 0.6692 0.3381 0.0669 0.8032 ...
##  $ V5 : num  0.929 0.1638 0.8941 0.6374 0.0831 ...
##  $ V6 : num  0.3618 0.4531 0.0268 0.525 0.2234 ...
##  $ V7 : num  0.827 0.649 0.179 0.513 0.664 ...
##  $ V8 : num  0.421 0.845 0.35 0.797 0.904 ...
##  $ V9 : num  0.5911 0.9282 0.0176 0.6899 0.397 ...
##  $ V10: num  0.589 0.758 0.444 0.445 0.55 ...

summary(trainingData$y)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.556  10.756  14.556  14.416  17.970  28.382

K-nearest neighbors model

ctrl_72 <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

set.seed(921)
knnModel <- train(
  x = trainingData$x,
  y = trainingData$y,
  method = "knn",
  preProcess = c("center", "scale"),
  tuneLength = 10,
  trControl = ctrl_72
)

knnPred <- predict(knnModel, newdata = testData$x)
knnTest <- postResample(pred = knnPred, obs = testData$y)
knnModel

## k-Nearest Neighbors 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE      Rsquared   MAE     
##    5  3.151963  0.6214372  2.600597
##    7  3.087287  0.6516076  2.509247
##    9  3.058080  0.6843952  2.475974
##   11  3.034943  0.7013575  2.477236
##   13  3.020351  0.7279731  2.448917
##   15  3.063449  0.7304283  2.484342
##   17  3.061518  0.7450912  2.487711
##   19  3.093208  0.7421557  2.514214
##   21  3.118999  0.7450008  2.540937
##   23  3.139436  0.7514295  2.569148
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 13.

knnTest

##      RMSE  Rsquared       MAE 
## 3.2034139 0.6716079 2.5707670

MARS model

marsGrid <- expand.grid(
  degree = 1:2,
  nprune = seq(2, 20, by = 2)
)

set.seed(921)
marsModel <- train(
  x = trainingData$x,
  y = trainingData$y,
  method = "earth",
  preProcess = c("center", "scale"),
  tuneGrid = marsGrid,
  trControl = ctrl_72
)

marsPred <- predict(marsModel, newdata = testData$x)
marsTest <- postResample(pred = marsPred, obs = testData$y)
marsModel

## Multivariate Adaptive Regression Spline 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   degree  nprune  RMSE      Rsquared   MAE      
##   1        2      4.438050  0.2197911  3.6881529
##   1        4      2.672008  0.7236734  2.1709494
##   1        6      2.275852  0.8052266  1.7833757
##   1        8      1.727202  0.8865255  1.3546175
##   1       10      1.660787  0.8954520  1.3140280
##   1       12      1.641088  0.8982687  1.2816690
##   1       14      1.653936  0.8959369  1.2960813
##   1       16      1.655315  0.8957826  1.2964709
##   1       18      1.655315  0.8957826  1.2964709
##   1       20      1.655315  0.8957826  1.2964709
##   2        2      4.460945  0.2135576  3.7138066
##   2        4      2.694101  0.7205425  2.1805270
##   2        6      2.265618  0.8055884  1.7972836
##   2        8      1.715140  0.8864248  1.3453901
##   2       10      1.453653  0.9167060  1.1554820
##   2       12      1.328831  0.9321965  1.0509783
##   2       14      1.275209  0.9378036  1.0163297
##   2       16      1.268621  0.9383750  0.9983817
##   2       18      1.275822  0.9374554  1.0064832
##   2       20      1.275822  0.9374554  1.0064832
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nprune = 16 and degree = 2.

marsTest

##      RMSE  Rsquared       MAE 
## 1.2331551 0.9402676 0.9785410

plot(marsModel, main = "MARS Tuning Results")

marsImportance <- varImp(marsModel)$importance %>%
  tibble::rownames_to_column("Predictor") %>%
  arrange(desc(Overall))

kable(marsImportance, digits = 3, caption = "MARS Variable Importance")

MARS Variable Importance
Predictor	Overall
V1	100.000
V4	75.253
V2	48.759
V5	15.540
V3	0.000

# Refit the final MARS model directly so the selected terms are easy to inspect.
finalMarsFit <- earth(
  x = trainingData$x,
  y = trainingData$y,
  degree = marsModel$bestTune$degree,
  nprune = marsModel$bestTune$nprune
)
summary(finalMarsFit)

## Call: earth(x=trainingData$x, y=trainingData$y,
##             degree=marsModel$bestTune$degree, nprune=marsModel$bestTune$nprune)
## 
##                                 coefficients
## (Intercept)                        20.378441
## h(0.621722-V1)                    -15.512132
## h(V1-0.621722)                      9.177132
## h(0.601063-V2)                    -17.940676
## h(V2-0.601063)                     10.064356
## h(V3-0.281766)                     11.590022
## h(0.447442-V3)                     14.641640
## h(V3-0.447442)                    -12.924806
## h(V3-0.606015)                     13.416764
## h(0.734892-V4)                    -10.074386
## h(V4-0.734892)                      9.687149
## h(0.850094-V5)                     -5.385762
## h(0.218266-V1) * h(V2-0.601063)   -55.372637
## h(V1-0.218266) * h(V2-0.601063)   -27.542831
## h(V1-0.621722) * h(V2-0.295997)   -26.527403
## h(0.649253-V1) * h(0.601063-V2)    26.129827
## 
## Selected 16 of 18 terms, and 5 of 10 predictors (nprune=16)
## Termination condition: Reached nk 21
## Importance: V1, V4, V2, V5, V3, V6-unused, V7-unused, V8-unused, V9-unused, ...
## Number of terms at each degree of interaction: 1 11 4
## GCV 1.61518    RSS 210.6377    GRSq 0.934423    RSq 0.9568093
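
The h() terms in the summary are hinge functions, h(u) = max(0, u). As a reading aid, the sketch below evaluates only the additive V1 contribution, using the two first-order V1 coefficients printed above. It ignores the intercept, the other predictors, and the V1 by V2 interaction terms, so it is not a full prediction. The helper names hinge and v1_main_effect are illustrative.

```r
# Hinge function used by MARS: zero below the knot, linear above it.
hinge <- function(u) pmax(0, u)

# Additive V1 contribution, with coefficients copied from the summary above.
v1_main_effect <- function(v1) {
  -15.512132 * hinge(0.621722 - v1) +
    9.177132 * hinge(v1 - 0.621722)
}

# Evaluate below the knot, at the knot, and above the knot.
v1_main_effect(c(0.2, 0.621722, 0.9))
```

The contribution is zero at the knot (0.621722), negative below it, and positive above it, with a different slope on each side.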

plotmo(finalMarsFit, caption = "MARS partial dependence plots")

##  plotmo grid:    V1        V2       V3        V4        V5        V6        V7
##           0.5139349 0.5106664 0.537307 0.4445841 0.5343299 0.4975981 0.4688035
##        V8        V9       V10
##  0.497961 0.5288716 0.5359218

Radial basis function support vector machine

set.seed(921)
svmRModel <- train(
  x = trainingData$x,
  y = trainingData$y,
  method = "svmRadial",
  preProcess = c("center", "scale"),
  tuneLength = 8,
  trControl = ctrl_72
)

svmPred <- predict(svmRModel, newdata = testData$x)
svmTest <- postResample(pred = svmPred, obs = testData$y)
svmRModel

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 200 samples
##  10 predictor
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
## Resampling results across tuning parameters:
## 
##   C      RMSE      Rsquared   MAE     
##    0.25  2.477496  0.8146722  1.972834
##    0.50  2.208826  0.8309997  1.745928
##    1.00  2.028275  0.8509798  1.598201
##    2.00  1.934791  0.8627124  1.514249
##    4.00  1.865448  0.8707502  1.460540
##    8.00  1.830309  0.8760225  1.445085
##   16.00  1.828418  0.8763048  1.448645
##   32.00  1.828338  0.8763089  1.448505
## 
## Tuning parameter 'sigma' was held constant at a value of 0.06254979
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.06254979 and C = 32.

svmTest

##      RMSE  Rsquared       MAE 
## 2.0820580 0.8285375 1.5849967

plot(svmRModel, main = "Radial SVM Tuning Results")

Exercise 7.2 model comparison

results_72 <- rbind(
  KNN = knnTest,
  MARS = marsTest,
  SVM_Radial = svmTest
) %>%
  as.data.frame() %>%
  tibble::rownames_to_column("Model") %>%
  arrange(RMSE)

kable(results_72, digits = 3, caption = "Exercise 7.2 Test Set Performance")

Exercise 7.2 Test Set Performance
Model	RMSE	Rsquared	MAE
MARS	1.233	0.940	0.979
SVM_Radial	2.082	0.829	1.585
KNN	3.203	0.672	2.571

Exercise 7.2 answer

Based on the test set RMSE, MARS is the best model in my run (RMSE 1.233, R-squared 0.940), followed by the radial SVM (RMSE 2.082, R-squared 0.829) and K-nearest neighbors (RMSE 3.203, R-squared 0.672). MARS and the radial SVM both capture the nonlinear structure much better than KNN. KNN is weaker here because its distance calculation weights all ten predictors equally, so the five noise variables dilute the neighborhoods, and with only 200 training samples those neighborhoods are too coarse to track the structured nonlinear formula that generates the response.

The MARS variable importance table and final model summary show which predictors were selected. The data-generating equation uses X1 through X5, and in my run the final MARS model selected exactly those five predictors (V1 through V5) and left V6 through V10 unused. The importance ranking is V1, V4, V2, V5, V3. V3 has a scaled importance of 0.000, yet it still appears in four hinge terms of the final model, so it was not dropped. MARS therefore recovers the true signal in the simulated data and ignores the noise variables.
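
A rough way to check how much the noise predictors hurt KNN is to fit the same neighbor rule once on all ten predictors and once on the five informative ones. The sketch below does this in base R on freshly simulated data, assuming the standard Friedman 1 formula. The helpers knn_predict, rmse, and make_data are hypothetical, the seed is arbitrary, and the numbers will not match the caret results above. The value k = 13 is borrowed from the tuned KNN model.

```r
# Brute-force KNN regression: average the responses of the k nearest rows.
knn_predict <- function(train_x, train_y, test_x, k) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  apply(test_x, 1, function(row) {
    # Euclidean distance from this test row to every training row.
    d <- sqrt(colSums((t(train_x) - row)^2))
    mean(train_y[order(d)[seq_len(k)]])
  })
}

rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# Assumed Friedman 1 formula; columns 6-10 are noise.
make_data <- function(n) {
  x <- matrix(runif(n * 10), ncol = 10)
  y <- 10 * sin(pi * x[, 1] * x[, 2]) + 20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] + 5 * x[, 5] + rnorm(n)
  list(x = x, y = y)
}

set.seed(42)
train <- make_data(200)
test <- make_data(500)

rmse_all <- rmse(knn_predict(train$x, train$y, test$x, k = 13), test$y)
rmse_informative <- rmse(
  knn_predict(train$x[, 1:5], train$y, test$x[, 1:5], k = 13),
  test$y
)
c(all_ten = rmse_all, informative_only = rmse_informative)
```

If the informative-only RMSE comes out lower, that supports the view that irrelevant dimensions are part of what holds KNN back on this simulation.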

Exercise 7.5

This exercise uses the chemical manufacturing process data. The goal is to use the same general preprocessing, splitting, and imputation approach as the previous linear modeling exercise, then train nonlinear regression models and compare performance.

The questions are:

Which nonlinear regression model gives the best resampling and test set performance?
Which predictors are most important in the best nonlinear model? Do the biological or process variables dominate?
For predictors that are uniquely important to the best nonlinear model, what do the predictor-response plots suggest?

Load and split the chemical manufacturing data

data(ChemicalManufacturingProcess)

predictors <- ChemicalManufacturingProcess %>% select(-Yield)
yield <- ChemicalManufacturingProcess$Yield

set.seed(517)
trainingRows <- createDataPartition(yield, p = 0.70, list = FALSE)

trainPredictors <- predictors[trainingRows, ]
trainYield <- yield[trainingRows]

testPredictors <- predictors[-trainingRows, ]
testYield <- yield[-trainingRows]

dim(trainPredictors)

## [1] 124  57

dim(testPredictors)

## [1] 52 57

summary(trainYield)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   36.12   38.79   39.97   40.24   41.46   46.34

Preprocess the predictors

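The preprocessing steps below apply median imputation, centering, and scaling, then remove near-zero variance predictors and highly correlated predictors (pairwise correlation cutoff of 0.75). All preprocessing parameters are estimated on the training set and then applied unchanged to the test set. I used median imputation instead of KNN imputation because knnImpute can fail with RANN::nn2 NA/NaN/Inf errors, which I wanted to avoid here.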
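
The key property of the imputation step is that the medians come from the training set and are reused on the test set. A minimal base R sketch of that idea follows. The toy data frames and the impute_median helper are hypothetical, and this is an illustration of the concept rather than caret's internal code.

```r
# Toy data (hypothetical values) with missing entries.
train_toy <- data.frame(a = c(1, 2, NA, 4, 100), b = c(10, NA, 30, 40, 50))
test_toy  <- data.frame(a = c(NA, 3), b = c(20, NA))

# Medians are estimated on the training set only.
train_medians <- vapply(train_toy, median, numeric(1), na.rm = TRUE)

# Fill each column's missing values with the supplied (training) median.
impute_median <- function(df, medians) {
  for (col in names(medians)) {
    df[[col]][is.na(df[[col]])] <- medians[[col]]
  }
  df
}

train_imp <- impute_median(train_toy, train_medians)
test_imp  <- impute_median(test_toy, train_medians)
test_imp
```

The median is also insensitive to the outlying value 100 in column a, which is one reason it is a reasonable default when a few extreme measurements are present.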

# Convert to data frames.
trainPredictors <- as.data.frame(trainPredictors)
testPredictors  <- as.data.frame(testPredictors)

# Replace Inf/-Inf with NA before preprocessing.
trainPredictors[] <- lapply(trainPredictors, function(x) {
  x[is.infinite(x)] <- NA
  x
})

testPredictors[] <- lapply(testPredictors, function(x) {
  x[is.infinite(x)] <- NA
  x
})

# Check missing values before preprocessing.
sum(is.na(trainPredictors))

## [1] 88

sum(is.na(testPredictors))

## [1] 18

# Use median imputation instead of knnImpute to avoid RANN::nn2 NA/NaN/Inf errors.
pp <- preProcess(
  trainPredictors,
  method = c("medianImpute", "center", "scale")
)

ppTrainPredictors <- predict(pp, trainPredictors)
ppTestPredictors  <- predict(pp, testPredictors)

# Remove near-zero variance predictors.
nzv <- nearZeroVar(ppTrainPredictors)
if(length(nzv) > 0) {
  ppTrainPredictors <- ppTrainPredictors[, -nzv, drop = FALSE]
  ppTestPredictors  <- ppTestPredictors[, -nzv, drop = FALSE]
}

# Remove highly correlated predictors.
corrMatrix <- cor(ppTrainPredictors, use = "pairwise.complete.obs")
highCorr <- findCorrelation(corrMatrix, cutoff = 0.75)
if(length(highCorr) > 0) {
  ppTrainPredictors <- ppTrainPredictors[, -highCorr, drop = FALSE]
  ppTestPredictors  <- ppTestPredictors[, -highCorr, drop = FALSE]
}

# Final safety checks. All four should return 0.
sum(is.na(ppTrainPredictors))

## [1] 0

sum(is.na(ppTestPredictors))

## [1] 0

sum(is.infinite(as.matrix(ppTrainPredictors)))

## [1] 0

sum(is.infinite(as.matrix(ppTestPredictors)))

## [1] 0

dim(ppTrainPredictors)

## [1] 124  35

dim(ppTestPredictors)

## [1] 52 35

Train nonlinear models

ctrl_75 <- trainControl(method = "boot", number = 10)

set.seed(614)
marsChemGrid <- expand.grid(
  degree = 1:2,
  nprune = 2:12
)

marsChem <- train(
  x = ppTrainPredictors,
  y = trainYield,
  method = "earth",
  tuneGrid = marsChemGrid,
  trControl = ctrl_75,
  metric = "RMSE"
)

set.seed(614)
svmPolyGrid <- expand.grid(
  degree = c(1, 2),
  scale = c(0.25, 0.5, 1),
  C = c(0.01, 0.05, 0.1, 0.5, 1)
)

svmPolyChem <- train(
  x = ppTrainPredictors,
  y = trainYield,
  method = "svmPoly",
  tuneGrid = svmPolyGrid,
  trControl = ctrl_75,
  metric = "RMSE"
)

set.seed(614)
svmRadialChem <- train(
  x = ppTrainPredictors,
  y = trainYield,
  method = "svmRadial",
  tuneLength = 8,
  trControl = ctrl_75,
  metric = "RMSE"
)

set.seed(614)
knnChem <- train(
  x = ppTrainPredictors,
  y = trainYield,
  method = "knn",
  tuneLength = 15,
  trControl = ctrl_75,
  metric = "RMSE"
)

Resampling results

resampling_75 <- resamples(list(
  MARS = marsChem,
  SVM_Polynomial = svmPolyChem,
  SVM_Radial = svmRadialChem,
  KNN = knnChem
))

summary(resampling_75)

## 
## Call:
## summary.resamples(object = resampling_75)
## 
## Models: MARS, SVM_Polynomial, SVM_Radial, KNN 
## Number of resamples: 10 
## 
## MAE 
##                     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
## MARS           1.0381315 1.147816 1.226280 1.234798 1.305731 1.465944    0
## SVM_Polynomial 1.0424477 1.186413 1.286848 1.297622 1.449228 1.517574    0
## SVM_Radial     0.8368409 1.057719 1.097129 1.087872 1.141829 1.393940    0
## KNN            1.0070982 1.120923 1.187553 1.227485 1.354083 1.493270    0
## 
## RMSE 
##                    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
## MARS           1.255149 1.474677 1.509305 1.539584 1.608809 1.881310    0
## SVM_Polynomial 1.270247 1.508996 1.691908 1.694125 1.840637 2.149853    0
## SVM_Radial     1.063622 1.319829 1.384302 1.377508 1.466748 1.689076    0
## KNN            1.290180 1.412057 1.567382 1.551075 1.724506 1.781058    0
## 
## Rsquared 
##                      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## MARS           0.18823838 0.2819786 0.3659302 0.3596825 0.4144200 0.5220145
## SVM_Polynomial 0.09783666 0.1741580 0.2349950 0.2494173 0.2886982 0.5197313
## SVM_Radial     0.34920734 0.4021429 0.4575125 0.4562445 0.5052691 0.5543181
## KNN            0.16660638 0.2504763 0.2826843 0.3104218 0.3474161 0.5392228
##                NA's
## MARS              0
## SVM_Polynomial    0
## SVM_Radial        0
## KNN               0

bwplot(resampling_75, metric = "RMSE", main = "Resampled RMSE by Model")

Test set performance

model_list_75 <- list(
  MARS = marsChem,
  SVM_Polynomial = svmPolyChem,
  SVM_Radial = svmRadialChem,
  KNN = knnChem
)

test_results_75 <- lapply(names(model_list_75), function(model_name) {
  preds <- predict(model_list_75[[model_name]], newdata = ppTestPredictors)
  out <- postResample(pred = preds, obs = testYield)
  data.frame(
    Model = model_name,
    RMSE = unname(out["RMSE"]),
    Rsquared = unname(out["Rsquared"]),
    MAE = unname(out["MAE"])
  )
}) %>%
  bind_rows() %>%
  arrange(RMSE)

kable(test_results_75, digits = 3, caption = "Exercise 7.5 Test Set Performance")

Exercise 7.5 Test Set Performance
Model	RMSE	Rsquared	MAE
SVM_Radial	1.207	0.594	1.005
MARS	1.365	0.480	1.149
SVM_Polynomial	1.569	0.328	1.320
KNN	1.593	0.290	1.273

Best nonlinear model and variable importance

best_model_name <- test_results_75$Model[1]
best_model <- model_list_75[[best_model_name]]

best_model_name

## [1] "SVM_Radial"

best_model

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 124 samples
##  35 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (10 reps) 
## Summary of sample sizes: 124, 124, 124, 124, 124, 124, ... 
## Resampling results across tuning parameters:
## 
##   C      RMSE      Rsquared   MAE     
##    0.25  1.535773  0.3933996  1.221425
##    0.50  1.452947  0.4179408  1.145201
##    1.00  1.413221  0.4302557  1.116163
##    2.00  1.406295  0.4352690  1.108370
##    4.00  1.396352  0.4455407  1.100935
##    8.00  1.380521  0.4557876  1.089338
##   16.00  1.377508  0.4562445  1.087872
##   32.00  1.377508  0.4562445  1.087872
## 
## Tuning parameter 'sigma' was held constant at a value of 0.02058139
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.02058139 and C = 16.

best_imp <- varImp(best_model)$importance %>%
  tibble::rownames_to_column("Predictor") %>%
  arrange(desc(Overall))

top10_best_imp <- head(best_imp, 10)
kable(top10_best_imp, digits = 3, caption = paste("Top 10 Predictors for", best_model_name))

Top 10 Predictors for SVM_Radial
Predictor	Overall
ManufacturingProcess17	100.000
BiologicalMaterial03	91.296
ManufacturingProcess36	90.292
BiologicalMaterial11	75.886
ManufacturingProcess06	73.847
ManufacturingProcess33	64.054
ManufacturingProcess02	61.388
ManufacturingProcess04	48.458
BiologicalMaterial09	47.686
ManufacturingProcess12	41.442

top10_best_imp <- top10_best_imp %>%
  mutate(
    Predictor_Type = case_when(
      grepl("ManufacturingProcess", Predictor) ~ "Process variable",
      grepl("BiologicalMaterial", Predictor) ~ "Biological variable",
      TRUE ~ "Other"
    )
  )

kable(top10_best_imp, digits = 3, caption = "Top 10 Predictors Classified by Type")

Top 10 Predictors Classified by Type
Predictor	Overall	Predictor_Type
ManufacturingProcess17	100.000	Process variable
BiologicalMaterial03	91.296	Biological variable
ManufacturingProcess36	90.292	Process variable
BiologicalMaterial11	75.886	Biological variable
ManufacturingProcess06	73.847	Process variable
ManufacturingProcess33	64.054	Process variable
ManufacturingProcess02	61.388	Process variable
ManufacturingProcess04	48.458	Process variable
BiologicalMaterial09	47.686	Biological variable
ManufacturingProcess12	41.442	Process variable

Predictor-response plots for top predictors

top_predictors <- head(top10_best_imp$Predictor, 4)

plot_data <- ppTrainPredictors %>%
  select(all_of(top_predictors)) %>%
  mutate(Yield = trainYield) %>%
  tidyr::pivot_longer(
    cols = all_of(top_predictors),
    names_to = "Predictor",
    values_to = "Value"
  )

ggplot(plot_data, aes(x = Value, y = Yield)) +
  geom_point(alpha = 0.55) +
  geom_smooth(method = "loess", se = FALSE) +
  facet_wrap(~ Predictor, scales = "free_x") +
  labs(
    title = "Relationships Between Top Predictors and Yield",
    x = "Preprocessed predictor value",
    y = "Yield"
  ) +
  theme_minimal()


Exercise 7.5 answer

The best nonlinear model is the radial SVM. It has the lowest test set RMSE (1.207, R-squared 0.594), ahead of MARS (1.365), the polynomial SVM (1.569), and KNN (1.593). It also has the best resampling performance, with a mean bootstrap RMSE of 1.378 compared with 1.540 for MARS, 1.551 for KNN, and 1.694 for the polynomial SVM. This agreement matters: if a model had the lowest test RMSE but unstable resampling results, I would be cautious about calling it clearly superior.

The variable importance table ranks the predictors for the best model. Because caret has no model-specific importance measure for svmRadial, varImp falls back to a model-free filter score computed one predictor at a time, so the ranking reflects each predictor's individual relationship with yield rather than the internals of the fitted SVM. Predictors beginning with ManufacturingProcess are process variables, and predictors beginning with BiologicalMaterial are biological variables. In my run, 7 of the top 10 are process variables and 3 are biological, so process variables dominate, although BiologicalMaterial03 and BiologicalMaterial11 rank second and fourth.
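
The process-versus-biological question can be settled by counting name prefixes among the top 10. The sketch below hard-codes the ten predictor names from the importance table above so that it runs on its own. In the actual workflow the names would come from top10_best_imp$Predictor.

```r
# Top 10 predictor names copied from the importance table.
top10 <- c(
  "ManufacturingProcess17", "BiologicalMaterial03", "ManufacturingProcess36",
  "BiologicalMaterial11", "ManufacturingProcess06", "ManufacturingProcess33",
  "ManufacturingProcess02", "ManufacturingProcess04", "BiologicalMaterial09",
  "ManufacturingProcess12"
)

# Classify each predictor by its name prefix.
predictor_type <- ifelse(
  grepl("^ManufacturingProcess", top10), "Process",
  ifelse(grepl("^BiologicalMaterial", top10), "Biological", "Other")
)

type_counts <- table(predictor_type)
type_counts
```

For the ranking listed above, this gives 7 process variables and 3 biological variables.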

The predictor-response plots help show whether the nonlinear model is finding curved or threshold-like relationships. If the smooth lines are mostly straight, that suggests a roughly linear relationship between those predictors and yield. If the smooth lines bend or flatten, that suggests the nonlinear model is capturing patterns that a simple linear model may miss.
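
As a rough numeric complement to reading the smooths by eye, one could compare a straight-line fit with a loess fit and see how much residual variation the smooth removes. The sketch below uses simulated data (not the chemical manufacturing data) and a hypothetical helper, fit_gain. The coefficients, noise level, and seed are arbitrary choices for illustration.

```r
set.seed(7)
x <- runif(150, -2, 2)
y_linear <- 40 + 0.8 * x + rnorm(150, sd = 0.5)    # straight-line relationship
y_curved <- 40 + 0.8 * x^2 + rnorm(150, sd = 0.5)  # curved relationship

# Share of the linear model's residual sum of squares removed by a loess smooth.
fit_gain <- function(x, y) {
  lin <- lm(y ~ x)
  smooth <- loess(y ~ x)
  rss_lin <- sum(residuals(lin)^2)
  rss_smooth <- sum(residuals(smooth)^2)
  1 - rss_smooth / rss_lin
}

c(linear = fit_gain(x, y_linear), curved = fit_gain(x, y_curved))
```

A gain near zero suggests the straight line is adequate, while a large gain points to curvature. In the real analysis, x would be a column of ppTrainPredictors and y would be trainYield.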

Overall, I would choose the radial SVM, because it has the strongest combination of low test RMSE and strong resampling performance, along with a variable importance ranking that is easy to interpret. If the nonlinear models do not clearly outperform the earlier linear model from Exercise 6.3, then the chemical manufacturing data may be mostly explained by an approximately linear structure after preprocessing.

Session Information

sessionInfo()

## R version 4.5.3 (2026-03-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] knitr_1.51                      dplyr_1.2.1                    
##  [3] mlbench_2.1-8                   kernlab_0.9-33                 
##  [5] earth_5.3.5                     plotmo_3.7.0                   
##  [7] plotrix_3.8-14                  Formula_1.2-5                  
##  [9] caret_7.0-1                     lattice_0.22-9                 
## [11] ggplot2_4.0.3                   AppliedPredictiveModeling_1.1-7
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1     timeDate_4052.112    farver_2.1.2        
##  [4] S7_0.2.2             fastmap_1.2.0        pROC_1.19.0.1       
##  [7] digest_0.6.39        rpart_4.1.24         timechange_0.4.0    
## [10] lifecycle_1.0.5      cluster_2.1.8.2      survival_3.8-6      
## [13] magrittr_2.0.5       compiler_4.5.3       rlang_1.2.0         
## [16] sass_0.4.10          tools_4.5.3          yaml_2.3.12         
## [19] data.table_1.18.2.1  labeling_0.4.3       plyr_1.8.9          
## [22] RColorBrewer_1.1-3   withr_3.0.2          purrr_1.2.2         
## [25] nnet_7.3-20          grid_4.5.3           stats4_4.5.3        
## [28] future_1.70.0        globals_0.19.1       scales_1.4.0        
## [31] iterators_1.0.14     MASS_7.3-65          cli_3.6.6           
## [34] ellipse_0.5.0        rmarkdown_2.31       generics_0.1.4      
## [37] otel_0.2.0           rstudioapi_0.18.0    future.apply_1.20.2 
## [40] reshape2_1.4.5       cachem_1.1.0         stringr_1.6.0       
## [43] splines_4.5.3        parallel_4.5.3       vctrs_0.7.3         
## [46] hardhat_1.4.3        Matrix_1.7-4         jsonlite_2.0.0      
## [49] listenv_0.10.1       foreach_1.5.2        tidyr_1.3.2         
## [52] gower_1.0.2          jquerylib_0.1.4      recipes_1.3.2       
## [55] glue_1.8.1           parallelly_1.47.0    codetools_0.2-20    
## [58] lubridate_1.9.5      stringi_1.8.7        gtable_0.3.6        
## [61] rpart.plot_3.1.4     tibble_3.3.1         CORElearn_1.57.3.1  
## [64] pillar_1.11.1        htmltools_0.5.9      ipred_0.9-15        
## [67] lava_1.9.0           R6_2.6.1             evaluate_1.0.5      
## [70] bslib_0.10.0         class_7.3-23         Rcpp_1.1.1-1        
## [73] nlme_3.1-168         prodlim_2026.03.11   mgcv_1.9-4          
## [76] xfun_0.57            pkgconfig_2.0.3      ModelMetrics_1.2.2.2

Kuhn and Johnson - Chapter 7 Exercises 7.2 and 7.5

Sachi Kapoor

April 26, 2026