Main Model (Boruta Features + SMOTE): CV Performance Distribution
This report details the steps and results of a machine learning pipeline developed for cancer data analysis. The pipeline covers data preprocessing, two feature selection techniques (Near-Zero Variance filtering and Boruta), hyperparameter tuning of a Random Forest model, class imbalance handling with SMOTE, and evaluation using 10-fold cross-validation and a hold-out test set. The impact of key steps, such as the choice of feature selection method and the use of SMOTE, is also visualized.
The following global configurations were used for the analysis:
file_path <- "Norm_avg_plasma_data - Sheet2.csv" # IMPORTANT: Ensure this file is in your R working directory
positive_class_name <- "Cancer"
negative_class_name <- "Normal"
The dataset is loaded from Norm_avg_plasma_data - Sheet2.csv. It’s assumed to have samples as rows, a header row, with the first column for sample identifiers, the second for class labels, and subsequent columns for feature values.
## Dimensions of raw data: 75 rows, 2813 columns.
## Column names from CSV header:
| Unique.Id | Sample.type | X0 | X0.1 | X0.117 | X0.133333 | X0.15 | X0.166667 | X0.183 | X0.2 | X0.216667 | X0.233333 | X0.25 | X0.266667 | X0.283333 | X0.3 | X0.317 | X0.333333 | X0.35 | X0.366667 | X0.383 | X0.4 | X0.416667 | X0.433333 | X0.45 | X0.466667 | X0.483333 | X0.5 | X0.517 | X0.533333 | X0.55 | X0.566667 | X0.583 | X0.6 | X0.616667 | X0.633333 | X0.65 | X0.666667 | X0.683333 | X0.7 | X0.717 | X0.733333 | X0.75 | X0.766667 | X0.783 | X0.8 | X0.816667 | X0.833333 | X0.85 | X0.867 | X0.883333 | X0.9 | X0.916667 | X0.933333 | X0.95 | X0.966667 | X0.983 | X1 | X1.016667 | X1.033333 | X1.05 | X1.066667 | X1.083333 | X1.1 | X1.116667 | X1.133333 | X1.15 | X1.166667 | X1.183333 | X1.2 | X1.216667 | X1.233333 | X1.25 | X1.266667 | X1.283333 | X1.3 | X1.316667 | X1.333333 | X1.35 | X1.366667 | X1.383333 | X1.4 | X1.416667 | X1.433333 | X1.45 | X1.466667 | X1.483333 | X1.5 | X1.516667 | X1.533333 | X1.55 | X1.566667 | X1.583333 | X1.6 | X1.616667 | X1.633333 | X1.65 | X1.666667 | X1.683333 | X1.7 | X1.716667 | X1.733333 | X1.75 | X1.766667 | X1.783333 | X1.8 | X1.816667 | X1.833333 | X1.85 | X1.866667 | X1.883333 | X1.9 | X1.916667 | X1.933333 | X1.95 | X1.966667 | X1.983333 | X2 | X2.016667 | X2.033333 | X2.05 | X2.066667 | X2.083333 | X2.1 | X2.116667 | X2.133333 | X2.15 | X2.166667 | X2.183333 | X2.2 | X2.216667 | X2.233333 | X2.25 | X2.266667 | X2.283333 | X2.3 | X2.316667 | X2.333333 | X2.35 | X2.366667 | X2.383333 | X2.4 | X2.416667 | X2.433333 | X2.45 | X2.466667 | X2.483333 | X2.5 | X2.516667 | X2.533333 | X2.55 | X2.566667 | X2.583333 | X2.6 | X2.616667 | X2.633333 | X2.65 | X2.666667 | X2.683333 | X2.7 | X2.716667 | X2.733333 | X2.75 | X2.766667 | X2.783333 | X2.8 | X2.816667 | X2.833333 | X2.85 | X2.866667 | X2.883333 | X2.9 | X2.916667 | X2.933333 | X2.95 | X2.966667 | X2.983333 | X3 | X3.016667 | X3.033333 | X3.05 | X3.066667 | X3.083333 | X3.1 | X3.116667 | X3.133333 | X3.15 | X3.166667 | X3.183333 | X3.2 | X3.216667 | X3.233333 | X3.25 | X3.266667 
| X3.283333 | X3.3 | X3.316667 | X3.333333 | X3.35 | X3.366667 | X3.383333 | X3.4 | X3.416667 | X3.433333 | X3.45 | X3.466667 | X3.483333 | X3.5 | X3.516667 | X3.533333 | X3.55 | X3.566667 | X3.583333 | X3.6 | X3.616667 | X3.633333 | X3.65 | X3.666667 | X3.683333 | X3.7 | X3.716667 | X3.733333 | X3.75 | X3.766667 | X3.783333 | X3.8 | X3.816667 | X3.833333 | X3.85 | X3.866667 | X3.883333 | X3.9 | X3.916667 | X3.933333 | X3.95 | X3.966667 | X3.983333 | X4 | X4.016667 | X4.033333 | X4.05 | X4.066667 | X4.083333 | X4.1 | X4.116667 | X4.133333 | X4.15 | X4.166667 | X4.183333 | X4.2 | X4.216667 | X4.233333 | X4.25 | X4.266667 | X4.283333 | X4.3 | X4.316667 | X4.333333 | X4.35 | X4.366667 | X4.383333 | X4.4 | X4.416667 | X4.433333 | X4.45 | X4.466667 | X4.483333 | X4.5 | X4.516667 | X4.533333 | X4.55 | X4.566667 | X4.583333 | X4.6 | X4.616667 | X4.633333 | X4.65 | X4.666667 | X4.683333 | X4.7 | X4.716667 | X4.733333 | X4.75 | X4.766667 | X4.783333 | X4.8 | X4.816667 | X4.833333 | X4.85 | X4.866667 | X4.883333 | X4.9 | X4.916667 | X4.933333 | X4.95 | X4.966667 | X4.983333 | X5 | X5.016667 | X5.033333 | X5.05 | X5.066667 | X5.083333 | X5.1 | X5.116667 | X5.133333 | X5.15 | X5.166667 | X5.183333 | X5.2 | X5.216667 | X5.233333 | X5.25 | X5.266667 | X5.283333 | X5.3 | X5.316667 | X5.333333 | X5.35 | X5.366667 | X5.383333 | X5.4 | X5.416667 | X5.433333 | X5.45 | X5.466667 | X5.483333 | X5.5 | X5.516667 | X5.533333 | X5.55 | X5.566667 | X5.583333 | X5.6 | X5.616667 | X5.633333 | X5.65 | X5.666667 | X5.683333 | X5.7 | X5.716667 | X5.733333 | X5.75 | X5.766667 | X5.783333 | X5.8 | X5.816667 | X5.833333 | X5.85 | X5.866667 | X5.883333 | X5.9 | X5.916667 | X5.933333 | X5.95 | X5.966667 | X5.983333 | X6 | X6.016667 | X6.033333 | X6.05 | X6.066667 | X6.083333 | X6.1 | X6.116667 | X6.133333 | X6.15 | X6.166667 | X6.183333 | X6.2 | X6.216667 | X6.233333 | X6.25 | X6.266667 | X6.283333 | X6.3 | X6.316667 | X6.333333 | X6.35 | X6.366667 | X6.383333 | X6.4 | X6.416667 | X6.433333 | X6.45 | 
X6.466667 | X6.483333 | X6.5 | X6.516667 | X6.533333 | X6.55 | X6.566667 | X6.583333 | X6.6 | X6.616667 | X6.633333 | X6.65 | X6.666667 | X6.683333 | X6.7 | X6.716667 | X6.733333 | X6.75 | X6.766667 | X6.783333 | X6.8 | X6.816667 | X6.833333 | X6.85 | X6.866667 | X6.883333 | X6.9 | X6.916667 | X6.933333 | X6.95 | X6.966667 | X6.983333 | X7 | X7.016667 | X7.033333 | X7.05 | X7.066667 | X7.083333 | X7.1 | X7.116667 | X7.133333 | X7.15 | X7.166667 | X7.183333 | X7.2 | X7.216667 | X7.233333 | X7.25 | X7.266667 | X7.283333 | X7.3 | X7.316667 | X7.333333 | X7.35 | X7.366667 | X7.383333 | X7.4 | X7.416667 | X7.433333 | X7.45 | X7.466667 | X7.483333 | X7.5 | X7.516667 | X7.533333 | X7.55 | X7.566667 | X7.583333 | X7.6 | X7.616667 | X7.633333 | X7.65 | X7.666667 | X7.683333 | X7.7 | X7.716667 | X7.733333 | X7.75 | X7.766667 | X7.783333 | X7.8 | X7.816667 | X7.833333 | X7.85 | X7.866667 | X7.883333 | X7.9 | X7.916667 | X7.933333 | X7.95 | X7.966667 | X7.983333 | X8 | X8.016667 | X8.033333 | X8.05 | X8.066667 | X8.083333 | X8.1 | X8.116667 | X8.133333 | X8.15 | X8.166667 | X8.183333 | X8.2 | X8.216667 | X8.233333 | X8.25 | X8.266667 | X8.283333 | X8.3 | X8.316667 | X8.333333 | X8.35 | X8.366667 | X8.383333 | X8.4 | X8.416667 | X8.433333 | X8.45 | X8.466667 | X8.483333 | X8.5 | X8.516667 | X8.533333 | X8.55 | X8.566667 | X8.583333 | X8.6 | X8.616667 | X8.633333 | X8.65 | X8.666667 | X8.683333 | X8.7 | X8.716667 | X8.733333 | X8.75 | X8.766667 | X8.783333 | X8.8 | X8.816667 | X8.833333 | X8.85 | X8.866667 | X8.883333 | X8.9 | X8.916667 | X8.933333 | X8.95 | X8.966667 | X8.983333 | X9 | X9.016667 | X9.033333 | X9.05 | X9.066667 | X9.083333 | X9.1 | X9.116667 | X9.133333 | X9.15 | X9.166667 | X9.183333 | X9.2 | X9.216667 | X9.233333 | X9.25 | X9.266667 | X9.283333 | X9.3 | X9.316667 | X9.333333 | X9.35 | X9.366667 | X9.383333 | X9.4 | X9.416667 | X9.433333 | X9.45 | X9.466667 | X9.483333 | X9.5 | X9.516667 | X9.533333 | X9.55 | X9.566667 | X9.583333 | X9.6 | X9.616667 | X9.633333 
| X9.65 | X9.666667 | X9.683333 | X9.7 | X9.716667 | X9.733333 | X9.75 | X9.766667 | X9.783333 | X9.8 | X9.816667 | X9.833333 | X9.85 | X9.866667 | X9.883333 | X9.9 | X9.916667 | X9.933333 | X9.95 | X9.966667 | X9.983333 | X10 | X10.01667 | X10.03333 | X10.05 | X10.06667 | X10.08333 | X10.1 | X10.11667 | X10.13333 | X10.15 | X10.16667 | X10.18333 | X10.2 | X10.21667 | X10.23333 | X10.25 | X10.26667 | X10.28333 | X10.3 | X10.31667 | X10.33333 | X10.35 | X10.36667 | X10.38333 | X10.4 | X10.41667 | X10.43333 | X10.45 | X10.46667 | X10.48333 | X10.5 | X10.51667 | X10.53333 | X10.55 | X10.56667 | X10.58333 | X10.6 | X10.61667 | X10.63333 | X10.65 | X10.66667 | X10.68333 | X10.7 | X10.71667 | X10.73333 | X10.75 | X10.76667 | X10.78333 | X10.8 | X10.81667 | X10.83333 | X10.85 | X10.86667 | X10.88333 | X10.9 | X10.91667 | X10.93333 | X10.95 | X10.96667 | X10.98333 | X11 | X11.01667 | X11.03333 | X11.05 | X11.06667 | X11.08333 | X11.1 | X11.11667 | X11.13333 | X11.15 | X11.16667 | X11.18333 | X11.2 | X11.21667 | X11.23333 | X11.25 | X11.26667 | X11.28333 | X11.3 | X11.31667 | X11.33333 | X11.35 | X11.36667 | X11.38333 | X11.4 | X11.41667 | X11.43333 | X11.45 | X11.46667 | X11.48333 | X11.5 | X11.51667 | X11.53333 | X11.55 | X11.56667 | X11.58333 | X11.6 | X11.61667 | X11.63333 | X11.65 | X11.66667 | X11.68333 | X11.7 | X11.71667 | X11.73333 | X11.75 | X11.76667 | X11.78333 | X11.8 | X11.81667 | X11.83333 | X11.85 | X11.86667 | X11.88333 | X11.9 | X11.91667 | X11.93333 | X11.95 | X11.96667 | X11.98333 | X12 | X12.01667 | X12.03333 | X12.05 | X12.06667 | X12.08333 | X12.1 | X12.11667 | X12.13333 | X12.15 | X12.16667 | X12.18333 | X12.2 | X12.21667 | X12.23333 | X12.25 | X12.26667 | X12.28333 | X12.3 | X12.31667 | X12.33333 | X12.35 | X12.36667 | X12.38333 | X12.4 | X12.41667 | X12.43333 | X12.45 | X12.46667 | X12.48333 | X12.5 | X12.51667 | X12.53333 | X12.55 | X12.56667 | X12.58333 | X12.6 | X12.61667 | X12.63333 | X12.65 | X12.66667 | X12.68333 | X12.7 | X12.71667 | 
X12.73333 | X12.75 | X12.76667 | X12.78333 | X12.8 | X12.81667 | X12.83333 | X12.85 | X12.86667 | X12.88333 | X12.9 | X12.91667 | X12.93333 | X12.95 | X12.96667 | X12.98333 | X13 | X13.01667 | X13.03333 | X13.05 | X13.06667 | X13.08333 | X13.1 | X13.11667 | X13.13333 | X13.15 | X13.16667 | X13.18333 | X13.2 | X13.21667 | X13.23333 | X13.25 | X13.26667 | X13.28333 | X13.3 | X13.31667 | X13.33333 | X13.35 | X13.36667 | X13.38333 | X13.4 | X13.41667 | X13.43333 | X13.45 | X13.46667 | X13.48333 | X13.5 | X13.51667 | X13.53333 | X13.55 | X13.56667 | X13.58333 | X13.6 | X13.61667 | X13.63333 | X13.65 | X13.66667 | X13.68333 | X13.7 | X13.71667 | X13.73333 | X13.75 | X13.76667 | X13.78333 | X13.8 | X13.81667 | X13.83333 | X13.85 | X13.86667 | X13.88333 | X13.9 | X13.91667 | X13.93333 | X13.95 | X13.96667 | X13.98333 | X14 | X14.01667 | X14.03333 | X14.05 | X14.06667 | X14.08333 | X14.1 | X14.11667 | X14.13333 | X14.15 | X14.16667 | X14.18333 | X14.2 | X14.21667 | X14.23333 | X14.25 | X14.26667 | X14.28333 | X14.3 | X14.31667 | X14.33333 | X14.35 | X14.36667 | X14.38333 | X14.4 | X14.41667 | X14.43333 | X14.45 | X14.46667 | X14.48333 | X14.5 | X14.51667 | X14.53333 | X14.55 | X14.56667 | X14.58333 | X14.6 | X14.61667 | X14.63333 | X14.65 | X14.66667 | X14.68333 | X14.7 | X14.71667 | X14.73333 | X14.75 | X14.76667 | X14.78333 | X14.8 | X14.81667 | X14.83333 | X14.85 | X14.86667 | X14.88333 | X14.9 | X14.91667 | X14.93333 | X14.95 | X14.96667 | X14.98333 | X15 | X15.01667 | X15.03333 | X15.05 | X15.06667 | X15.08333 | X15.1 | X15.11667 | X15.13333 | X15.15 | X15.16667 | X15.18333 | X15.2 | X15.21667 | X15.23333 | X15.25 | X15.26667 | X15.28333 | X15.3 | X15.31667 | X15.33333 | X15.35 | X15.36667 | X15.38333 | X15.4 | X15.41667 | X15.43333 | X15.45 | X15.46667 | X15.48333 | X15.5 | X15.51667 | X15.53333 | X15.55 | X15.56667 | X15.58333 | X15.6 | X15.61667 | X15.63333 | X15.65 | X15.66667 | X15.68333 | X15.7 | X15.71667 | X15.73333 | X15.75 | X15.76667 | X15.78333 | X15.8 | 
X15.81667 | X15.83333 | X15.85 | X15.86667 | X15.88333 | X15.9 | X15.91667 | X15.93333 | X15.95 | X15.96667 | X15.98333 | X16 | X16.01667 | X16.03333 | X16.05 | X16.06667 | X16.08333 | X16.1 | X16.11667 | X16.13333 | X16.15 | X16.16667 | X16.18333 | X16.2 | X16.21667 | X16.23333 | X16.25 | X16.26667 | X16.28333 | X16.3 | X16.31667 | X16.33333 | X16.35 | X16.36667 | X16.38333 | X16.4 | X16.41667 | X16.43333 | X16.45 | X16.46667 | X16.48333 | X16.5 | X16.51667 | X16.53333 | X16.55 | X16.56667 | X16.58333 | X16.6 | X16.61667 | X16.63333 | X16.65 | X16.66667 | X16.68333 | X16.7 | X16.71667 | X16.73333 | X16.75 | X16.76667 | X16.78333 | X16.8 | X16.81667 | X16.83333 | X16.85 | X16.86667 | X16.88333 | X16.9 | X16.91667 | X16.93333 | X16.95 | X16.96667 | X16.98333 | X17 | X17.01667 | X17.03333 | X17.05 | X17.06667 | X17.08333 | X17.1 | X17.11667 | X17.13333 | X17.15 | X17.16667 | X17.18333 | X17.2 | X17.21667 | X17.23333 | X17.25 | X17.26667 | X17.28333 | X17.3 | X17.31667 | X17.33333 | X17.35 | X17.36667 | X17.38333 | X17.4 | X17.41667 | X17.43333 | X17.45 | X17.46667 | X17.48333 | X17.5 | X17.51667 | X17.53333 | X17.55 | X17.56667 | X17.58333 | X17.6 | X17.61667 | X17.63333 | X17.65 | X17.66667 | X17.68333 | X17.7 | X17.71667 | X17.73333 | X17.75 | X17.76667 | X17.78333 | X17.8 | X17.81667 | X17.83333 | X17.85 | X17.86667 | X17.88333 | X17.9 | X17.91667 | X17.93333 | X17.95 | X17.96667 | X17.98333 | X18 | X18.01667 | X18.03333 | X18.05 | X18.06667 | X18.08333 | X18.1 | X18.11667 | X18.13333 | X18.15 | X18.16667 | X18.18333 | X18.2 | X18.21667 | X18.23333 | X18.25 | X18.26667 | X18.28333 | X18.3 | X18.31667 | X18.33333 | X18.35 | X18.36667 | X18.38333 | X18.4 | X18.41667 | X18.43333 | X18.45 | X18.46667 | X18.48333 | X18.5 | X18.51667 | X18.53333 | X18.55 | X18.56667 | X18.58333 | X18.6 | X18.61667 | X18.63333 | X18.65 | X18.66667 | X18.68333 | X18.7 | X18.71667 | X18.73333 | X18.75 | X18.76667 | X18.78333 | X18.8 | X18.81667 | X18.83333 | X18.85 | X18.86667 | X18.88333 
| X18.9 | X18.91667 | X18.93333 | X18.95 | X18.96667 | X18.98333 | X19 | X19.01667 | X19.03333 | X19.05 | X19.06667 | X19.08333 | X19.1 | X19.11667 | X19.13333 | X19.15 | X19.16667 | X19.18333 | X19.2 | X19.21667 | X19.23333 | X19.25 | X19.26667 | X19.28333 | X19.3 | X19.31667 | X19.33333 | X19.35 | X19.36667 | X19.38333 | X19.4 | X19.41667 | X19.43333 | X19.45 | X19.46667 | X19.48333 | X19.5 | X19.51667 | X19.53333 | X19.55 | X19.56667 | X19.58333 | X19.6 | X19.61667 | X19.63333 | X19.65 | X19.66667 | X19.68333 | X19.7 | X19.71667 | X19.73333 | X19.75 | X19.76667 | X19.78333 | X19.8 | X19.81667 | X19.83333 | X19.85 | X19.86667 | X19.88333 | X19.9 | X19.91667 | X19.93333 | X19.95 | X19.96667 | X19.98333 | X20 | X20.01667 | X20.03333 | X20.05 | X20.06667 | X20.08333 | X20.1 | X20.11667 | X20.13333 | X20.15 | X20.16667 | X20.18333 | X20.2 | X20.21667 | X20.23333 | X20.25 | X20.26667 | X20.28333 | X20.3 | X20.31667 | X20.33333 | X20.35 | X20.36667 | X20.38333 | X20.4 | X20.41667 | X20.43333 | X20.45 | X20.46667 | X20.48333 | X20.5 | X20.51667 | X20.53333 | X20.55 | X20.56667 | X20.58333 | X20.6 | X20.61667 | X20.63333 | X20.65 | X20.66667 | X20.68333 | X20.7 | X20.71667 | X20.73333 | X20.75 | X20.76667 | X20.78333 | X20.8 | X20.81667 | X20.83333 | X20.85 | X20.86667 | X20.88333 | X20.9 | X20.91667 | X20.93333 | X20.95 | X20.96667 | X20.98333 | X21 | X21.01667 | X21.03333 | X21.05 | X21.06667 | X21.08333 | X21.1 | X21.11667 | X21.13333 | X21.15 | X21.16667 | X21.18333 | X21.2 | X21.21667 | X21.23333 | X21.25 | X21.26667 | X21.28333 | X21.3 | X21.31667 | X21.33333 | X21.35 | X21.36667 | X21.38333 | X21.4 | X21.41667 | X21.43333 | X21.45 | X21.46667 | X21.48333 | X21.5 | X21.51667 | X21.53333 | X21.55 | X21.56667 | X21.58333 | X21.6 | X21.61667 | X21.63333 | X21.65 | X21.66667 | X21.68333 | X21.7 | X21.71667 | X21.73333 | X21.75 | X21.76667 | X21.78333 | X21.8 | X21.81667 | X21.83333 | X21.85 | X21.86667 | X21.88333 | X21.9 | X21.91667 | X21.93333 | X21.95 | X21.96667 | 
X21.98333 | X22 | X22.01667 | X22.03333 | X22.05 | X22.06667 | X22.08333 | X22.1 | X22.11667 | X22.13333 | X22.15 | X22.16667 | X22.18333 | X22.2 | X22.21667 | X22.23333 | X22.25 | X22.26667 | X22.28333 | X22.3 | X22.31667 | X22.33333 | X22.35 | X22.36667 | X22.38333 | X22.4 | X22.41667 | X22.43333 | X22.45 | X22.46667 | X22.48333 | X22.5 | X22.51667 | X22.53333 | X22.55 | X22.56667 | X22.58333 | X22.6 | X22.61667 | X22.63333 | X22.65 | X22.66667 | X22.68333 | X22.7 | X22.71667 | X22.73333 | X22.75 | X22.76667 | X22.78333 | X22.8 | X22.81667 | X22.83333 | X22.85 | X22.86667 | X22.88333 | X22.9 | X22.91667 | X22.93333 | X22.95 | X22.96667 | X22.98333 | X23 | X23.01667 | X23.03333 | X23.05 | X23.06667 | X23.08333 | X23.1 | X23.11667 | X23.13333 | X23.15 | X23.16667 | X23.18333 | X23.2 | X23.21667 | X23.23333 | X23.25 | X23.26667 | X23.28333 | X23.3 | X23.31667 | X23.33333 | X23.35 | X23.36667 | X23.38333 | X23.4 | X23.41667 | X23.43333 | X23.45 | X23.46667 | X23.48333 | X23.5 | X23.51667 | X23.53333 | X23.55 | X23.56667 | X23.58333 | X23.6 | X23.61667 | X23.63333 | X23.65 | X23.66667 | X23.68333 | X23.7 | X23.71667 | X23.73333 | X23.75 | X23.76667 | X23.78333 | X23.8 | X23.81667 | X23.83333 | X23.85 | X23.86667 | X23.88333 | X23.9 | X23.91667 | X23.93333 | X23.95 | X23.96667 | X23.98333 | X24 | X24.01667 | X24.03333 | X24.05 | X24.06667 | X24.08333 | X24.1 | X24.11667 | X24.13333 | X24.15 | X24.16667 | X24.18333 | X24.2 | X24.21667 | X24.23333 | X24.25 | X24.26667 | X24.28333 | X24.3 | X24.31667 | X24.33333 | X24.35 | X24.36667 | X24.38333 | X24.4 | X24.41667 | X24.43333 | X24.45 | X24.46667 | X24.48333 | X24.5 | X24.51667 | X24.53333 | X24.55 | X24.56667 | X24.58333 | X24.6 | X24.61667 | X24.63333 | X24.65 | X24.66667 | X24.68333 | X24.7 | X24.71667 | X24.73333 | X24.75 | X24.76667 | X24.78333 | X24.8 | X24.81667 | X24.83333 | X24.85 | X24.86667 | X24.88333 | X24.9 | X24.91667 | X24.93333 | X24.95 | X24.96667 | X24.98333 | X25 | X25.01667 | X25.03333 | X25.05 | 
X25.06667 | X25.08333 | X25.1 | X25.11667 | X25.13333 | X25.15 | X25.16667 | X25.18333 | X25.2 | X25.21667 | X25.23333 | X25.25 | X25.26667 | X25.28333 | X25.3 | X25.31667 | X25.33333 | X25.35 | X25.36667 | X25.38333 | X25.4 | X25.41667 | X25.43333 | X25.45 | X25.46667 | X25.48333 | X25.5 | X25.51667 | X25.53333 | X25.55 | X25.56667 | X25.58333 | X25.6 | X25.61667 | X25.63333 | X25.65 | X25.66667 | X25.68333 | X25.7 | X25.71667 | X25.73333 | X25.75 | X25.76667 | X25.78333 | X25.8 | X25.81667 | X25.83333 | X25.85 | X25.86667 | X25.88333 | X25.9 | X25.91667 | X25.93333 | X25.95 | X25.96667 | X25.98333 | X26 | X26.01667 | X26.03333 | X26.05 | X26.06667 | X26.08333 | X26.1 | X26.11667 | X26.13333 | X26.15 | X26.16667 | X26.18333 | X26.2 | X26.21667 | X26.23333 | X26.25 | X26.26667 | X26.28333 | X26.3 | X26.31667 | X26.33333 | X26.35 | X26.36667 | X26.38333 | X26.4 | X26.41667 | X26.43333 | X26.45 | X26.46667 | X26.48333 | X26.5 | X26.51667 | X26.53333 | X26.55 | X26.56667 | X26.58333 | X26.6 | X26.61667 | X26.63333 | X26.65 | X26.66667 | X26.68333 | X26.7 | X26.71667 | X26.73333 | X26.75 | X26.76667 | X26.78333 | X26.8 | X26.81667 | X26.83333 | X26.85 | X26.86667 | X26.88333 | X26.9 | X26.91667 | X26.93333 | X26.95 | X26.96667 | X26.98333 | X27 | X27.01667 | X27.03333 | X27.05 | X27.06667 | X27.08333 | X27.1 | X27.11667 | X27.13333 | X27.15 | X27.16667 | X27.18333 | X27.2 | X27.21667 | X27.23333 | X27.25 | X27.26667 | X27.28333 | X27.3 | X27.31667 | X27.33333 | X27.35 | X27.36667 | X27.38333 | X27.4 | X27.41667 | X27.43333 | X27.45 | X27.46667 | X27.48333 | X27.5 | X27.51667 | X27.53333 | X27.55 | X27.56667 | X27.58333 | X27.6 | X27.61667 | X27.63333 | X27.65 | X27.66667 | X27.68333 | X27.7 | X27.71667 | X27.73333 | X27.75 | X27.76667 | X27.78333 | X27.8 | X27.81667 | X27.83333 | X27.85 | X27.86667 | X27.88333 | X27.9 | X27.91667 | X27.93333 | X27.95 | X27.96667 | X27.98333 | X28 | X28.01667 | X28.03333 | X28.05 | X28.06667 | X28.08333 | X28.1 | X28.11667 | X28.13333 | 
X28.15 | X28.16667 | X28.18333 | X28.2 | X28.21667 | X28.23333 | X28.25 | X28.26667 | X28.28333 | X28.3 | X28.31667 | X28.33333 | X28.35 | X28.36667 | X28.38333 | X28.4 | X28.41667 | X28.43333 | X28.45 | X28.46667 | X28.48333 | X28.5 | X28.51667 | X28.53333 | X28.55 | X28.56667 | X28.58333 | X28.6 | X28.61667 | X28.63333 | X28.65 | X28.66667 | X28.68333 | X28.7 | X28.71667 | X28.73333 | X28.75 | X28.76667 | X28.78333 | X28.8 | X28.81667 | X28.83333 | X28.85 | X28.86667 | X28.88333 | X28.9 | X28.91667 | X28.93333 | X28.95 | X28.96667 | X28.98333 | X29 | X29.01667 | X29.03333 | X29.05 | X29.06667 | X29.08333 | X29.1 | X29.11667 | X29.13333 | X29.15 | X29.16667 | X29.18333 | X29.2 | X29.21667 | X29.23333 | X29.25 | X29.26667 | X29.28333 | X29.3 | X29.31667 | X29.33333 | X29.35 | X29.36667 | X29.38333 | X29.4 | X29.41667 | X29.43333 | X29.45 | X29.46667 | X29.48333 | X29.5 | X29.51667 | X29.53333 | X29.55 | X29.56667 | X29.58333 | X29.6 | X29.61667 | X29.63333 | X29.65 | X29.66667 | X29.68333 | X29.7 | X29.71667 | X29.73333 | X29.75 | X29.76667 | X29.78333 | X29.8 | X29.81667 | X29.83333 | X29.85 | X29.86667 | X29.88333 | X29.9 | X29.91667 | X29.93333 | X29.95 | X29.96667 | X29.98333 | X30 | X30.01667 | X30.03333 | X30.05 | X30.06667 | X30.08333 | X30.1 | X30.11667 | X30.13333 | X30.15 | X30.16667 | X30.18333 | X30.2 | X30.21667 | X30.23333 | X30.25 | X30.26667 | X30.28333 | X30.3 | X30.31667 | X30.33333 | X30.35 | X30.36667 | X30.38333 | X30.4 | X30.41667 | X30.43333 | X30.45 | X30.46667 | X30.48333 | X30.5 | X30.51667 | X30.53333 | X30.55 | X30.56667 | X30.58333 | X30.6 | X30.61667 | X30.63333 | X30.65 | X30.66667 | X30.68333 | X30.7 | X30.71667 | X30.73333 | X30.75 | X30.76667 | X30.78333 | X30.8 | X30.81667 | X30.83333 | X30.85 | X30.86667 | X30.88333 | X30.9 | X30.91667 | X30.93333 | X30.95 | X30.96667 | X30.98333 | X31 | X31.01667 | X31.03333 | X31.05 | X31.06667 | X31.08333 | X31.1 | X31.11667 | X31.13333 | X31.15 | X31.16667 | X31.18333 | X31.2 | X31.21667 | 
X31.23333 | X31.25 | X31.26667 | X31.28333 | X31.3 | X31.31667 | X31.33333 | X31.35 | X31.36667 | X31.38333 | X31.4 | X31.41667 | X31.43333 | X31.45 | X31.46667 | X31.48333 | X31.5 | X31.51667 | X31.53333 | X31.55 | X31.56667 | X31.58333 | X31.6 | X31.61667 | X31.63333 | X31.65 | X31.66667 | X31.68333 | X31.7 | X31.71667 | X31.73333 | X31.75 | X31.76667 | X31.78333 | X31.8 | X31.81667 | X31.83333 | X31.85 | X31.86667 | X31.88333 | X31.9 | X31.91667 | X31.93333 | X31.95 | X31.96667 | X31.98333 | X32 | X32.01667 | X32.03333 | X32.05 | X32.06667 | X32.08333 | X32.1 | X32.11667 | X32.13333 | X32.15 | X32.16667 | X32.18333 | X32.2 | X32.21667 | X32.23333 | X32.25 | X32.26667 | X32.28333 | X32.3 | X32.31667 | X32.33333 | X32.35 | X32.36667 | X32.38333 | X32.4 | X32.41667 | X32.43333 | X32.45 | X32.46667 | X32.48333 | X32.5 | X32.51667 | X32.53333 | X32.55 | X32.56667 | X32.58333 | X32.6 | X32.61667 | X32.63333 | X32.65 | X32.66667 | X32.68333 | X32.7 | X32.71667 | X32.73333 | X32.75 | X32.76667 | X32.78333 | X32.8 | X32.81667 | X32.83333 | X32.85 | X32.86667 | X32.88333 | X32.9 | X32.91667 | X32.93333 | X32.95 | X32.96667 | X32.98333 | X33 | X33.01667 | X33.03333 | X33.05 | X33.06667 | X33.08333 | X33.1 | X33.11667 | X33.13333 | X33.15 | X33.16667 | X33.18333 | X33.2 | X33.21667 | X33.23333 | X33.25 | X33.26667 | X33.28333 | X33.3 | X33.31667 | X33.33333 | X33.35 | X33.36667 | X33.38333 | X33.4 | X33.41667 | X33.43333 | X33.45 | X33.46667 | X33.48333 | X33.5 | X33.51667 | X33.53333 | X33.55 | X33.56667 | X33.58333 | X33.6 | X33.61667 | X33.63333 | X33.65 | X33.66667 | X33.68333 | X33.7 | X33.71667 | X33.73333 | X33.75 | X33.76667 | X33.78333 | X33.8 | X33.81667 | X33.83333 | X33.85 | X33.86667 | X33.88333 | X33.9 | X33.91667 | X33.93333 | X33.95 | X33.96667 | X33.98333 | X34 | X34.01667 | X34.03333 | X34.05 | X34.06667 | X34.08333 | X34.1 | X34.11667 | X34.13333 | X34.15 | X34.16667 | X34.18333 | X34.2 | X34.21667 | X34.23333 | X34.25 | X34.26667 | X34.28333 | X34.3 | 
X34.31667 | X34.33333 | X34.35 | X34.36667 | X34.38333 | X34.4 | X34.41667 | X34.43333 | X34.45 | X34.46667 | X34.48333 | X34.5 | X34.51667 | X34.53333 | X34.55 | X34.56667 | X34.58333 | X34.6 | X34.61667 | X34.63333 | X34.65 | X34.66667 | X34.68333 | X34.7 | X34.71667 | X34.73333 | X34.75 | X34.76667 | X34.78333 | X34.8 | X34.81667 | X34.83333 | X34.85 | X34.86667 | X34.88333 | X34.9 | X34.91667 | X34.93333 | X34.95 | X34.96667 | X34.98333 | X35 | X35.01667 | Wavelength_200 | X201 | X202 | X203 | X204 | X205 | X206 | X207 | X208 | X209 | X210 | X211 | X212 | X213 | X214 | X215 | X216 | X217 | X218 | X219 | X220 | X221 | X222 | X223 | X224 | X225 | X226 | X227 | X228 | X229 | X230 | X231 | X232 | X233 | X234 | X235 | X236 | X237 | X238 | X239 | X240 | X241 | X242 | X243 | X244 | X245 | X246 | X247 | X248 | X249 | X250 | X251 | X252 | X253 | X254 | X255 | X256 | X257 | X258 | X259 | X260 | X261 | X262 | X263 | X264 | X265 | X266 | X267 | X268 | X269 | X270 | X271 | X272 | X273 | X274 | X275 | X276 | X277 | X278 | X279 | X280 | X281 | X282 | X283 | X284 | X285 | X286 | X287 | X288 | X289 | X290 | X291 | X292 | X293 | X294 | X295 | X296 | X297 | X298 | X299 | X300 | X301 | X302 | X303 | X304 | X305 | X306 | X307 | X308 | X309 | X310 | X311 | X312 | X313 | X314 | X315 | X316 | X317 | X318 | X319 | X320 | X321 | X322 | X323 | X324 | X325 | X326 | X327 | X328 | X329 | X330 | X331 | X332 | X333 | X334 | X335 | X336 | X337 | X338 | X339 | X340 | X341 | X342 | X343 | X344 | X345 | X346 | X347 | X348 | X349 | X350 | X351 | X352 | X353 | X354 | X355 | X356 | X357 | X358 | X359 | X360 | X361 | X362 | X363 | X364 | X365 | X366 | X367 | X368 | X369 | X370 | X371 | X372 | X373 | X374 | X375 | X376 | X377 | X378 | X379 | X380 | X381 | X382 | X383 | X384 | X385 | X386 | X387 | X388 | X389 | X390 | X391 | X392 | X393 | X394 | X395 | X396 | X397 | X398 | X399 | X400 | X401 | X402 | X403 | X404 | X405 | X406 | X407 | X408 | X409 | X410 | X411 | X412 | X413 | X414 | X415 | X416 | X417 
| X418 | X419 | X420 | X421 | X422 | X423 | X424 | X425 | X426 | X427 | X428 | X429 | X430 | X431 | X432 | X433 | X434 | X435 | X436 | X437 | X438 | X439 | X440 | X441 | X442 | X443 | X444 | X445 | X446 | X447 | X448 | X449 | X450 | X451 | X452 | X453 | X454 | X455 | X456 | X457 | X458 | X459 | X460 | X461 | X462 | X463 | X464 | X465 | X466 | X467 | X468 | X469 | X470 | X471 | X472 | X473 | X474 | X475 | X476 | X477 | X478 | X479 | X480 | X481 | X482 | X483 | X484 | X485 | X486 | X487 | X488 | X489 | X490 | X491 | X492 | X493 | X494 | X495 | X496 | X497 | X498 | X499 | X500 | X501 | X502 | X503 | X504 | X505 | X506 | X507 | X508 | X509 | X510 | X511 | X512 | X513 | X514 | X515 | X516 | X517 | X518 | X519 | X520 | X521 | X522 | X523 | X524 | X525 | X526 | X527 | X528 | X529 | X530 | X531 | X532 | X533 | X534 | X535 | X536 | X537 | X538 | X539 | X540 | X541 | X542 | X543 | X544 | X545 | X546 | X547 | X548 | X549 | X550 | X551 | X552 | X553 | X554 | X555 | X556 | X557 | X558 | X559 | X560 | X561 | X562 | X563 | X564 | X565 | X566 | X567 | X568 | X569 | X570 | X571 | X572 | X573 | X574 | X575 | X576 | X577 | X578 | X579 | X580 | X581 | X582 | X583 | X584 | X585 | X586 | X587 | X588 | X589 | X590 | X591 | X592 | X593 | X594 | X595 | X596 | X597 | X598 | X599 | X600 | X601 | X602 | X603 | X604 | X605 | X606 | X607 | X608 | X609 | X610 | X611 | X612 | X613 | X614 | X615 | X616 | X617 | X618 | X619 | X620 | X621 | X622 | X623 | X624 | X625 | X626 | X627 | X628 | X629 | X630 | X631 | X632 | X633 | X634 | X635 | X636 | X637 | X638 | X639 | X640 | X641 | X642 | X643 | X644 | X645 | X646 | X647 | X648 | X649 | X650 | X651 | X652 | X653 | X654 | X655 | X656 | X657 | X658 | X659 | X660 | X661 | X662 | X663 | X664 | X665 | X666 | X667 | X668 | X669 | X670 | X671 | X672 | X673 | X674 | X675 | X676 | X677 | X678 | X679 | X680 | X681 | X682 | X683 | X684 | X685 | X686 | X687 | X688 | X689 | X690 | X691 | X692 | X693 | X694 | X695 | X696 | X697 | X698 | X699 | X700 | X701 | X702 | 
X703 | X704 | X705 | X706 | X707 | X708 | X709 | X710 | X711 | X712 | X713 | X714 | X715 | X716 | X717 | X718 | X719 | X720 | X721 | X722 | X723 | X724 | X725 | X726 | X727 | X728 | X729 | X730 | X731 | X732 | X733 | X734 | X735 | X736 | X737 | X738 | X739 | X740 | X741 | X742 | X743 | X744 | X745 | X746 | X747 | X748 | X749 | X750 | X751 | X752 | X753 | X754 | X755 | X756 | X757 | X758 | X759 | X760 | X761 | X762 | X763 | X764 | X765 | X766 | X767 | X768 | X769 | X770 | X771 | X772 | X773 | X774 | X775 | X776 | X777 | X778 | X779 | X780 | X781 | X782 | X783 | X784 | X785 | X786 | X787 | X788 | X789 | X790 | X791 | X792 | X793 | X794 | X795 | X796 | X797 | X798 | X799 | X800 | X801 | X802 | X803 | X804 | X805 | X806 | X807 | X808 | X809 | X810 | X811 | X812 | X813 | X814 | X815 | X816 | X817 | X818 | X819 | X820 | X821 | X822 | X823 | X824 | X825 | X826 | X827 | X828 | X829 | X830 | X831 | X832 | X833 | X834 | X835 | X836 | X837 | X838 | X839 | X840 | X841 | X842 | X843 | X844 | X845 | X846 | X847 | X848 | X849 | X850 | X851 | X852 | X853 | X854 | X855 | X856 | X857 | X858 | X859 | X860 | X861 | X862 | X863 | X864 | X865 | X866 | X867 | X868 | X869 | X870 | X871 | X872 | X873 | X874 | X875 | X876 | X877 | X878 | X879 | X880 | X881 | X882 | X883 | X884 | X885 | X886 | X887 | X888 | X889 | X890 | X891 | X892 | X893 | X894 | X895 | X896 | X897 | X898 | X899 | X900 | X1.10 | X2.10 | X3.10 | X4.10 | X5.10 | X6.10 | X7.10 | X8.10 | X9.10 | X10.10 | X11.10 | X12.10 | X13.10 |
##
## First 5 rows and 5 columns of raw data:
| Unique.Id | Sample.type | X0 | X0.1 | X0.117 |
|---|---|---|---|---|
| PC1 | Cancer | -3.63e-03 | -0.002010 | -1.52e-03 |
| PC2 | Cancer | -7.33e-05 | -0.000210 | -3.70e-04 |
| PC3 | Cancer | -1.47e-05 | 0.000195 | 1.75e-04 |
| PC4 | Cancer | -8.60e-04 | 0.000411 | 7.00e-04 |
| PC5 | Cancer | 1.12e-04 | 0.000057 | 1.93e-05 |
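The `X`-prefixed column names shown above (`X0`, `X0.1`, and so on) are what `read.csv` produces by default when header entries are not syntactically valid R names, for example bare numbers or names containing spaces. The sketch below illustrates this on a made-up CSV; the raw header strings are assumptions for illustration, not the real file's header.

```r
# Toy CSV text standing in for the real file; header strings and values are hypothetical.
csv_text <- "Unique Id,Sample type,0,0.1,0.117
PC1,Cancer,0.1,0.2,0.3
N1,Normal,0.4,0.5,0.6
N2,Normal,0.7,0.8,0.9"

# Default behaviour: check.names = TRUE passes the header through make.names(),
# which replaces spaces with dots and prefixes numeric names with "X".
raw_default <- read.csv(text = csv_text)
names(raw_default)
# "Unique.Id" "Sample.type" "X0" "X0.1" "X0.117"

# check.names = FALSE keeps the header exactly as written in the file.
raw_verbatim <- read.csv(text = csv_text, check.names = FALSE)
names(raw_verbatim)
# "Unique Id" "Sample type" "0" "0.1" "0.117"
```

Keeping the default is convenient for formula interfaces such as `Class ~ .`, while `check.names = FALSE` may be preferable when the original numeric labels are needed for plotting.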
This section covers the initial processing of the raw data to prepare it for feature engineering and modeling. This includes extracting sample IDs, class labels, and features, converting features to numeric, and handling any samples with missing class labels. This prepared dataset (df_initial_features) will be the input for the feature selection methods.
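The steps described here can be sketched in base R as follows. This is a minimal illustration on invented data, not the report's actual code: the toy values, the helper variable names, and the factor level order are all assumptions.

```r
# Toy raw data in the assumed layout: ID column, class label column, then features.
raw_data <- data.frame(
  Unique.Id   = c("PC1", "PC2", "N1", "N2"),
  Sample.type = c("Cancer", NA, "Normal", "Normal"),  # PC2 has no class label
  X0          = c("0.10", "0.20", "0.30", "n/a"),    # read as text, one bad entry
  X0.1        = c(1.5, 2.5, 3.5, 4.5),
  stringsAsFactors = FALSE
)

sample_ids   <- raw_data[[1]]
class_labels <- raw_data[[2]]

# Convert every feature column to numeric; unparseable entries become NA.
feature_cols <- raw_data[, -(1:2), drop = FALSE]
feature_cols[] <- lapply(feature_cols, function(x) {
  suppressWarnings(as.numeric(as.character(x)))
})

# Drop samples with a missing class label, then attach Class as a factor.
keep <- !is.na(class_labels)
df_initial_features <- data.frame(
  feature_cols[keep, , drop = FALSE],
  Class = factor(class_labels[keep], levels = c("Normal", "Cancer")),  # level order is an assumption
  row.names = sample_ids[keep]
)

dim(df_initial_features)  # samples x (features + Class)
```

Missing feature values are left as `NA` at this stage, which is consistent with imputing them later during model preprocessing.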
## Number of features identified initially: 2811
## Dimensions of initial feature data (df_initial_features) after NA Class removal: 75 samples, 2812 columns (including Class).
## Class distribution in df_initial_features:
| Class | Count |
|---|---|
| Normal | 48 |
| Cancer | 27 |
We explore two feature selection methods: Near-Zero Variance (NZV) filtering and Boruta.
NZV filtering removes features that are constant or nearly constant across samples, since such features carry little information for separating the classes.
## Number of near-zero variance features removed: 0
## Number of features remaining after NZV filtering: 2811
## Dimensions of NZV-filtered feature data (df_nzv_features): 75 samples, 2812 columns (including Class).
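To make the NZV criterion concrete, here is a base R sketch of the two statistics that caret's `nearZeroVar` uses: the ratio of the most common value's frequency to the second most common (`freqCut`, default 95/5) and the percentage of distinct values (`uniqueCut`, default 10). The function `nzv_flags` and the toy data are hypothetical; this is a simplified stand-in, not caret's implementation.

```r
# Flag columns that are constant, or that have a dominant value AND few distinct values.
nzv_flags <- function(df, freq_cut = 95 / 5, unique_cut = 10) {
  vapply(df, function(x) {
    counts <- sort(table(x), decreasing = TRUE)
    if (length(counts) <= 1) return(TRUE)              # zero variance
    freq_ratio <- counts[[1]] / counts[[2]]            # dominance of the top value
    pct_unique <- 100 * length(counts) / length(x)     # share of distinct values
    freq_ratio > freq_cut && pct_unique <= unique_cut
  }, logical(1))
}

# Invented example: one constant column, one almost-constant, one continuous.
toy <- data.frame(
  constant = rep(1, 100),
  rare     = c(rep(0, 98), 1, 2),
  varied   = seq_len(100) / 10
)

flags <- nzv_flags(toy)
toy_filtered <- toy[, !flags, drop = FALSE]
names(toy_filtered)
```

Continuous measurements usually take many distinct values, so it is plausible for a filter of this kind to remove nothing from such data, as in the output above.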
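Boruta is a wrapper algorithm built around Random Forest that iteratively compares the importance of original features with that of random shadow features (shuffled copies of the original features). Features that consistently outperform the best shadow feature are marked Confirmed; only Confirmed features are kept here.
Note: Boruta can be computationally intensive, especially with a large number of features or samples. The maxRuns parameter controls the maximum number of Random Forest runs.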
## Starting Boruta feature selection... This may take some time.
## Boruta feature selection complete.
## Number of features selected by Boruta (Confirmed only): 7
## Dimensions of Boruta-selected feature data (df_boruta_features): 75 samples, 8 columns (including Class).
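The shadow-feature idea can be illustrated with a single comparison round in base R. This sketch is not the Boruta algorithm itself: it swaps Random Forest importance for an absolute Welch t statistic as a stand-in importance score, runs only one round, and uses simulated data. All names and values are assumptions for illustration.

```r
set.seed(42)
n <- 60
class <- factor(rep(c("Normal", "Cancer"), each = n / 2))

# Simulated features: one that differs strongly between classes, two pure noise.
features <- data.frame(
  signal = ifelse(class == "Cancer", 3, 0) + rnorm(n),
  noise1 = rnorm(n),
  noise2 = rnorm(n)
)

# Shadow features: each column shuffled independently, which keeps its
# distribution but breaks any relationship with the class label.
shadows <- as.data.frame(lapply(features, sample))
names(shadows) <- paste0("shadow_", names(features))

# Stand-in importance score (Boruta itself uses Random Forest importance).
importance <- function(x, y) abs(unname(t.test(x ~ y)$statistic))

real_imp   <- vapply(features, importance, numeric(1), y = class)
shadow_imp <- vapply(shadows, importance, numeric(1), y = class)

# A feature scores a "hit" in this round if it beats the best shadow feature.
hits <- names(real_imp)[real_imp > max(shadow_imp)]
hits
```

Boruta repeats this kind of comparison over many runs and applies a statistical test to the accumulated hits before confirming or rejecting each feature, which is why `maxRuns` affects both runtime and the number of features left undecided.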
# Helper function for training and preprocessing
train_rf_model <- function(data_to_train, model_name_suffix, train_control_config, tune_grid_config) {
set.seed(123)
train_idx_helper <- createDataPartition(data_to_train$Class, p = .80, list = FALSE, times = 1)
cv_train_data_helper <- data_to_train[train_idx_helper, ]
original_full_data_rownames <- rownames(data_to_train)
holdout_indices_from_original <- which(!original_full_data_rownames %in% rownames(cv_train_data_helper))
cat(paste("\n--- Training RF Model:", model_name_suffix, "---\n"))
cat("Dimensions of CV training data for", model_name_suffix, ":", dim(cv_train_data_helper)[1],"x",dim(cv_train_data_helper)[2],"\n")
features_for_preproc_helper <- cv_train_data_helper[, -which(names(cv_train_data_helper) == "Class"), drop = FALSE]
# Ensure there are features to preprocess
if(ncol(features_for_preproc_helper) == 0) {
cat("Warning: No features to preprocess for model", model_name_suffix, ". Model training might fail or be trivial.\n")
# Create a dummy processed_cv_train_data if no features, to allow train to proceed (it will likely be a majority class classifier)
processed_cv_train_data_helper <- cv_train_data_helper[, "Class", drop=FALSE]
current_preProcValues_helper <- NULL # No preprocessor
} else {
current_preProcValues_helper <- preProcess(features_for_preproc_helper, method = c("center", "scale", "medianImpute"))
processed_cv_train_features_helper <- predict(current_preProcValues_helper, features_for_preproc_helper)
processed_cv_train_data_helper <- cbind(processed_cv_train_features_helper, Class = cv_train_data_helper$Class)
}
minority_size_helper <- min(table(processed_cv_train_data_helper$Class))
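if ("sampling" %in% names(train_control_config) && !is.null(train_control_config$sampling) && minority_size_helper < 10 && ncol(features_for_preproc_helper) > 0) {
# Warn when the minority class may be too small for SMOTE within CV folds
cat("Warning: minority class has only", minority_size_helper, "samples in the training data for", model_name_suffix, "; SMOTE may be unreliable within CV folds.\n")
}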
set.seed(456)
if(ncol(processed_cv_train_data_helper) <= 1) { # Only the Class column remains, so there are no predictors
cat("Skipping model training for", model_name_suffix, "due to no predictive features after preprocessing.\n")
return(list(model = NULL, preprocessor = current_preProcValues_helper, holdout_indices_original = holdout_indices_from_original))
}
model_fit <- tryCatch({
train(
Class ~ .,
data = processed_cv_train_data_helper,
method = "rf",
trControl = train_control_config,
metric = "ROC",
tuneGrid = tune_grid_config,
importance = TRUE,
na.action = na.omit
)
}, error = function(e) {
cat("Error during training for", model_name_suffix, ":", e$message, "\n")
return(NULL) # Return NULL if training fails
})
if(!is.null(model_fit)){
cat("Training complete for RF Model:", model_name_suffix, "\n")
}
return(list(model = model_fit,
preprocessor = current_preProcValues_helper,
holdout_indices_original = holdout_indices_from_original))
}
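The helper fits its preprocessor on the training partition only and returns it so the same transformation can later be applied to the hold-out rows. The base R sketch below illustrates that idea with median imputation, centering, and scaling. It is a simplified stand-in for caret's preProcess, and the toy data, row indices, and the apply_preproc name are hypothetical.

```r
# Hypothetical toy feature table with one missing value.
set.seed(1)
x <- data.frame(a = rnorm(10, mean = 5, sd = 2), b = runif(10))
x$a[3] <- NA
train_rows <- 1:8      # stands in for the createDataPartition() indices
holdout_rows <- 9:10

# Learn imputation and scaling parameters from the training rows only.
train_x <- x[train_rows, , drop = FALSE]
med <- vapply(train_x, median, numeric(1), na.rm = TRUE)
mu  <- vapply(train_x, mean, numeric(1), na.rm = TRUE)
sdv <- vapply(train_x, sd, numeric(1), na.rm = TRUE)

apply_preproc <- function(df) {
  for (col in names(df)) {
    df[[col]][is.na(df[[col]])] <- med[[col]]          # median impute
    df[[col]] <- (df[[col]] - mu[[col]]) / sdv[[col]]  # center and scale
  }
  df
}

train_processed   <- apply_preproc(train_x)
holdout_processed <- apply_preproc(x[holdout_rows, , drop = FALSE])
```

Because the hold-out rows never contribute to med, mu, or sdv, no information from them leaks into the training data.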
create_rf_tune_grid <- function(data_frame_for_features) {
# Check if there are any feature columns other than 'Class'
feature_cols <- setdiff(colnames(data_frame_for_features), "Class")
if(length(feature_cols) == 0) {
# No features to tune mtry over. Callers should skip training in this case;
# a minimal grid is returned so that this call itself does not fail.
warning("create_rf_tune_grid called with no feature columns. mtry grid will be minimal.")
return(expand.grid(mtry = 1)) # Placeholder grid; Random Forest cannot be fit with zero predictors
}
num_feats <- length(feature_cols)
default_m <- floor(sqrt(num_feats))
grid_vals <- unique(c(
max(1, floor(default_m / 2)), # ensure mtry is at least 1
max(1, default_m),
min(num_feats, max(1, default_m * 2)),
min(num_feats, 10),
min(num_feats, 20)
))
grid_vals <- sort(unique(grid_vals[grid_vals <= num_feats & grid_vals > 0]))
if(length(grid_vals) == 0) { grid_vals <- c(max(1, num_feats)) }
return(expand.grid(mtry = grid_vals))
}
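As a worked example of the grid arithmetic, the sketch below restates the same logic as a function of the feature count alone. The mtry_grid name is hypothetical. The two inputs are the feature counts implied by this report: 2812 columns minus Class, and the 7 Boruta-confirmed features.

```r
# Compact, hypothetical restatement of the grid logic for a given feature count.
mtry_grid <- function(num_feats) {
  default_m <- floor(sqrt(num_feats))
  vals <- c(max(1, floor(default_m / 2)), max(1, default_m),
            min(num_feats, max(1, default_m * 2)),
            min(num_feats, 10), min(num_feats, 20))
  sort(unique(vals[vals >= 1 & vals <= num_feats]))
}

mtry_grid(2811)  # 10 20 26 53 106
mtry_grid(7)     # 1 2 4 7
```

For 2811 features, floor(sqrt(2811)) is 53, which gives the candidates 26, 53, and 106 alongside the fixed values 10 and 20. For 7 features, the fixed values are capped at 7 and the duplicates collapse.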
trainControl Configuration
base_train_control <- trainControl(
method = "cv", number = 10, summaryFunction = twoClassSummary,
classProbs = TRUE, verboseIter = FALSE, allowParallel = TRUE
)
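The training helper inspects train_control_config$sampling, which suggests that the SMOTE scenarios use a control object with that field set. The sketch below shows one way such a control could be built, assuming the caret package is installed. The smote_train_control name is hypothetical, and depending on the caret version an extra package (such as themis or DMwR) may be needed when train() is actually run.

```r
smote_train_control <- NULL
if (requireNamespace("caret", quietly = TRUE)) {
  # Same settings as base_train_control, plus SMOTE (assumed setup).
  smote_train_control <- caret::trainControl(
    method = "cv", number = 10,
    summaryFunction = caret::twoClassSummary,
    classProbs = TRUE, verboseIter = FALSE, allowParallel = TRUE,
    sampling = "smote"  # resampling is applied inside each CV training fold
  )
  print(smote_train_control$sampling)
} else {
  message("caret package not installed; skipping the sketch.")
}
```

Setting sampling in trainControl applies SMOTE within each resampling iteration, so the held-out fold used for scoring contains only original samples.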
Four Random Forest model scenarios are trained and compared:
1. All Features + SMOTE: Uses all initial features with SMOTE.
2. NZV Features + SMOTE: Uses NZV-filtered features with SMOTE.
3. Boruta Features + SMOTE (Main Model): Uses Boruta-selected features with SMOTE. This is treated as the primary model.
4. Boruta Features + No SMOTE: Uses Boruta-selected features without SMOTE, to isolate the effect of SMOTE on the Boruta feature set.
Scenario 1: All Features + SMOTE
Tune grid (mtry values): 10, 20, 26, 53, 106
--- Training RF Model: AllFeatures_SMOTE ---
Dimensions of CV training data for AllFeatures_SMOTE : 61 x 2812
Training complete for RF Model: AllFeatures_SMOTE
Best mtry: 20
CV AUCROC (best tune): 0.8062

Scenario 2: NZV Features + SMOTE
Tune grid (mtry values): 10, 20, 26, 53, 106
--- Training RF Model: NZVFeatures_SMOTE ---
Dimensions of CV training data for NZVFeatures_SMOTE : 61 x 2812
Training complete for RF Model: NZVFeatures_SMOTE
Best mtry: 20
CV AUCROC (best tune): 0.8062

Scenario 3: Boruta Features + SMOTE (Main Model)
Tune grid (mtry values): 1, 2, 4, 7
--- Training RF Model: BorutaFeatures_SMOTE (Main) ---
Dimensions of CV training data for BorutaFeatures_SMOTE (Main) : 61 x 8
Training complete for RF Model: BorutaFeatures_SMOTE (Main)
Best mtry: 4
CV AUCROC (best tune): 0.7375

Scenario 4: Boruta Features + NO SMOTE
Tune grid (mtry values): 1, 2, 4, 7
--- Training RF Model: BorutaFeatures_NoSMOTE ---
Dimensions of CV training data for BorutaFeatures_NoSMOTE : 61 x 8
Training complete for RF Model: BorutaFeatures_NoSMOTE
Best mtry: 1
CV AUCROC (best tune): 0.7458
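For a side-by-side comparison, the four training logs above can be collected into a small table. The sketch below copies the reported values by hand; the cv_results and smote_delta names are hypothetical.

```r
# Values transcribed from the four scenario logs above.
cv_results <- data.frame(
  scenario  = c("AllFeatures_SMOTE", "NZVFeatures_SMOTE",
                "BorutaFeatures_SMOTE (Main)", "BorutaFeatures_NoSMOTE"),
  n_columns = c(2812, 2812, 8, 8),
  best_mtry = c(20, 20, 4, 1),
  cv_auc    = c(0.8062, 0.8062, 0.7375, 0.7458)
)
print(cv_results[order(-cv_results$cv_auc), ])

# CV AUCROC difference between SMOTE and no SMOTE on the Boruta feature set.
smote_delta <- with(cv_results,
  cv_auc[scenario == "BorutaFeatures_SMOTE (Main)"] -
  cv_auc[scenario == "BorutaFeatures_NoSMOTE"])
round(smote_delta, 4)  # -0.0083
```

The All Features and NZV rows have identical dimensions and scores, which suggests that the NZV filter removed no columns in this run.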
Hyperparameter Tuning Profile for the Main Model (Boruta Features + SMOTE).
This plot compares the best cross-validated AUCROC from models trained with All Features, NZV-filtered Features, and Boruta-selected Features (all with SMOTE and hyperparameter tuning).
Impact of Feature Selection Method on Cross-Validated AUCROC.
Impact of SMOTE on Performance (Boruta Features, Best Tune).
CV Performance Distribution (Main Boruta Model).
The main model (Boruta Features + SMOTE, best mtry) is
evaluated on a separate hold-out test set.
## **Confusion Matrix (Hold-out - Main Boruta Model):**
##
## **Overall Statistics (Hold-out - Main Boruta Model):**
##
## **Class Statistics (Hold-out - Main Boruta Model):**
##
## AUC (Hold-out - Main Boruta Model): 0.9333
ROC Curve for Main Boruta Model on Hold-out Test Set.
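As a reminder of what the AUC value measures, the sketch below computes it from ranks: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The auc_rank function and the toy scores are illustrative assumptions. They are not the report's hold-out predictions, and the report itself may have used a dedicated ROC package.

```r
# AUC via the Mann-Whitney U statistic; tied scores receive average ranks.
auc_rank <- function(scores, labels, positive) {
  is_pos <- labels == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(scores)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up predicted probabilities of "Cancer", for illustration only.
toy_labels <- c("Cancer", "Cancer", "Cancer", "Normal", "Normal", "Normal")
toy_scores <- c(0.9, 0.8, 0.4, 0.5, 0.3, 0.2)
toy_auc <- auc_rank(toy_scores, toy_labels, positive = "Cancer")
toy_auc  # 8 of the 9 positive-negative pairs are ordered correctly
```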
The plot below shows the top features identified by the main model
(Boruta Features + SMOTE, best mtry).
Top Important Features (Main Boruta Model).
This report presented a comprehensive machine learning pipeline, including data preprocessing, feature selection (NZV and Boruta), hyperparameter tuning for a Random Forest model, class imbalance handling (SMOTE), and thorough evaluation using 10-fold cross-validation and a hold-out test set. The main model utilized features selected by Boruta, along with SMOTE and hyperparameter tuning.
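Key findings include:
- The performance of models with different mtry values: tuning selected mtry = 20 for the All Features and NZV models, mtry = 4 for the main Boruta model, and mtry = 1 for the Boruta model without SMOTE.
- The impact of different feature selection methods (All Features vs. NZV vs. Boruta) on model performance: with SMOTE, the All Features and NZV models both reached a CV AUCROC of 0.8062, compared with 0.7375 for the Boruta model.
- The effect of SMOTE on AUCROC, Sensitivity, and Specificity for the Boruta-selected feature set: CV AUCROC was 0.7375 with SMOTE and 0.7458 without it.
- The stability of the main Boruta model’s performance across 10 CV folds.
- The final performance of the main Boruta model on an unseen hold-out test set (AUC 0.9333).
- The top features contributing to the main Boruta model’s predictions.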
These results provide a solid foundation for understanding the dataset and the model’s predictive capabilities. Boruta feature selection, in conjunction with other pipeline steps, aimed to identify a robust and informative set of features for classification.