In this demo, we will use the colon data set from the “rda” package:
# Load the colon data from the rda package
data(colon,package = "rda")
#FRESA.CAD requires a data frame. One of the columns must be the class
Colon <- as.data.frame(cbind(Class = colon.y, colon.x))
#The class should be 0 for controls and 1 for cases
Colon$Class <- Colon$Class - 1
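A quick, optional sanity check (base R only) confirms the dimensions and the 0/1 class coding before any modeling:
# Optional sanity check: 62 samples, 2000 features plus the Class column
dim(Colon)
# Class coding: 0 = control, 1 = cancer
table(Colon$Class)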
The colon cancer dataset has 62 observations and 2000 features. We will cross-validate (CV) a quadratic discriminant analysis (QDA) classifier that predicts the presence of cancer. Before estimating the QDA parameters, a univariate filter based on the Wilcoxon test will select the top 12 features (25% of 80% of the 62 samples), keeping only features with a pairwise Pearson correlation lower than 0.95.
The CV will randomly select 80% of the samples for training; the remaining 20% will be held out for validation.
The CV will be repeated 75 times; hence, on average, each sample will be evaluated in the test set 15 times.
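The expected number of test evaluations per sample is simply the number of repetitions times the holdout fraction:
# 75 repetitions x 20% holdout -> on average 15 test evaluations per sample
75 * (1 - 0.8)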
# Cross-validate a QDA classifier using only the top-ranked features
QDAcv <- randomCV(Colon, "Class",
                  MASS::qda,
                  trainFraction = 0.8,
                  repetitions = 75,
                  featureSelectionFunction = univariate_Wilcoxon,
                  featureSelection.control = list(limit = 0.10, thr = 0.95))
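Once the run finishes, the returned object can be inspected; for instance, featureFrequency (the same element used for the other models below) reports how often each gene passed the Wilcoxon filter across the 75 repetitions. A minimal sketch:
# Most frequently selected features across the 75 train/test splits
head(QDAcv$featureFrequency)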
# Cross-validate an adaboost classifier on the same train/test partitions
ADAcv <- randomCV(Colon, "Class",
                  adaboost,
                  trainSampleSets = QDAcv$trainSamplesSets,
                  featureSelectionFunction = univariate_Wilcoxon,
                  featureSelection.control = list(limit = 0.10, thr = 0.95),
                  asFactor = TRUE, nIter = 10)
# Cross-validate BeSS, BSWiMS, and GMVE:BSWiMS on the same train/test partitions
BESScv <- randomCV(Colon, "Class", BESS, trainSampleSets = QDAcv$trainSamplesSets)
BSWiMScv <- randomCV(Colon, "Class", BSWiMS.model, trainSampleSets = QDAcv$trainSamplesSets)
GMVEBSWiMSCV <- randomCV(Colon, "Class", GMVEBSWiMS, trainSampleSets = QDAcv$trainSamplesSets)
# Test performance and feature-selection frequency of the GMVE:BSWiMS CV
bs <- predictionStats_binary(GMVEBSWiMSCV$medianTest, "GMVE:BSWiMS")
GMVEBSWiMSCV$featureFrequency

# Cross-validate BOOST_BSWiMS on the same partitions and report its performance
BOOST_BSWiMSCV <- randomCV(Colon, "Class", BOOST_BSWiMS, trainSampleSets = QDAcv$trainSamplesSets)
bs <- predictionStats_binary(BOOST_BSWiMSCV$medianTest, "BOOST_BSWiMS")
BOOST_BSWiMSCV$featureFrequency
After CV, we can visualize the ROC and extract the test performance:
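Following the same pattern used for GMVE:BSWiMS and BOOST_BSWiMS above, a minimal sketch for the remaining CV objects (each call draws the ROC for the given title and returns the test performance metrics):
# ROC curves and test performance for the remaining CV objects
bs <- predictionStats_binary(QDAcv$medianTest, "QDA")
bs <- predictionStats_binary(ADAcv$medianTest, "ADABOOST")
bs <- predictionStats_binary(BESScv$medianTest, "BeSS")
bs <- predictionStats_binary(BSWiMScv$medianTest, "BSWiMS")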
The QDA, BeSS, ADABOOST, and BSWiMS CV results, together with the GMVE:BSWiMS and BOOST_BSWiMS runs, will be compared to other common classifiers using the FRESA.CAD::BinaryBenchmark() function.
The same training and test sets will be used for all classifiers.
#comparing the cross validation to standard classifiers
par(mfrow = c(2,2),cex = 0.45);
ClassBenchmark <- BinaryBenchmark(referenceCV = list(QDA = QDAcv, BeSS = BESScv, ADABOOST = ADAcv,
                                                     BSWiMS = BSWiMScv, GMVEBSWiMSCV = GMVEBSWiMSCV,
                                                     BOOST_BSWiM = BOOST_BSWiMSCV))
par(mfrow = c(1,1),cex = 1.0);
Once done, we can compare the CV test results using the plot() function, which also generates summary tables of the results.
#plotting the results
op <- par(no.readonly = TRUE);
prBenchmark <- plot(ClassBenchmark)
pander::pander(prBenchmark$metrics, caption = "Classifier Performance", round = 3)
|  | BSWiMS | LASSO | ENS | BOOST_BSWiM | KNN | ADABOOST | RF | SVM | GMVEBSWiMSCV | BeSS | QDA | RPART |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BER | 0.114 | 0.116 | 0.117 | 0.128 | 0.128 | 0.149 | 0.152 | 0.152 | 0.162 | 0.162 | 0.174 | 0.257 |
| ACC | 0.887 | 0.887 | 0.887 | 0.871 | 0.871 | 0.855 | 0.855 | 0.855 | 0.855 | 0.855 | 0.839 | 0.774 |
| AUC | 0.889 | 0.857 | 0.877 | 0.877 | 0.853 | 0.879 | 0.866 | 0.847 | 0.881 | 0.874 | 0.88 | 0.773 |
| SEN | 0.9 | 0.9 | 0.9 | 0.875 | 0.875 | 0.875 | 0.875 | 0.875 | 0.9 | 0.9 | 0.875 | 0.85 |
| SPE | 0.864 | 0.864 | 0.864 | 0.864 | 0.864 | 0.818 | 0.818 | 0.818 | 0.773 | 0.773 | 0.773 | 0.636 |
| CIDX | 0.875 | 0.844 | 0.88 | 0.865 | 0.839 | 0.854 | 0.844 | 0.844 | 0.854 | 0.74 | 0.865 | 0.708 |
#pander::pander(prBenchmark$metrics_filter,caption = "Average Filter Performance",round = 3)
par(op);
# Heatmap of the selection frequency of the top 50 features across classifiers
gplots::heatmap.2(t(as.matrix(ClassBenchmark$featureSelectionFrequency[1:50,])), trace = "none", margins = c(10, 10), main = "Features", cexRow = 0.5, cexCol = 0.5)
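The most frequently selected features can also be listed directly; a minimal sketch, assuming the rows of featureSelectionFrequency are ordered by selection frequency (as the [1:50,] indexing above suggests):
# Names and selection counts of the ten most frequently selected features
pander::pander(head(ClassBenchmark$featureSelectionFrequency, 10),
               caption = "Top selected features")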