The objective of this project was to build classifiers to predict whether an individual has heart disease, based on clinical data obtained from the Cleveland Clinic Foundation, the Hungarian Institute of Cardiology (Budapest), the V.A. Medical Center (Long Beach, CA) and the University Hospital Zurich (Switzerland), available from the UCI Machine Learning Repository. In Phase I, we cleaned the data and re-categorised the target feature so that it was binary. In Phase II, we built three binary classifiers trained on the cleaned data. The rest of this report is organised as follows. Section 2 gives an overview of our methodology. Section 3 discusses the fine-tuning process and a detailed performance analysis of each classifier. Section 4 compares the performance of the classifiers using the same resampling method. Section 5 critiques our methodology. The last section concludes with a summary.
The variable descriptions are reproduced here from the heart-disease.names file:
The target feature has two classes and hence it is a binary classification problem. To reiterate, the goal is to predict whether a person has heart disease.
We considered three classifiers: Naive Bayes (NB), Random Forest (RF), and \(K\)-Nearest Neighbour (KNN). NB served as the baseline classifier. Each classifier was trained to make probability predictions so that we could adjust the prediction threshold to refine performance. We split the full data set into a 70 % training set and a 30 % test set. Each set resembled the full data by having approximately the same proportion of target classes, i.e. roughly 48 % of patients showing no sign of heart disease and 52 % exhibiting heart disease. For the fine-tuning process, we ran five-fold stratified cross-validation on each classifier. Stratified sampling was used to cater for the slight class imbalance of the target feature.
Next, for each classifier, we determined the optimal probability threshold. Using the tuned hyperparameters and the optimal thresholds, we made predictions on the test data. During model training (hyperparameter tuning and threshold adjustment), we relied on the mean misclassification error rate (mmce). In addition to mmce, we also used the confusion matrix on the test data to evaluate the classifiers’ performance. The modelling was implemented in R with the mlr package (Bischl et al. 2016).
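To illustrate, a minimal sketch of the threshold-adjustment step in mlr is given below; the learner and task objects (tuned_learner, classif.task) are placeholders, and the exact settings used in this report are not shown.

library(mlr)

# Cross-validated probability predictions on the training task
rdesc <- makeResampleDesc("CV", iters = 5L, stratify = TRUE)
res   <- resample(tuned_learner, classif.task, rdesc, measures = mmce)

# Inspect mmce across candidate thresholds, then pick the threshold minimising it
thr_data <- generateThreshVsPerfData(res$pred, measures = mmce)
plotThreshVsPerf(thr_data)
thr <- tuneThreshold(pred = res$pred, measure = mmce)   # returns list(th = ..., perf = ...)

# The chosen threshold is later applied to the test-set predictions, e.g.
# pred_test <- setThreshold(pred_test, thr$th)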
Since the training set might unwittingly exclude rare instances, the NB classifier may produce fitted zero probabilities as predictions. To mitigate this, we ran a grid search to determine the optimal value of the Laplace smoothing parameter. Using the stratified cross-validation discussed in the previous section, we experimented with values ranging from 0 to 30.
The grid search returned the same mean test error (0.207) for every value of the Laplace parameter, so the default value of 0 was retained.
We fine-tuned the number of variables randomly sampled as candidates at each split (i.e. mtry). For a classification problem, Breiman (2001) suggests mtry = \(\sqrt{p}\), where \(p\) is the number of descriptive features. In our case, \(\sqrt{p} = \sqrt{11} \approx 3.3\). Therefore, we experimented with mtry = 2, 3, and 4, leaving other hyperparameters, such as the number of trees to grow, at their default values. The best result was mtry = 3 with a mean test error of 0.191.
Using the optimal kernel, we ran a grid search over \(k = 2, 3, \ldots, 20\). The outcome was \(k = 19\) with an mmce test error of 0.193.
Feature selection was used to identify an optimal subset of the available features. Selecting a subset of relevant features can make training faster, reduce model complexity, improve accuracy and reduce overfitting. There are three broad categories of feature selection methods: filter methods, wrapper methods and embedded methods.
The SPSA-FSR wrapper method, implemented in the spFSR package (https://cran.r-project.org/web/packages/spFSR/vignettes/spFSR.html), was used to select relevant features (Aksakalli and Malekipirbazari 2016).
The filter method assigns an importance value to each feature. Based on these values the features are ranked and a feature subset is selected. The learner was fused with the filter method for training of each classification model.
The wrapper method uses the performance of a learning classifier to assess the usefulness of a feature set. To select a feature subset, the learner was trained repeatedly on different feature subsets, and the subset that led to the best learner performance was chosen.
We loaded the dataset and removed the redundant index column. The FBS feature contained logical (TRUE/FALSE) values, which were converted to numeric (1/0).
# Load the cleaned dataset, treating "unknown" as missing
data <- read.csv('Heart_Disease_cleaned_data.csv', na.strings = "unknown", stringsAsFactors = FALSE)
# Drop the redundant index column
data <- data[, -1]
# Convert character columns to factors
data[, sapply(data, is.character)] <- lapply(data[, sapply(data, is.character)], factor)
# Recode logical FBS to numeric 0/1
data$FBS <- as.numeric(data$FBS)
str(data)
## 'data.frame': 740 obs. of 13 variables:
## $ Age : int 63 67 67 37 41 56 62 57 63 53 ...
## $ Gender : Factor w/ 2 levels "female","male": 2 2 2 2 1 2 1 1 2 2 ...
## $ CP : Factor w/ 4 levels "asymptomatic",..: 4 1 1 3 2 2 1 1 1 1 ...
## $ Trestbps: int 145 160 120 130 130 120 140 120 130 140 ...
## $ Chol : int 233 286 229 250 204 236 268 354 254 203 ...
## $ FBS : num 1 0 0 0 0 0 0 0 0 1 ...
## $ RestECG : Factor w/ 3 levels "hypertropy","normal",..: 1 1 1 2 1 2 1 2 1 1 ...
## $ Thalach : int 150 108 129 187 172 178 160 163 147 155 ...
## $ Exang : Factor w/ 2 levels "no","yes": 1 2 2 1 1 1 1 2 1 2 ...
## $ Oldpeak : num 2.3 1.5 2.6 3.5 1.4 0.8 3.6 0.6 1.4 3.1 ...
## $ Slope : Factor w/ 3 levels "downsloping",..: 1 2 2 1 3 3 1 3 2 1 ...
## $ Thal : Factor w/ 3 levels "fixed defect",..: 1 2 3 2 2 2 2 2 3 3 ...
## $ Goal : int 0 1 1 0 0 0 1 0 1 1 ...
We determined the number of missing values in each column (see the tables below).
| Feature | Type | NAs | Mean | Dispersion | Median | MAD | Min | Max | Levels |
|---|---|---|---|---|---|---|---|---|---|
| Age | integer | 0 | 53.0972973 | 9.4081267 | 54.0 | 10.3782 | 28 | 77.0 | 0 |
| Gender | factor | 0 | NA | 0.2351351 | NA | NA | 174 | 566.0 | 2 |
| CP | factor | 0 | NA | 0.4702703 | NA | NA | 37 | 392.0 | 4 |
| Trestbps | integer | 0 | 132.7540541 | 18.5812497 | 130.0 | 14.8260 | 0 | 200.0 | 0 |
| Chol | integer | 0 | 220.1364865 | 93.6145555 | 231.0 | 54.8562 | 0 | 603.0 | 0 |
| FBS | numeric | 0 | 0.1500000 | 0.3573129 | 0.0 | 0.0000 | 0 | 1.0 | 0 |
| RestECG | factor | 0 | NA | 0.3986486 | NA | NA | 120 | 445.0 | 3 |
| Thalach | integer | 0 | 138.7445946 | 25.8460815 | 140.0 | 29.6520 | 60 | 202.0 | 0 |
| Exang | factor | 0 | NA | 0.4000000 | NA | NA | 296 | 444.0 | 2 |
| Oldpeak | numeric | 0 | 0.8943243 | 1.0871598 | 0.5 | 0.7413 | -1 | 6.2 | 0 |
| Slope | factor | 209 | NA | NA | NA | NA | 48 | 310.0 | 3 |
| Thal | factor | 340 | NA | NA | NA | NA | 39 | 187.0 | 3 |
| Goal | integer | 0 | 0.5175676 | 0.5000293 | 1.0 | 0.0000 | 0 | 1.0 | 0 |
| Feature | Missing values |
|---|---|
| Age | 0 |
| Gender | 0 |
| CP | 0 |
| Trestbps | 0 |
| Chol | 0 |
| FBS | 0 |
| RestECG | 0 |
| Thalach | 0 |
| Exang | 0 |
| Oldpeak | 0 |
| Slope | 209 |
| Thal | 340 |
| Goal | 0 |
Two features contained a substantial number of missing values (209 for Slope and 340 for Thal).
Figure 1. Histograms showing the distribution of data for the Slope (left panel) and Thal (right panel) features.
Figure 1 shows that these missing values comprised a significant proportion of these features.
The missing values in the Slope (slope of the peak exercise ST segment) and Thal (\(\beta\)-Thalassemia cardiomyopathy) features were imputed based on all features in the dataset.
##
## downsloping flat upsloping
## 48 310 173
##
## fixed defect normal reversible defect
## 39 187 174
##
## downsloping flat upsloping
## 48 370 322
##
## fixed defect normal reversible defect
## 39 327 374
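The imputation code is not shown in this report; the sketch below illustrates one way the model-based imputation described above could be carried out with mlr, using a classification tree as the imputation learner (an assumption for illustration, not necessarily the method used here).

library(mlr)

# Predict missing Slope and Thal values from the other descriptive features
imp <- impute(
  data,
  target = "Goal",   # exclude the target from the imputation models
  cols = list(
    Slope = imputeLearner(makeLearner("classif.rpart")),
    Thal  = imputeLearner(makeLearner("classif.rpart"))
  )
)
data <- imp$data   # imp$desc stores the imputation description for reuse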
| Age | Gender | CP | Trestbps | Chol | FBS | RestECG | Thalach | Exang | Oldpeak | Slope | Thal | Goal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | male | typical angina | 145 | 233 | 1 | hypertropy | 150 | no | 2.3 | downsloping | fixed defect | No Disease |
| 67 | male | asymptomatic | 160 | 286 | 0 | hypertropy | 108 | yes | 1.5 | flat | normal | Heart Disease |
| 67 | male | asymptomatic | 120 | 229 | 0 | hypertropy | 129 | yes | 2.6 | flat | reversible defect | Heart Disease |
| 37 | male | non-anginal pain | 130 | 250 | 0 | normal | 187 | no | 3.5 | downsloping | normal | No Disease |
| 41 | female | atypical angina | 130 | 204 | 0 | hypertropy | 172 | no | 1.4 | upsloping | normal | No Disease |
| 56 | male | atypical angina | 120 | 236 | 0 | normal | 178 | no | 0.8 | upsloping | normal | No Disease |
Most of the imputed values for Slope were assigned to ‘flat’ and ‘upsloping’, while the imputed values for Thal were assigned to ‘normal’ and ‘reversible defect’.
The histogram and QQ-plot for the ‘ST Depression’ (Oldpeak) feature show that it is slightly right skewed (Figs. 2 and 3). It was not possible to apply a Box-Cox transformation since the feature contains several negative values. Therefore, we tested a number of different transformations to determine which produced a more normal distribution of the data (Fig. 3).
Figure 2. QQ-plot of data for the ST Depression feature. The plot indicates that the data is right skewed.
Figure 3. Histogram and box plots illustrating the distribution of data for the ST Depression feature before (left) and after (right) transformation with a cube root algorithm.
Figure 4. QQ-plot of data for the ST Depression feature following transformation with a cube root algorithm. The plot indicates that the data are more normally distributed following the transformation.
Cube root transformation of the data led to a more normal distribution, although several outliers remained apparent in the boxplot of the transformed data (Figs. 3 and 4).
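A signed cube root keeps the sign of the negative Oldpeak values (which rule out Box-Cox) while compressing the right tail; a minimal sketch of the transformation is shown below.

# Signed cube root transformation of ST Depression (Oldpeak)
cube_root <- function(x) sign(x) * abs(x)^(1/3)
data$Oldpeak <- cube_root(data$Oldpeak)

# Quick visual checks of the transformed distribution
hist(data$Oldpeak, main = "Cube root of ST Depression", xlab = "Oldpeak^(1/3)")
qqnorm(data$Oldpeak); qqline(data$Oldpeak)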
Figure 5. Scatter matrix for numerical features showing the distribution of data for pairwise combinations of features.
We examined the scatter matrix of numerical features to identify possible correlations between features (Fig. 5). There appeared to be a small but noticeable correlation between patient age (Age) and Oldpeak (ST depression induced by exercise relative to rest). Differences in the joint distributions were also observed for Age versus Oldpeak and for Trestbps versus Thalach.
The data were standardized to compensate for differences in the scales used for various features.
## [1] -8.969575e-18
## [1] 1
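The zero mean and unit standard deviation printed above can be obtained by standardising the numeric columns, for example with mlr's normalizeFeatures (one possible implementation, not necessarily the one used here).

library(mlr)

# Centre and scale all numeric features; the target Goal is left untouched
data <- normalizeFeatures(data, target = "Goal", method = "standardize")

# Verify for one column: mean approximately 0 and sd equal to 1
mean(data$Age); sd(data$Age)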
Several classifiers (e.g. kNN) can only handle numerical inputs, so each categorical column with n factor levels was encoded as n − 1 dummy variables.
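A minimal sketch of this encoding using mlr's createDummyFeatures with reference coding is given below (assumed; model.matrix would give an equivalent result).

library(mlr)

# "reference" coding drops one level per factor, giving n - 1 dummy columns each
data <- createDummyFeatures(data, target = "Goal", method = "reference")
str(data)   # all descriptive features are now numeric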
The full dataset is a combination of several smaller datasets obtained from hospitals in different countries. To avoid ordering biases arising from these sources (e.g. differences in the tests conducted and/or in the interpretation of test results), the rows of the dataset were shuffled prior to splitting into training and test sets.
##
## 0 1
## 0.5366795 0.4633205
##
## 0 1
## 0.472973 0.527027
The resulting training and test sets were reasonably balanced and representative of the full dataset. We used the training data for modelling and the test data for model evaluation.
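A minimal sketch of a 70/30 split preserving the class proportions reported above, using caret (which is loaded elsewhere in this report); the seed and object names are illustrative.

library(caret)

set.seed(123)   # illustrative seed for reproducibility
# createDataPartition samples within each class, preserving the Goal proportions
train_idx <- createDataPartition(factor(data$Goal), p = 0.7, list = FALSE)

train_data <- data[train_idx, ]
test_data  <- data[-train_idx, ]

prop.table(table(train_data$Goal))
prop.table(table(test_data$Goal))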
The data were initially modeled using all 17 features.
## Type len Def Constr Req Tunable Trafo
## laplace numeric - 0 0 to Inf - TRUE -
## Type len Def Constr Req Tunable Trafo
## ntree integer - 500 1 to Inf - TRUE -
## mtry integer - - 1 to Inf - TRUE -
## replace logical - TRUE - - TRUE -
## classwt numericvector <NA> - 0 to Inf - TRUE -
## cutoff numericvector <NA> - 0 to 1 - TRUE -
## strata untyped - - - - FALSE -
## sampsize integervector <NA> - 1 to Inf - TRUE -
## nodesize integer - 1 1 to Inf - TRUE -
## maxnodes integer - - 1 to Inf - TRUE -
## importance logical - FALSE - - TRUE -
## localImp logical - FALSE - - TRUE -
## proximity logical - FALSE - - FALSE -
## oob.prox logical - - - Y FALSE -
## norm.votes logical - TRUE - - FALSE -
## do.trace logical - FALSE - - FALSE -
## keep.forest logical - TRUE - - FALSE -
## keep.inbag logical - FALSE - - FALSE -
## Type len Def Constr Req
## k integer - 7 1 to Inf -
## distance numeric - 2 0 to Inf -
## kernel discrete - optimal rectangular,triangular,epanechnikov,b... -
## scale logical - TRUE - -
## Tunable Trafo
## k TRUE -
## distance TRUE -
## kernel TRUE -
## scale TRUE -
For Naive Bayes, we fine-tuned the Laplace parameter, testing values between 0 and 30.
For random forest, we fine-tuned mtry (the number of variables randomly sampled as candidates at each split), testing values of 2, 3 and 4 (Breiman 2001). For kNN, we fine-tuned \(k\), testing values from 2 to 20.
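A minimal sketch of how these grid searches could be set up in mlr, using the stratified five-fold cross-validation described in Section 2; classif.task is the task name used later in this report and the remaining object names are illustrative.

library(mlr)

# Stratified 5-fold CV reused for every grid search
rdesc <- makeResampleDesc("CV", iters = 5L, stratify = TRUE)
ctrl  <- makeTuneControlGrid()

# Naive Bayes: Laplace smoothing parameter on [0, 30]
nb_tune <- tuneParams(makeLearner("classif.naiveBayes", predict.type = "prob"),
                      task = classif.task, resampling = rdesc,
                      par.set = makeParamSet(makeNumericParam("laplace", lower = 0, upper = 30)),
                      control = ctrl, measures = mmce)

# Random forest: mtry in {2, 3, 4}
rf_tune <- tuneParams(makeLearner("classif.randomForest", predict.type = "prob"),
                      task = classif.task, resampling = rdesc,
                      par.set = makeParamSet(makeDiscreteParam("mtry", values = 2:4)),
                      control = ctrl, measures = mmce)

# kNN: k from 2 to 20
knn_tune <- tuneParams(makeLearner("classif.kknn", predict.type = "prob"),
                       task = classif.task, resampling = rdesc,
                       par.set = makeParamSet(makeDiscreteParam("k", values = 2:20)),
                       control = ctrl, measures = mmce)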
## [Tune] Started tuning learner classif.naiveBayes for parameter set:
## Type len Def Constr Req Tunable Trafo
## laplace numeric - - 0 to 30 - TRUE -
## With control class: TuneControlGrid
## Imputation value: 1
## [Tune-x] 1: laplace=0
## [Tune-y] 1: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 2: laplace=3.33
## [Tune-y] 2: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 3: laplace=6.67
## [Tune-y] 3: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 4: laplace=10
## [Tune-y] 4: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 5: laplace=13.3
## [Tune-y] 5: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 6: laplace=16.7
## [Tune-y] 6: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 7: laplace=20
## [Tune-y] 7: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 8: laplace=23.3
## [Tune-y] 8: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 9: laplace=26.7
## [Tune-y] 9: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 10: laplace=30
## [Tune-y] 10: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune] Result: laplace=0 : mmce.test.mean=0.2065347
## [Tune] Started tuning learner classif.randomForest for parameter set:
## Type len Def Constr Req Tunable Trafo
## mtry discrete - - 2,3,4 - TRUE -
## With control class: TuneControlGrid
## Imputation value: 1
## [Tune-x] 1: mtry=2
## [Tune-y] 1: mmce.test.mean=0.1949216; time: 0.0 min
## [Tune-x] 2: mtry=3
## [Tune-y] 2: mmce.test.mean=0.1910941; time: 0.0 min
## [Tune-x] 3: mtry=4
## [Tune-y] 3: mmce.test.mean=0.1988051; time: 0.0 min
## [Tune] Result: mtry=3 : mmce.test.mean=0.1910941
## [Tune] Started tuning learner classif.kknn for parameter set:
## Type len Def Constr Req Tunable
## k discrete - - 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,... - TRUE
## Trafo
## k -
## With control class: TuneControlGrid
## Imputation value: 1
## [Tune-x] 1: k=2
## [Tune-y] 1: mmce.test.mean=0.2336072; time: 0.0 min
## [Tune-x] 2: k=3
## [Tune-y] 2: mmce.test.mean=0.2336072; time: 0.0 min
## [Tune-x] 3: k=4
## [Tune-y] 3: mmce.test.mean=0.2336072; time: 0.0 min
## [Tune-x] 4: k=5
## [Tune-y] 4: mmce.test.mean=0.2085138; time: 0.0 min
## [Tune-x] 5: k=6
## [Tune-y] 5: mmce.test.mean=0.2104742; time: 0.0 min
## [Tune-x] 6: k=7
## [Tune-y] 6: mmce.test.mean=0.2162435; time: 0.0 min
## [Tune-x] 7: k=8
## [Tune-y] 7: mmce.test.mean=0.2162435; time: 0.0 min
## [Tune-x] 8: k=9
## [Tune-y] 8: mmce.test.mean=0.2084951; time: 0.0 min
## [Tune-x] 9: k=10
## [Tune-y] 9: mmce.test.mean=0.2046490; time: 0.0 min
## [Tune-x] 10: k=11
## [Tune-y] 10: mmce.test.mean=0.2027259; time: 0.0 min
## [Tune-x] 11: k=12
## [Tune-y] 11: mmce.test.mean=0.2027259; time: 0.0 min
## [Tune-x] 12: k=13
## [Tune-y] 12: mmce.test.mean=0.1969193; time: 0.0 min
## [Tune-x] 13: k=14
## [Tune-y] 13: mmce.test.mean=0.2007842; time: 0.0 min
## [Tune-x] 14: k=15
## [Tune-y] 14: mmce.test.mean=0.2027072; time: 0.0 min
## [Tune-x] 15: k=16
## [Tune-y] 15: mmce.test.mean=0.2007842; time: 0.0 min
## [Tune-x] 16: k=17
## [Tune-y] 16: mmce.test.mean=0.1949963; time: 0.0 min
## [Tune-x] 17: k=18
## [Tune-y] 17: mmce.test.mean=0.1969380; time: 0.0 min
## [Tune-x] 18: k=19
## [Tune-y] 18: mmce.test.mean=0.1930732; time: 0.0 min
## [Tune-x] 19: k=20
## [Tune-y] 19: mmce.test.mean=0.1969380; time: 0.0 min
## [Tune] Result: k=19 : mmce.test.mean=0.1930732
Figure 6. Threshold optimisation plots for the kNN, Naive Bayes and Random Forest classifiers trained on all 17 features.
Figure 7. AUC plot for the kNN, Naive Bayes and Random Forest classifiers trained on all 17 features.
The AUC plots were similar for the kNN, Naive Bayes and Random Forest classifiers trained on all 17 features (Fig. 7).
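A minimal sketch of how ROC curves such as Figure 7 and the performance figures in the table below can be produced in mlr; the prediction object names are placeholders for the test-set predictions of each classifier.

library(mlr)

# Threshold-vs-performance data for the three test-set prediction objects
roc_df <- generateThreshVsPerfData(
  list(NaiveBayes = pred_test_nb, RandomForest = pred_test_rf, kNN = pred_test_knn),
  measures = list(fpr, tpr)
)
plotROCCurves(roc_df)

# Scalar summaries (mmce and AUC) for one classifier
performance(pred_test_knn, measures = list(mmce, auc))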
| | Naive Bayes | Random Forest | kNN |
|---|---|---|---|
| mmce | 0.1846847 | 0.1891892 | 0.1711712 |
| auc | 0.8889703 | 0.8832316 | 0.8903134 |
The k-Nearest Neighbours classifier performed the best when the models were trained on all 17 features, with a mean misclassification error of 0.171 and an AUC value of 0.890 (Table 4), while the Naive Bayes classifier performed slightly better than the random forest classifier on both measures.
## predicted
## true 0 1
## 0 79 26 tpr: 0.75 fnr: 0.25
## 1 15 102 fpr: 0.13 tnr: 0.87
## ppv: 0.84 for: 0.2 lrp: 5.87 acc: 0.82
## fdr: 0.16 npv: 0.8 lrm: 0.28 dor: 20.66
##
##
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio
## predicted
## true 0 1
## 0 88 17 tpr: 0.84 fnr: 0.16
## 1 25 92 fpr: 0.21 tnr: 0.79
## ppv: 0.78 for: 0.16 lrp: 3.92 acc: 0.81
## fdr: 0.22 npv: 0.84 lrm: 0.21 dor: 19.05
## predicted
## true 0 1
## 0 88 17 tpr: 0.84 fnr: 0.16
## 1 21 96 fpr: 0.18 tnr: 0.82
## ppv: 0.81 for: 0.15 lrp: 4.67 acc: 0.83
## fdr: 0.19 npv: 0.85 lrm: 0.2 dor: 23.66
Various feature selection algorithms were then used to determine whether classifier performance could be improved by using a subset of relevant features. Features were initially selected using the spFSR feature selection package (https://cran.r-project.org/web/packages/spFSR/vignettes/spFSR.html) (Aksakalli and Malekipirbazari 2016).
## [1] 18
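A minimal sketch of an SPSA-FSR run with the spFSR package, of the kind logged below; the wrapped learner settings are taken from the tuned kNN above (k = 20) and the remaining arguments are left at illustrative values.

library(mlr)
library(spFSR)

# Wrap the tuned kNN learner; SPSA-FSR searches for a feature subset that
# maximises cross-validated AUC
knn_learner <- makeLearner("classif.kknn", predict.type = "prob", k = 20)

spsaMod_kNN <- spFeatureSelection(task = classif.task,
                                  wrapper = knn_learner,
                                  measure = auc,
                                  num.features.selected = 0)   # 0 = let SPSA-FSR choose

getImportance(spsaMod_kNN)          # importance scores of the selected features
spFSR::plotImportance(spsaMod_kNN)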
Figure 8. Measurement of kNN classifier performance with each iteration.
The plot shows the global optimum and indicates that the best performance was obtained in less than 5 iterations.
Figure 9. kNN classifier performance for various values of k (nearest neighbours value).
## $k
## [1] 20
## mmce.test.mean
## 0.1891337
The optimal value of k for k-Nearest Neighbours was 20, with a mean misclassification error of 0.189.
## Learner classif.kknn from package kknn
## Type: classif
## Name: k-Nearest Neighbor; Short name: kknn
## Class: classif.kknn
## Properties: twoclass,multiclass,numerics,factors,prob
## Predict-Type: prob
## Hyperparameters: k=20
## SPSA-FSR begins:
## Wrapper = kknn
## Measure = auc
## Number of selected features = 0
##
## iter value st.dev num.ft best.value
## 1 0.84831 0.0386 10 0.84831 *
## 2 0.85045 0.03912 10 0.85045 *
## 3 0.84892 0.0329 9 0.85045
## 4 0.87116 0.03408 14 0.87116 *
## 5 0.8711 0.04188 13 0.87116
## 6 0.86768 0.0489 14 0.87116
## 7 0.87128 0.04088 15 0.87128 *
## 8 0.86703 0.02838 15 0.87128
## 9 0.88086 0.02832 15 0.88086 *
## 10 0.87785 0.03425 16 0.88086
## 11 0.8835 0.04131 16 0.8835 *
## 12 0.87788 0.03144 16 0.8835
## 13 0.8834 0.02277 16 0.8835
## 14 0.88251 0.02726 16 0.8835
## 15 0.88391 0.03351 16 0.88391 *
## 16 0.88336 0.02945 16 0.88391
## 17 0.88308 0.02628 16 0.88391
## 18 0.87488 0.03567 16 0.88391
## 19 0.87972 0.0272 16 0.88391
## 20 0.88147 0.01924 15 0.88391
## 21 0.88332 0.03599 15 0.88391
## 22 0.88397 0.02405 15 0.88397 *
## 23 0.88262 0.03212 15 0.88397
## 24 0.88081 0.03267 15 0.88397
## 25 0.87858 0.03862 15 0.88397
## 26 0.87614 0.03591 14 0.88397
## 27 0.88554 0.02922 14 0.88554 *
## 28 0.88435 0.03152 14 0.88554
## 29 0.88206 0.03217 14 0.88554
## 30 0.88248 0.03499 14 0.88554
## 31 0.88521 0.02602 14 0.88554
## 32 0.88237 0.02018 15 0.88554
## 33 0.88477 0.02616 15 0.88554
## 34 0.88343 0.01858 15 0.88554
## 35 0.88221 0.0382 15 0.88554
## 36 0.88076 0.0348 15 0.88554
## 37 0.89088 0.02396 13 0.89088 *
## 38 0.88535 0.02089 13 0.89088
## 39 0.88379 0.04471 14 0.89088
## 40 0.88274 0.02205 13 0.89088
## 41 0.88986 0.02839 13 0.89088
## 42 0.88507 0.02033 13 0.89088
## 43 0.88561 0.03659 13 0.89088
## 44 0.88304 0.03216 14 0.89088
## 45 0.8674 0.04019 11 0.89088
## 46 0.88801 0.04103 13 0.89088
## 47 0.88685 0.02698 14 0.89088
## 48 0.88564 0.0407 13 0.89088
## 49 0.8847 0.0281 13 0.89088
## 50 0.88606 0.03383 13 0.89088
## 51 0.88411 0.02569 13 0.89088
## 52 0.88402 0.03104 13 0.89088
## 53 0.88582 0.03655 13 0.89088
## 54 0.88753 0.03518 13 0.89088
## 55 0.88829 0.03083 13 0.89088
## 56 0.88744 0.03009 13 0.89088
## 57 0.88623 0.03301 13 0.89088
##
## Best iteration = 37
## Number of selected features = 13
## Best measure value = 0.89088
## Std. dev. of best measure = 0.02396
## Run time = 1.94 minutes.
## SPSA-FSR begins:
## Wrapper = kknn
## Measure = auc
## Number of selected features = 0
##
## iter value st.dev num.ft best.value
## 1 0.82636 0.03098 7 0.82636 *
## 2 0.85019 0.03777 14 0.85019 *
## 3 0.85066 0.03541 14 0.85066 *
## 4 0.84673 0.0315 14 0.85066
## 5 0.8519 0.02455 13 0.8519 *
## 6 0.84519 0.03769 15 0.8519
## 7 0.84752 0.04447 12 0.8519
## 8 0.85007 0.02717 12 0.8519
## 9 0.84774 0.02711 12 0.8519
## 10 0.84297 0.03564 14 0.8519
## 11 0.8581 0.0399 14 0.8581 *
## 12 0.8489 0.04299 14 0.8581
## 13 0.85235 0.03707 15 0.8581
## 14 0.84298 0.03577 15 0.8581
## 15 0.85267 0.02511 15 0.8581
## 16 0.85054 0.04824 15 0.8581
## 17 0.84493 0.02327 15 0.8581
## 18 0.84545 0.03688 15 0.8581
## 19 0.84181 0.02811 15 0.8581
## 20 0.85658 0.03847 15 0.8581
## 21 0.85719 0.02556 16 0.8581
## 22 0.85798 0.04043 16 0.8581
## 23 0.86486 0.03935 16 0.86486 *
## 24 0.86621 0.03321 15 0.86621 *
## 25 0.86392 0.03206 16 0.86621
## 26 0.85959 0.03544 16 0.86621
## 27 0.85792 0.02921 16 0.86621
## 28 0.86342 0.03107 15 0.86621
## 29 0.86362 0.03867 15 0.86621
## 30 0.87021 0.03082 15 0.87021 *
## 31 0.86098 0.03869 15 0.87021
## 32 0.86106 0.04692 15 0.87021
## 33 0.86138 0.03032 17 0.87021
## 34 0.86232 0.03679 15 0.87021
## 35 0.86744 0.02741 16 0.87021
## 36 0.8606 0.02932 15 0.87021
## 37 0.86306 0.03498 15 0.87021
## 38 0.86676 0.02865 15 0.87021
## 39 0.86185 0.02067 15 0.87021
## 40 0.86579 0.03168 15 0.87021
## 41 0.87107 0.03085 16 0.87107 *
## 42 0.86743 0.03407 16 0.87107
## 43 0.86444 0.0218 16 0.87107
## 44 0.8648 0.04288 15 0.87107
## 45 0.86293 0.02487 16 0.87107
## 46 0.86843 0.02907 16 0.87107
## 47 0.86944 0.03211 16 0.87107
## 48 0.864 0.02805 16 0.87107
## 49 0.86673 0.03224 16 0.87107
## 50 0.86923 0.02595 16 0.87107
## 51 0.86136 0.0474 16 0.87107
## 52 0.86106 0.03793 16 0.87107
## 53 0.86585 0.03256 17 0.87107
## 54 0.86247 0.03264 17 0.87107
## 55 0.87866 0.0466 16 0.87866 *
## 56 0.86805 0.0243 15 0.87866
## 57 0.87018 0.02627 16 0.87866
## 58 0.86824 0.03282 15 0.87866
## 59 0.86954 0.03391 15 0.87866
## 60 0.87084 0.02534 15 0.87866
## 61 0.86602 0.03214 15 0.87866
## 62 0.8375 0.0194 13 0.87866
## 63 0.84482 0.02915 13 0.87866
## 64 0.83977 0.03551 13 0.87866
## 65 0.86942 0.027 13 0.87866
## 66 0.86123 0.02637 15 0.87866
## 67 0.8612 0.03579 15 0.87866
## 68 0.86241 0.03492 16 0.87866
## 69 0.86966 0.03029 16 0.87866
## 70 0.8669 0.03697 15 0.87866
## 71 0.86324 0.0337 15 0.87866
## 72 0.87115 0.03101 14 0.87866
## 73 0.8626 0.04298 14 0.87866
## 74 0.86201 0.02229 14 0.87866
## 75 0.86425 0.03348 14 0.87866
##
## Best iteration = 55
## Number of selected features = 16
## Best measure value = 0.87866
## Std. dev. of best measure = 0.0466
## Run time = 2.75 minutes.
Figure 10. Scatter plots of mean accuracy rate for each iteration for the kNN classifier (left panel) with and (right panel) without hyperparameter tuning. Error bars of +/- 1 standard deviation are also shown.
| Features | Importance | Features | Importance |
|---|---|---|---|
| FBS | 0.60206 | CP.atypical.angina | 0.64466 |
| CP.typical.angina | 0.59768 | RestECG.normal | 0.61690 |
| Chol | 0.57066 | Oldpeak | 0.61149 |
| Thal.normal | 0.56516 | Thal.normal | 0.60425 |
| Gender.male | 0.56237 | Gender.male | 0.60006 |
| Trestbps | 0.55991 | CP.non.anginal.pain | 0.59784 |
| Oldpeak | 0.55713 | Slope.upsloping | 0.59186 |
| CP.atypical.angina | 0.55564 | CP.typical.angina | 0.58580 |
| Age | 0.55564 | RestECG.ST.T.wave.abnormality | 0.57317 |
| RestECG.normal | 0.54773 | FBS | 0.55879 |
The top 10 features selected by the spFSR algorithm for the kNN classifier with and without tuned hyperparameters were similar (Table 3). However, four of these features differed, and the order of importance depended markedly on whether the classifier hyperparameters had been optimised.
Figure 11. Features selected by the spFSR algorithm fused with the kNN classifier, ranked according to importance.
spsa_kNN_1 <- spFSR::plotImportance(spsaMod_kNN_tuned)
Figure 12. Features selected by the spFSR algorithm fused with the kNN classifier with tuned hyperparameters, ranked according to importance.
Figure 13. AUC plot for the kNN classifier. kNN - kNN trained on all 17 features; kNN_spsa - kNN classifier trained on features selected using the spFSR algorithm; kNN_spsa_tuned - kNN classifier with tuned hyperparameters and trained on features selected using the spFSR algorithm.
The AUC plots were similar for the kNN classifier with and without optimisation of the hyperparameters and with or without spFSR feature selection (Fig. 13).
| | kNN | kNN_spsa | kNN_spsa_tuned |
|---|---|---|---|
| mmce | 0.1666667 | 0.2432432 | 0.2297297 |
| auc | 0.8919821 | 0.8698413 | 0.8553114 |
The kNN classifier appeared to perform better without hyperparameter optimisation or spFSR feature selection, with a mean misclassification error of 0.167 and an AUC value of 0.892 (Table 6).
## predicted
## true 0 1
## 0 89 16 tpr: 0.85 fnr: 0.15
## 1 21 96 fpr: 0.18 tnr: 0.82
## ppv: 0.81 for: 0.14 lrp: 4.72 acc: 0.83
## fdr: 0.19 npv: 0.86 lrm: 0.19 dor: 25.43
## predicted
## true 0 1
## 0 81 24 tpr: 0.77 fnr: 0.23
## 1 30 87 fpr: 0.26 tnr: 0.74
## ppv: 0.73 for: 0.22 lrp: 3.01 acc: 0.76
## fdr: 0.27 npv: 0.78 lrm: 0.31 dor: 9.79
## predicted
## true 0 1
## 0 88 17 tpr: 0.84 fnr: 0.16
## 1 34 83 fpr: 0.29 tnr: 0.71
## ppv: 0.72 for: 0.17 lrp: 2.88 acc: 0.77
## fdr: 0.28 npv: 0.83 lrm: 0.23 dor: 12.64
## [1] 18
Figure 14. Random forest classifier performance at each iteration; the best performance was obtained in fewer than 20 iterations.
Figure 15. Optimisation of mtry hyperparameter for the random forest classifier.
## $mtry
## [1] 3
## mmce.test.mean fpr.test.mean tpr.test.mean
## 0.1832898 0.2416667 0.8669481
The optimal value of mtry was 3, with a mean misclassification error of 0.183.
## SPSA-FSR begins:
## Wrapper = rf
## Measure = auc
## Number of selected features = 0
##
## iter value st.dev num.ft best.value
## 1 0.86272 0.0379 9 0.86272 *
## 2 0.86115 0.02728 10 0.86272
## 3 0.85701 0.02322 10 0.86272
## 4 0.86609 0.04009 9 0.86609 *
## 5 0.8605 0.01941 10 0.86609
## 6 0.85961 0.03981 11 0.86609
## 7 0.86795 0.03048 11 0.86795 *
## 8 0.86138 0.04024 10 0.86795
## 9 0.85896 0.0474 8 0.86795
## 10 0.86695 0.02844 12 0.86795
## 11 0.8633 0.03962 10 0.86795
## 12 0.86299 0.03528 11 0.86795
## 13 0.87491 0.04411 14 0.87491 *
## 14 0.88292 0.02198 12 0.88292 *
## 15 0.8816 0.02685 12 0.88292
## 16 0.87992 0.02055 12 0.88292
## 17 0.87133 0.03382 14 0.88292
## 18 0.88289 0.03541 13 0.88292
## 19 0.88868 0.03208 11 0.88868 *
## 20 0.88323 0.02402 9 0.88868
## 21 0.85849 0.03453 8 0.88868
## 22 0.87945 0.03793 10 0.88868
## 23 0.88755 0.02895 10 0.88868
## 24 0.8885 0.04092 10 0.88868
## 25 0.88434 0.03136 10 0.88868
## 26 0.88614 0.03342 10 0.88868
## 27 0.87897 0.03294 10 0.88868
## 28 0.88385 0.02315 10 0.88868
## 29 0.88283 0.03224 10 0.88868
## 30 0.88485 0.021 10 0.88868
## 31 0.88695 0.02621 10 0.88868
## 32 0.88595 0.02565 10 0.88868
## 33 0.88412 0.03572 10 0.88868
## 34 0.88482 0.03888 11 0.88868
## 35 0.8861 0.04683 11 0.88868
## 36 0.89087 0.02724 11 0.89087 *
## 37 0.88807 0.02869 11 0.89087
## 38 0.88831 0.04513 11 0.89087
## 39 0.88503 0.03356 11 0.89087
## 40 0.88332 0.02746 11 0.89087
## 41 0.89081 0.02688 11 0.89087
## 42 0.8839 0.02367 11 0.89087
## 43 0.88434 0.02563 11 0.89087
## 44 0.88813 0.0299 11 0.89087
## 45 0.89083 0.02874 11 0.89087
## 46 0.89094 0.04196 11 0.89094 *
## 47 0.89177 0.03083 11 0.89177 *
## 48 0.8912 0.03144 11 0.89177
## 49 0.89115 0.02317 11 0.89177
## 50 0.88551 0.02194 11 0.89177
## 51 0.88624 0.02697 11 0.89177
## 52 0.88715 0.04395 11 0.89177
## 53 0.89193 0.02504 11 0.89193 *
## 54 0.88365 0.02413 11 0.89193
## 55 0.88662 0.0264 11 0.89193
## 56 0.88724 0.02753 11 0.89193
## 57 0.87481 0.03371 11 0.89193
## 58 0.88056 0.03348 11 0.89193
## 59 0.88299 0.03959 12 0.89193
## 60 0.88644 0.02822 12 0.89193
## 61 0.87988 0.03305 11 0.89193
## 62 0.88789 0.02656 11 0.89193
## 63 0.89066 0.03677 11 0.89193
## 64 0.8841 0.04052 11 0.89193
## 65 0.88499 0.04131 11 0.89193
## 66 0.88205 0.04338 11 0.89193
## 67 0.8834 0.02699 11 0.89193
## 68 0.88533 0.02052 10 0.89193
## 69 0.88908 0.0302 10 0.89193
## 70 0.888 0.02186 10 0.89193
## 71 0.88535 0.02858 10 0.89193
## 72 0.88556 0.02369 10 0.89193
## 73 0.88224 0.0271 10 0.89193
##
## Best iteration = 53
## Number of selected features = 11
## Best measure value = 0.89193
## Std. dev. of best measure = 0.02504
## Run time = 14.36 minutes.
## SPSA-FSR begins:
## Wrapper = rf
## Measure = auc
## Number of selected features = 0
##
## iter value st.dev num.ft best.value
## 1 0.8649 0.03696 9 0.8649 *
## 2 0.8718 0.03475 12 0.8718 *
## 3 0.868 0.02671 11 0.8718
## 4 0.87638 0.03255 10 0.87638 *
## 5 0.86499 0.0383 9 0.87638
## 6 0.86413 0.03327 8 0.87638
## 7 0.88665 0.02856 12 0.88665 *
## 8 0.88625 0.01705 12 0.88665
## 9 0.8672 0.03854 10 0.88665
## 10 0.87854 0.03744 14 0.88665
## 11 0.88368 0.03347 12 0.88665
## 12 0.88164 0.0404 12 0.88665
## 13 0.88081 0.03335 12 0.88665
## 14 0.88263 0.02778 12 0.88665
## 15 0.88477 0.04211 12 0.88665
## 16 0.88378 0.0286 12 0.88665
## 17 0.88649 0.03481 12 0.88665
## 18 0.88033 0.03507 12 0.88665
## 19 0.88801 0.03302 12 0.88801 *
## 20 0.88543 0.02057 12 0.88801
## 21 0.8822 0.03562 12 0.88801
## 22 0.88829 0.03345 12 0.88829 *
## 23 0.88407 0.03459 12 0.88829
## 24 0.8837 0.02414 12 0.88829
## 25 0.88454 0.01784 12 0.88829
## 26 0.87858 0.04016 12 0.88829
## 27 0.8888 0.02776 12 0.8888 *
## 28 0.8845 0.0399 12 0.8888
## 29 0.885 0.02445 12 0.8888
## 30 0.88197 0.03706 12 0.8888
## 31 0.88549 0.02919 12 0.8888
## 32 0.88436 0.03498 12 0.8888
## 33 0.88245 0.02972 12 0.8888
## 34 0.88484 0.0286 12 0.8888
## 35 0.87648 0.0446 12 0.8888
## 36 0.88215 0.0259 12 0.8888
## 37 0.88468 0.02487 12 0.8888
## 38 0.88321 0.03639 12 0.8888
## 39 0.87934 0.03124 12 0.8888
## 40 0.88549 0.03395 12 0.8888
## 41 0.88052 0.02714 12 0.8888
## 42 0.89161 0.02642 11 0.89161 *
## 43 0.8916 0.03043 11 0.89161
## 44 0.88838 0.02369 11 0.89161
## 45 0.88744 0.03543 11 0.89161
## 46 0.88931 0.02391 11 0.89161
## 47 0.88411 0.03539 11 0.89161
## 48 0.88612 0.02604 11 0.89161
## 49 0.89115 0.02975 11 0.89161
## 50 0.88347 0.03438 11 0.89161
## 51 0.89053 0.02372 11 0.89161
## 52 0.89066 0.03223 11 0.89161
## 53 0.88373 0.04086 11 0.89161
## 54 0.89195 0.02719 11 0.89195 *
## 55 0.89038 0.0357 11 0.89195
## 56 0.88865 0.04041 11 0.89195
## 57 0.88921 0.03291 11 0.89195
## 58 0.88366 0.03933 11 0.89195
## 59 0.89088 0.03154 11 0.89195
## 60 0.88678 0.02984 11 0.89195
## 61 0.88801 0.02385 11 0.89195
## 62 0.89269 0.03181 11 0.89269 *
## 63 0.88914 0.0287 11 0.89269
## 64 0.88966 0.02655 11 0.89269
## 65 0.88857 0.03153 12 0.89269
## 66 0.8864 0.03088 12 0.89269
## 67 0.88233 0.03154 11 0.89269
## 68 0.88283 0.02783 11 0.89269
## 69 0.88734 0.03079 11 0.89269
## 70 0.88569 0.03236 11 0.89269
## 71 0.88855 0.01847 11 0.89269
## 72 0.88538 0.03167 11 0.89269
## 73 0.88669 0.03532 11 0.89269
## 74 0.88822 0.02176 11 0.89269
## 75 0.88294 0.04056 11 0.89269
## 76 0.88692 0.02634 11 0.89269
## 77 0.891 0.03291 11 0.89269
## 78 0.88765 0.02195 11 0.89269
## 79 0.88675 0.02782 11 0.89269
## 80 0.88596 0.02435 11 0.89269
## 81 0.88209 0.03122 11 0.89269
## 82 0.88914 0.03077 11 0.89269
##
## Best iteration = 62
## Number of selected features = 11
## Best measure value = 0.89269
## Std. dev. of best measure = 0.03181
## Run time = 16 minutes.
Figure 16. Scatter plots of mean accuracy rate for each iteration for the random forest classifier (left panel) with and (right panel) without hyperparameter tuning. Error bars of +/- 1 standard deviation are also shown.
| Features | Importance | Features | Importance |
|---|---|---|---|
| CP.non.anginal.pain | 0.86828 | Slope.upsloping | 0.67339 |
| Oldpeak | 0.77995 | CP.atypical.angina | 0.61807 |
| Exang | 0.74931 | Oldpeak | 0.61475 |
| Thal.normal | 0.72170 | CP.typical.angina | 0.59232 |
| CP.atypical.angina | 0.71092 | Thal.normal | 0.58233 |
| RestECG.ST.T.wave.abnormality | 0.62218 | CP.non.anginal.pain | 0.57465 |
| RestECG.normal | 0.61559 | FBS | 0.56930 |
| Gender.male | 0.61351 | Chol | 0.56317 |
| FBS | 0.58535 | RestECG.normal | 0.55036 |
| Chol | 0.58038 | Gender.male | 0.54852 |
The top 10 features selected using the spFSR algorithm were similar for the random forest classifier with and without hyperparameter optimisation (Table 7).
Figure 17. Features selected by the spFSR algorithm fused with the random forest classifier, ranked according to importance.
spFSR::plotImportance(spsaMod_rf_tuned)
Figure 18. Features selected by the spFSR algorithm fused with the random forest classifier with optimised hyperparameters, ranked according to importance.
Figure 19. AUC plot for the random forest classifier. RF - Random forest classifier trained on all 17 features; RF_spsa - Random forest classifier trained on features selected using the spFSR algorithm; RF_spsa_tuned - Random forest classifier with tuned hyperparameters and trained on features selected using the spFSR algorithm.
RF <- performance(pred_on_test_rf, measures = list(mmce, auc))
RF_spsa <- performance(pred_on_test_spsa_rf, measures = list(mmce, auc))
RF_spsa_tuned <- performance(pred_on_test_spsa_rf_tuned, measures = list(mmce, auc))
data_RF <- data.frame(RF, RF_spsa, RF_spsa_tuned)
kable(data_RF, caption = "Table 8. Performance for the random forest classifier with and without tuned hyperparameters and spFSR feature selection") %>% kable_styling(full_width = F, font_size = 12)
| | RF | RF_spsa | RF_spsa_tuned |
|---|---|---|---|
| mmce | 0.1801802 | 0.1891892 | 0.1846847 |
| auc | 0.8885633 | 0.8763940 | 0.8641026 |
The random forest classifier without spFSR feature selection performed the best, with a mean misclassification error of 0.180 and an AUC value of 0.889 (Table 8).
## predicted
## true 0 1
## 0 91 14 tpr: 0.87 fnr: 0.13
## 1 26 91 fpr: 0.22 tnr: 0.78
## ppv: 0.78 for: 0.13 lrp: 3.9 acc: 0.82
## fdr: 0.22 npv: 0.87 lrm: 0.17 dor: 22.75
## predicted
## true 0 1
## 0 90 15 tpr: 0.86 fnr: 0.14
## 1 27 90 fpr: 0.23 tnr: 0.77
## ppv: 0.77 for: 0.14 lrp: 3.71 acc: 0.81
## fdr: 0.23 npv: 0.86 lrm: 0.19 dor: 20
## predicted
## true 0 1
## 0 89 16 tpr: 0.85 fnr: 0.15
## 1 25 92 fpr: 0.21 tnr: 0.79
## ppv: 0.78 for: 0.15 lrp: 3.97 acc: 0.82
## fdr: 0.22 npv: 0.85 lrm: 0.19 dor: 20.47
Filter methods assign an importance value to each feature. The features are ranked according to these values, and a feature subset is selected. We created an object named mfv by calling generateFilterValuesData from mlr on classif.task, using the filter method randomForest.importance.
## Supervised task: dataHeart
## Type: regr
## Target: Goal
## Observations: 740
## Features:
## numerics factors ordered functionals
## 17 0 0 0
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Has coordinates: FALSE
Figure 20. Features selected using a filter selection algorithm in mlr (https://mlr-org.github.io/mlr/articles/tutorial/devel/feature_selection.html) (Bischl et al. 2016).
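A minimal sketch of the filter-value computation described above; classif.task and the object name mfv follow the text, and plotFilterValues produces a plot like Figure 20.

library(mlr)

# Importance of each feature according to the random forest filter
mfv <- generateFilterValuesData(classif.task, method = "randomForest.importance")
plotFilterValues(mfv)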
We can also compare this with other filter methods (i.e. information.gain and chi.squared) to identify consistencies in the selected features, using the interactive plotFilterValuesGGVIS(FV) function (Fig. 21).
Figure 21. Features selected using a filter selection algorithm in mlr, based on information gain (left panel) and chi squared (right panel) (https://mlr-org.github.io/mlr/articles/tutorial/devel/feature_selection.html) (Bischl et al. 2016).
Across these three filters, normal \(\beta\)-Thalassemia (Thal.normal) was consistently the most important feature, while FBS (fasting blood sugar) was the least important.
We then ‘fused’ the random forest classification learner with the information.gain filter to train the model.
The optimal number of features to keep was determined by 5-fold cross-validation, using information gain as the importance measure and retaining the 10 features with the highest importance. In each resampling iteration, feature selection was carried out on the corresponding training data before fitting the learner.
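A minimal sketch of this fused filter approach: the random forest learner is wrapped with the information.gain filter and the number of retained features is tuned by five-fold cross-validation, mirroring the log below (settings are assumptions where not shown).

library(mlr)

# Fuse the learner with the information.gain filter
rf_filtered <- makeFilterWrapper(
  learner   = makeLearner("classif.randomForest", predict.type = "prob"),
  fw.method = "information.gain"
)

# Tune the number of features kept (a single candidate, 10, as in the log below)
ps  <- makeParamSet(makeDiscreteParam("fw.abs", values = 10))
res <- tuneParams(rf_filtered, task = classif.task,
                  resampling = makeResampleDesc("CV", iters = 5L),
                  par.set = ps, control = makeTuneControlGrid(), measures = mmce)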
## [Tune] Started tuning learner classif.randomForest.filtered for parameter set:
## Type len Def Constr Req Tunable Trafo
## fw.abs discrete - - 10 - TRUE -
## With control class: TuneControlGrid
## Imputation value: 1
## [Tune-x] 1: fw.abs=10
## [Tune-y] 1: mmce.test.mean=0.2202703; time: 0.0 min
## [Tune] Result: fw.abs=10 : mmce.test.mean=0.2202703
The optimal number of features and the corresponding misclassification error were:
## $fw.abs
## [1] 10
## mmce.test.mean
## 0.2202703
We then fused the random forest learner with the chi-squared filter, using fw.perc (the percentage of features to keep), before training the model:
We then applied getFilteredFeatures to the trained model to view the selected features.
We also used a random search with ten iterations to select features for the random forest classifier on classif.task.
## mmce.test.mean
## 0.208507
| Filter | Wrapper |
|---|---|
| Age | Chol |
| Thalach | FBS |
| Oldpeak | CP.typical.angina |
| Gender.male | Exang |
| CP.atypical.angina | Slope.flat |
| Exang | Slope.upsloping |
| Slope.flat | Thal.normal |
| Slope.upsloping | NA |
| Thal.normal | NA |
| Thal.reversible.defect | NA |
The wrapper method selected fewer features, and four of these were also selected by the filter method (Table 9).
Comparing the misclassification error rates, the random-search wrapper method outperformed the chi-squared (filter) method. We then fused the wrapper method into a learner using makeFeatSelWrapper together with makeFeatSelControlRandom and makeResampleDesc objects.
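A minimal sketch of this fused wrapper, followed by the confusion-matrix call mentioned in the next paragraph; the train/test index objects are placeholders.

library(mlr)

# Random-search feature selection (10 iterations) fused into the learner itself
rf_wrapped <- makeFeatSelWrapper(
  learner    = makeLearner("classif.randomForest", predict.type = "prob"),
  resampling = makeResampleDesc("CV", iters = 5L),
  control    = makeFeatSelControlRandom(maxit = 10L)
)

# Train on the training rows and predict on the held-out test rows
mod_rf_wrapped <- train(rf_wrapped, classif.task, subset = train_idx)
pred_on_test   <- predict(mod_rf_wrapped, classif.task, subset = test_idx)

calculateConfusionMatrix(pred_on_test)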
We obtained the confusion matrix by running calculateConfusionMatrix(pred_on_test) and generated the ROC curve.
## predicted
## true 0 1 -err.-
## 0 84 21 21
## 1 25 92 25
## -err.- 25 21 46
Figure 22. AUC plot for the random forest classifier. RF - Random forest classifier trained on all 17 features; RF_spsa_tuned - Random forest classifier with tuned hyperparameters and trained on features selected using the spFSR algorithm; RF_wrapper - Random forest classifier with tuned hyperparameters and trained on features selected using the selectFeatures mlr algorithm.
The AUC plot shows that the random forest classifier with optimised hyperparameters and wrapper-based feature selection performed slightly better than the classifier with spFSR feature selection (RF_spsa_tuned), but not better than the classifier trained on all features.
| | RF_wrapper |
|---|---|
| mmce | 0.2072072 |
| auc | 0.8689052 |
The random forest classifier using the wrapper method for feature selection had a mean misclassification error of 0.207 and an AUC value of 0.869 (Table 10).
## predicted
## true 0 1
## 0 84 21 tpr: 0.8 fnr: 0.2
## 1 25 92 fpr: 0.21 tnr: 0.79
## ppv: 0.77 for: 0.19 lrp: 3.74 acc: 0.79
## fdr: 0.23 npv: 0.81 lrm: 0.25 dor: 14.72
| kNN + spFSR | kNN + spFSR + tuned | RF + spFSR | RF + spFSR + tuned | RF + Filter | RF + Wrapper |
|---|---|---|---|---|---|
| Thal.normal | FBS | CP.atypical.angina | CP.atypical.angina | Age | Trestbps |
| FBS | Oldpeak | Thal.normal | Thal.normal | Thalach | Chol |
| Age | Exang | Chol | Oldpeak | Oldpeak | Gender.male |
| Thalach | Chol | Slope.flat | Chol | Gender.male | CP.atypical.angina |
| RestECG.normal | RestECG.normal | Gender.male | Exang | CP.atypical.angina | CP.typical.angina |
| Oldpeak | CP.typical.angina | CP.non.anginal.pain | FBS | Exang | Thal.normal |
| Chol | RestECG.ST.T.wave.abnormality | Oldpeak | CP.non.anginal.pain | Slope.flat | Thal.reversible.defect |
| CP.non.anginal.pain | CP.non.anginal.pain | CP.typical.angina | Thalach | Slope.upsloping | NA |
| Trestbps | Slope.upsloping | FBS | Slope.upsloping | Thal.normal | NA |
| Gender.male | CP.atypical.angina | Thal.reversible.defect | Gender.male | Thal.reversible.defect | NA |
| Classifier | Optimal Parameters | Feature Selection | mmce | AUC | Precision | Recall |
|---|---|---|---|---|---|---|
| Naïve Bayes | Yes | None | 0.180 | 0.891 | 0.76 | 0.84 |
| Random Forest | Yes | None | 0.176 | 0.886 | 0.87 | 0.78 |
| k-Nearest Neighbours | Yes | None | 0.162 | 0.893 | 0.87 | 0.81 |
| k-Nearest Neighbours | Yes | None | 0.180 | 0.885 | 0.84 | 0.79 |
| k-Nearest Neighbours | No | spFSR | 0.225 | 0.864 | 0.78 | 0.75 |
| k-Nearest Neighbours | Yes | spFSR | 0.243 | 0.840 | 0.80 | 0.72 |
| Random Forest | Yes | None | 0.171 | 0.893 | 0.88 | 0.79 |
| Random Forest | No | spFSR | 0.212 | 0.867 | 0.85 | 0.74 |
| Random Forest | Yes | spFSR | 0.207 | 0.869 | 0.85 | 0.75 |
| Random Forest | Yes | selFeat | 0.216 | 0.860 | 0.72 | 0.80 |
All classifiers predicted individuals with heart disease reasonably well (AUC values > 0.84, precision > 0.70 and recall > 0.70) (Table 12). The kNN classifier trained on all 17 features had the lowest mean misclassification error (0.162).
The kNN and random forest classifiers performed reasonably well at predicting patients with heart disease (precision of up to 88% and recall of up to 81%). Initial results comparing the Naive Bayes, kNN and random forest classifiers indicated that the Naive Bayes classifier did not perform as well as the other classifiers. Training the kNN and random forest classifiers on features selected by the various feature selection algorithms did not significantly improve the performance of these classifiers.
Several of the top ten features selected by the various feature selection algorithms were related to chest pain (CP.atypical.angina, CP.non.anginal.pain and CP.typical.angina) and cholesterol levels (Chol), known indicators of the potential for heart disease. ST depression induced by exercise relative to rest (Oldpeak) was also selected by these algorithms and is an indicator of heart disease. Interestingly, all of the features previously identified (in Phase 1 of this project) as potential predictors of heart disease were selected by these algorithms: patient age (Age), cholesterol level (Chol), ST depression induced by exercise relative to rest (Oldpeak), maximum heart rate (Thalach), slope of the peak exercise ST segment (Slope.upsloping) and exercise-induced angina (Exang). Even so, it is possible that the number of features used in this study (or the number of observations) is not large enough to observe any direct benefit from feature selection.
In future studies, other supervised machine learning algorithms, such as support vector machines (SVM) or ensemble methods, could be tested to determine whether they perform better than those trialled in this study.
The kNN and random forest classifiers performed reasonably well at identifying heart disease patients. However, some improvement is required before these predictors can be used in a clinical setting, as a significant number of diseased patients were not identified in these studies.
Aksakalli, Vural, and Milad Malekipirbazari. 2016. “Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation.” Pattern Recognition Letters.
Bischl, Bernd, Michel Lang, Lars Kotthoff, Julia Schiffner, Jakob Richter, Erich Studerus, Giuseppe Casalicchio, and Zachary M. Jones. 2016. “mlr: Machine Learning in R.” Journal of Machine Learning Research.
Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32.