Introduction

The objective of this project was to build classifiers to predict whether an individual has heart disease, based on clinical data from the Cleveland Clinic Foundation, the Hungarian Institute of Cardiology (Budapest), the V.A. Medical Center (Long Beach, CA) and the University Hospital Zurich (Switzerland), obtained from the UCI Machine Learning Repository. In Phase I, we cleaned the data and re-categorised the target feature so that it was binary. In Phase II, we built three binary classifiers trained on the cleaned data. The rest of this report is organised as follows. Section 2 gives an overview of our methodology. Section 3 discusses the fine-tuning process and a detailed performance analysis of each classifier. Section 4 compares the performance of the classifiers using the same resampling method. Section 5 critiques our methodology. The last section concludes with a summary.

Descriptive Features

The variable description is produced here from the heart-disease.names file:

  • Age: age in years
  • Gender: gender (1 = male; 0 = female)
  • Cp: chest pain type
    • Value 1: typical angina
    • Value 2: atypical angina
    • Value 3: non-anginal pain
    • Value 4: asymptomatic
  • Trestbps: resting blood pressure (in mm Hg on admission to the hospital)
  • Chol: serum cholesterol in mg/dl
  • Fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
  • Restecg: resting electrocardiographic results
    • Value 0: normal
    • Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    • Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
  • Thalach: maximum heart rate achieved in beats per minute (bpm)
  • Exang: exercise induced angina (1 = yes; 0 = no)
  • Oldpeak: ST depression induced by exercise relative to rest
  • Slope: the slope of the peak exercise ST segment
    • Value 1: upsloping
    • Value 2: flat
    • Value 3: down-sloping
  • Ca: number of major vessels (0-3) colored by fluoroscopy
  • Thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

The target feature has two classes and hence it is a binary classification problem. To reiterate, the goal is to predict whether a person has heart disease.

Methodology

We considered three classifiers: Naive Bayes (NB), Random Forest (RF), and \(K\)-Nearest Neighbours (KNN), with NB serving as the baseline. Each classifier was trained to produce probability predictions so that the prediction threshold could be adjusted to refine performance. We split the full data set into a 70 % training set and a 30 % test set. Each set resembled the full data by preserving the proportion of target classes, i.e. approximately 45 % of patients showing no signs of heart disease and 55 % exhibiting symptoms of heart disease. For the fine-tuning process, we ran five-fold cross-validation with stratified sampling on each classifier. Stratified sampling was used to cater for the slight class imbalance in the target feature.

Next, for each classifier, we determined the optimal probability threshold. Using the tuned hyperparameters and the optimal thresholds, we made predictions on the test data. During model training (hyperparameter tuning and threshold adjustment), we relied on the mean misclassification error rate (mmce). In addition to mmce, we also used the confusion matrix on the test data to evaluate the classifiers’ performance. The modelling was implemented in R with the mlr package (Bischl et al. 2016).

Hyperparameter Fine-Tuning

Naive Bayes

Since the training set might unwittingly exclude rare instances, the NB classifier may produce fitted zero probabilities as predictions. To mitigate this, we ran a grid search to determine the optimal value of the Laplace smoothing parameter. Using the stratified sampling discussed in the previous section, we experimented with values ranging from 0 to 30.

The optimal Laplace parameter was 10, with a mean test error of 0.205.

Random Forest

We fine-tuned the number of variables randomly sampled as candidates at each split (mtry). For a classification problem, Breiman (2001) suggests mtry = \(\sqrt{p}\), where \(p\) is the number of descriptive features. In our case, \(\sqrt{p} = \sqrt{11} \approx 3.32\). Therefore, we experimented with mtry = 2, 3, and 4, leaving other hyperparameters, such as the number of trees to grow, at their default values. The result was an mtry value of 2 with a mean test error of 0.193.

\(K\)-Nearest Neighbours

Using the optimal kernel, we ran a grid search over \(k=2,3,\ldots,20\). The outcome was \(k = 20\) with an mmce of 0.199.

Feature Selection

Feature selection was used to identify an optimal subset of the available features. Selecting a subset of relevant features can make training faster, reduce model complexity, improve accuracy and reduce overfitting. There are three broad categories of feature selection methods: filter methods, wrapper methods and embedded methods.

Simultaneous Perturbation Stochastic Approximation for Feature Selection and Ranking (SPSA-FSR Algorithm)

The SPSA-FSR wrapper method (Aksakalli and Malekipirbazari 2016) was used to select relevant features (https://cran.r-project.org/web/packages/spFSR/vignettes/spFSR.html).

Filter method

The filter method assigns an importance value to each feature. Based on these values, the features are ranked and a feature subset is selected. The learner was fused with the filter method when training each classification model.

Wrapper method

The wrapper method uses the performance of a learning classifier to assess the usefulness of a feature set. To select a feature subset, the learner is trained repeatedly on different feature subsets, and the subset that leads to the best learner performance is chosen.

Load dataset

We loaded the dataset and removed the redundant index column. The FBS feature contained logical (TRUE/FALSE) values, and these were converted to numeric (1, 0).

# Load dataset, treating "unknown" entries as missing values
data <- read.csv('Heart_Disease_cleaned_data.csv', na.strings = "unknown", stringsAsFactors = FALSE)
data <- data[, -1]                               # drop the redundant index column
# Convert character columns to factors
char_cols <- sapply(data, is.character)
data[, char_cols] <- lapply(data[, char_cols], factor)
data$FBS <- as.numeric(data$FBS)                 # logical TRUE/FALSE -> 1/0
str(data)
## 'data.frame':    740 obs. of  13 variables:
##  $ Age     : int  63 67 67 37 41 56 62 57 63 53 ...
##  $ Gender  : Factor w/ 2 levels "female","male": 2 2 2 2 1 2 1 1 2 2 ...
##  $ CP      : Factor w/ 4 levels "asymptomatic",..: 4 1 1 3 2 2 1 1 1 1 ...
##  $ Trestbps: int  145 160 120 130 130 120 140 120 130 140 ...
##  $ Chol    : int  233 286 229 250 204 236 268 354 254 203 ...
##  $ FBS     : num  1 0 0 0 0 0 0 0 0 1 ...
##  $ RestECG : Factor w/ 3 levels "hypertropy","normal",..: 1 1 1 2 1 2 1 2 1 1 ...
##  $ Thalach : int  150 108 129 187 172 178 160 163 147 155 ...
##  $ Exang   : Factor w/ 2 levels "no","yes": 1 2 2 1 1 1 1 2 1 2 ...
##  $ Oldpeak : num  2.3 1.5 2.6 3.5 1.4 0.8 3.6 0.6 1.4 3.1 ...
##  $ Slope   : Factor w/ 3 levels "downsloping",..: 1 2 2 1 3 3 1 3 2 1 ...
##  $ Thal    : Factor w/ 3 levels "fixed defect",..: 1 2 3 2 2 2 2 2 3 3 ...
##  $ Goal    : int  0 1 1 0 0 0 1 0 1 1 ...

Data processing

We determined the number of missing values in each column (Table 2). The bar graphs for Slope and Thal (\(\beta\)-Thalassemia cardiomyopathy) show that these missing values comprise a significant proportion of these features.

Table 1. Feature summary for Heart Disease dataset
name type na mean disp median mad min max nlevs
Age integer 0 53.0972973 9.4081267 54.0 10.3782 28 77.0 0
Gender factor 0 NA 0.2351351 NA NA 174 566.0 2
CP factor 0 NA 0.4702703 NA NA 37 392.0 4
Trestbps integer 0 132.7540541 18.5812497 130.0 14.8260 0 200.0 0
Chol integer 0 220.1364865 93.6145555 231.0 54.8562 0 603.0 0
FBS numeric 0 0.1500000 0.3573129 0.0 0.0000 0 1.0 0
RestECG factor 0 NA 0.3986486 NA NA 120 445.0 3
Thalach integer 0 138.7445946 25.8460815 140.0 29.6520 60 202.0 0
Exang factor 0 NA 0.4000000 NA NA 296 444.0 2
Oldpeak numeric 0 0.8943243 1.0871598 0.5 0.7413 -1 6.2 0
Slope factor 209 NA NA NA NA 48 310.0 3
Thal factor 340 NA NA NA NA 39 187.0 3
Goal integer 0 0.5175676 0.5000293 1.0 0.0000 0 1.0 0
Table 2. Missing values
Missing values
Age 0
Gender 0
CP 0
Trestbps 0
Chol 0
FBS 0
RestECG 0
Thalach 0
Exang 0
Oldpeak 0
Slope 209
Thal 340
Goal 0

Several features contained a number of missing values (209 and 340 for Slope and Thal, respectively).
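
A minimal sketch of how the per-column missing-value counts behind Table 2 can be reproduced (assuming the data frame loaded above):

sapply(data, function(x) sum(is.na(x)))   # missing values per column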

Figure 1. Histograms showing the distribution of data for the (left panel) Slope and (right panel) Thal features.

Figure 1 shows that these missing values comprised a significant proportion of these features.

Imputation of missing values

The missing values in the Slope (slope of the peak exercise ST segment) and Thal (\(\beta\)-Thalassemia cardiomyopathy) features were imputed based on all features in the dataset.
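
The exact imputation call is not shown in the report output; the following is a minimal sketch of one way to do this with mlr, assuming a classification-tree imputation learner for each of the two factor columns and excluding the target from the imputation features:

library(mlr)
# Impute Slope and Thal from the remaining descriptive features using a
# classification tree for each column (assumed approach).
imp <- impute(
  data,
  target = "Goal",
  cols = list(
    Slope = imputeLearner("classif.rpart"),
    Thal  = imputeLearner("classif.rpart")
  )
)
data <- imp$data
table(data$Slope)   # post-imputation level counts, as printed below
table(data$Thal)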

## 
## downsloping        flat   upsloping 
##          48         310         173
## 
##      fixed defect            normal reversible defect 
##                39               187               174
## 
## downsloping        flat   upsloping 
##          48         370         322
## 
##      fixed defect            normal reversible defect 
##                39               327               374
Table 3. Dataset containing imputed missing values
Age Gender CP Trestbps Chol FBS RestECG Thalach Exang Oldpeak Slope Thal Goal
63 male typical angina 145 233 1 hypertropy 150 no 2.3 downsloping fixed defect No Disease
67 male asymptomatic 160 286 0 hypertropy 108 yes 1.5 flat normal Heart Disease
67 male asymptomatic 120 229 0 hypertropy 129 yes 2.6 flat reversible defect Heart Disease
37 male non-anginal pain 130 250 0 normal 187 no 3.5 downsloping normal No Disease
41 female atypical angina 130 204 0 hypertropy 172 no 1.4 upsloping normal No Disease
56 male atypical angina 120 236 0 normal 178 no 0.8 upsloping normal No Disease

Most of the imputed values for Slope were assigned to ‘flat’ and ‘upsloping’, while imputed values for Thal were assigned to ‘normal’ and ‘reversible defect’.

Transformation of ‘ST Depression’ feature

The histogram and QQ-plot for the ‘ST Depression’ feature show that it is slightly right skewed (Figs. 2 and 3). It was not possible to apply a Box-Cox transformation since the ‘ST Depression’ feature contains several negative values. Therefore, we tested a number of different transformations to determine which produced a more normal distribution of the data (Fig. 3).

Figure 2. QQ-plot of data for the ST Depression feature. The plot indicates that the data is right skewed.

Figure 3. Histogram and box plots illustrating the distribution of data for the ST Depression feature before (left) and after (right) transformation with a cube root algorithm.

Figure 4. QQ-plot of data for the ST Depression feature following transformation with a cube root algorithm. The plot indicates that the data are more normally distributed following the cube root transformation.

Cube root transformation of the data led to a more normal distribution. Several outliers were apparent in the boxplot for the transformed data (Figs. 3 and 4).
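
A minimal sketch of the sign-preserving cube-root transformation applied to Oldpeak (the helper function name is illustrative):

# Sign-preserving cube root; handles the negative Oldpeak values that rule
# out a Box-Cox transformation.
cube_root <- function(x) sign(x) * abs(x)^(1/3)
data$Oldpeak <- cube_root(data$Oldpeak)
hist(data$Oldpeak, main = "Cube-root transformed ST depression")   # cf. Fig. 3
qqnorm(data$Oldpeak); qqline(data$Oldpeak)                         # cf. Fig. 4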


Figure 5. Scatter matrix for numerical features showing the distribution of data for pairwise combinations of features.

We examined the scatter matrix of numerical features to identify possible correlations between features (Fig. 5). There appeared to be a small but noticeable correlation between patient age and Oldpeak (ST depression induced by exercise relative to rest). Differences in distribution were also apparent for Age versus Oldpeak and for Trestbps versus Thalach.

Standardisation

The data were standardized to compensate for differences in the scales used for various features.

## [1] -8.969575e-18
## [1] 1
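
A minimal sketch of the standardisation step (assuming scale() applied to each numeric feature with the target column left untouched; the near-zero mean and unit standard deviation printed above are the usual sanity check):

# Standardise numeric feature columns to zero mean and unit variance.
num_cols <- setdiff(names(data)[sapply(data, is.numeric)], "Goal")
data[num_cols] <- lapply(data[num_cols], function(x) as.numeric(scale(x)))
mean(data$Age)   # approximately 0
sd(data$Age)     # approximately 1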

Dummify categorical variables

For each categorical variable with n factor levels, n-1 dummy variables were created, since several classifiers (e.g. kNN) can only handle numerical inputs.
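
A minimal sketch using mlr's createDummyFeatures; the "reference" method yields n-1 dummy columns per factor (the exact call is not shown in the report, so this is an assumed approach):

# n-1 dummy variables per factor level, leaving the target column untouched.
data <- createDummyFeatures(data, target = "Goal", method = "reference")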

Shuffle rows prior to splitting dataset

The full dataset was a combination of several smaller datasets obtained from hospitals in different countries. To avoid biases arising from these sources (e.g. differences in the tests conducted and/or the interpretation of test results), the rows of the dataset were shuffled prior to splitting into training and test sets.

Split data into training and test data (70:30).
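
A minimal sketch of the shuffle and 70:30 split (the seed is an assumption, included for reproducibility):

set.seed(123)                               # assumed seed
data <- data[sample(nrow(data)), ]          # shuffle rows
n_train    <- round(0.7 * nrow(data))
train_data <- data[1:n_train, ]
test_data  <- data[(n_train + 1):nrow(data), ]
prop.table(table(train_data$Goal))          # class proportions, training set
prop.table(table(test_data$Goal))           # class proportions, test set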

Determine the proportion of disease category in the test and training datasets.

## 
##         0         1 
## 0.5366795 0.4633205
## 
##        0        1 
## 0.472973 0.527027

The datasets were reasonably balanced and representative of the full dataset. We shall use training data for modeling and test data for model evaluation.

Modeling with all features

The data were initially modeled using all 17 features.

Model configuration

Configure classification task

Configure learners with probability type
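
A minimal sketch of the task and learner configuration (object names are illustrative; the target is converted to a factor, which makeClassifTask requires):

# Goal must be a factor for a classification task.
train_data$Goal <- factor(train_data$Goal)
test_data$Goal  <- factor(test_data$Goal)

# Classification task on the training data, predicting Goal.
trainTask <- makeClassifTask(id = "heart", data = train_data, target = "Goal", positive = "1")

# Probability-type learners, so prediction thresholds can be adjusted later.
nb_learner  <- makeLearner("classif.naiveBayes",   predict.type = "prob")
rf_learner  <- makeLearner("classif.randomForest", predict.type = "prob")
knn_learner <- makeLearner("classif.kknn",         predict.type = "prob")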


Model fine-tuning

##            Type len Def   Constr Req Tunable Trafo
## laplace numeric   -   0 0 to Inf   -    TRUE     -
##                      Type  len   Def   Constr Req Tunable Trafo
## ntree             integer    -   500 1 to Inf   -    TRUE     -
## mtry              integer    -     - 1 to Inf   -    TRUE     -
## replace           logical    -  TRUE        -   -    TRUE     -
## classwt     numericvector <NA>     - 0 to Inf   -    TRUE     -
## cutoff      numericvector <NA>     -   0 to 1   -    TRUE     -
## strata            untyped    -     -        -   -   FALSE     -
## sampsize    integervector <NA>     - 1 to Inf   -    TRUE     -
## nodesize          integer    -     1 1 to Inf   -    TRUE     -
## maxnodes          integer    -     - 1 to Inf   -    TRUE     -
## importance        logical    - FALSE        -   -    TRUE     -
## localImp          logical    - FALSE        -   -    TRUE     -
## proximity         logical    - FALSE        -   -   FALSE     -
## oob.prox          logical    -     -        -   Y   FALSE     -
## norm.votes        logical    -  TRUE        -   -   FALSE     -
## do.trace          logical    - FALSE        -   -   FALSE     -
## keep.forest       logical    -  TRUE        -   -   FALSE     -
## keep.inbag        logical    - FALSE        -   -   FALSE     -
##              Type len     Def                                   Constr Req
## k         integer   -       7                                 1 to Inf   -
## distance  numeric   -       2                                 0 to Inf   -
## kernel   discrete   - optimal rectangular,triangular,epanechnikov,b...   -
## scale     logical   -    TRUE                                        -   -
##          Tunable Trafo
## k           TRUE     -
## distance    TRUE     -
## kernel      TRUE     -
## scale       TRUE     -

For Naive Bayes, we fine-tuned the Laplace parameter, testing values between 0 and 30.

For Random Forest, we fine-tuned mtry (the number of variables randomly sampled as candidates at each split), testing values of 2, 3 and 4 (Breiman 2001).

Configure the tune control search and a 5-CV stratified sampling scheme

Configure the tune wrappers with the tuning settings
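
A minimal sketch of the tuning configuration (the parameter ranges match those described above; object names are illustrative):

# Grid-search control and 5-fold stratified cross-validation.
ctrl  <- makeTuneControlGrid()
rdesc <- makeResampleDesc("CV", iters = 5L, stratify = TRUE)

# Parameter sets: Laplace for NB, mtry for RF, k for kNN.
ps_nb  <- makeParamSet(makeNumericParam("laplace", lower = 0, upper = 30))
ps_rf  <- makeParamSet(makeDiscreteParam("mtry", values = c(2, 3, 4)))
ps_knn <- makeParamSet(makeDiscreteParam("k", values = 2:20))

# Tune wrappers combine learner, resampling scheme, parameter set and control.
tuned_nb  <- makeTuneWrapper(nb_learner,  rdesc, mmce, ps_nb,  ctrl)
tuned_rf  <- makeTuneWrapper(rf_learner,  rdesc, mmce, ps_rf,  ctrl)
tuned_knn <- makeTuneWrapper(knn_learner, rdesc, mmce, ps_knn, ctrl)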

Model Training

Train the tune wrappers
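
A minimal sketch; training each tune wrapper runs the nested grid search whose log appears below (names follow the earlier sketches):

model_nb  <- train(tuned_nb,  trainTask)
model_rf  <- train(tuned_rf,  trainTask)
model_knn <- train(tuned_knn, trainTask)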

## [Tune] Started tuning learner classif.naiveBayes for parameter set:
##            Type len Def  Constr Req Tunable Trafo
## laplace numeric   -   - 0 to 30   -    TRUE     -
## With control class: TuneControlGrid
## Imputation value: 1
## [Tune-x] 1: laplace=0
## [Tune-y] 1: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 2: laplace=3.33
## [Tune-y] 2: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 3: laplace=6.67
## [Tune-y] 3: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 4: laplace=10
## [Tune-y] 4: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 5: laplace=13.3
## [Tune-y] 5: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 6: laplace=16.7
## [Tune-y] 6: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 7: laplace=20
## [Tune-y] 7: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 8: laplace=23.3
## [Tune-y] 8: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 9: laplace=26.7
## [Tune-y] 9: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune-x] 10: laplace=30
## [Tune-y] 10: mmce.test.mean=0.2065347; time: 0.0 min
## [Tune] Result: laplace=0 : mmce.test.mean=0.2065347
## [Tune] Started tuning learner classif.randomForest for parameter set:
##          Type len Def Constr Req Tunable Trafo
## mtry discrete   -   -  2,3,4   -    TRUE     -
## With control class: TuneControlGrid
## Imputation value: 1
## [Tune-x] 1: mtry=2
## [Tune-y] 1: mmce.test.mean=0.1949216; time: 0.0 min
## [Tune-x] 2: mtry=3
## [Tune-y] 2: mmce.test.mean=0.1910941; time: 0.0 min
## [Tune-x] 3: mtry=4
## [Tune-y] 3: mmce.test.mean=0.1988051; time: 0.0 min
## [Tune] Result: mtry=3 : mmce.test.mean=0.1910941
## [Tune] Started tuning learner classif.kknn for parameter set:
##       Type len Def                                   Constr Req Tunable
## k discrete   -   - 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,...   -    TRUE
##   Trafo
## k     -
## With control class: TuneControlGrid
## Imputation value: 1
## [Tune-x] 1: k=2
## [Tune-y] 1: mmce.test.mean=0.2336072; time: 0.0 min
## [Tune-x] 2: k=3
## [Tune-y] 2: mmce.test.mean=0.2336072; time: 0.0 min
## [Tune-x] 3: k=4
## [Tune-y] 3: mmce.test.mean=0.2336072; time: 0.0 min
## [Tune-x] 4: k=5
## [Tune-y] 4: mmce.test.mean=0.2085138; time: 0.0 min
## [Tune-x] 5: k=6
## [Tune-y] 5: mmce.test.mean=0.2104742; time: 0.0 min
## [Tune-x] 6: k=7
## [Tune-y] 6: mmce.test.mean=0.2162435; time: 0.0 min
## [Tune-x] 7: k=8
## [Tune-y] 7: mmce.test.mean=0.2162435; time: 0.0 min
## [Tune-x] 8: k=9
## [Tune-y] 8: mmce.test.mean=0.2084951; time: 0.0 min
## [Tune-x] 9: k=10
## [Tune-y] 9: mmce.test.mean=0.2046490; time: 0.0 min
## [Tune-x] 10: k=11
## [Tune-y] 10: mmce.test.mean=0.2027259; time: 0.0 min
## [Tune-x] 11: k=12
## [Tune-y] 11: mmce.test.mean=0.2027259; time: 0.0 min
## [Tune-x] 12: k=13
## [Tune-y] 12: mmce.test.mean=0.1969193; time: 0.0 min
## [Tune-x] 13: k=14
## [Tune-y] 13: mmce.test.mean=0.2007842; time: 0.0 min
## [Tune-x] 14: k=15
## [Tune-y] 14: mmce.test.mean=0.2027072; time: 0.0 min
## [Tune-x] 15: k=16
## [Tune-y] 15: mmce.test.mean=0.2007842; time: 0.0 min
## [Tune-x] 16: k=17
## [Tune-y] 16: mmce.test.mean=0.1949963; time: 0.0 min
## [Tune-x] 17: k=18
## [Tune-y] 17: mmce.test.mean=0.1969380; time: 0.0 min
## [Tune-x] 18: k=19
## [Tune-y] 18: mmce.test.mean=0.1930732; time: 0.0 min
## [Tune-x] 19: k=20
## [Tune-y] 19: mmce.test.mean=0.1969380; time: 0.0 min
## [Tune] Result: k=19 : mmce.test.mean=0.1930732

Model Prediction

Predict on training data

Model Evaluation

Obtain threshold values for each learner
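
A minimal sketch of the threshold search for one learner (assuming model_nb from the training sketch above; tuneThreshold and plotThreshVsPerf are mlr functions):

# Probability predictions on the training data, then search for the
# threshold that minimises mmce.
pred_train_nb <- predict(model_nb, task = trainTask)
thr_nb <- tuneThreshold(pred_train_nb, measure = mmce)
thr_nb$th                                                                     # tuned threshold
plotThreshVsPerf(generateThreshVsPerfData(pred_train_nb, measures = mmce))    # cf. Fig. 6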

Figure 6. Plot for the optimization of the threshold for the kNN, Naive Bayes and Random Forest classifiers trained on all 17 features.

Evaluation on test data

AUC plots for Naive Bayes, Random Forest and kNN models

Figure 7. AUC plot for the kNN, Naive Bayes and Random Forest classifiers trained on all 17 features.

The AUC plots were similar for the kNN, Naive Bayes and Random Forest classifiers trained on all 17 features (Fig. 7).

Performance for Naive Bayes model

Misclassification Error and AUC value

Table 4. Performance for Naive Bayes, Random Forest and kNN classifiers
Naive Bayes Random Forest kNN
mmce 0.1846847 0.1891892 0.1711712
auc 0.8889703 0.8832316 0.8903134

The k-Nearest Neighbours classifier performed best when the models were trained on all 17 features, with a mean misclassification error of 0.171 and an AUC value of 0.890 (Table 4). The Naive Bayes and Random Forest classifiers performed comparably to one another.

Confusion Matrix, Precision and Recall for Naive Bayes

##     predicted
## true 0         1                            
##    0 79        26       tpr: 0.75 fnr: 0.25 
##    1 15        102      fpr: 0.13 tnr: 0.87 
##      ppv: 0.84 for: 0.2 lrp: 5.87 acc: 0.82 
##      fdr: 0.16 npv: 0.8 lrm: 0.28 dor: 20.66
## 
## 
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio

Confusion Matrix, Precision and Recall for Random Forest model

##     predicted
## true 0         1                             
##    0 88        17        tpr: 0.84 fnr: 0.16 
##    1 25        92        fpr: 0.21 tnr: 0.79 
##      ppv: 0.78 for: 0.16 lrp: 3.92 acc: 0.81 
##      fdr: 0.22 npv: 0.84 lrm: 0.21 dor: 19.05
## 
## 
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio

Confusion Matrix, Precision and Recall for k-Nearest Neighbours model

##     predicted
## true 0         1                             
##    0 88        17        tpr: 0.84 fnr: 0.16 
##    1 21        96        fpr: 0.18 tnr: 0.82 
##      ppv: 0.81 for: 0.15 lrp: 4.67 acc: 0.83 
##      fdr: 0.19 npv: 0.85 lrm: 0.2  dor: 23.66
## 
## 
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio

Modeling with spFSR Feature Selection

Various feature selection algorithms were then used to determine whether classifier performance could be improved by using a subset of relevant features. Features were initially selected using the spFSR feature selection algorithm (Aksakalli and Malekipirbazari 2016; https://cran.r-project.org/web/packages/spFSR/vignettes/spFSR.html).

spFSR Feature Selection for k-Nearest Neighbours

## [1] 18

Determine if hyperparameters can be optimized further

Figure 8. Measurement of kNN classifier performance with each iteration.

The plot shows the global optimum and indicates that the best performance was obtained in less than 5 iterations.

Best hyperparameters

Figure 9. kNN classifier performance for various values of k (nearest neighbours value).

## $k
## [1] 20
## mmce.test.mean 
##      0.1891337

The optimal value of \(k\) for k-Nearest Neighbours was 20, with a mean misclassification error of 0.189.

Construct a learner with the tuned parameters
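
A minimal sketch; setHyperPars applies the tuned k to a fresh probability-type kknn learner (object name illustrative):

kNN_tuned_learner <- setHyperPars(makeLearner("classif.kknn", predict.type = "prob"), k = 20)
kNN_tuned_learner   # printed below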

Best hyperparameters

## Learner classif.kknn from package kknn
## Type: classif
## Name: k-Nearest Neighbor; Short name: kknn
## Class: classif.kknn
## Properties: twoclass,multiclass,numerics,factors,prob
## Predict-Type: prob
## Hyperparameters: k=20

Run SPSA-FSR on the tuned and untuned kNN learners
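
A minimal sketch of the spFeatureSelection call that produces the log below (argument names follow the spFSR vignette; num.features.selected = 0 lets SPSA-FSR choose the subset size, and the call is repeated with the tuned learner):

library(spFSR)
spsaMod_kNN <- spFeatureSelection(
  task = trainTask,
  wrapper = knn_learner,              # repeat with kNN_tuned_learner
  measure = auc,
  num.features.selected = 0
)
getImportance(spsaMod_kNN)            # ranked feature importances (cf. Table 5)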

## SPSA-FSR begins:
## Wrapper = kknn
## Measure = auc
## Number of selected features = 0
## 
## iter  value   st.dev  num.ft  best.value
## 1     0.84831 0.0386  10      0.84831 *
## 2     0.85045 0.03912 10      0.85045 *
## 3     0.84892 0.0329  9       0.85045
## 4     0.87116 0.03408 14      0.87116 *
## 5     0.8711  0.04188 13      0.87116
## 6     0.86768 0.0489  14      0.87116
## 7     0.87128 0.04088 15      0.87128 *
## 8     0.86703 0.02838 15      0.87128
## 9     0.88086 0.02832 15      0.88086 *
## 10    0.87785 0.03425 16      0.88086
## 11    0.8835  0.04131 16      0.8835 *
## 12    0.87788 0.03144 16      0.8835
## 13    0.8834  0.02277 16      0.8835
## 14    0.88251 0.02726 16      0.8835
## 15    0.88391 0.03351 16      0.88391 *
## 16    0.88336 0.02945 16      0.88391
## 17    0.88308 0.02628 16      0.88391
## 18    0.87488 0.03567 16      0.88391
## 19    0.87972 0.0272  16      0.88391
## 20    0.88147 0.01924 15      0.88391
## 21    0.88332 0.03599 15      0.88391
## 22    0.88397 0.02405 15      0.88397 *
## 23    0.88262 0.03212 15      0.88397
## 24    0.88081 0.03267 15      0.88397
## 25    0.87858 0.03862 15      0.88397
## 26    0.87614 0.03591 14      0.88397
## 27    0.88554 0.02922 14      0.88554 *
## 28    0.88435 0.03152 14      0.88554
## 29    0.88206 0.03217 14      0.88554
## 30    0.88248 0.03499 14      0.88554
## 31    0.88521 0.02602 14      0.88554
## 32    0.88237 0.02018 15      0.88554
## 33    0.88477 0.02616 15      0.88554
## 34    0.88343 0.01858 15      0.88554
## 35    0.88221 0.0382  15      0.88554
## 36    0.88076 0.0348  15      0.88554
## 37    0.89088 0.02396 13      0.89088 *
## 38    0.88535 0.02089 13      0.89088
## 39    0.88379 0.04471 14      0.89088
## 40    0.88274 0.02205 13      0.89088
## 41    0.88986 0.02839 13      0.89088
## 42    0.88507 0.02033 13      0.89088
## 43    0.88561 0.03659 13      0.89088
## 44    0.88304 0.03216 14      0.89088
## 45    0.8674  0.04019 11      0.89088
## 46    0.88801 0.04103 13      0.89088
## 47    0.88685 0.02698 14      0.89088
## 48    0.88564 0.0407  13      0.89088
## 49    0.8847  0.0281  13      0.89088
## 50    0.88606 0.03383 13      0.89088
## 51    0.88411 0.02569 13      0.89088
## 52    0.88402 0.03104 13      0.89088
## 53    0.88582 0.03655 13      0.89088
## 54    0.88753 0.03518 13      0.89088
## 55    0.88829 0.03083 13      0.89088
## 56    0.88744 0.03009 13      0.89088
## 57    0.88623 0.03301 13      0.89088
## 
## Best iteration = 37
## Number of selected features = 13
## Best measure value = 0.89088
## Std. dev. of best measure = 0.02396
## Run time = 1.94 minutes.
## SPSA-FSR begins:
## Wrapper = kknn
## Measure = auc
## Number of selected features = 0
## 
## iter  value   st.dev  num.ft  best.value
## 1     0.82636 0.03098 7       0.82636 *
## 2     0.85019 0.03777 14      0.85019 *
## 3     0.85066 0.03541 14      0.85066 *
## 4     0.84673 0.0315  14      0.85066
## 5     0.8519  0.02455 13      0.8519 *
## 6     0.84519 0.03769 15      0.8519
## 7     0.84752 0.04447 12      0.8519
## 8     0.85007 0.02717 12      0.8519
## 9     0.84774 0.02711 12      0.8519
## 10    0.84297 0.03564 14      0.8519
## 11    0.8581  0.0399  14      0.8581 *
## 12    0.8489  0.04299 14      0.8581
## 13    0.85235 0.03707 15      0.8581
## 14    0.84298 0.03577 15      0.8581
## 15    0.85267 0.02511 15      0.8581
## 16    0.85054 0.04824 15      0.8581
## 17    0.84493 0.02327 15      0.8581
## 18    0.84545 0.03688 15      0.8581
## 19    0.84181 0.02811 15      0.8581
## 20    0.85658 0.03847 15      0.8581
## 21    0.85719 0.02556 16      0.8581
## 22    0.85798 0.04043 16      0.8581
## 23    0.86486 0.03935 16      0.86486 *
## 24    0.86621 0.03321 15      0.86621 *
## 25    0.86392 0.03206 16      0.86621
## 26    0.85959 0.03544 16      0.86621
## 27    0.85792 0.02921 16      0.86621
## 28    0.86342 0.03107 15      0.86621
## 29    0.86362 0.03867 15      0.86621
## 30    0.87021 0.03082 15      0.87021 *
## 31    0.86098 0.03869 15      0.87021
## 32    0.86106 0.04692 15      0.87021
## 33    0.86138 0.03032 17      0.87021
## 34    0.86232 0.03679 15      0.87021
## 35    0.86744 0.02741 16      0.87021
## 36    0.8606  0.02932 15      0.87021
## 37    0.86306 0.03498 15      0.87021
## 38    0.86676 0.02865 15      0.87021
## 39    0.86185 0.02067 15      0.87021
## 40    0.86579 0.03168 15      0.87021
## 41    0.87107 0.03085 16      0.87107 *
## 42    0.86743 0.03407 16      0.87107
## 43    0.86444 0.0218  16      0.87107
## 44    0.8648  0.04288 15      0.87107
## 45    0.86293 0.02487 16      0.87107
## 46    0.86843 0.02907 16      0.87107
## 47    0.86944 0.03211 16      0.87107
## 48    0.864   0.02805 16      0.87107
## 49    0.86673 0.03224 16      0.87107
## 50    0.86923 0.02595 16      0.87107
## 51    0.86136 0.0474  16      0.87107
## 52    0.86106 0.03793 16      0.87107
## 53    0.86585 0.03256 17      0.87107
## 54    0.86247 0.03264 17      0.87107
## 55    0.87866 0.0466  16      0.87866 *
## 56    0.86805 0.0243  15      0.87866
## 57    0.87018 0.02627 16      0.87866
## 58    0.86824 0.03282 15      0.87866
## 59    0.86954 0.03391 15      0.87866
## 60    0.87084 0.02534 15      0.87866
## 61    0.86602 0.03214 15      0.87866
## 62    0.8375  0.0194  13      0.87866
## 63    0.84482 0.02915 13      0.87866
## 64    0.83977 0.03551 13      0.87866
## 65    0.86942 0.027   13      0.87866
## 66    0.86123 0.02637 15      0.87866
## 67    0.8612  0.03579 15      0.87866
## 68    0.86241 0.03492 16      0.87866
## 69    0.86966 0.03029 16      0.87866
## 70    0.8669  0.03697 15      0.87866
## 71    0.86324 0.0337  15      0.87866
## 72    0.87115 0.03101 14      0.87866
## 73    0.8626  0.04298 14      0.87866
## 74    0.86201 0.02229 14      0.87866
## 75    0.86425 0.03348 14      0.87866
## 
## Best iteration = 55
## Number of selected features = 16
## Best measure value = 0.87866
## Std. dev. of best measure = 0.0466
## Run time = 2.75 minutes.

Plot the measure values across iterations

Figure 10. Scatter plot of mean accuracy rate for each iteration for the kNN classifier (left panel) with and (right panel) without hyperparameter tuning. Error bars of +/- 1 standard deviation are also shown.

Get feature importances for kNN

Get feature importances for kNN with tuned parameters

Table 5. Selected features for the k-Nearest Neighbours classifier with spFSR feature selection, without and with tuned hyperparameters
kNN + spFSR
kNN + spFSR + Tuned
Features Importance Features Importance
FBS 0.60206 CP.atypical.angina 0.64466
CP.typical.angina 0.59768 RestECG.normal 0.61690
Chol 0.57066 Oldpeak 0.61149
Thal.normal 0.56516 Thal.normal 0.60425
Gender.male 0.56237 Gender.male 0.60006
Trestbps 0.55991 CP.non.anginal.pain 0.59784
Oldpeak 0.55713 Slope.upsloping 0.59186
CP.atypical.angina 0.55564 CP.typical.angina 0.58580
Age 0.55564 RestECG.ST.T.wave.abnormality 0.57317
RestECG.normal 0.54773 FBS 0.55879

The top 10 features selected by the spFSR algorithm for the kNN classifier with and without tuned parameters were similar (Table 5). However, several of these features differed, and the order of importance varied noticeably depending on whether the classifier hyperparameters had been optimised.

Importance Plotting

kNN classifier and spFSR feature selection

Figure 11. Features selected by the spFSR algorithm fused with the kNN classifier and ranked according to importance.

kNN classifier with tuned hyperparameters and spFSR feature selection

spsa_kNN_1 <- spFSR::plotImportance(spsaMod_kNN_tuned)

Figure 12. Features selected by the spFSR algorithm fused with the kNN classifier with tuned hyperparameters, ranked according to importance.

Train models

Define the test data

Prediction

Evaluation

AUC plots for k-nearest neighbours

Figure 13. AUC plot for the kNN classifier. kNN - kNN trained on all 17 features; kNN_spsa - kNN classifier trained on features selected using the spFSR algorithm; kNN_spsa_tuned - kNN classifier with tuned hyperparameters, trained on features selected using the spFSR algorithm.

The AUC plots were similar for the kNN classifier with and without optimization of the hyperparameters and with or without spFSR feature selection (Fig. 13).

Performance for kNN model

Misclassification Error and AUC value

Table 6. Performance for kNN classifier with and without tuned hyperparameters and spFSR feature selection
kNN kNN_spsa kNN_spsa_tuned
mmce 0.1666667 0.2432432 0.2297297
auc 0.8919821 0.8698413 0.8553114

The kNN classifier appeared to perform better without hyperparameter optimization or spFSR feature selection, with a mean misclassification error of 0.167 and an AUC value of 0.892 (Table 6).

Confusion Matrix, Precision and Recall for kNN model

##     predicted
## true 0         1                             
##    0 89        16        tpr: 0.85 fnr: 0.15 
##    1 21        96        fpr: 0.18 tnr: 0.82 
##      ppv: 0.81 for: 0.14 lrp: 4.72 acc: 0.83 
##      fdr: 0.19 npv: 0.86 lrm: 0.19 dor: 25.43
## 
## 
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio

Confusion Matrix, Precision and Recall for kNN with spsa feature selection

##     predicted
## true 0         1                            
##    0 81        24        tpr: 0.77 fnr: 0.23
##    1 30        87        fpr: 0.26 tnr: 0.74
##      ppv: 0.73 for: 0.22 lrp: 3.01 acc: 0.76
##      fdr: 0.27 npv: 0.78 lrm: 0.31 dor: 9.79
## 
## 
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio

Confusion Matrix, Precision and Recall for kNN with tuned parameters and spsa feature selection

##     predicted
## true 0         1                             
##    0 88        17        tpr: 0.84 fnr: 0.16 
##    1 34        83        fpr: 0.29 tnr: 0.71 
##      ppv: 0.72 for: 0.17 lrp: 2.88 acc: 0.77 
##      fdr: 0.28 npv: 0.83 lrm: 0.23 dor: 12.64
## 
## 
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio

spFSR Feature Selection for Random Forest

## [1] 18

Determine if hyperparameters can be optimized further

Figure 14. Measurement of random forest classifier performance with each iteration; the best performance was obtained in fewer than 20 iterations.

Best hyperparameters

Figure 15. Optimisation of mtry hyperparameter for the random forest classifier.

Best hyperparameters

## $mtry
## [1] 3
## mmce.test.mean  fpr.test.mean  tpr.test.mean 
##      0.1832898      0.2416667      0.8669481

The optimal value of mtry was 3, with a mean misclassification error of 0.183.

Construct a learner with the tuned parameters

Run SPSA-FSR on the tuned and untuned random forest learners

## SPSA-FSR begins:
## Wrapper = rf
## Measure = auc
## Number of selected features = 0
## 
## iter  value   st.dev  num.ft  best.value
## 1     0.86272 0.0379  9       0.86272 *
## 2     0.86115 0.02728 10      0.86272
## 3     0.85701 0.02322 10      0.86272
## 4     0.86609 0.04009 9       0.86609 *
## 5     0.8605  0.01941 10      0.86609
## 6     0.85961 0.03981 11      0.86609
## 7     0.86795 0.03048 11      0.86795 *
## 8     0.86138 0.04024 10      0.86795
## 9     0.85896 0.0474  8       0.86795
## 10    0.86695 0.02844 12      0.86795
## 11    0.8633  0.03962 10      0.86795
## 12    0.86299 0.03528 11      0.86795
## 13    0.87491 0.04411 14      0.87491 *
## 14    0.88292 0.02198 12      0.88292 *
## 15    0.8816  0.02685 12      0.88292
## 16    0.87992 0.02055 12      0.88292
## 17    0.87133 0.03382 14      0.88292
## 18    0.88289 0.03541 13      0.88292
## 19    0.88868 0.03208 11      0.88868 *
## 20    0.88323 0.02402 9       0.88868
## 21    0.85849 0.03453 8       0.88868
## 22    0.87945 0.03793 10      0.88868
## 23    0.88755 0.02895 10      0.88868
## 24    0.8885  0.04092 10      0.88868
## 25    0.88434 0.03136 10      0.88868
## 26    0.88614 0.03342 10      0.88868
## 27    0.87897 0.03294 10      0.88868
## 28    0.88385 0.02315 10      0.88868
## 29    0.88283 0.03224 10      0.88868
## 30    0.88485 0.021   10      0.88868
## 31    0.88695 0.02621 10      0.88868
## 32    0.88595 0.02565 10      0.88868
## 33    0.88412 0.03572 10      0.88868
## 34    0.88482 0.03888 11      0.88868
## 35    0.8861  0.04683 11      0.88868
## 36    0.89087 0.02724 11      0.89087 *
## 37    0.88807 0.02869 11      0.89087
## 38    0.88831 0.04513 11      0.89087
## 39    0.88503 0.03356 11      0.89087
## 40    0.88332 0.02746 11      0.89087
## 41    0.89081 0.02688 11      0.89087
## 42    0.8839  0.02367 11      0.89087
## 43    0.88434 0.02563 11      0.89087
## 44    0.88813 0.0299  11      0.89087
## 45    0.89083 0.02874 11      0.89087
## 46    0.89094 0.04196 11      0.89094 *
## 47    0.89177 0.03083 11      0.89177 *
## 48    0.8912  0.03144 11      0.89177
## 49    0.89115 0.02317 11      0.89177
## 50    0.88551 0.02194 11      0.89177
## 51    0.88624 0.02697 11      0.89177
## 52    0.88715 0.04395 11      0.89177
## 53    0.89193 0.02504 11      0.89193 *
## 54    0.88365 0.02413 11      0.89193
## 55    0.88662 0.0264  11      0.89193
## 56    0.88724 0.02753 11      0.89193
## 57    0.87481 0.03371 11      0.89193
## 58    0.88056 0.03348 11      0.89193
## 59    0.88299 0.03959 12      0.89193
## 60    0.88644 0.02822 12      0.89193
## 61    0.87988 0.03305 11      0.89193
## 62    0.88789 0.02656 11      0.89193
## 63    0.89066 0.03677 11      0.89193
## 64    0.8841  0.04052 11      0.89193
## 65    0.88499 0.04131 11      0.89193
## 66    0.88205 0.04338 11      0.89193
## 67    0.8834  0.02699 11      0.89193
## 68    0.88533 0.02052 10      0.89193
## 69    0.88908 0.0302  10      0.89193
## 70    0.888   0.02186 10      0.89193
## 71    0.88535 0.02858 10      0.89193
## 72    0.88556 0.02369 10      0.89193
## 73    0.88224 0.0271  10      0.89193
## 
## Best iteration = 53
## Number of selected features = 11
## Best measure value = 0.89193
## Std. dev. of best measure = 0.02504
## Run time = 14.36 minutes.
## SPSA-FSR begins:
## Wrapper = rf
## Measure = auc
## Number of selected features = 0
## 
## iter  value   st.dev  num.ft  best.value
## 1     0.8649  0.03696 9       0.8649 *
## 2     0.8718  0.03475 12      0.8718 *
## 3     0.868   0.02671 11      0.8718
## 4     0.87638 0.03255 10      0.87638 *
## 5     0.86499 0.0383  9       0.87638
## 6     0.86413 0.03327 8       0.87638
## 7     0.88665 0.02856 12      0.88665 *
## 8     0.88625 0.01705 12      0.88665
## 9     0.8672  0.03854 10      0.88665
## 10    0.87854 0.03744 14      0.88665
## 11    0.88368 0.03347 12      0.88665
## 12    0.88164 0.0404  12      0.88665
## 13    0.88081 0.03335 12      0.88665
## 14    0.88263 0.02778 12      0.88665
## 15    0.88477 0.04211 12      0.88665
## 16    0.88378 0.0286  12      0.88665
## 17    0.88649 0.03481 12      0.88665
## 18    0.88033 0.03507 12      0.88665
## 19    0.88801 0.03302 12      0.88801 *
## 20    0.88543 0.02057 12      0.88801
## 21    0.8822  0.03562 12      0.88801
## 22    0.88829 0.03345 12      0.88829 *
## 23    0.88407 0.03459 12      0.88829
## 24    0.8837  0.02414 12      0.88829
## 25    0.88454 0.01784 12      0.88829
## 26    0.87858 0.04016 12      0.88829
## 27    0.8888  0.02776 12      0.8888 *
## 28    0.8845  0.0399  12      0.8888
## 29    0.885   0.02445 12      0.8888
## 30    0.88197 0.03706 12      0.8888
## 31    0.88549 0.02919 12      0.8888
## 32    0.88436 0.03498 12      0.8888
## 33    0.88245 0.02972 12      0.8888
## 34    0.88484 0.0286  12      0.8888
## 35    0.87648 0.0446  12      0.8888
## 36    0.88215 0.0259  12      0.8888
## 37    0.88468 0.02487 12      0.8888
## 38    0.88321 0.03639 12      0.8888
## 39    0.87934 0.03124 12      0.8888
## 40    0.88549 0.03395 12      0.8888
## 41    0.88052 0.02714 12      0.8888
## 42    0.89161 0.02642 11      0.89161 *
## 43    0.8916  0.03043 11      0.89161
## 44    0.88838 0.02369 11      0.89161
## 45    0.88744 0.03543 11      0.89161
## 46    0.88931 0.02391 11      0.89161
## 47    0.88411 0.03539 11      0.89161
## 48    0.88612 0.02604 11      0.89161
## 49    0.89115 0.02975 11      0.89161
## 50    0.88347 0.03438 11      0.89161
## 51    0.89053 0.02372 11      0.89161
## 52    0.89066 0.03223 11      0.89161
## 53    0.88373 0.04086 11      0.89161
## 54    0.89195 0.02719 11      0.89195 *
## 55    0.89038 0.0357  11      0.89195
## 56    0.88865 0.04041 11      0.89195
## 57    0.88921 0.03291 11      0.89195
## 58    0.88366 0.03933 11      0.89195
## 59    0.89088 0.03154 11      0.89195
## 60    0.88678 0.02984 11      0.89195
## 61    0.88801 0.02385 11      0.89195
## 62    0.89269 0.03181 11      0.89269 *
## 63    0.88914 0.0287  11      0.89269
## 64    0.88966 0.02655 11      0.89269
## 65    0.88857 0.03153 12      0.89269
## 66    0.8864  0.03088 12      0.89269
## 67    0.88233 0.03154 11      0.89269
## 68    0.88283 0.02783 11      0.89269
## 69    0.88734 0.03079 11      0.89269
## 70    0.88569 0.03236 11      0.89269
## 71    0.88855 0.01847 11      0.89269
## 72    0.88538 0.03167 11      0.89269
## 73    0.88669 0.03532 11      0.89269
## 74    0.88822 0.02176 11      0.89269
## 75    0.88294 0.04056 11      0.89269
## 76    0.88692 0.02634 11      0.89269
## 77    0.891   0.03291 11      0.89269
## 78    0.88765 0.02195 11      0.89269
## 79    0.88675 0.02782 11      0.89269
## 80    0.88596 0.02435 11      0.89269
## 81    0.88209 0.03122 11      0.89269
## 82    0.88914 0.03077 11      0.89269
## 
## Best iteration = 62
## Number of selected features = 11
## Best measure value = 0.89269
## Std. dev. of best measure = 0.03181
## Run time = 16 minutes.

Extract the spsa task (with the set of reduced features)

Plot the measure values across iterations

Figure 16. Scatter plots of mean accuracy rate for each iteration for the random forest classifier (left panel) with and (right panel) without hyperparameter tuning. Error bars of +/- 1 standard deviation are also shown.

Get feature importances for random forest

Get feature importances for random forest with tuned parameters

Table 7. Selected features for the Random Forest classifier with spFSR feature selection, without and with tuned hyperparameters
RF + spFSR
RF + spFSR + Tuned
Features Importance Features Importance
CP.non.anginal.pain 0.86828 Slope.upsloping 0.67339
Oldpeak 0.77995 CP.atypical.angina 0.61807
Exang 0.74931 Oldpeak 0.61475
Thal.normal 0.72170 CP.typical.angina 0.59232
CP.atypical.angina 0.71092 Thal.normal 0.58233
RestECG.ST.T.wave.abnormality 0.62218 CP.non.anginal.pain 0.57465
RestECG.normal 0.61559 FBS 0.56930
Gender.male 0.61351 Chol 0.56317
FBS 0.58535 RestECG.normal 0.55036
Chol 0.58038 Gender.male 0.54852

The top 10 features selected using the spFSR algorithm were similar for the random forest classifier with and without hyperparameter optimization (Table 7).

Importance Plotting

Figure 17. Features selected by the spFSR algorithm fused with the random forest classifier and ranked according to importance.

spFSR::plotImportance(spsaMod_rf_tuned)

Figure 18. Features selected by the spFSR algorithm fused with the random forest classifier with optimized hyperparameters, ranked according to importance.

Train models

Define the test data

Prediction

Evaluation

AUC plots for random forest

Figure 19. AUC plot for the random forest classifier. RF - Random forest classifier trained on all 17 features; RF_spsa - Random forest classifier trained on features selected using the spFSR algorithm; RF_spsa_tuned - Random forest classifier with tuned hyperparameters, trained on features selected using the spFSR algorithm.

Performance for random forest classifier with and without spFSR feature selection

# Misclassification error and AUC on the test data for each random forest variant
RF            <- performance(pred_on_test_rf, measures = list(mmce, auc))
RF_spsa       <- performance(pred_on_test_spsa_rf, measures = list(mmce, auc))
RF_spsa_tuned <- performance(pred_on_test_spsa_rf_tuned, measures = list(mmce, auc))

data_RF <- data.frame(RF, RF_spsa, RF_spsa_tuned)

kable(data_RF, caption = "Table 8. Performance for random forest classifier with and without tuned hyperparameters and spFSR feature selection") %>%
  kable_styling(full_width = FALSE, font_size = 12)
Table 8. Performance for random forest classifier with and without tuned hyperparameters and spFSR feature selection
RF RF_spsa RF_spsa_tuned
mmce 0.1801802 0.1891892 0.1846847
auc 0.8885633 0.8763940 0.8641026

The random forest classifier without spFSR feature selection performed best, with a mean misclassification error of 0.180 and an AUC value of 0.889 (Table 8).

Confusion Matrix, Precision and Recall for random forest classifier

##     predicted
## true 0         1                             
##    0 91        14        tpr: 0.87 fnr: 0.13 
##    1 26        91        fpr: 0.22 tnr: 0.78 
##      ppv: 0.78 for: 0.13 lrp: 3.9  acc: 0.82 
##      fdr: 0.22 npv: 0.87 lrm: 0.17 dor: 22.75
## 
## 
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio

Confusion Matrix, Precision and Recall for random forest classifier with features selected using the spFSR algorithm.

##     predicted
## true 0         1                            
##    0 90        15        tpr: 0.86 fnr: 0.14
##    1 27        90        fpr: 0.23 tnr: 0.77
##      ppv: 0.77 for: 0.14 lrp: 3.71 acc: 0.81
##      fdr: 0.23 npv: 0.86 lrm: 0.19 dor: 20  
## 
## 
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio

Confusion Matrix, Precision and Recall for random forest classifier with optimized hyperparameters and features selected using the spFSR algorithm.

##     predicted
## true 0         1                             
##    0 89        16        tpr: 0.85 fnr: 0.15 
##    1 25        92        fpr: 0.21 tnr: 0.79 
##      ppv: 0.78 for: 0.15 lrp: 3.97 acc: 0.82 
##      fdr: 0.22 npv: 0.85 lrm: 0.19 dor: 20.47
## 
## 
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio

Compare spFSR to other Feature Selection methods with 10 features

Random forest filter method for feature selection

Filter methods assign an importance value to each feature. The features are then ranked by importance and a feature subset is selected. We created an object named mfv by calling generateFilterValuesData from mlr on classif.task, using the filter method randomForest.importance.
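
A minimal sketch of that call (plotFilterValues then produces Figure 20; classif.task denotes the classification task referred to in the text):

mfv <- generateFilterValuesData(classif.task, method = "randomForest.importance")
plotFilterValues(mfv)   # Figure 20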

## Supervised task: dataHeart
## Type: regr
## Target: Goal
## Observations: 740
## Features:
##    numerics     factors     ordered functionals 
##          17           0           0           0 
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Has coordinates: FALSE

Plot the filter values obtained with the random forest importance method

Figure 20. Features ranked using the randomForest.importance filter in mlr (https://mlr-org.github.io/mlr/articles/tutorial/devel/feature_selection.html) (Bischl et al. 2016).

We also compared this with other filter methods (information.gain and chi.squared) to identify consistencies in the selected features, using the interactive mode plotFilterValuesGGVIS(FV) (Fig. 21).

Plot the filter values obtained with the information gain and chi-squared methods

Figure 21. Features ranked using filter methods in mlr based on (left panel) information gain and (right panel) chi-squared (https://mlr-org.github.io/mlr/articles/tutorial/devel/feature_selection.html) (Bischl et al. 2016).

Using these three filters, Thal.normal (\(\beta\)-Thalassemia) was consistently the most important feature, while FBS (fasting blood sugar) was the least important.

Fuse the random forest learner with the information gain filter

We now ‘fused’ the random forest classification learner with the information.gain filter to train the model.
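
A minimal sketch of the fused learner (rf_learner is the probability-type random forest learner from the earlier sketches):

# Random forest learner fused with the information gain filter; the number
# of features to keep is tuned below.
rf_filtered <- makeFilterWrapper(learner = rf_learner, fw.method = "information.gain")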

Determine optimal number of features to keep

The number of features to keep was determined by 5-fold cross-validation, using ‘information gain’ as the importance measure and selecting the 10 features with the highest importance (fw.abs = 10). In each resampling iteration, feature selection is carried out on the corresponding training data set before fitting the learner.
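
A minimal sketch of the cross-validated search over fw.abs (the grid contains the single value 10, matching the tuning log below; rdesc is the 5-fold stratified CV scheme from earlier):

ps_filter <- makeParamSet(makeDiscreteParam("fw.abs", values = 10))
tuned_filter <- tuneParams(rf_filtered, task = classif.task, resampling = rdesc,
                           par.set = ps_filter, control = makeTuneControlGrid())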

## [Tune] Started tuning learner classif.randomForest.filtered for parameter set:
##            Type len Def Constr Req Tunable Trafo
## fw.abs discrete   -   -     10   -    TRUE     -
## With control class: TuneControlGrid
## Imputation value: 1
## [Tune-x] 1: fw.abs=10
## [Tune-y] 1: mmce.test.mean=0.2202703; time: 0.0 min
## [Tune] Result: fw.abs=10 : mmce.test.mean=0.2202703

Performance (misclassification error)

The optimal number of features to keep and the corresponding misclassification error are:

## $fw.abs
## [1] 10
## mmce.test.mean 
##      0.2202703

Fuse learner with feature selection

We can now fuse this setting by wrapping the random forest learner with the filter method before training the model:

View selected features

We then applied getFilteredFeatures to the trained model to view the selected features.
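
A minimal sketch (the fused learner is trained with the chosen fw.abs, and the retained features are then listed):

rf_filtered_final <- setHyperPars(rf_filtered, fw.abs = 10)
rf_filtered_model <- train(rf_filtered_final, classif.task)
getFilteredFeatures(rf_filtered_model)   # names of the features that were kept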

Wrapper Methods

Select optimal features to use

We used a random search with ten iterations on the random forest classifier and classif.task.
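
A minimal sketch of the random-search feature selection (rdesc is the 5-fold stratified CV scheme from earlier; object names are illustrative):

ctrl_random <- makeFeatSelControlRandom(maxit = 10L)
sfeats <- selectFeatures(learner = rf_learner, task = classif.task,
                         resampling = rdesc, control = ctrl_random)
sfeats$x   # selected features (cf. Table 9, 'Wrapper' column)
sfeats$y   # cross-validated mmce of the selected subset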

Performance (misclassification error)

## mmce.test.mean 
##       0.208507

View the important features

Table 9. Selected Features for Random Forest classifier with mlr Feature Selection (selectFeatures)
Filter Wrapper
Age Chol
Thalach FBS
Oldpeak CP.typical.angina
Gender.male Exang
CP.atypical.angina Slope.flat
Exang Slope.upsloping
Slope.flat Thal.normal
Slope.upsloping NA
Thal.normal NA
Thal.reversible.defect NA

The wrapper method selected fewer features and 4 of these were also selected by the filter method (Table 9).

Wrap feature selection method with learner

Comparing the misclassification error rates, the random search wrapper method outperformed the filter method. We then fused the wrapper method into a learner using makeFeatSelWrapper, together with makeFeatSelControlRandom and makeResampleDesc objects.
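
A minimal sketch of the fused feature-selection learner (reusing the random-search control and resampling scheme from above):

rf_featsel <- makeFeatSelWrapper(learner = rf_learner, resampling = rdesc,
                                 control = ctrl_random)
rf_featsel_model <- train(rf_featsel, classif.task)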

Prediction

Evaluation

We obtained the confusion matrix by running calculateConfusionMatrix(pred_on_test) and generated the ROC curve.

Confusion Matrix

##         predicted
## true      0  1 -err.-
##   0      84 21     21
##   1      25 92     25
##   -err.- 25 21     46

AUC plot

Figure 22. AUC plot for the random forest classifier. RF - Random forest classifier trained on all 17 features; RF_spsa_tuned - Random forest classifier with tuned hyperparameters, trained on features selected using the spFSR algorithm; RF_wrapper - Random forest classifier with tuned hyperparameters, trained on features selected using the selectFeatures mlr algorithm.

The AUC plot shows that the random forest classifier with optimized hyperparameters and wrapper-based feature selection performed slightly better than the classifier with spFSR feature selection.

Performance for random forest with mlr feature selection

Misclassification Error and AUC value

Table 10. Classifier Performance
RF + Wrapper
mmce 0.2072072
auc 0.8689052

The random forest classifier using the wrapper method for feature selection had a mean misclassification error of 0.207 and an AUC value of 0.869 (Table 10).

Confusion Matrix, Precision and Recall for random forest with mlr feature selection

##     predicted
## true 0         1                             
##    0 84        21        tpr: 0.8  fnr: 0.2  
##    1 25        92        fpr: 0.21 tnr: 0.79 
##      ppv: 0.77 for: 0.19 lrp: 3.74 acc: 0.79 
##      fdr: 0.23 npv: 0.81 lrm: 0.25 dor: 14.72
## 
## 
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio
Table 11. Features selected by each feature selection approach
kNN + spFSR kNN + spFSR + tuned RF + spFSR RF + spFSR + tuned RF + Filter RF + Wrapper
Thal.normal FBS CP.atypical.angina CP.atypical.angina Age Trestbps
FBS Oldpeak Thal.normal Thal.normal Thalach Chol
Age Exang Chol Oldpeak Oldpeak Gender.male
Thalach Chol Slope.flat Chol Gender.male CP.atypical.angina
RestECG.normal RestECG.normal Gender.male Exang CP.atypical.angina CP.typical.angina
Oldpeak CP.typical.angina CP.non.anginal.pain FBS Exang Thal.normal
Chol RestECG.ST.T.wave.abnormality Oldpeak CP.non.anginal.pain Slope.flat Thal.reversible.defect
CP.non.anginal.pain CP.non.anginal.pain CP.typical.angina Thalach Slope.upsloping NA
Trestbps Slope.upsloping FBS Slope.upsloping Thal.normal NA
Gender.male CP.atypical.angina Thal.reversible.defect Gender.male Thal.reversible.defect NA
Table 12. Prediction classifiers performance with and without feature selection
Classifier Optimal Parameters Feature Selection mmce AUC Precision Recall
Naïve Bayes Yes 0.180 0.891 0.76 0.84
Random Forest Yes 0.176 0.886 0.87 0.78
k-Nearest Neighbours Yes 0.162 0.893 0.87 0.81
k-Nearest Neighbours Yes 0.180 0.885 0.84 0.79
k-Nearest Neighbours No spFSR 0.225 0.864 0.78 0.75
k-Nearest Neighbours Yes spFSR 0.243 0.840 0.80 0.72
Random Forest Yes 0.171 0.893 0.88 0.79
Random Forest No spFSR 0.212 0.867 0.85 0.74
Random Forest Yes spFSR 0.207 0.869 0.85 0.75
Random Forest Yes selFeat 0.216 0.860 0.72 0.80

All classifiers predicted individuals with heart disease reasonably well (AUC values > 0.84, precision > 0.70 and recall > 0.70) (Table 12). The kNN classifier trained on all 17 features in the dataset had the lowest mean misclassification error (0.162).

Discussion

The kNN and random forest classifiers performed reasonably well at predicting patients with heart disease (precision of up to 88% and recall of up to 84%). Initial results comparing the Naive Bayes, kNN and random forest classifiers indicated that the Naive Bayes classifier did not perform as well as the other classifiers. Training the kNN and random forest classifiers with features selected by various feature selection algorithms did not significantly improve the performance of these classifiers.

Several of the top ten features selected by the various feature selection algorithms were related to chest pain (CP.atypical.angina, CP.non.anginal.pain and CP.typical.angina) and cholesterol levels (Chol), known indicators of the potential for heart disease. ST depression induced by exercise relative to rest (Oldpeak) was also selected by these algorithms and is an indicator of heart disease. Interestingly, all of the features previously identified as potential predictors of heart disease (in Phase 1 of this project) were selected by these algorithms. These features were patient age (Age), cholesterol level (Chol), ST depression induced by exercise relative to rest (Oldpeak), maximum heart rate (Thalach), slope of the peak exercise ST segment (Slope.upsloping) and exercise-induced angina (Exang). Even so, it is possible that the number of features used in this study (or the number of observations) may not be large enough to observe any direct benefit from feature selection.

In future studies other supervised machine learning algorithms such as Support Vector Machines (SVM) or ensemble methods could be tested to determine whether they may perform better than those trialed in these studies.

Conclusion

The kNN and random forest classifiers performed reasonably well at identifying heart disease patients. However, some improvement is required before these predictors can be used in a clinical setting, as a significant number of diseased patients were not identified in these studies.

References

Aksakalli, Vural, and Milad Malekipirbazari. 2016. “Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation.” Pattern Recognition Letters.

Bischl, Bernd, Michel Lang, Lars Kotthoff, Julia Schiffner, Jakob Richter, Erich Studerus, Giuseppe Casalicchio, and Zachary M. Jones. 2016. “mlr: Machine Learning in R.” Journal of Machine Learning Research.

Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32.