knitr::include_graphics("HeartDiseases.png")
Heart disease, also known as cardiovascular disease (CVD), is a generic term for conditions that affect the heart's blood vessels. Heart disease is the leading cause of death globally, and it typically results from fatty deposits building up in the arteries.
Typical heart disease symptoms include chest pain (angina), shortness of breath, and numbness, coldness, or weakness in the arms and legs.
Arrhythmia symptoms include fluttering in the chest, a racing heartbeat (tachycardia), a slow or weak heartbeat (bradycardia), and fainting.
Cardiovascular problems may also be marked by blue or pale gray skin.
Thickened heart muscle, termed cardiomyopathy, can result in bloating and breathlessness, among other symptoms.
High blood pressure, age, diabetes, poor diet or exercise habits, smoking, substance abuse, obesity, and chronic kidney disease also contribute to heart problems.
The heart disease dataset comes from the University of California, Irvine (UCI) Machine Learning Repository. It consists of 12 variables and 918 observations. The variables are described below:
Age : age of the patient [years]
Sex : sex of the patient
ChestPainType : chest pain type
RestingBP : resting blood pressure [mmHg]
Cholesterol : serum cholesterol [mg/dl]
FastingBS : fasting blood sugar
RestingECG : resting electrocardiogram results
MaxHR : maximum heart rate achieved [Numeric value between 60 and 202]
ExerciseAngina : exercise-induced angina
Oldpeak : ST depression induced by exercise relative to rest [numeric]
ST_Slope : the slope of the peak exercise ST segment
HeartDisease : output class [1: heart disease, 0: normal]
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(caret) #confusionmatrix
## Loading required package: ggplot2
## Loading required package: lattice
library(class) #knn
library(e1071) #naiveBayes
library(ROCR) #ROC
library(partykit) #Decision Tree
## Loading required package: grid
## Loading required package: libcoin
## Loading required package: mvtnorm
library(randomForest)
## randomForest 4.7-1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
## The following object is masked from 'package:dplyr':
##
## combine
library(knitr)
heart <- read.csv("heart.csv")
str(heart)
## 'data.frame': 918 obs. of 12 variables:
## $ Age : int 40 49 37 48 54 39 45 54 37 48 ...
## $ Sex : chr "M" "F" "M" "F" ...
## $ ChestPainType : chr "ATA" "NAP" "ATA" "ASY" ...
## $ RestingBP : int 140 160 130 138 150 120 130 110 140 120 ...
## $ Cholesterol : int 289 180 283 214 195 339 237 208 207 284 ...
## $ FastingBS : int 0 0 0 0 0 0 0 0 0 0 ...
## $ RestingECG : chr "Normal" "Normal" "ST" "Normal" ...
## $ MaxHR : int 172 156 98 108 122 170 170 142 130 120 ...
## $ ExerciseAngina: chr "N" "N" "N" "Y" ...
## $ Oldpeak : num 0 1 0 1.5 0 0 0 0 1.5 0 ...
## $ ST_Slope : chr "Up" "Flat" "Up" "Flat" ...
## $ HeartDisease : int 0 1 0 1 0 0 0 0 1 0 ...
colSums(is.na(heart))
## Age Sex ChestPainType RestingBP Cholesterol
## 0 0 0 0 0
## FastingBS RestingECG MaxHR ExerciseAngina Oldpeak
## 0 0 0 0 0
## ST_Slope HeartDisease
## 0 0
heart_clean <- heart %>%
mutate_if(is.character, as.factor) %>%
mutate_at(vars(FastingBS,HeartDisease), as.factor)
str(heart_clean)
## 'data.frame': 918 obs. of 12 variables:
## $ Age : int 40 49 37 48 54 39 45 54 37 48 ...
## $ Sex : Factor w/ 2 levels "F","M": 2 1 2 1 2 2 1 2 2 1 ...
## $ ChestPainType : Factor w/ 4 levels "ASY","ATA","NAP",..: 2 3 2 1 3 3 2 2 1 2 ...
## $ RestingBP : int 140 160 130 138 150 120 130 110 140 120 ...
## $ Cholesterol : int 289 180 283 214 195 339 237 208 207 284 ...
## $ FastingBS : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ RestingECG : Factor w/ 3 levels "LVH","Normal",..: 2 2 3 2 2 2 2 2 2 2 ...
## $ MaxHR : int 172 156 98 108 122 170 170 142 130 120 ...
## $ ExerciseAngina: Factor w/ 2 levels "N","Y": 1 1 1 2 1 1 1 1 2 1 ...
## $ Oldpeak : num 0 1 0 1.5 0 0 0 0 1.5 0 ...
## $ ST_Slope : Factor w/ 3 levels "Down","Flat",..: 3 2 3 2 3 3 3 3 2 3 ...
## $ HeartDisease : Factor w/ 2 levels "0","1": 1 2 1 2 1 1 1 1 2 1 ...
summary(heart_clean)
## Age Sex ChestPainType RestingBP Cholesterol
## Min. :28.00 F:193 ASY:496 Min. : 0.0 Min. : 0.0
## 1st Qu.:47.00 M:725 ATA:173 1st Qu.:120.0 1st Qu.:173.2
## Median :54.00 NAP:203 Median :130.0 Median :223.0
## Mean :53.51 TA : 46 Mean :132.4 Mean :198.8
## 3rd Qu.:60.00 3rd Qu.:140.0 3rd Qu.:267.0
## Max. :77.00 Max. :200.0 Max. :603.0
## FastingBS RestingECG MaxHR ExerciseAngina Oldpeak
## 0:704 LVH :188 Min. : 60.0 N:547 Min. :-2.6000
## 1:214 Normal:552 1st Qu.:120.0 Y:371 1st Qu.: 0.0000
## ST :178 Median :138.0 Median : 0.6000
## Mean :136.8 Mean : 0.8874
## 3rd Qu.:156.0 3rd Qu.: 1.5000
## Max. :202.0 Max. : 6.2000
## ST_Slope HeartDisease
## Down: 63 0:410
## Flat:460 1:508
## Up :395
##
##
##
Split the data into train and test datasets: 80% of the data will be used as the train dataset, and 20% will be used as the test dataset.
RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(310)
ind_heart <- sample(x = nrow(heart_clean), size = nrow(heart_clean)*0.8)
heart_train <- heart_clean[ind_heart,]
heart_test <- heart_clean[-ind_heart,]
heart_train$HeartDisease %>%
table() %>%
prop.table()
## .
## 0 1
## 0.4509537 0.5490463
The positive class makes up 55% of the train dataset and the negative class 45%. Although the classes are slightly imbalanced, the proportions are within an acceptable range, so there is no need to upsample or downsample.
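For reference, if the imbalance were more severe, the training data could be rebalanced before modelling. A minimal sketch using caret's upSample() is shown below; it is illustrative only, is not used in the models that follow, and heart_train_up is a name introduced here.
# Illustrative only: upsample the minority class to a 50/50 split
heart_train_up <- upSample(x = heart_train %>% select(-HeartDisease),
                           y = heart_train$HeartDisease,
                           yname = "HeartDisease")
prop.table(table(heart_train_up$HeartDisease))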
Build a baseline logistic regression model with all variables; its AIC is 513.56.
hmodel_logreg <- glm(formula = HeartDisease ~ . , data = heart_train, family = "binomial")
summary(hmodel_logreg)
##
## Call:
## glm(formula = HeartDisease ~ ., family = "binomial", data = heart_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5501 -0.3742 0.1803 0.4480 2.6262
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.605788 1.559330 -1.030 0.303107
## Age 0.019757 0.015125 1.306 0.191457
## SexM 1.478898 0.310109 4.769 1.85e-06 ***
## ChestPainTypeATA -1.946743 0.366836 -5.307 1.12e-07 ***
## ChestPainTypeNAP -1.607122 0.294460 -5.458 4.82e-08 ***
## ChestPainTypeTA -1.296979 0.464342 -2.793 0.005220 **
## RestingBP 0.007600 0.006742 1.127 0.259637
## Cholesterol -0.003811 0.001231 -3.096 0.001962 **
## FastingBS1 1.093455 0.303533 3.602 0.000315 ***
## RestingECGNormal -0.113511 0.298322 -0.380 0.703576
## RestingECGST -0.104270 0.385371 -0.271 0.786722
## MaxHR -0.007241 0.005440 -1.331 0.183137
## ExerciseAnginaY 1.002417 0.268274 3.737 0.000187 ***
## Oldpeak 0.256180 0.135863 1.886 0.059353 .
## ST_SlopeFlat 1.484516 0.462843 3.207 0.001339 **
## ST_SlopeUp -0.820405 0.496396 -1.653 0.098387 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1010.47 on 733 degrees of freedom
## Residual deviance: 481.56 on 718 degrees of freedom
## AIC: 513.56
##
## Number of Fisher Scoring iterations: 5
Tune the baseline logistic regression model using backward stepwise feature selection. The resulting AIC is slightly better at 508.7 (compared to 513.56), so we will use the model produced by feature selection.
hmodel_logreg_back <- step(object = hmodel_logreg, direction = "backward", trace = 0)
summary(hmodel_logreg_back)
##
## Call:
## glm(formula = HeartDisease ~ Age + Sex + ChestPainType + Cholesterol +
## FastingBS + ExerciseAngina + Oldpeak + ST_Slope, family = "binomial",
## data = heart_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5332 -0.3741 0.1850 0.4566 2.6337
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.275175 0.983254 -2.314 0.020672 *
## Age 0.030517 0.013807 2.210 0.027087 *
## SexM 1.479617 0.306649 4.825 1.40e-06 ***
## ChestPainTypeATA -1.958143 0.361297 -5.420 5.97e-08 ***
## ChestPainTypeNAP -1.675442 0.291650 -5.745 9.21e-09 ***
## ChestPainTypeTA -1.317708 0.465893 -2.828 0.004679 **
## Cholesterol -0.003805 0.001160 -3.279 0.001041 **
## FastingBS1 1.095335 0.300604 3.644 0.000269 ***
## ExerciseAnginaY 1.094505 0.260487 4.202 2.65e-05 ***
## Oldpeak 0.247598 0.133228 1.858 0.063104 .
## ST_SlopeFlat 1.497483 0.462311 3.239 0.001199 **
## ST_SlopeUp -0.854692 0.492659 -1.735 0.082766 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1010.5 on 733 degrees of freedom
## Residual deviance: 484.7 on 722 degrees of freedom
## AIC: 508.7
##
## Number of Fisher Scoring iterations: 5
Predict the test dataset using the Logistic Regression Model
hpred_logreg_back <- predict(object = hmodel_logreg_back, newdata = heart_test, type = "response")
# convert predicted probabilities into class labels using a 0.5 threshold
hpred_logreg_back_label <- as.factor(ifelse(hpred_logreg_back > 0.5, "1", "0"))
Model evaluation using a confusion matrix: the model's accuracy is 88%, recall/sensitivity is 88%, and precision/positive predictive value is 91%.
conf_logreg <- confusionMatrix(data = hpred_logreg_back_label, reference = heart_test$HeartDisease, positive = "1")
conf_logreg
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 70 13
## 1 9 92
##
## Accuracy : 0.8804
## 95% CI : (0.8246, 0.9235)
## No Information Rate : 0.5707
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.7575
##
## Mcnemar's Test P-Value : 0.5224
##
## Sensitivity : 0.8762
## Specificity : 0.8861
## Pos Pred Value : 0.9109
## Neg Pred Value : 0.8434
## Prevalence : 0.5707
## Detection Rate : 0.5000
## Detection Prevalence : 0.5489
## Balanced Accuracy : 0.8811
##
## 'Positive' Class : 1
##
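For comparison with the later models, the same ROCR workflow could also be applied to the logistic regression probabilities. A minimal sketch, reusing hpred_logreg_back from above (hpred_logreg_roc is a name introduced here; this step was not part of the original analysis):
hpred_logreg_roc <- prediction(predictions = hpred_logreg_back,
                               labels = as.numeric(heart_test$HeartDisease == "1"))
plot(performance(prediction.obj = hpred_logreg_roc,
                 measure = "tpr",
                 x.measure = "fpr"))
performance(prediction.obj = hpred_logreg_roc, measure = "auc")@y.values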
Split the predictors (X) and the target (Y) from the train and test datasets. Only the numeric predictors are kept, since KNN works with numeric features.
heart_train_x <- heart_train %>%
select_if(is.numeric)
heart_train_y <- heart_train %>%
select(HeartDisease)
heart_test_x <- heart_test %>%
select_if(is.numeric)
heart_test_y <- heart_test %>%
select(HeartDisease)
Scale the train predictors, as they are not on the same scale, then scale the test predictors using the centering and scaling values from the train data.
heart_train_x_scale <- scale(heart_train_x)
heart_test_x_scale <- scale(heart_test_x,
center = attr(heart_train_x_scale,"scaled:center"),
scale = attr(heart_train_x_scale,"scaled:scale"))
Define the K value as roughly the square root of the number of training observations.
heart_train_x %>%
nrow() %>%
sqrt()
## [1] 27.09243
Use K = 27 (rounded from 27.09; an odd K also avoids ties in binary classification).
Predict the test dataset using the KNN Model
hpred_knn <- knn(train = heart_train_x_scale,
test = heart_test_x_scale,
cl = heart_train_y$HeartDisease,
k = 27)
Model evaluation using a confusion matrix: the model's accuracy is 82%, recall/sensitivity is 79%, and precision/positive predictive value is 88%.
conf_knn <- confusionMatrix(data = hpred_knn, reference = heart_test_y$HeartDisease, positive = "1")
conf_knn
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 68 22
## 1 11 83
##
## Accuracy : 0.8207
## 95% CI : (0.7575, 0.8732)
## No Information Rate : 0.5707
## P-Value [Acc > NIR] : 5.171e-13
##
## Kappa : 0.6402
##
## Mcnemar's Test P-Value : 0.08172
##
## Sensitivity : 0.7905
## Specificity : 0.8608
## Pos Pred Value : 0.8830
## Neg Pred Value : 0.7556
## Prevalence : 0.5707
## Detection Rate : 0.4511
## Detection Prevalence : 0.5109
## Balanced Accuracy : 0.8256
##
## 'Positive' Class : 1
##
Build a model with Naive Bayes, using Laplace smoothing (laplace = 1).
hmodel_NB <- naiveBayes(formula = HeartDisease ~ . , data = heart_train, laplace = 1)
Predict the test dataset using the Naive Bayes Model
hpred_NB <- predict(object = hmodel_NB, newdata = heart_test, type = "class")
Model evaluation using a confusion matrix: the model's accuracy is 88%, recall/sensitivity is 87%, and precision/positive predictive value is 91%.
conf_NB <- confusionMatrix(data = hpred_NB, reference = heart_test$HeartDisease, positive = "1")
conf_NB
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 70 14
## 1 9 91
##
## Accuracy : 0.875
## 95% CI : (0.8184, 0.9191)
## No Information Rate : 0.5707
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.7469
##
## Mcnemar's Test P-Value : 0.4042
##
## Sensitivity : 0.8667
## Specificity : 0.8861
## Pos Pred Value : 0.9100
## Neg Pred Value : 0.8333
## Prevalence : 0.5707
## Detection Rate : 0.4946
## Detection Prevalence : 0.5435
## Balanced Accuracy : 0.8764
##
## 'Positive' Class : 1
##
Receiver Operating Characteristic (ROC) curve: plot the True Positive Rate (sensitivity or recall) against the False Positive Rate (1 - specificity). The resulting curve rises steeply and then flattens near the top-left corner, which indicates a good classifier.
hpred_NBprob <- predict(object = hmodel_NB, newdata = heart_test, type = "raw")
hpred_NB_roc <- prediction(predictions = hpred_NBprob[,2],
labels = as.numeric(heart_test$HeartDisease == "1"))
plot(performance(prediction.obj = hpred_NB_roc,
measure = "tpr",
x.measure = "fpr"))
Area Under ROC Curve (AUC) is 0.93.
hpred_NB_auc <- performance(prediction.obj = hpred_NB_roc, measure = "auc")
hpred_NB_auc@y.values
## [[1]]
## [1] 0.9320072
Build a model with a Decision Tree; the plot shows that ST_Slope is the root node.
hmodel_DT <- ctree(formula = HeartDisease ~ . , data = heart_train)
plot(hmodel_DT, type = "simple")
Predict the test dataset using the Decision Tree model
hpred_DT <- predict(object = hmodel_DT, newdata = heart_test, type = "response")
Model evaluation using a confusion matrix: the model's accuracy is 88%, recall/sensitivity is 90%, and precision/positive predictive value is 89%.
conf_DT <- confusionMatrix(data = hpred_DT, reference = heart_test$HeartDisease, positive = "1")
conf_DT
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 67 10
## 1 12 95
##
## Accuracy : 0.8804
## 95% CI : (0.8246, 0.9235)
## No Information Rate : 0.5707
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.7552
##
## Mcnemar's Test P-Value : 0.8312
##
## Sensitivity : 0.9048
## Specificity : 0.8481
## Pos Pred Value : 0.8879
## Neg Pred Value : 0.8701
## Prevalence : 0.5707
## Detection Rate : 0.5163
## Detection Prevalence : 0.5815
## Balanced Accuracy : 0.8764
##
## 'Positive' Class : 1
##
Receiver Operating Characteristic (ROC) curve: plot the True Positive Rate (sensitivity or recall) against the False Positive Rate (1 - specificity). The resulting curve rises steeply and then flattens near the top-left corner, which indicates a good classifier.
hpred_DTprob <- predict(object = hmodel_DT, newdata = heart_test, type = "prob")
hpred_DT_roc <- prediction(predictions = hpred_DTprob[,2],
labels = as.numeric(heart_test$HeartDisease == "1"))
plot(performance(prediction.obj = hpred_DT_roc,
measure = "tpr",
x.measure = "fpr"))
Area Under ROC Curve (AUC) is 0.91.
hpred_DT_auc <- performance(prediction.obj = hpred_DT_roc, measure = "auc")
hpred_DT_auc@y.values
## [[1]]
## [1] 0.9136227
Build a Random Forest model with k-fold cross-validation (k = 25), repeated 10 times, and save the model for future use.
set.seed(310)
control <- trainControl(method = "repeatedcv", number = 25, repeats = 10)
# hmodel_RF <- train(form = HeartDisease ~ . , data = heart_train,
# method = "rf", trControl = control)
#
# saveRDS(hmodel_RF, "hmodel_RF.RDS")
Each resampling fold trains on 705 observations. The final, optimal mtry is 2, and the out-of-bag (OOB) error estimate is about 13%, i.e. the model is roughly 87% accurate on the out-of-bag observations.
hmodel_RF <- readRDS("hmodel_RF.RDS")
hmodel_RF
## Random Forest
##
## 734 samples
## 11 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (25 fold, repeated 10 times)
## Summary of sample sizes: 705, 705, 705, 705, 705, 705, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.8674846 0.7303772
## 8 0.8534797 0.7028073
## 15 0.8368805 0.6696213
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
hmodel_RF$finalModel
##
## Call:
## randomForest(x = x, y = y, mtry = min(param$mtry, ncol(x)))
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 13.22%
## Confusion matrix:
## 0 1 class.error
## 0 274 57 0.17220544
## 1 40 363 0.09925558
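As a quick check of the OOB figure, the error rate can be recomputed from the confusion matrix printed above (a sketch using the counts shown there):
# misclassified out-of-bag predictions divided by total training rows
(57 + 40) / (274 + 57 + 40 + 363) # ~0.132, i.e. ~13% OOB error, ~87% accuracy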
The most important variable is ST_Slope, the same as in the Decision Tree model.
plot(varImp(hmodel_RF))
Predict the test dataset using the Random Forest model
hpred_RF <- predict(object = hmodel_RF, newdata = heart_test)
Model evaluation using a confusion matrix: the model's accuracy is 91%, recall/sensitivity is 91%, and precision/positive predictive value is 93%.
conf_RF <- confusionMatrix(data = hpred_RF, reference = heart_test$HeartDisease, positive = "1")
conf_RF
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 72 9
## 1 7 96
##
## Accuracy : 0.913
## 95% CI : (0.8626, 0.9495)
## No Information Rate : 0.5707
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8231
##
## Mcnemar's Test P-Value : 0.8026
##
## Sensitivity : 0.9143
## Specificity : 0.9114
## Pos Pred Value : 0.9320
## Neg Pred Value : 0.8889
## Prevalence : 0.5707
## Detection Rate : 0.5217
## Detection Prevalence : 0.5598
## Balanced Accuracy : 0.9128
##
## 'Positive' Class : 1
##
Receiver Operating Characteristic (ROC) curve: plot the True Positive Rate (sensitivity or recall) against the False Positive Rate (1 - specificity). The resulting curve rises steeply and then flattens near the top-left corner, which indicates a good classifier.
hpred_RFprob <- predict(object = hmodel_RF, newdata = heart_test, type = "prob")
hpred_RF_roc <- prediction(predictions = hpred_RFprob[,2],
labels = as.numeric(heart_test$HeartDisease == "1"))
plot(performance(prediction.obj = hpred_RF_roc,
measure = "tpr",
x.measure = "fpr"))
Area Under ROC Curve (AUC) is 0.94.
hpred_RF_auc <- performance(prediction.obj = hpred_RF_roc, measure = "auc")
hpred_RF_auc@y.values
## [[1]]
## [1] 0.9393611
Summarize the confusion matrices of all the models. On this heart disease dataset, the best model for predicting heart disease is the Random Forest, with 91% accuracy, 91% recall/sensitivity, and 93% precision/positive predictive value.
conf_list <- list ("Logistic Regression" = conf_logreg,
"K-Nearest Neighbor" = conf_knn,
"Naive Bayes" = conf_NB,
"Decision Tree" = conf_DT,
"Random Forest" = conf_RF)
conf_results <- sapply(conf_list, function(x) x$byClass)
options(digits = 2)
results_table <- kable(conf_results)
results_table
|  | Logistic Regression | K-Nearest Neighbor | Naive Bayes | Decision Tree | Random Forest |
|---|---|---|---|---|---|
| Sensitivity | 0.88 | 0.79 | 0.87 | 0.90 | 0.91 |
| Specificity | 0.89 | 0.86 | 0.89 | 0.85 | 0.91 |
| Pos Pred Value | 0.91 | 0.88 | 0.91 | 0.89 | 0.93 |
| Neg Pred Value | 0.84 | 0.76 | 0.83 | 0.87 | 0.89 |
| Precision | 0.91 | 0.88 | 0.91 | 0.89 | 0.93 |
| Recall | 0.88 | 0.79 | 0.87 | 0.90 | 0.91 |
| F1 | 0.89 | 0.83 | 0.89 | 0.90 | 0.92 |
| Prevalence | 0.57 | 0.57 | 0.57 | 0.57 | 0.57 |
| Detection Rate | 0.50 | 0.45 | 0.49 | 0.52 | 0.52 |
| Detection Prevalence | 0.55 | 0.51 | 0.54 | 0.58 | 0.56 |
| Balanced Accuracy | 0.88 | 0.83 | 0.88 | 0.88 | 0.91 |
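The table above reports only the per-class (byClass) metrics; overall accuracy can be pulled from the same confusion-matrix objects in the same way. A minimal sketch (conf_accuracy is a name introduced here):
conf_accuracy <- sapply(conf_list, function(x) x$overall[["Accuracy"]])
kable(t(conf_accuracy))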