Number of observations: Healthy and Heart Disease cases


Heart Disease is uniformly spread out across Age


No major difference in Rest ECG for Healthy and Heart Disease patients


More Heart Disease patients seem to have serum cholesterol between 200 and 250 mg/dl


Heart Disease patients have a higher maximum heart rate than healthy patients


More Heart Disease patients have ST depression of 0.1


Almost all of the patients who have Heart Disease have 0 major vessels as observed by fluoroscopy


A higher proportion of females than males have Heart Disease


More Heart Disease patients have chest pain type 1 or 2


No difference in fasting blood sugar


Patients with Rest ECG 1 have more Heart Disease


Patients with no exercise induced angina have more Heart Disease


Patients with peak exercise ST slope 2 have more Heart Disease


Patients with fixed defect thalassemia have more Heart Disease


## 
## Call:
## glm(formula = target ~ ., family = binomial, data = h)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7777  -0.3544   0.1525   0.5302   2.6007  
## 
## Coefficients:
##                                     Estimate Std. Error z value Pr(>|z|)
## (Intercept)                        1.1005784  3.3647651   0.327  0.74360
## age                               -0.0005734  0.0235952  -0.024  0.98061
## sexMale                           -1.5149396  0.5212317  -2.906  0.00366
## cpChest Pain Type 1                0.9832302  0.5640531   1.743  0.08131
## cpChest Pain Type 2                1.9452318  0.4771939   4.076 4.57e-05
## cpChest Pain Type 3                2.0159122  0.6506319   3.098  0.00195
## trestbps                          -0.0170729  0.0107004  -1.596  0.11059
## chol                              -0.0043317  0.0038894  -1.114  0.26539
## fbsFasting Blood Sugar > 120       0.1764007  0.5661856   0.312  0.75538
## restecgRest ECG 1                  0.5702065  0.3745081   1.523  0.12787
## restecgRest ECG 2                 -0.2767289  2.2672126  -0.122  0.90285
## thalach                            0.0171314  0.0107357   1.596  0.11055
## exangExercise Induced Angina      -0.7630837  0.4260285  -1.791  0.07327
## oldpeak                           -0.4892926  0.2258040  -2.167  0.03024
## slopePeak Excercise ST Slope 1    -0.7196641  0.8630729  -0.834  0.40437
## slopePeak Excercise ST Slope 2     0.2015612  0.9382445   0.215  0.82990
## ca                                -0.8331781  0.2043120  -4.078 4.54e-05
## thalNormal Thalassemia             1.8146869  2.3786093   0.763  0.44551
## thalFixed Defect Thalassemia       1.8533188  2.2904818   0.809  0.41844
## thalReversible Defect Thalassemia  0.4732491  2.3013525   0.206  0.83707
##                                      
## (Intercept)                          
## age                                  
## sexMale                           ** 
## cpChest Pain Type 1               .  
## cpChest Pain Type 2               ***
## cpChest Pain Type 3               ** 
## trestbps                             
## chol                                 
## fbsFasting Blood Sugar > 120         
## restecgRest ECG 1                    
## restecgRest ECG 2                    
## thalach                              
## exangExercise Induced Angina      .  
## oldpeak                           *  
## slopePeak Excercise ST Slope 1       
## slopePeak Excercise ST Slope 2       
## ca                                ***
## thalNormal Thalassemia               
## thalFixed Defect Thalassemia         
## thalReversible Defect Thalassemia    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 417.64  on 302  degrees of freedom
## Residual deviance: 201.69  on 283  degrees of freedom
## AIC: 241.69
## 
## Number of Fisher Scoring iterations: 6
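
The `z value` and `Pr(>|z|)` columns in the summary above follow directly from each estimate and its standard error. A minimal sketch in base R, using three rows copied from the table (the vector names `est` and `se` are illustrative and not taken from the original analysis):

```r
# Estimates and standard errors copied from the full-model summary above.
est <- c(sexMale = -1.5149396, oldpeak = -0.4892926, ca = -0.8331781)
se  <- c(sexMale =  0.5212317, oldpeak =  0.2258040, ca =  0.2043120)

# Wald statistic: estimate divided by its standard error.
z <- est / se

# Two-sided p-value from the standard normal distribution.
p <- 2 * pnorm(-abs(z))

print(round(cbind(z = z, p = p), 5))
```

The same arithmetic applies to every row, which is why predictors with large standard errors relative to their estimates (for example the thalassemia levels) show no significance stars.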

Taking only the variables significant at the 10% level in the full model (sex, cp, exang, oldpeak, ca) and summarising

##      sex                      cp                             exang    
##  Female: 96   Chest Pain Type 0:143   No Exercise Induced Angina:204  
##  Male  :207   Chest Pain Type 1: 50   Exercise Induced Angina   : 99  
##               Chest Pain Type 2: 87                                   
##               Chest Pain Type 3: 23                                   
##                                                                       
##                                                                       
##     oldpeak           ca                   target   
##  Min.   :0.00   Min.   :0.0000   Healthy      :138  
##  1st Qu.:0.00   1st Qu.:0.0000   Heart Disease:165  
##  Median :0.80   Median :0.0000                      
##  Mean   :1.04   Mean   :0.7294                      
##  3rd Qu.:1.60   3rd Qu.:1.0000                      
##  Max.   :6.20   Max.   :4.0000

## 
## Call:
## glm(formula = target ~ ., family = binomial, data = d)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3277  -0.5202   0.2011   0.5714   2.5038  
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    1.9614     0.4348   4.511 6.44e-06 ***
## sexMale                       -1.4117     0.3894  -3.625 0.000289 ***
## cpChest Pain Type 1            1.3498     0.4868   2.773 0.005560 ** 
## cpChest Pain Type 2            2.0905     0.4192   4.987 6.12e-07 ***
## cpChest Pain Type 3            2.0161     0.6086   3.313 0.000924 ***
## exangExercise Induced Angina  -1.2217     0.3721  -3.283 0.001028 ** 
## oldpeak                       -0.8060     0.1810  -4.454 8.42e-06 ***
## ca                            -0.7635     0.1662  -4.595 4.34e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 417.64  on 302  degrees of freedom
## Residual deviance: 238.32  on 295  degrees of freedom
## AIC: 254.32
## 
## Number of Fisher Scoring iterations: 5
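
The two model summaries can be compared using only the deviances they report. The sketch below is a worked calculation, not part of the original analysis: it computes a likelihood-ratio statistic for the nested models and checks the printed AIC values, assuming the response is 0/1 so that AIC equals residual deviance plus twice the number of coefficients.

```r
# Residual deviances and residual degrees of freedom from the two summaries.
dev_full    <- 201.69; df_full    <- 283
dev_reduced <- 238.32; df_reduced <- 295

# Likelihood-ratio statistic: increase in deviance from dropping predictors.
lr_stat <- dev_reduced - dev_full
lr_df   <- df_reduced - df_full
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Number of observations = null degrees of freedom + 1.
n_obs <- 302 + 1

# AIC = residual deviance + 2 * (number of estimated coefficients).
aic_full    <- dev_full    + 2 * (n_obs - df_full)
aic_reduced <- dev_reduced + 2 * (n_obs - df_reduced)

cat("LR statistic:", lr_stat, "on", lr_df, "df; p =", signif(p_value, 3), "\n")
cat("AIC full:", aic_full, " AIC reduced:", aic_reduced, "\n")
```

The reduced model has the higher AIC (254.32 against 241.69), so the simplification trades some fit for a model with 8 coefficients instead of 20.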


As ST depression rises, the probability of heart disease falls


As the number of vessels observed by fluoroscopy rises, the probability of heart disease falls
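
Both statements follow from the negative `oldpeak` and `ca` coefficients in the reduced model. A minimal sketch, using the printed coefficients and a hypothetical reference patient (female, chest pain type 0, no exercise-induced angina); the helper `prob()` is illustrative only:

```r
# Coefficients copied from the reduced-model summary.
b <- c(intercept = 1.9614, sexMale = -1.4117, cp1 = 1.3498, cp2 = 2.0905,
       cp3 = 2.0161, exang = -1.2217, oldpeak = -0.8060, ca = -0.7635)

# Odds ratios: multiplicative change in the odds of Heart Disease
# per one-unit increase in the predictor.
odds_ratio <- exp(b[c("oldpeak", "ca")])
print(round(odds_ratio, 3))

# Predicted probability for the hypothetical reference patient,
# varying ST depression or the number of vessels.
prob <- function(oldpeak = 0, ca = 0) {
  unname(plogis(b["intercept"] + b["oldpeak"] * oldpeak + b["ca"] * ca))
}

print(round(sapply(0:4, function(x) prob(oldpeak = x)), 3))  # ST depression 0..4
print(round(sapply(0:4, function(x) prob(ca = x)), 3))       # vessels 0..4
```

Both odds ratios are below 1, so each additional unit of ST depression or each additional vessel lowers the predicted odds of the `Heart Disease` class, holding the other predictors fixed.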


## Generalized Linear Model 
## 
## 242 samples
##   5 predictor
##   2 classes: 'Healthy', 'Heart.Disease' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times) 
## Summary of sample sizes: 218, 217, 218, 219, 218, 218, ... 
## Resampling results:
## 
##   ROC        Sens       Spec     
##   0.8697208  0.7675758  0.8408974

Variable importance. We see that ST depression (oldpeak) is the most important variable, followed by Chest Pain Type 2 and number of vessels (ca)

## glm variable importance
## 
##                                Overall
## oldpeak                         100.00
## `cpChest Pain Type 2`            91.83
## ca                               72.94
## sexMale                          63.47
## `cpChest Pain Type 3`            38.18
## `exangExercise Induced Angina`   29.97
## `cpChest Pain Type 1`             0.00
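
A score of 0.00 here does not mean a predictor has no effect. As far as the caret documentation describes it, `varImp()` for a `glm` fit takes the absolute z statistic of each coefficient and, by default, rescales the values to the range 0 to 100. The sketch below applies that rescaling to the z values from the reduced model fitted on all 303 rows, so its ranking is not expected to match the table above, which comes from the 242-row training subset.

```r
# z values copied from the reduced-model summary (fit on all 303 rows).
z <- c(sexMale = -3.625, cp1 = 2.773, cp2 = 4.987, cp3 = 3.313,
       exang = -3.283, oldpeak = -4.454, ca = -4.595)

# Raw importance: absolute z. Scaled importance: min-max rescaled to 0..100.
raw    <- abs(z)
scaled <- 100 * (raw - min(raw)) / (max(raw) - min(raw))

print(round(sort(scaled, decreasing = TRUE), 2))
```

The predictor with the smallest absolute z always lands at 0 and the largest at 100, regardless of how significant either one is.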

## Confusion Matrix and Statistics
## 
##                
## pred            Healthy Heart Disease
##   Healthy            21             3
##   Heart Disease       3            34
##                                          
##                Accuracy : 0.9016         
##                  95% CI : (0.7981, 0.963)
##     No Information Rate : 0.6066         
##     P-Value [Acc > NIR] : 2.801e-07      
##                                          
##                   Kappa : 0.7939         
##  Mcnemar's Test P-Value : 1              
##                                          
##             Sensitivity : 0.9189         
##             Specificity : 0.8750         
##          Pos Pred Value : 0.9189         
##          Neg Pred Value : 0.8750         
##              Prevalence : 0.6066         
##          Detection Rate : 0.5574         
##    Detection Prevalence : 0.6066         
##       Balanced Accuracy : 0.8970         
##                                          
##        'Positive' Class : Heart Disease  
## 
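
The headline statistics above can be recomputed from the four cell counts. A minimal base R sketch, assuming rows are predictions and columns are actual classes as the `pred` row label indicates (variable names are illustrative):

```r
# Counts copied from the table above: rows = predicted, columns = actual.
cm <- matrix(c(21,  3,
                3, 34),
             nrow = 2, byrow = TRUE,
             dimnames = list(pred   = c("Healthy", "Heart Disease"),
                             actual = c("Healthy", "Heart Disease")))

n  <- sum(cm)
tp <- cm["Heart Disease", "Heart Disease"]  # disease predicted, disease present
tn <- cm["Healthy", "Healthy"]              # healthy predicted, patient healthy
fp <- cm["Heart Disease", "Healthy"]        # disease predicted, patient healthy
fn <- cm["Healthy", "Heart Disease"]        # disease missed

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)

# Cohen's kappa: agreement beyond what the marginal totals give by chance.
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, kappa = cohen_kappa), 4))
```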

Plotting the Confusion Matrix for Logistic Regression


## Random Forest 
## 
## 242 samples
##   5 predictor
##   2 classes: 'Healthy', 'Heart.Disease' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times) 
## Summary of sample sizes: 218, 218, 217, 217, 218, 219, ... 
## Resampling results across tuning parameters:
## 
##   mtry  ROC        Sens       Spec     
##   2     0.8807964  0.7402273  0.8061538
##   4     0.8705818  0.7596212  0.7842308
##   7     0.8636490  0.7422727  0.7826282
## 
## ROC was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.

Variable importance for the random forest. ST depression (oldpeak) is again the most important variable, as in logistic regression.

## rf variable importance
## 
##                              Overall
## oldpeak                      100.000
## ca                            70.330
## exangExercise Induced Angina  43.950
## cpChest Pain Type 2           20.794
## sexMale                       18.268
## cpChest Pain Type 3            6.915
## cpChest Pain Type 1            0.000

## Confusion Matrix and Statistics
## 
##                pred
##                 Healthy Heart Disease
##   Healthy            21             3
##   Heart Disease       2            35
##                                          
##                Accuracy : 0.918          
##                  95% CI : (0.819, 0.9728)
##     No Information Rate : 0.623          
##     P-Value [Acc > NIR] : 1.627e-07      
##                                          
##                   Kappa : 0.827          
##  Mcnemar's Test P-Value : 1              
##                                          
##             Sensitivity : 0.9211         
##             Specificity : 0.9130         
##          Pos Pred Value : 0.9459         
##          Neg Pred Value : 0.8750         
##              Prevalence : 0.6230         
##          Detection Rate : 0.5738         
##    Detection Prevalence : 0.6066         
##       Balanced Accuracy : 0.9170         
##                                          
##        'Positive' Class : Heart Disease  
## 

Plotting the Confusion Matrix for Random Forest


## Neural Network 
## 
## 242 samples
##   5 predictor
##   2 classes: 'Healthy', 'Heart.Disease' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times) 
## Summary of sample sizes: 219, 218, 217, 218, 218, 217, ... 
## Resampling results across tuning parameters:
## 
##   size  decay  ROC        Sens       Spec     
##   1     0e+00  0.8253329  0.7371970  0.7973718
##   1     1e-04  0.8367104  0.7455303  0.7835897
##   1     1e-01  0.8606520  0.7200758  0.8532692
##   3     0e+00  0.8395350  0.7092424  0.8291667
##   3     1e-04  0.8425534  0.7068182  0.8097436
##   3     1e-01  0.8775925  0.7393182  0.8379487
##   5     0e+00  0.8053914  0.7181061  0.7803205
##   5     1e-04  0.8358202  0.7387879  0.7939103
##   5     1e-01  0.8758001  0.7457576  0.8314744
## 
## ROC was used to select the optimal model using the largest value.
## The final values used for the model were size = 3 and decay = 0.1.

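Variable importance for the neural network differs from the previous two models: exercise-induced angina ranks highest and ST depression (oldpeak) lowest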

## nnet variable importance
## 
##                              Overall
## exangExercise Induced Angina  100.00
## cpChest Pain Type 3            56.58
## ca                             47.47
## sexMale                        40.05
## cpChest Pain Type 2            12.83
## cpChest Pain Type 1            11.14
## oldpeak                         0.00

## Confusion Matrix and Statistics
## 
##                pred
##                 Healthy Heart Disease
##   Healthy            22             2
##   Heart Disease       3            34
##                                          
##                Accuracy : 0.918          
##                  95% CI : (0.819, 0.9728)
##     No Information Rate : 0.5902         
##     P-Value [Acc > NIR] : 1.172e-08      
##                                          
##                   Kappa : 0.8295         
##  Mcnemar's Test P-Value : 1              
##                                          
##             Sensitivity : 0.9444         
##             Specificity : 0.8800         
##          Pos Pred Value : 0.9189         
##          Neg Pred Value : 0.9167         
##              Prevalence : 0.5902         
##          Detection Rate : 0.5574         
##    Detection Prevalence : 0.6066         
##       Balanced Accuracy : 0.9122         
##                                          
##        'Positive' Class : Heart Disease  
## 
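
In the Random Forest and Neural Network tables, `pred` labels the columns rather than the rows, and the row totals (24 and 37) match the actual class counts in the logistic regression table. caret's `confusionMatrix()` treats the rows of a table as predictions, so if the rows in these two tables are in fact the actual classes, the printed Sensitivity and Pos Pred Value would be swapped (as would Specificity and Neg Pred Value). The original code is not shown, so this reading is an assumption. The sketch recomputes the metrics under it:

```r
# Counts copied from the two tables above.
# ASSUMPTION: rows = actual class, columns = predicted class.
rf <- matrix(c(21, 3,  2, 35), nrow = 2, byrow = TRUE)
nn <- matrix(c(22, 2,  3, 34), nrow = 2, byrow = TRUE)

# Metrics with "Heart Disease" (second row and column) as the positive class.
metrics <- function(m) {
  c(sensitivity = m[2, 2] / sum(m[2, ]),    # detected among actual disease
    specificity = m[1, 1] / sum(m[1, ]),    # cleared among actual healthy
    ppv         = m[2, 2] / sum(m[, 2]),    # actual disease among predicted disease
    accuracy    = sum(diag(m)) / sum(m))
}

print(round(rbind(random_forest = metrics(rf), neural_net = metrics(nn)), 4))
```

Accuracy is unaffected by the orientation (0.918 for both models). Under this assumption the `ppv` values computed here coincide with the Sensitivity values printed above (0.9211 and 0.9444).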

Plotting the Confusion Matrix for Neural Network


Running rpart to obtain Decision Tree for decision making

##    user  system elapsed 
##    1.81    0.01    1.94
## CART 
## 
## 303 samples
##   5 predictor
##   2 classes: 'Healthy', 'Heart.Disease' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 10 times) 
## Summary of sample sizes: 273, 274, 273, 272, 272, 272, ... 
## Resampling results:
## 
##   ROC        Sens       Spec     
##   0.8192162  0.7413736  0.8274265
## 
## Tuning parameter 'cp' was held constant at a value of 0.01

Variable importance for the decision tree is again led by ST depression (oldpeak), followed closely by exercise-induced angina

## rpart variable importance
## 
##                                Overall
## oldpeak                         100.00
## exangExercise Induced Angina     94.56
## cpChest Pain Type 2              69.69
## ca                               61.07
## sexMale                          55.04
## cpChest Pain Type 1              11.54
## `cpChest Pain Type 1`             0.00
## `exangExercise Induced Angina`    0.00
## `cpChest Pain Type 3`             0.00
## `cpChest Pain Type 2`             0.00