11/22/2019

Summary

Several markers are strong risk factors for heart disease. If studied properly, these markers can be used to predict the likelihood of heart disease in future patients.

The goal of this project was to assess the importance of the markers in a dataset provided by the University of California, Irvine, and to train a model that could predict heart disease from the available parameters.

This dataset contains 13 different measurements from 303 patients and can be found at https://www.kaggle.com/ronitf/heart-disease-uci#heart.csv.

Parameters available in the dataset

  • Age, Sex
  • Type of chest pain experienced (0-3)
  • Resting systolic blood pressure
  • Serum cholesterol level (mg/dl)
  • A binary variable indicating high fasting blood sugar levels (>120 mg/dl)
  • Resting ECG result (0-2)
  • Maximum heart rate achieved during exercise
  • A binary variable indicating whether angina was induced during exercise
  • ST depression induced by exercise relative to rest in the ECG graph
  • The slope of the ST segment in the ECG graph during peak exercise
  • Number of major vessels blocked as seen by fluoroscopy (0-3)
  • Whether the patient has thalassemia and what type (3, 6 and 7)
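
Several of the parameters above are category codes rather than quantities, so they are typically converted to factors before modelling. The following is a minimal sketch, not the preparation actually used in this report. It assumes the column names `sex`, `cp`, `fbs` and `target` from the Kaggle `heart.csv` file, which this report does not list, and it uses a made-up toy data frame in place of the real data.

```r
# Toy rows standing in for heart.csv; the values are illustrative only.
toy_df <- data.frame(
  age     = c(63, 37, 41, 56),
  sex     = c(1, 1, 0, 1),
  cp      = c(3, 2, 1, 1),
  fbs     = c(1, 0, 0, 0),
  thalach = c(150, 187, 172, 178),
  target  = c(1, 1, 0, 0)
)

# Assumed names of the coded (categorical) columns; hypothetical for this report.
categorical_cols <- c("sex", "cp", "fbs", "target")

# Convert each coded column to a factor so that models treat it as a category.
toy_df[categorical_cols] <- lapply(toy_df[categorical_cols], factor)

str(toy_df)
```

With `target` stored as a factor, `randomForest` fits a classification forest rather than a regression forest.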

Training the initial random forest model

First, we use 80% of the data as the training dataset and apply the random forest ensemble learning method to all variables to assess the importance of each variable for predicting the presence of heart disease.

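library(caret)         # createDataPartition
library(randomForest)  # randomForest

set.seed(123)  # make the split and the forest reproducible
# Hold out 20% of the rows, sampling within the levels of the outcome
inTrain <- createDataPartition(heart_df$target, p=0.8)[[1]]
training <- heart_df[inTrain,]
testing <- heart_df[-inTrain,]
# Fit a random forest on all predictors (target is assumed to be a factor)
rf_full_model <- randomForest(formula=target~., data=training)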

Assessing Feature Importance

Then we visualize the importance of the various features of the model:
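
One way to inspect feature importance is with `importance()` and `varImpPlot()` from the randomForest package. The sketch below uses synthetic data and a hypothetical model named `sim_model`, not the report's `rf_full_model`, so the numbers it prints are not results from the heart dataset.

```r
library(randomForest)

set.seed(123)
# Synthetic data: x1 drives the outcome, x2 is pure noise (illustrative only).
n <- 200
sim_df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim_df$target <- factor(ifelse(sim_df$x1 + rnorm(n, sd = 0.3) > 0, 1, 0))

sim_model <- randomForest(target ~ ., data = sim_df)

# importance() returns one row per feature; for a classification forest
# the default column is MeanDecreaseGini.
imp <- importance(sim_model)
print(imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), , drop = FALSE])

# Dot chart of the same values.
varImpPlot(sim_model, main = "Feature importance (synthetic data)")
```

The same two calls could be applied to a model fitted on the heart data; features with a larger mean decrease in Gini impurity contribute more to the splits.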

Discussing Feature Importance and Prediction Model

It appears that the following parameters are the most important in predicting heart disease:

  • Having a higher maximum heart rate during exercise
  • Detecting more blocked major vessels
  • Having a particular type of chest pain
  • Observing more ST depression in the ECG graph after exercise

Additionally, our assessment shows that high fasting blood sugar is not associated with heart disease in this dataset; therefore, we can remove that parameter from our final prediction model.
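
Removing a parameter before refitting can be sketched as follows. This is an illustration under assumptions, not the report's actual code: `fbs` is the assumed column name for fasting blood sugar (taken from the Kaggle `heart.csv` file), and `toy_training` is a made-up stand-in for the training data.

```r
# Toy training frame; `fbs` is the assumed name of the fasting blood sugar column.
toy_training <- data.frame(
  thalach = c(150, 187, 172, 178),
  fbs     = factor(c(1, 0, 0, 0)),
  target  = factor(c(1, 1, 0, 0))
)

# Keep every column except the one being dropped.
keep_cols <- setdiff(names(toy_training), "fbs")
reduced_training <- toy_training[, keep_cols, drop = FALSE]

names(reduced_training)
```

The reduced data frame could then be passed to `randomForest(target ~ ., data = reduced_training)` in the same way as the full model.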

Prediction Model’s Performance

It appears that the model has converged.
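
A common way to judge whether a random forest has converged is to check that the out-of-bag (OOB) error stops changing as trees are added. The report does not state how convergence was assessed, so the sketch below is only one possible check. It uses synthetic data and a hypothetical model named `sim_model`.

```r
library(randomForest)

set.seed(123)
# Synthetic data (illustrative only).
n <- 200
sim_df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim_df$target <- factor(ifelse(sim_df$x1 + rnorm(n, sd = 0.3) > 0, 1, 0))

sim_model <- randomForest(target ~ ., data = sim_df, ntree = 500)

# err.rate has one row per number of trees; the "OOB" column is the
# out-of-bag error after that many trees.
oob <- sim_model$err.rate[, "OOB"]

# Spread of the OOB error over the last 100 trees; a small value suggests a plateau.
oob_spread <- diff(range(tail(oob, 100)))
print(oob_spread)

# Error curves versus the number of trees.
plot(sim_model, main = "OOB error vs. number of trees (synthetic data)")
```

If the curve is still falling at the right-hand edge of the plot, increasing `ntree` is a reasonable next step.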

Final Prediction Result on the Test Data

We apply the model to the test data:

##           Reference
## Prediction  0  1
##          0 20  3
##          1  7 30

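The results show that our model can use the variables to distinguish patients with heart problems reasonably well: it classifies 50 of the 60 test patients correctly (83% accuracy).

Taking class 1 as the positive class, the confusion matrix above gives a sensitivity of 91% (30/33) and a specificity of 74% (20/27).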
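
The performance metrics can be recomputed directly from the confusion matrix printed above. The sketch below copies the four counts from that output and assumes that class `1` is the positive class; the variable names are illustrative.

```r
# Confusion matrix from the report: rows are predictions, columns are reference labels.
# matrix() fills column by column, so the first two values are the Reference = 0 column.
cm <- matrix(
  c(20, 7, 3, 30),
  nrow = 2,
  dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1"))
)

# Assumption: class "1" is the positive class.
tp <- cm["1", "1"]  # predicted 1, reference 1
fn <- cm["0", "1"]  # predicted 0, reference 1
tn <- cm["0", "0"]  # predicted 0, reference 0
fp <- cm["1", "0"]  # predicted 1, reference 0

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
accuracy    <- (tp + tn) / sum(cm)

round(c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy), 3)
```

If class `0` were treated as the positive class instead, the sensitivity and specificity values would swap.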