Logistic Regression of Heart Disease Data

Data_Products-Final_Project

Chris Harris

We use the UCI Heart Disease data set: https://archive.ics.uci.edu/ml/datasets/Heart+Disease

The data contains a field, num, which represents angiographic disease status:

  • Value 0: < 50% diameter narrowing
  • Value 1: > 50% diameter narrowing

We assume that a value of 0 means a negative diagnosis for heart disease and that values 1, 2, 3, … represent a positive diagnosis. Our goal is a binary classifier that predicts the heart disease diagnosis.
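The collapse from num to a binary outcome can be sketched in a line of R. The training code further on refers to a column ha; this sketch assumes ha is that binary outcome, which the slides do not state explicitly. The values of num here are toy values, not rows from the data set.

```r
# Toy values of the num field, for illustration only (not rows from the data set).
num <- c(0, 2, 0, 1, 3)

# Collapse to a binary outcome: 0 stays 0, any positive value becomes 1.
# Assumption: this is how the ha column used later was derived.
ha <- as.integer(num > 0)
ha
```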

We use logistic regression. The data set includes the following variables, which we treat as potential regressors:

  • Age
  • Sex
  • Chest Pain Type
  • Resting blood pressure
  • Cholesterol
  • Blood Sugar
  • Electrocardiographic Results
  • Maximum Heart Rate
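Logistic regression models the log-odds of a positive diagnosis as a linear function of the regressors and then maps that value to a probability. The sketch below illustrates the mapping for a single regressor; the coefficients beta0 and beta_age are hypothetical numbers chosen for illustration, not estimates from the heart data.

```r
# Hypothetical coefficients, chosen only for illustration (not fitted to the heart data).
beta0 <- -5.0
beta_age <- 0.1

# Linear predictor (log-odds) for three example ages.
age <- c(40, 50, 60)
log_odds <- beta0 + beta_age * age

# The logistic function maps log-odds to a probability strictly between 0 and 1.
prob <- 1 / (1 + exp(-log_odds))

# plogis() in base R computes the same function.
prob_check <- plogis(log_odds)
```

A log-odds of 0 corresponds to a probability of 0.5, which is the usual cutoff between a negative and a positive prediction.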

We propose a Shiny app that lets the user select a subset of these variables. For instance, suppose Age, Resting blood pressure, and Maximum Heart Rate (columns age, trestbps, and thalach) are selected. The model can then be trained with the caret package as follows:

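library(caret)  # provides createDataPartition() and train()
# Hold out 40% of the rows for testing, stratified on the outcome ha.
train.index <- createDataPartition(heart.data$ha, p = 0.6, list = FALSE)
train <- heart.data[train.index, ]; test <- heart.data[-train.index, ]
# Fit a logistic regression (binomial GLM) on the three selected regressors.
model <- train(as.factor(ha) ~ age + trestbps + thalach, data = train, method = 'glm', family = 'binomial')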
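Since the app lets the user pick the regressors, the formula has to be assembled at run time. One possible way to do that, sketched here with base R only, is to paste the selected column names together; the vector selected is a hypothetical stand-in for whatever the app's input control returns.

```r
# Hypothetical selection, e.g. what a Shiny checkbox group might return.
selected <- c("age", "trestbps", "thalach")

# Build the model formula from the selected column names.
rhs <- paste(selected, collapse = " + ")
model.formula <- as.formula(paste("as.factor(ha) ~", rhs))
model.formula
```

The resulting formula could then be passed to train() in place of the hard-coded one.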

Results

With this model we can draw a bar plot showing (clockwise, starting from the top left) the true negatives, false positives, true positives, and false negatives.

[Figure: bar plot of true negatives, false positives, true positives, and false negatives]
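The four counts behind such a plot can be obtained by cross-tabulating predictions against the true labels. The following is a minimal sketch with toy labels, not the code or results behind the figure; the names actual, predicted, and counts are illustrative.

```r
# Toy labels for illustration only (not results from the heart data).
actual    <- factor(c(0, 0, 1, 1, 1, 0), levels = c(0, 1))
predicted <- factor(c(0, 1, 1, 0, 1, 0), levels = c(0, 1))

# Cross-tabulate predictions (rows) against the truth (columns).
counts <- table(Predicted = predicted, Actual = actual)

tn <- counts["0", "0"]  # true negatives: predicted 0, actually 0
fp <- counts["1", "0"]  # false positives: predicted 1, actually 0
fn <- counts["0", "1"]  # false negatives: predicted 0, actually 1
tp <- counts["1", "1"]  # true positives: predicted 1, actually 1
```

With the fitted model, predicted would come from predict(model, test) and actual from test$ha.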

The accuracy of the model on the held-out test set is:

paste("Accuracy = ", sum(predict(model, test) == test$ha) / nrow(test), sep = " ")
[1] "Accuracy =  0.760330578512397"