Logistic Regression of Heart Disease Data

Data_Products-Final_Project

Chris Harris

We use the UCI Heart Disease data set (https://archive.ics.uci.edu/ml/datasets/Heart+Disease).

The data contains a field, num, which represents angiographic disease status:

– Value 0: < 50% diameter narrowing
– Value 1: > 50% diameter narrowing

We assume that a value of 0 means a negative diagnosis for heart disease and that values 1, 2, 3, … represent a positive diagnosis. We wish to build a binary classifier to predict the heart disease diagnosis.
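The binarization described above can be sketched in R as follows. This is a minimal illustration on a made-up toy data frame, not the project's actual preprocessing; it assumes the binary response is stored in a column named ha (the name used in the model code) and is derived as num > 0.

```r
# Toy stand-in for the real data; only the num column matters here.
toy <- data.frame(num = c(0, 2, 0, 1, 3))

# Assumption: any nonzero value of num counts as a positive diagnosis.
toy$ha <- as.integer(toy$num > 0)

print(toy)
```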

Our approach is logistic regression. The data set includes the following variables, which we use as potential regressors:

Age
Sex
Chest Pain Type
Resting blood pressure
Cholesterol
Blood Sugar
Electrocardiographic Results
Maximum Heart Rate

We propose a Shiny app that lets the user select a subset of these variables. For instance, suppose Age, Resting blood pressure, and Maximum Heart Rate are selected. The model can then be trained as follows:

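library(caret)  # provides createDataPartition() and train()
# Hold out 40% of the rows for testing; ha is the binary response (assumed to be derived from num)
train.index <- createDataPartition(heart.data$ha, p = 0.6, list = FALSE)
train <- heart.data[train.index, ]; test <- heart.data[-train.index, ]
# Fit a logistic regression on the selected variables
model <- train(as.factor(ha) ~ age + trestbps + thalach, data = train, method = 'glm', family = 'binomial')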
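Because the app lets the user pick the regressors, the model formula has to be assembled at run time. The following is a minimal sketch of one way to do that, not necessarily how the app does it; the vector selected is a hypothetical stand-in for the Shiny input, and the column names are assumed to match those in the model call above.

```r
# Hypothetical stand-in for the user's selection in the app.
selected <- c("age", "trestbps", "thalach")

# Build the formula as.factor(ha) ~ age + trestbps + thalach from the selection.
model.formula <- as.formula(
  paste("as.factor(ha) ~", paste(selected, collapse = " + "))
)

print(model.formula)
```

A formula built this way could be passed to train() in place of the hard-coded one.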

Results

With this model we can make a bar plot depicting (clockwise from top left) true negatives, false positives, true positives, and false negatives.

[Figure: plot of true negatives, false positives, true positives, and false negatives]
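The four counts come from cross-tabulating predictions against the true labels. The sketch below does this in base R with made-up vectors; pred and actual are hypothetical stand-ins for predict(model, test) and test$ha, and the numbers are not results from the heart disease data.

```r
# Made-up labels for illustration only (0 = negative, 1 = positive).
actual <- factor(c(0, 0, 1, 1, 1, 0, 1, 0), levels = c(0, 1))
pred   <- factor(c(0, 1, 1, 0, 1, 0, 1, 0), levels = c(0, 1))

# Rows are predictions, columns are true labels.
cm <- table(Predicted = pred, Actual = actual)

tn <- cm["0", "0"]  # true negatives
fp <- cm["1", "0"]  # false positives
fn <- cm["0", "1"]  # false negatives
tp <- cm["1", "1"]  # true positives

print(cm)
```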

The accuracy of this model on the held-out test data is

paste("Accuracy = ", sum(predict(model,test) == test$ha)/dim(test)[1], sep= " ")

[1] "Accuracy =  0.760330578512397"
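The same quantity can be written more compactly as the mean of the element-wise matches. The sketch below uses made-up vectors as hypothetical stand-ins for the predictions and true labels, so its value is unrelated to the result above.

```r
# Made-up labels for illustration only.
actual <- c(0, 0, 1, 1, 1, 0, 1, 0)
pred   <- c(0, 1, 1, 0, 1, 0, 1, 0)

# TRUE counts as 1, so the mean of the matches is the fraction correct.
accuracy <- mean(pred == actual)
print(round(accuracy, 3))
```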