Data_Products-Final_Project
Chris Harris
We use the UCI Heart Disease data set: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
The data contains a field num, which represents angiographic disease status:
– Value 0: < 50% diameter narrowing
– Value 1: > 50% diameter narrowing
We treat a value of 0 as a negative diagnosis for heart disease and values 1, 2, 3… as a positive diagnosis. We wish to build a binary classifier that predicts the heart disease diagnosis.
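The training code refers to a binary outcome column named ha, but the slides do not show how it is built. A minimal sketch of one way to collapse num into such a column, assuming ha is simply an indicator for num being greater than 0 (the toy vector below is illustrative, not taken from the data):

```r
# Toy values standing in for the num column (assumption: not the real data)
num <- c(0, 2, 1, 0, 3)

# 0 stays 0 (negative diagnosis); any value above 0 becomes 1 (positive diagnosis)
ha <- as.integer(num > 0)
print(ha)
```

On the real data the same expression could be applied to heart.data$num, provided the column was read in as numeric.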
We use logistic regression. The data set includes the following variables, which we use as potential regressors.
We propose a Shiny app that lets the user select a subset of these variables. For instance, suppose the variables age, resting blood pressure, and maximum heart rate are selected. The model can then be trained as follows:
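library(caret)  # provides createDataPartition() and train()
# Use 60% of the rows for training and the remaining rows for testing
train.index <- createDataPartition(heart.data$ha, p = 0.6, list = FALSE)
train <- heart.data[train.index, ]; test <- heart.data[-train.index, ]
# Logistic regression on the three selected variables
model <- train(as.factor(ha) ~ age + trestbps + thalach, data = train, method = 'glm', family = 'binomial')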
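Because the app lets the user choose the variables, the model formula cannot be hard-coded as it is above. The sketch below shows one way the formula might be assembled from a character vector of selected column names. The vector name `selected` is hypothetical; in a Shiny app it could come from an input such as a checkbox group, which is an assumption about the app's design.

```r
# Hypothetical selection; in the app this would come from user input
selected <- c("age", "trestbps", "thalach")

# Join the selected names with "+" and attach the response term
f <- as.formula(paste("as.factor(ha) ~", paste(selected, collapse = " + ")))
print(f)
```

The resulting formula object can be passed as the first argument to train() in place of the literal formula.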
With this model we can make a bar plot depicting (clockwise, starting from the top left) true negatives, false positives, true positives, and false negatives.
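The four counts behind such a plot can be obtained by cross-tabulating predictions against the true labels. The following is a minimal base R sketch using made-up label vectors; `actual` and `predicted` are hypothetical stand-ins for test$ha and predict(model, test).

```r
# Made-up labels for illustration only (assumption: not results from the model)
actual    <- factor(c(0, 0, 1, 1, 1, 0), levels = c(0, 1))
predicted <- factor(c(0, 1, 1, 0, 1, 0), levels = c(0, 1))

# Rows are predictions, columns are true labels
cm <- table(predicted = predicted, actual = actual)

tn <- cm["0", "0"]  # predicted negative, actually negative
fp <- cm["1", "0"]  # predicted positive, actually negative
tp <- cm["1", "1"]  # predicted positive, actually positive
fn <- cm["0", "1"]  # predicted negative, actually positive

# Accuracy is the share of correct predictions
accuracy <- (tp + tn) / sum(cm)
print(cm)
```

Fixing the factor levels up front keeps the table 2 x 2 even if one class never appears among the predictions.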
The accuracy of the model on the test set is:
paste("Accuracy =", mean(predict(model, test) == test$ha))
[1] "Accuracy = 0.760330578512397"