Anoop Swarup
June 15, 2018
Project for Coursera “Developing Data Products” Course
Enter data on the risk factors, and the application returns an estimate of coronary heart disease (CHD) risk.
In subsequent slides we discuss the model for the web-based application.
The regression model is based on a subset of data from a study by Rousseauw et al. (1983), published in the South African Medical Journal.
We first created a generalized linear model (logistic regression) using all the variables in the 'SAheart' dataset.
fit <- glm(factor(chd) ~ ., data=SAheart, family = binomial)
The results of this model identified the significant predictors to use in the model for the Shiny app: tobacco, ldl, famhist, typea, and age.
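One way to read significant predictors off a fitted logistic regression is to filter the coefficient table by p-value. The sketch below is illustrative only: it uses simulated data rather than SAheart, the names `demo`, `x1`, `x2`, and `fit_demo` are hypothetical, and the 0.05 cutoff is an assumption, since the slides do not state the threshold that was used.

```r
# Simulated stand-in data (NOT the SAheart data):
# x1 drives the outcome, x2 is pure noise.
set.seed(1)
n <- 500
demo <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
demo$y <- rbinom(n, 1, plogis(-0.5 + 1.5 * demo$x1))

# Logistic regression on all candidate predictors
fit_demo <- glm(y ~ x1 + x2, data = demo, family = binomial)

# Coefficient table columns: Estimate, Std. Error, z value, Pr(>|z|)
coefs <- summary(fit_demo)$coefficients
p_values <- coefs[, "Pr(>|z|)"]

# Keep predictors below the assumed 0.05 cutoff, ignoring the intercept
significant <- setdiff(names(p_values)[p_values < 0.05], "(Intercept)")
print(significant)
```

Applied to the full model, the same filter would be run on `summary(fit)$coefficients`.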
To build the final model, we partitioned the data into training (70%) and test (30%) sets. The model was fit on the training set and evaluated on the test set.
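The slides do not show the partitioning code, so the following is a minimal base R sketch of a 70/30 split under stated assumptions: the data frame `demo` is simulated, the seed is arbitrary, and the names `train_demo` and `test_demo` are hypothetical stand-ins for `trainSA` and `testSA`. The original analysis may well have used a stratified split such as caret's `createDataPartition()` instead.

```r
set.seed(2018)  # arbitrary seed so the split is reproducible

# Simulated stand-in for the study data (values are random, not real)
n <- 462
demo <- data.frame(
  age = sample(15:64, n, replace = TRUE),
  chd = factor(rbinom(n, 1, 0.35))
)

# Draw 70% of the row indices at random for training
train_idx <- sample(seq_len(n), size = floor(0.7 * n))

train_demo <- demo[train_idx, ]   # 70% of rows
test_demo  <- demo[-train_idx, ]  # remaining 30%

c(train = nrow(train_demo), test = nrow(test_demo))
```

A simple random split like this does not preserve the class balance of `chd` across the two sets; a stratified split does, which can matter when the outcome is imbalanced.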
modFit <- train(chd ~ age + tobacco + typea + ldl + famhist,
                method = "glm", family = "binomial", data = trainSA)
training_prediction <- predict(modFit, trainSA)
testing_prediction <- predict(modFit, testSA)
# confusionMatrix() takes the predictions first, then the observed (reference) labels
confMat1 <- confusionMatrix(training_prediction, trainSA$chd)
# paste("Prediction accuracy - training:", round(confMat1$overall["Accuracy"], 2))
confMat2 <- confusionMatrix(testing_prediction, testSA$chd)
# paste("Prediction accuracy - test:", round(confMat2$overall["Accuracy"], 2))
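The accuracy reported by a confusion matrix is the share of cases where the predicted and observed labels agree. As a rough illustration, it can be computed by hand in base R. The label vectors below are made up for the example and are not results from the SAheart model.

```r
# Hypothetical labels, for illustration only (0 = no CHD, 1 = CHD)
observed  <- factor(c(1, 0, 0, 1, 0, 1, 0, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 1, 1, 0, 0, 0, 0), levels = c(0, 1))

# Cross-tabulate predictions against observations
tab <- table(Predicted = predicted, Observed = observed)

# Correct predictions sit on the diagonal of the table
accuracy <- sum(diag(tab)) / sum(tab)
print(round(accuracy, 2))
```

Accuracy is the same whichever vector is treated as the reference, because transposing the table leaves the diagonal unchanged. Sensitivity and specificity are not symmetric in this way, so the argument order matters for those.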
https://alphasig.shinyapps.io/HeartPredict/