MyShinyAppPresentation

A. Rosa Castillo
06.11.2017

Motivation

  • I also completed the courses on Regression Models and Machine Learning.
  • The ''SAheart'' dataset from the ''ElemStatLearn'' package was mentioned once in the courses, but only briefly. It is a retrospective sample of males in a heart-disease high-risk region of the Western Cape, South Africa.
  • I decided to put into practice what we learned about modeling data and predicting outcomes,
  • using an interesting dataset such as SAheart.

Building the model

The dataset is split into two groups: train and test. The training set receives the number of rows divided by 1.3, which is about 77% of the data (355 of the 462 rows); the remaining 107 rows form the test set. The test set is used to evaluate the model and to generate a confusion matrix.

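    library(ElemStatLearn)  # provides the SAheart dataset
    library(caret)          # provides train() and confusionMatrix()
    data(SAheart)
    set.seed(8484)
    # row indices of the training set; named trainIdx to avoid shadowing caret::train
    trainIdx <- sample(seq_len(nrow(SAheart)), size = floor(nrow(SAheart) / 1.3), replace = FALSE)
    trainSA <- SAheart[trainIdx, ]
    testSA <- SAheart[-trainIdx, ]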
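
As a quick check on the split sizes, the index arithmetic can be reproduced in base R without loading the data. This is only a sketch: it assumes that SAheart has 462 rows (the size given in the dataset's documentation), and it calls floor() explicitly where the original code passes a fractional size to sample().

```r
# Sketch: sizes of the train/test split.
# Assumption: SAheart has 462 rows (taken from the dataset documentation).
n <- 462
set.seed(8484)
trainIdx <- sample(seq_len(n), size = floor(n / 1.3), replace = FALSE)
testIdx <- setdiff(seq_len(n), trainIdx)  # every row not drawn for training

nTrain <- length(trainIdx)
nTest <- length(testIdx)
trainShare <- nTrain / n
cat("train:", nTrain, "test:", nTest, "train share:", round(trainShare, 2), "\n")
```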

We chose a generalized linear model (a logistic regression, since the family is binomial) with six predictors:

  • tobacco consumption (tobacco),
  • alcohol consumption (alcohol),
  • low-density lipoprotein cholesterol (ldl),
  • age (age),
  • type-A behavior score (typea),
  • obesity (obesity).

modelFit <- train(factor(chd) ~ age + alcohol + obesity + tobacco + typea + ldl, data = trainSA, method = "glm", family = "binomial")

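By default, caret resamples the training data (25 bootstrap replicates) to estimate how accurate the model will be on unseen data. A glm has no tuning parameters, so the resampling does not change the fitted model; it only tells us what accuracy to expect.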
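
To make the idea of resampling concrete, here is a minimal base R sketch of a bootstrap accuracy estimate: fit on a sample drawn with replacement, then score the rows that were never drawn. It is not caret's internal code, and the data are simulated; simData, x1, x2, y and bootAccuracy are hypothetical names used only for illustration.

```r
# Sketch of a bootstrap accuracy estimate on simulated data.
# simData, x1, x2 and y are hypothetical stand-ins, not SAheart columns.
set.seed(1)
n <- 300
simData <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
simData$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * simData$x1 - 0.8 * simData$x2))

bootAccuracy <- function(data) {
  idx <- sample(seq_len(nrow(data)), replace = TRUE)  # bootstrap sample
  oob <- setdiff(seq_len(nrow(data)), idx)            # rows never drawn
  fit <- glm(y ~ x1 + x2, data = data[idx, ], family = binomial)
  prob <- predict(fit, newdata = data[oob, ], type = "response")
  mean((prob > 0.5) == (data$y[oob] == 1))            # held-out accuracy
}

accs <- replicate(25, bootAccuracy(simData))  # 25 replicates, caret's default
mean(accs)
```

Each element of accs plays the role of one row of modelFit$resample; their mean is the resampled accuracy estimate.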

modelFit$resample
    Accuracy     Kappa   Resample
1  0.6811594 0.2177274 Resample01
2  0.7177419 0.3606364 Resample02
3  0.7109375 0.3268903 Resample03
4  0.7132353 0.3591107 Resample04
5  0.6694915 0.2584596 Resample05
6  0.6834532 0.2937644 Resample06
7  0.7251908 0.3908551 Resample07
8  0.6640000 0.2093373 Resample08
9  0.7054264 0.3046809 Resample09
10 0.7500000 0.4368985 Resample10
11 0.7123288 0.3204787 Resample11
12 0.7000000 0.2795883 Resample12
13 0.7142857 0.3238348 Resample13
14 0.6829268 0.3206345 Resample14
15 0.7500000 0.4470656 Resample15
16 0.8030303 0.5338223 Resample16
17 0.7021277 0.3063949 Resample17
18 0.7593985 0.4279570 Resample18
19 0.7600000 0.4149766 Resample19
20 0.6666667 0.2804757 Resample20
21 0.7727273 0.4655870 Resample21
22 0.7751938 0.5033851 Resample22
23 0.7153285 0.3983786 Resample23
24 0.6842105 0.2961190 Resample24
25 0.7886179 0.4889741 Resample25
confusionMatrix.train(modelFit)
Bootstrapped (25 reps) Confusion Matrix 

(entries are percentual average cell counts across resamples)

          Reference
Prediction    0    1
         0 54.6 18.0
         1 10.0 17.4

 Accuracy (average) : 0.72
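
The average accuracy follows directly from the table: it is the sum of the diagonal cells, expressed as a share of all cells. The sketch below redoes that arithmetic and also derives sensitivity and specificity, here defined with class 1 as the positive class (a choice made for this example; caret's own summaries may use a different positive class).

```r
# Average cell percentages copied from the bootstrapped confusion matrix.
# matrix() fills column by column: first Reference = 0, then Reference = 1.
cm <- matrix(c(54.6, 10.0, 18.0, 17.4), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)        # (54.6 + 17.4) / 100
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # share of true 1s predicted as 1
specificity <- cm["0", "0"] / sum(cm[, "0"])  # share of true 0s predicted as 0
round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 3)
```

The model recognizes class 0 much more reliably than class 1: roughly half of the true class 1 cases are predicted as 0.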

How important is each predictor

The Shiny app shows the fitted coefficient of each predictor, which gives an idea of how much each one contributes to the prediction.

modelFit$finalModel[1]
$coefficients
 (Intercept)          age      alcohol      obesity      tobacco 
-5.266713846  0.067523879  0.000668665 -0.075501628  0.068533940 
       typea          ldl 
 0.040885964  0.228222302 

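Judged by the size of their coefficients, ''ldl'', obesity and tobacco are the most important factors, closely followed by age. Note that raw coefficients depend on the units of each predictor, so this ranking is only a rough guide.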
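
One common way to read logistic-regression coefficients is to exponentiate them into odds ratios: the factor by which the odds of chd = 1 change when the predictor increases by one unit. The sketch below applies this to the coefficients printed above; it is offered as an aid to interpretation and is not something the app itself computes.

```r
# Coefficients copied from modelFit$finalModel (intercept left out).
coefs <- c(age = 0.067523879, alcohol = 0.000668665, obesity = -0.075501628,
           tobacco = 0.068533940, typea = 0.040885964, ldl = 0.228222302)

oddsRatios <- exp(coefs)  # odds multiplier per one-unit increase
round(sort(oddsRatios, decreasing = TRUE), 3)
```

An odds ratio above 1 raises the odds of heart disease and one below 1 lowers them; obesity is the only predictor below 1 in this fit.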

Let's make a prediction

For instance, with the default values of the app (tobacco = 15, obesity = 31, age = 40, alcohol = 50 and ldl = 7) we get 0, meaning FALSE: no heart disease is predicted. The remaining values needed for the prediction are taken from the first row of the test set.

[1] 0
Levels: 0 1
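
The prediction can be retraced by hand from the coefficients shown earlier: the linear predictor is the sum of coefficient times value, the logistic function turns it into a probability, and a probability above 0.5 gives class 1. In the sketch below, typea = 50 is a hypothetical value chosen for illustration, because the slides do not state the value found in the first test row.

```r
# Coefficients copied from modelFit$finalModel.
coefs <- c("(Intercept)" = -5.266713846, age = 0.067523879, alcohol = 0.000668665,
           obesity = -0.075501628, tobacco = 0.068533940, typea = 0.040885964,
           ldl = 0.228222302)

# App defaults; typea = 50 is a HYPOTHETICAL value, not taken from the slides.
newCase <- c("(Intercept)" = 1, age = 40, alcohol = 50, obesity = 31,
             tobacco = 15, typea = 50, ldl = 7)

eta <- sum(coefs * newCase[names(coefs)])  # linear predictor (log-odds)
prob <- plogis(eta)                        # probability of chd = 1
predClass <- as.integer(prob > 0.5)        # 0.5 threshold
c(eta = eta, prob = prob, class = predClass)
```

Under this assumption the probability stays just below 0.5, which agrees with the predicted class 0.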

Finally, the accuracy is computed on the test set.

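# predict the outcome for the held-out test rows
predictedValues <- predict(modelFit, newdata = testSA)
# cross-tabulate predictions against the observed outcome
xtab <- table(predictedValues, testSA$chd)
# named cm rather than c, to avoid masking base::c()
cm <- confusionMatrix(xtab)
cm$overall[1]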
 Accuracy 
0.6448598
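
The reported value can be turned back into counts. Assuming the dataset has 462 rows (as documented for SAheart), the test set holds 107 rows, and the accuracy then corresponds to 69 correct predictions. The sketch below only redoes this arithmetic; it does not rerun the model.

```r
# Assumption: 462 rows in SAheart, split as in the code above.
nTest <- 462 - floor(462 / 1.3)  # rows left for the test set
reported <- 0.6448598            # accuracy printed above

nCorrect <- round(reported * nTest)  # implied number of correct predictions
c(nTest = nTest, nCorrect = nCorrect, accuracy = nCorrect / nTest)
```

The test accuracy (about 0.64) is lower than the resampled estimate of 0.72, which is not unusual for a test set of only about a hundred rows.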