Introduction

Research Question: What factors contribute to an individual having cardiovascular disease?

The dataset I chose is a sample of 500 individuals with 31 variables that record health information about each person. The most important variables include Gender, Age, CVD, Cholesterol, Hypertension, Systolic blood pressure, Smoking, and Ethnicity. I obtained the dataset from OpenIntro.org; the link is below:

https://www.openintro.org/data/index.php?data=prevend.samp

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(dplyr)
library(tibble)
library(cowplot)
## Warning: package 'cowplot' was built under R version 4.5.2
## 
## Attaching package: 'cowplot'
## 
## The following object is masked from 'package:lubridate':
## 
##     stamp
library(pROC)
## Warning: package 'pROC' was built under R version 4.5.2
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## 
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var
main_analysis_data <- read.csv("prevend.samp.csv")

head(main_analysis_data)
##   Casenr Age Gender Ethnicity Education RFFT VAT CVD DM Smoking Hypertension
## 1   2266  55      1         3         2   62  -1   0  1       0            0
## 2   3235  65      1         0         1   79  11   0  0       0            1
## 3   1068  46      0         2         3   89   6   0  0       0            0
## 4   3422  68      1         0         2   70   5   0  0       0            0
## 5   3570  70      0         0         2   35  10   0  0       0            0
## 6   1932  53      0         0         0   14   7   0  0       0            0
##        BMI   SBP  DBP  MAP     eGFR Albuminuria.1 Albuminuria.2 Chol  HDL
## 1 39.66942 122.0 63.5 86.0 83.29573             0             0 3.86 1.54
## 2 28.98114 107.5 66.5 82.5 76.49061             0             0 5.64 1.53
## 3 23.23346 120.5 75.0 92.5 76.44909             1             2 6.83 1.04
## 4 22.34352 114.5 67.0 85.0 61.23697             0             1 7.11 1.85
## 5 32.41004 114.0 72.5 89.5 88.14615             0             0 5.04 1.40
## 6 29.15877 126.5 75.0 95.0 91.90595             0             0 3.05 0.79
##   Statin Solubility Days     Years  DDD FRS         PS PSquint GRS Match_1
## 1      0          2   -1 -1.000000    0   8 0.37427518       5   1     816
## 2      1          1 1672  4.580822 1373  11 0.25594606       4   1     727
## 3      0          2   -1 -1.000000    0  -1 0.12850400       3   0      -1
## 4      0          2   -1 -1.000000    0   9 0.09417574       2   0     838
## 5      0          2   -1 -1.000000    0  12 0.19343208       4   1      -1
## 6      0          2   -1 -1.000000    0   8 0.13091850       3   1      15
##   Match_2
## 1     113
## 2     242
## 3      -1
## 4      -1
## 5     276
## 6     121
str(main_analysis_data)
## 'data.frame':    500 obs. of  31 variables:
##  $ Casenr       : int  2266 3235 1068 3422 3570 1932 3134 3573 1103 868 ...
##  $ Age          : int  55 65 46 68 70 53 64 70 46 44 ...
##  $ Gender       : int  1 1 0 1 0 0 0 0 1 0 ...
##  $ Ethnicity    : int  3 0 2 0 0 0 2 0 0 0 ...
##  $ Education    : int  2 1 3 2 2 0 1 3 2 3 ...
##  $ RFFT         : int  62 79 89 70 35 14 31 47 88 91 ...
##  $ VAT          : int  -1 11 6 5 10 7 8 5 11 11 ...
##  $ CVD          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ DM           : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ Smoking      : int  0 0 0 0 0 0 0 0 1 0 ...
##  $ Hypertension : int  0 1 0 0 0 0 0 1 1 0 ...
##  $ BMI          : num  39.7 29 23.2 22.3 32.4 ...
##  $ SBP          : num  122 108 120 114 114 ...
##  $ DBP          : num  63.5 66.5 75 67 72.5 75 77 76.5 99 70 ...
##  $ MAP          : num  86 82.5 92.5 85 89.5 ...
##  $ eGFR         : num  83.3 76.5 76.4 61.2 88.1 ...
##  $ Albuminuria.1: int  0 0 1 0 0 0 0 0 0 0 ...
##  $ Albuminuria.2: int  0 0 2 1 0 0 1 0 1 0 ...
##  $ Chol         : num  3.86 5.64 6.83 7.11 5.04 3.05 4.9 5.5 3.92 5.75 ...
##  $ HDL          : num  1.54 1.53 1.04 1.85 1.4 0.79 1.23 1.57 1.39 1.18 ...
##  $ Statin       : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ Solubility   : int  2 1 2 2 2 2 2 2 2 2 ...
##  $ Days         : int  -1 1672 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ Years        : num  -1 4.58 -1 -1 -1 ...
##  $ DDD          : num  0 1373 0 0 0 ...
##  $ FRS          : int  8 11 -1 9 12 8 12 15 11 8 ...
##  $ PS           : num  0.3743 0.2559 0.1285 0.0942 0.1934 ...
##  $ PSquint      : int  5 4 3 2 4 3 3 4 3 2 ...
##  $ GRS          : int  1 1 0 0 1 1 0 1 0 0 ...
##  $ Match_1      : int  816 727 -1 838 -1 15 200 -1 -1 -1 ...
##  $ Match_2      : int  113 242 -1 -1 276 121 -1 -1 -1 -1 ...
colSums(is.na(main_analysis_data))
##        Casenr           Age        Gender     Ethnicity     Education 
##             0             0             0             0             0 
##          RFFT           VAT           CVD            DM       Smoking 
##             0             0             0             0             0 
##  Hypertension           BMI           SBP           DBP           MAP 
##             0             0             0             0             0 
##          eGFR Albuminuria.1 Albuminuria.2          Chol           HDL 
##             0             0             0             0             0 
##        Statin    Solubility          Days         Years           DDD 
##             0             0             0             0             0 
##           FRS            PS       PSquint           GRS       Match_1 
##             0             0             0             0             0 
##       Match_2 
##             0
tail(main_analysis_data)
##     Casenr Age Gender Ethnicity Education RFFT VAT CVD DM Smoking Hypertension
## 495   1161  46      1         0         1   65  11   0  0       0            0
## 496    475  40      1         0         2  109  11   0  0       0            0
## 497   2784  60      0         0         3   89  12   0  0       0            1
## 498   2121  54      1         0         2   79   7   0  0       0            0
## 499   1983  53      1         0         3   85  11   0  0       0            1
## 500    468  40      1         0         3  125  12   0  0       0            0
##          BMI   SBP  DBP  MAP     eGFR Albuminuria.1 Albuminuria.2 Chol  HDL
## 495 27.60945 114.0 71.5 87.0 74.52218             0             0 4.03 1.30
## 496 22.72709 106.0 57.5 78.5 90.43841             0             0 4.70 1.97
## 497 27.77588 126.0 76.5 95.0 78.10049             0             0 6.06 1.58
## 498 23.05175 119.5 77.0 94.0 79.42434             0             1 8.19 1.66
## 499 21.92527 119.0 77.5 92.0 79.72628             0             1 5.71 1.61
## 500 18.10774  99.0 66.0 76.5 87.32902             0             1 4.52 1.47
##     Statin Solubility Days Years DDD FRS         PS PSquint GRS Match_1 Match_2
## 495      0          2   -1    -1   0   1 0.08533976       2   0      -1      -1
## 496      0          2   -1    -1   0   0 0.05299198       1   0      -1      -1
## 497      0          2   -1    -1   0  13 0.24122647       4   0      -1      -1
## 498      0          2   -1    -1   0   7 0.07188406       1   0      -1      -1
## 499      0          2   -1    -1   0  10 0.12245760       3   1      -1      -1
## 500      0          2   -1    -1   0   1 0.03873501       1   0      -1      -1
#Variables being used are: Smoking, SBP, Gender, Chol, CVD, Age
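
Although colSums(is.na(...)) reports no NA values, the head() output shows -1 in columns such as VAT, Days, Years, FRS, Match_1 and Match_2. This suggests that -1 may be used as a placeholder for missing or not-applicable entries. That is an assumption, and the OpenIntro data dictionary should confirm it. A minimal sketch of how such placeholders could be counted, using a small hypothetical data frame (toy) built from the first six rows shown above:

```r
# Toy data: three columns copied from the first six rows of head() above
toy <- data.frame(
  VAT  = c(-1, 11, 6, 5, 10, 7),
  Days = c(-1, 1672, -1, -1, -1, -1),
  FRS  = c(8, 11, -1, 9, 12, 8)
)

# is.na() finds nothing here, because -1 is an ordinary number
na_counts <- colSums(is.na(toy))

# Count how often the assumed placeholder value -1 occurs in each column
sentinel_counts <- colSums(toy == -1)
print(sentinel_counts)

# Optional: recode -1 to NA in a copy (only appropriate if -1 really means "missing")
toy_na <- toy
toy_na[toy_na == -1] <- NA
print(colSums(is.na(toy_na)))
```

The same two calls could be applied to main_analysis_data to see which of the 31 columns are affected before they are used in a model.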

Data Analysis

For my data analysis, I will use a logistic regression model to determine which factors are significant predictors of whether an individual has cardiovascular disease (CVD). I will then use a graph to visualize my data and, lastly, visualize the area under the curve.

CVD_data <- main_analysis_data |>
  select(Gender, Smoking, CVD, Chol, SBP, Age)
    

CVD_data
##     Gender Smoking CVD Chol   SBP Age
## 1        1       0   0 3.86 122.0  55
## 2        1       0   0 5.64 107.5  65
## 3        0       0   0 6.83 120.5  46
## 4        1       0   0 7.11 114.5  68
## 5        0       0   0 5.04 114.0  70
## 6        0       0   0 3.05 126.5  53
## 7        0       0   0 4.90 119.0  64
## 8        0       0   0 5.50 141.0  70
## 9        1       1   0 3.92 159.5  46
## 10       0       0   0 5.75 129.5  44
## 11       0       1   0 6.71 122.0  46
## 12       0       0   0 5.42 141.5  75
## 13       1       1   0 4.60 105.5  41
## 14       1       0   0 6.28 116.5  41
## 15       0       1   0 6.05 148.5  66
## 16       1       0   0 3.77 127.5  38
## 17       1       0   0 8.17 122.5  64
## 18       0       0   0 4.71 129.0  51
## 19       1       0   0 5.62 132.5  47
## 20       0       0   1 4.55 126.5  61
## 21       0       0   1 3.66 125.0  48
## 22       0       0   0 6.63 153.5  54
## 23       0       1   0 3.37 168.0  75
## 24       0       1   0 4.99 144.0  58
## 25       0       0   0 5.83 133.0  51
## 26       1       0   0 5.42 171.0  71
## 27       1       0   0 4.43 108.5  58
## 28       1       0   0 6.94 146.5  65
## 29       0       0   0 7.09 153.5  64
## 30       0       0   0 4.98 118.0  70
## 31       1       0   0 5.62 108.0  59
## 32       1       0   0 4.14 152.5  63
## 33       0       0   0 5.08 147.5  51
## 34       1       0   0 5.47  88.5  39
## 35       0       0   0 4.62 120.5  36
## 36       0       0   0 5.93 158.0  78
## 37       1       1   0 5.22 131.5  61
## 38       1       0   0 4.03 162.0  51
## 39       1       1   0 4.31 121.0  48
## 40       0       0   0 3.61 167.5  72
## 41       1       0   0 3.65 127.0  62
## 42       0       0   0 4.82 137.0  74
## 43       1       1   0 6.00  99.5  54
## 44       0       1   0 7.43 144.0  50
## 45       0       0   0 5.84 101.0  43
## 46       0       0   0 4.99 111.0  51
## 47       0       0   0 5.69 175.0  68
## 48       1       0   0 6.36 135.5  63
## 49       1       1   0 4.92 125.5  43
## 50       0       0   0 4.39 145.5  56
## 51       0       0   0 5.43 139.0  48
## 52       0       0   1 5.46 114.5  58
## 53       1       0   0 6.42 146.0  54
## 54       0       0   0 6.35 106.0  64
## 55       1       1   0 5.37 134.5  55
## 56       1       0   0 6.40 146.0  45
## 57       0       0   0 6.15 130.5  55
## 58       0       0   0 4.31 132.0  45
## 59       0       1   0 4.77 126.0  71
## 60       1       0   0 4.48  99.5  46
## 61       0       0   0 6.55 154.5  78
## 62       0       1   0 6.73 150.5  56
## 63       0       0   0 4.59 125.5  70
## 64       0       0   0 4.94 122.5  70
## 65       1       0   0 4.87 104.5  64
## 66       1       0   0 5.98 113.5  54
## 67       0       0   0 6.72 119.0  55
## 68       1       0   0 6.15 102.5  38
## 69       1       0   0 5.05 119.5  49
## 70       0       1   0 4.07 131.0  58
## 71       1       0   0 6.46 123.5  71
## 72       0       0   0 6.01 137.0  51
## 73       1       0   0 4.80 112.0  38
## 74       0       0   0 7.76 116.5  51
## 75       0       1   0 5.35 119.0  72
## 76       1       1   0 4.94 133.0  48
## 77       1       0   0 5.58 122.0  64
## 78       1       0   0 5.56 123.0  48
## 79       0       0   1 6.86 145.5  38
## 80       0       0   0 5.95 122.5  54
## 81       0       0   0 7.55 130.0  44
## 82       1       0   0 5.43  95.5  58
## 83       0       0   0 5.53 169.0  58
## 84       1       0   0 5.01 110.5  45
## 85       0       0   0 4.86 120.0  45
## 86       1       0   0 5.98 115.0  61
## 87       0       1   1 4.60 152.5  49
## 88       0       1   0 4.66 109.5  37
## 89       0       0   0 4.94 126.0  43
## 90       0       0   0 7.17 126.0  64
## 91       0       0   0 4.97 133.5  72
## 92       0       0   0 3.34 119.0  43
## 93       0       1   1 3.39 142.5  72
## 94       1       0   0 4.88 149.5  72
## 95       1       0   0 5.97 102.5  53
## 96       1       1   0 5.09 103.5  46
## 97       1       1   0 6.66 104.0  49
## 98       1       0   0 5.74 100.0  39
## 99       1       0   1 3.45 100.0  44
## 100      1       0   0 5.16 160.0  75
## 101      0       0   0 6.05 129.5  59
## 102      1       0   0 5.90 106.5  57
## 103      0       0   0 6.46 142.0  48
## 104      1       1   0 4.13 119.0  43
## 105      0       0   0 5.87 118.0  62
## 106      1       0   0 5.68 111.5  54
## 107      0       0   0 5.37 117.5  41
## 108      1       0   0 5.31  95.0  43
## 109      1       0   0 4.98 103.5  42
## 110      1       0   0 4.57 129.0  45
## 111      1       0   0 3.67 131.5  47
## 112      0       0   0 8.66 128.5  57
## 113      1      -1   0 6.13  98.5  66
## 114      0       0   0 7.27 124.0  65
## 115      1       0   0 5.12 137.0  50
## 116      1       1   0 5.84 117.0  52
## 117      0       1   0 7.11 109.0  57
## 118      1       0   0 5.42 106.5  41
## 119      1       1   0 6.57 104.5  47
## 120      0       0   1 4.39 125.5  58
## 121      0       0   0 4.22 114.0  52
## 122      0       1   0 5.55 154.0  78
## 123      0       0   0 4.44 111.0  37
## 124      0       0   0 4.72 147.5  75
## 125      0       1   0 4.95 109.0  41
## 126      1       0   0 5.80 138.5  64
## 127      0       0   0 4.11 117.5  36
## 128      1       0   0 6.28 137.0  68
## 129      0       0   0 6.40 155.5  52
## 130      1       1   0 8.14 119.5  62
## 131      0       0   0 6.41 158.0  75
## 132      0       1   0 4.81 134.0  52
## 133      0       0   0 5.43 134.0  57
## 134      0       0   0 6.02 143.5  80
## 135      0       0   0 5.69 160.0  61
## 136      1       1   0 3.83 141.5  41
## 137      0       0   0 6.69 137.5  44
## 138      1       0   0 5.56 115.0  43
## 139      1       0   0 5.94 143.5  55
## 140      0       0   0 6.69 134.0  56
## 141      0       0   0 5.67 133.0  42
## 142      0       1   0 6.35 134.0  43
## 143      0       1   0 5.43 117.0  43
## 144      1       0   0 6.83 134.5  64
## 145      1       0   0 4.86 108.5  74
## 146      1       0   0 4.25 115.5  56
## 147      0       0   0 3.66 112.5  43
## 148      1       1   0 5.16 101.0  36
## 149      1       0   0 3.88 178.0  72
## 150      0       1   0 4.99 109.5  52
## 151      1       0   0 5.34 124.5  45
## 152      1       0   0 6.73 149.0  61
## 153      0       1   1 5.88 164.0  48
## 154      1       0   0 6.76 123.0  78
## 155      0       0   0 5.02 122.0  37
## 156      1       0   0 5.26 122.5  64
## 157      1       0   0 5.22 103.0  50
## 158      0       0   0 4.02 144.0  77
## 159      0       0   0 3.97 125.0  46
## 160      1       0   0 5.13 116.5  55
## 161      1       0   0 5.41 123.0  42
## 162      1       0   0 3.59 150.5  68
## 163      1       0   0 7.79 107.5  36
## 164      0       1   0 5.26 135.0  59
## 165      1       0   0 4.49 104.5  52
## 166      1       1   0 5.01 129.5  54
## 167      1       0   0 4.50 112.0  49
## 168      1       0   1 4.52 115.0  64
## 169      0       0   0 6.37 114.0  46
## 170      0       0   0 5.92 122.0  56
## 171      1      -1   0 6.32 108.5  58
## 172      1       0   0 5.06 141.0  71
## 173      0       0   0 6.28 143.0  58
## 174      1       1   0 5.22 116.5  54
## 175      1       0   0 5.61 110.0  38
## 176      1       0   0 5.16 115.5  39
## 177      0       1   0 5.00 104.5  69
## 178      0       1   0 6.90 127.5  62
## 179      1       0   0 5.56 122.0  55
## 180      1       0   0 4.39 128.5  47
## 181      1       0   0 4.39 109.0  41
## 182      1       0   0 5.14 130.5  68
## 183      0       0   0 5.16 109.5  69
## 184      1       0   0 5.43 137.0  53
## 185      0       0   0 5.48 104.0  69
## 186      0       0   1 4.66 113.5  53
## 187      0       0   0 6.81 129.0  47
## 188      0       1   0 4.32 116.5  40
## 189      1       1   0 5.71 100.5  52
## 190      0       0   1 5.55 143.5  66
## 191      1       0   0 5.35 104.0  57
## 192      1       0   0 6.74  98.0  45
## 193      0       0   0 5.77 129.0  39
## 194      0       0   0 5.37 126.0  37
## 195      1       1   0 5.00 114.0  55
## 196      0       0   0 5.33 171.5  61
## 197      0       0   0 4.31 170.0  79
## 198      0       0   0 3.88 123.0  36
## 199      1       1   0 4.29 157.5  65
## 200      0       0   0 4.77 127.5  76
## 201      1       1   0 6.34 110.0  54
## 202      0       0   0 5.44 129.0  38
## 203      1      -1   0 6.78 143.5  76
## 204      1       0   0 5.92 138.5  49
## 205      1       0   0 6.57 109.5  65
## 206      1       0   0 5.44 143.0  68
## 207      1       0   0 4.61 135.0  54
## 208      0       0   0 3.48 137.0  72
## 209      0       0   1 3.90 133.5  72
## 210      1       1   0 6.75 130.5  50
## 211      1       1   0 5.36 126.0  42
## 212      0       0   1 4.09 132.5  56
## 213      0       0   0 6.45 120.5  55
## 214      0       0   0 4.18 156.5  80
## 215      0       1   1 3.92  91.5  57
## 216      0       0   0 6.21 136.0  56
## 217      0       0   0 7.72 134.0  61
## 218      0       1   0 4.26 110.5  77
## 219      0       0   0 6.74 147.0  68
## 220      0       0   0 4.09 154.0  57
## 221      0       0   1 4.79 103.5  67
## 222      1       1   0 5.53 109.5  54
## 223      1       0   0 3.87 117.0  76
## 224      0       0   0 6.61 188.5  70
## 225      0       0   1 2.23 121.5  59
## 226      1       1   1 5.10 117.5  60
## 227      0       0   0 4.86 106.0  46
## 228      1       1   1 4.78 114.5  49
## 229      1       0   0 4.80 138.5  68
## 230      0       0   0 8.56 115.5  52
## 231      1       1   0 7.42  97.5  54
## 232      1       0   0 4.57 106.0  50
## 233      1       0   0 6.72 129.5  72
## 234      1       0   0 5.09 112.0  59
## 235      1       1   0 6.07 110.5  49
## 236      1       0   0 4.87 121.5  55
## 237      1       0   0 5.11 112.0  38
## 238      1       1   0 3.71 134.5  51
## 239      1       0   0 5.02 110.5  57
## 240      1       0   0 5.97 121.0  42
## 241      0       0   0 5.53 121.5  49
## 242      1       0   0 4.61 125.0  48
## 243      0      -1   0 4.64 149.0  68
## 244      0       0   0 4.71 147.5  70
## 245      1       0   0 4.75 116.0  38
## 246      0       0   0 3.22 131.5  71
## 247      1       1   0 5.75 114.0  56
## 248      0       0   0 4.30 106.5  45
## 249      0       0   0 5.82 124.0  44
## 250      1       0   0 6.63 116.5  57
## 251      0       1   0 4.70 167.0  69
## 252      0       0   0 4.74 102.0  36
## 253      1       0   0 7.63 105.5  47
## 254      1       0   0 4.21 131.5  72
## 255      0       1   1 4.54 135.5  72
## 256      0       0   0 5.01 102.5  62
## 257      1       1   0 3.93 118.5  47
## 258      1       0   0 4.48 107.0  53
## 259      1       0   0 5.72 156.0  54
## 260      0       0   1 4.16 154.0  49
## 261      1       1   0 8.77 132.5  67
## 262      1       0   0 5.10 111.5  51
## 263      0      -1   0 4.37 138.5  40
## 264      0       0   0 3.60 120.5  65
## 265      1       0   0 4.86 112.5  44
## 266      0       0   0 5.04 134.0  61
## 267      0       1   0 5.54 110.0  38
## 268      0       0   0 4.16 106.0  54
## 269      0       0   0 3.70 125.0  70
## 270      0       0   1 4.18 142.5  78
## 271      1       0   0 5.14 141.0  48
## 272      1       1   0 6.26 135.5  66
## 273      0       0   0 4.58 117.0  38
## 274      0       0   0 6.65 131.0  50
## 275      1       0   0 4.80 104.5  39
## 276      1       1   0 5.56  99.5  46
## 277      0       0   0 5.21 123.0  37
## 278      0       1   0 2.53 122.5  57
## 279      0       0   0 6.16 116.0  71
## 280      1       0   0 5.84 156.5  51
## 281      1       0   0 5.59 113.5  66
## 282      1       0   0 4.91 126.0  49
## 283      0       0   1 3.73 117.0  58
## 284      1       0   0 4.64 117.5  38
## 285      1       0   0 6.09 107.5  37
## 286      0       1   0 5.96 125.5  61
## 287      0       0   1 6.26 165.5  77
## 288      1       1   0 5.79 109.5  54
## 289      0       0   0 8.58 108.0  39
## 290      1       1   0 5.33 108.0  50
## 291      0       0   0 9.22 128.5  46
## 292      1       0   0 5.23 122.5  51
## 293      1       0   0 6.56 115.0  50
## 294      0       0   1 4.57 121.0  59
## 295      1       0   0 4.05 122.5  45
## 296      0       0   0 7.24 114.5  66
## 297      0       0   0 6.11 141.0  36
## 298      0       0   0 7.12 178.5  63
## 299      1       0   0 6.00 152.0  51
## 300      0       1   0 5.27 134.5  62
## 301      0       0   0 5.35 144.0  61
## 302      1       0   0 4.82 111.5  51
## 303      0       0   0 5.63 130.5  59
## 304      1       0   0 4.18 135.0  65
## 305      1       0   0 6.48 150.0  75
## 306      0       0   0 5.37 141.0  80
## 307      0       0   0 2.70 152.5  77
## 308      0       1   0 3.94 107.0  49
## 309      0       0   0 5.75 136.0  54
## 310      0       0   0 4.58 139.0  52
## 311      1       0   0 6.52 121.5  36
## 312      0       0   0 4.65 125.5  76
## 313      1       0   0 5.74 110.5  36
## 314      1       0   0 5.82 121.5  72
## 315      0       0   0 7.68 140.0  46
## 316      1       0   0 4.97 114.0  57
## 317      0       1   0 4.36 155.5  53
## 318      0       1   0 4.66 131.0  50
## 319      1       0   0 4.37 120.0  55
## 320      0       0   0 6.07 137.5  60
## 321      0       0   1 4.36 128.5  56
## 322      0       0   1 4.63 109.0  42
## 323      1       0   0 5.25 154.0  71
## 324      1       0   0 5.25  98.0  60
## 325      0       1   0 6.69 131.0  48
## 326      1       0   0 7.78 110.5  57
## 327      0       1   0 4.48 117.5  55
## 328      1       1   0 4.95 106.0  57
## 329      0       0   0 4.77 116.5  38
## 330      0       0   0 5.15 159.0  72
## 331      0       1   0 5.18 141.0  74
## 332      0       1   0 5.57 120.0  38
## 333      0       0   0 5.87 141.5  47
## 334      0       0   1 4.60 159.0  81
## 335      0       0   0 7.02 125.5  37
## 336      0       0   0 5.19 115.5  53
## 337      0       0   0 5.68 145.0  61
## 338      1       0   0 3.84 114.5  58
## 339      0       1   0 4.06 123.5  41
## 340      1       0   0 3.35 126.5  49
## 341      0       0   1 2.53 131.5  76
## 342      0       0   0 5.91 129.0  63
## 343      1       0   0 6.03 106.0  43
## 344      0       0   0 8.58 114.5  67
## 345      1       1   0 4.84 108.5  42
## 346      0       1   0 6.52 140.0  54
## 347      0       0   1 4.24 145.5  63
## 348      1       0   0 5.10 100.5  48
## 349      0       0   0 6.35 137.5  73
## 350      0       0   0 4.42 144.0  48
## 351      1      -1   0 4.42  93.0  46
## 352      0       0   0 5.36 109.5  55
## 353      1       1   0 4.82 109.0  52
## 354      0       0   0 5.29 127.0  37
## 355      1       0   0 4.80 126.0  36
## 356      1       0   0 5.74 109.0  50
## 357      0       0   0 4.92 132.0  65
## 358      1       1   0 5.35 104.5  50
## 359      1       0   0 7.15  95.5  62
## 360      1       1   0 4.81 113.0  42
## 361      1       0   0 6.17 104.0  49
## 362      1       0   0 5.06  99.5  49
## 363      0       0   0 5.20 128.5  57
## 364      1       0   0 5.36 137.0  64
## 365      0       0   0 5.75 127.0  41
## 366      0       0   0 5.75 125.0  47
## 367      0       0   0 7.12 112.0  42
## 368      0       0   0 4.30 142.5  41
## 369      0       0   0 6.22 120.0  60
## 370      0       0   0 4.57 111.5  46
## 371      1       0   0 4.79 126.0  72
## 372      1       0   0 4.99 101.0  46
## 373      1       1   0 3.93 109.0  39
## 374      1       0   0 5.35 151.0  68
## 375      1       0   0 5.47 130.0  42
## 376      0       0   0 4.15 137.0  67
## 377      0       1   0 6.85 120.0  48
## 378      1       0   0 5.91 109.0  40
## 379      0       0   0 4.96 127.5  54
## 380      0       0   0 6.31 127.5  48
## 381      0       0   0 4.88 114.0  36
## 382      0       0   0 5.29 127.0  61
## 383      0       0   0 5.14 132.0  37
## 384      0       1   0 4.96 120.5  51
## 385      0       1   1 5.31 158.0  69
## 386      0       0   0 5.60 122.0  48
## 387      1       0   0 7.68 163.5  73
## 388      0       0   0 6.41 121.0  56
## 389      1       0   0 5.65 112.0  51
## 390      0       1   0 5.32 128.0  44
## 391      0       0   0 6.22 110.0  49
## 392      1       0   0 7.04 130.5  56
## 393      1       0   1 4.26 110.0  54
## 394      1       0   0 6.78 100.0  49
## 395      1       0   0 5.58 109.5  42
## 396      0       1   0 5.79 117.5  41
## 397      0       0   0 6.79 125.0  51
## 398      0       1   0 5.00 107.5  45
## 399      0       1   0 4.69 117.0  61
## 400      0       0   1 4.27 132.0  73
## 401      1       0   0 5.72 129.5  59
## 402      0       0   0 5.54 155.5  68
## 403      0       0   0 5.08 146.5  73
## 404      1       0   0 5.99 134.5  68
## 405      1       0   0 4.42 104.5  44
## 406      0       1   1 3.14 116.5  72
## 407      0       1   0 6.82 119.0  57
## 408      1       0   0 4.65 137.0  48
## 409      1       0   0 3.81  77.5  55
## 410      1       1   0 6.93  96.0  51
## 411      1       0   0 6.79 125.0  47
## 412      1       0   0 5.60 147.5  52
## 413      1       0   0 5.68 104.5  46
## 414      0       0   0 4.30 127.5  68
## 415      1       0   0 3.94 103.5  58
## 416      1       0   0 5.26 124.5  69
## 417      0       1   0 6.26 111.5  55
## 418      1       1   0 5.70 107.5  71
## 419      1       1   0 4.99 106.5  48
## 420      0       0   0 6.83 124.5  47
## 421      1       0   0 5.26 111.0  40
## 422      0       0   0 5.08 107.5  57
## 423      1       0   0 5.04 112.5  44
## 424      1       0   1 5.03 109.5  62
## 425      1       0   0 3.05 115.5  52
## 426      0       1   0 5.34 130.5  55
## 427      0       0   0 5.74 126.5  48
## 428      1       0   0 6.00 138.5  60
## 429      0       0   0 5.59 117.0  70
## 430      1       0   0 5.57 105.0  41
## 431      0       1   0 5.31 114.0  48
## 432      0       0   0 3.99 143.5  70
## 433      1       1   0 6.80 104.5  61
## 434      0       0   0 6.34 120.5  48
## 435      1       1   0 5.18 144.0  60
## 436      1       0   0 7.47 108.5  70
## 437      1       0   0 4.16 112.0  41
## 438      0       1   0 6.12 166.0  64
## 439      1       0   0 6.19 113.0  37
## 440      0       0   0 5.35 120.5  58
## 441      0       0   0 5.12 141.5  75
## 442      0       0   0 4.90 122.5  50
## 443      0       0   0 3.63 129.5  37
## 444      1       0   0 6.76 116.5  65
## 445      0       0   0 6.31 122.0  51
## 446      0       0   0 5.53 130.0  43
## 447      1       0   0 4.71 172.5  67
## 448      1       1   0 5.74 144.5  52
## 449      0       0   0 7.07 106.0  52
## 450      0       1   0 5.36 120.0  37
## 451      1       0   0 4.29 118.0  38
## 452      0       1   0 5.08 123.5  44
## 453      0       0   0 7.40 154.0  56
## 454      0       0   1 5.44 160.5  80
## 455      1       0   0 6.43 160.5  73
## 456      1       0   0 5.47 155.0  78
## 457      0       1   0 4.91 126.5  40
## 458      1       0   0 7.67 106.5  47
## 459      1       0   0 6.94 156.0  63
## 460      0       1   0 6.07 117.5  42
## 461      0       1   0 4.34 176.0  71
## 462      1       0   0 6.18 108.0  43
## 463      0       0   0 5.75 133.0  81
## 464      1       0   1 4.41 106.0  40
## 465      0       0   0 6.21 121.0  63
## 466      0       1   0 5.17 110.0  50
## 467      1       1   1 5.27 134.5  61
## 468      0       0   0 5.71 112.0  52
## 469      0       0   0 6.41 124.0  59
## 470      1       0   0 4.63 122.0  54
## 471      1       1   0 5.73 114.5  57
## 472      0       0   1 4.85 150.5  75
## 473      0       0   0 6.04 152.0  75
## 474      0       0   0 4.10 147.5  42
## 475      1       0   0 7.02 109.5  54
## 476      0       1   0 5.79 123.0  54
## 477      0       1   0 3.88 133.0  61
## 478      0       1   0 5.53 129.0  38
## 479      0       0   0 6.64 137.5  66
## 480      1       0   0 4.80 110.0  42
## 481      0       0   0 6.33 151.0  50
## 482      1       0   0 4.16 100.5  37
## 483      0       0   0 4.96 124.0  37
## 484      1       0   0 4.48 163.5  54
## 485      1       0   0 4.12 116.5  47
## 486      0       0   0 4.12 144.0  42
## 487      0       0   1 3.97 141.5  69
## 488      1       1   0 6.34 111.0  56
## 489      0       0   0 5.48 112.5  51
## 490      1       0   0 2.51 106.0  37
## 491      1       0   0 6.65 128.5  49
## 492      0       0   0 5.52 109.0  46
## 493      0       1   0 6.22 128.0  47
## 494      0       0   0 5.35 140.0  38
## 495      1       0   0 4.03 114.0  46
## 496      1       0   0 4.70 106.0  40
## 497      0       0   0 6.06 126.0  60
## 498      1       0   0 8.19 119.5  54
## 499      1       0   0 5.71 119.0  53
## 500      1       0   0 4.52  99.0  40
# Recode the numeric Gender codes as labels (0 = male, 1 = female)
CVD_data[CVD_data$Gender == 0,]$Gender <- "M"
CVD_data[CVD_data$Gender == 1,]$Gender <- "F"
# Recode the numeric Smoking codes as labels
CVD_data[CVD_data$Smoking == 0,]$Smoking <- "Non-smoker"
CVD_data[CVD_data$Smoking == 1,]$Smoking <- "Frequent Smoker"
CVD_data[CVD_data$Smoking == -1,]$Smoking <- "Occasional Smoker"


str(CVD_data)
## 'data.frame':    500 obs. of  6 variables:
##  $ Gender : chr  "F" "F" "M" "F" ...
##  $ Smoking: chr  "Non-smoker" "Non-smoker" "Non-smoker" "Non-smoker" ...
##  $ CVD    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Chol   : num  3.86 5.64 6.83 7.11 5.04 3.05 4.9 5.5 3.92 5.75 ...
##  $ SBP    : num  122 108 120 114 114 ...
##  $ Age    : int  55 65 46 68 70 53 64 70 46 44 ...
head(CVD_data)
##   Gender    Smoking CVD Chol   SBP Age
## 1      F Non-smoker   0 3.86 122.0  55
## 2      F Non-smoker   0 5.64 107.5  65
## 3      M Non-smoker   0 6.83 120.5  46
## 4      F Non-smoker   0 7.11 114.5  68
## 5      M Non-smoker   0 5.04 114.0  70
## 6      M Non-smoker   0 3.05 126.5  53
# Convert the label columns to factors so glm() treats them as categorical
CVD_data$Gender <- as.factor(CVD_data$Gender)
CVD_data$Smoking <- as.factor(CVD_data$Smoking)

str(CVD_data)
## 'data.frame':    500 obs. of  6 variables:
##  $ Gender : Factor w/ 2 levels "F","M": 1 1 2 1 2 2 2 2 1 2 ...
##  $ Smoking: Factor w/ 3 levels "Frequent Smoker",..: 2 2 2 2 2 2 2 2 1 2 ...
##  $ CVD    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Chol   : num  3.86 5.64 6.83 7.11 5.04 3.05 4.9 5.5 3.92 5.75 ...
##  $ SBP    : num  122 108 120 114 114 ...
##  $ Age    : int  55 65 46 68 70 53 64 70 46 44 ...
unique(CVD_data$Smoking)
## [1] Non-smoker        Frequent Smoker   Occasional Smoker
## Levels: Frequent Smoker Non-smoker Occasional Smoker
CVD_data[CVD_data$CVD == 0,]$CVD <- "CVD not present"
CVD_data[CVD_data$CVD == 1,]$CVD <- "CVD present"

CVD_data$CVD <- as.factor(CVD_data$CVD)


unique(CVD_data$CVD)
## [1] CVD not present CVD present    
## Levels: CVD not present CVD present
xtabs(~ CVD + Smoking, data = CVD_data)
##                  Smoking
## CVD                Frequent Smoker Non-smoker Occasional Smoker
##   CVD not present              106        348                 6
##   CVD present                   10         30                 0
xtabs(~ CVD + Gender, data = CVD_data)
##                  Gender
## CVD                 F   M
##   CVD not present 227 233
##   CVD present       8  32
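
The counts in the CVD by Smoking table are easier to compare as within-group proportions. The sketch below re-enters the counts by hand (the object names are illustrative and not part of the original analysis) and uses prop.table() with margin = 2 to get the share of each smoking group with CVD. The "Occasional Smoker" group has only 6 individuals and no CVD cases, so any estimate for that group will be very unstable. Because -1 appears as a placeholder in other columns of this dataset, it may also be worth confirming in the data dictionary that -1 in Smoking is a genuine category.

```r
# Counts copied by hand from the xtabs(~ CVD + Smoking) output above
smoke_tab <- matrix(
  c(106, 10,   # Frequent Smoker:   CVD not present, CVD present
    348, 30,   # Non-smoker:        CVD not present, CVD present
      6,  0),  # Occasional Smoker: CVD not present, CVD present
  nrow = 2,
  dimnames = list(
    CVD     = c("CVD not present", "CVD present"),
    Smoking = c("Frequent Smoker", "Non-smoker", "Occasional Smoker")
  )
)

# Column-wise proportions: each column sums to 1
col_props <- prop.table(smoke_tab, margin = 2)
print(round(col_props, 3))

# Locate cells with zero counts, which can cause unstable model estimates
empty_cells <- which(smoke_tab == 0, arr.ind = TRUE)
print(empty_cells)
```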

Regression Analysis

logistic_result <- glm(CVD ~ Gender, data= CVD_data, family="binomial")

summary(logistic_result)
## 
## Call:
## glm(formula = CVD ~ Gender, family = "binomial", data = CVD_data)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.3455     0.3597  -9.300  < 2e-16 ***
## GenderM       1.3602     0.4061   3.349 0.000811 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 278.77  on 499  degrees of freedom
## Residual deviance: 265.07  on 498  degrees of freedom
## AIC: 269.07
## 
## Number of Fisher Scoring iterations: 6

Gender is a statistically significant predictor of cardiovascular disease: the p-value for GenderM (0.000811) is less than 0.001.

The intercept of approximately -3.35 is the log odds that a female has cardiovascular disease, because F is the reference level. The GenderM coefficient of approximately 1.36 is the log odds ratio for males relative to females: being male increases the log odds of cardiovascular disease by 1.36.

If we go ahead and find out the probability of females having cardiovascular disease:

P = 1 / (1 + e^3.35) = 0.034

The odds ratio for males compared to females is found by exponentiating the GenderM coefficient:

Odds ratio = e^1.36 = 3.90

This shows that the odds of cardiovascular disease for males are about 3.9 times the odds for females. This is a ratio of odds, not of probabilities.
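
The intercept, the probability for females and the odds ratio above can be checked by hand from the CVD by Gender cross-tabulation shown earlier. The following is a minimal sketch that retypes those four counts into a matrix (the object names are illustrative and are not part of the original analysis):

```r
# Counts copied from the xtabs(~ CVD + Gender) output above
cvd_by_gender <- matrix(
  c(227, 8,     # females: CVD not present, CVD present
    233, 32),   # males:   CVD not present, CVD present
  nrow = 2,
  dimnames = list(CVD = c("not present", "present"),
                  Gender = c("F", "M"))
)

# Odds of CVD within each gender = cases / non-cases
odds_female <- cvd_by_gender["present", "F"] / cvd_by_gender["not present", "F"]
odds_male   <- cvd_by_gender["present", "M"] / cvd_by_gender["not present", "M"]

# The intercept is the log odds for females (the reference level)
log_odds_female <- log(odds_female)

# The GenderM coefficient is the log of the odds ratio
odds_ratio     <- odds_male / odds_female
log_odds_ratio <- log(odds_ratio)

# plogis() converts log odds back to a probability
prob_female <- plogis(log_odds_female)

round(c(intercept = log_odds_female,
        GenderM = log_odds_ratio,
        odds_ratio = odds_ratio,
        prob_female = prob_female), 4)
```

With a single categorical predictor, the fitted coefficients reproduce the table exactly, which is why the values match the glm() summary.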

Instead of looking at just one factor, I then fitted a logistic regression model to see how other factors (Smoking, SBP, Cholesterol and Age), in addition to Gender, affect an individual’s risk of having cardiovascular disease. The model is shown below.

logistic_result2 <- glm(CVD ~ Gender + Smoking + SBP + Chol + Age, data= CVD_data, family="binomial")

summary(logistic_result2)
## 
## Call:
## glm(formula = CVD ~ Gender + Smoking + SBP + Chol + Age, family = "binomial", 
##     data = CVD_data)
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -0.273142   1.617381  -0.169  0.86589    
## GenderM                    1.152326   0.433000   2.661  0.00778 ** 
## SmokingNon-smoker         -0.072692   0.417005  -0.174  0.86162    
## SmokingOccasional Smoker -13.901997 909.988172  -0.015  0.98781    
## SBP                       -0.004808   0.011142  -0.431  0.66612    
## Chol                      -0.912941   0.190581  -4.790 1.67e-06 ***
## Age                        0.040105   0.016336   2.455  0.01409 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 278.77  on 499  degrees of freedom
## Residual deviance: 226.24  on 493  degrees of freedom
## AIC: 240.24
## 
## Number of Fisher Scoring iterations: 15

GenderM (1.15) with a p-value of 0.00778, which is less than 0.05, means that being male increases the log odds of having cardiovascular disease after adjusting for the other predictors, so gender remains important in this model.

SmokingNon-smoker (approximately -0.073) has a p-value of approximately 0.862, which is greater than 0.05. The odds of cardiovascular disease for non-smokers are therefore not significantly different from those of frequent smokers, the reference level.

SmokingOccasional Smoker (-13.9) has a p-value of approximately 0.99, so it is not statistically significant. The very large coefficient and standard error (about 910) are a warning sign rather than a real effect: only 6 individuals were occasional smokers and none of them had CVD, so this coefficient cannot be estimated reliably.

SBP (-0.005) has a p-value of approximately 0.67, which is greater than 0.05, so systolic blood pressure is not significant in determining the chances of an individual having cardiovascular disease in this model.

Chol (-0.91) with a p-value of 1.67e-06 is the most strongly significant predictor in this model. The coefficient is negative, meaning that in this sample higher cholesterol is associated with lower log odds of cardiovascular disease.

Age (0.04) with a p-value of 0.014 is significant at the 0.05 level: each additional year of age increases the log odds of cardiovascular disease by about 0.04.
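
The coefficients above are on the log odds scale, and exponentiating them gives odds ratios, which are easier to read. The sketch below retypes the estimates and standard errors from the summary output and adds approximate 95% Wald intervals (estimate ± 1.96 standard errors). The object names are illustrative, and the occasional-smoker term is left out because its standard error is too large for an interval to be meaningful.

```r
# Estimates and standard errors copied from summary(logistic_result2)
estimate <- c(GenderM = 1.152326,
              `SmokingNon-smoker` = -0.072692,
              SBP = -0.004808,
              Chol = -0.912941,
              Age = 0.040105)
std_error <- c(GenderM = 0.433000,
               `SmokingNon-smoker` = 0.417005,
               SBP = 0.011142,
               Chol = 0.190581,
               Age = 0.016336)

# Odds ratio = exp(coefficient); Wald interval = exp(estimate +/- 1.96 * SE)
or_table <- data.frame(
  odds_ratio = exp(estimate),
  lower_95   = exp(estimate - 1.96 * std_error),
  upper_95   = exp(estimate + 1.96 * std_error),
  row.names  = names(estimate)
)

round(or_table, 3)
```

An interval that contains 1 corresponds to a predictor that is not significant at roughly the 0.05 level. With the fitted model object available, exp(coef(logistic_result2)) gives the same odds ratios without retyping anything.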

combined_results_r_squared <- 1 - (logistic_result2$deviance/logistic_result2$null.deviance)

combined_results_r_squared
## [1] 0.1884289

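This value is a deviance-based pseudo R-squared (McFadden's), not the R-squared of a linear regression. It shows that the model reduces the deviance by about 18.84% compared to the intercept-only (null) model.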

1 - pchisq((logistic_result2$null.deviance - logistic_result2$deviance), df=(length(logistic_result2$coefficients)-1))
## [1] 1.460094e-09

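This p-value comes from a chi-squared (likelihood ratio) test comparing the model to the null model. Because it is far below 0.05, the model as a whole is statistically significant: the predictors together explain CVD better than the intercept alone.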
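
Both quantities above depend only on the two deviances and the number of predictor terms, so they can be reproduced from the printed summary. This is a minimal sketch using the rounded values from the output, so the results agree with the ones above only up to rounding (the variable names are illustrative):

```r
# Values copied from summary(logistic_result2)
null_deviance     <- 278.77  # intercept-only model, 499 df
residual_deviance <- 226.24  # full model, 493 df
n_terms           <- 499 - 493  # 6 coefficients besides the intercept

# Deviance-based (McFadden) pseudo R-squared
pseudo_r2 <- 1 - residual_deviance / null_deviance

# Likelihood ratio test: the drop in deviance is compared
# to a chi-squared distribution with n_terms degrees of freedom
lr_statistic <- null_deviance - residual_deviance
lr_p_value   <- pchisq(lr_statistic, df = n_terms, lower.tail = FALSE)

pseudo_r2
lr_p_value
```

Using lower.tail = FALSE is equivalent to 1 - pchisq(...) but is more accurate numerically when the p-value is very small.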

Next, I look at what the logistic regression predicts by taking each individual's fitted probability of CVD and ranking the individuals from the lowest to the highest probability.

predicted.data <- data.frame(
  probability.of.CVD=logistic_result2$fitted.values,
  CVD=CVD_data$CVD)
 

predicted.data <- predicted.data[
  order(predicted.data$probability.of.CVD, decreasing=FALSE),]
predicted.data$rank <- 1:nrow(predicted.data)

predicted.data
##     probability.of.CVD             CVD rank
## 171       1.323406e-08 CVD not present    1
## 203       1.512670e-08 CVD not present    2
## 113       2.276382e-08 CVD not present    3
## 351       4.993091e-08 CVD not present    4
## 263       1.045041e-07 CVD not present    5
## 243       2.386973e-07 CVD not present    6
## 163       1.455867e-03 CVD not present    7
## 291       1.685866e-03 CVD not present    8
## 498       1.962301e-03 CVD not present    9
## 261       1.966324e-03 CVD not present   10
## 289       2.518218e-03 CVD not present   11
## 458       2.534607e-03 CVD not present   12
## 253       2.641263e-03 CVD not present   13
## 17        2.938874e-03 CVD not present   14
## 130       3.041093e-03 CVD not present   15
## 326       3.355540e-03 CVD not present   16
## 230       4.159822e-03 CVD not present   17
## 311       4.326886e-03 CVD not present   18
## 112       4.357958e-03 CVD not present   19
## 231       4.724561e-03 CVD not present   20
## 411       5.164675e-03 CVD not present   21
## 387       5.401740e-03 CVD not present   22
## 192       5.677857e-03 CVD not present   23
## 392       5.740418e-03 CVD not present   24
## 475       5.967429e-03 CVD not present   25
## 56        6.145550e-03 CVD not present   26
## 491       6.246051e-03 CVD not present   27
## 210       6.320270e-03 CVD not present   28
## 439       6.328592e-03 CVD not present   29
## 394       6.360927e-03 CVD not present   30
## 315       6.476039e-03 CVD not present   31
## 410       6.587267e-03 CVD not present   32
## 14        6.726842e-03 CVD not present   33
## 81        7.057380e-03 CVD not present   34
## 285       7.113714e-03 CVD not present   35
## 68        7.180186e-03 CVD not present   36
## 459       7.354692e-03 CVD not present   37
## 344       7.465367e-03 CVD not present   38
## 119       7.475385e-03 CVD not present   39
## 97        7.478722e-03 CVD not present   40
## 293       7.522060e-03 CVD not present   41
## 436       7.541336e-03 CVD not present   42
## 359       7.798575e-03 CVD not present   43
## 74        8.222091e-03 CVD not present   44
## 462       8.305238e-03 CVD not present   45
## 28        8.333112e-03 CVD not present   46
## 152       8.493932e-03 CVD not present   47
## 53        8.635725e-03 CVD not present   48
## 335       8.820286e-03 CVD not present   49
## 240       9.072278e-03 CVD not present   50
## 313       9.252271e-03 CVD not present   51
## 250       9.260020e-03 CVD not present   52
## 378       9.366934e-03 CVD not present   53
## 144       9.367121e-03 CVD not present   54
## 4         9.375974e-03 CVD not present   55
## 343       9.603553e-03 CVD not present   56
## 79        9.643106e-03     CVD present   57
## 44        1.004055e-02 CVD not present   58
## 367       1.048049e-02 CVD not present   59
## 433       1.059062e-02 CVD not present   60
## 361       1.084082e-02 CVD not present   61
## 299       1.089063e-02 CVD not present   62
## 98        1.095663e-02 CVD not present   63
## 175       1.129211e-02 CVD not present   64
## 444       1.131120e-02 CVD not present   65
## 204       1.153039e-02 CVD not present   66
## 453       1.161491e-02 CVD not present   67
## 217       1.166716e-02 CVD not present   68
## 201       1.183994e-02 CVD not present   69
## 280       1.231597e-02 CVD not present   70
## 235       1.236062e-02 CVD not present   71
## 488       1.275540e-02 CVD not present   72
## 430       1.350069e-02 CVD not present   73
## 395       1.362566e-02 CVD not present   74
## 375       1.365072e-02 CVD not present   75
## 48        1.369454e-02 CVD not present   76
## 205       1.387803e-02 CVD not present   77
## 139       1.402433e-02 CVD not present   78
## 138       1.406141e-02 CVD not present   79
## 19        1.436272e-02 CVD not present   80
## 233       1.454532e-02 CVD not present   81
## 34        1.475935e-02 CVD not present   82
## 137       1.481005e-02 CVD not present   83
## 161       1.489406e-02 CVD not present   84
## 413       1.493625e-02 CVD not present   85
## 66        1.499062e-02 CVD not present   86
## 3         1.531639e-02 CVD not present   87
## 95        1.531761e-02 CVD not present   88
## 118       1.534212e-02 CVD not present   89
## 259       1.548627e-02 CVD not present   90
## 187       1.558254e-02 CVD not present   91
## 420       1.563442e-02 CVD not present   92
## 448       1.594273e-02 CVD not present   93
## 356       1.622401e-02 CVD not present   94
## 78        1.649449e-02 CVD not present   95
## 211       1.649815e-02 CVD not present   96
## 428       1.657658e-02 CVD not present   97
## 412       1.659394e-02 CVD not present   98
## 116       1.659759e-02 CVD not present   99
## 421       1.666954e-02 CVD not present  100
## 449       1.675506e-02 CVD not present  101
## 43        1.689856e-02 CVD not present  102
## 455       1.695624e-02 CVD not present  103
## 176       1.716096e-02 CVD not present  104
## 377       1.752667e-02 CVD not present  105
## 148       1.753753e-02 CVD not present  106
## 237       1.754235e-02 CVD not present  107
## 298       1.754623e-02 CVD not present  108
## 151       1.772706e-02 CVD not present  109
## 128       1.779960e-02 CVD not present  110
## 297       1.788596e-02 CVD not present  111
## 499       1.789318e-02 CVD not present  112
## 389       1.803822e-02 CVD not present  113
## 272       1.811618e-02 CVD not present  114
## 71        1.816855e-02 CVD not present  115
## 11        1.819289e-02 CVD not present  116
## 276       1.829556e-02 CVD not present  117
## 154       1.833444e-02 CVD not present  118
## 305       1.843364e-02 CVD not present  119
## 102       1.873920e-02 CVD not present  120
## 397       1.892829e-02 CVD not present  121
## 325       1.920553e-02 CVD not present  122
## 108       1.934534e-02 CVD not present  123
## 288       1.945805e-02 CVD not present  124
## 86        1.961402e-02 CVD not present  125
## 106       1.980672e-02 CVD not present  126
## 355       2.003674e-02 CVD not present  127
## 274       2.005283e-02 CVD not present  128
## 189       2.015880e-02 CVD not present  129
## 117       2.083877e-02 CVD not present  130
## 103       2.086153e-02 CVD not present  131
## 142       2.108848e-02 CVD not present  132
## 29        2.109238e-02 CVD not present  133
## 184       2.111981e-02 CVD not present  134
## 247       2.135775e-02 CVD not present  135
## 401       2.136801e-02 CVD not present  136
## 114       2.145899e-02 CVD not present  137
## 22        2.148619e-02 CVD not present  138
## 179       2.182671e-02 CVD not present  139
## 271       2.207036e-02 CVD not present  140
## 90        2.234956e-02 CVD not present  141
## 471       2.255927e-02 CVD not present  142
## 62        2.314424e-02 CVD not present  143
## 73        2.314798e-02 CVD not present  144
## 126       2.320203e-02 CVD not present  145
## 404       2.334281e-02 CVD not present  146
## 423       2.358433e-02 CVD not present  147
## 245       2.375288e-02 CVD not present  148
## 169       2.384160e-02 CVD not present  149
## 296       2.396857e-02 CVD not present  150
## 109       2.399857e-02 CVD not present  151
## 140       2.414025e-02 CVD not present  152
## 129       2.416297e-02 CVD not present  153
## 67        2.424941e-02 CVD not present  154
## 481       2.428898e-02 CVD not present  155
## 222       2.454299e-02 CVD not present  156
## 115       2.475886e-02 CVD not present  157
## 275       2.493422e-02 CVD not present  158
## 292       2.498672e-02 CVD not present  159
## 358       2.522104e-02 CVD not present  160
## 290       2.525625e-02 CVD not present  161
## 84        2.542677e-02 CVD not present  162
## 49        2.548951e-02 CVD not present  163
## 380       2.552821e-02 CVD not present  164
## 434       2.568456e-02 CVD not present  165
## 407       2.575067e-02 CVD not present  166
## 31        2.584134e-02 CVD not present  167
## 284       2.601304e-02 CVD not present  168
## 55        2.617315e-02 CVD not present  169
## 157       2.656258e-02 CVD not present  170
## 346       2.710379e-02 CVD not present  171
## 96        2.731085e-02 CVD not present  172
## 480       2.732120e-02 CVD not present  173
## 69        2.750321e-02 CVD not present  174
## 348       2.765329e-02 CVD not present  175
## 265       2.767995e-02 CVD not present  176
## 178       2.801549e-02 CVD not present  177
## 460       2.811683e-02 CVD not present  178
## 372       2.813620e-02 CVD not present  179
## 496       2.813846e-02 CVD not present  180
## 493       2.847711e-02 CVD not present  181
## 345       2.849811e-02 CVD not present  182
## 360       2.865783e-02 CVD not present  183
## 193       2.882448e-02 CVD not present  184
## 76        2.938460e-02 CVD not present  185
## 445       2.944474e-02 CVD not present  186
## 262       2.952529e-02 CVD not present  187
## 362       2.992912e-02 CVD not present  188
## 282       3.020717e-02 CVD not present  189
## 77        3.047605e-02 CVD not present  190
## 213       3.060431e-02 CVD not present  191
## 191       3.094579e-02 CVD not present  192
## 82        3.118468e-02 CVD not present  193
## 391       3.119804e-02 CVD not present  194
## 174       3.127618e-02 CVD not present  195
## 419       3.180840e-02 CVD not present  196
## 365       3.200926e-02 CVD not present  197
## 2         3.214510e-02 CVD not present  198
## 160       3.281371e-02 CVD not present  199
## 388       3.288461e-02 CVD not present  200
## 314       3.370821e-02 CVD not present  201
## 153       3.381861e-02     CVD present  202
## 281       3.396345e-02 CVD not present  203
## 333       3.396623e-02 CVD not present  204
## 500       3.408916e-02 CVD not present  205
## 249       3.427827e-02 CVD not present  206
## 13        3.437051e-02 CVD not present  207
## 110       3.444140e-02 CVD not present  208
## 364       3.451841e-02 CVD not present  209
## 396       3.464478e-02 CVD not present  210
## 224       3.465793e-02 CVD not present  211
## 219       3.467907e-02 CVD not present  212
## 141       3.472666e-02 CVD not present  213
## 408       3.473585e-02 CVD not present  214
## 451       3.537703e-02 CVD not present  215
## 166       3.543741e-02 CVD not present  216
## 10        3.554029e-02 CVD not present  217
## 72        3.579182e-02 CVD not present  218
## 173       3.597849e-02 CVD not present  219
## 435       3.597861e-02 CVD not present  220
## 45        3.604945e-02 CVD not present  221
## 467       3.610420e-02     CVD present  222
## 464       3.635738e-02     CVD present  223
## 469       3.642409e-02 CVD not present  224
## 206       3.652338e-02 CVD not present  225
## 216       3.658513e-02 CVD not present  226
## 479       3.663013e-02 CVD not present  227
## 26        3.666283e-02 CVD not present  228
## 478       3.677071e-02 CVD not present  229
## 332       3.701063e-02 CVD not present  230
## 202       3.710771e-02 CVD not present  231
## 302       3.779995e-02 CVD not present  232
## 181       3.792862e-02 CVD not present  233
## 242       3.803694e-02 CVD not present  234
## 57        3.806221e-02 CVD not present  235
## 374       3.809277e-02 CVD not present  236
## 494       3.816816e-02 CVD not present  237
## 37        3.825395e-02 CVD not present  238
## 228       3.833366e-02     CVD present  239
## 194       3.849792e-02 CVD not present  240
## 324       3.902946e-02 CVD not present  241
## 267       3.979633e-02 CVD not present  242
## 195       3.991954e-02 CVD not present  243
## 239       4.015350e-02 CVD not present  244
## 236       4.030194e-02 CVD not present  245
## 156       4.030551e-02 CVD not present  246
## 417       4.046112e-02 CVD not present  247
## 234       4.050542e-02 CVD not present  248
## 366       4.074006e-02 CVD not present  249
## 354       4.110447e-02 CVD not present  250
## 418       4.123828e-02 CVD not present  251
## 316       4.127909e-02 CVD not present  252
## 482       4.136955e-02 CVD not present  253
## 446       4.138531e-02 CVD not present  254
## 405       4.233759e-02 CVD not present  255
## 427       4.241439e-02 CVD not present  256
## 353       4.261096e-02 CVD not present  257
## 25        4.269553e-02 CVD not present  258
## 450       4.281023e-02 CVD not present  259
## 226       4.361554e-02     CVD present  260
## 180       4.366741e-02 CVD not present  261
## 60        4.438952e-02 CVD not present  262
## 277       4.489913e-02 CVD not present  263
## 484       4.495033e-02 CVD not present  264
## 80        4.527007e-02 CVD not present  265
## 369       4.553691e-02 CVD not present  266
## 207       4.574428e-02 CVD not present  267
## 437       4.574582e-02 CVD not present  268
## 383       4.579240e-02 CVD not present  269
## 323       4.601383e-02 CVD not present  270
## 167       4.620381e-02 CVD not present  271
## 232       4.642670e-02 CVD not present  272
## 107       4.668131e-02 CVD not present  273
## 328       4.671578e-02 CVD not present  274
## 470       4.771483e-02 CVD not present  275
## 320       4.788784e-02 CVD not present  276
## 416       4.837369e-02 CVD not present  277
## 424       4.843620e-02     CVD present  278
## 101       4.865116e-02 CVD not present  279
## 386       4.891681e-02 CVD not present  280
## 456       4.942431e-02 CVD not present  281
## 438       5.024092e-02 CVD not present  282
## 170       5.026949e-02 CVD not present  283
## 182       5.027734e-02 CVD not present  284
## 54        5.051844e-02 CVD not present  285
## 309       5.063743e-02 CVD not present  286
## 497       5.091204e-02 CVD not present  287
## 465       5.127146e-02 CVD not present  288
## 143       5.136458e-02 CVD not present  289
## 492       5.155601e-02 CVD not present  290
## 51        5.245030e-02 CVD not present  291
## 104       5.259060e-02 CVD not present  292
## 155       5.319484e-02 CVD not present  293
## 16        5.332153e-02 CVD not present  294
## 39        5.393086e-02 CVD not present  295
## 165       5.407025e-02 CVD not present  296
## 241       5.411214e-02 CVD not present  297
## 468       5.419719e-02 CVD not present  298
## 483       5.551581e-02 CVD not present  299
## 476       5.559698e-02 CVD not present  300
## 390       5.580691e-02 CVD not present  301
## 295       5.585782e-02 CVD not present  302
## 258       5.600557e-02 CVD not present  303
## 373       5.620260e-02 CVD not present  304
## 100       5.636089e-02 CVD not present  305
## 136       5.701705e-02 CVD not present  306
## 131       5.747550e-02 CVD not present  307
## 172       5.755445e-02 CVD not present  308
## 447       5.797148e-02 CVD not present  309
## 61        5.798308e-02 CVD not present  310
## 485       5.828731e-02 CVD not present  311
## 9         5.875224e-02 CVD not present  312
## 38        5.960919e-02 CVD not present  313
## 381       5.991953e-02 CVD not present  314
## 83        6.131303e-02 CVD not present  315
## 495       6.132759e-02 CVD not present  316
## 65        6.136776e-02 CVD not present  317
## 349       6.156749e-02 CVD not present  318
## 286       6.186687e-02 CVD not present  319
## 15        6.232005e-02 CVD not present  320
## 135       6.232889e-02 CVD not present  321
## 319       6.258874e-02 CVD not present  322
## 489       6.345078e-02 CVD not present  323
## 342       6.400981e-02 CVD not present  324
## 229       6.496847e-02 CVD not present  325
## 105       6.702766e-02 CVD not present  326
## 94        6.709596e-02 CVD not present  327
## 337       6.724953e-02 CVD not present  328
## 287       6.810877e-02     CVD present  329
## 457       6.866776e-02 CVD not present  330
## 393       6.925821e-02     CVD present  331
## 303       6.948821e-02 CVD not present  332
## 431       6.968272e-02 CVD not present  333
## 452       6.993476e-02 CVD not present  334
## 27        7.006393e-02 CVD not present  335
## 89        7.012984e-02 CVD not present  336
## 329       7.015254e-02 CVD not present  337
## 252       7.126240e-02 CVD not present  338
## 35        7.263846e-02 CVD not present  339
## 257       7.271044e-02 CVD not present  340
## 146       7.342412e-02 CVD not present  341
## 279       7.395537e-02 CVD not present  342
## 125       7.449469e-02 CVD not present  343
## 133       7.522775e-02 CVD not present  344
## 47        7.569326e-02 CVD not present  345
## 33        7.620700e-02 CVD not present  346
## 168       7.881977e-02     CVD present  347
## 111       7.990713e-02 CVD not present  348
## 196       8.034969e-02 CVD not present  349
## 371       8.039220e-02 CVD not present  350
## 473       8.087123e-02 CVD not present  351
## 88        8.183412e-02 CVD not present  352
## 426       8.184388e-02 CVD not present  353
## 273       8.216459e-02 CVD not present  354
## 352       8.259951e-02 CVD not present  355
## 85        8.297434e-02 CVD not present  356
## 52        8.297785e-02     CVD present  357
## 398       8.335387e-02 CVD not present  358
## 466       8.593426e-02 CVD not present  359
## 336       8.616408e-02 CVD not present  360
## 199       8.785986e-02 CVD not present  361
## 145       8.813768e-02 CVD not present  362
## 32        8.850941e-02 CVD not present  363
## 440       8.858940e-02 CVD not present  364
## 301       8.918319e-02 CVD not present  365
## 190       9.084941e-02     CVD present  366
## 123       9.139330e-02 CVD not present  367
## 227       9.152216e-02 CVD not present  368
## 363       9.341712e-02 CVD not present  369
## 402       9.349731e-02 CVD not present  370
## 238       9.436740e-02 CVD not present  371
## 322       9.448547e-02     CVD present  372
## 442       9.529642e-02 CVD not present  373
## 1         9.530183e-02 CVD not present  374
## 36        9.633076e-02 CVD not present  375
## 46        9.644320e-02 CVD not present  376
## 99        9.871079e-02     CVD present  377
## 164       9.923685e-02 CVD not present  378
## 304       9.937907e-02 CVD not present  379
## 382       1.009113e-01 CVD not present  380
## 384       1.012894e-01 CVD not present  381
## 134       1.023927e-01 CVD not present  382
## 379       1.025681e-01 CVD not present  383
## 368       1.034047e-01 CVD not present  384
## 150       1.074194e-01 CVD not present  385
## 415       1.077120e-01 CVD not present  386
## 300       1.098628e-01 CVD not present  387
## 87        1.102313e-01     CVD present  388
## 8         1.105417e-01 CVD not present  389
## 338       1.114595e-01 CVD not present  390
## 132       1.119637e-01 CVD not present  391
## 18        1.122161e-01 CVD not present  392
## 127       1.123633e-01 CVD not present  393
## 422       1.128364e-01 CVD not present  394
## 370       1.133591e-01 CVD not present  395
## 429       1.138507e-01 CVD not present  396
## 340       1.143337e-01 CVD not present  397
## 24        1.148008e-01 CVD not present  398
## 188       1.170546e-01 CVD not present  399
## 318       1.192361e-01 CVD not present  400
## 350       1.196255e-01 CVD not present  401
## 266       1.199864e-01 CVD not present  402
## 409       1.201556e-01 CVD not present  403
## 486       1.231718e-01 CVD not present  404
## 474       1.233265e-01 CVD not present  405
## 385       1.233689e-01     CVD present  406
## 58        1.236641e-01 CVD not present  407
## 310       1.237278e-01 CVD not present  408
## 254       1.263121e-01 CVD not present  409
## 185       1.268417e-01 CVD not present  410
## 198       1.320112e-01 CVD not present  411
## 186       1.337822e-01     CVD present  412
## 149       1.351396e-01 CVD not present  413
## 463       1.377820e-01 CVD not present  414
## 248       1.386615e-01 CVD not present  415
## 12        1.401478e-01 CVD not present  416
## 41        1.415911e-01 CVD not present  417
## 122       1.418711e-01 CVD not present  418
## 339       1.446978e-01 CVD not present  419
## 256       1.450928e-01 CVD not present  420
## 330       1.452912e-01 CVD not present  421
## 260       1.459669e-01     CVD present  422
## 317       1.514495e-01 CVD not present  423
## 454       1.514667e-01     CVD present  424
## 357       1.527789e-01 CVD not present  425
## 75        1.558278e-01 CVD not present  426
## 7         1.581158e-01 CVD not present  427
## 183       1.592955e-01 CVD not present  428
## 490       1.593529e-01 CVD not present  429
## 50        1.604392e-01 CVD not present  430
## 443       1.616229e-01 CVD not present  431
## 162       1.652444e-01 CVD not present  432
## 403       1.668787e-01 CVD not present  433
## 425       1.679628e-01 CVD not present  434
## 294       1.706276e-01     CVD present  435
## 159       1.716440e-01 CVD not present  436
## 327       1.722274e-01 CVD not present  437
## 306       1.728609e-01 CVD not present  438
## 331       1.736515e-01 CVD not present  439
## 321       1.756876e-01     CVD present  440
## 441       1.765091e-01 CVD not present  441
## 5         1.771968e-01 CVD not present  442
## 399       1.796670e-01 CVD not present  443
## 20        1.810634e-01     CVD present  444
## 121       1.811259e-01 CVD not present  445
## 30        1.824388e-01 CVD not present  446
## 91        1.846595e-01 CVD not present  447
## 64        1.846691e-01 CVD not present  448
## 120       1.856395e-01     CVD present  449
## 251       1.904199e-01 CVD not present  450
## 177       1.945554e-01 CVD not present  451
## 244       1.985714e-01 CVD not present  452
## 223       1.988602e-01 CVD not present  453
## 220       2.007133e-01 CVD not present  454
## 221       2.014825e-01     CVD present  455
## 147       2.056699e-01 CVD not present  456
## 472       2.080110e-01     CVD present  457
## 268       2.082859e-01 CVD not present  458
## 212       2.110547e-01     CVD present  459
## 42        2.167397e-01 CVD not present  460
## 308       2.197568e-01 CVD not present  461
## 347       2.249088e-01     CVD present  462
## 59        2.255067e-01 CVD not present  463
## 21        2.295599e-01     CVD present  464
## 124       2.307909e-01 CVD not present  465
## 63        2.350661e-01 CVD not present  466
## 70        2.422824e-01 CVD not present  467
## 200       2.472681e-01 CVD not present  468
## 92        2.515550e-01 CVD not present  469
## 461       2.531873e-01 CVD not present  470
## 255       2.631905e-01     CVD present  471
## 414       2.679604e-01 CVD not present  472
## 312       2.701094e-01 CVD not present  473
## 376       2.781145e-01 CVD not present  474
## 334       2.872135e-01     CVD present  475
## 477       2.981657e-01 CVD not present  476
## 215       2.987071e-01     CVD present  477
## 283       3.025489e-01     CVD present  478
## 400       3.103002e-01     CVD present  479
## 197       3.149022e-01 CVD not present  480
## 487       3.249909e-01     CVD present  481
## 432       3.276797e-01 CVD not present  482
## 270       3.620430e-01     CVD present  483
## 214       3.650280e-01 CVD not present  484
## 209       3.756050e-01     CVD present  485
## 158       3.851480e-01 CVD not present  486
## 6         3.868460e-01 CVD not present  487
## 218       3.886224e-01 CVD not present  488
## 264       3.887433e-01 CVD not present  489
## 40        3.996418e-01 CVD not present  490
## 269       4.097495e-01 CVD not present  491
## 208       4.646498e-01 CVD not present  492
## 93        4.966941e-01     CVD present  493
## 23        5.006878e-01 CVD not present  494
## 246       5.205048e-01 CVD not present  495
## 278       5.662294e-01 CVD not present  496
## 406       5.841941e-01     CVD present  497
## 225       6.347600e-01     CVD present  498
## 307       6.674028e-01 CVD not present  499
## 341       7.135157e-01     CVD present  500
ggplot(data=predicted.data, aes(x=rank, y=probability.of.CVD)) +
  geom_point(aes(color=CVD), alpha=1, shape=4, stroke=2) +
  xlab("Index") +
  ylab("Predicted probability of getting Cardiovascular disease")

This plot shows the predicted probability of cardiovascular disease for each individual, ordered from lowest to highest. Individuals with CVD become more frequent toward the right side of the curve, where the predicted probabilities are higher, so the model ranks them in a sensible order. However, there is clear overlap: many individuals with CVD have low predicted probabilities, and only a handful of individuals have a predicted probability above 0.5.

Model Assumption and Diagnostics

To find out how good the model is at classifying individuals as CVD present or CVD not present, I built a confusion matrix using a cutoff of 0.5 on the predicted probability:

CVD_data$CVD_num <- ifelse(CVD_data$CVD == "CVD present", 1, 0)


predicted.probs <- logistic_result2$fitted.values

predicted.classes <- ifelse(predicted.probs > 0.5, 1, 0)

confusion_matrix_data <- table(
  Predicted = factor(predicted.classes, levels = c(0, 1)),
  Actual = factor(CVD_data$CVD_num, levels = c(0, 1))
)

confusion_matrix_data
##          Actual
## Predicted   0   1
##         0 456  37
##         1   4   3

456 individuals did not have CVD and the model correctly predicted that they did not have CVD = True Negative

37 individuals actually had CVD but the model predicted that they did not have CVD = False Negative

4 individuals did not have CVD but the model predicted that they had CVD = False Positive

3 individuals actually had CVD and the model correctly predicted that they had CVD = True Positive

To calculate the performance metrics of the model, the true and false positives and negatives are read from the confusion matrix (rows are predictions, columns are actual values):

# Index the table as [predicted, actual]
TP <- confusion_matrix_data["1", "1"]
TN <- confusion_matrix_data["0", "0"]
FP <- confusion_matrix_data["1", "0"]
FN <- confusion_matrix_data["0", "1"]


accuracy <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN)   # share of actual CVD cases that were detected
specificity <- TN / (TN + FP)   # share of actual non-cases correctly excluded
precision <- TP / (TP + FP)     # share of predicted CVD cases that are real
f1_score <- 2 * (precision * sensitivity) / (precision + sensitivity)

cat("Accuracy:    ", round(accuracy, 4), "\n")
## Accuracy:     0.918
cat("Sensitivity: ", round(sensitivity, 4), "\n")
## Sensitivity:  0.075
cat("Specificity: ", round(specificity, 4), "\n")
## Specificity:  0.9913
cat("Precision:   ", round(precision, 4), "\n")
## Precision:    0.4286
cat("F1 Score:    ", round(f1_score, 4), "\n")
## F1 Score:     0.1277

The performance metrics above show that:

Accuracy: The model makes the right prediction for 91.8% of individuals. This is less impressive than it looks, because 460 of the 500 individuals (92%) do not have CVD, so predicting "CVD not present" for everyone would be just as accurate.

Sensitivity: Of the 40 individuals who actually have CVD, the model caught only 3, which is 7.5%.

Specificity: Of the 460 individuals who do not have CVD, the model correctly identified 456, which is approximately 99.1%.

Precision: Of the 7 individuals the model predicted to have CVD, 3 actually had it, which is approximately 42.9%.

F1 Score: A value of approximately 0.128 shows that the model did not do a good job at identifying individuals with CVD at the 0.5 cutoff, mainly because it missed most of the actual cases (false negatives).

roc_results <- roc(response = CVD_data$CVD,
               predictor = logistic_result2$fitted.values,
               levels = c("CVD not present", "CVD present"),
               direction = "<")  

auc_values_results <- auc(roc_results); auc_values_results
## Area under the curve: 0.8192
plot.roc(roc_results, print.auc = TRUE, legacy.axes = TRUE,
         xlab = "False Positive Rate (1 - Specificity)",
         ylab = "True Positive Rate (Sensitivity)")

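The AUC of 0.819 means that, if we pick one individual with CVD and one without at random, the model gives the higher predicted probability to the individual with CVD about 81.9% of the time. The model is therefore fairly good at ranking individuals who have CVD above those who do not, even though few of them cross the 0.5 cutoff.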
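
The pairwise reading of the AUC can be illustrated on a small example. The sketch below uses made-up predicted probabilities (hypothetical values, not taken from the dataset) and computes the AUC in two equivalent ways: by comparing every CVD / non-CVD pair directly, and by the rank-sum formula.

```r
# Hypothetical predicted probabilities for illustration only
scores_cvd    <- c(0.90, 0.60, 0.35)        # individuals with CVD
scores_no_cvd <- c(0.80, 0.40, 0.30, 0.10)  # individuals without CVD

# 1) Pairwise definition: share of (CVD, no CVD) pairs in which the
#    individual with CVD has the higher score; ties count as half
wins <- outer(scores_cvd, scores_no_cvd, ">")
ties <- outer(scores_cvd, scores_no_cvd, "==")
auc_pairs <- mean(wins + 0.5 * ties)

# 2) Rank-sum (Mann-Whitney) formula on the pooled scores
n_cvd    <- length(scores_cvd)
n_no_cvd <- length(scores_no_cvd)
pooled_ranks <- rank(c(scores_cvd, scores_no_cvd))
rank_sum_cvd <- sum(pooled_ranks[seq_len(n_cvd)])
auc_ranks <- (rank_sum_cvd - n_cvd * (n_cvd + 1) / 2) / (n_cvd * n_no_cvd)

auc_pairs
auc_ranks
```

In this toy example 9 of the 12 pairs are ordered correctly, so both calculations give 0.75. Because the AUC depends only on the ordering of the scores, it can be high even when most predicted probabilities fall below the classification cutoff.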

Conclusion and Future Directions

In conclusion, I would say that the model does a fairly good job at ranking individuals by their risk of CVD (AUC = 0.819), but it does not do a good job at classifying them: at the 0.5 cutoff it correctly flagged only 3 of the 40 individuals who had CVD, which is why the F1 score, which combines precision and recall, is low. Of the factors I chose, gender, cholesterol and age were statistically significant, while smoking and systolic blood pressure were not. The pseudo R-squared of the full model was approximately 0.188, compared with about 0.049 (1 - 265.07/278.77) for the model with gender only, and the chi-squared test comparing the full model to the null model gave a p-value of 1.460094e-09, which shows that the predictors together significantly improved the model. The AIC also dropped from 269.07 to 240.24, which shows that aside from gender, there were other predictors that improved the model.

In the future, it would be better to include variables that are more strongly associated with CVD in order to create a better model.