Research Question: What factors contribute to an individual having cardiovascular disease?
The dataset I used is a sample of 500 individuals with 31 variables, recording health information for each individual. The variables most relevant here are Gender, Age, CVD, Cholesterol, Hypertension, Systolic blood pressure, Smoking, and Ethnicity. I obtained the dataset from OpenIntro.org; the link is below:
https://www.openintro.org/data/index.php?data=prevend.samp
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(dplyr)
library(tibble)
library(cowplot)
## Warning: package 'cowplot' was built under R version 4.5.2
##
## Attaching package: 'cowplot'
##
## The following object is masked from 'package:lubridate':
##
## stamp
library(pROC)
## Warning: package 'pROC' was built under R version 4.5.2
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
##
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
main_analysis_data <- read.csv("prevend.samp.csv")
head(main_analysis_data)
## Casenr Age Gender Ethnicity Education RFFT VAT CVD DM Smoking Hypertension
## 1 2266 55 1 3 2 62 -1 0 1 0 0
## 2 3235 65 1 0 1 79 11 0 0 0 1
## 3 1068 46 0 2 3 89 6 0 0 0 0
## 4 3422 68 1 0 2 70 5 0 0 0 0
## 5 3570 70 0 0 2 35 10 0 0 0 0
## 6 1932 53 0 0 0 14 7 0 0 0 0
## BMI SBP DBP MAP eGFR Albuminuria.1 Albuminuria.2 Chol HDL
## 1 39.66942 122.0 63.5 86.0 83.29573 0 0 3.86 1.54
## 2 28.98114 107.5 66.5 82.5 76.49061 0 0 5.64 1.53
## 3 23.23346 120.5 75.0 92.5 76.44909 1 2 6.83 1.04
## 4 22.34352 114.5 67.0 85.0 61.23697 0 1 7.11 1.85
## 5 32.41004 114.0 72.5 89.5 88.14615 0 0 5.04 1.40
## 6 29.15877 126.5 75.0 95.0 91.90595 0 0 3.05 0.79
## Statin Solubility Days Years DDD FRS PS PSquint GRS Match_1
## 1 0 2 -1 -1.000000 0 8 0.37427518 5 1 816
## 2 1 1 1672 4.580822 1373 11 0.25594606 4 1 727
## 3 0 2 -1 -1.000000 0 -1 0.12850400 3 0 -1
## 4 0 2 -1 -1.000000 0 9 0.09417574 2 0 838
## 5 0 2 -1 -1.000000 0 12 0.19343208 4 1 -1
## 6 0 2 -1 -1.000000 0 8 0.13091850 3 1 15
## Match_2
## 1 113
## 2 242
## 3 -1
## 4 -1
## 5 276
## 6 121
str(main_analysis_data)
## 'data.frame': 500 obs. of 31 variables:
## $ Casenr : int 2266 3235 1068 3422 3570 1932 3134 3573 1103 868 ...
## $ Age : int 55 65 46 68 70 53 64 70 46 44 ...
## $ Gender : int 1 1 0 1 0 0 0 0 1 0 ...
## $ Ethnicity : int 3 0 2 0 0 0 2 0 0 0 ...
## $ Education : int 2 1 3 2 2 0 1 3 2 3 ...
## $ RFFT : int 62 79 89 70 35 14 31 47 88 91 ...
## $ VAT : int -1 11 6 5 10 7 8 5 11 11 ...
## $ CVD : int 0 0 0 0 0 0 0 0 0 0 ...
## $ DM : int 1 0 0 0 0 0 0 0 0 0 ...
## $ Smoking : int 0 0 0 0 0 0 0 0 1 0 ...
## $ Hypertension : int 0 1 0 0 0 0 0 1 1 0 ...
## $ BMI : num 39.7 29 23.2 22.3 32.4 ...
## $ SBP : num 122 108 120 114 114 ...
## $ DBP : num 63.5 66.5 75 67 72.5 75 77 76.5 99 70 ...
## $ MAP : num 86 82.5 92.5 85 89.5 ...
## $ eGFR : num 83.3 76.5 76.4 61.2 88.1 ...
## $ Albuminuria.1: int 0 0 1 0 0 0 0 0 0 0 ...
## $ Albuminuria.2: int 0 0 2 1 0 0 1 0 1 0 ...
## $ Chol : num 3.86 5.64 6.83 7.11 5.04 3.05 4.9 5.5 3.92 5.75 ...
## $ HDL : num 1.54 1.53 1.04 1.85 1.4 0.79 1.23 1.57 1.39 1.18 ...
## $ Statin : int 0 1 0 0 0 0 0 0 0 0 ...
## $ Solubility : int 2 1 2 2 2 2 2 2 2 2 ...
## $ Days : int -1 1672 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ Years : num -1 4.58 -1 -1 -1 ...
## $ DDD : num 0 1373 0 0 0 ...
## $ FRS : int 8 11 -1 9 12 8 12 15 11 8 ...
## $ PS : num 0.3743 0.2559 0.1285 0.0942 0.1934 ...
## $ PSquint : int 5 4 3 2 4 3 3 4 3 2 ...
## $ GRS : int 1 1 0 0 1 1 0 1 0 0 ...
## $ Match_1 : int 816 727 -1 838 -1 15 200 -1 -1 -1 ...
## $ Match_2 : int 113 242 -1 -1 276 121 -1 -1 -1 -1 ...
colSums(is.na(main_analysis_data))
## Casenr Age Gender Ethnicity Education
## 0 0 0 0 0
## RFFT VAT CVD DM Smoking
## 0 0 0 0 0
## Hypertension BMI SBP DBP MAP
## 0 0 0 0 0
## eGFR Albuminuria.1 Albuminuria.2 Chol HDL
## 0 0 0 0 0
## Statin Solubility Days Years DDD
## 0 0 0 0 0
## FRS PS PSquint GRS Match_1
## 0 0 0 0 0
## Match_2
## 0
tail(main_analysis_data)
## Casenr Age Gender Ethnicity Education RFFT VAT CVD DM Smoking Hypertension
## 495 1161 46 1 0 1 65 11 0 0 0 0
## 496 475 40 1 0 2 109 11 0 0 0 0
## 497 2784 60 0 0 3 89 12 0 0 0 1
## 498 2121 54 1 0 2 79 7 0 0 0 0
## 499 1983 53 1 0 3 85 11 0 0 0 1
## 500 468 40 1 0 3 125 12 0 0 0 0
## BMI SBP DBP MAP eGFR Albuminuria.1 Albuminuria.2 Chol HDL
## 495 27.60945 114.0 71.5 87.0 74.52218 0 0 4.03 1.30
## 496 22.72709 106.0 57.5 78.5 90.43841 0 0 4.70 1.97
## 497 27.77588 126.0 76.5 95.0 78.10049 0 0 6.06 1.58
## 498 23.05175 119.5 77.0 94.0 79.42434 0 1 8.19 1.66
## 499 21.92527 119.0 77.5 92.0 79.72628 0 1 5.71 1.61
## 500 18.10774 99.0 66.0 76.5 87.32902 0 1 4.52 1.47
## Statin Solubility Days Years DDD FRS PS PSquint GRS Match_1 Match_2
## 495 0 2 -1 -1 0 1 0.08533976 2 0 -1 -1
## 496 0 2 -1 -1 0 0 0.05299198 1 0 -1 -1
## 497 0 2 -1 -1 0 13 0.24122647 4 0 -1 -1
## 498 0 2 -1 -1 0 7 0.07188406 1 0 -1 -1
## 499 0 2 -1 -1 0 10 0.12245760 3 1 -1 -1
## 500 0 2 -1 -1 0 1 0.03873501 1 0 -1 -1
#Variables being used are: Smoking, SBP, Gender, Chol, CVD, Age
For my data analysis, I will use a logistic regression model to determine which factors are significant in determining whether an individual has cardiovascular disease (CVD). I will then use a graph to visualize my data and, lastly, visualize the area under the curve.
CVD_data <- main_analysis_data |>
select(Gender, Smoking, CVD, Chol, SBP, Age)
CVD_data
## Gender Smoking CVD Chol SBP Age
## 1 1 0 0 3.86 122.0 55
## 2 1 0 0 5.64 107.5 65
## 3 0 0 0 6.83 120.5 46
## 4 1 0 0 7.11 114.5 68
## 5 0 0 0 5.04 114.0 70
## 6 0 0 0 3.05 126.5 53
## 7 0 0 0 4.90 119.0 64
## 8 0 0 0 5.50 141.0 70
## 9 1 1 0 3.92 159.5 46
## 10 0 0 0 5.75 129.5 44
## 11 0 1 0 6.71 122.0 46
## 12 0 0 0 5.42 141.5 75
## 13 1 1 0 4.60 105.5 41
## 14 1 0 0 6.28 116.5 41
## 15 0 1 0 6.05 148.5 66
## 16 1 0 0 3.77 127.5 38
## 17 1 0 0 8.17 122.5 64
## 18 0 0 0 4.71 129.0 51
## 19 1 0 0 5.62 132.5 47
## 20 0 0 1 4.55 126.5 61
## 21 0 0 1 3.66 125.0 48
## 22 0 0 0 6.63 153.5 54
## 23 0 1 0 3.37 168.0 75
## 24 0 1 0 4.99 144.0 58
## 25 0 0 0 5.83 133.0 51
## 26 1 0 0 5.42 171.0 71
## 27 1 0 0 4.43 108.5 58
## 28 1 0 0 6.94 146.5 65
## 29 0 0 0 7.09 153.5 64
## 30 0 0 0 4.98 118.0 70
## 31 1 0 0 5.62 108.0 59
## 32 1 0 0 4.14 152.5 63
## 33 0 0 0 5.08 147.5 51
## 34 1 0 0 5.47 88.5 39
## 35 0 0 0 4.62 120.5 36
## 36 0 0 0 5.93 158.0 78
## 37 1 1 0 5.22 131.5 61
## 38 1 0 0 4.03 162.0 51
## 39 1 1 0 4.31 121.0 48
## 40 0 0 0 3.61 167.5 72
## 41 1 0 0 3.65 127.0 62
## 42 0 0 0 4.82 137.0 74
## 43 1 1 0 6.00 99.5 54
## 44 0 1 0 7.43 144.0 50
## 45 0 0 0 5.84 101.0 43
## 46 0 0 0 4.99 111.0 51
## 47 0 0 0 5.69 175.0 68
## 48 1 0 0 6.36 135.5 63
## 49 1 1 0 4.92 125.5 43
## 50 0 0 0 4.39 145.5 56
## 51 0 0 0 5.43 139.0 48
## 52 0 0 1 5.46 114.5 58
## 53 1 0 0 6.42 146.0 54
## 54 0 0 0 6.35 106.0 64
## 55 1 1 0 5.37 134.5 55
## 56 1 0 0 6.40 146.0 45
## 57 0 0 0 6.15 130.5 55
## 58 0 0 0 4.31 132.0 45
## 59 0 1 0 4.77 126.0 71
## 60 1 0 0 4.48 99.5 46
## 61 0 0 0 6.55 154.5 78
## 62 0 1 0 6.73 150.5 56
## 63 0 0 0 4.59 125.5 70
## 64 0 0 0 4.94 122.5 70
## 65 1 0 0 4.87 104.5 64
## 66 1 0 0 5.98 113.5 54
## 67 0 0 0 6.72 119.0 55
## 68 1 0 0 6.15 102.5 38
## 69 1 0 0 5.05 119.5 49
## 70 0 1 0 4.07 131.0 58
## 71 1 0 0 6.46 123.5 71
## 72 0 0 0 6.01 137.0 51
## 73 1 0 0 4.80 112.0 38
## 74 0 0 0 7.76 116.5 51
## 75 0 1 0 5.35 119.0 72
## 76 1 1 0 4.94 133.0 48
## 77 1 0 0 5.58 122.0 64
## 78 1 0 0 5.56 123.0 48
## 79 0 0 1 6.86 145.5 38
## 80 0 0 0 5.95 122.5 54
## 81 0 0 0 7.55 130.0 44
## 82 1 0 0 5.43 95.5 58
## 83 0 0 0 5.53 169.0 58
## 84 1 0 0 5.01 110.5 45
## 85 0 0 0 4.86 120.0 45
## 86 1 0 0 5.98 115.0 61
## 87 0 1 1 4.60 152.5 49
## 88 0 1 0 4.66 109.5 37
## 89 0 0 0 4.94 126.0 43
## 90 0 0 0 7.17 126.0 64
## 91 0 0 0 4.97 133.5 72
## 92 0 0 0 3.34 119.0 43
## 93 0 1 1 3.39 142.5 72
## 94 1 0 0 4.88 149.5 72
## 95 1 0 0 5.97 102.5 53
## 96 1 1 0 5.09 103.5 46
## 97 1 1 0 6.66 104.0 49
## 98 1 0 0 5.74 100.0 39
## 99 1 0 1 3.45 100.0 44
## 100 1 0 0 5.16 160.0 75
## 101 0 0 0 6.05 129.5 59
## 102 1 0 0 5.90 106.5 57
## 103 0 0 0 6.46 142.0 48
## 104 1 1 0 4.13 119.0 43
## 105 0 0 0 5.87 118.0 62
## 106 1 0 0 5.68 111.5 54
## 107 0 0 0 5.37 117.5 41
## 108 1 0 0 5.31 95.0 43
## 109 1 0 0 4.98 103.5 42
## 110 1 0 0 4.57 129.0 45
## 111 1 0 0 3.67 131.5 47
## 112 0 0 0 8.66 128.5 57
## 113 1 -1 0 6.13 98.5 66
## 114 0 0 0 7.27 124.0 65
## 115 1 0 0 5.12 137.0 50
## 116 1 1 0 5.84 117.0 52
## 117 0 1 0 7.11 109.0 57
## 118 1 0 0 5.42 106.5 41
## 119 1 1 0 6.57 104.5 47
## 120 0 0 1 4.39 125.5 58
## 121 0 0 0 4.22 114.0 52
## 122 0 1 0 5.55 154.0 78
## 123 0 0 0 4.44 111.0 37
## 124 0 0 0 4.72 147.5 75
## 125 0 1 0 4.95 109.0 41
## 126 1 0 0 5.80 138.5 64
## 127 0 0 0 4.11 117.5 36
## 128 1 0 0 6.28 137.0 68
## 129 0 0 0 6.40 155.5 52
## 130 1 1 0 8.14 119.5 62
## 131 0 0 0 6.41 158.0 75
## 132 0 1 0 4.81 134.0 52
## 133 0 0 0 5.43 134.0 57
## 134 0 0 0 6.02 143.5 80
## 135 0 0 0 5.69 160.0 61
## 136 1 1 0 3.83 141.5 41
## 137 0 0 0 6.69 137.5 44
## 138 1 0 0 5.56 115.0 43
## 139 1 0 0 5.94 143.5 55
## 140 0 0 0 6.69 134.0 56
## 141 0 0 0 5.67 133.0 42
## 142 0 1 0 6.35 134.0 43
## 143 0 1 0 5.43 117.0 43
## 144 1 0 0 6.83 134.5 64
## 145 1 0 0 4.86 108.5 74
## 146 1 0 0 4.25 115.5 56
## 147 0 0 0 3.66 112.5 43
## 148 1 1 0 5.16 101.0 36
## 149 1 0 0 3.88 178.0 72
## 150 0 1 0 4.99 109.5 52
## 151 1 0 0 5.34 124.5 45
## 152 1 0 0 6.73 149.0 61
## 153 0 1 1 5.88 164.0 48
## 154 1 0 0 6.76 123.0 78
## 155 0 0 0 5.02 122.0 37
## 156 1 0 0 5.26 122.5 64
## 157 1 0 0 5.22 103.0 50
## 158 0 0 0 4.02 144.0 77
## 159 0 0 0 3.97 125.0 46
## 160 1 0 0 5.13 116.5 55
## 161 1 0 0 5.41 123.0 42
## 162 1 0 0 3.59 150.5 68
## 163 1 0 0 7.79 107.5 36
## 164 0 1 0 5.26 135.0 59
## 165 1 0 0 4.49 104.5 52
## 166 1 1 0 5.01 129.5 54
## 167 1 0 0 4.50 112.0 49
## 168 1 0 1 4.52 115.0 64
## 169 0 0 0 6.37 114.0 46
## 170 0 0 0 5.92 122.0 56
## 171 1 -1 0 6.32 108.5 58
## 172 1 0 0 5.06 141.0 71
## 173 0 0 0 6.28 143.0 58
## 174 1 1 0 5.22 116.5 54
## 175 1 0 0 5.61 110.0 38
## 176 1 0 0 5.16 115.5 39
## 177 0 1 0 5.00 104.5 69
## 178 0 1 0 6.90 127.5 62
## 179 1 0 0 5.56 122.0 55
## 180 1 0 0 4.39 128.5 47
## 181 1 0 0 4.39 109.0 41
## 182 1 0 0 5.14 130.5 68
## 183 0 0 0 5.16 109.5 69
## 184 1 0 0 5.43 137.0 53
## 185 0 0 0 5.48 104.0 69
## 186 0 0 1 4.66 113.5 53
## 187 0 0 0 6.81 129.0 47
## 188 0 1 0 4.32 116.5 40
## 189 1 1 0 5.71 100.5 52
## 190 0 0 1 5.55 143.5 66
## 191 1 0 0 5.35 104.0 57
## 192 1 0 0 6.74 98.0 45
## 193 0 0 0 5.77 129.0 39
## 194 0 0 0 5.37 126.0 37
## 195 1 1 0 5.00 114.0 55
## 196 0 0 0 5.33 171.5 61
## 197 0 0 0 4.31 170.0 79
## 198 0 0 0 3.88 123.0 36
## 199 1 1 0 4.29 157.5 65
## 200 0 0 0 4.77 127.5 76
## 201 1 1 0 6.34 110.0 54
## 202 0 0 0 5.44 129.0 38
## 203 1 -1 0 6.78 143.5 76
## 204 1 0 0 5.92 138.5 49
## 205 1 0 0 6.57 109.5 65
## 206 1 0 0 5.44 143.0 68
## 207 1 0 0 4.61 135.0 54
## 208 0 0 0 3.48 137.0 72
## 209 0 0 1 3.90 133.5 72
## 210 1 1 0 6.75 130.5 50
## 211 1 1 0 5.36 126.0 42
## 212 0 0 1 4.09 132.5 56
## 213 0 0 0 6.45 120.5 55
## 214 0 0 0 4.18 156.5 80
## 215 0 1 1 3.92 91.5 57
## 216 0 0 0 6.21 136.0 56
## 217 0 0 0 7.72 134.0 61
## 218 0 1 0 4.26 110.5 77
## 219 0 0 0 6.74 147.0 68
## 220 0 0 0 4.09 154.0 57
## 221 0 0 1 4.79 103.5 67
## 222 1 1 0 5.53 109.5 54
## 223 1 0 0 3.87 117.0 76
## 224 0 0 0 6.61 188.5 70
## 225 0 0 1 2.23 121.5 59
## 226 1 1 1 5.10 117.5 60
## 227 0 0 0 4.86 106.0 46
## 228 1 1 1 4.78 114.5 49
## 229 1 0 0 4.80 138.5 68
## 230 0 0 0 8.56 115.5 52
## 231 1 1 0 7.42 97.5 54
## 232 1 0 0 4.57 106.0 50
## 233 1 0 0 6.72 129.5 72
## 234 1 0 0 5.09 112.0 59
## 235 1 1 0 6.07 110.5 49
## 236 1 0 0 4.87 121.5 55
## 237 1 0 0 5.11 112.0 38
## 238 1 1 0 3.71 134.5 51
## 239 1 0 0 5.02 110.5 57
## 240 1 0 0 5.97 121.0 42
## 241 0 0 0 5.53 121.5 49
## 242 1 0 0 4.61 125.0 48
## 243 0 -1 0 4.64 149.0 68
## 244 0 0 0 4.71 147.5 70
## 245 1 0 0 4.75 116.0 38
## 246 0 0 0 3.22 131.5 71
## 247 1 1 0 5.75 114.0 56
## 248 0 0 0 4.30 106.5 45
## 249 0 0 0 5.82 124.0 44
## 250 1 0 0 6.63 116.5 57
## 251 0 1 0 4.70 167.0 69
## 252 0 0 0 4.74 102.0 36
## 253 1 0 0 7.63 105.5 47
## 254 1 0 0 4.21 131.5 72
## 255 0 1 1 4.54 135.5 72
## 256 0 0 0 5.01 102.5 62
## 257 1 1 0 3.93 118.5 47
## 258 1 0 0 4.48 107.0 53
## 259 1 0 0 5.72 156.0 54
## 260 0 0 1 4.16 154.0 49
## 261 1 1 0 8.77 132.5 67
## 262 1 0 0 5.10 111.5 51
## 263 0 -1 0 4.37 138.5 40
## 264 0 0 0 3.60 120.5 65
## 265 1 0 0 4.86 112.5 44
## 266 0 0 0 5.04 134.0 61
## 267 0 1 0 5.54 110.0 38
## 268 0 0 0 4.16 106.0 54
## 269 0 0 0 3.70 125.0 70
## 270 0 0 1 4.18 142.5 78
## 271 1 0 0 5.14 141.0 48
## 272 1 1 0 6.26 135.5 66
## 273 0 0 0 4.58 117.0 38
## 274 0 0 0 6.65 131.0 50
## 275 1 0 0 4.80 104.5 39
## 276 1 1 0 5.56 99.5 46
## 277 0 0 0 5.21 123.0 37
## 278 0 1 0 2.53 122.5 57
## 279 0 0 0 6.16 116.0 71
## 280 1 0 0 5.84 156.5 51
## 281 1 0 0 5.59 113.5 66
## 282 1 0 0 4.91 126.0 49
## 283 0 0 1 3.73 117.0 58
## 284 1 0 0 4.64 117.5 38
## 285 1 0 0 6.09 107.5 37
## 286 0 1 0 5.96 125.5 61
## 287 0 0 1 6.26 165.5 77
## 288 1 1 0 5.79 109.5 54
## 289 0 0 0 8.58 108.0 39
## 290 1 1 0 5.33 108.0 50
## 291 0 0 0 9.22 128.5 46
## 292 1 0 0 5.23 122.5 51
## 293 1 0 0 6.56 115.0 50
## 294 0 0 1 4.57 121.0 59
## 295 1 0 0 4.05 122.5 45
## 296 0 0 0 7.24 114.5 66
## 297 0 0 0 6.11 141.0 36
## 298 0 0 0 7.12 178.5 63
## 299 1 0 0 6.00 152.0 51
## 300 0 1 0 5.27 134.5 62
## 301 0 0 0 5.35 144.0 61
## 302 1 0 0 4.82 111.5 51
## 303 0 0 0 5.63 130.5 59
## 304 1 0 0 4.18 135.0 65
## 305 1 0 0 6.48 150.0 75
## 306 0 0 0 5.37 141.0 80
## 307 0 0 0 2.70 152.5 77
## 308 0 1 0 3.94 107.0 49
## 309 0 0 0 5.75 136.0 54
## 310 0 0 0 4.58 139.0 52
## 311 1 0 0 6.52 121.5 36
## 312 0 0 0 4.65 125.5 76
## 313 1 0 0 5.74 110.5 36
## 314 1 0 0 5.82 121.5 72
## 315 0 0 0 7.68 140.0 46
## 316 1 0 0 4.97 114.0 57
## 317 0 1 0 4.36 155.5 53
## 318 0 1 0 4.66 131.0 50
## 319 1 0 0 4.37 120.0 55
## 320 0 0 0 6.07 137.5 60
## 321 0 0 1 4.36 128.5 56
## 322 0 0 1 4.63 109.0 42
## 323 1 0 0 5.25 154.0 71
## 324 1 0 0 5.25 98.0 60
## 325 0 1 0 6.69 131.0 48
## 326 1 0 0 7.78 110.5 57
## 327 0 1 0 4.48 117.5 55
## 328 1 1 0 4.95 106.0 57
## 329 0 0 0 4.77 116.5 38
## 330 0 0 0 5.15 159.0 72
## 331 0 1 0 5.18 141.0 74
## 332 0 1 0 5.57 120.0 38
## 333 0 0 0 5.87 141.5 47
## 334 0 0 1 4.60 159.0 81
## 335 0 0 0 7.02 125.5 37
## 336 0 0 0 5.19 115.5 53
## 337 0 0 0 5.68 145.0 61
## 338 1 0 0 3.84 114.5 58
## 339 0 1 0 4.06 123.5 41
## 340 1 0 0 3.35 126.5 49
## 341 0 0 1 2.53 131.5 76
## 342 0 0 0 5.91 129.0 63
## 343 1 0 0 6.03 106.0 43
## 344 0 0 0 8.58 114.5 67
## 345 1 1 0 4.84 108.5 42
## 346 0 1 0 6.52 140.0 54
## 347 0 0 1 4.24 145.5 63
## 348 1 0 0 5.10 100.5 48
## 349 0 0 0 6.35 137.5 73
## 350 0 0 0 4.42 144.0 48
## 351 1 -1 0 4.42 93.0 46
## 352 0 0 0 5.36 109.5 55
## 353 1 1 0 4.82 109.0 52
## 354 0 0 0 5.29 127.0 37
## 355 1 0 0 4.80 126.0 36
## 356 1 0 0 5.74 109.0 50
## 357 0 0 0 4.92 132.0 65
## 358 1 1 0 5.35 104.5 50
## 359 1 0 0 7.15 95.5 62
## 360 1 1 0 4.81 113.0 42
## 361 1 0 0 6.17 104.0 49
## 362 1 0 0 5.06 99.5 49
## 363 0 0 0 5.20 128.5 57
## 364 1 0 0 5.36 137.0 64
## 365 0 0 0 5.75 127.0 41
## 366 0 0 0 5.75 125.0 47
## 367 0 0 0 7.12 112.0 42
## 368 0 0 0 4.30 142.5 41
## 369 0 0 0 6.22 120.0 60
## 370 0 0 0 4.57 111.5 46
## 371 1 0 0 4.79 126.0 72
## 372 1 0 0 4.99 101.0 46
## 373 1 1 0 3.93 109.0 39
## 374 1 0 0 5.35 151.0 68
## 375 1 0 0 5.47 130.0 42
## 376 0 0 0 4.15 137.0 67
## 377 0 1 0 6.85 120.0 48
## 378 1 0 0 5.91 109.0 40
## 379 0 0 0 4.96 127.5 54
## 380 0 0 0 6.31 127.5 48
## 381 0 0 0 4.88 114.0 36
## 382 0 0 0 5.29 127.0 61
## 383 0 0 0 5.14 132.0 37
## 384 0 1 0 4.96 120.5 51
## 385 0 1 1 5.31 158.0 69
## 386 0 0 0 5.60 122.0 48
## 387 1 0 0 7.68 163.5 73
## 388 0 0 0 6.41 121.0 56
## 389 1 0 0 5.65 112.0 51
## 390 0 1 0 5.32 128.0 44
## 391 0 0 0 6.22 110.0 49
## 392 1 0 0 7.04 130.5 56
## 393 1 0 1 4.26 110.0 54
## 394 1 0 0 6.78 100.0 49
## 395 1 0 0 5.58 109.5 42
## 396 0 1 0 5.79 117.5 41
## 397 0 0 0 6.79 125.0 51
## 398 0 1 0 5.00 107.5 45
## 399 0 1 0 4.69 117.0 61
## 400 0 0 1 4.27 132.0 73
## 401 1 0 0 5.72 129.5 59
## 402 0 0 0 5.54 155.5 68
## 403 0 0 0 5.08 146.5 73
## 404 1 0 0 5.99 134.5 68
## 405 1 0 0 4.42 104.5 44
## 406 0 1 1 3.14 116.5 72
## 407 0 1 0 6.82 119.0 57
## 408 1 0 0 4.65 137.0 48
## 409 1 0 0 3.81 77.5 55
## 410 1 1 0 6.93 96.0 51
## 411 1 0 0 6.79 125.0 47
## 412 1 0 0 5.60 147.5 52
## 413 1 0 0 5.68 104.5 46
## 414 0 0 0 4.30 127.5 68
## 415 1 0 0 3.94 103.5 58
## 416 1 0 0 5.26 124.5 69
## 417 0 1 0 6.26 111.5 55
## 418 1 1 0 5.70 107.5 71
## 419 1 1 0 4.99 106.5 48
## 420 0 0 0 6.83 124.5 47
## 421 1 0 0 5.26 111.0 40
## 422 0 0 0 5.08 107.5 57
## 423 1 0 0 5.04 112.5 44
## 424 1 0 1 5.03 109.5 62
## 425 1 0 0 3.05 115.5 52
## 426 0 1 0 5.34 130.5 55
## 427 0 0 0 5.74 126.5 48
## 428 1 0 0 6.00 138.5 60
## 429 0 0 0 5.59 117.0 70
## 430 1 0 0 5.57 105.0 41
## 431 0 1 0 5.31 114.0 48
## 432 0 0 0 3.99 143.5 70
## 433 1 1 0 6.80 104.5 61
## 434 0 0 0 6.34 120.5 48
## 435 1 1 0 5.18 144.0 60
## 436 1 0 0 7.47 108.5 70
## 437 1 0 0 4.16 112.0 41
## 438 0 1 0 6.12 166.0 64
## 439 1 0 0 6.19 113.0 37
## 440 0 0 0 5.35 120.5 58
## 441 0 0 0 5.12 141.5 75
## 442 0 0 0 4.90 122.5 50
## 443 0 0 0 3.63 129.5 37
## 444 1 0 0 6.76 116.5 65
## 445 0 0 0 6.31 122.0 51
## 446 0 0 0 5.53 130.0 43
## 447 1 0 0 4.71 172.5 67
## 448 1 1 0 5.74 144.5 52
## 449 0 0 0 7.07 106.0 52
## 450 0 1 0 5.36 120.0 37
## 451 1 0 0 4.29 118.0 38
## 452 0 1 0 5.08 123.5 44
## 453 0 0 0 7.40 154.0 56
## 454 0 0 1 5.44 160.5 80
## 455 1 0 0 6.43 160.5 73
## 456 1 0 0 5.47 155.0 78
## 457 0 1 0 4.91 126.5 40
## 458 1 0 0 7.67 106.5 47
## 459 1 0 0 6.94 156.0 63
## 460 0 1 0 6.07 117.5 42
## 461 0 1 0 4.34 176.0 71
## 462 1 0 0 6.18 108.0 43
## 463 0 0 0 5.75 133.0 81
## 464 1 0 1 4.41 106.0 40
## 465 0 0 0 6.21 121.0 63
## 466 0 1 0 5.17 110.0 50
## 467 1 1 1 5.27 134.5 61
## 468 0 0 0 5.71 112.0 52
## 469 0 0 0 6.41 124.0 59
## 470 1 0 0 4.63 122.0 54
## 471 1 1 0 5.73 114.5 57
## 472 0 0 1 4.85 150.5 75
## 473 0 0 0 6.04 152.0 75
## 474 0 0 0 4.10 147.5 42
## 475 1 0 0 7.02 109.5 54
## 476 0 1 0 5.79 123.0 54
## 477 0 1 0 3.88 133.0 61
## 478 0 1 0 5.53 129.0 38
## 479 0 0 0 6.64 137.5 66
## 480 1 0 0 4.80 110.0 42
## 481 0 0 0 6.33 151.0 50
## 482 1 0 0 4.16 100.5 37
## 483 0 0 0 4.96 124.0 37
## 484 1 0 0 4.48 163.5 54
## 485 1 0 0 4.12 116.5 47
## 486 0 0 0 4.12 144.0 42
## 487 0 0 1 3.97 141.5 69
## 488 1 1 0 6.34 111.0 56
## 489 0 0 0 5.48 112.5 51
## 490 1 0 0 2.51 106.0 37
## 491 1 0 0 6.65 128.5 49
## 492 0 0 0 5.52 109.0 46
## 493 0 1 0 6.22 128.0 47
## 494 0 0 0 5.35 140.0 38
## 495 1 0 0 4.03 114.0 46
## 496 1 0 0 4.70 106.0 40
## 497 0 0 0 6.06 126.0 60
## 498 1 0 0 8.19 119.5 54
## 499 1 0 0 5.71 119.0 53
## 500 1 0 0 4.52 99.0 40
# Recode Gender: 0 -> "M", 1 -> "F"
CVD_data[CVD_data$Gender == 0,]$Gender <- "M"
CVD_data[CVD_data$Gender == 1,]$Gender <- "F"
# Recode Smoking: 0 -> non-smoker, 1 -> frequent smoker, -1 -> occasional smoker
CVD_data[CVD_data$Smoking == 0,]$Smoking <- "Non-smoker"
CVD_data[CVD_data$Smoking == 1,]$Smoking <- " Frequent Smoker"
CVD_data[CVD_data$Smoking == -1,]$Smoking <- "Occasional Smoker"
str(CVD_data)
## 'data.frame': 500 obs. of 6 variables:
## $ Gender : chr "F" "F" "M" "F" ...
## $ Smoking: chr "Non-smoker" "Non-smoker" "Non-smoker" "Non-smoker" ...
## $ CVD : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Chol : num 3.86 5.64 6.83 7.11 5.04 3.05 4.9 5.5 3.92 5.75 ...
## $ SBP : num 122 108 120 114 114 ...
## $ Age : int 55 65 46 68 70 53 64 70 46 44 ...
head(CVD_data)
## Gender Smoking CVD Chol SBP Age
## 1 F Non-smoker 0 3.86 122.0 55
## 2 F Non-smoker 0 5.64 107.5 65
## 3 M Non-smoker 0 6.83 120.5 46
## 4 F Non-smoker 0 7.11 114.5 68
## 5 M Non-smoker 0 5.04 114.0 70
## 6 M Non-smoker 0 3.05 126.5 53
CVD_data$Gender <- as.factor(CVD_data$Gender)
CVD_data$Smoking <- as.factor(CVD_data$Smoking)
str(CVD_data)
## 'data.frame': 500 obs. of 6 variables:
## $ Gender : Factor w/ 2 levels "F","M": 1 1 2 1 2 2 2 2 1 2 ...
## $ Smoking: Factor w/ 3 levels " Frequent Smoker",..: 2 2 2 2 2 2 2 2 1 2 ...
## $ CVD : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Chol : num 3.86 5.64 6.83 7.11 5.04 3.05 4.9 5.5 3.92 5.75 ...
## $ SBP : num 122 108 120 114 114 ...
## $ Age : int 55 65 46 68 70 53 64 70 46 44 ...
unique(CVD_data$Smoking)
## [1] Non-smoker Frequent Smoker Occasional Smoker
## Levels: Frequent Smoker Non-smoker Occasional Smoker
CVD_data[CVD_data$CVD == 0,]$CVD <- "CVD not present"
CVD_data[CVD_data$CVD == 1,]$CVD <- "CVD present"
CVD_data$CVD <- as.factor(CVD_data$CVD)
unique(CVD_data$CVD)
## [1] CVD not present CVD present
## Levels: CVD not present CVD present
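The recoding steps above could also be written as a single call to factor() with explicit levels and labels. The sketch below uses a small made-up vector of codes (smoking_codes is hypothetical, not taken from the dataset) and the same labels as above. Listing the levels explicitly keeps stray whitespace out of the labels and makes "Non-smoker" the reference level instead of whichever label sorts first alphabetically.

```r
# Hypothetical example codes, using the same coding as the Smoking column
smoking_codes <- c(0, 1, -1, 0, 1)

# Map each numeric code to its label in one step; the first level listed
# becomes the reference level in a later glm() fit
smoking_factor <- factor(
  smoking_codes,
  levels = c(0, 1, -1),
  labels = c("Non-smoker", "Frequent Smoker", "Occasional Smoker")
)

levels(smoking_factor)
table(smoking_factor)
```

Choosing the reference level this way changes how the Smoking coefficients are read (each is then a comparison against non-smokers), but it does not change the overall fit of the model.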
xtabs(~ CVD + Smoking, data = CVD_data)
## Smoking
## CVD Frequent Smoker Non-smoker Occasional Smoker
## CVD not present 106 348 6
## CVD present 10 30 0
xtabs(~ CVD + Gender, data = CVD_data)
## Gender
## CVD F M
## CVD not present 227 233
## CVD present 8 32
logistic_result <- glm(CVD ~ Gender, data= CVD_data, family="binomial")
summary(logistic_result)
##
## Call:
## glm(formula = CVD ~ Gender, family = "binomial", data = CVD_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.3455 0.3597 -9.300 < 2e-16 ***
## GenderM 1.3602 0.4061 3.349 0.000811 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 278.77 on 499 degrees of freedom
## Residual deviance: 265.07 on 498 degrees of freedom
## AIC: 269.07
##
## Number of Fisher Scoring iterations: 6
Gender is highly significant in determining the risk for cardiovascular disease because the p-value (0.000811) is less than 0.001.
Based on the results above, the intercept of approximately -3.35 is the log odds that a female has cardiovascular disease, since female is the reference level. The GenderM coefficient of approximately 1.36 is the log odds ratio: being male increases the log odds of cardiovascular disease by 1.36 compared with being female.
The probability of a female having cardiovascular disease is therefore:
P = 1 / (1 + e^3.35) = 0.034
The odds ratio for males relative to females is:
Odds ratio = e^1.36 = 3.90
This shows that the odds of having cardiovascular disease for males are about 3.9 times the odds for females. Note that this is a ratio of odds, not a ratio of probabilities.
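As a check on the interpretation above, the same quantities can be recovered directly from the counts in the CVD by Gender table. This is a minimal sketch: the counts are copied by hand from the xtabs output, and the variable names are made up for illustration.

```r
# Counts copied from the xtabs(~ CVD + Gender) table above
cvd_f <- 8;  no_cvd_f <- 227   # females with / without CVD
cvd_m <- 32; no_cvd_m <- 233   # males with / without CVD

# Odds of CVD within each gender
odds_f <- cvd_f / no_cvd_f
odds_m <- cvd_m / no_cvd_m

log(odds_f)                  # should be close to the model intercept
log(odds_m / odds_f)         # should be close to the GenderM coefficient
odds_m / odds_f              # odds ratio, males vs. females
cvd_f / (cvd_f + no_cvd_f)   # probability of CVD for females
```

With a single binary predictor, the logistic regression reproduces the 2x2 table, so these hand calculations should agree with the fitted coefficients up to rounding.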
Instead of looking at just one factor, I then fit a logistic regression model that shows how other factors (Smoking, SBP, Cholesterol, and Age), in addition to Gender, affect an individual’s risk of having cardiovascular disease. The model is shown below.
logistic_result2 <- glm(CVD ~ Gender + Smoking + SBP + Chol + Age, data= CVD_data, family="binomial")
summary(logistic_result2)
##
## Call:
## glm(formula = CVD ~ Gender + Smoking + SBP + Chol + Age, family = "binomial",
## data = CVD_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.273142 1.617381 -0.169 0.86589
## GenderM 1.152326 0.433000 2.661 0.00778 **
## SmokingNon-smoker -0.072692 0.417005 -0.174 0.86162
## SmokingOccasional Smoker -13.901997 909.988172 -0.015 0.98781
## SBP -0.004808 0.011142 -0.431 0.66612
## Chol -0.912941 0.190581 -4.790 1.67e-06 ***
## Age 0.040105 0.016336 2.455 0.01409 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 278.77 on 499 degrees of freedom
## Residual deviance: 226.24 on 493 degrees of freedom
## AIC: 240.24
##
## Number of Fisher Scoring iterations: 15
GenderM: the coefficient of about 1.15 (p = 0.00778) means that, holding the other variables constant, being male increases the log odds of cardiovascular disease by 1.15 relative to being female (an odds ratio of about 3.2). Gender is statistically significant in this model.
SmokingNon-smoker: the coefficient of about -0.073 has a p-value of about 0.862, well above 0.05, so non-smokers do not differ significantly from frequent smokers (the reference level) in this model.
SmokingOccasional Smoker: the coefficient of about -13.9 has a standard error of about 910 and a p-value of about 0.99, so it is not statistically significant. The extreme estimate arises because none of the 6 occasional smokers in the sample has CVD, so the model cannot estimate this effect reliably.
SBP: the coefficient of about -0.005 has a p-value of about 0.67, above 0.05, so systolic blood pressure is not significant in determining the chances of an individual having cardiovascular disease in this model.
Chol: the coefficient of about -0.91 (p = 1.67e-06) is highly significant. Because the coefficient is negative, higher cholesterol is associated with lower odds of cardiovascular disease in this sample (an odds ratio of about 0.40 per one-unit increase in cholesterol).
Age: the coefficient of about 0.04 (p = 0.014) is significant at the 0.05 level. Each additional year of age multiplies the odds of cardiovascular disease by about 1.04.
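The coefficients are easier to read on the odds-ratio scale. The sketch below exponentiates the estimates and adds approximate 95% Wald intervals. The numbers are copied by hand from the summary output above, the names est, se, and or_table are made up for illustration, and the occasional-smoker term is left out because its estimate is unstable.

```r
# Estimates and standard errors copied from summary(logistic_result2) above
est <- c("GenderM" = 1.152326, "SmokingNon-smoker" = -0.072692,
         "SBP" = -0.004808, "Chol" = -0.912941, "Age" = 0.040105)
se  <- c("GenderM" = 0.433000, "SmokingNon-smoker" = 0.417005,
         "SBP" = 0.011142, "Chol" = 0.190581, "Age" = 0.016336)

# Odds ratios with approximate 95% Wald confidence intervals
z <- qnorm(0.975)
or_table <- data.frame(
  odds_ratio = exp(est),
  lower_95   = exp(est - z * se),
  upper_95   = exp(est + z * se)
)

round(or_table, 3)
```

An interval that contains 1 corresponds to a predictor that is not significant at the 0.05 level, which matches the p-values reported in the summary.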
combined_results_r_squared <- 1 - (logistic_result2$deviance/logistic_result2$null.deviance)
combined_results_r_squared
## [1] 0.1884289
This value is a deviance-based pseudo R-squared: the model reduces the deviance by about 18.84% relative to the null (intercept-only) model.
1 - pchisq((logistic_result2$null.deviance - logistic_result2$deviance), df=(length(logistic_result2$coefficients)-1))
## [1] 1.460094e-09
This p-value is far below 0.05, so the model as a whole is statistically significant: taken together, the predictors fit the data significantly better than the intercept-only model.
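The test above is a likelihood ratio test that compares the fitted model with the intercept-only model. A minimal sketch of the same calculation, using the rounded deviances and degrees of freedom printed in the summary output (the variable names are made up for illustration):

```r
# Deviances and degrees of freedom copied from summary(logistic_result2)
null_deviance     <- 278.77; null_df  <- 499
residual_deviance <- 226.24; resid_df <- 493

# The drop in deviance is approximately chi-squared distributed, with
# degrees of freedom equal to the number of predictors added to the model
lr_statistic <- null_deviance - residual_deviance
lr_df        <- null_df - resid_df

# Upper-tail probability; lower.tail = FALSE avoids computing 1 - p by hand
p_value <- pchisq(lr_statistic, df = lr_df, lower.tail = FALSE)
p_value
```

Because the inputs are rounded, the result may differ slightly from the value computed from the model object, but it should lead to the same conclusion.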
Knowing that, I will now use the fitted logistic regression to predict each individual's probability of having CVD and rank the individuals from lowest to highest predicted probability.
predicted.data <- data.frame(
probability.of.CVD=logistic_result2$fitted.values,
CVD=CVD_data$CVD)
predicted.data <- predicted.data[
order(predicted.data$probability.of.CVD, decreasing=FALSE),]
predicted.data$rank <- 1:nrow(predicted.data)
predicted.data
## probability.of.CVD CVD rank
## 171 1.323406e-08 CVD not present 1
## 203 1.512670e-08 CVD not present 2
## 113 2.276382e-08 CVD not present 3
## 351 4.993091e-08 CVD not present 4
## 263 1.045041e-07 CVD not present 5
## 243 2.386973e-07 CVD not present 6
## 163 1.455867e-03 CVD not present 7
## 291 1.685866e-03 CVD not present 8
## 498 1.962301e-03 CVD not present 9
## 261 1.966324e-03 CVD not present 10
## 289 2.518218e-03 CVD not present 11
## 458 2.534607e-03 CVD not present 12
## 253 2.641263e-03 CVD not present 13
## 17 2.938874e-03 CVD not present 14
## 130 3.041093e-03 CVD not present 15
## 326 3.355540e-03 CVD not present 16
## 230 4.159822e-03 CVD not present 17
## 311 4.326886e-03 CVD not present 18
## 112 4.357958e-03 CVD not present 19
## 231 4.724561e-03 CVD not present 20
## 411 5.164675e-03 CVD not present 21
## 387 5.401740e-03 CVD not present 22
## 192 5.677857e-03 CVD not present 23
## 392 5.740418e-03 CVD not present 24
## 475 5.967429e-03 CVD not present 25
## 56 6.145550e-03 CVD not present 26
## 491 6.246051e-03 CVD not present 27
## 210 6.320270e-03 CVD not present 28
## 439 6.328592e-03 CVD not present 29
## 394 6.360927e-03 CVD not present 30
## 315 6.476039e-03 CVD not present 31
## 410 6.587267e-03 CVD not present 32
## 14 6.726842e-03 CVD not present 33
## 81 7.057380e-03 CVD not present 34
## 285 7.113714e-03 CVD not present 35
## 68 7.180186e-03 CVD not present 36
## 459 7.354692e-03 CVD not present 37
## 344 7.465367e-03 CVD not present 38
## 119 7.475385e-03 CVD not present 39
## 97 7.478722e-03 CVD not present 40
## 293 7.522060e-03 CVD not present 41
## 436 7.541336e-03 CVD not present 42
## 359 7.798575e-03 CVD not present 43
## 74 8.222091e-03 CVD not present 44
## 462 8.305238e-03 CVD not present 45
## 28 8.333112e-03 CVD not present 46
## 152 8.493932e-03 CVD not present 47
## 53 8.635725e-03 CVD not present 48
## 335 8.820286e-03 CVD not present 49
## 240 9.072278e-03 CVD not present 50
## 313 9.252271e-03 CVD not present 51
## 250 9.260020e-03 CVD not present 52
## 378 9.366934e-03 CVD not present 53
## 144 9.367121e-03 CVD not present 54
## 4 9.375974e-03 CVD not present 55
## 343 9.603553e-03 CVD not present 56
## 79 9.643106e-03 CVD present 57
## 44 1.004055e-02 CVD not present 58
## 367 1.048049e-02 CVD not present 59
## 433 1.059062e-02 CVD not present 60
## 361 1.084082e-02 CVD not present 61
## 299 1.089063e-02 CVD not present 62
## 98 1.095663e-02 CVD not present 63
## 175 1.129211e-02 CVD not present 64
## 444 1.131120e-02 CVD not present 65
## 204 1.153039e-02 CVD not present 66
## 453 1.161491e-02 CVD not present 67
## 217 1.166716e-02 CVD not present 68
## 201 1.183994e-02 CVD not present 69
## 280 1.231597e-02 CVD not present 70
## 235 1.236062e-02 CVD not present 71
## 488 1.275540e-02 CVD not present 72
## 430 1.350069e-02 CVD not present 73
## 395 1.362566e-02 CVD not present 74
## 375 1.365072e-02 CVD not present 75
## 48 1.369454e-02 CVD not present 76
## 205 1.387803e-02 CVD not present 77
## 139 1.402433e-02 CVD not present 78
## 138 1.406141e-02 CVD not present 79
## 19 1.436272e-02 CVD not present 80
## 233 1.454532e-02 CVD not present 81
## 34 1.475935e-02 CVD not present 82
## 137 1.481005e-02 CVD not present 83
## 161 1.489406e-02 CVD not present 84
## 413 1.493625e-02 CVD not present 85
## 66 1.499062e-02 CVD not present 86
## 3 1.531639e-02 CVD not present 87
## 95 1.531761e-02 CVD not present 88
## 118 1.534212e-02 CVD not present 89
## 259 1.548627e-02 CVD not present 90
## 187 1.558254e-02 CVD not present 91
## 420 1.563442e-02 CVD not present 92
## 448 1.594273e-02 CVD not present 93
## 356 1.622401e-02 CVD not present 94
## 78 1.649449e-02 CVD not present 95
## 211 1.649815e-02 CVD not present 96
## 428 1.657658e-02 CVD not present 97
## 412 1.659394e-02 CVD not present 98
## 116 1.659759e-02 CVD not present 99
## 421 1.666954e-02 CVD not present 100
## 449 1.675506e-02 CVD not present 101
## 43 1.689856e-02 CVD not present 102
## 455 1.695624e-02 CVD not present 103
## 176 1.716096e-02 CVD not present 104
## 377 1.752667e-02 CVD not present 105
## 148 1.753753e-02 CVD not present 106
## 237 1.754235e-02 CVD not present 107
## 298 1.754623e-02 CVD not present 108
## 151 1.772706e-02 CVD not present 109
## 128 1.779960e-02 CVD not present 110
## 297 1.788596e-02 CVD not present 111
## 499 1.789318e-02 CVD not present 112
## 389 1.803822e-02 CVD not present 113
## 272 1.811618e-02 CVD not present 114
## 71 1.816855e-02 CVD not present 115
## 11 1.819289e-02 CVD not present 116
## 276 1.829556e-02 CVD not present 117
## 154 1.833444e-02 CVD not present 118
## 305 1.843364e-02 CVD not present 119
## 102 1.873920e-02 CVD not present 120
## 397 1.892829e-02 CVD not present 121
## 325 1.920553e-02 CVD not present 122
## 108 1.934534e-02 CVD not present 123
## 288 1.945805e-02 CVD not present 124
## 86 1.961402e-02 CVD not present 125
## 106 1.980672e-02 CVD not present 126
## 355 2.003674e-02 CVD not present 127
## 274 2.005283e-02 CVD not present 128
## 189 2.015880e-02 CVD not present 129
## 117 2.083877e-02 CVD not present 130
## 103 2.086153e-02 CVD not present 131
## 142 2.108848e-02 CVD not present 132
## 29 2.109238e-02 CVD not present 133
## 184 2.111981e-02 CVD not present 134
## 247 2.135775e-02 CVD not present 135
## 401 2.136801e-02 CVD not present 136
## 114 2.145899e-02 CVD not present 137
## 22 2.148619e-02 CVD not present 138
## 179 2.182671e-02 CVD not present 139
## 271 2.207036e-02 CVD not present 140
## 90 2.234956e-02 CVD not present 141
## 471 2.255927e-02 CVD not present 142
## 62 2.314424e-02 CVD not present 143
## 73 2.314798e-02 CVD not present 144
## 126 2.320203e-02 CVD not present 145
## 404 2.334281e-02 CVD not present 146
## 423 2.358433e-02 CVD not present 147
## 245 2.375288e-02 CVD not present 148
## 169 2.384160e-02 CVD not present 149
## 296 2.396857e-02 CVD not present 150
## 109 2.399857e-02 CVD not present 151
## 140 2.414025e-02 CVD not present 152
## 129 2.416297e-02 CVD not present 153
## 67 2.424941e-02 CVD not present 154
## 481 2.428898e-02 CVD not present 155
## 222 2.454299e-02 CVD not present 156
## 115 2.475886e-02 CVD not present 157
## 275 2.493422e-02 CVD not present 158
## 292 2.498672e-02 CVD not present 159
## 358 2.522104e-02 CVD not present 160
## 290 2.525625e-02 CVD not present 161
## 84 2.542677e-02 CVD not present 162
## 49 2.548951e-02 CVD not present 163
## 380 2.552821e-02 CVD not present 164
## 434 2.568456e-02 CVD not present 165
## 407 2.575067e-02 CVD not present 166
## 31 2.584134e-02 CVD not present 167
## 284 2.601304e-02 CVD not present 168
## 55 2.617315e-02 CVD not present 169
## 157 2.656258e-02 CVD not present 170
## 346 2.710379e-02 CVD not present 171
## 96 2.731085e-02 CVD not present 172
## 480 2.732120e-02 CVD not present 173
## 69 2.750321e-02 CVD not present 174
## 348 2.765329e-02 CVD not present 175
## 265 2.767995e-02 CVD not present 176
## 178 2.801549e-02 CVD not present 177
## 460 2.811683e-02 CVD not present 178
## 372 2.813620e-02 CVD not present 179
## 496 2.813846e-02 CVD not present 180
## 493 2.847711e-02 CVD not present 181
## 345 2.849811e-02 CVD not present 182
## 360 2.865783e-02 CVD not present 183
## 193 2.882448e-02 CVD not present 184
## 76 2.938460e-02 CVD not present 185
## 445 2.944474e-02 CVD not present 186
## 262 2.952529e-02 CVD not present 187
## 362 2.992912e-02 CVD not present 188
## 282 3.020717e-02 CVD not present 189
## 77 3.047605e-02 CVD not present 190
## 213 3.060431e-02 CVD not present 191
## 191 3.094579e-02 CVD not present 192
## 82 3.118468e-02 CVD not present 193
## 391 3.119804e-02 CVD not present 194
## 174 3.127618e-02 CVD not present 195
## 419 3.180840e-02 CVD not present 196
## 365 3.200926e-02 CVD not present 197
## 2 3.214510e-02 CVD not present 198
## 160 3.281371e-02 CVD not present 199
## 388 3.288461e-02 CVD not present 200
## 314 3.370821e-02 CVD not present 201
## 153 3.381861e-02 CVD present 202
## 281 3.396345e-02 CVD not present 203
## 333 3.396623e-02 CVD not present 204
## 500 3.408916e-02 CVD not present 205
## 249 3.427827e-02 CVD not present 206
## 13 3.437051e-02 CVD not present 207
## 110 3.444140e-02 CVD not present 208
## 364 3.451841e-02 CVD not present 209
## 396 3.464478e-02 CVD not present 210
## 224 3.465793e-02 CVD not present 211
## 219 3.467907e-02 CVD not present 212
## 141 3.472666e-02 CVD not present 213
## 408 3.473585e-02 CVD not present 214
## 451 3.537703e-02 CVD not present 215
## 166 3.543741e-02 CVD not present 216
## 10 3.554029e-02 CVD not present 217
## 72 3.579182e-02 CVD not present 218
## 173 3.597849e-02 CVD not present 219
## 435 3.597861e-02 CVD not present 220
## 45 3.604945e-02 CVD not present 221
## 467 3.610420e-02 CVD present 222
## 464 3.635738e-02 CVD present 223
## 469 3.642409e-02 CVD not present 224
## 206 3.652338e-02 CVD not present 225
## 216 3.658513e-02 CVD not present 226
## 479 3.663013e-02 CVD not present 227
## 26 3.666283e-02 CVD not present 228
## 478 3.677071e-02 CVD not present 229
## 332 3.701063e-02 CVD not present 230
## 202 3.710771e-02 CVD not present 231
## 302 3.779995e-02 CVD not present 232
## 181 3.792862e-02 CVD not present 233
## 242 3.803694e-02 CVD not present 234
## 57 3.806221e-02 CVD not present 235
## 374 3.809277e-02 CVD not present 236
## 494 3.816816e-02 CVD not present 237
## 37 3.825395e-02 CVD not present 238
## 228 3.833366e-02 CVD present 239
## 194 3.849792e-02 CVD not present 240
## 324 3.902946e-02 CVD not present 241
## 267 3.979633e-02 CVD not present 242
## 195 3.991954e-02 CVD not present 243
## 239 4.015350e-02 CVD not present 244
## 236 4.030194e-02 CVD not present 245
## 156 4.030551e-02 CVD not present 246
## 417 4.046112e-02 CVD not present 247
## 234 4.050542e-02 CVD not present 248
## 366 4.074006e-02 CVD not present 249
## 354 4.110447e-02 CVD not present 250
## 418 4.123828e-02 CVD not present 251
## 316 4.127909e-02 CVD not present 252
## 482 4.136955e-02 CVD not present 253
## 446 4.138531e-02 CVD not present 254
## 405 4.233759e-02 CVD not present 255
## 427 4.241439e-02 CVD not present 256
## 353 4.261096e-02 CVD not present 257
## 25 4.269553e-02 CVD not present 258
## 450 4.281023e-02 CVD not present 259
## 226 4.361554e-02 CVD present 260
## 180 4.366741e-02 CVD not present 261
## 60 4.438952e-02 CVD not present 262
## 277 4.489913e-02 CVD not present 263
## 484 4.495033e-02 CVD not present 264
## 80 4.527007e-02 CVD not present 265
## 369 4.553691e-02 CVD not present 266
## 207 4.574428e-02 CVD not present 267
## 437 4.574582e-02 CVD not present 268
## 383 4.579240e-02 CVD not present 269
## 323 4.601383e-02 CVD not present 270
## 167 4.620381e-02 CVD not present 271
## 232 4.642670e-02 CVD not present 272
## 107 4.668131e-02 CVD not present 273
## 328 4.671578e-02 CVD not present 274
## 470 4.771483e-02 CVD not present 275
## 320 4.788784e-02 CVD not present 276
## 416 4.837369e-02 CVD not present 277
## 424 4.843620e-02 CVD present 278
## 101 4.865116e-02 CVD not present 279
## 386 4.891681e-02 CVD not present 280
## 456 4.942431e-02 CVD not present 281
## 438 5.024092e-02 CVD not present 282
## 170 5.026949e-02 CVD not present 283
## 182 5.027734e-02 CVD not present 284
## 54 5.051844e-02 CVD not present 285
## 309 5.063743e-02 CVD not present 286
## 497 5.091204e-02 CVD not present 287
## 465 5.127146e-02 CVD not present 288
## 143 5.136458e-02 CVD not present 289
## 492 5.155601e-02 CVD not present 290
## 51 5.245030e-02 CVD not present 291
## 104 5.259060e-02 CVD not present 292
## 155 5.319484e-02 CVD not present 293
## 16 5.332153e-02 CVD not present 294
## 39 5.393086e-02 CVD not present 295
## 165 5.407025e-02 CVD not present 296
## 241 5.411214e-02 CVD not present 297
## 468 5.419719e-02 CVD not present 298
## 483 5.551581e-02 CVD not present 299
## 476 5.559698e-02 CVD not present 300
## 390 5.580691e-02 CVD not present 301
## 295 5.585782e-02 CVD not present 302
## 258 5.600557e-02 CVD not present 303
## 373 5.620260e-02 CVD not present 304
## 100 5.636089e-02 CVD not present 305
## 136 5.701705e-02 CVD not present 306
## 131 5.747550e-02 CVD not present 307
## 172 5.755445e-02 CVD not present 308
## 447 5.797148e-02 CVD not present 309
## 61 5.798308e-02 CVD not present 310
## 485 5.828731e-02 CVD not present 311
## 9 5.875224e-02 CVD not present 312
## 38 5.960919e-02 CVD not present 313
## 381 5.991953e-02 CVD not present 314
## 83 6.131303e-02 CVD not present 315
## 495 6.132759e-02 CVD not present 316
## 65 6.136776e-02 CVD not present 317
## 349 6.156749e-02 CVD not present 318
## 286 6.186687e-02 CVD not present 319
## 15 6.232005e-02 CVD not present 320
## 135 6.232889e-02 CVD not present 321
## 319 6.258874e-02 CVD not present 322
## 489 6.345078e-02 CVD not present 323
## 342 6.400981e-02 CVD not present 324
## 229 6.496847e-02 CVD not present 325
## 105 6.702766e-02 CVD not present 326
## 94 6.709596e-02 CVD not present 327
## 337 6.724953e-02 CVD not present 328
## 287 6.810877e-02 CVD present 329
## 457 6.866776e-02 CVD not present 330
## 393 6.925821e-02 CVD present 331
## 303 6.948821e-02 CVD not present 332
## 431 6.968272e-02 CVD not present 333
## 452 6.993476e-02 CVD not present 334
## 27 7.006393e-02 CVD not present 335
## 89 7.012984e-02 CVD not present 336
## 329 7.015254e-02 CVD not present 337
## 252 7.126240e-02 CVD not present 338
## 35 7.263846e-02 CVD not present 339
## 257 7.271044e-02 CVD not present 340
## 146 7.342412e-02 CVD not present 341
## 279 7.395537e-02 CVD not present 342
## 125 7.449469e-02 CVD not present 343
## 133 7.522775e-02 CVD not present 344
## 47 7.569326e-02 CVD not present 345
## 33 7.620700e-02 CVD not present 346
## 168 7.881977e-02 CVD present 347
## 111 7.990713e-02 CVD not present 348
## 196 8.034969e-02 CVD not present 349
## 371 8.039220e-02 CVD not present 350
## 473 8.087123e-02 CVD not present 351
## 88 8.183412e-02 CVD not present 352
## 426 8.184388e-02 CVD not present 353
## 273 8.216459e-02 CVD not present 354
## 352 8.259951e-02 CVD not present 355
## 85 8.297434e-02 CVD not present 356
## 52 8.297785e-02 CVD present 357
## 398 8.335387e-02 CVD not present 358
## 466 8.593426e-02 CVD not present 359
## 336 8.616408e-02 CVD not present 360
## 199 8.785986e-02 CVD not present 361
## 145 8.813768e-02 CVD not present 362
## 32 8.850941e-02 CVD not present 363
## 440 8.858940e-02 CVD not present 364
## 301 8.918319e-02 CVD not present 365
## 190 9.084941e-02 CVD present 366
## 123 9.139330e-02 CVD not present 367
## 227 9.152216e-02 CVD not present 368
## 363 9.341712e-02 CVD not present 369
## 402 9.349731e-02 CVD not present 370
## 238 9.436740e-02 CVD not present 371
## 322 9.448547e-02 CVD present 372
## 442 9.529642e-02 CVD not present 373
## 1 9.530183e-02 CVD not present 374
## 36 9.633076e-02 CVD not present 375
## 46 9.644320e-02 CVD not present 376
## 99 9.871079e-02 CVD present 377
## 164 9.923685e-02 CVD not present 378
## 304 9.937907e-02 CVD not present 379
## 382 1.009113e-01 CVD not present 380
## 384 1.012894e-01 CVD not present 381
## 134 1.023927e-01 CVD not present 382
## 379 1.025681e-01 CVD not present 383
## 368 1.034047e-01 CVD not present 384
## 150 1.074194e-01 CVD not present 385
## 415 1.077120e-01 CVD not present 386
## 300 1.098628e-01 CVD not present 387
## 87 1.102313e-01 CVD present 388
## 8 1.105417e-01 CVD not present 389
## 338 1.114595e-01 CVD not present 390
## 132 1.119637e-01 CVD not present 391
## 18 1.122161e-01 CVD not present 392
## 127 1.123633e-01 CVD not present 393
## 422 1.128364e-01 CVD not present 394
## 370 1.133591e-01 CVD not present 395
## 429 1.138507e-01 CVD not present 396
## 340 1.143337e-01 CVD not present 397
## 24 1.148008e-01 CVD not present 398
## 188 1.170546e-01 CVD not present 399
## 318 1.192361e-01 CVD not present 400
## 350 1.196255e-01 CVD not present 401
## 266 1.199864e-01 CVD not present 402
## 409 1.201556e-01 CVD not present 403
## 486 1.231718e-01 CVD not present 404
## 474 1.233265e-01 CVD not present 405
## 385 1.233689e-01 CVD present 406
## 58 1.236641e-01 CVD not present 407
## 310 1.237278e-01 CVD not present 408
## 254 1.263121e-01 CVD not present 409
## 185 1.268417e-01 CVD not present 410
## 198 1.320112e-01 CVD not present 411
## 186 1.337822e-01 CVD present 412
## 149 1.351396e-01 CVD not present 413
## 463 1.377820e-01 CVD not present 414
## 248 1.386615e-01 CVD not present 415
## 12 1.401478e-01 CVD not present 416
## 41 1.415911e-01 CVD not present 417
## 122 1.418711e-01 CVD not present 418
## 339 1.446978e-01 CVD not present 419
## 256 1.450928e-01 CVD not present 420
## 330 1.452912e-01 CVD not present 421
## 260 1.459669e-01 CVD present 422
## 317 1.514495e-01 CVD not present 423
## 454 1.514667e-01 CVD present 424
## 357 1.527789e-01 CVD not present 425
## 75 1.558278e-01 CVD not present 426
## 7 1.581158e-01 CVD not present 427
## 183 1.592955e-01 CVD not present 428
## 490 1.593529e-01 CVD not present 429
## 50 1.604392e-01 CVD not present 430
## 443 1.616229e-01 CVD not present 431
## 162 1.652444e-01 CVD not present 432
## 403 1.668787e-01 CVD not present 433
## 425 1.679628e-01 CVD not present 434
## 294 1.706276e-01 CVD present 435
## 159 1.716440e-01 CVD not present 436
## 327 1.722274e-01 CVD not present 437
## 306 1.728609e-01 CVD not present 438
## 331 1.736515e-01 CVD not present 439
## 321 1.756876e-01 CVD present 440
## 441 1.765091e-01 CVD not present 441
## 5 1.771968e-01 CVD not present 442
## 399 1.796670e-01 CVD not present 443
## 20 1.810634e-01 CVD present 444
## 121 1.811259e-01 CVD not present 445
## 30 1.824388e-01 CVD not present 446
## 91 1.846595e-01 CVD not present 447
## 64 1.846691e-01 CVD not present 448
## 120 1.856395e-01 CVD present 449
## 251 1.904199e-01 CVD not present 450
## 177 1.945554e-01 CVD not present 451
## 244 1.985714e-01 CVD not present 452
## 223 1.988602e-01 CVD not present 453
## 220 2.007133e-01 CVD not present 454
## 221 2.014825e-01 CVD present 455
## 147 2.056699e-01 CVD not present 456
## 472 2.080110e-01 CVD present 457
## 268 2.082859e-01 CVD not present 458
## 212 2.110547e-01 CVD present 459
## 42 2.167397e-01 CVD not present 460
## 308 2.197568e-01 CVD not present 461
## 347 2.249088e-01 CVD present 462
## 59 2.255067e-01 CVD not present 463
## 21 2.295599e-01 CVD present 464
## 124 2.307909e-01 CVD not present 465
## 63 2.350661e-01 CVD not present 466
## 70 2.422824e-01 CVD not present 467
## 200 2.472681e-01 CVD not present 468
## 92 2.515550e-01 CVD not present 469
## 461 2.531873e-01 CVD not present 470
## 255 2.631905e-01 CVD present 471
## 414 2.679604e-01 CVD not present 472
## 312 2.701094e-01 CVD not present 473
## 376 2.781145e-01 CVD not present 474
## 334 2.872135e-01 CVD present 475
## 477 2.981657e-01 CVD not present 476
## 215 2.987071e-01 CVD present 477
## 283 3.025489e-01 CVD present 478
## 400 3.103002e-01 CVD present 479
## 197 3.149022e-01 CVD not present 480
## 487 3.249909e-01 CVD present 481
## 432 3.276797e-01 CVD not present 482
## 270 3.620430e-01 CVD present 483
## 214 3.650280e-01 CVD not present 484
## 209 3.756050e-01 CVD present 485
## 158 3.851480e-01 CVD not present 486
## 6 3.868460e-01 CVD not present 487
## 218 3.886224e-01 CVD not present 488
## 264 3.887433e-01 CVD not present 489
## 40 3.996418e-01 CVD not present 490
## 269 4.097495e-01 CVD not present 491
## 208 4.646498e-01 CVD not present 492
## 93 4.966941e-01 CVD present 493
## 23 5.006878e-01 CVD not present 494
## 246 5.205048e-01 CVD not present 495
## 278 5.662294e-01 CVD not present 496
## 406 5.841941e-01 CVD present 497
## 225 6.347600e-01 CVD present 498
## 307 6.674028e-01 CVD not present 499
## 341 7.135157e-01 CVD present 500
ggplot(data=predicted.data, aes(x=rank, y=probability.of.CVD)) +
geom_point(aes(color=CVD), alpha=1, shape=4, stroke=2) +
xlab("Index") +
ylab("Predicted probability of getting Cardiovascular disease")
This S-shaped curve shows how well the model separates the two groups: most individuals without CVD sit on the left, at low predicted probabilities, while individuals with CVD are concentrated toward the right, at higher predicted probabilities. The model therefore ranks individuals reasonably well, although the two groups overlap, which is expected because no model predicts perfectly.
To see how well the model classifies individuals as CVD present or CVD not present, I built a confusion matrix using a 0.5 probability cutoff:
CVD_data$CVD_num <- ifelse(CVD_data$CVD == "CVD present", 1, 0)
predicted.probs <- logistic_result2$fitted.values
predicted.classes <- ifelse(predicted.probs > 0.5, 1, 0)
confusion_matrix_data <- table(
Predicted = factor(predicted.classes, levels = c(0, 1)),
Actual = factor(CVD_data$CVD_num, levels = c(0, 1))
)
confusion_matrix_data
## Actual
## Predicted 0 1
## 0 456 37
## 1 4 3
456 individuals did not have CVD and the model correctly predicted that they did not have CVD = True Negative
37 individuals actually had CVD but the model predicted that they did not have CVD = False Negative
4 individuals did not have CVD but the model predicted that they did have CVD = False Positive
3 individuals actually had CVD and the model correctly predicted that they have CVD = True Positive
To calculate the performance metrics of the model, we plug in the true positives and negatives and the false positives and negatives:
# Rows of the confusion matrix are predictions, columns are actual values
TP <- 3   # predicted 1, actual 1
TN <- 456 # predicted 0, actual 0
FP <- 4   # predicted 1, actual 0
FN <- 37  # predicted 0, actual 1
accuracy <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
precision <- TP / (TP + FP)
f1_score <- 2 * (precision * sensitivity) / (precision + sensitivity)
cat("Accuracy: ", round(accuracy, 4), "\n")
## Accuracy: 0.918
cat("Sensitivity: ", round(sensitivity, 4), "\n")
## Sensitivity: 0.075
cat("Specificity: ", round(specificity, 4), "\n")
## Specificity: 0.9913
cat("Precision: ", round(precision, 4), "\n")
## Precision: 0.4286
cat("F1 Score: ", round(f1_score, 4), "\n")
## F1 Score: 0.1277
The performance metrics above show that:
Accuracy: The model classified 91.8% of individuals correctly. This is high mostly because 460 of the 500 individuals do not have CVD.
Sensitivity: Of the 40 individuals who actually had CVD, the model caught only 3, or 7.5%.
Specificity: Of the 460 individuals who did not have CVD, the model correctly identified approximately 99.1%.
Precision: Of the 7 individuals the model predicted to have CVD, 42.86% actually had it.
F1 Score: Approximately 12.8% shows that the model did not do a good job at identifying true positives at the 0.5 cutoff, mainly because it produced many false negatives.
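The four counts can also be read off the table by dimension name instead of being typed by hand, which makes it harder to mix up the two off-diagonal cells. The following is a minimal sketch, assuming the table has rows named Predicted and columns named Actual as printed earlier; the object `cm` is a stand-in that rebuilds the table from the printed counts so the block runs by itself.

```r
# Rebuild the printed confusion matrix (rows = Predicted, columns = Actual).
# `cm` is a hypothetical stand-in for confusion_matrix_data.
cm <- as.table(matrix(
  c(456, 4, 37, 3), nrow = 2,   # filled column by column
  dimnames = list(Predicted = c("0", "1"), Actual = c("0", "1"))
))

# Index cells as cm[predicted, actual].
TN <- as.numeric(cm["0", "0"])
FN <- as.numeric(cm["0", "1"])  # predicted 0, actually 1
FP <- as.numeric(cm["1", "0"])  # predicted 1, actually 0
TP <- as.numeric(cm["1", "1"])

metrics <- c(
  accuracy    = (TP + TN) / sum(cm),
  sensitivity = TP / (TP + FN),
  specificity = TN / (TN + FP),
  precision   = TP / (TP + FP)
)
# F1 is the harmonic mean of precision and sensitivity (recall).
metrics["f1"] <- unname(
  2 * metrics["precision"] * metrics["sensitivity"] /
    (metrics["precision"] + metrics["sensitivity"])
)
print(round(metrics, 4))
```

In the analysis itself, `cm` could simply be replaced by `confusion_matrix_data`, so the metrics update automatically if the cutoff or the model changes.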
roc_results <- roc(response = CVD_data$CVD,
predictor = logistic_result2$fitted.values,
levels = c("CVD not present", "CVD present"),
direction = "<")
auc_values_results <- auc(roc_results); auc_values_results
## Area under the curve: 0.8192
plot.roc(roc_results, print.auc = TRUE, legacy.axes = TRUE,
xlab = "False Positive Rate (1 - Specificity)",
ylab = "True Positive Rate (Sensitivity)")
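The AUC of 0.819 means there is about an 81.9% chance that the model assigns a higher predicted probability to a randomly chosen individual with CVD than to a randomly chosen individual without it, so the model is fairly good at differentiating between the two groups.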
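One way to read an AUC is as the share of (CVD present, CVD not present) pairs in which the individual with CVD receives the higher predicted probability, with ties counted as half. The sketch below works this out in base R on made-up scores; `scores_pos` and `scores_neg` are hypothetical values for illustration, not numbers from this dataset.

```r
# Hypothetical predicted probabilities (illustration only)
scores_pos <- c(0.71, 0.50, 0.30, 0.12)        # individuals with CVD
scores_neg <- c(0.67, 0.20, 0.10, 0.05, 0.02)  # individuals without CVD

# Difference for every (positive, negative) pair: a 4 x 5 matrix
pair_diff <- outer(scores_pos, scores_neg, "-")

# A pair counts 1 if the positive case scores higher, 0.5 if tied, 0 otherwise
auc_by_hand <- mean((pair_diff > 0) + 0.5 * (pair_diff == 0))
print(auc_by_hand)
```

Here 16 of the 20 pairs are ordered correctly, so the hand-computed AUC is 0.8. The same pairwise logic, applied to all individuals in the sample, is what the reported area under the ROC curve summarizes.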
In conclusion, the model does a fairly good job of ranking individuals by their risk of CVD (AUC = 0.819), but the predictors I chose were not strong enough to reliably classify individual cases. This is why the F1 score, which combines precision and recall, was low. Initially, when I modeled CVD on gender alone, the R-squared was approximately 0.188. Once I added more predictors, the adjusted R-squared came to 1.460094e-09, which suggests that by this measure they did not improve the model. The AIC, however, dropped from 269.07 to 240.24, which shows that, aside from gender, some of the other predictors were significant and slightly improved the model.
In the future, it would be better to choose variables that are more strongly associated with CVD in order to build a better model.