The dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. All patients are women of Pima Indian heritage (a Native American group). The dependent variable is the outcome (whether a person has diabetes). The independent variables believed to influence it are pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, the diabetes pedigree function, and age.
A brief explanation of each variable is as follows:
Pregnancies: The number of times a patient has been pregnant. Higher pregnancy counts may increase the risk of diabetes due to hormonal changes and metabolic stress. The placenta produces hormones that make it harder for insulin to regulate glucose in the body.
Glucose: Higher glucose levels are directly associated with a greater likelihood of having diabetes.
Blood Pressure: Higher blood pressure may increase the chance of having diabetes because they share similar health risks.
Skin Thickness: Triceps skinfold thickness, used as a proxy for body fat. More body fat can raise the risk of diabetes.
Insulin: Shows how well the body handles blood sugar. Unusual insulin levels may mean the body is not using sugar properly, which can lead to diabetes.
BMI (Body Mass Index): Estimates body fat based on height and weight. A higher BMI often means a higher risk of diabetes.
Diabetes Pedigree Function: Shows the family history of diabetes. A higher value means a stronger inherited risk.
Age: Older people are more likely to develop diabetes.
The objective of this project is to gain a clear understanding of the factors associated with diabetes among Pima Indian women by cleaning, analyzing, and visualizing the dataset to reveal patterns and insights.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.3
## Warning: package 'tidyr' was built under R version 4.4.3
## Warning: package 'purrr' was built under R version 4.4.3
## Warning: package 'forcats' was built under R version 4.4.3
## Warning: package 'lubridate' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ purrr::%||%() masks base::%||%()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.4.3
## corrplot 0.95 loaded
diabetes <- read.csv("C:\\Users\\user\\Desktop\\Saheed\\Diabetes\\diabetes.csv")
View(diabetes)
head(diabetes) # check the first few rows of the data
## Pregnancies Glucose BloodPressure SkinThickness Insulin BMI
## 1 6 148 72 35 0 33.6
## 2 1 85 66 29 0 26.6
## 3 8 183 64 0 0 23.3
## 4 1 89 66 23 94 28.1
## 5 0 137 40 35 168 43.1
## 6 5 116 74 0 0 25.6
## DiabetesPedigreeFunction Age Outcome
## 1 0.627 50 1
## 2 0.351 31 0
## 3 0.672 32 1
## 4 0.167 21 0
## 5 2.288 33 1
## 6 0.201 30 0
Checking for missing values
sapply(diabetes, function(x) sum(is.na(x))) # count NAs column by column
## Pregnancies Glucose BloodPressure
## 0 0 0
## SkinThickness Insulin BMI
## 0 0 0
## DiabetesPedigreeFunction Age Outcome
## 0 0 0
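The NA check finds nothing, yet the `head()` output already shows zeros in SkinThickness and Insulin, which are not plausible measurements. One possible reading, and it is an assumption of this sketch rather than something the dataset documents, is that such zeros mark unrecorded values. The toy data frame and the choice of columns below are illustrative only:

```r
# Toy data frame standing in for the real dataset (values are made up).
toy <- data.frame(
  Glucose = c(148, 0, 183),
  BMI     = c(33.6, 26.6, 0),
  Outcome = c(1, 0, 1)
)

# Columns where 0 is treated as "not recorded" (an assumption of this sketch).
zero_as_missing <- c("Glucose", "BMI")

# Replace 0 with NA in those columns only; Outcome keeps its legitimate zeros.
toy[zero_as_missing] <- lapply(
  toy[zero_as_missing],
  function(x) replace(x, x == 0, NA)
)

# Count missing values per column, one column at a time.
na_counts <- sapply(toy, function(x) sum(is.na(x)))
print(na_counts)
```

Once zeros are recoded this way, the same `sapply()` check reports them as missing, so they can be excluded or imputed deliberately instead of being binned as real measurements.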
dim(diabetes) # number of rows (observations) and columns (variables)
## [1] 768 9
summary(diabetes) #to describe my data
## Pregnancies Glucose BloodPressure SkinThickness
## Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.000 1st Qu.: 99.0 1st Qu.: 62.00 1st Qu.: 0.00
## Median : 3.000 Median :117.0 Median : 72.00 Median :23.00
## Mean : 3.845 Mean :120.9 Mean : 69.11 Mean :20.54
## 3rd Qu.: 6.000 3rd Qu.:140.2 3rd Qu.: 80.00 3rd Qu.:32.00
## Max. :17.000 Max. :199.0 Max. :122.00 Max. :99.00
## Insulin BMI DiabetesPedigreeFunction Age
## Min. : 0.0 Min. : 0.00 Min. :0.0780 Min. :21.00
## 1st Qu.: 0.0 1st Qu.:27.30 1st Qu.:0.2437 1st Qu.:24.00
## Median : 30.5 Median :32.00 Median :0.3725 Median :29.00
## Mean : 79.8 Mean :31.99 Mean :0.4719 Mean :33.24
## 3rd Qu.:127.2 3rd Qu.:36.60 3rd Qu.:0.6262 3rd Qu.:41.00
## Max. :846.0 Max. :67.10 Max. :2.4200 Max. :81.00
## Outcome
## Min. :0.000
## 1st Qu.:0.000
## Median :0.000
## Mean :0.349
## 3rd Qu.:1.000
## Max. :1.000
str(diabetes) #Used to check the structure of the dataset
## 'data.frame': 768 obs. of 9 variables:
## $ Pregnancies : int 6 1 8 1 0 5 3 10 2 8 ...
## $ Glucose : int 148 85 183 89 137 116 78 115 197 125 ...
## $ BloodPressure : int 72 66 64 66 40 74 50 0 70 96 ...
## $ SkinThickness : int 35 29 0 23 35 0 32 0 45 0 ...
## $ Insulin : int 0 0 0 94 168 0 88 0 543 0 ...
## $ BMI : num 33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 0 ...
## $ DiabetesPedigreeFunction: num 0.627 0.351 0.672 0.167 2.288 ...
## $ Age : int 50 31 32 21 33 30 26 29 53 54 ...
## $ Outcome : int 1 0 1 0 1 0 1 0 1 1 ...
#Converting integers to numeric
diabetes$Pregnancies <- as.numeric(diabetes$Pregnancies)
diabetes$Glucose <- as.numeric(diabetes$Glucose)
diabetes$BloodPressure <- as.numeric(diabetes$BloodPressure)
diabetes$SkinThickness <- as.numeric(diabetes$SkinThickness)
diabetes$Insulin <- as.numeric(diabetes$Insulin)
diabetes$Age <- as.numeric(diabetes$Age)
diabetes$Outcome <- as.numeric(diabetes$Outcome)
diabetes$BMI <- as.numeric(diabetes$BMI)
str(diabetes)
## 'data.frame': 768 obs. of 9 variables:
## $ Pregnancies : num 6 1 8 1 0 5 3 10 2 8 ...
## $ Glucose : num 148 85 183 89 137 116 78 115 197 125 ...
## $ BloodPressure : num 72 66 64 66 40 74 50 0 70 96 ...
## $ SkinThickness : num 35 29 0 23 35 0 32 0 45 0 ...
## $ Insulin : num 0 0 0 94 168 0 88 0 543 0 ...
## $ BMI : num 33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 0 ...
## $ DiabetesPedigreeFunction: num 0.627 0.351 0.672 0.167 2.288 ...
## $ Age : num 50 31 32 21 33 30 26 29 53 54 ...
## $ Outcome : num 1 0 1 0 1 0 1 0 1 1 ...
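The column-by-column conversion can also be written as a single step over every integer column. A minimal sketch on a made-up data frame (the name `toy` and its values are illustrative, not taken from the dataset):

```r
# Small stand-in data frame with a mix of integer and double columns.
toy <- data.frame(
  Pregnancies = c(6L, 1L, 8L),
  Age         = c(50L, 31L, 32L),
  BMI         = c(33.6, 26.6, 23.3)
)

# Find the integer columns, then convert only those to numeric (double).
int_cols <- vapply(toy, is.integer, logical(1))
toy[int_cols] <- lapply(toy[int_cols], as.numeric)

str(toy)
```

This keeps the conversion in one place, so a column added later is picked up without another assignment line.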
Checking the count of each value
# Count how often each distinct value occurs in every variable
table(diabetes$Outcome)
##
## 0 1
## 500 268
table(diabetes$Pregnancies)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17
## 111 135 103 75 68 57 50 45 38 28 24 11 9 10 2 1 1
table(diabetes$Glucose)
##
## 0 44 56 57 61 62 65 67 68 71 72 73 74 75 76 77 78 79 80 81
## 5 1 1 2 1 1 1 1 3 4 1 3 4 2 2 2 4 3 6 6
## 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101
## 3 6 10 7 3 7 9 6 11 9 9 7 7 13 8 9 3 17 17 9
## 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121
## 13 9 6 13 14 11 13 12 6 14 13 5 11 10 7 11 6 11 11 6
## 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141
## 12 9 11 14 9 5 11 14 7 5 5 5 6 4 8 8 5 8 5 5
## 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161
## 5 6 7 5 9 7 4 1 3 6 4 2 6 5 3 2 8 2 1 3
## 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181
## 6 3 3 4 3 3 4 1 2 3 1 6 2 2 2 1 1 5 5 5
## 182 183 184 186 187 188 189 190 191 193 194 195 196 197 198 199
## 1 3 3 1 4 2 4 1 1 2 3 2 3 4 1 1
table(diabetes$BloodPressure)
##
## 0 24 30 38 40 44 46 48 50 52 54 55 56 58 60 61 62 64 65 66
## 35 1 2 1 1 4 2 5 13 11 11 2 12 21 37 1 34 43 7 30
## 68 70 72 74 75 76 78 80 82 84 85 86 88 90 92 94 95 96 98 100
## 45 57 44 52 8 39 45 40 30 23 6 21 25 22 8 6 1 4 3 3
## 102 104 106 108 110 114 122
## 1 2 3 2 3 1 1
table(diabetes$SkinThickness)
##
## 0 7 8 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 227 2 2 5 6 7 11 6 14 6 14 20 18 13 10 16 22 12 16 16
## 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## 23 20 17 27 19 31 20 8 15 14 16 7 18 16 15 11 6 5 6 8
## 47 48 49 50 51 52 54 56 60 63 99
## 4 4 3 3 1 2 2 1 1 1 1
table(diabetes$Insulin)
##
## 0 14 15 16 18 22 23 25 29 32 36 37 38 40 41 42 43 44 45 46
## 374 1 1 1 2 1 2 1 1 1 3 2 1 2 1 1 1 3 3 1
## 48 49 50 51 52 53 54 55 56 57 58 59 60 61 63 64 65 66 67 68
## 3 5 3 1 1 2 4 2 5 2 2 1 2 1 3 4 1 5 2 1
## 70 71 72 73 74 75 76 77 78 79 81 82 83 84 85 86 87 88 89 90
## 3 4 1 1 3 3 5 2 2 2 1 3 3 1 2 1 2 4 1 4
## 91 92 94 95 96 99 100 105 106 108 110 112 114 115 116 119 120 122 125 126
## 1 3 7 2 2 2 7 11 3 1 6 1 2 6 2 1 8 2 4 3
## 127 128 129 130 132 135 140 142 144 145 146 148 150 152 155 156 158 159 160 165
## 1 1 1 9 2 6 9 1 2 3 1 2 2 2 4 3 2 1 4 4
## 166 167 168 170 171 175 176 178 180 182 183 184 185 188 190 191 192 193 194 196
## 1 2 4 2 1 3 3 1 7 3 1 1 2 1 4 1 2 1 3 1
## 200 204 205 207 210 215 220 225 228 230 231 235 237 240 245 249 250 255 258 265
## 4 1 2 2 5 3 2 2 1 2 2 1 1 2 1 1 1 1 1 2
## 270 271 272 274 275 277 278 280 284 285 291 293 300 304 310 318 321 325 326 328
## 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 3 1 1
## 330 335 342 360 370 375 387 392 402 415 440 465 474 478 480 485 495 510 540 543
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1
## 545 579 600 680 744 846
## 1 1 1 1 1 1
table(diabetes$Age)
##
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## 63 72 38 46 48 33 32 35 29 21 24 16 17 14 10 16 19 16 12 13 22 18 13 8 15 13
## 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 72 81
## 6 5 5 8 8 8 5 6 4 3 5 7 3 5 2 4 4 1 3 4 3 1 2 1 1 1
table(diabetes$DiabetesPedigreeFunction)
##
## 0.078 0.084 0.085 0.088 0.089 0.092 0.096 0.1 0.101 0.102 0.107 0.108 0.115
## 1 1 2 2 1 1 1 1 1 1 1 1 1
## 0.118 0.121 0.122 0.123 0.126 0.127 0.128 0.129 0.13 0.133 0.134 0.135 0.136
## 1 2 1 1 2 2 2 2 1 1 2 1 1
## 0.137 0.138 0.14 0.141 0.142 0.143 0.144 0.145 0.147 0.148 0.149 0.15 0.151
## 2 1 2 3 3 2 1 1 1 3 1 2 3
## 0.153 0.154 0.155 0.156 0.157 0.158 0.159 0.16 0.161 0.162 0.163 0.164 0.165
## 2 1 1 1 1 2 2 1 2 1 1 2 3
## 0.166 0.167 0.17 0.171 0.173 0.174 0.175 0.176 0.177 0.178 0.179 0.18 0.181
## 1 4 1 1 1 1 1 1 1 3 1 2 1
## 0.182 0.183 0.186 0.187 0.188 0.189 0.19 0.191 0.192 0.194 0.196 0.197 0.198
## 1 2 1 3 1 2 4 2 2 1 1 4 2
## 0.199 0.2 0.201 0.203 0.204 0.205 0.206 0.207 0.209 0.21 0.212 0.215 0.217
## 1 3 1 2 2 3 2 5 2 1 2 1 1
## 0.218 0.219 0.22 0.221 0.222 0.223 0.225 0.226 0.227 0.229 0.23 0.231 0.232
## 2 2 1 1 1 2 1 1 1 1 2 2 1
## 0.233 0.234 0.235 0.236 0.237 0.238 0.239 0.24 0.241 0.243 0.244 0.245 0.246
## 2 2 3 3 4 5 1 2 1 1 2 4 1
## 0.247 0.248 0.249 0.251 0.252 0.253 0.254 0.255 0.256 0.257 0.258 0.259 0.26
## 2 2 2 2 2 1 6 1 3 3 6 5 4
## 0.261 0.262 0.263 0.264 0.265 0.267 0.268 0.269 0.27 0.271 0.272 0.277 0.278
## 5 2 4 1 1 2 5 2 4 1 1 1 2
## 0.279 0.28 0.282 0.283 0.284 0.285 0.286 0.287 0.289 0.29 0.292 0.293 0.294
## 1 3 2 1 4 2 2 1 2 2 3 2 3
## 0.295 0.296 0.297 0.299 0.3 0.302 0.303 0.304 0.305 0.306 0.307 0.313 0.314
## 1 1 1 4 1 2 1 4 3 2 1 2 2
## 0.315 0.317 0.318 0.319 0.323 0.324 0.325 0.326 0.328 0.329 0.33 0.331 0.332
## 2 1 1 1 2 2 1 2 2 1 1 1 1
## 0.334 0.335 0.336 0.337 0.338 0.34 0.341 0.342 0.343 0.344 0.345 0.346 0.347
## 2 1 2 3 1 3 1 2 2 2 1 1 1
## 0.349 0.351 0.352 0.355 0.356 0.358 0.361 0.362 0.364 0.365 0.366 0.368 0.37
## 3 1 1 1 2 1 2 1 2 2 1 2 2
## 0.371 0.374 0.375 0.376 0.378 0.38 0.381 0.382 0.383 0.385 0.388 0.389 0.391
## 1 1 1 1 2 2 1 1 1 1 1 3 3
## 0.393 0.394 0.395 0.396 0.398 0.399 0.4 0.401 0.402 0.403 0.404 0.407 0.408
## 1 1 1 1 1 1 2 1 2 2 1 2 1
## 0.409 0.411 0.412 0.415 0.416 0.417 0.419 0.42 0.421 0.422 0.423 0.426 0.427
## 1 1 2 2 1 1 1 1 1 3 1 1 1
## 0.43 0.431 0.432 0.433 0.434 0.435 0.439 0.441 0.443 0.444 0.446 0.447 0.451
## 2 1 1 2 2 1 2 1 3 2 1 1 1
## 0.452 0.453 0.454 0.455 0.457 0.46 0.463 0.464 0.465 0.466 0.467 0.471 0.472
## 3 1 1 2 1 1 1 1 1 2 1 2 1
## 0.479 0.482 0.483 0.484 0.485 0.487 0.488 0.491 0.493 0.495 0.496 0.497 0.498
## 1 1 1 1 1 1 1 1 1 1 3 2 1
## 0.499 0.501 0.502 0.503 0.507 0.509 0.51 0.512 0.514 0.515 0.516 0.52 0.525
## 1 1 1 1 1 1 1 1 2 1 1 3 1
## 0.526 0.527 0.528 0.529 0.532 0.534 0.536 0.537 0.539 0.542 0.543 0.545 0.546
## 1 2 2 1 1 1 2 1 1 2 2 1 1
## 0.547 0.549 0.551 0.554 0.557 0.559 0.56 0.561 0.564 0.565 0.569 0.571 0.572
## 1 1 4 1 2 2 1 1 1 1 1 1 1
## 0.575 0.578 0.58 0.582 0.583 0.586 0.587 0.588 0.591 0.593 0.595 0.597 0.598
## 1 1 1 2 3 2 3 1 2 1 1 1 1
## 0.6 0.601 0.605 0.607 0.61 0.612 0.613 0.614 0.615 0.619 0.624 0.626 0.627
## 2 1 2 1 1 1 1 1 1 1 1 1 1
## 0.629 0.63 0.631 0.637 0.64 0.645 0.646 0.647 0.649 0.652 0.654 0.655 0.658
## 1 1 1 2 2 1 1 2 1 2 2 1 1
## 0.66 0.661 0.665 0.666 0.672 0.673 0.674 0.677 0.678 0.68 0.682 0.686 0.687
## 2 1 1 1 1 1 2 1 2 1 1 2 4
## 0.692 0.693 0.695 0.696 0.698 0.699 0.702 0.703 0.704 0.705 0.709 0.711 0.717
## 4 1 1 1 1 1 1 1 1 1 1 1 1
## 0.718 0.719 0.721 0.722 0.725 0.727 0.73 0.731 0.732 0.733 0.734 0.735 0.738
## 1 1 1 1 1 2 1 1 1 2 1 1 1
## 0.741 0.742 0.743 0.744 0.745 0.748 0.757 0.759 0.761 0.766 0.767 0.771 0.773
## 1 1 1 1 1 1 1 1 2 1 1 1 1
## 0.785 0.787 0.801 0.803 0.804 0.805 0.808 0.813 0.816 0.817 0.821 0.825 0.826
## 1 2 1 1 1 1 1 1 1 1 1 1 1
## 0.828 0.831 0.832 0.833 0.839 0.84 0.845 0.851 0.855 0.856 0.867 0.871 0.874
## 1 1 1 1 2 1 1 1 1 1 1 1 1
## 0.875 0.878 0.88 0.881 0.886 0.892 0.893 0.904 0.905 0.917 0.925 0.926 0.93
## 2 1 1 1 1 1 1 1 2 1 1 1 1
## 0.932 0.933 0.944 0.947 0.949 0.955 0.956 0.962 0.966 0.968 0.97 0.997 1.001
## 1 1 1 1 1 1 1 2 1 2 1 1 1
## 1.021 1.022 1.034 1.057 1.072 1.076 1.095 1.096 1.101 1.114 1.127 1.136 1.138
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 1.144 1.154 1.159 1.162 1.174 1.182 1.189 1.191 1.213 1.222 1.224 1.251 1.258
## 1 1 1 1 1 1 1 1 1 1 2 1 1
## 1.268 1.282 1.292 1.318 1.321 1.353 1.39 1.391 1.394 1.4 1.441 1.461 1.476
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 1.6 1.698 1.699 1.731 1.781 1.893 2.137 2.288 2.329 2.42
## 1 1 1 1 1 1 1 1 1 1
table(diabetes$BMI)
##
## 0 18.2 18.4 19.1 19.3 19.4 19.5 19.6 19.9 20 20.1 20.4 20.8 21 21.1 21.2
## 11 3 1 1 1 1 2 3 1 1 1 2 2 2 4 1
## 21.7 21.8 21.9 22.1 22.2 22.3 22.4 22.5 22.6 22.7 22.9 23 23.1 23.2 23.3 23.4
## 1 5 3 2 2 1 2 3 2 1 2 2 4 3 2 1
## 23.5 23.6 23.7 23.8 23.9 24 24.1 24.2 24.3 24.4 24.5 24.6 24.7 24.8 24.9 25
## 3 3 2 2 2 4 1 6 4 3 1 4 5 3 1 6
## 25.1 25.2 25.3 25.4 25.5 25.6 25.8 25.9 26 26.1 26.2 26.3 26.4 26.5 26.6 26.7
## 3 6 2 4 2 6 2 7 4 3 4 1 3 3 4 1
## 26.8 26.9 27 27.1 27.2 27.3 27.4 27.5 27.6 27.7 27.8 27.9 28 28.1 28.2 28.3
## 4 1 2 3 2 4 5 5 7 4 7 2 5 1 2 2
## 28.4 28.5 28.6 28.7 28.8 28.9 29 29.2 29.3 29.5 29.6 29.7 29.8 29.9 30 30.1
## 6 3 2 7 2 6 5 1 5 5 4 8 3 5 7 9
## 30.2 30.3 30.4 30.5 30.7 30.8 30.9 31 31.1 31.2 31.3 31.6 31.9 32 32.1 32.2
## 1 1 7 7 1 9 5 2 1 12 1 12 2 13 1 1
## 32.3 32.4 32.5 32.6 32.7 32.8 32.9 33.1 33.2 33.3 33.5 33.6 33.7 33.8 33.9 34
## 3 10 6 1 3 9 9 3 7 10 1 8 5 5 2 6
## 34.1 34.2 34.3 34.4 34.5 34.6 34.7 34.8 34.9 35 35.1 35.2 35.3 35.4 35.5 35.6
## 4 8 6 4 5 5 4 2 6 4 3 2 4 4 7 2
## 35.7 35.8 35.9 36 36.1 36.2 36.3 36.4 36.5 36.6 36.7 36.8 36.9 37 37.1 37.2
## 4 5 5 2 3 1 3 2 4 5 1 6 3 1 2 4
## 37.3 37.4 37.5 37.6 37.7 37.8 37.9 38 38.1 38.2 38.3 38.4 38.5 38.6 38.7 38.8
## 1 3 2 5 5 3 2 2 3 4 1 2 6 1 3 1
## 38.9 39 39.1 39.2 39.3 39.4 39.5 39.6 39.7 39.8 39.9 40 40.1 40.2 40.5 40.6
## 1 4 4 2 1 7 3 1 1 2 3 2 1 1 3 4
## 40.7 40.8 40.9 41 41.2 41.3 41.5 41.8 42 42.1 42.2 42.3 42.4 42.6 42.7 42.8
## 1 1 2 1 1 3 2 1 1 2 1 3 3 1 2 1
## 42.9 43.1 43.2 43.3 43.4 43.5 43.6 44 44.1 44.2 44.5 44.6 45 45.2 45.3 45.4
## 4 1 1 5 2 2 2 2 1 2 2 1 1 1 3 1
## 45.5 45.6 45.7 45.8 46.1 46.2 46.3 46.5 46.7 46.8 47.9 48.3 48.8 49.3 49.6 49.7
## 1 2 1 1 2 2 1 1 1 2 2 1 1 1 1 1
## 50 52.3 52.9 53.2 55 57.3 59.4 67.1
## 1 2 1 1 1 1 1 1
# Checking the pregnancy count for the women.
preg <- diabetes %>%
count(Pregnancies)
preg
## Pregnancies n
## 1 0 111
## 2 1 135
## 3 2 103
## 4 3 75
## 5 4 68
## 6 5 57
## 7 6 50
## 8 7 45
## 9 8 38
## 10 9 28
## 11 10 24
## 12 11 11
## 13 12 9
## 14 13 10
## 15 14 2
## 16 15 1
## 17 17 1
ggplot(preg,aes(x=Pregnancies,y=n))+
geom_col(fill="blue")+
geom_text(aes(label = n),vjust=-0.1)+
labs(title="Count of Pregnancies",
x="Pregnancies",
y="Count"
)+
theme_minimal()
This shows how many women in the dataset have had each number of pregnancies. One pregnancy is the most common count (135 women).
out<-diabetes%>%
count(Outcome)
out
## Outcome n
## 1 0 500
## 2 1 268
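# Outcome is numeric, so wrap it in factor() to get two discrete bars and fill colors
ggplot(out,aes(x=factor(Outcome),y=n,fill=factor(Outcome)))+
geom_col()+
geom_text(aes(label=n),vjust=-0.3)+
labs(title = "Count of Outcome",
x="Outcome",
y="Count",
fill="Outcome"
)+
theme_minimal()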
There are 268 diabetic patients in the dataset (about 34.9% of the 768 records), so the non-diabetic group, with 500 patients, is the larger one.
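The class balance is easier to judge as a percentage than as raw counts. A short sketch that starts from the two counts reported by `table(diabetes$Outcome)`; the variable names are illustrative:

```r
# Counts copied from the table(diabetes$Outcome) output: 0 = non-diabetic, 1 = diabetic.
outcome_counts <- c(`0` = 500, `1` = 268)

# Share of each class in percent, rounded to one decimal place.
outcome_share <- round(100 * outcome_counts / sum(outcome_counts), 1)
print(outcome_share)
```

Roughly one third of the records are diabetic, which is worth keeping in mind when reading the count-based bar charts.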
Categorizing pregnancy counts into groups and visualizing how each group relates to the diabetes outcome.
Pregnancy_group <- cut(
diabetes$Pregnancies,
breaks = c(0, 6, 12, 17),
labels = c("Normal Pregnancy", "Moderate Pregnancy", "High Pregnancy"),
right = TRUE, include.lowest = TRUE
)
ggplot(diabetes, aes(Pregnancy_group, fill = factor(Outcome))) +
geom_bar(position = "dodge") +
labs(title="Pregnancy Group vs Diabetes Outcome",
x="Pregnancy Group", y="Count") +
theme_minimal()
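Most women in the dataset (599 of 768) fall into the "Normal Pregnancy" group of 0 to 6 pregnancies. Because this group is so much larger, it contains the most diabetes cases in absolute terms, even if its percentage of diabetic patients is not higher than in the other groups.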
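Because group sizes differ so much, the share of diabetic patients within each group can be more informative than raw counts. A hedged sketch that applies the same breaks to made-up data (the `toy` data frame and its values are hypothetical):

```r
# Made-up records: pregnancy count and a 0/1 diabetes outcome.
toy <- data.frame(
  Pregnancies = c(0, 1, 2, 3, 5, 6, 8, 10, 13, 14),
  Outcome     = c(0, 0, 1, 0, 0, 1, 1, 0, 1, 1)
)

# Same grouping as in the analysis: 0-6, 7-12, 13-17.
toy$Pregnancy_group <- cut(
  toy$Pregnancies,
  breaks = c(0, 6, 12, 17),
  labels = c("Normal Pregnancy", "Moderate Pregnancy", "High Pregnancy"),
  right = TRUE, include.lowest = TRUE
)

# Group sizes, and the mean of the 0/1 outcome per group,
# which equals the proportion of diabetic patients in that group.
group_size <- table(toy$Pregnancy_group)
diabetic_share <- tapply(toy$Outcome, toy$Pregnancy_group, mean)

print(group_size)
print(round(diabetic_share, 2))
```

In this toy example the largest group has the lowest diabetic share, which is exactly the kind of pattern a dodged count plot can hide.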
Categorizing age into groups and checking their count and visualizing to see how age affects the diabetes outcome.
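# Store the groups as a column so that table(diabetes$Age_group) can find them
diabetes$Age_group <- cut(diabetes$Age,
breaks = c(0, 30, 45, 60, 81),
labels = c("Young", "Middle-aged", "Older Adult", "Elderly"),
right = TRUE)
table(diabetes$Age_group)
##
## Young Middle-aged Older Adult Elderly
## 417 233 91 27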
ggplot(diabetes, aes(Age_group, fill = factor(Outcome))) +
geom_bar(position = "dodge") +
labs(title="Age Group vs Diabetes Outcome",
x="Age Group", y="Count") +
theme_minimal()
The young age group has the largest number of non-diabetic patients, while the middle-aged group has the largest number of diabetic patients.
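With `right = TRUE`, each interval in `cut()` excludes its lower bound and includes its upper bound, so an age of exactly 30 counts as "Young" and 31 as "Middle-aged". A small sketch with hand-picked boundary ages (the vector `ages` is illustrative):

```r
# Ages chosen to sit on or next to each break point.
ages <- c(21, 30, 31, 45, 46, 60, 61, 81)

# Same breaks and labels as the age grouping: (0,30], (30,45], (45,60], (60,81].
age_group <- cut(
  ages,
  breaks = c(0, 30, 45, 60, 81),
  labels = c("Young", "Middle-aged", "Older Adult", "Elderly"),
  right = TRUE
)

print(data.frame(age = ages, group = age_group))
```

Checking the boundary values this way is a quick guard against off-by-one surprises when the breaks are changed.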
Categorizing BMI (Body Mass Index) into groups and checking their count and visualizing to see how BMI affects the diabetes outcome.
diabetes$BMI <- as.numeric(diabetes$BMI)
diabetes$BMI
## [1] 33.6 26.6 23.3 28.1 43.1 25.6 31.0 35.3 30.5 0.0 37.6 38.0 27.1 30.1 25.8
## [16] 30.0 45.8 29.6 43.3 34.6 39.3 35.4 39.8 29.0 36.6 31.1 39.4 23.2 22.2 34.1
## [31] 36.0 31.6 24.8 19.9 27.6 24.0 33.2 32.9 38.2 37.1 34.0 40.2 22.7 45.4 27.4
## [46] 42.0 29.7 28.0 39.1 0.0 19.4 24.2 24.4 33.7 34.7 23.0 37.7 46.8 40.5 41.5
## [61] 0.0 32.9 25.0 25.4 32.8 29.0 32.5 42.7 19.6 28.9 32.9 28.6 43.4 35.1 32.0
## [76] 24.7 32.6 37.7 43.2 25.0 22.4 0.0 29.3 24.6 48.8 32.4 36.6 38.5 37.1 26.5
## [91] 19.1 32.0 46.7 23.8 24.7 33.9 31.6 20.4 28.7 49.7 39.0 26.1 22.5 26.6 39.6
## [106] 28.7 22.4 29.5 34.3 37.4 33.3 34.0 31.2 34.0 30.5 31.2 34.0 33.7 28.2 23.2
## [121] 53.2 34.2 33.6 26.8 33.3 55.0 42.9 33.3 34.5 27.9 29.7 33.3 34.5 38.3 21.1
## [136] 33.8 30.8 28.7 31.2 36.9 21.1 39.5 32.5 32.4 32.8 0.0 32.8 30.5 33.7 27.3
## [151] 37.4 21.9 34.3 40.6 47.9 50.0 24.6 25.2 29.0 40.9 29.7 37.2 44.2 29.7 31.6
## [166] 29.9 32.5 29.6 31.9 28.4 30.8 35.4 28.9 43.5 29.7 32.7 31.2 67.1 45.0 39.1
## [181] 23.2 34.9 27.7 26.8 27.6 35.9 30.1 32.0 27.9 31.6 22.6 33.1 30.4 52.3 24.4
## [196] 39.4 24.3 22.9 34.8 30.9 31.0 40.1 27.3 20.4 37.7 23.9 37.5 37.7 33.2 35.5
## [211] 27.7 42.8 34.2 42.6 34.2 41.8 35.8 30.0 29.0 37.8 34.6 31.6 25.2 28.8 23.6
## [226] 34.6 35.7 37.2 36.7 45.2 44.0 46.2 25.4 35.0 29.7 43.6 35.9 44.1 30.8 18.4
## [241] 29.2 33.1 25.6 27.1 38.2 30.0 31.2 52.3 35.4 30.1 31.2 28.0 24.4 35.8 27.6
## [256] 33.6 30.1 28.7 25.9 33.3 30.9 30.0 32.1 32.4 32.0 33.6 36.3 40.0 25.1 27.5
## [271] 45.6 25.2 23.0 33.2 34.2 40.5 26.5 27.8 24.9 25.3 37.9 35.9 32.4 30.4 27.0
## [286] 26.0 38.7 45.6 20.8 36.1 36.9 36.6 43.3 40.5 21.9 35.5 28.0 30.7 36.6 23.6
## [301] 32.3 31.6 35.8 52.9 21.0 39.7 25.5 24.8 30.5 32.9 26.2 39.4 26.6 29.5 35.9
## [316] 34.1 19.3 30.5 38.1 23.5 27.5 31.6 27.4 26.8 35.7 25.6 35.1 35.1 45.5 30.8
## [331] 23.1 32.7 43.3 23.6 23.9 47.9 33.8 31.2 34.2 39.9 25.9 25.9 32.0 34.7 36.8
## [346] 38.5 28.7 23.5 21.8 41.0 42.2 31.2 34.4 27.2 42.7 30.4 33.3 39.9 35.3 36.5
## [361] 31.2 29.8 39.2 38.5 34.9 34.0 27.6 21.0 27.5 32.8 38.4 0.0 35.8 34.9 36.2
## [376] 39.2 25.2 37.2 48.3 43.4 30.8 20.0 25.4 25.1 24.3 22.3 32.3 43.3 32.0 31.6
## [391] 32.0 45.7 23.7 22.1 32.9 27.7 24.7 34.3 21.1 34.9 32.0 24.2 35.0 31.6 32.9
## [406] 42.1 28.9 21.9 25.9 42.4 35.7 34.4 42.4 26.2 34.6 35.7 27.2 38.5 18.2 26.4
## [421] 45.3 26.0 40.6 30.8 42.9 37.0 0.0 34.1 40.6 35.0 22.2 30.4 30.0 25.6 24.5
## [436] 42.4 37.4 29.9 18.2 36.8 34.3 32.2 33.2 30.5 29.7 59.4 25.3 36.5 33.6 30.5
## [451] 21.2 28.9 39.9 19.6 37.8 33.6 26.7 30.2 37.6 25.9 20.8 21.8 35.3 27.6 24.0
## [466] 21.8 27.8 36.8 30.0 46.1 41.3 33.2 38.8 29.9 28.9 27.3 33.7 23.8 25.9 28.0
## [481] 35.5 35.2 27.8 38.2 44.2 42.3 40.7 46.5 25.6 26.1 36.8 33.5 32.8 28.9 0.0
## [496] 26.6 26.0 30.1 25.1 29.3 25.2 37.2 39.0 33.3 37.3 33.3 36.5 28.6 30.4 25.0
## [511] 29.7 22.1 24.2 27.3 25.6 31.6 30.3 37.6 32.8 19.6 25.0 33.2 0.0 34.2 31.6
## [526] 21.8 18.2 26.3 30.8 24.6 29.8 45.3 41.3 29.8 33.3 32.9 29.6 21.7 36.3 36.4
## [541] 39.4 32.4 34.9 39.5 32.0 34.5 43.6 33.1 32.8 28.5 27.4 31.9 27.8 29.9 36.9
## [556] 25.5 38.1 27.8 46.2 30.1 33.8 41.3 37.6 26.9 32.4 26.1 38.6 32.0 31.3 34.3
## [571] 32.5 22.6 29.5 34.7 30.1 35.5 24.0 42.9 27.0 34.7 42.1 25.0 26.5 38.7 28.7
## [586] 22.5 34.9 24.3 33.3 21.1 46.8 39.4 34.4 28.5 33.6 32.0 45.3 27.8 36.8 23.1
## [601] 27.1 23.7 27.8 35.2 28.4 35.8 40.0 19.5 41.5 24.0 30.9 32.9 38.2 32.5 36.1
## [616] 25.8 28.7 20.1 28.2 32.4 38.4 24.2 40.8 43.5 30.8 37.7 24.7 32.4 34.6 24.7
## [631] 27.4 34.5 26.2 27.5 25.9 31.2 28.8 31.6 40.9 19.5 29.3 34.3 29.5 28.0 27.6
## [646] 39.4 23.4 37.8 28.3 26.4 25.2 33.8 34.1 26.8 34.2 38.7 21.8 38.9 39.0 34.2
## [661] 27.7 42.9 37.6 37.9 33.7 34.8 32.5 27.5 34.0 30.9 33.6 25.4 35.5 57.3 35.6
## [676] 30.9 24.8 35.3 36.0 24.2 24.2 49.6 44.6 32.3 0.0 33.2 23.1 28.3 24.1 46.1
## [691] 24.6 42.3 39.1 38.5 23.5 30.4 29.9 25.0 34.5 44.5 35.9 27.6 35.0 38.5 28.4
## [706] 39.8 0.0 34.4 32.8 38.0 31.2 29.6 41.2 26.4 29.5 33.9 33.8 23.1 35.5 35.6
## [721] 29.3 38.1 29.3 39.1 32.8 39.4 36.1 32.4 22.9 30.1 28.4 28.4 44.5 29.0 23.3
## [736] 35.4 27.4 32.0 36.6 39.5 42.3 30.8 28.5 32.7 40.6 30.0 49.3 46.3 36.4 24.3
## [751] 31.2 39.0 26.0 43.3 32.4 36.5 32.0 36.3 37.5 35.5 28.4 44.0 22.5 32.9 36.8
## [766] 26.2 30.1 30.4
is.numeric(diabetes$BMI)
## [1] TRUE
diabetes$BMI_cat <- cut(
diabetes$BMI,
breaks = c(0,25,35, 40, 70),
labels = c("Underweight", "Normal", "Overweight", "Obese"),
right = FALSE
)
ggplot(diabetes, aes(BMI_cat, fill = factor(Outcome))) +
geom_bar(position = "dodge") +
labs(title="BMI vs Diabetes Outcome",
x="BMI category", y="Count") +
theme_minimal()
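The group labelled "Normal" (BMI from 25 to under 35 with these breaks) has the highest number of diabetic patients, largely because it is the most common group in the dataset. Note that these cut points differ from the standard WHO thresholds (18.5, 25, and 30), and that the 11 records with a BMI of 0 are not plausible measurements yet land in the "Underweight" bin.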
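For comparison, the standard WHO adult categories use 18.5, 25, and 30 as cut points. The sketch below applies them to a few made-up BMI values and treats 0 as missing; both the sample values and the zero-as-missing rule are assumptions of this example, not part of the original analysis:

```r
# Made-up BMI values, including one 0 that stands for an unrecorded measurement.
bmi <- c(0, 18.2, 23.3, 26.6, 33.6, 43.1)

# Treat 0 as missing so it is not binned as "Underweight".
bmi[bmi == 0] <- NA

# WHO categories: <18.5, 18.5 to <25, 25 to <30, 30 and above.
bmi_who <- cut(
  bmi,
  breaks = c(0, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal", "Overweight", "Obese"),
  right = FALSE
)

print(data.frame(bmi = bmi, category = bmi_who))
```

With these thresholds a BMI of 33.6 is classed as "Obese" rather than "Normal", so the choice of breaks directly changes which bar looks largest.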
Categorizing Skin thickness into groups and visualizing against diabetes outcome.
Skin_group <- cut(
diabetes$SkinThickness,
breaks = c(-Inf, 10, 20, Inf),
labels = c("Normal", "High", "Very High"),
include.lowest = TRUE
)
ggplot(diabetes,aes(Skin_group,fill=factor(Outcome))) +
geom_bar(position = "dodge") +
labs(title = "Skin thickness vs Diabetes Outcome",
x="Skin thickness", y="Count") +
theme_minimal()
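Patients in the high skin-thickness range account for the highest number of diabetes cases, followed by patients in the "Normal" range. Note that 227 records have a skin thickness of 0, which is not a plausible measurement, and all of them are counted as "Normal" here.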
Categorizing Glucose level into groups and visualizing against diabetes outcome.
diabetes$Glucose_cat <- cut(diabetes$Glucose, breaks = c(0,100,125,200),
labels = c("Normal","Prediabetic","Diabetic"),right = FALSE)
diabetes$Glucose_cat
## [1] Diabetic Normal Diabetic Normal Diabetic Prediabetic
## [7] Normal Prediabetic Diabetic Diabetic Prediabetic Diabetic
## [13] Diabetic Diabetic Diabetic Prediabetic Prediabetic Prediabetic
## [19] Prediabetic Prediabetic Diabetic Normal Diabetic Prediabetic
## [25] Diabetic Diabetic Diabetic Normal Diabetic Prediabetic
## [31] Prediabetic Diabetic Normal Normal Prediabetic Prediabetic
## [37] Diabetic Prediabetic Normal Prediabetic Diabetic Diabetic
## [43] Prediabetic Diabetic Diabetic Diabetic Diabetic Normal
## [49] Prediabetic Prediabetic Prediabetic Prediabetic Normal Diabetic
## [55] Diabetic Normal Diabetic Prediabetic Diabetic Prediabetic
## [61] Normal Diabetic Normal Diabetic Prediabetic Normal
## [67] Prediabetic Prediabetic Normal Diabetic Prediabetic Diabetic
## [73] Diabetic Diabetic Normal Normal Normal Normal
## [79] Diabetic Prediabetic Prediabetic Normal Normal Prediabetic
## [85] Diabetic Prediabetic Prediabetic Prediabetic Diabetic Prediabetic
## [91] Normal Prediabetic Normal Diabetic Diabetic Diabetic
## [97] Normal Normal Normal Prediabetic Diabetic Diabetic
## [103] Diabetic Normal Normal Diabetic Normal Diabetic
## [109] Normal Normal Diabetic Diabetic Normal Normal
## [115] Diabetic Diabetic Prediabetic Normal Normal Normal
## [121] Diabetic Prediabetic Prediabetic Diabetic Prediabetic Normal
## [127] Prediabetic Prediabetic Prediabetic Prediabetic Diabetic Prediabetic
## [133] Diabetic Normal Normal Diabetic Prediabetic Normal
## [139] Diabetic Prediabetic Diabetic Prediabetic Prediabetic Prediabetic
## [145] Diabetic Prediabetic Normal Prediabetic Diabetic Normal
## [151] Diabetic Prediabetic Diabetic Diabetic Diabetic Diabetic
## [157] Normal Prediabetic Normal Diabetic Diabetic Prediabetic
## [163] Prediabetic Prediabetic Diabetic Prediabetic Diabetic Prediabetic
## [169] Prediabetic Prediabetic Prediabetic Diabetic Normal Normal
## [175] Normal Diabetic Normal Diabetic Diabetic Diabetic
## [181] Normal Prediabetic Normal Normal Diabetic Diabetic
## [187] Diabetic Diabetic Prediabetic Diabetic Prediabetic Prediabetic
## [193] Diabetic Diabetic Normal Diabetic Prediabetic Prediabetic
## [199] Prediabetic Diabetic Prediabetic Diabetic Prediabetic Normal
## [205] Prediabetic Prediabetic Diabetic Diabetic Normal Diabetic
## [211] Normal Diabetic Diabetic Diabetic Prediabetic Diabetic
## [217] Prediabetic Diabetic Normal Prediabetic Diabetic Diabetic
## [223] Prediabetic Diabetic Prediabetic Normal Prediabetic Diabetic
## [229] Diabetic Prediabetic Diabetic Diabetic Normal Prediabetic
## [235] Normal Diabetic Diabetic Diabetic Diabetic Prediabetic
## [241] Normal Normal Diabetic Prediabetic Diabetic Diabetic
## [247] Prediabetic Diabetic Prediabetic Prediabetic Prediabetic Diabetic
## [253] Normal Normal Normal Prediabetic Prediabetic Prediabetic
## [259] Diabetic Diabetic Diabetic Diabetic Normal Diabetic
## [265] Prediabetic Normal Diabetic Diabetic Prediabetic Diabetic
## [271] Prediabetic Prediabetic Prediabetic Normal Prediabetic Prediabetic
## [277] Prediabetic Prediabetic Prediabetic Prediabetic Diabetic Diabetic
## [283] Diabetic Diabetic Prediabetic Diabetic Diabetic Prediabetic
## [289] Normal Prediabetic Normal Prediabetic Diabetic Diabetic
## [295] Diabetic Diabetic Diabetic Diabetic Prediabetic Prediabetic
## [301] Diabetic Diabetic Normal Prediabetic Diabetic Prediabetic
## [307] Diabetic Diabetic Diabetic Prediabetic Normal Prediabetic
## [313] Diabetic Prediabetic Prediabetic Prediabetic Normal Diabetic
## [319] Prediabetic Diabetic Diabetic Prediabetic Prediabetic Diabetic
## [325] Prediabetic Diabetic Prediabetic Diabetic Prediabetic Prediabetic
## [331] Prediabetic Normal Diabetic Prediabetic Normal Diabetic
## [337] Prediabetic Prediabetic Diabetic Diabetic Diabetic Normal
## [343] Normal Prediabetic Normal Diabetic Diabetic Prediabetic
## [349] Normal Normal Normal Diabetic Normal Normal
## [355] Normal Diabetic Diabetic Diabetic Normal Diabetic
## [361] Diabetic Diabetic Prediabetic Diabetic Diabetic Normal
## [367] Prediabetic Prediabetic Normal Diabetic Diabetic Prediabetic
## [373] Normal Prediabetic Prediabetic Diabetic Normal Normal
## [379] Diabetic Normal Prediabetic Prediabetic Prediabetic Normal
## [385] Diabetic Prediabetic Prediabetic Prediabetic Diabetic Prediabetic
## [391] Prediabetic Diabetic Diabetic Prediabetic Diabetic Diabetic
## [397] Normal Diabetic Normal Diabetic Normal Diabetic
## [403] Diabetic Normal Diabetic Prediabetic Prediabetic Prediabetic
## [409] Diabetic Diabetic Prediabetic Prediabetic Diabetic Diabetic
## [415] Diabetic Diabetic Normal Diabetic Normal Diabetic
## [421] Prediabetic Normal Prediabetic Prediabetic Diabetic Diabetic
## [427] Normal Diabetic Diabetic Normal Normal Normal
## [433] Normal Diabetic Normal Diabetic Diabetic Diabetic
## [439] Normal Prediabetic Diabetic Normal Prediabetic Prediabetic
## [445] Prediabetic Diabetic Prediabetic Normal Prediabetic Prediabetic
## [451] Normal Diabetic Normal Prediabetic Prediabetic Diabetic
## [457] Diabetic Normal Diabetic Diabetic Prediabetic Normal
## [463] Normal Normal Prediabetic Prediabetic Normal Normal
## [469] Prediabetic Diabetic Diabetic Diabetic Prediabetic Diabetic
## [475] Prediabetic Diabetic Prediabetic Prediabetic Diabetic Diabetic
## [481] Diabetic Prediabetic Normal Normal Diabetic Diabetic
## [487] Diabetic Diabetic Normal Diabetic Normal Normal
## [493] Normal Diabetic Normal Diabetic Prediabetic Normal
## [499] Diabetic Diabetic Prediabetic Normal Normal Normal
## [505] Normal Normal Diabetic Diabetic Normal Prediabetic
## [511] Normal Diabetic Normal Normal Normal Diabetic
## [517] Diabetic Diabetic Normal Diabetic Normal Prediabetic
## [523] Prediabetic Diabetic Diabetic Normal Normal Prediabetic
## [529] Prediabetic Prediabetic Prediabetic Prediabetic Normal Normal
## [535] Normal Diabetic Prediabetic Normal Diabetic Diabetic
## [541] Prediabetic Diabetic Normal Normal Normal Diabetic
## [547] Diabetic Diabetic Diabetic Diabetic Prediabetic Normal
## [553] Prediabetic Normal Normal Prediabetic Normal Prediabetic
## [559] Prediabetic Normal Diabetic Diabetic Normal Normal
## [565] Normal Normal Normal Normal Diabetic Prediabetic
## [571] Normal Diabetic Prediabetic Normal Diabetic Prediabetic
## [577] Prediabetic Prediabetic Diabetic Diabetic Diabetic Prediabetic
## [583] Prediabetic Prediabetic Prediabetic Normal Diabetic Prediabetic
## [589] Diabetic Normal Prediabetic Prediabetic Diabetic Normal
## [595] Prediabetic Diabetic Normal Normal Diabetic Prediabetic
## [601] Prediabetic Normal Prediabetic Diabetic Diabetic Prediabetic
## [607] Diabetic Normal Diabetic Prediabetic Prediabetic Diabetic
## [613] Diabetic Prediabetic Diabetic Prediabetic Prediabetic Normal
## [619] Prediabetic Prediabetic Prediabetic Normal Diabetic Normal
## [625] Prediabetic Normal Diabetic Diabetic Diabetic Normal
## [631] Prediabetic Prediabetic Prediabetic Diabetic Normal Prediabetic
## [637] Prediabetic Normal Normal Prediabetic Prediabetic Diabetic
## [643] Diabetic Normal Prediabetic Diabetic Diabetic Diabetic
## [649] Diabetic Prediabetic Normal Prediabetic Prediabetic Prediabetic
## [655] Prediabetic Diabetic Prediabetic Prediabetic Diabetic Normal
## [661] Diabetic Diabetic Diabetic Diabetic Prediabetic Prediabetic
## [667] Diabetic Prediabetic Normal Diabetic Diabetic Normal
## [673] Normal Prediabetic Normal Diabetic Diabetic Normal
## [679] Prediabetic Prediabetic Normal Diabetic Normal Diabetic
## [685] Diabetic Diabetic Diabetic Prediabetic Diabetic Diabetic
## [691] Prediabetic Diabetic Prediabetic Diabetic Normal Diabetic
## [697] Diabetic Normal Diabetic Prediabetic Prediabetic Diabetic
## [703] Diabetic Diabetic Prediabetic Normal Prediabetic Diabetic
## [709] Diabetic Normal Diabetic Diabetic Diabetic Diabetic
## [715] Prediabetic Diabetic Diabetic Normal Prediabetic Normal
## [721] Normal Prediabetic Diabetic Prediabetic Prediabetic Prediabetic
## [727] Prediabetic Diabetic Diabetic Normal Diabetic Prediabetic
## [733] Diabetic Prediabetic Prediabetic Normal Diabetic Normal
## [739] Normal Prediabetic Prediabetic Prediabetic Prediabetic Diabetic
## [745] Diabetic Prediabetic Diabetic Normal Diabetic Diabetic
## [751] Diabetic Prediabetic Prediabetic Diabetic Diabetic Diabetic
## [757] Diabetic Prediabetic Prediabetic Diabetic Normal Diabetic
## [763] Normal Prediabetic Prediabetic Prediabetic Diabetic Normal
## Levels: Normal Prediabetic Diabetic
ggplot(diabetes,aes(Glucose_cat,fill=factor(Outcome))) +
geom_bar(position="dodge") +
labs(title="Glucose level vs Diabetes Outcome",
x="Glucose category", y="Count", fill="Outcome") +
theme_minimal()
As glucose levels rise, the likelihood that a patient is diabetic increases.
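One way to check this reading numerically is to compute the share of diabetic patients within each glucose category rather than comparing raw counts. The sketch below is illustrative only: `diabetic_share()` is a hypothetical helper, and the small vectors are made-up toy values, not rows from the dataset.

```r
# Hypothetical helper: share of patients with Outcome == 1 in each category.
diabetic_share <- function(category, outcome) {
  # Fix the outcome levels so the "1" column exists even if a class is absent
  counts <- table(category, factor(outcome, levels = c(0, 1)))
  # margin = 1 normalises each row, so the shares within a category sum to 1
  prop.table(counts, margin = 1)[, "1"]
}

# Toy values for illustration only (not taken from the dataset)
toy_cat <- factor(
  c("Normal", "Normal", "Normal", "Normal",
    "Prediabetic", "Prediabetic", "Diabetic", "Diabetic"),
  levels = c("Normal", "Prediabetic", "Diabetic")
)
toy_outcome <- c(0, 0, 0, 1, 0, 1, 1, 1)

toy_share <- diabetic_share(toy_cat, toy_outcome)
print(round(toy_share, 2))
```

On the real data the call would be `diabetic_share(diabetes$Glucose_cat, diabetes$Outcome)`. A share that rises from the Normal to the Diabetic category would support the interpretation above.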
Categorizing blood pressure into groups and visualizing them against diabetes outcome.
diabetes$BloodPressure_cat <- cut(diabetes$BloodPressure, breaks = c(-Inf, 79, 89, Inf),
labels = c("Normal", "Prehypertension", "Hypertension"),
right = TRUE
)
table(diabetes$BloodPressure)
##
## 0 24 30 38 40 44 46 48 50 52 54 55 56 58 60 61 62 64 65 66
## 35 1 2 1 1 4 2 5 13 11 11 2 12 21 37 1 34 43 7 30
## 68 70 72 74 75 76 78 80 82 84 85 86 88 90 92 94 95 96 98 100
## 45 57 44 52 8 39 45 40 30 23 6 21 25 22 8 6 1 4 3 3
## 102 104 106 108 110 114 122
## 1 2 3 2 3 1 1
ggplot(diabetes,aes(BloodPressure_cat,fill=factor(Outcome))) +
geom_bar(position="dodge") +
labs(title="Blood Pressure vs Diabetes Outcome",
x="Blood Pressure", y="Count", fill="Outcome") +
theme_minimal()
Although high blood pressure can increase the risk of diabetes, most patients in the dataset fall into the normal blood pressure group. That is why the largest number of diabetes cases is also seen in the normal BP group.
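The frequency table above also shows 35 readings of 0, which are not plausible blood pressure values and are counted as "Normal" by the `cut()` call. As a rough check of how much this affects the groups, the sketch below rebuilds the column from the printed frequency table and compares the category counts with the zeros kept and with the zeros treated as missing. The names `bp_values`, `bp_counts`, and `categorize_bp()` are introduced here for illustration, and treating 0 as missing is an assumption, not something the original analysis does.

```r
# Values and frequencies copied from the table(diabetes$BloodPressure) output
bp_values <- c(0, 24, 30, 38, 40, 44, 46, 48, 50, 52, 54, 55, 56, 58, 60, 61, 62, 64, 65, 66,
               68, 70, 72, 74, 75, 76, 78, 80, 82, 84, 85, 86, 88, 90, 92, 94, 95, 96, 98, 100,
               102, 104, 106, 108, 110, 114, 122)
bp_counts <- c(35, 1, 2, 1, 1, 4, 2, 5, 13, 11, 11, 2, 12, 21, 37, 1, 34, 43, 7, 30,
               45, 57, 44, 52, 8, 39, 45, 40, 30, 23, 6, 21, 25, 22, 8, 6, 1, 4, 3, 3,
               1, 2, 3, 2, 3, 1, 1)
bp <- rep(bp_values, times = bp_counts)

# Same breaks and labels as in the analysis
categorize_bp <- function(x) {
  cut(x, breaks = c(-Inf, 79, 89, Inf),
      labels = c("Normal", "Prehypertension", "Hypertension"),
      right = TRUE)
}

with_zeros <- table(categorize_bp(bp))
# Assumption: a reading of 0 is a missing measurement, so recode it to NA.
# table() drops NA values by default, so they leave the "Normal" count.
without_zeros <- table(categorize_bp(replace(bp, bp == 0, NA)))

print(rbind(with_zeros, without_zeros))
```

With the zeros recoded, the "Normal" group shrinks from 563 to 528 patients while the other two groups are unchanged. The overall pattern in the plot is therefore likely to hold, although the normal group is somewhat overstated.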
Categorizing insulin levels and visualizing them against diabetes outcome.
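# Cap extreme insulin readings at 300 to limit the influence of outliers
diabetes$Insulin[diabetes$Insulin > 300] <- 300
# Replace remaining NA values with 0 (applies to every column, not only Insulin)
diabetes[is.na(diabetes)] <- 0
# Bin insulin into categories; after the cap at 300 the last bin stays empty
diabetes$Insulin_cat <- cut(
diabetes$Insulin,
breaks = c(-Inf, 25, 100, 300, Inf),
labels = c("Low", "Normal", "High", "Extremely High"),
right = TRUE,
include.lowest = TRUE
)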
ggplot(diabetes, aes(Insulin_cat, fill = factor(Outcome))) +
geom_bar(position = "dodge") +
labs(
title = "Insulin Categories vs Diabetes Outcome",
x = "Insulin Category",
y = "Count",
fill = "Outcome"
) +
theme_minimal()
In Type 1 diabetes, the body cannot make enough insulin, so insulin levels are low. In early Type 2 diabetes, the body becomes resistant to insulin, so it produces extra insulin to control blood sugar. Both unusually low and unusually high insulin readings can therefore be associated with diabetes.
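A side note on the insulin categories used for this plot: because insulin is capped at 300 before `cut()` is called, and the intervals are closed on the right, a value of exactly 300 falls into "High" and nothing can reach "Extremely High". The minimal sketch below demonstrates this with the same cap, breaks, and labels. The input vector `toy_insulin` is made up for illustration and is not taken from the dataset.

```r
# Toy insulin readings for illustration only (not from the dataset)
toy_insulin <- c(0, 20, 80, 150, 300, 846)

# Same cap as in the analysis: values above 300 become 300
capped <- pmin(toy_insulin, 300)

# Same breaks and labels as in the analysis
toy_insulin_cat <- cut(
  capped,
  breaks = c(-Inf, 25, 100, 300, Inf),
  labels = c("Low", "Normal", "High", "Extremely High"),
  right = TRUE,
  include.lowest = TRUE
)

# The "Extremely High" level exists but has a count of zero
print(table(toy_insulin_cat))
```

If a separate top category is wanted, one option would be to apply the cap at a value above the last break, or to drop the unused label. Either choice would change the plot, so it is only a suggestion.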
# Grouping the pedigree function and visualizing it against diabetes outcome
diabetes$PedigreeGroup <- cut(
diabetes$DiabetesPedigreeFunction,
breaks = c(0, 0.2, 0.5, 1.0, Inf),
labels = c("Very Low", "Low", "Moderate", "High"),
include.lowest = TRUE
)
ggplot(diabetes, aes(x = PedigreeGroup, fill = as.factor(Outcome))) +
geom_bar(position = "dodge") +
labs(
title = "Pedigree Category vs Diabetes Outcome",
x = "Pedigree Level",
y = "Count",
fill = "Outcome"
) +
theme_minimal()
The Diabetes Pedigree Function indicates how much family history may raise a patient's risk of diabetes, but it is not the main cause. Factors such as diet, exercise, weight, and lifestyle usually have a bigger impact, so patients with low or moderate family risk can still develop diabetes, and not everyone with high family risk will.
numeric_data <- diabetes[sapply(diabetes, is.numeric)]
cor_matrix <- cor(numeric_data, use = "complete.obs")
cor_matrix
## Pregnancies Glucose BloodPressure SkinThickness
## Pregnancies 1.00000000 0.12945867 0.14128198 -0.08167177
## Glucose 0.12945867 1.00000000 0.15258959 0.05732789
## BloodPressure 0.14128198 0.15258959 1.00000000 0.20737054
## SkinThickness -0.08167177 0.05732789 0.20737054 1.00000000
## Insulin -0.07823754 0.31048270 0.10225530 0.48823294
## BMI 0.01768309 0.22107107 0.28180529 0.39257320
## DiabetesPedigreeFunction -0.03352267 0.13733730 0.04126495 0.18392757
## Age 0.54434123 0.26351432 0.23952795 -0.11397026
## Outcome 0.22189815 0.46658140 0.06506836 0.07475223
## Insulin BMI DiabetesPedigreeFunction
## Pregnancies -0.07823754 0.01768309 -0.03352267
## Glucose 0.31048270 0.22107107 0.13733730
## BloodPressure 0.10225530 0.28180529 0.04126495
## SkinThickness 0.48823294 0.39257320 0.18392757
## Insulin 1.00000000 0.20909688 0.18734209
## BMI 0.20909688 1.00000000 0.14064695
## DiabetesPedigreeFunction 0.18734209 0.14064695 1.00000000
## Age -0.06880310 0.03624187 0.03356131
## Outcome 0.12346817 0.29269466 0.17384407
## Age Outcome
## Pregnancies 0.54434123 0.22189815
## Glucose 0.26351432 0.46658140
## BloodPressure 0.23952795 0.06506836
## SkinThickness -0.11397026 0.07475223
## Insulin -0.06880310 0.12346817
## BMI 0.03624187 0.29269466
## DiabetesPedigreeFunction 0.03356131 0.17384407
## Age 1.00000000 0.23835598
## Outcome 0.23835598 1.00000000
corrplot(cor_matrix, method = "color",
type = "upper",
col=colorRampPalette(c("blue","white","red"))(200),
addCoef.col = "black",
tl.col = "black",
number.cex = 0.7,
tl.srt = 45
)
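To read off which variables are most strongly associated with the outcome, the Outcome column of the correlation matrix can be sorted. The sketch below copies the values from the matrix printed above into a named vector so that it runs on its own. The names `outcome_cor` and `ranked` are introduced here for illustration.

```r
# Correlations with Outcome, copied from the printed cor_matrix
outcome_cor <- c(
  Pregnancies = 0.22189815,
  Glucose = 0.46658140,
  BloodPressure = 0.06506836,
  SkinThickness = 0.07475223,
  Insulin = 0.12346817,
  BMI = 0.29269466,
  DiabetesPedigreeFunction = 0.17384407,
  Age = 0.23835598
)

# With the live object, a similar vector could be taken from the matrix, e.g.
# cor_matrix[rownames(cor_matrix) != "Outcome", "Outcome"]

# Rank from strongest to weakest positive correlation
ranked <- sort(outcome_cor, decreasing = TRUE)
print(round(ranked, 2))
```

Glucose has the strongest correlation with the outcome, followed by BMI and Age, while BloodPressure and SkinThickness are the weakest. Correlation only measures linear association, so these rankings should not be read as causal effects.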
The following steps should be taken to help reduce or monitor diabetes:
Monitor blood glucose levels regularly
Maintain a healthy body weight
Adopt a balanced diet and regular physical activity
Attend regular medical checkups
Increase education and awareness about diabetes risk factors, such as excessive intake of sugar and alcohol.