Diabetes Dataset

About the Dataset

The dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. All patients are females of Pima Indian heritage (a Native American people) and at least 21 years old. It has one dependent variable, Outcome, which records whether a patient has diabetes (1) or not (0). It also has eight independent variables that are believed to influence the outcome: pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function and age.

Explaining the variables

A brief explanation of each variable is as follows:

  • Pregnancies: The number of times a patient has been pregnant. A higher pregnancy count may increase the risk of diabetes because of hormonal changes and metabolic stress. During pregnancy, the placenta produces hormones that make it harder for insulin to regulate blood glucose.

  • Glucose: Plasma glucose concentration measured two hours into an oral glucose tolerance test. Higher glucose levels are directly associated with a greater likelihood of having diabetes.

  • Blood Pressure: Diastolic blood pressure (mm Hg). Higher blood pressure may be linked to a greater chance of having diabetes, because the two conditions share risk factors such as obesity.

  • Skin Thickness: Triceps skinfold thickness (mm), used as a proxy for body fat. More body fat can raise the risk of diabetes.

  • Insulin: Two-hour serum insulin (mu U/ml), which shows how well the body handles blood sugar. Unusual insulin levels may mean the body is not using sugar properly, which can lead to diabetes.

  • BMI (Body Mass Index): Estimates body fat based on height and weight. A higher BMI often means a higher risk of diabetes.

  • Diabetes Pedigree Function: Shows the family history of diabetes. A higher value means a stronger inherited risk.

  • Age: Older people are more likely to develop diabetes.
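For reference, BMI is weight in kilograms divided by the square of height in metres. As a worked example, a woman who weighs 70 kg and is 1.75 m tall has a BMI of about 22.9:

```latex
\mathrm{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2},
\qquad
\frac{70}{1.75^2} = \frac{70}{3.0625} \approx 22.9
```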

AIM

To gain a clear understanding of the factors associated with diabetes among Pima Indian women by cleaning, analyzing, and visualizing the dataset to reveal patterns and insights.

QUESTIONS
  • Are there any missing, incorrect, or inconsistent values in the diabetes dataset?
  • Which variables show differences between diabetic and non-diabetic individuals when visualized?

OBJECTIVES

The objectives of this project are to:

  • ensure the dataset is clean, accurate, and well-prepared for analysis.
  • visualize important variables that may influence diabetes outcomes.

Loading the libraries

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.3
## Warning: package 'tidyr' was built under R version 4.4.3
## Warning: package 'purrr' was built under R version 4.4.3
## Warning: package 'forcats' was built under R version 4.4.3
## Warning: package 'lubridate' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ purrr::%||%()   masks base::%||%()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.4.3
## corrplot 0.95 loaded

Importing dataset

diabetes <- read.csv("C:\\Users\\user\\Desktop\\Saheed\\Diabetes\\diabetes.csv")
View(diabetes) # open the data viewer (interactive sessions only)
head(diabetes) # preview the first six rows
##   Pregnancies Glucose BloodPressure SkinThickness Insulin  BMI
## 1           6     148            72            35       0 33.6
## 2           1      85            66            29       0 26.6
## 3           8     183            64             0       0 23.3
## 4           1      89            66            23      94 28.1
## 5           0     137            40            35     168 43.1
## 6           5     116            74             0       0 25.6
##   DiabetesPedigreeFunction Age Outcome
## 1                    0.627  50       1
## 2                    0.351  31       0
## 3                    0.672  32       1
## 4                    0.167  21       0
## 5                    2.288  33       1
## 6                    0.201  30       0

Checking for missing values

sapply(diabetes, function(x) sum(is.na(x))) # number of NA values in each column
##              Pregnancies                  Glucose            BloodPressure 
##                        0                        0                        0 
##            SkinThickness                  Insulin                      BMI 
##                        0                        0                        0 
## DiabetesPedigreeFunction                      Age                  Outcome 
##                        0                        0                        0
dim(diabetes) # number of rows (observations) and columns (variables)
## [1] 768   9
summary(diabetes) # summary statistics for each variable
##   Pregnancies        Glucose      BloodPressure    SkinThickness  
##  Min.   : 0.000   Min.   :  0.0   Min.   :  0.00   Min.   : 0.00  
##  1st Qu.: 1.000   1st Qu.: 99.0   1st Qu.: 62.00   1st Qu.: 0.00  
##  Median : 3.000   Median :117.0   Median : 72.00   Median :23.00  
##  Mean   : 3.845   Mean   :120.9   Mean   : 69.11   Mean   :20.54  
##  3rd Qu.: 6.000   3rd Qu.:140.2   3rd Qu.: 80.00   3rd Qu.:32.00  
##  Max.   :17.000   Max.   :199.0   Max.   :122.00   Max.   :99.00  
##     Insulin           BMI        DiabetesPedigreeFunction      Age       
##  Min.   :  0.0   Min.   : 0.00   Min.   :0.0780           Min.   :21.00  
##  1st Qu.:  0.0   1st Qu.:27.30   1st Qu.:0.2437           1st Qu.:24.00  
##  Median : 30.5   Median :32.00   Median :0.3725           Median :29.00  
##  Mean   : 79.8   Mean   :31.99   Mean   :0.4719           Mean   :33.24  
##  3rd Qu.:127.2   3rd Qu.:36.60   3rd Qu.:0.6262           3rd Qu.:41.00  
##  Max.   :846.0   Max.   :67.10   Max.   :2.4200           Max.   :81.00  
##     Outcome     
##  Min.   :0.000  
##  1st Qu.:0.000  
##  Median :0.000  
##  Mean   :0.349  
##  3rd Qu.:1.000  
##  Max.   :1.000
str(diabetes) #Used to check the structure of the dataset
## 'data.frame':    768 obs. of  9 variables:
##  $ Pregnancies             : int  6 1 8 1 0 5 3 10 2 8 ...
##  $ Glucose                 : int  148 85 183 89 137 116 78 115 197 125 ...
##  $ BloodPressure           : int  72 66 64 66 40 74 50 0 70 96 ...
##  $ SkinThickness           : int  35 29 0 23 35 0 32 0 45 0 ...
##  $ Insulin                 : int  0 0 0 94 168 0 88 0 543 0 ...
##  $ BMI                     : num  33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 0 ...
##  $ DiabetesPedigreeFunction: num  0.627 0.351 0.672 0.167 2.288 ...
##  $ Age                     : int  50 31 32 21 33 30 26 29 53 54 ...
##  $ Outcome                 : int  1 0 1 0 1 0 1 0 1 1 ...
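The NA check finds no missing values, but the summary shows a minimum of 0 for Glucose, BloodPressure, SkinThickness, Insulin and BMI, which are not plausible measurements. The frequency tables show 5 zero readings for Glucose, 35 for BloodPressure, 227 for SkinThickness, 374 for Insulin and 11 for BMI. A minimal sketch, assuming that a 0 in these five columns marks a missing reading, counts the zeros and recodes them as NA. The helpers count_zeros() and recode_zeros_as_na() are hypothetical, and the toy data frame is for illustration only:

```r
# Columns where a value of 0 is assumed to mean "not measured"
zero_as_missing_cols <- c("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")

# Number of zero readings per column (colSums counts TRUE values)
count_zeros <- function(df, cols) {
  colSums(df[cols] == 0, na.rm = TRUE)
}

# Replace zeros with NA in the chosen columns only
recode_zeros_as_na <- function(df, cols) {
  for (col in cols) {
    df[[col]][which(df[[col]] == 0)] <- NA
  }
  df
}

# Toy data for illustration only (not the Pima dataset)
toy <- data.frame(
  Pregnancies   = c(0, 2, 5),     # 0 is a valid value here and is left alone
  Glucose       = c(148, 0, 120),
  BloodPressure = c(72, 66, 0),
  SkinThickness = c(0, 29, 0),
  Insulin       = c(0, 94, 0),
  BMI           = c(33.6, 0, 28.1)
)

count_zeros(toy, zero_as_missing_cols)
clean <- recode_zeros_as_na(toy, zero_as_missing_cols)
colSums(is.na(clean))
```

On the real data, count_zeros(diabetes, zero_as_missing_cols) should reproduce the zero counts listed above, and the recoded copy can then be summarised with na.rm = TRUE.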

Converting integers to numeric

# Convert every integer column to numeric (double)
diabetes <- diabetes %>%
  mutate(across(where(is.integer), as.numeric))
str(diabetes)
## 'data.frame':    768 obs. of  9 variables:
##  $ Pregnancies             : num  6 1 8 1 0 5 3 10 2 8 ...
##  $ Glucose                 : num  148 85 183 89 137 116 78 115 197 125 ...
##  $ BloodPressure           : num  72 66 64 66 40 74 50 0 70 96 ...
##  $ SkinThickness           : num  35 29 0 23 35 0 32 0 45 0 ...
##  $ Insulin                 : num  0 0 0 94 168 0 88 0 543 0 ...
##  $ BMI                     : num  33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 0 ...
##  $ DiabetesPedigreeFunction: num  0.627 0.351 0.672 0.167 2.288 ...
##  $ Age                     : num  50 31 32 21 33 30 26 29 53 54 ...
##  $ Outcome                 : num  1 0 1 0 1 0 1 0 1 1 ...
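After this conversion, Outcome is still a number, so ggplot2 treats it as a continuous variable (for example, a colour gradient in the legend) unless it is wrapped in factor(). One option, sketched here with a hypothetical helper label_outcome(), is to keep a labelled factor alongside the numeric code:

```r
# Turn the 0/1 outcome code into a labelled factor for plotting and tables
label_outcome <- function(outcome) {
  factor(outcome, levels = c(0, 1), labels = c("Non-diabetic", "Diabetic"))
}

# Toy outcome codes for illustration only
outcome_lab <- label_outcome(c(1, 0, 1, 0, 1))
levels(outcome_lab)
table(outcome_lab)
```

On the real data this could be stored as a new column (for example, a hypothetical diabetes$Outcome_label) and used as the fill aesthetic, giving a discrete legend with readable labels.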

Checking the frequency of each value

# Count how often each value occurs in each variable
table(diabetes$Outcome)
## 
##   0   1 
## 500 268
table(diabetes$Pregnancies)
## 
##   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  17 
## 111 135 103  75  68  57  50  45  38  28  24  11   9  10   2   1   1
table(diabetes$Glucose)
## 
##   0  44  56  57  61  62  65  67  68  71  72  73  74  75  76  77  78  79  80  81 
##   5   1   1   2   1   1   1   1   3   4   1   3   4   2   2   2   4   3   6   6 
##  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 
##   3   6  10   7   3   7   9   6  11   9   9   7   7  13   8   9   3  17  17   9 
## 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 
##  13   9   6  13  14  11  13  12   6  14  13   5  11  10   7  11   6  11  11   6 
## 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 
##  12   9  11  14   9   5  11  14   7   5   5   5   6   4   8   8   5   8   5   5 
## 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 
##   5   6   7   5   9   7   4   1   3   6   4   2   6   5   3   2   8   2   1   3 
## 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 
##   6   3   3   4   3   3   4   1   2   3   1   6   2   2   2   1   1   5   5   5 
## 182 183 184 186 187 188 189 190 191 193 194 195 196 197 198 199 
##   1   3   3   1   4   2   4   1   1   2   3   2   3   4   1   1
table(diabetes$BloodPressure)
## 
##   0  24  30  38  40  44  46  48  50  52  54  55  56  58  60  61  62  64  65  66 
##  35   1   2   1   1   4   2   5  13  11  11   2  12  21  37   1  34  43   7  30 
##  68  70  72  74  75  76  78  80  82  84  85  86  88  90  92  94  95  96  98 100 
##  45  57  44  52   8  39  45  40  30  23   6  21  25  22   8   6   1   4   3   3 
## 102 104 106 108 110 114 122 
##   1   2   3   2   3   1   1
table(diabetes$SkinThickness)
## 
##   0   7   8  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26 
## 227   2   2   5   6   7  11   6  14   6  14  20  18  13  10  16  22  12  16  16 
##  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46 
##  23  20  17  27  19  31  20   8  15  14  16   7  18  16  15  11   6   5   6   8 
##  47  48  49  50  51  52  54  56  60  63  99 
##   4   4   3   3   1   2   2   1   1   1   1
table(diabetes$Insulin)
## 
##   0  14  15  16  18  22  23  25  29  32  36  37  38  40  41  42  43  44  45  46 
## 374   1   1   1   2   1   2   1   1   1   3   2   1   2   1   1   1   3   3   1 
##  48  49  50  51  52  53  54  55  56  57  58  59  60  61  63  64  65  66  67  68 
##   3   5   3   1   1   2   4   2   5   2   2   1   2   1   3   4   1   5   2   1 
##  70  71  72  73  74  75  76  77  78  79  81  82  83  84  85  86  87  88  89  90 
##   3   4   1   1   3   3   5   2   2   2   1   3   3   1   2   1   2   4   1   4 
##  91  92  94  95  96  99 100 105 106 108 110 112 114 115 116 119 120 122 125 126 
##   1   3   7   2   2   2   7  11   3   1   6   1   2   6   2   1   8   2   4   3 
## 127 128 129 130 132 135 140 142 144 145 146 148 150 152 155 156 158 159 160 165 
##   1   1   1   9   2   6   9   1   2   3   1   2   2   2   4   3   2   1   4   4 
## 166 167 168 170 171 175 176 178 180 182 183 184 185 188 190 191 192 193 194 196 
##   1   2   4   2   1   3   3   1   7   3   1   1   2   1   4   1   2   1   3   1 
## 200 204 205 207 210 215 220 225 228 230 231 235 237 240 245 249 250 255 258 265 
##   4   1   2   2   5   3   2   2   1   2   2   1   1   2   1   1   1   1   1   2 
## 270 271 272 274 275 277 278 280 284 285 291 293 300 304 310 318 321 325 326 328 
##   1   1   1   1   1   1   1   1   1   2   1   2   1   1   1   1   1   3   1   1 
## 330 335 342 360 370 375 387 392 402 415 440 465 474 478 480 485 495 510 540 543 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   2   1   2   1   1   1 
## 545 579 600 680 744 846 
##   1   1   1   1   1   1
table(diabetes$Age)
## 
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 
## 63 72 38 46 48 33 32 35 29 21 24 16 17 14 10 16 19 16 12 13 22 18 13  8 15 13 
## 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 72 81 
##  6  5  5  8  8  8  5  6  4  3  5  7  3  5  2  4  4  1  3  4  3  1  2  1  1  1
table(diabetes$DiabetesPedigreeFunction)
## 
## 0.078 0.084 0.085 0.088 0.089 0.092 0.096   0.1 0.101 0.102 0.107 0.108 0.115 
##     1     1     2     2     1     1     1     1     1     1     1     1     1 
## 0.118 0.121 0.122 0.123 0.126 0.127 0.128 0.129  0.13 0.133 0.134 0.135 0.136 
##     1     2     1     1     2     2     2     2     1     1     2     1     1 
## 0.137 0.138  0.14 0.141 0.142 0.143 0.144 0.145 0.147 0.148 0.149  0.15 0.151 
##     2     1     2     3     3     2     1     1     1     3     1     2     3 
## 0.153 0.154 0.155 0.156 0.157 0.158 0.159  0.16 0.161 0.162 0.163 0.164 0.165 
##     2     1     1     1     1     2     2     1     2     1     1     2     3 
## 0.166 0.167  0.17 0.171 0.173 0.174 0.175 0.176 0.177 0.178 0.179  0.18 0.181 
##     1     4     1     1     1     1     1     1     1     3     1     2     1 
## 0.182 0.183 0.186 0.187 0.188 0.189  0.19 0.191 0.192 0.194 0.196 0.197 0.198 
##     1     2     1     3     1     2     4     2     2     1     1     4     2 
## 0.199   0.2 0.201 0.203 0.204 0.205 0.206 0.207 0.209  0.21 0.212 0.215 0.217 
##     1     3     1     2     2     3     2     5     2     1     2     1     1 
## 0.218 0.219  0.22 0.221 0.222 0.223 0.225 0.226 0.227 0.229  0.23 0.231 0.232 
##     2     2     1     1     1     2     1     1     1     1     2     2     1 
## 0.233 0.234 0.235 0.236 0.237 0.238 0.239  0.24 0.241 0.243 0.244 0.245 0.246 
##     2     2     3     3     4     5     1     2     1     1     2     4     1 
## 0.247 0.248 0.249 0.251 0.252 0.253 0.254 0.255 0.256 0.257 0.258 0.259  0.26 
##     2     2     2     2     2     1     6     1     3     3     6     5     4 
## 0.261 0.262 0.263 0.264 0.265 0.267 0.268 0.269  0.27 0.271 0.272 0.277 0.278 
##     5     2     4     1     1     2     5     2     4     1     1     1     2 
## 0.279  0.28 0.282 0.283 0.284 0.285 0.286 0.287 0.289  0.29 0.292 0.293 0.294 
##     1     3     2     1     4     2     2     1     2     2     3     2     3 
## 0.295 0.296 0.297 0.299   0.3 0.302 0.303 0.304 0.305 0.306 0.307 0.313 0.314 
##     1     1     1     4     1     2     1     4     3     2     1     2     2 
## 0.315 0.317 0.318 0.319 0.323 0.324 0.325 0.326 0.328 0.329  0.33 0.331 0.332 
##     2     1     1     1     2     2     1     2     2     1     1     1     1 
## 0.334 0.335 0.336 0.337 0.338  0.34 0.341 0.342 0.343 0.344 0.345 0.346 0.347 
##     2     1     2     3     1     3     1     2     2     2     1     1     1 
## 0.349 0.351 0.352 0.355 0.356 0.358 0.361 0.362 0.364 0.365 0.366 0.368  0.37 
##     3     1     1     1     2     1     2     1     2     2     1     2     2 
## 0.371 0.374 0.375 0.376 0.378  0.38 0.381 0.382 0.383 0.385 0.388 0.389 0.391 
##     1     1     1     1     2     2     1     1     1     1     1     3     3 
## 0.393 0.394 0.395 0.396 0.398 0.399   0.4 0.401 0.402 0.403 0.404 0.407 0.408 
##     1     1     1     1     1     1     2     1     2     2     1     2     1 
## 0.409 0.411 0.412 0.415 0.416 0.417 0.419  0.42 0.421 0.422 0.423 0.426 0.427 
##     1     1     2     2     1     1     1     1     1     3     1     1     1 
##  0.43 0.431 0.432 0.433 0.434 0.435 0.439 0.441 0.443 0.444 0.446 0.447 0.451 
##     2     1     1     2     2     1     2     1     3     2     1     1     1 
## 0.452 0.453 0.454 0.455 0.457  0.46 0.463 0.464 0.465 0.466 0.467 0.471 0.472 
##     3     1     1     2     1     1     1     1     1     2     1     2     1 
## 0.479 0.482 0.483 0.484 0.485 0.487 0.488 0.491 0.493 0.495 0.496 0.497 0.498 
##     1     1     1     1     1     1     1     1     1     1     3     2     1 
## 0.499 0.501 0.502 0.503 0.507 0.509  0.51 0.512 0.514 0.515 0.516  0.52 0.525 
##     1     1     1     1     1     1     1     1     2     1     1     3     1 
## 0.526 0.527 0.528 0.529 0.532 0.534 0.536 0.537 0.539 0.542 0.543 0.545 0.546 
##     1     2     2     1     1     1     2     1     1     2     2     1     1 
## 0.547 0.549 0.551 0.554 0.557 0.559  0.56 0.561 0.564 0.565 0.569 0.571 0.572 
##     1     1     4     1     2     2     1     1     1     1     1     1     1 
## 0.575 0.578  0.58 0.582 0.583 0.586 0.587 0.588 0.591 0.593 0.595 0.597 0.598 
##     1     1     1     2     3     2     3     1     2     1     1     1     1 
##   0.6 0.601 0.605 0.607  0.61 0.612 0.613 0.614 0.615 0.619 0.624 0.626 0.627 
##     2     1     2     1     1     1     1     1     1     1     1     1     1 
## 0.629  0.63 0.631 0.637  0.64 0.645 0.646 0.647 0.649 0.652 0.654 0.655 0.658 
##     1     1     1     2     2     1     1     2     1     2     2     1     1 
##  0.66 0.661 0.665 0.666 0.672 0.673 0.674 0.677 0.678  0.68 0.682 0.686 0.687 
##     2     1     1     1     1     1     2     1     2     1     1     2     4 
## 0.692 0.693 0.695 0.696 0.698 0.699 0.702 0.703 0.704 0.705 0.709 0.711 0.717 
##     4     1     1     1     1     1     1     1     1     1     1     1     1 
## 0.718 0.719 0.721 0.722 0.725 0.727  0.73 0.731 0.732 0.733 0.734 0.735 0.738 
##     1     1     1     1     1     2     1     1     1     2     1     1     1 
## 0.741 0.742 0.743 0.744 0.745 0.748 0.757 0.759 0.761 0.766 0.767 0.771 0.773 
##     1     1     1     1     1     1     1     1     2     1     1     1     1 
## 0.785 0.787 0.801 0.803 0.804 0.805 0.808 0.813 0.816 0.817 0.821 0.825 0.826 
##     1     2     1     1     1     1     1     1     1     1     1     1     1 
## 0.828 0.831 0.832 0.833 0.839  0.84 0.845 0.851 0.855 0.856 0.867 0.871 0.874 
##     1     1     1     1     2     1     1     1     1     1     1     1     1 
## 0.875 0.878  0.88 0.881 0.886 0.892 0.893 0.904 0.905 0.917 0.925 0.926  0.93 
##     2     1     1     1     1     1     1     1     2     1     1     1     1 
## 0.932 0.933 0.944 0.947 0.949 0.955 0.956 0.962 0.966 0.968  0.97 0.997 1.001 
##     1     1     1     1     1     1     1     2     1     2     1     1     1 
## 1.021 1.022 1.034 1.057 1.072 1.076 1.095 1.096 1.101 1.114 1.127 1.136 1.138 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
## 1.144 1.154 1.159 1.162 1.174 1.182 1.189 1.191 1.213 1.222 1.224 1.251 1.258 
##     1     1     1     1     1     1     1     1     1     1     2     1     1 
## 1.268 1.282 1.292 1.318 1.321 1.353  1.39 1.391 1.394   1.4 1.441 1.461 1.476 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##   1.6 1.698 1.699 1.731 1.781 1.893 2.137 2.288 2.329  2.42 
##     1     1     1     1     1     1     1     1     1     1
table(diabetes$BMI)
## 
##    0 18.2 18.4 19.1 19.3 19.4 19.5 19.6 19.9   20 20.1 20.4 20.8   21 21.1 21.2 
##   11    3    1    1    1    1    2    3    1    1    1    2    2    2    4    1 
## 21.7 21.8 21.9 22.1 22.2 22.3 22.4 22.5 22.6 22.7 22.9   23 23.1 23.2 23.3 23.4 
##    1    5    3    2    2    1    2    3    2    1    2    2    4    3    2    1 
## 23.5 23.6 23.7 23.8 23.9   24 24.1 24.2 24.3 24.4 24.5 24.6 24.7 24.8 24.9   25 
##    3    3    2    2    2    4    1    6    4    3    1    4    5    3    1    6 
## 25.1 25.2 25.3 25.4 25.5 25.6 25.8 25.9   26 26.1 26.2 26.3 26.4 26.5 26.6 26.7 
##    3    6    2    4    2    6    2    7    4    3    4    1    3    3    4    1 
## 26.8 26.9   27 27.1 27.2 27.3 27.4 27.5 27.6 27.7 27.8 27.9   28 28.1 28.2 28.3 
##    4    1    2    3    2    4    5    5    7    4    7    2    5    1    2    2 
## 28.4 28.5 28.6 28.7 28.8 28.9   29 29.2 29.3 29.5 29.6 29.7 29.8 29.9   30 30.1 
##    6    3    2    7    2    6    5    1    5    5    4    8    3    5    7    9 
## 30.2 30.3 30.4 30.5 30.7 30.8 30.9   31 31.1 31.2 31.3 31.6 31.9   32 32.1 32.2 
##    1    1    7    7    1    9    5    2    1   12    1   12    2   13    1    1 
## 32.3 32.4 32.5 32.6 32.7 32.8 32.9 33.1 33.2 33.3 33.5 33.6 33.7 33.8 33.9   34 
##    3   10    6    1    3    9    9    3    7   10    1    8    5    5    2    6 
## 34.1 34.2 34.3 34.4 34.5 34.6 34.7 34.8 34.9   35 35.1 35.2 35.3 35.4 35.5 35.6 
##    4    8    6    4    5    5    4    2    6    4    3    2    4    4    7    2 
## 35.7 35.8 35.9   36 36.1 36.2 36.3 36.4 36.5 36.6 36.7 36.8 36.9   37 37.1 37.2 
##    4    5    5    2    3    1    3    2    4    5    1    6    3    1    2    4 
## 37.3 37.4 37.5 37.6 37.7 37.8 37.9   38 38.1 38.2 38.3 38.4 38.5 38.6 38.7 38.8 
##    1    3    2    5    5    3    2    2    3    4    1    2    6    1    3    1 
## 38.9   39 39.1 39.2 39.3 39.4 39.5 39.6 39.7 39.8 39.9   40 40.1 40.2 40.5 40.6 
##    1    4    4    2    1    7    3    1    1    2    3    2    1    1    3    4 
## 40.7 40.8 40.9   41 41.2 41.3 41.5 41.8   42 42.1 42.2 42.3 42.4 42.6 42.7 42.8 
##    1    1    2    1    1    3    2    1    1    2    1    3    3    1    2    1 
## 42.9 43.1 43.2 43.3 43.4 43.5 43.6   44 44.1 44.2 44.5 44.6   45 45.2 45.3 45.4 
##    4    1    1    5    2    2    2    2    1    2    2    1    1    1    3    1 
## 45.5 45.6 45.7 45.8 46.1 46.2 46.3 46.5 46.7 46.8 47.9 48.3 48.8 49.3 49.6 49.7 
##    1    2    1    1    2    2    1    1    1    2    2    1    1    1    1    1 
##   50 52.3 52.9 53.2   55 57.3 59.4 67.1 
##    1    2    1    1    1    1    1    1

Univariate Analysis

# Checking the pregnancy count for the women.
preg <- diabetes %>%
  count(Pregnancies)
preg
##    Pregnancies   n
## 1            0 111
## 2            1 135
## 3            2 103
## 4            3  75
## 5            4  68
## 6            5  57
## 7            6  50
## 8            7  45
## 9            8  38
## 10           9  28
## 11          10  24
## 12          11  11
## 13          12   9
## 14          13  10
## 15          14   2
## 16          15   1
## 17          17   1
ggplot(preg,aes(x=Pregnancies,y=n))+
  geom_col(fill="blue")+
  geom_text(aes(label = n),vjust=-0.1)+
  labs(title="Count of Pregnancies",
       x="Pregnancies",
       y="Count"
  )+
  theme_minimal()

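The chart shows how many women had each number of pregnancies. One pregnancy is the most common value (135 women), and counts generally fall as the number of pregnancies rises; only 4 women had 14 or more.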

out<-diabetes%>%
  count(Outcome)
out
##   Outcome   n
## 1       0 500
## 2       1 268
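# factor() makes Outcome discrete, so the fill uses two colours instead of a gradient
ggplot(out, aes(x = factor(Outcome), y = n, fill = factor(Outcome))) +
  geom_col() +
  geom_text(aes(label = n), vjust = -0.3) +
  labs(title = "Count of Outcome",
       x = "Outcome (0 = non-diabetic, 1 = diabetic)",
       y = "Count",
       fill = "Outcome") +
  theme_minimal()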

The dataset contains 500 non-diabetic (Outcome = 0) and 268 diabetic (Outcome = 1) women, so about 34.9% of patients have diabetes and the non-diabetic group is almost twice as large.
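The class balance can be expressed as proportions with prop.table(). The vector below simply rebuilds the counts reported above (500 zeros and 268 ones) so the sketch runs on its own; on the real data, prop.table(table(diabetes$Outcome)) gives the same result:

```r
# Rebuild an outcome vector with the reported counts (illustration only)
outcome_codes <- c(rep(0, 500), rep(1, 268))

# Share of each outcome value, rounded to three decimals
class_share <- prop.table(table(outcome_codes))
round(class_share, 3)
```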

Bivariate Analysis

Categorizing the pregnancies into groups and visualizing each group against the diabetes outcome.

# Groups of 0-6, 7-12 and 13-17 pregnancies, stored as a column of diabetes
diabetes$Pregnancy_group <- cut(
  diabetes$Pregnancies,
  breaks = c(0, 6, 12, 17),
  labels = c("Normal Pregnancy", "Moderate Pregnancy", "High Pregnancy"),
  right = TRUE, include.lowest = TRUE
)
ggplot(diabetes, aes(Pregnancy_group, fill = factor(Outcome))) +
  geom_bar(position = "dodge") +
  labs(title="Pregnancy Group vs Diabetes Outcome",
       x="Pregnancy Group", y="Count") +
  theme_minimal()

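Most women in the dataset (599 of 768) fall into the Normal Pregnancy group (0 to 6 pregnancies). Because this group is the largest, it contains the most diabetes cases even if its diabetes rate is not higher; comparing the proportion of diabetic women within each group gives a fairer picture.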
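To compare groups of different sizes, the share of diabetic women within each group can be computed with table() and prop.table(margin = 1). The helper diabetes_rate_by_group() below is hypothetical, and the toy data are for illustration only; on the real data it could be called with the pregnancy grouping created by cut() and diabetes$Outcome:

```r
# Proportion of Outcome == 1 within each group (row-wise proportions)
diabetes_rate_by_group <- function(group, outcome) {
  # fix the outcome levels so the "1" column exists even if a sample has no cases
  tab <- table(group, factor(outcome, levels = c(0, 1)))
  prop.table(tab, margin = 1)[, "1"]
}

# Toy data for illustration only
toy_group <- factor(c("Normal", "Normal", "Normal", "Normal", "High", "High"),
                    levels = c("Normal", "High"))
toy_outcome <- c(0, 0, 0, 1, 1, 0)
diabetes_rate_by_group(toy_group, toy_outcome)
```

Here the Normal group has more women but a lower rate (1 in 4) than the High group (1 in 2), which is exactly the distinction raw counts hide.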

Categorizing age into groups, checking their counts, and visualizing each group against the diabetes outcome.

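# Store the groups as a column so table() and ggplot() can find them;
# intervals are (0,30], (30,45], (45,60] and (60,81]
diabetes$Age_group <- cut(diabetes$Age,
                          breaks = c(0, 30, 45, 60, 81),
                          labels = c("Young", "Middle-aged", "Older Adult", "Elderly"),
                          right = TRUE)
table(diabetes$Age_group)
## 
##       Young Middle-aged Older Adult     Elderly 
##         417         233          91          27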
ggplot(diabetes, aes(Age_group, fill = factor(Outcome))) +
  geom_bar(position = "dodge") +
  labs(title="Age Group vs Diabetes Outcome",
       x="Age Group", y="Count") +
  theme_minimal()

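The Young group has the highest number of non-diabetic women, while the Middle-aged group has the highest number of diabetic women. Because the Young group (ages 21 to 30) is by far the largest, with 417 of the 768 women, these raw counts partly reflect group size.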
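Because the age groups use cut() with right = TRUE, each interval includes its upper boundary: a 30-year-old is Young and a 45-year-old is Middle-aged. The sketch below checks the boundary ages with illustrative values; any age above 81 would fall outside the breaks and become NA:

```r
# Boundary ages for the intervals (0,30], (30,45], (45,60], (60,81]
ages <- c(21, 30, 31, 45, 46, 60, 61, 81)
age_groups <- cut(ages,
                  breaks = c(0, 30, 45, 60, 81),
                  labels = c("Young", "Middle-aged", "Older Adult", "Elderly"),
                  right = TRUE)
data.frame(age = ages, group = age_groups)

# An age beyond the last break is not assigned to any group
cut(82, breaks = c(0, 30, 45, 60, 81))
```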

Categorizing BMI (Body Mass Index) into groups, checking their counts, and visualizing each group against the diabetes outcome.

diabetes$BMI <- as.numeric(diabetes$BMI)
diabetes$BMI
##   [1] 33.6 26.6 23.3 28.1 43.1 25.6 31.0 35.3 30.5  0.0 37.6 38.0 27.1 30.1 25.8
##  [16] 30.0 45.8 29.6 43.3 34.6 39.3 35.4 39.8 29.0 36.6 31.1 39.4 23.2 22.2 34.1
##  [31] 36.0 31.6 24.8 19.9 27.6 24.0 33.2 32.9 38.2 37.1 34.0 40.2 22.7 45.4 27.4
##  [46] 42.0 29.7 28.0 39.1  0.0 19.4 24.2 24.4 33.7 34.7 23.0 37.7 46.8 40.5 41.5
##  [61]  0.0 32.9 25.0 25.4 32.8 29.0 32.5 42.7 19.6 28.9 32.9 28.6 43.4 35.1 32.0
##  [76] 24.7 32.6 37.7 43.2 25.0 22.4  0.0 29.3 24.6 48.8 32.4 36.6 38.5 37.1 26.5
##  [91] 19.1 32.0 46.7 23.8 24.7 33.9 31.6 20.4 28.7 49.7 39.0 26.1 22.5 26.6 39.6
## [106] 28.7 22.4 29.5 34.3 37.4 33.3 34.0 31.2 34.0 30.5 31.2 34.0 33.7 28.2 23.2
## [121] 53.2 34.2 33.6 26.8 33.3 55.0 42.9 33.3 34.5 27.9 29.7 33.3 34.5 38.3 21.1
## [136] 33.8 30.8 28.7 31.2 36.9 21.1 39.5 32.5 32.4 32.8  0.0 32.8 30.5 33.7 27.3
## [151] 37.4 21.9 34.3 40.6 47.9 50.0 24.6 25.2 29.0 40.9 29.7 37.2 44.2 29.7 31.6
## [166] 29.9 32.5 29.6 31.9 28.4 30.8 35.4 28.9 43.5 29.7 32.7 31.2 67.1 45.0 39.1
## [181] 23.2 34.9 27.7 26.8 27.6 35.9 30.1 32.0 27.9 31.6 22.6 33.1 30.4 52.3 24.4
## [196] 39.4 24.3 22.9 34.8 30.9 31.0 40.1 27.3 20.4 37.7 23.9 37.5 37.7 33.2 35.5
## [211] 27.7 42.8 34.2 42.6 34.2 41.8 35.8 30.0 29.0 37.8 34.6 31.6 25.2 28.8 23.6
## [226] 34.6 35.7 37.2 36.7 45.2 44.0 46.2 25.4 35.0 29.7 43.6 35.9 44.1 30.8 18.4
## [241] 29.2 33.1 25.6 27.1 38.2 30.0 31.2 52.3 35.4 30.1 31.2 28.0 24.4 35.8 27.6
## [256] 33.6 30.1 28.7 25.9 33.3 30.9 30.0 32.1 32.4 32.0 33.6 36.3 40.0 25.1 27.5
## [271] 45.6 25.2 23.0 33.2 34.2 40.5 26.5 27.8 24.9 25.3 37.9 35.9 32.4 30.4 27.0
## [286] 26.0 38.7 45.6 20.8 36.1 36.9 36.6 43.3 40.5 21.9 35.5 28.0 30.7 36.6 23.6
## [301] 32.3 31.6 35.8 52.9 21.0 39.7 25.5 24.8 30.5 32.9 26.2 39.4 26.6 29.5 35.9
## [316] 34.1 19.3 30.5 38.1 23.5 27.5 31.6 27.4 26.8 35.7 25.6 35.1 35.1 45.5 30.8
## [331] 23.1 32.7 43.3 23.6 23.9 47.9 33.8 31.2 34.2 39.9 25.9 25.9 32.0 34.7 36.8
## [346] 38.5 28.7 23.5 21.8 41.0 42.2 31.2 34.4 27.2 42.7 30.4 33.3 39.9 35.3 36.5
## [361] 31.2 29.8 39.2 38.5 34.9 34.0 27.6 21.0 27.5 32.8 38.4  0.0 35.8 34.9 36.2
## [376] 39.2 25.2 37.2 48.3 43.4 30.8 20.0 25.4 25.1 24.3 22.3 32.3 43.3 32.0 31.6
## [391] 32.0 45.7 23.7 22.1 32.9 27.7 24.7 34.3 21.1 34.9 32.0 24.2 35.0 31.6 32.9
## [406] 42.1 28.9 21.9 25.9 42.4 35.7 34.4 42.4 26.2 34.6 35.7 27.2 38.5 18.2 26.4
## [421] 45.3 26.0 40.6 30.8 42.9 37.0  0.0 34.1 40.6 35.0 22.2 30.4 30.0 25.6 24.5
## [436] 42.4 37.4 29.9 18.2 36.8 34.3 32.2 33.2 30.5 29.7 59.4 25.3 36.5 33.6 30.5
## [451] 21.2 28.9 39.9 19.6 37.8 33.6 26.7 30.2 37.6 25.9 20.8 21.8 35.3 27.6 24.0
## [466] 21.8 27.8 36.8 30.0 46.1 41.3 33.2 38.8 29.9 28.9 27.3 33.7 23.8 25.9 28.0
## [481] 35.5 35.2 27.8 38.2 44.2 42.3 40.7 46.5 25.6 26.1 36.8 33.5 32.8 28.9  0.0
## [496] 26.6 26.0 30.1 25.1 29.3 25.2 37.2 39.0 33.3 37.3 33.3 36.5 28.6 30.4 25.0
## [511] 29.7 22.1 24.2 27.3 25.6 31.6 30.3 37.6 32.8 19.6 25.0 33.2  0.0 34.2 31.6
## [526] 21.8 18.2 26.3 30.8 24.6 29.8 45.3 41.3 29.8 33.3 32.9 29.6 21.7 36.3 36.4
## [541] 39.4 32.4 34.9 39.5 32.0 34.5 43.6 33.1 32.8 28.5 27.4 31.9 27.8 29.9 36.9
## [556] 25.5 38.1 27.8 46.2 30.1 33.8 41.3 37.6 26.9 32.4 26.1 38.6 32.0 31.3 34.3
## [571] 32.5 22.6 29.5 34.7 30.1 35.5 24.0 42.9 27.0 34.7 42.1 25.0 26.5 38.7 28.7
## [586] 22.5 34.9 24.3 33.3 21.1 46.8 39.4 34.4 28.5 33.6 32.0 45.3 27.8 36.8 23.1
## [601] 27.1 23.7 27.8 35.2 28.4 35.8 40.0 19.5 41.5 24.0 30.9 32.9 38.2 32.5 36.1
## [616] 25.8 28.7 20.1 28.2 32.4 38.4 24.2 40.8 43.5 30.8 37.7 24.7 32.4 34.6 24.7
## [631] 27.4 34.5 26.2 27.5 25.9 31.2 28.8 31.6 40.9 19.5 29.3 34.3 29.5 28.0 27.6
## [646] 39.4 23.4 37.8 28.3 26.4 25.2 33.8 34.1 26.8 34.2 38.7 21.8 38.9 39.0 34.2
## [661] 27.7 42.9 37.6 37.9 33.7 34.8 32.5 27.5 34.0 30.9 33.6 25.4 35.5 57.3 35.6
## [676] 30.9 24.8 35.3 36.0 24.2 24.2 49.6 44.6 32.3  0.0 33.2 23.1 28.3 24.1 46.1
## [691] 24.6 42.3 39.1 38.5 23.5 30.4 29.9 25.0 34.5 44.5 35.9 27.6 35.0 38.5 28.4
## [706] 39.8  0.0 34.4 32.8 38.0 31.2 29.6 41.2 26.4 29.5 33.9 33.8 23.1 35.5 35.6
## [721] 29.3 38.1 29.3 39.1 32.8 39.4 36.1 32.4 22.9 30.1 28.4 28.4 44.5 29.0 23.3
## [736] 35.4 27.4 32.0 36.6 39.5 42.3 30.8 28.5 32.7 40.6 30.0 49.3 46.3 36.4 24.3
## [751] 31.2 39.0 26.0 43.3 32.4 36.5 32.0 36.3 37.5 35.5 28.4 44.0 22.5 32.9 36.8
## [766] 26.2 30.1 30.4
is.numeric(diabetes$BMI)
## [1] TRUE
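# WHO adult BMI categories; a BMI of 0 is a missing reading, so recode it to NA first
diabetes$BMI_cat <- cut(
  na_if(diabetes$BMI, 0),
  breaks = c(0, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal", "Overweight", "Obese"),
  right = FALSE
)
ggplot(diabetes, aes(BMI_cat, fill = factor(Outcome))) +
  geom_bar(position = "dodge") +
  labs(title="BMI vs Diabetes Outcome",
       x="BMI category", y="Count") +
  theme_minimal()

These break points follow the WHO adult categories (underweight below 18.5, normal 18.5 to 24.9, overweight 25 to 29.9, obese 30 and above), and the 11 zero readings, which are missing measurements, appear as a separate NA bar. Because the median BMI is 32.0, at least half of the women are in the Obese category, making it the largest group; comparing the proportion of diabetic women within each category is therefore fairer than comparing raw counts.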
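Another way to address the question of which variables differ between diabetic and non-diabetic women is to compare group means. A minimal sketch, assuming that BMI = 0 should be treated as missing so it does not pull the means down, uses tapply(). The helper mean_by_outcome() is hypothetical, and the toy values are for illustration only:

```r
# Mean of x for each outcome value, ignoring zero (missing) readings
mean_by_outcome <- function(x, outcome) {
  x[which(x == 0)] <- NA
  tapply(x, outcome, mean, na.rm = TRUE)
}

# Toy data for illustration only
toy_bmi <- c(25, 30, 0, 35, 40)
toy_outcome <- c(0, 0, 0, 1, 1)
mean_by_outcome(toy_bmi, toy_outcome)
```

The same call with diabetes$BMI (or Glucose, SkinThickness, Insulin, BloodPressure) and diabetes$Outcome gives one number per outcome group for each variable.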

Categorizing skin thickness into groups and visualizing each group against the diabetes outcome.

Skin_group <- cut(
  diabetes$SkinThickness,
  breaks = c(-Inf, 10, 20, Inf),
  labels = c("Normal", "High", "Very High"),
  include.lowest = TRUE
)

ggplot(diabetes,aes(Skin_group,fill=factor(Outcome))) +
  geom_bar(position = "dodge") +
  labs(title = "Skin thickness vs Diabetes Outcome",
       x="Skin thickness", y="Count") +
  theme_minimal()

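The High skin thickness group has the highest number of diabetic patients, followed by the Normal group. However, 227 women have a recorded skin thickness of 0, which is not a plausible measurement and most likely marks a missing value. With these breaks all of them are placed in the Normal group, so that bar mostly reflects missing data rather than thin skinfolds.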
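The effect of the zero readings on the skin thickness groups can be seen with a few toy values. This sketch uses the same breaks and labels as the grouping above and assumes, as in the text, that 0 means "not measured":

```r
skin <- c(0, 0, 8, 15, 25)  # toy values for illustration only
skin_breaks <- c(-Inf, 10, 20, Inf)
skin_labels <- c("Normal", "High", "Very High")

# Zeros kept as values: they land in the lowest group
with_zeros <- cut(skin, breaks = skin_breaks, labels = skin_labels,
                  include.lowest = TRUE)
table(with_zeros, useNA = "ifany")

# Zeros treated as missing: they become NA instead of "Normal"
skin_na <- ifelse(skin == 0, NA, skin)
zeros_missing <- cut(skin_na, breaks = skin_breaks, labels = skin_labels,
                     include.lowest = TRUE)
table(zeros_missing, useNA = "ifany")
```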

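Categorizing glucose level into groups and visualizing each group against the diabetes outcome. The cut-offs used here (100 and 125 mg/dL) are close to the usual fasting-glucose thresholds. Because this Glucose column records plasma glucose two hours into an oral glucose tolerance test, for which the usual thresholds are 140 and 200 mg/dL, the category labels should be treated as approximate. The 5 zero readings, which are missing measurements, also fall into the Normal group.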

diabetes$Glucose_cat <- cut(diabetes$Glucose, breaks = c(0,100,125,200), 
                            labels = c("Normal","Prediabetic","Diabetic"),right = FALSE)
diabetes$Glucose_cat
##   [1] Diabetic    Normal      Diabetic    Normal      Diabetic    Prediabetic
##   [7] Normal      Prediabetic Diabetic    Diabetic    Prediabetic Diabetic   
##  [13] Diabetic    Diabetic    Diabetic    Prediabetic Prediabetic Prediabetic
##  [19] Prediabetic Prediabetic Diabetic    Normal      Diabetic    Prediabetic
##  [25] Diabetic    Diabetic    Diabetic    Normal      Diabetic    Prediabetic
##  [31] Prediabetic Diabetic    Normal      Normal      Prediabetic Prediabetic
##  [37] Diabetic    Prediabetic Normal      Prediabetic Diabetic    Diabetic   
##  [43] Prediabetic Diabetic    Diabetic    Diabetic    Diabetic    Normal     
##  [49] Prediabetic Prediabetic Prediabetic Prediabetic Normal      Diabetic   
##  [55] Diabetic    Normal      Diabetic    Prediabetic Diabetic    Prediabetic
##  [61] Normal      Diabetic    Normal      Diabetic    Prediabetic Normal     
##  [67] Prediabetic Prediabetic Normal      Diabetic    Prediabetic Diabetic   
##  [73] Diabetic    Diabetic    Normal      Normal      Normal      Normal     
##  [79] Diabetic    Prediabetic Prediabetic Normal      Normal      Prediabetic
##  [85] Diabetic    Prediabetic Prediabetic Prediabetic Diabetic    Prediabetic
##  [91] Normal      Prediabetic Normal      Diabetic    Diabetic    Diabetic   
##  [97] Normal      Normal      Normal      Prediabetic Diabetic    Diabetic   
## [103] Diabetic    Normal      Normal      Diabetic    Normal      Diabetic   
## [109] Normal      Normal      Diabetic    Diabetic    Normal      Normal     
## [115] Diabetic    Diabetic    Prediabetic Normal      Normal      Normal     
## [121] Diabetic    Prediabetic Prediabetic Diabetic    Prediabetic Normal     
## [127] Prediabetic Prediabetic Prediabetic Prediabetic Diabetic    Prediabetic
## [133] Diabetic    Normal      Normal      Diabetic    Prediabetic Normal     
## [139] Diabetic    Prediabetic Diabetic    Prediabetic Prediabetic Prediabetic
## [145] Diabetic    Prediabetic Normal      Prediabetic Diabetic    Normal     
## [151] Diabetic    Prediabetic Diabetic    Diabetic    Diabetic    Diabetic   
## [157] Normal      Prediabetic Normal      Diabetic    Diabetic    Prediabetic
## [163] Prediabetic Prediabetic Diabetic    Prediabetic Diabetic    Prediabetic
## [169] Prediabetic Prediabetic Prediabetic Diabetic    Normal      Normal     
## [175] Normal      Diabetic    Normal      Diabetic    Diabetic    Diabetic   
## [181] Normal      Prediabetic Normal      Normal      Diabetic    Diabetic   
## [187] Diabetic    Diabetic    Prediabetic Diabetic    Prediabetic Prediabetic
## [193] Diabetic    Diabetic    Normal      Diabetic    Prediabetic Prediabetic
## [199] Prediabetic Diabetic    Prediabetic Diabetic    Prediabetic Normal     
## [205] Prediabetic Prediabetic Diabetic    Diabetic    Normal      Diabetic   
## [211] Normal      Diabetic    Diabetic    Diabetic    Prediabetic Diabetic   
## [217] Prediabetic Diabetic    Normal      Prediabetic Diabetic    Diabetic   
## [223] Prediabetic Diabetic    Prediabetic Normal      Prediabetic Diabetic   
## [229] Diabetic    Prediabetic Diabetic    Diabetic    Normal      Prediabetic
## [235] Normal      Diabetic    Diabetic    Diabetic    Diabetic    Prediabetic
## [241] Normal      Normal      Diabetic    Prediabetic Diabetic    Diabetic   
## [247] Prediabetic Diabetic    Prediabetic Prediabetic Prediabetic Diabetic   
## [253] Normal      Normal      Normal      Prediabetic Prediabetic Prediabetic
## [259] Diabetic    Diabetic    Diabetic    Diabetic    Normal      Diabetic   
## [265] Prediabetic Normal      Diabetic    Diabetic    Prediabetic Diabetic   
## [271] Prediabetic Prediabetic Prediabetic Normal      Prediabetic Prediabetic
## [277] Prediabetic Prediabetic Prediabetic Prediabetic Diabetic    Diabetic   
## [283] Diabetic    Diabetic    Prediabetic Diabetic    Diabetic    Prediabetic
## [289] Normal      Prediabetic Normal      Prediabetic Diabetic    Diabetic   
## [295] Diabetic    Diabetic    Diabetic    Diabetic    Prediabetic Prediabetic
## [301] Diabetic    Diabetic    Normal      Prediabetic Diabetic    Prediabetic
## [307] Diabetic    Diabetic    Diabetic    Prediabetic Normal      Prediabetic
## [313] Diabetic    Prediabetic Prediabetic Prediabetic Normal      Diabetic   
## [319] Prediabetic Diabetic    Diabetic    Prediabetic Prediabetic Diabetic   
## [325] Prediabetic Diabetic    Prediabetic Diabetic    Prediabetic Prediabetic
## [331] Prediabetic Normal      Diabetic    Prediabetic Normal      Diabetic   
## [337] Prediabetic Prediabetic Diabetic    Diabetic    Diabetic    Normal     
## [343] Normal      Prediabetic Normal      Diabetic    Diabetic    Prediabetic
## [349] Normal      Normal      Normal      Diabetic    Normal      Normal     
## [355] Normal      Diabetic    Diabetic    Diabetic    Normal      Diabetic   
## [361] Diabetic    Diabetic    Prediabetic Diabetic    Diabetic    Normal     
## [367] Prediabetic Prediabetic Normal      Diabetic    Diabetic    Prediabetic
## [373] Normal      Prediabetic Prediabetic Diabetic    Normal      Normal     
## [379] Diabetic    Normal      Prediabetic Prediabetic Prediabetic Normal     
## [385] Diabetic    Prediabetic Prediabetic Prediabetic Diabetic    Prediabetic
## [391] Prediabetic Diabetic    Diabetic    Prediabetic Diabetic    Diabetic   
## [397] Normal      Diabetic    Normal      Diabetic    Normal      Diabetic   
## [403] Diabetic    Normal      Diabetic    Prediabetic Prediabetic Prediabetic
## [409] Diabetic    Diabetic    Prediabetic Prediabetic Diabetic    Diabetic   
## [415] Diabetic    Diabetic    Normal      Diabetic    Normal      Diabetic   
## [421] Prediabetic Normal      Prediabetic Prediabetic Diabetic    Diabetic   
## [427] Normal      Diabetic    Diabetic    Normal      Normal      Normal     
## [433] Normal      Diabetic    Normal      Diabetic    Diabetic    Diabetic   
## [439] Normal      Prediabetic Diabetic    Normal      Prediabetic Prediabetic
## [445] Prediabetic Diabetic    Prediabetic Normal      Prediabetic Prediabetic
## [451] Normal      Diabetic    Normal      Prediabetic Prediabetic Diabetic   
## [457] Diabetic    Normal      Diabetic    Diabetic    Prediabetic Normal     
## [463] Normal      Normal      Prediabetic Prediabetic Normal      Normal     
## [469] Prediabetic Diabetic    Diabetic    Diabetic    Prediabetic Diabetic   
## [475] Prediabetic Diabetic    Prediabetic Prediabetic Diabetic    Diabetic   
## [481] Diabetic    Prediabetic Normal      Normal      Diabetic    Diabetic   
## [487] Diabetic    Diabetic    Normal      Diabetic    Normal      Normal     
## [493] Normal      Diabetic    Normal      Diabetic    Prediabetic Normal     
## [499] Diabetic    Diabetic    Prediabetic Normal      Normal      Normal     
## [505] Normal      Normal      Diabetic    Diabetic    Normal      Prediabetic
## [511] Normal      Diabetic    Normal      Normal      Normal      Diabetic   
## [517] Diabetic    Diabetic    Normal      Diabetic    Normal      Prediabetic
## [523] Prediabetic Diabetic    Diabetic    Normal      Normal      Prediabetic
## [529] Prediabetic Prediabetic Prediabetic Prediabetic Normal      Normal     
## [535] Normal      Diabetic    Prediabetic Normal      Diabetic    Diabetic   
## [541] Prediabetic Diabetic    Normal      Normal      Normal      Diabetic   
## [547] Diabetic    Diabetic    Diabetic    Diabetic    Prediabetic Normal     
## [553] Prediabetic Normal      Normal      Prediabetic Normal      Prediabetic
## [559] Prediabetic Normal      Diabetic    Diabetic    Normal      Normal     
## [565] Normal      Normal      Normal      Normal      Diabetic    Prediabetic
## [571] Normal      Diabetic    Prediabetic Normal      Diabetic    Prediabetic
## [577] Prediabetic Prediabetic Diabetic    Diabetic    Diabetic    Prediabetic
## [583] Prediabetic Prediabetic Prediabetic Normal      Diabetic    Prediabetic
## [589] Diabetic    Normal      Prediabetic Prediabetic Diabetic    Normal     
## [595] Prediabetic Diabetic    Normal      Normal      Diabetic    Prediabetic
## [601] Prediabetic Normal      Prediabetic Diabetic    Diabetic    Prediabetic
## [607] Diabetic    Normal      Diabetic    Prediabetic Prediabetic Diabetic   
## [613] Diabetic    Prediabetic Diabetic    Prediabetic Prediabetic Normal     
## [619] Prediabetic Prediabetic Prediabetic Normal      Diabetic    Normal     
## [625] Prediabetic Normal      Diabetic    Diabetic    Diabetic    Normal     
## [631] Prediabetic Prediabetic Prediabetic Diabetic    Normal      Prediabetic
## [637] Prediabetic Normal      Normal      Prediabetic Prediabetic Diabetic   
## [643] Diabetic    Normal      Prediabetic Diabetic    Diabetic    Diabetic   
## [649] Diabetic    Prediabetic Normal      Prediabetic Prediabetic Prediabetic
## [655] Prediabetic Diabetic    Prediabetic Prediabetic Diabetic    Normal     
## [661] Diabetic    Diabetic    Diabetic    Diabetic    Prediabetic Prediabetic
## [667] Diabetic    Prediabetic Normal      Diabetic    Diabetic    Normal     
## [673] Normal      Prediabetic Normal      Diabetic    Diabetic    Normal     
## [679] Prediabetic Prediabetic Normal      Diabetic    Normal      Diabetic   
## [685] Diabetic    Diabetic    Diabetic    Prediabetic Diabetic    Diabetic   
## [691] Prediabetic Diabetic    Prediabetic Diabetic    Normal      Diabetic   
## [697] Diabetic    Normal      Diabetic    Prediabetic Prediabetic Diabetic   
## [703] Diabetic    Diabetic    Prediabetic Normal      Prediabetic Diabetic   
## [709] Diabetic    Normal      Diabetic    Diabetic    Diabetic    Diabetic   
## [715] Prediabetic Diabetic    Diabetic    Normal      Prediabetic Normal     
## [721] Normal      Prediabetic Diabetic    Prediabetic Prediabetic Prediabetic
## [727] Prediabetic Diabetic    Diabetic    Normal      Diabetic    Prediabetic
## [733] Diabetic    Prediabetic Prediabetic Normal      Diabetic    Normal     
## [739] Normal      Prediabetic Prediabetic Prediabetic Prediabetic Diabetic   
## [745] Diabetic    Prediabetic Diabetic    Normal      Diabetic    Diabetic   
## [751] Diabetic    Prediabetic Prediabetic Diabetic    Diabetic    Diabetic   
## [757] Diabetic    Prediabetic Prediabetic Diabetic    Normal      Diabetic   
## [763] Normal      Prediabetic Prediabetic Prediabetic Diabetic    Normal     
## Levels: Normal Prediabetic Diabetic
ggplot(diabetes, aes(Glucose_cat, fill = factor(Outcome))) +
  geom_bar(position = "dodge") +
  labs(title = "Glucose Level vs Diabetes Outcome",
       x = "Glucose category", y = "Count", fill = "Outcome") +
  theme_minimal()

As glucose levels rise, patients are more likely to be diabetic.
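Because the categories differ in size, raw bar heights can be misleading. A minimal sketch for checking the glucose pattern is to compute the share of `Outcome == 1` within each category. The helper name `outcome_rate_by_group()` is hypothetical, and it assumes `Outcome` is coded 0/1 as in this dataset.

```r
# Share of Outcome == 1 within each level of a grouping factor.
# Hypothetical helper: `group` is a factor such as Glucose_cat and
# `outcome` is the 0/1 Outcome column. Empty levels give NaN (0 / 0).
outcome_rate_by_group <- function(group, outcome) {
  # Fix the outcome levels so both "0" and "1" columns always exist
  counts <- table(group, factor(outcome, levels = c(0, 1)))
  # margin = 1 divides each row by its row total
  rates <- prop.table(counts, margin = 1)
  rates[, "1"]
}
```

For example, `outcome_rate_by_group(diabetes$Glucose_cat, diabetes$Outcome)` returns one proportion per glucose category; the same call works for the other categorized columns created in this section.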

Categorizing blood pressure into groups and visualizing it against the diabetes outcome.

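# Group diastolic blood pressure (mm Hg): <= 79 Normal,
# 80-89 Prehypertension, >= 90 Hypertension
diabetes$BloodPressure_cat <- cut(
  diabetes$BloodPressure,
  breaks = c(-Inf, 79, 89, Inf),
  labels = c("Normal", "Prehypertension", "Hypertension"),
  right = TRUE
)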
table(diabetes$BloodPressure)
## 
##   0  24  30  38  40  44  46  48  50  52  54  55  56  58  60  61  62  64  65  66 
##  35   1   2   1   1   4   2   5  13  11  11   2  12  21  37   1  34  43   7  30 
##  68  70  72  74  75  76  78  80  82  84  85  86  88  90  92  94  95  96  98 100 
##  45  57  44  52   8  39  45  40  30  23   6  21  25  22   8   6   1   4   3   3 
## 102 104 106 108 110 114 122 
##   1   2   3   2   3   1   1
ggplot(diabetes, aes(BloodPressure_cat, fill = factor(Outcome))) +
  geom_bar(position = "dodge") +
  labs(title = "Blood Pressure vs Diabetes Outcome",
       x = "Blood pressure category", y = "Count", fill = "Outcome") +
  theme_minimal()

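Although high blood pressure can increase the risk of diabetes, most patients in the dataset (563 of 768) fall into the Normal group, which is why the largest number of diabetes cases is seen there. Those 563 include the 35 readings of 0 mm Hg, which are not physiologically possible and almost certainly represent missing measurements. Comparing the proportion of diabetic patients within each group, rather than raw counts, gives a fairer picture of the relationship.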
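The table of raw readings shows 35 values of 0 mm Hg, which the current breaks place in the Normal group. One way to keep them out is to recode 0 as `NA` before cutting. This is a sketch: the helper `categorize_bp()` is hypothetical and reuses the same breaks and labels as the code above.

```r
# Hypothetical helper: categorize diastolic blood pressure (mm Hg),
# treating 0 as a missing reading rather than a "Normal" one.
categorize_bp <- function(bp) {
  # which() skips existing NAs, so they stay NA
  bp[which(bp == 0)] <- NA
  cut(bp,
      breaks = c(-Inf, 79, 89, Inf),
      labels = c("Normal", "Prehypertension", "Hypertension"),
      right = TRUE)
}
```

Used as `diabetes$BloodPressure_cat <- categorize_bp(diabetes$BloodPressure)`, the bar chart would then show the missing readings as a separate `NA` bar instead of inflating the Normal group.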

Categorizing and visualizing insulin levels

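# Replace any remaining NA values with 0 (in this dataset a 0 already
# marks a missing measurement)
diabetes[is.na(diabetes)] <- 0

# Categorize insulin before capping it: if values were capped at 300
# first, the "Extremely High" (> 300) group could never be filled
diabetes$Insulin_cat <- cut(
  diabetes$Insulin,
  breaks = c(-Inf, 25, 100, 300, Inf),
  labels = c("Low", "Normal", "High", "Extremely High"),
  right = TRUE,
  include.lowest = TRUE
)

# Cap extreme insulin values at 300 to limit the influence of outliers
diabetes$Insulin[diabetes$Insulin > 300] <- 300
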
ggplot(diabetes, aes(Insulin_cat, fill = factor(Outcome))) +
  geom_bar(position = "dodge") +
  labs(
    title = "Insulin Categories vs Diabetes Outcome",
    x = "Insulin Category",
    y = "Count",
    fill = "Outcome"
  ) +
  theme_minimal()

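In Type 1 diabetes, the body cannot make enough insulin, so insulin levels are low. In early Type 2 diabetes, the body becomes resistant to insulin, so it produces extra insulin to control blood sugar. Diabetic patients can therefore appear in both the low and the high insulin categories, which makes insulin a weaker separator of the two outcomes than glucose. The Low category also contains the zero readings, which mark missing insulin measurements rather than genuinely low levels.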
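To see how insulin actually differs between the two outcome groups, a minimal sketch compares the median insulin per outcome while treating 0 as missing. `median_by_outcome()` is a hypothetical helper; a median is used rather than a mean because it is robust to the extreme insulin values.

```r
# Hypothetical helper: median of a measurement for each Outcome group.
# Assumption: 0 marks a missing measurement (as in this dataset's
# Insulin column), so zeros are dropped before taking the median.
median_by_outcome <- function(x, outcome) {
  x[which(x == 0)] <- NA
  tapply(x, outcome, median, na.rm = TRUE)
}
```

A call such as `median_by_outcome(diabetes$Insulin, diabetes$Outcome)` returns one median for Outcome 0 and one for Outcome 1.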

Grouping and visualizing the Diabetes Pedigree Function

diabetes$PedigreeGroup <- cut(
  diabetes$DiabetesPedigreeFunction,
  breaks = c(0, 0.2, 0.5, 1.0, Inf),
  labels = c("Very Low", "Low", "Moderate", "High"),
  include.lowest = TRUE
)

ggplot(diabetes, aes(x = PedigreeGroup, fill = as.factor(Outcome))) +
  geom_bar(position = "dodge") +
  labs(
    title = "Pedigree Category vs Diabetes Outcome",
    x = "Pedigree Level",
    y = "Count",
    fill = "Outcome"
  ) +
  theme_minimal()

The Diabetes Pedigree Function estimates how much family history may raise the risk of diabetes, but it is only one contributing factor. Diet, physical activity, weight, and other lifestyle factors often have a larger effect; in this dataset, its correlation with the outcome (about 0.17) is weaker than that of Glucose (0.47) or BMI (0.29). As a result, people with a low or moderate pedigree score can still develop diabetes, and not everyone with a high score will.

Visualizing Correlations Between the Variables and the Outcome

numeric_data <- diabetes[sapply(diabetes, is.numeric)]
cor_matrix <- cor(numeric_data, use = "complete.obs") 
cor_matrix
##                          Pregnancies    Glucose BloodPressure SkinThickness
## Pregnancies               1.00000000 0.12945867    0.14128198   -0.08167177
## Glucose                   0.12945867 1.00000000    0.15258959    0.05732789
## BloodPressure             0.14128198 0.15258959    1.00000000    0.20737054
## SkinThickness            -0.08167177 0.05732789    0.20737054    1.00000000
## Insulin                  -0.07823754 0.31048270    0.10225530    0.48823294
## BMI                       0.01768309 0.22107107    0.28180529    0.39257320
## DiabetesPedigreeFunction -0.03352267 0.13733730    0.04126495    0.18392757
## Age                       0.54434123 0.26351432    0.23952795   -0.11397026
## Outcome                   0.22189815 0.46658140    0.06506836    0.07475223
##                              Insulin        BMI DiabetesPedigreeFunction
## Pregnancies              -0.07823754 0.01768309              -0.03352267
## Glucose                   0.31048270 0.22107107               0.13733730
## BloodPressure             0.10225530 0.28180529               0.04126495
## SkinThickness             0.48823294 0.39257320               0.18392757
## Insulin                   1.00000000 0.20909688               0.18734209
## BMI                       0.20909688 1.00000000               0.14064695
## DiabetesPedigreeFunction  0.18734209 0.14064695               1.00000000
## Age                      -0.06880310 0.03624187               0.03356131
## Outcome                   0.12346817 0.29269466               0.17384407
##                                  Age    Outcome
## Pregnancies               0.54434123 0.22189815
## Glucose                   0.26351432 0.46658140
## BloodPressure             0.23952795 0.06506836
## SkinThickness            -0.11397026 0.07475223
## Insulin                  -0.06880310 0.12346817
## BMI                       0.03624187 0.29269466
## DiabetesPedigreeFunction  0.03356131 0.17384407
## Age                       1.00000000 0.23835598
## Outcome                   0.23835598 1.00000000
library(corrplot)  # provides corrplot()
corrplot(cor_matrix,
         method = "color",
         type = "upper",
         col = colorRampPalette(c("blue", "white", "red"))(200),
         addCoef.col = "black",
         tl.col = "black",
         number.cex = 0.7,
         tl.srt = 45)
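The heatmap is easier to read when the Outcome column is pulled out and sorted by strength. The sketch below does this for any correlation matrix with matching row and column names; `rank_outcome_correlations()` is a hypothetical helper, and it ranks by absolute value so that strong negative correlations would also surface.

```r
# Hypothetical helper: correlations of every variable with a target
# column, sorted from strongest to weakest (by absolute value).
rank_outcome_correlations <- function(cor_matrix, target = "Outcome") {
  r <- cor_matrix[, target]        # column of correlations with the target
  r <- r[names(r) != target]       # drop the target's self-correlation (1)
  r[order(abs(r), decreasing = TRUE)]
}
```

Applied to the `cor_matrix` printed above, it would order the variables as Glucose (0.47), BMI (0.29), Age (0.24), Pregnancies (0.22), DiabetesPedigreeFunction (0.17), Insulin (0.12), SkinThickness (0.07) and BloodPressure (0.07). These are associations, not evidence that any variable causes diabetes.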

Conclusion

The following steps should be taken to help reduce or monitor diabetes: