2022-11-13

Data Visualization on Pima Indian Diabetes Dataset

Features Used in Visualizations

  • Pregnancy: the number of pregnancies the patient has had
  • Glucose: plasma glucose concentration measured using an oral glucose tolerance test
  • Blood Pressure: diastolic blood pressure (mm Hg)
  • BMI: body mass index (weight in kg / (height in m)^2)
  • Insulin: insulin levels (mu U/ml)
  • Age: age of the patient

Summary of the Pima Indian Diabetes Dataset

summary(diabetes)
##   Pregnancies        Glucose      BloodPressure    SkinThickness  
##  Min.   : 0.000   Min.   :  0.0   Min.   :  0.00   Min.   : 0.00  
##  1st Qu.: 1.000   1st Qu.: 99.0   1st Qu.: 62.00   1st Qu.: 0.00  
##  Median : 3.000   Median :117.0   Median : 72.00   Median :23.00  
##  Mean   : 3.845   Mean   :120.9   Mean   : 69.11   Mean   :20.54  
##  3rd Qu.: 6.000   3rd Qu.:140.2   3rd Qu.: 80.00   3rd Qu.:32.00  
##  Max.   :17.000   Max.   :199.0   Max.   :122.00   Max.   :99.00  
##     Insulin           BMI        DiabetesPedigreeFunction      Age       
##  Min.   :  0.0   Min.   : 0.00   Min.   :0.0780           Min.   :21.00  
##  1st Qu.:  0.0   1st Qu.:27.30   1st Qu.:0.2437           1st Qu.:24.00  
##  Median : 30.5   Median :32.00   Median :0.3725           Median :29.00  
##  Mean   : 79.8   Mean   :31.99   Mean   :0.4719           Mean   :33.24  
##  3rd Qu.:127.2   3rd Qu.:36.60   3rd Qu.:0.6262           3rd Qu.:41.00  
##  Max.   :846.0   Max.   :67.10   Max.   :2.4200           Max.   :81.00  
##     Outcome     
##  Min.   :0.000  
##  1st Qu.:0.000  
##  Median :0.000  
##  Mean   :0.349  
##  3rd Qu.:1.000  
##  Max.   :1.000

How Glucose and Insulin Levels May Cause Diabetes

Patients with glucose levels more than 125 and relatively high insulin levels (still below 200) are more likely to be diabetic.
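
One way to check this observation numerically is to compare the share of diabetic patients on each side of the glucose cutoff. The sketch below assumes the `Glucose` and `Outcome` columns shown in the summary; the `toy` data frame and the helper `diabetic_rate_by_cutoff` are hypothetical and exist only so the snippet runs on its own.

```r
# Hypothetical rows for illustration only; replace `toy` with `diabetes`.
toy <- data.frame(
  Glucose = c(90, 110, 130, 150, 100, 170, 128, 85),
  Outcome = c(0,   0,   1,   1,   0,   1,   0,  0)
)

# Share of diabetic patients (Outcome == 1) on each side of a glucose cutoff.
diabetic_rate_by_cutoff <- function(df, cutoff = 125) {
  above <- df$Glucose > cutoff
  c(above = mean(df$Outcome[above]), below = mean(df$Outcome[!above]))
}

rates <- diabetic_rate_by_cutoff(toy)
print(rates)
```

On the real data, a clearly higher rate in the `above` group would support the pattern seen in the plot. The same helper could be extended with an insulin condition.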

Histogram Distribution of Age

The age distribution in this dataset is right-skewed: more patients are in their early 20s than at any older age.
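
The skew can also be read from the summary table, where the mean age (33.24) is above the median (29). A minimal base R sketch of the same check, together with a histogram, might look like this. The `age` vector is made up for illustration, and the 5-year bin width is an assumption rather than the setting used for the original plot.

```r
# Hypothetical ages for illustration only; replace `age` with `diabetes$Age`.
age <- c(21, 21, 22, 22, 23, 24, 25, 27, 29, 33, 41, 52, 60)

# A mean above the median is a quick sign of right skew.
right_skewed <- mean(age) > median(age)
print(right_skewed)

# Bin the ages in 5-year intervals and draw the histogram.
bins <- seq(20, 85, by = 5)
h <- hist(age, breaks = bins, main = "Histogram of Age", xlab = "Age (years)")
print(h$counts)
```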

Density Plot of Pregnancies and Diabetes

Patients with more than 5 pregnancies are more likely to be diagnosed with diabetes.
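
A density plot of this kind can be sketched in base R by estimating one density per outcome group and overlaying the two curves. The `toy` data frame below is hypothetical. With the real data, `diabetes$Pregnancies` and `diabetes$Outcome` would take its place. The original figure may well have been drawn with a different plotting package.

```r
# Hypothetical rows for illustration only; replace `toy` with `diabetes`.
toy <- data.frame(
  Pregnancies = c(0, 1, 1, 2, 3, 4, 2, 6, 7, 8, 5, 10),
  Outcome     = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
)

# One kernel density estimate per outcome group ("0" and "1").
dens <- lapply(split(toy$Pregnancies, toy$Outcome), density)

# Overlay the two curves on shared axes.
y_max <- max(dens[["0"]]$y, dens[["1"]]$y)
plot(dens[["0"]], main = "Pregnancies by Outcome",
     xlab = "Number of pregnancies", ylim = c(0, y_max))
lines(dens[["1"]], lty = 2)
legend("topright", legend = c("No diabetes", "Diabetes"), lty = c(1, 2))
```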

Diabetics Varying with Age

Patients with diabetes are most likely to be between their early 20s and early 40s.

How Glucose Levels and BMI May Cause Diabetes

BMI seems to be a stronger determining factor of diabetes than glucose. Most patients, with or without diabetes, have a BMI between 20 and 40.

Density Plot of Blood Pressure and Diabetes

Most patients have a diastolic blood pressure between 60 and 80 mm Hg.

Find the First and Third Quartile of BMI

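The first quartile of BMI is the value at the following position in the sorted data: \[(N+1) \times \frac{1}{4}\] The third quartile of BMI is the value at this position: \[(N+1) \times \frac{3}{4}\] In both equations, N is the number of data points. Before the quartiles are computed, BMI values of 0 are replaced with NA, since a BMI of 0 is not a valid measurement.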

bmi_new <- diabetes$BMI
bmi_new[diabetes$BMI==0] <- NA
summary(bmi_new)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   18.20   27.50   32.30   32.46   36.60   67.10      11
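
The (N+1) position rule can be written out directly, which makes it easier to see how it relates to R's built-in quantiles. In the sketch below, the `bmi` vector and the helper `quantile_by_position` are hypothetical. Note that `summary()` relies on the default of `quantile()` (`type = 7`), which interpolates differently, so its quartiles will not always match the (N+1) rule exactly. `type = 6` corresponds to that rule.

```r
# Hypothetical BMI values for illustration only; replace with `diabetes$BMI`.
bmi <- c(18.2, 22.5, 27.5, 30.1, 32.3, 36.6, 40.0)

# Quantile via the (N + 1) * p position rule, interpolating linearly
# between neighbours when the position is not a whole number.
quantile_by_position <- function(x, p) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  pos <- min(max((n + 1) * p, 1), n)  # clamp the position to valid indices
  lo <- floor(pos)
  hi <- ceiling(pos)
  x[lo] + (pos - lo) * (x[hi] - x[lo])
}

q1 <- quantile_by_position(bmi, 0.25)  # position (7 + 1) * 1/4 = 2
q3 <- quantile_by_position(bmi, 0.75)  # position (7 + 1) * 3/4 = 6
print(c(q1, q3))

# type = 6 follows the same rule; the default (type = 7) can differ.
print(quantile(bmi, c(0.25, 0.75), type = 6))
print(quantile(bmi, c(0.25, 0.75)))
```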

Find the Average Statistic for All Parameters

The average of each parameter can be found using this formula: \[\frac{1}{n} \sum_{i=1}^{n} x_{i}\] Here, the average is the sum of the n values divided by n.

mean(diabetes$BloodPressure)
## [1] 69.10547
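
To get the average of every parameter in one call rather than one column at a time, `colMeans()` can be applied to the whole data frame. The sketch below uses a hypothetical two-column `toy` data frame. Treating 0 as missing for columns other than BMI is an assumption made here for illustration, not something the analysis above does.

```r
# Hypothetical rows for illustration only; replace `toy` with `diabetes`.
toy <- data.frame(
  BloodPressure = c(70, 0, 80, 66),
  BMI           = c(30.5, 25.0, 0, 35.5)
)

# Mean of every column, zeros included (matches mean(diabetes$BloodPressure)).
all_means <- colMeans(toy)
print(all_means)

# Assumption: a 0 marks a missing measurement, so recode it as NA and skip it.
cleaned <- toy
cleaned[cleaned == 0] <- NA
means_no_zero <- colMeans(cleaned, na.rm = TRUE)
print(means_no_zero)
```

Comparing the two results shows how much the zero entries pull each average down.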