Part 0 - Loading and Preparing Data

# Attach libraries
library(psych) 
library(dplyr)
library(DATA606)

# Read the data
diabetes_raw_data <- read.csv("https://raw.githubusercontent.com/L-Velasco/Spring17_IS606/master/Final/diabetes.csv", stringsAsFactors = FALSE)

# Consider only complete cases
# Remove observations with missing hemoglobin, weight, height, waist or hip data
diabetes_data <- diabetes_raw_data[complete.cases(diabetes_raw_data[, c("glyhb", "weight", "height", "waist", "hip")]), ]

# Add another column to data frame, Waist-Hip Ratio (WHR) by calculating waist/hip
diabetes_data$WHR <- round(diabetes_data$waist / diabetes_data$hip,2)

# Add another column to data frame, Body Mass Index (BMI)
diabetes_data$BMI <- round(((diabetes_data$weight * 703) / ((diabetes_data$height)^2)),1) 

# Classify BMI Obesity

diabetes_data$BMI_type[diabetes_data$BMI <18.5]<-'Underweight'

diabetes_data$BMI_type[diabetes_data$BMI >= 18.5 & diabetes_data$BMI <= 24.9]<-'Normal'

diabetes_data$BMI_type[diabetes_data$BMI >= 25.0 & diabetes_data$BMI <= 29.9]<-'Overweight'

diabetes_data$BMI_type[diabetes_data$BMI >= 30.0]<-'Obese'

# Classify WHR Health Risk

diabetes_data$WHR_HRisk[diabetes_data$gender == 'female' & diabetes_data$WHR <= .80]<-'Low'

diabetes_data$WHR_HRisk[diabetes_data$gender == 'female' & (diabetes_data$WHR >= .81 & diabetes_data$WHR <= .85)]<-'Moderate'

diabetes_data$WHR_HRisk[diabetes_data$gender == 'female' & diabetes_data$WHR > .85]<-'High'

diabetes_data$WHR_HRisk[diabetes_data$gender == 'male' & diabetes_data$WHR <= .95]<-'Low'

diabetes_data$WHR_HRisk[diabetes_data$gender == 'male' & (diabetes_data$WHR >= .96 & diabetes_data$WHR <= 1.00)]<-'Moderate'

diabetes_data$WHR_HRisk[diabetes_data$gender == 'male' & diabetes_data$WHR > 1.00]<-'High'

# Subsets of subjects at elevated risk according to each measure
diabetes_Risk_by_BMI <- subset(diabetes_data, BMI_type %in% c("Overweight","Obese"))
diabetes_Risk_by_WHR <- subset(diabetes_data, WHR_HRisk %in% c("Moderate","High"))

# size of data
dim(diabetes_data)
names(diabetes_data)
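The threshold-by-threshold assignments above can also be written more compactly. The following is only an alternative sketch, not the code used for this report: it runs on a small made-up data frame (`toy`, with hypothetical values that are not rows from the diabetes data) and applies the same cutoffs with `cut()` and `ifelse()`. It matches the original classification only because BMI is rounded to one decimal and WHR to two decimals.

```r
# Toy data for illustration only; these are NOT rows from the diabetes dataset.
toy <- data.frame(
  gender = c("female", "female", "male", "male"),
  BMI    = c(17.9, 24.9, 27.3, 31.0),
  WHR    = c(0.78, 0.83, 0.97, 1.02),
  stringsAsFactors = FALSE
)

# BMI classes: right = FALSE closes each interval on the left, e.g. [18.5, 25).
toy$BMI_type <- cut(
  toy$BMI,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal", "Overweight", "Obese"),
  right = FALSE,
  ordered_result = TRUE
)

# WHR risk: the cutoffs depend on gender, so choose the cutoff pair per row.
low_cut  <- ifelse(toy$gender == "female", 0.80, 0.95)
high_cut <- ifelse(toy$gender == "female", 0.85, 1.00)
risk <- ifelse(toy$WHR <= low_cut, "Low",
        ifelse(toy$WHR <= high_cut, "Moderate", "High"))
toy$WHR_HRisk <- factor(risk, levels = c("Low", "Moderate", "High"), ordered = TRUE)

print(toy)
```

Because both columns come out as ordered factors, the later `ordered()` calls before the boxplots would not be needed with this approach.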

Part 1 - Introduction

Medical studies warn that obesity is a major risk factor for diabetes and other chronic diseases. Two of the most common measures of obesity are Body Mass Index (BMI) and Waist-Hip Ratio (WHR). BMI is generally described as a measure of overall obesity, while WHR measures central obesity (belly fat). Knowing one or both measurements gives an idea of one’s level of health and risk of obesity-related diseases.

This project aims to explore the relationship between Waist-Hip Ratio (WHR) and Body Mass Index (BMI), and to determine which of the two measures is the stronger risk predictor of Type 2 Diabetes.

Part 2 - Data

Data collection

A total of 1046 subjects were interviewed in a study of the prevalence of obesity and diabetes among African Americans in central Virginia. The 403 subjects included in this dataset are the ones who were actually screened for diabetes.

These data are courtesy of Dr John Schorling, Department of Medicine, University of Virginia School of Medicine, and were obtained from http://biostat.mc.vanderbilt.edu/DataSets

Cases

Each case represents an African American living in central Virginia who was screened for diabetes. There are 403 observations in the given data set.

For this project, only the 382 complete cases will be included.

Variables

The response variable is Glycosylated Hemoglobin and is numerical. A Glycosylated Hemoglobin value greater than 7.0 is usually taken as a positive diagnosis of diabetes.

The explanatory variables are Waist-Hip Ratio and Body Mass Index, both calculated from the recorded waist, hip, weight and height measurements. Both are numerical.
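For reference, the two derived variables from Part 0 can be written as formulas. The factor 703 used in the Part 0 code implies weight in pounds and height in inches; that unit reading is an assumption taken from the code rather than from a codebook.

```latex
\mathrm{BMI} = \frac{703 \times \text{weight (lb)}}{\left(\text{height (in)}\right)^{2}},
\qquad
\mathrm{WHR} = \frac{\text{waist}}{\text{hip}}
```

As a worked example with hypothetical measurements (not a subject from the data): a person weighing 180 lb at 70 in tall has BMI = 703 × 180 / 70² = 126540 / 4900 ≈ 25.8, which falls in the Overweight class; with a 38 in waist and 40 in hip, WHR = 38 / 40 = 0.95.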

Type of study

This is an observational study.

Scope of inference - generalizability

The population of interest is African Americans living in a rural setting. The inferences of this study cannot be generalized beyond that population, as other ethnic populations (e.g. Asians, Europeans, Caucasians) may have different BMI or WHR cutoff values.

Scope of inference - causality

Although there appears to be a relationship between BMI and WHR, it does not appear that one causes the other. A person with a low BMI can have a high WHR, and vice versa.

Part 3 - Exploratory data analysis

BMI and WHR Relationship

The relationship between BMI and WHR is visualized below.

The first plot displays the distribution of WHR for each BMI type. It appears that Overweight and Obese subjects have greater WHR values than Underweight and Normal subjects.

Likewise, in the second plot, it appears that subjects with Moderate and High health risk based on their WHR have greater BMI values than subjects with Low risk.

diabetes_data$BMI_type <- ordered(diabetes_data$BMI_type, levels=c("Underweight", "Normal", "Overweight", "Obese"))
boxplot(diabetes_data$WHR ~ diabetes_data$BMI_type, xlab = "BMI Types", ylab = "WHR", main = "Different BMI Types and Waist-Hip Ratio" )

diabetes_data$WHR_HRisk <- ordered(diabetes_data$WHR_HRisk, levels=c("Low", "Moderate", "High"))
boxplot(diabetes_data$BMI ~ diabetes_data$WHR_HRisk, xlab = "WHR Health Risk", ylab = "BMI", main = "Different WHR Health Risk and BMI" )

Checking Gender Effect

It may be logical to determine whether gender is a contributing factor in predicting Glycosylated Hemoglobin. Below is a hypothesis test to determine whether average Glycosylated Hemoglobin differs between females and males. If there is an observable difference, separate datasets would be used for females and males to eliminate or reduce possible gender bias.

H0 : there is no difference in average Glycosylated Hemoglobin between genders

HA : there is a difference in average Glycosylated Hemoglobin between genders

inference(y = diabetes_data$glyhb, x = diabetes_data$gender, est = "mean", type = "ht", null = 0, alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_female = 222, mean_female = 5.5009, sd_female = 2.1226
## n_male = 160, mean_male = 5.6833, sd_male = 2.3173
## Observed difference between means (female-male) = -0.1824
## 
## H0: mu_female - mu_male = 0 
## HA: mu_female - mu_male != 0 
## Standard error = 0.232 
## Test statistic: Z =  -0.786 
## p-value =  0.432

Based on the hypothesis test, we fail to reject the null hypothesis (p-value = 0.432, well above the conventional 0.05 significance level). Since there is no statistically significant difference between genders, we will not separate the data by gender.
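The test statistic reported by `inference()` can be checked by hand from the printed summary statistics. The sketch below assumes the usual two-sample formula with a normal reference distribution, which is consistent with the output labelling the statistic as Z; the variable names are chosen here for illustration.

```r
# Summary statistics copied from the inference() output above.
n_f <- 222; mean_f <- 5.5009; sd_f <- 2.1226
n_m <- 160; mean_m <- 5.6833; sd_m <- 2.3173

# Standard error of the difference between two independent sample means.
se <- sqrt(sd_f^2 / n_f + sd_m^2 / n_m)

# Test statistic under H0: mu_female - mu_male = 0.
z <- (mean_f - mean_m) / se

# Two-sided p-value from the standard normal distribution.
p_value <- 2 * pnorm(-abs(z))

round(c(se = se, z = z, p_value = p_value), 3)
```

The rounded values should agree with the standard error, test statistic and p-value printed above, up to rounding of the inputs.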

Part 4 - Inference

Each inference below produces a 95% confidence interval. The first estimates the average Glycosylated Hemoglobin for Overweight and Obese subjects, while the second estimates the average Glycosylated Hemoglobin for subjects with a Moderate or High level of risk based on their WHR.

inference(y = diabetes_Risk_by_BMI$glyhb, est = "mean", type = "ci", null = 0, 
          alternative = "twosided", method = "theoretical")
## Single mean 
## Summary statistics:

## mean = 5.7292 ;  sd = 2.2039 ;  n = 268 
## Standard error = 0.1346 
## 95 % Confidence interval = ( 5.4653 , 5.993 )

Based on our data, we are 95% confident that the average Glycosylated Hemoglobin for Overweight and Obese African Americans living in a rural setting is between 5.4653 and 5.993.

inference(y = diabetes_Risk_by_WHR$glyhb, est = "mean", type = "ci", null = 0, 
          alternative = "twosided", method = "theoretical")
## Single mean 
## Summary statistics:

## mean = 5.897 ;  sd = 2.4022 ;  n = 205 
## Standard error = 0.1678 
## 95 % Confidence interval = ( 5.5682 , 6.2259 )

Based on our data, we are 95% confident that the average Glycosylated Hemoglobin for African Americans living in a rural setting with Moderate or High health risk based on WHR is between 5.5682 and 6.2259.

Both the BMI and WHR confidence intervals lie entirely below 7.0, the value above which Glycosylated Hemoglobin is usually taken as a positive diagnosis of diabetes.
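Both intervals can be reproduced from the printed means, standard deviations and sample sizes. The sketch below assumes a normal critical value (about 1.96 for 95%), which is consistent with the reported bounds; `ci_mean` is a helper name introduced here for illustration, not a function from the DATA606 package.

```r
# Normal-approximation confidence interval for a mean from summary statistics.
ci_mean <- function(xbar, s, n, level = 0.95) {
  se <- s / sqrt(n)                      # standard error of the mean
  z_star <- qnorm(1 - (1 - level) / 2)   # two-sided critical value
  c(lower = xbar - z_star * se, upper = xbar + z_star * se)
}

# Summary statistics copied from the two inference() outputs above.
ci_bmi <- ci_mean(xbar = 5.7292, s = 2.2039, n = 268)
ci_whr <- ci_mean(xbar = 5.897,  s = 2.4022, n = 205)

round(rbind(BMI = ci_bmi, WHR = ci_whr), 4)

# Check whether both upper bounds fall below the 7.0 diagnostic threshold.
unname(ci_bmi["upper"] < 7.0 && ci_whr["upper"] < 7.0)
```

Small differences in the last decimal place relative to the printed intervals are expected, since the inputs here are already rounded.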

Part 5 - Conclusion

This study shows that there is a relationship between BMI and WHR. Although both are useful body measurements related to risk factors for diabetes and other chronic diseases, it appears that BMI or WHR alone would be an inadequate predictor of a Glycosylated Hemoglobin level suggesting diabetes.

Similar future research can explore other variables that, when combined with BMI or WHR, could predict a Glycosylated Hemoglobin greater than 7.0. These variables could include age, blood pressure, cholesterol, LDL and HDL.