# Attach library
library(psych)
library(dplyr)
# Read the data
diabetes_data <- read.csv("https://raw.githubusercontent.com/L-Velasco/Spring17_IS606/master/Final/diabetes.csv", stringsAsFactors = FALSE)
# Consider only complete cases
# Remove observations with missing glycosylated hemoglobin (glyhb), weight, height, waist or hip data
required_cols <- c("glyhb", "weight", "height", "waist", "hip")
diabetes_cases <- diabetes_data[complete.cases(diabetes_data[, required_cols]), ]
# Add another column to data frame, Waist-Hip Ratio (WHR) by calculating waist/hip
diabetes_cases$WHR <- diabetes_cases$waist / diabetes_cases$hip
# Add another column to data frame, Body Mass Index (BMI)
# Convert weight from pounds to kilograms and height from inches to meters, then calculate weight/(height^2)
diabetes_cases$BMI <- ((diabetes_cases$weight * (0.453592)) / (((diabetes_cases$height / 12) * 0.3048) ^ 2))
# Dimensions of the cleaned data (rows, columns)
dim(diabetes_cases)
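The two derived measures can be wrapped in small helpers so the unit conversion is easy to check in isolation. This is a minimal sketch, assuming weight is recorded in pounds and height in inches, as the conversion factors above imply. The function names and the sample measurements are hypothetical and not part of the original analysis.

```r
# Hypothetical helpers; the names are illustrative, not from the original analysis.
lb_to_kg <- 0.453592  # kilograms per pound
in_to_m  <- 0.0254    # meters per inch (same as dividing by 12 and multiplying by 0.3048)

# BMI = weight in kg divided by squared height in meters
bmi_from_imperial <- function(weight_lb, height_in) {
  (weight_lb * lb_to_kg) / (height_in * in_to_m)^2
}

# Waist-Hip Ratio; both arguments must use the same unit
whr <- function(waist, hip) waist / hip

# Made-up measurements, for illustration only
example_bmi <- bmi_from_imperial(150, 65)
example_whr <- whr(34, 40)
print(round(example_bmi, 2))
print(example_whr)
```

With the real data, the same helpers could be applied column-wise, for example `bmi_from_imperial(diabetes_cases$weight, diabetes_cases$height)`.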
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
What relationship exists between Waist-Hip Ratio (WHR) and Body Mass Index (BMI), and which of the two measures is the stronger predictor of Type 2 Diabetes risk?
What are the cases, and how many are there?
Each case represents an African American resident of central Virginia who was screened for diabetes. There are 403 observations in the given data set.
For this project, only the 382 complete cases will be included.
Describe the method of data collection.
A total of 1,046 subjects were interviewed in a study of the prevalence of obesity and diabetes among African Americans in central Virginia. The 403 subjects included in this data set are the ones who were actually screened for diabetes.
What type of study is this (observational/experiment)?
This is an observational study.
If you collected the data, state self-collected. If not, provide a citation/link.
These data are courtesy of Dr John Schorling, Department of Medicine, University of Virginia School of Medicine, and were obtained from http://biostat.mc.vanderbilt.edu/DataSets
Further information about the study is cited on the Vanderbilt website: Willems JP, Saunders JT, Hunt DE, Schorling JB: Prevalence of coronary heart disease risk factors among rural blacks: A community-based study. Southern Medical Journal 90:814-820; 1997
and
Schorling JB, Roach J, Siegel M, Baturka N, Hunt DE, Guterbock TM, Stewart HL: A trial of church-based smoking cessation interventions for rural African Americans. Preventive Medicine 26:92-101; 1997.
What is the response variable, and what type is it (numerical/categorical)?
The response variable is glycosylated hemoglobin (glyhb), and it is numerical.
What is the explanatory variable, and what type is it (numerical/categorical)?
The explanatory variables are Waist-Hip Ratio (WHR) and Body Mass Index (BMI), both calculated from measurements in the data set. Both are numerical.
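One possible way to compare the two explanatory variables is to fit a separate one-predictor linear model for each and compare the R-squared values. The sketch below is only an assumption about how the comparison could be set up, not the method this project commits to. The helper `r_squared_for` is hypothetical, and the data frame `toy` is synthetic, generated purely so the block runs on its own.

```r
# Hypothetical helper: R-squared of a linear model with a single predictor.
r_squared_for <- function(data, response, predictor) {
  fit <- lm(reformulate(predictor, response = response), data = data)
  summary(fit)$r.squared
}

# Synthetic data for illustration only; these are NOT the diabetes data.
set.seed(1)
toy <- data.frame(
  WHR = rnorm(100, mean = 0.88, sd = 0.07),
  BMI = rnorm(100, mean = 28.8, sd = 6.6)
)
toy$glyhb <- 2 + 3 * toy$WHR + 0.02 * toy$BMI + rnorm(100)

# Share of variance in the response explained by each predictor alone
r2 <- c(
  WHR = r_squared_for(toy, "glyhb", "WHR"),
  BMI = r_squared_for(toy, "glyhb", "BMI")
)
print(round(r2, 3))

# Linear association between the two explanatory variables
print(cor(toy$WHR, toy$BMI))
```

On the cleaned data, the calls would be `r_squared_for(diabetes_cases, "glyhb", "WHR")` and `r_squared_for(diabetes_cases, "glyhb", "BMI")`. Because glyhb is strongly right-skewed, the residuals of any such model should be checked before the comparison is trusted.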
Provide summary statistics relevant to your research question. For example, if you are comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
# Describe statistics for the variables of interest
describe(diabetes_cases$WHR)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 382 0.88 0.07 0.88 0.88 0.07 0.68 1.14 0.46 0.38 0.56 0
describe(diabetes_cases$BMI)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 382 28.8 6.64 27.81 28.27 6.07 15.2 55.79 40.58 0.81 0.8
## se
## X1 0.34
describe(diabetes_cases$glyhb)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 382 5.58 2.21 4.84 5.11 0.83 2.68 16.11 13.43 2.26 5.24
## se
## X1 0.11
# Frequency plot for the variables of interest
hist(diabetes_cases$WHR)
hist(diabetes_cases$BMI)
hist(diabetes_cases$glyhb)
# Observations by town and gender
count(diabetes_cases, location, gender)
## Source: local data frame [4 x 3]
## Groups: location [?]
##
## location gender n
## <chr> <chr> <int>
## 1 Buckingham female 107
## 2 Buckingham male 80
## 3 Louisa female 115
## 4 Louisa male 80
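As a quick consistency check on the output above, the standard errors reported by `describe()` can be reproduced from the reported standard deviations and the sample size, and the group counts can be summed to confirm that they add up to the number of complete cases. This is a minimal sketch using only the rounded values printed above; the helper name `se_from_sd` is hypothetical.

```r
# Standard error of the mean from a standard deviation and a sample size
se_from_sd <- function(sd, n) sd / sqrt(n)

n <- 382
# Rounded standard deviations copied from the describe() output above
reported_sd <- c(WHR = 0.07, BMI = 6.64, glyhb = 2.21)
se <- se_from_sd(reported_sd, n)
print(round(se, 2))

# Counts by location and gender copied from the count() output above
group_n <- c(Buckingham_female = 107, Buckingham_male = 80,
             Louisa_female = 115, Louisa_male = 80)
print(sum(group_n) == n)
```

The standard error for WHR prints as 0 only because it is rounded to two decimals; the unrounded value is small but not zero.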