Data Preparation :

library(SASxport)
## Warning: package 'SASxport' was built under R version 3.3.3
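library(psych)    # describe(), used for the summary statistics below
library(ggplot2)  # ggplot(), used for the histograms below
# Read the SAS transport file of body measurements
experimental = read.xport("C:/Users/Exped/Desktop/607P2/BMX_H.XPT")
bmi = experimental$BMXBMI  # body mass index
bhi = experimental$BMXHT   # standing height
dataAnalysis = data.frame(bmi,bhi)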

Research Question :

Is a person’s height predictive of BMI classification? The BMI system uses height to estimate a person’s body fat; however, height itself should not be predictive of body fat.
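
For reference, BMI is conventionally defined as weight in kilograms divided by the square of height in metres, which is why height enters the index at all. The following is a minimal sketch of that arithmetic. The function name `bmi_from` and the sample values are hypothetical and are not taken from the survey file; height is passed in centimetres on the assumption that the heights in this data set are recorded in cm.

```r
# Hypothetical helper: BMI = weight (kg) / height (m)^2.
# Height is given in centimetres and converted to metres.
bmi_from = function(weight_kg, height_cm) {
  weight_kg / (height_cm / 100)^2
}

# Made-up illustrative values, not survey data
example_bmi = bmi_from(weight_kg = 70, height_cm = 175)
round(example_bmi, 2)  # 22.86
```

Because height appears in the denominator, some dependence of BMI on height is built into the index, which is the relationship the research question sets out to examine.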

BMI classifications are as follows.
1. Underweight (BMI < 5th percentile)
2. Normal weight (BMI 5th to < 85th percentiles)
3. Overweight (BMI 85th to < 95th percentiles)
4. Obese (BMI ≥ 95th percentile)
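
The four classes above can be sketched as a lookup on a percentile value. This is only an illustration: the helper name `classify_bmi` is hypothetical, and the percentile is assumed to be computed elsewhere, since `dataAnalysis` holds raw BMI values rather than percentiles.

```r
# Hypothetical helper: map a BMI percentile (0-100) to the four classes listed above.
classify_bmi = function(percentile) {
  cut(percentile,
      breaks = c(-Inf, 5, 85, 95, Inf),
      labels = c("Underweight", "Normal weight", "Overweight", "Obese"),
      right = FALSE)  # intervals are [lower, upper), so exactly 95 is "Obese"
}

# Made-up percentiles chosen to exercise each boundary
classes = as.character(classify_bmi(c(3, 5, 84.9, 85, 95)))
classes
```

Setting `right = FALSE` makes each interval closed on the left, which matches the "5th to < 85th" wording of the list.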

Cases :

Each case represents an eligible survey participant aged 2-150. Quality assurance for fairness and random sampling is controlled by the NCHS Research Data Center and the CDC. The dataset contains around 9,000 observations with recorded measurements (9,055 for BMI and 9,067 for height, per the summary statistics below).

Data collection :

The data were obtained from the Centers for Disease Control and Prevention (CDC) government database. They are collected in a joint effort by the CDC and the NCHS Research Data Center.

Type of study :

This is an observational study using 2013-2014 data.

Response :

The response variable is the BMI of each of our roughly 9,000 observations. It is numerical and continuous.

Explanatory :

The explanatory variable is the height of each of our roughly 9,000 observations. It is numerical and continuous.
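
With a numerical response and a numerical explanatory variable, one possible first check is a correlation and a simple linear regression. The sketch below is an assumption about how that could look, not the analysis performed here. It runs on simulated data (the data frame `sim` is hypothetical) so that it is self-contained, and in the simulation BMI is generated independently of height.

```r
set.seed(1)  # reproducible simulated data

# Simulated stand-in for dataAnalysis; these are NOT survey values
n = 200
sim = data.frame(bhi = rnorm(n, mean = 160, sd = 20))
sim$bmi = 25 + rnorm(n, sd = 5)   # generated independently of height
sim$bmi[c(3, 17)] = NA            # mimic missing measurements

# Pearson correlation using only rows where both values are present
r = cor(sim$bmi, sim$bhi, use = "complete.obs")

# Simple linear regression of the response (bmi) on the explanatory variable (bhi);
# with R's default na.action, lm() drops rows containing NA
fit = lm(bmi ~ bhi, data = sim)
slope = coef(fit)[["bhi"]]
```

Replacing `sim` with `dataAnalysis` would apply the same calls to the survey data; `summary(fit)` would then report the slope estimate and its standard error.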

Relevant Summary Statistics :

describe(dataAnalysis$bmi)
##    vars    n  mean   sd median trimmed  mad  min  max range skew kurtosis
## X1    1 9055 25.68 7.96   24.7   24.97 7.71 12.1 82.9  70.8 1.02        2
##      se
## X1 0.08
describe(dataAnalysis$bhi)
##    vars    n   mean    sd median trimmed   mad  min   max range  skew
## X1    1 9067 155.88 23.18    162  159.14 15.12 79.7 202.6 122.9 -1.24
##    kurtosis   se
## X1     0.96 0.24
ggplot(dataAnalysis,aes(x=bhi)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 746 rows containing non-finite values (stat_bin).

ggplot(dataAnalysis,aes(x=bmi)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 758 rows containing non-finite values (stat_bin).
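
The two messages above come from the default bin count and from missing values in each column. A hedged sketch of one way to address both follows: count the missing values, keep complete rows, and set `binwidth` explicitly. The data frame `toy` and the bin width of 5 are made-up choices for illustration.

```r
# Toy stand-in for dataAnalysis; values are made up for illustration
toy = data.frame(bmi = c(22.1, NA, 30.4, 18.9, NA, 27.3),
                 bhi = c(170.2, 165.0, NA, 120.5, NA, 181.0))

# Missing values per column; rows like these are what the histogram warnings report as removed
missing_counts = colSums(is.na(toy))

# Keep only rows where both measurements are present
complete = toy[complete.cases(toy), ]

# Histogram with an explicit bin width (5 is arbitrary), built only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p = ggplot2::ggplot(complete, ggplot2::aes(x = bhi)) +
    ggplot2::geom_histogram(binwidth = 5)
}
```

Note that `complete.cases()` drops a row if either variable is missing, so it can remove more rows than the per-variable filtering that `geom_histogram()` applies on its own.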