The dataset used here comes from the NHANES package. It contains survey data collected by the US National Center for Health Statistics (NCHS), which has conducted a series of health and nutrition surveys since the early 1960s. Since 1999, approximately 5,000 individuals of all ages have been interviewed in their homes every year and have completed the health examination component of the survey. The health examination is conducted in a mobile examination centre (MEC).
For the purpose of this study, we look at the dependent variable, BMI, and its relationship with five independent variables: age, gender, education, sleep hours, and income. We also look at whether self-reported general health (HealthGen) affects sleep hours. The variables are defined as follows:
Gender: Gender (sex) of study participant, coded as male or female.
Age: Age in years at screening of study participant. Note: subjects 80 years or older were recorded as 80.
Education: Educational level of study participant. Reported for participants aged 20 years or older. One of 8thGrade, 9-11thGrade, HighSchool, SomeCollege, or CollegeGrad.
HHIncome: Total annual gross income for the household in US dollars. One of 0-4999, 5000-9999, 10000-14999, 15000-19999, 20000-24999, 25000-34999, 35000-44999, 45000-54999, 55000-64999, 65000-74999, 75000-99999, or 100000 or more.
HealthGen: Self-reported rating of participant's health in general. Reported for participants aged 12 years or older. One of Excellent, Vgood, Good, Fair, or Poor.
SleepHrsNight: Self-reported number of hours of sleep the study participant usually gets at night on weekdays or workdays. Reported for participants aged 16 years and older.
BMI: Body mass index (weight/height² in kg/m²). Reported for participants aged 2 years or older.
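Because several of these variables are only reported above a certain age, they contain missing values for younger participants. The following is a minimal sketch, assuming the dplyr package is installed, of how the seven variables could be pulled out of the NHANES data frame and checked for missing values. The object names bmi_vars and na_counts are illustrative and not part of the original analysis.

```r
library(NHANES)
library(dplyr)

# Keep only the variables used in this study
# (bmi_vars is a hypothetical name chosen for this sketch)
bmi_vars <- NHANES %>%
  select(BMI, Age, Gender, Education, SleepHrsNight, HHIncome, HealthGen)

# Count missing values per column. Education, SleepHrsNight and HealthGen
# are only reported above certain ages, so they are NA for younger participants.
na_counts <- colSums(is.na(bmi_vars))
print(na_counts)
```

Checking the missing counts before modelling helps show how many rows would be lost if incomplete cases were dropped.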
library(ggplot2)
library(ggthemes)
library(Zelig)
library(ggrepel)
library(tidyverse)
library(NHANES)
str(NHANES)
## Classes 'tbl_df', 'tbl' and 'data.frame': 10000 obs. of 76 variables:
## $ ID : int 51624 51624 51624 51625 51630 51638 51646 51647 51647 51647 ...
## $ SurveyYr : Factor w/ 2 levels "2009_10","2011_12": 1 1 1 1 1 1 1 1 1 1 ...
## $ Gender : Factor w/ 2 levels "female","male": 2 2 2 2 1 2 2 1 1 1 ...
## $ Age : int 34 34 34 4 49 9 8 45 45 45 ...
## $ AgeDecade : Factor w/ 8 levels " 0-9"," 10-19",..: 4 4 4 1 5 1 1 5 5 5 ...
## $ AgeMonths : int 409 409 409 49 596 115 101 541 541 541 ...
## $ Race1 : Factor w/ 5 levels "Black","Hispanic",..: 4 4 4 5 4 4 4 4 4 4 ...
## $ Race3 : Factor w/ 6 levels "Asian","Black",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Education : Factor w/ 5 levels "8th Grade","9 - 11th Grade",..: 3 3 3 NA 4 NA NA 5 5 5 ...
## $ MaritalStatus : Factor w/ 6 levels "Divorced","LivePartner",..: 3 3 3 NA 2 NA NA 3 3 3 ...
## $ HHIncome : Factor w/ 12 levels " 0-4999"," 5000-9999",..: 6 6 6 5 7 11 9 11 11 11 ...
## $ HHIncomeMid : int 30000 30000 30000 22500 40000 87500 60000 87500 87500 87500 ...
## $ Poverty : num 1.36 1.36 1.36 1.07 1.91 1.84 2.33 5 5 5 ...
## $ HomeRooms : int 6 6 6 9 5 6 7 6 6 6 ...
## $ HomeOwn : Factor w/ 3 levels "Own","Rent","Other": 1 1 1 1 2 2 1 1 1 1 ...
## $ Work : Factor w/ 3 levels "Looking","NotWorking",..: 2 2 2 NA 2 NA NA 3 3 3 ...
## $ Weight : num 87.4 87.4 87.4 17 86.7 29.8 35.2 75.7 75.7 75.7 ...
## $ Length : num NA NA NA NA NA NA NA NA NA NA ...
## $ HeadCirc : num NA NA NA NA NA NA NA NA NA NA ...
## $ Height : num 165 165 165 105 168 ...
## $ BMI : num 32.2 32.2 32.2 15.3 30.6 ...
## $ BMICatUnder20yrs: Factor w/ 4 levels "UnderWeight",..: NA NA NA NA NA NA NA NA NA NA ...
## $ BMI_WHO : Factor w/ 4 levels "12.0_18.5","18.5_to_24.9",..: 4 4 4 1 4 1 2 3 3 3 ...
## $ Pulse : int 70 70 70 NA 86 82 72 62 62 62 ...
## $ BPSysAve : int 113 113 113 NA 112 86 107 118 118 118 ...
## $ BPDiaAve : int 85 85 85 NA 75 47 37 64 64 64 ...
## $ BPSys1 : int 114 114 114 NA 118 84 114 106 106 106 ...
## $ BPDia1 : int 88 88 88 NA 82 50 46 62 62 62 ...
## $ BPSys2 : int 114 114 114 NA 108 84 108 118 118 118 ...
## $ BPDia2 : int 88 88 88 NA 74 50 36 68 68 68 ...
## $ BPSys3 : int 112 112 112 NA 116 88 106 118 118 118 ...
## $ BPDia3 : int 82 82 82 NA 76 44 38 60 60 60 ...
## $ Testosterone : num NA NA NA NA NA NA NA NA NA NA ...
## $ DirectChol : num 1.29 1.29 1.29 NA 1.16 1.34 1.55 2.12 2.12 2.12 ...
## $ TotChol : num 3.49 3.49 3.49 NA 6.7 4.86 4.09 5.82 5.82 5.82 ...
## $ UrineVol1 : int 352 352 352 NA 77 123 238 106 106 106 ...
## $ UrineFlow1 : num NA NA NA NA 0.094 ...
## $ UrineVol2 : int NA NA NA NA NA NA NA NA NA NA ...
## $ UrineFlow2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ Diabetes : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ DiabetesAge : int NA NA NA NA NA NA NA NA NA NA ...
## $ HealthGen : Factor w/ 5 levels "Excellent","Vgood",..: 3 3 3 NA 3 NA NA 2 2 2 ...
## $ DaysPhysHlthBad : int 0 0 0 NA 0 NA NA 0 0 0 ...
## $ DaysMentHlthBad : int 15 15 15 NA 10 NA NA 3 3 3 ...
## $ LittleInterest : Factor w/ 3 levels "None","Several",..: 3 3 3 NA 2 NA NA 1 1 1 ...
## $ Depressed : Factor w/ 3 levels "None","Several",..: 2 2 2 NA 2 NA NA 1 1 1 ...
## $ nPregnancies : int NA NA NA NA 2 NA NA 1 1 1 ...
## $ nBabies : int NA NA NA NA 2 NA NA NA NA NA ...
## $ Age1stBaby : int NA NA NA NA 27 NA NA NA NA NA ...
## $ SleepHrsNight : int 4 4 4 NA 8 NA NA 8 8 8 ...
## $ SleepTrouble : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
## $ PhysActive : Factor w/ 2 levels "No","Yes": 1 1 1 NA 1 NA NA 2 2 2 ...
## $ PhysActiveDays : int NA NA NA NA NA NA NA 5 5 5 ...
## $ TVHrsDay : Factor w/ 7 levels "0_hrs","0_to_1_hr",..: NA NA NA NA NA NA NA NA NA NA ...
## $ CompHrsDay : Factor w/ 7 levels "0_hrs","0_to_1_hr",..: NA NA NA NA NA NA NA NA NA NA ...
## $ TVHrsDayChild : int NA NA NA 4 NA 5 1 NA NA NA ...
## $ CompHrsDayChild : int NA NA NA 1 NA 0 6 NA NA NA ...
## $ Alcohol12PlusYr : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
## $ AlcoholDay : int NA NA NA NA 2 NA NA 3 3 3 ...
## $ AlcoholYear : int 0 0 0 NA 20 NA NA 52 52 52 ...
## $ SmokeNow : Factor w/ 2 levels "No","Yes": 1 1 1 NA 2 NA NA NA NA NA ...
## $ Smoke100 : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
## $ Smoke100n : Factor w/ 2 levels "Non-Smoker","Smoker": 2 2 2 NA 2 NA NA 1 1 1 ...
## $ SmokeAge : int 18 18 18 NA 38 NA NA NA NA NA ...
## $ Marijuana : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
## $ AgeFirstMarij : int 17 17 17 NA 18 NA NA 13 13 13 ...
## $ RegularMarij : Factor w/ 2 levels "No","Yes": 1 1 1 NA 1 NA NA 1 1 1 ...
## $ AgeRegMarij : int NA NA NA NA NA NA NA NA NA NA ...
## $ HardDrugs : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
## $ SexEver : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
## $ SexAge : int 16 16 16 NA 12 NA NA 13 13 13 ...
## $ SexNumPartnLife : int 8 8 8 NA 10 NA NA 20 20 20 ...
## $ SexNumPartYear : int 1 1 1 NA 1 NA NA 0 0 0 ...
## $ SameSex : Factor w/ 2 levels "No","Yes": 1 1 1 NA 2 NA NA 2 2 2 ...
## $ SexOrientation : Factor w/ 3 levels "Bisexual","Heterosexual",..: 2 2 2 NA 2 NA NA 1 1 1 ...
## $ PregnantNow : Factor w/ 3 levels "Yes","No","Unknown": NA NA NA NA NA NA NA NA NA NA ...
ggplot(data = NHANES) + geom_point(aes(x = BMI, y = HHIncome))
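Since HHIncome is a factor, the points in the plot above fall on twelve horizontal bands (plus one for missing income) and overlap heavily. As one possible alternative, and not the plot used in the original analysis, a boxplot per income bracket may make the BMI distributions easier to compare. The sketch below assumes dplyr is available; plot_data and p are hypothetical names.

```r
library(NHANES)
library(dplyr)
library(ggplot2)

# Drop rows where either variable is missing
# (plot_data is a hypothetical name chosen for this sketch)
plot_data <- NHANES %>%
  filter(!is.na(BMI), !is.na(HHIncome))

# One box per income bracket; coord_flip() turns the boxes sideways
# so the bracket labels are listed down the side and stay readable
p <- ggplot(plot_data, aes(x = HHIncome, y = BMI)) +
  geom_boxplot() +
  coord_flip() +
  labs(x = "Household income (US dollars)", y = "BMI (kg/m^2)")
print(p)
```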