The National Health and Nutrition Examination Survey (NHANES) data, as subsetted for use in Hosmer and Lemeshow’s (2000) Applied Logistic Regression, Second Edition, was obtained for use in this analysis. This data is compiled by the Centers for Disease Control and Prevention (CDC) to assess the health and nutritional status of both adults and children in the United States. Additional information regarding NHANES can be found at https://www.cdc.gov/nchs/nhanes/about_nhanes.htm.
Prior to reading the data into RStudio, the accompanying data definitions were reviewed to obtain a preliminary understanding of each variable's data type (categorical, numerical, logical). A copy of the category definitions and data details is included below for easy reference.
include_url("https://nlepera.github.io/sta553/w10_tableau_dashboard/data/nhanes_details.pdf")
Once the preliminary definitions were reviewed, the NHANES data was downloaded and read into RStudio to create a data frame entitled NHANES.raw.
NHANES.raw <- read.csv("https://nlepera.github.io/sta553/w10_tableau_dashboard/data/nhanes.csv")
str(NHANES.raw)
'data.frame': 7927 obs. of 16 variables:
$ X : int 3 5 6 7 10 12 15 19 20 22 ...
$ obs : int 9 11 19 34 48 51 55 70 71 73 ...
$ psu : int 2 2 1 1 1 1 2 1 1 1 ...
$ stratum : int 43 40 35 13 24 28 44 9 43 29 ...
$ stat.weight: num 19452 1246 3861 5032 26919 ...
$ age : int 48 48 44 42 56 44 48 63 37 42 ...
$ sex : int 0 1 1 0 0 1 1 1 0 1 ...
$ race : int 1 1 2 2 1 1 1 2 1 3 ...
$ body.weight: num 150 155 190 126 240 ...
$ height : num 61.8 66.2 70.2 62.6 67.6 71.1 68 67.8 61.7 68.4 ...
$ avg.sbp : int 131 120 133 100 128 130 155 137 128 148 ...
$ avg.dbp : int 73 70 85 67 73 86 91 68 70 83 ...
$ past.smk : int 1 1 1 1 1 1 1 1 1 1 ...
$ current.smk: int 2 2 1 1 2 1 1 1 1 1 ...
$ cholestrol : int 236 260 187 216 156 162 212 186 212 267 ...
$ hbp : int 0 0 0 0 0 0 1 0 0 1 ...
DT::datatable(NHANES.raw, fillContainer = TRUE, options = list(pageLength = 10, scrollY = "100vh"))
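One way to cross-check a data frame against its data definitions is to tabulate each column's storage class alongside its number of distinct values. Integer columns with only a handful of distinct values (such as sex or hbp) are usually coded categories rather than measurements. The sketch below is illustrative only: `raw_head` and `profile` are hypothetical names, and the values are the first five entries of selected columns copied from the str() output above, not the full data set.

```r
# First five values of selected columns, copied from the str(NHANES.raw) output
raw_head <- data.frame(
  age    = c(48L, 48L, 44L, 42L, 56L),
  sex    = c(0L, 1L, 1L, 0L, 0L),
  race   = c(1L, 1L, 2L, 2L, 1L),
  height = c(61.8, 66.2, 70.2, 62.6, 67.6),
  hbp    = c(0L, 0L, 0L, 0L, 0L)
)

# One row per column: its storage class and how many distinct values it holds
profile <- data.frame(
  column     = names(raw_head),
  class      = vapply(raw_head, function(x) class(x)[1], character(1)),
  n_distinct = vapply(raw_head, function(x) length(unique(x)), integer(1)),
  row.names  = NULL
)

print(profile)
```

On the full data frame, the same two `vapply()` calls could be pointed at NHANES.raw to list which integer columns are candidates for recoding.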
As described in the data definitions document, the first two columns of NHANES.raw, NHANES.raw$X and NHANES.raw$obs, contain junk data that must be removed. Both columns are observation IDs that serve no analytic purpose. A new data frame entitled NHANES was created as a subset of NHANES.raw with the observation IDs removed and the integer-coded categorical variables (sex, race, past.smk, current.smk, and hbp) recoded to descriptive labels.
library(dplyr)  # provides %>%, mutate(), and case_when()

# Drop the two observation-ID columns, then recode the integer codes as labels
NHANES <- subset(NHANES.raw, select = -c(X, obs)) %>%
  mutate(sex =
           case_when(sex == 0 ~ 'Female',
                     sex == 1 ~ 'Male'),
         race =
           case_when(race == 1 ~ 'White',
                     race == 2 ~ 'Black',
                     race == 3 ~ 'Other'),
         past.smk =
           case_when(past.smk == 1 ~ 'Yes',
                     past.smk == 2 ~ 'No'),
         current.smk =
           case_when(current.smk == 1 ~ 'Yes',
                     current.smk == 2 ~ 'No'),
         hbp =
           case_when(hbp == 0 ~ 'No',
                     hbp == 1 ~ 'Yes'))
str(NHANES)
'data.frame': 7927 obs. of 14 variables:
$ psu : int 2 2 1 1 1 1 2 1 1 1 ...
$ stratum : int 43 40 35 13 24 28 44 9 43 29 ...
$ stat.weight: num 19452 1246 3861 5032 26919 ...
$ age : int 48 48 44 42 56 44 48 63 37 42 ...
$ sex : chr "Female" "Male" "Male" "Female" ...
$ race : chr "White" "White" "Black" "Black" ...
$ body.weight: num 150 155 190 126 240 ...
$ height : num 61.8 66.2 70.2 62.6 67.6 71.1 68 67.8 61.7 68.4 ...
$ avg.sbp : int 131 120 133 100 128 130 155 137 128 148 ...
$ avg.dbp : int 73 70 85 67 73 86 91 68 70 83 ...
$ past.smk : chr "Yes" "Yes" "Yes" "Yes" ...
$ current.smk: chr "No" "No" "Yes" "Yes" ...
$ cholestrol : int 236 260 187 216 156 162 212 186 212 267 ...
$ hbp : chr "No" "No" "No" "No" ...
DT::datatable(NHANES, fillContainer = TRUE, options = list(pageLength = 10, scrollY = "100vh"))
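Because case_when() returns NA for any value that matches none of its conditions, an unexpected code in the raw data would silently become a missing label. A minimal base R sketch of one way to check for this follows. It uses factor() as a stand-in for the case_when() calls above, so it is not the method used in this analysis; `codes`, `labelled`, and `na_counts` are hypothetical names, and the values are the first ten entries of three columns copied from the str(NHANES.raw) output.

```r
# First ten values of three coded columns, copied from the str(NHANES.raw) output
codes <- data.frame(
  sex         = c(0, 1, 1, 0, 0, 1, 1, 1, 0, 1),
  race        = c(1, 1, 2, 2, 1, 1, 1, 2, 1, 3),
  current.smk = c(2, 2, 1, 1, 2, 1, 1, 1, 1, 1)
)

# factor() maps each declared level to its label; undeclared codes become NA
labelled <- data.frame(
  sex         = factor(codes$sex, levels = c(0, 1), labels = c("Female", "Male")),
  race        = factor(codes$race, levels = 1:3, labels = c("White", "Black", "Other")),
  current.smk = factor(codes$current.smk, levels = c(1, 2), labels = c("Yes", "No"))
)

# A non-zero count in any column would flag a code missing from the mapping
na_counts <- colSums(is.na(labelled))
print(na_counts)

# Frequency table of the recoded race column
print(table(labelled$race))
```

The same `colSums(is.na(...))` check could be run directly on the recoded NHANES data frame to confirm that every code was mapped.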
This cleaned data was then exported and uploaded to both GitHub for retention and Tableau for transformation into a dashboard and story point.
write.xlsx(NHANES, "C:/Users/natal/OneDrive/Documents/School Files/Spring 2024/STA553 - Data Visualization/sta553/w10_tableau_dashboard/data/NHANES_clean.xlsx", row.names=FALSE )
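The export call above writes to an absolute path on one machine. A hedged sketch of a more portable variant is shown below, assuming a CSV file is acceptable in place of the Excel workbook. It swaps in base R's write.csv() and a temporary directory, neither of which was used in this analysis; `NHANES_demo`, `out_path`, and `round_trip` are hypothetical names, and the three demo rows are copied from the str(NHANES) output.

```r
# Stand-in for the cleaned data frame: first three rows of two columns from str(NHANES)
NHANES_demo <- data.frame(
  age = c(48L, 48L, 44L),
  sex = c("Female", "Male", "Male")
)

# Build the output path from parts instead of hard-coding a machine-specific location
out_path <- file.path(tempdir(), "NHANES_clean.csv")
write.csv(NHANES_demo, out_path, row.names = FALSE)

# Read the file back to confirm that no rows or columns were lost in the export
round_trip <- read.csv(out_path)
print(dim(round_trip))
```

In a project directory, `tempdir()` could be replaced with a relative folder such as "data" so the same script runs unchanged on another machine.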
Prior to creating any data visualization from the NHANES data loaded into Tableau, the associated column names were manually updated per the data definitions outlined in section 1.2. A custom column entitled Smoker Status was created using the conditional outcomes outlined in item 14 of the data definitions.
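The Smoker Status column was built in Tableau, but the same kind of conditional column can be sketched in R. Item 14 of the data definitions is not reproduced in this report, so the three-level rule below (Current, Former, Never) is an assumed, illustrative mapping rather than the actual definition; `smk` and `smoker.status` are hypothetical names, and the five rows are copied from the str(NHANES) output.

```r
# First five values of the two smoking columns, copied from the str(NHANES) output
smk <- data.frame(
  past.smk    = c("Yes", "Yes", "Yes", "Yes", "Yes"),
  current.smk = c("No", "No", "Yes", "Yes", "No")
)

# ASSUMED rule (not taken from the data definitions):
#   smokes now                     -> "Current"
#   smoked before, not smoking now -> "Former"
#   otherwise                      -> "Never"
smk$smoker.status <- ifelse(
  smk$current.smk == "Yes", "Current",
  ifelse(smk$past.smk == "Yes", "Former", "Never")
)

print(smk)
```

If the rule in item 14 differs, only the nested `ifelse()` conditions and labels would need to change.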
include_url("https://prod-useast-b.online.tableau.com/t/nl10316570740a22d2e/views/NHANESHealthData/SmokingDashboard")
backup link: https://prod-useast-b.online.tableau.com/t/nl10316570740a22d2e/views/NHANESHealthData/SmokingDashboard
include_url("https://prod-useast-b.online.tableau.com/t/nl10316570740a22d2e/views/NHANESHealthData/SmokingHealth/cb399a51-270e-4a8f-bd5a-f823c572f2ba/4832462d-0878-4cb4-84c1-8180523a00de")