`r Sys.Date()`

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
In this assignment, you will use NHANES data accessed through the `nhanesA` package.
The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. Started in the early 1960s, the NHANES program has consisted of a series of surveys concentrating on various population groups or health themes. In 1999 the survey became a continuous program with a changing focus on a variety of health and nutrition measurements. Continuous NHANES is conducted in two-year cycles (1999-2000, 2001-2002, and so on).
There are five types of continuous NHANES survey data available to the public:

- Demographics
- Dietary
- Examination
- Laboratory
- Questionnaire
You can learn more about NHANES data at the CDC website.
Load the two packages, `nhanesA` and `dplyr`:

```{r}
library(nhanesA)
library(dplyr)
```
### [Point 1] Check data
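Check the Laboratory Data in survey cycle 2017-2018 using the
`nhanesTables()` function.

```{r}
# nhanesTables() takes an upper-case survey group name ("LABORATORY" or "LAB")
# and a year; 2017 selects the 2017-2018 cycle
lab_data <- nhanesTables("LABORATORY", 2017)
head(lab_data)
```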
Import the Demographic Data in survey cycle 2011-2012 using the `nhanes()`
function. Assign the data to the object named `demo`.

```{r}
# nhanes() takes a table name, not a year; the suffix _G marks the 2011-2012 cycle
demo <- nhanes("DEMO_G")
```
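NHANES table names encode the survey cycle as a letter suffix, so the 2011-2012 demographics table is `DEMO_G` and the blood pressure table is `BPX_G`. The sketch below builds such names from a prefix and the first year of a cycle. `nhanes_table_name()` is a hypothetical helper, and it assumes the common pattern (no suffix for 1999-2000, then `_B` through `_J` for 2001-2002 through 2017-2018). Some tables were renamed between cycles, so it is worth confirming a name with `nhanesTables()` before importing.

```r
# Hypothetical helper: build a continuous-NHANES table name from a table
# prefix and the first year of its two-year cycle (1999 through 2017).
nhanes_table_name <- function(prefix, start_year) {
  cycle_starts <- seq(1999, 2017, by = 2)
  if (!start_year %in% cycle_starts) {
    stop("start_year must be an odd year between 1999 and 2017")
  }
  idx <- match(start_year, cycle_starts)  # 1 for 1999-2000, 2 for 2001-2002, ...
  if (idx == 1) {
    return(prefix)                        # the first cycle has no suffix
  }
  paste0(prefix, "_", LETTERS[idx])       # 2001-2002 -> B, ..., 2017-2018 -> J
}

nhanes_table_name("DEMO", 2011)  # "DEMO_G"
nhanes_table_name("BPX", 2011)   # "BPX_G"
```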
Show the dimensions and the first and last 3 rows of the `demo` table.

```{r}
dim(demo)
head(demo, 3)
tail(demo, 3)
```
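Translate the encoded variables using the `nhanesTranslate()`
function and assign the result to the object named `demo_translate`.

```{r}
# Passing `data` returns the table with coded values replaced by labels;
# without it, nhanesTranslate() returns only the code tables.
# SEQN and RIDAGEYR are numeric, so only RIAGENDR needs translating.
# (Recent nhanesA versions may already return translated values from nhanes().)
demo_translate <- nhanesTranslate("DEMO_G", colnames = "RIAGENDR", data = demo)
```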
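Conceptually, translation replaces each numeric code with the label from that variable's code table. The toy sketch below mimics this in base R with `factor()`. The data are made up, and the code table follows the NHANES coding for RIAGENDR (1 = Male, 2 = Female). It illustrates the idea only and is not how `nhanesTranslate()` is implemented internally.

```r
# Made-up coded data in the NHANES layout
demo_toy <- data.frame(SEQN = c(101, 102, 103), RIAGENDR = c(1, 2, 2))

# A code table like those published in the NHANES variable documentation
gender_codes <- data.frame(code = c(1, 2), value = c("Male", "Female"))

# Replace codes with labels; a code absent from the table would become NA
demo_toy$RIAGENDR <- factor(demo_toy$RIAGENDR,
                            levels = gender_codes$code,
                            labels = gender_codes$value)
demo_toy
```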
Load the 'Blood Pressure' table, which is part of the Examination survey
data, from NHANES 2011-2012. Assign it to a new object named `bpx`.
Show the first 3 rows of `bpx`.

```{r}
bpx <- nhanes("BPX_G")
head(bpx, 3)
```
How many missing values are there for the 'gender' attribute?

```{r}
missing_gender_count <- sum(is.na(demo$RIAGENDR))
missing_gender_count
```
How many missing values are there for each column?

```{r}
if (is.data.frame(demo)) {
  colSums(is.na(demo))
} else {
  print("The object 'demo' is not a data frame.")
}
```
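Both answers rely on `is.na()` returning a logical vector or matrix, where `TRUE` counts as 1 when summed. A small made-up table shows the expected shape of each result; `colMeans()` on the same matrix gives the share of missing values per column instead of the count.

```r
# Made-up table with known missing values
toy <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "x"))

sum(is.na(toy$a))     # missing values in one column: 1
colSums(is.na(toy))   # per column: a = 1, b = 2
colMeans(is.na(toy))  # per-column share missing: a = 0.333..., b = 0.666...
```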
## Translate encoding
For the `bpx` dataset, we want to keep only the following variables
of primary interest:
- SEQN (Respondent sequence number)
- PEASCST1 (Blood Pressure Status)
- PEASCTM1 (Blood Pressure Time in Seconds)
- BPXSY1 (Systolic: Blood pres (1st rdg) mm Hg)
- BPXDI1 (Diastolic: Blood pres (1st rdg) mm Hg)
### [Point 0.5]
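Translate the encoded variables using the `nhanesTranslate()`
function and assign the result to a new object named `bpx_translate`.
Show the top 3 rows of `bpx_translate`.

```{r}
# PEASCST1 (Blood Pressure Status) is the coded variable among those of interest
bpx_translate <- nhanesTranslate("BPX_G", colnames = "PEASCST1", data = bpx)
head(bpx_translate, 3)
```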
Subset the table to only the variables we are interested in and assign
it to a new object named `new_bpx`. Show the first 6 rows of the
`new_bpx` table.

```{r}
new_bpx <- bpx[, c("SEQN", "PEASCST1", "PEASCTM1", "BPXSY1", "BPXDI1")]
head(new_bpx, 6)
```
Rename the variables in the following way:

- SEQN → id
- PEASCST1 → bp_status
- PEASCTM1 → bpt_sec
- BPXSY1 → systolic
- BPXDI1 → diastolic
Assign this updated version of the table as `new_bpx` and show the first
6 rows of `new_bpx`.

```{r}
names(new_bpx) <- c("id", "bp_status", "bpt_sec", "systolic", "diastolic")
head(new_bpx, 6)
```
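Assigning `names()` positionally depends on the columns being in exactly the order of the subset above. A sketch of a name-based alternative uses a lookup vector indexed by the current names, so it keeps working if the column order changes. The two rows below are made up; a column missing from the lookup would receive an `NA` name, which is easy to spot. With dplyr loaded, `rename(id = SEQN, ...)` achieves the same by name.

```r
# Made-up rows with the same columns as new_bpx before renaming
toy_bpx <- data.frame(SEQN = c(1, 2), PEASCST1 = c(1, 1),
                      PEASCTM1 = c(620, 720), BPXSY1 = c(118, 132),
                      BPXDI1 = c(70, 84))

# Lookup vector: old name -> new name
new_names <- c(SEQN = "id", PEASCST1 = "bp_status", PEASCTM1 = "bpt_sec",
               BPXSY1 = "systolic", BPXDI1 = "diastolic")

# Index the lookup by the current names, so column order does not matter
names(toy_bpx) <- unname(new_names[names(toy_bpx)])
names(toy_bpx)
```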
Keep only the top 10 rows and assign them to the object named `final_bpx`.

```{r}
final_bpx <- head(new_bpx, 10)
```
From the `final_bpx` table, remove all the rows with any missing values.
Assign this subset of the table as `bpx_complete`. Display the dimensions
of `bpx_complete`.

```{r}
bpx_complete <- na.omit(final_bpx)
dim(bpx_complete)
```
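`na.omit()` drops a row if any column is missing. On a made-up table, the sketch below checks the resulting dimensions and shows `complete.cases()`, which returns the same keep/drop decision as a logical vector and can be limited to selected columns.

```r
# Made-up table: row 2 lacks systolic, row 3 lacks diastolic
toy <- data.frame(id = 1:4,
                  systolic = c(120, NA, 130, 110),
                  diastolic = c(80, 75, NA, 70))

complete_toy <- na.omit(toy)
dim(complete_toy)  # 2 3: rows 1 and 4 remain

# Same filter expressed as a logical index
identical(complete.cases(toy), !is.na(toy$systolic) & !is.na(toy$diastolic))  # TRUE

# Drop rows only when systolic is missing
toy[complete.cases(toy[, "systolic", drop = FALSE]), ]
```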
Re-order the rows in the `final_bpx` table by Blood Pressure
Time in Seconds (`bpt_sec`) in descending order.

```{r}
final_bpx <- final_bpx[order(-final_bpx$bpt_sec), ]
head(final_bpx)
```
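Negating the column inside `order()` works for numeric columns such as `bpt_sec`; `order(..., decreasing = TRUE)` gives the same order and also works for character columns. In both forms, rows with a missing value go last by default (`na.last = TRUE`). A made-up example:

```r
# Made-up timings; id 2 has a missing value
toy <- data.frame(id = 1:4, bpt_sec = c(600, NA, 900, 750))

toy[order(-toy$bpt_sec), "id"]                    # 3 4 1 2
toy[order(toy$bpt_sec, decreasing = TRUE), "id"]  # 3 4 1 2
```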
Create a new variable called rescale_bpt_min that records the Blood
Pressure Time in minutes. Keep both original and new variables/columns.
```{r}
final_bpx$rescale_bpt_min <- final_bpx$bpt_sec / 60
head(final_bpx)
```
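Base R's `transform()` is another way to add a derived column while keeping the existing ones. A minimal sketch on made-up timings:

```r
toy <- data.frame(id = 1:3, bpt_sec = c(600, 750, 900))

# Add minutes alongside the original seconds column
toy <- transform(toy, rescale_bpt_min = bpt_sec / 60)
toy$rescale_bpt_min  # 10.0 12.5 15.0
names(toy)           # "id" "bpt_sec" "rescale_bpt_min"
```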
### [Point 1] Keep only new
Create the `rescale_bpt_min` variable in the same way as above and
**only keep new columns**. (Try to avoid using `select()`)

```{r}
# .keep = "none" keeps only the newly created column
bpx_new <- final_bpx %>%
  mutate(rescale_bpt_min = bpt_sec / 60, .keep = "none")
head(bpx_new)
```
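Without dplyr, the same result can be sketched by evaluating the expression inside the table with `with()` and wrapping it in a fresh data frame; no column selection is needed because nothing else is carried over. The data below are made up.

```r
toy <- data.frame(id = 1:3, bpt_sec = c(600, 750, 900))

# with() evaluates bpt_sec / 60 using the table's columns;
# data.frame() then holds only the new column
only_new <- with(toy, data.frame(rescale_bpt_min = bpt_sec / 60))
only_new
```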
Filter the `final_bpx` table to rows where `systolic` is above 120, add a new
column (`bpt_min`) that records the Blood Pressure Time in minutes, and
summarize the result using a pipe operator (`%>%` or `|>`).

```{r}
final_bpx %>%
  filter(systolic > 120) %>%
  mutate(bpt_min = bpt_sec / 60) %>%
  summary()
```
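The same filter, derive, and summarize chain can be written with R's native pipe `|>` (R 4.1 or later) and the base functions `subset()` and `transform()`. A sketch on made-up readings:

```r
# Made-up readings
toy <- data.frame(systolic = c(118, 132, 145, 121),
                  bpt_sec = c(600, 720, 900, 660))

result <- toy |>
  subset(systolic > 120) |>          # keep rows with systolic above 120
  transform(bpt_min = bpt_sec / 60)  # add the time in minutes

summary(result)
nrow(result)  # 3
```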
Step 1: Select the following columns from the `demo_translate` table:

- SEQN
- RIAGENDR
- RIDAGEYR

Step 2: Rename the selected columns to:

- id
- gender
- age

Step 3: Join the subset of the Demography survey table from Step 2
with `new_bpx`, using `id` as the key.

You can refer to the data wrangling cheat sheet for these operations.

Step 4: Subset the table to participants aged over 65.

Step 5: Summarize the average age, systolic, and diastolic by gender.
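The dimensions of the final table should be 2 x 4.

```{r}
# Steps 1-2: select the demographic columns and rename them
demographics <- demo_translate %>%
  select(SEQN, RIAGENDR, RIDAGEYR) %>%
  rename(id = SEQN, gender = RIAGENDR, age = RIDAGEYR)
# Step 3: join with new_bpx on the shared key `id`
joined_data <- demographics %>% inner_join(new_bpx, by = "id")
# Step 4: keep participants older than 65
filtered_data <- joined_data %>% filter(age > 65)
# Step 5: mean age, systolic, and diastolic by gender (expected 2 x 4)
summary_by_gender <- filtered_data %>%
  group_by(gender) %>%
  summarise(avg_age = mean(age, na.rm = TRUE),
            avg_systolic = mean(systolic, na.rm = TRUE),
            avg_diastolic = mean(diastolic, na.rm = TRUE))
summary_by_gender
```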
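The same pipeline can be sketched in base R on made-up data to check the expected 2 x 4 shape: `merge()` performs an inner join by default, and `aggregate()` computes the per-gender means. One difference worth knowing: the formula interface of `aggregate()` drops rows with a missing value in any summarized column (its default `na.action = na.omit`), whereas `mean(..., na.rm = TRUE)` skips missing values column by column.

```r
# Made-up demographic and blood pressure tables sharing the key `id`
demo_toy <- data.frame(id = 1:6,
                       gender = c("Male", "Female", "Male", "Female", "Male", "Female"),
                       age = c(70, 68, 40, 80, 66, 30))
bpx_toy <- data.frame(id = c(1L, 2L, 4L, 5L, 6L),
                      systolic = c(140, 130, 150, 120, 110),
                      diastolic = c(80, 70, 90, 60, 65))

joined <- merge(demo_toy, bpx_toy, by = "id")  # inner join: id 3 is dropped
older <- joined[joined$age > 65, ]             # keep participants over 65

# Per-gender means; groups come back sorted (Female, then Male)
summary_toy <- aggregate(cbind(age, systolic, diastolic) ~ gender,
                         data = older, FUN = mean)
summary_toy
dim(summary_toy)  # 2 4
```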