Section 6.5

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(dslabs)
library(NHANES)
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

For these exercises, we will be using the NHANES data.

library(NHANES)
data(NHANES)

1. We will provide some basic facts about blood pressure. First, let’s select a group to set the standard. We will use 20-to-29-year-old females. AgeDecade is a categorical variable with these ages. Note that the category is coded like " 20-29", with a space in front! Use the data.table package to compute the average and standard deviation of systolic blood pressure as saved in the BPSysAve variable. Save it to a variable called ref.

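data(NHANES)
# setDT() converts NHANES to a data.table by reference, so nhdat and NHANES are the same object
nhdat <- setDT(NHANES)
ref <- nhdat[AgeDecade %in% " 20-29" & Gender %in% "female",
             .(average = mean(BPSysAve, na.rm = TRUE),
               standard_deviation = sd(BPSysAve, na.rm = TRUE))]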
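
The leading space in the AgeDecade levels is easy to miss, so it can help to check the levels before filtering. The following is a minimal sketch on a made-up table (`toy` and its values are hypothetical stand-ins, not NHANES data) showing that the label without the space matches nothing, while the label with the space selects the intended rows.

```r
library(data.table)

# Hypothetical toy data standing in for NHANES (values are made up)
toy <- data.table(
  AgeDecade = factor(c(" 20-29", " 20-29", " 30-39", NA)),
  Gender    = factor(c("female", "female", "female", "male")),
  BPSysAve  = c(110, NA, 120, 130)
)

# Inspect the factor levels: each one starts with a space
age_levels <- levels(toy$AgeDecade)
age_levels

# Filtering on the label without the space selects no rows
n_no_space <- toy[AgeDecade %in% "20-29", .N]
n_no_space

# Filtering on the exact level selects both 20-29 females;
# na.rm = TRUE drops the missing reading before averaging
res <- toy[AgeDecade %in% " 20-29" & Gender %in% "female",
           .(average = mean(BPSysAve, na.rm = TRUE), n = .N)]
res
```

Running `levels(nhdat$AgeDecade)` on the real data should reveal the same leading space.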

2. Report the min and max values for the same group.

refmm <- nhdat[AgeDecade %in% " 20-29" & Gender %in% "female",
               .(minBP = min(BPSysAve, na.rm = TRUE), maxBP = max(BPSysAve, na.rm = TRUE))]

3. Compute the average and standard deviation for females, but for each age group separately rather than a selected decade as in question 1. Note that the age groups are defined by AgeDecade.

reff <- nhdat[Gender %in% "female",
              .(average = mean(BPSysAve, na.rm = TRUE), standard_deviation = sd(BPSysAve, na.rm = TRUE)),
              by = AgeDecade]
reff
##    AgeDecade   average standard_deviation
##       <fctr>     <num>              <num>
## 1:     40-49 115.49385          14.530054
## 2:     10-19 104.27466           9.461431
## 3:     50-59 121.84245          16.179333
## 4:       0-9  99.95041           9.071798
## 5:     60-69 127.17787          17.125713
## 6:     20-29 108.42243          10.146681
## 7:     30-39 111.25512          12.314790
## 8:       70+ 133.51652          19.841781
## 9:      <NA> 141.54839          22.908521
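
The table above lists groups in order of first appearance and includes a row for records whose AgeDecade is missing. If a sorted table without the missing-age group is preferred, one option is to filter with `!is.na()` and use `keyby` instead of `by`. This is a sketch on made-up data (`toy` is hypothetical), not part of the original solution.

```r
library(data.table)

# Hypothetical toy data (values are made up)
toy <- data.table(
  AgeDecade = factor(c(" 20-29", " 0-9", NA, " 20-29", " 0-9"),
                     levels = c(" 0-9", " 20-29")),
  Gender    = "female",
  BPSysAve  = c(108, 100, 140, 110, NA)
)

# Drop rows with a missing age group, then summarize;
# keyby sorts the result by AgeDecade (factor level order)
by_age <- toy[Gender %in% "female" & !is.na(AgeDecade),
              .(average = mean(BPSysAve, na.rm = TRUE),
                standard_deviation = sd(BPSysAve, na.rm = TRUE)),
              keyby = AgeDecade]
by_age
```

Note that a group with a single non-missing reading has an undefined standard deviation, which `sd()` reports as `NA`.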

4. Repeat exercise 3 for males.

refm <- nhdat[Gender %in% "male",
              .(avg = mean(BPSysAve, na.rm = TRUE), st_dev = sd(BPSysAve, na.rm = TRUE)), by = AgeDecade]
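
Exercises 3 and 4 run the same summary once per gender. As an alternative sketch, both can be computed in a single call by grouping on two columns with `by = .(Gender, AgeDecade)`. The table `toy` below is hypothetical, with made-up values.

```r
library(data.table)

# Hypothetical toy data (values are made up)
toy <- data.table(
  Gender    = c("female", "female", "male", "male", "male"),
  AgeDecade = c(" 20-29", " 20-29", " 20-29", " 30-39", " 30-39"),
  BPSysAve  = c(108, 112, 118, 120, 124)
)

# One row per (Gender, AgeDecade) combination present in the data
both <- toy[, .(average = mean(BPSysAve, na.rm = TRUE),
                standard_deviation = sd(BPSysAve, na.rm = TRUE)),
            by = .(Gender, AgeDecade)]
both
```

The per-gender tables can then be recovered by filtering, for example `both[Gender == "male"]`.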

5. For males aged 40-49, compare systolic blood pressure across race as reported in the Race1 variable. Order the resulting table from lowest to highest average systolic blood pressure.

refx <- nhdat[AgeDecade %in% " 40-49" & Gender %in% "male",
              .(avg = mean(BPSysAve, na.rm = TRUE)), by = Race1]
refx[order(avg)]
##       Race1      avg
##      <fctr>    <num>
## 1:    White 119.9188
## 2:    Other 120.4000
## 3: Hispanic 121.6098
## 4:  Mexican 121.8500
## 5:    Black 125.8387
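
The summary and the ordering can also be written as one chained expression, since a data.table call returns a data.table that can be indexed again. A minimal sketch on a made-up table (`toy` is hypothetical):

```r
library(data.table)

# Hypothetical toy data (values are made up)
toy <- data.table(
  Race1    = c("White", "White", "Black", "Black", "Other"),
  BPSysAve = c(118, 122, 124, 128, 121)
)

# Summarize by group, then order the summary by its avg column
by_race <- toy[, .(avg = mean(BPSysAve, na.rm = TRUE)), by = Race1][order(avg)]
by_race
```

Using `order(-avg)` in the second bracket would sort from highest to lowest instead.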