Section 6.5
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(dslabs)
library(NHANES)
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
For these exercises, we will be using the NHANES data.
library(NHANES)
data(NHANES)
1. We will provide some basic facts about blood pressure. First, let’s
select a group to set the standard. We will use 20-to-29-year-old
females. AgeDecade is a categorical variable with these
ages. Note that the category is coded like " 20-29", with a space in
front! Use the data.table package to compute the
average and standard deviation of systolic blood pressure as saved in
the BPSysAve variable. Save the result to a variable
called ref.
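data(NHANES)
# setDT() converts NHANES to a data.table by reference (no copy is made)
nhdat <- setDT(NHANES)
# Mean and SD of systolic blood pressure for females aged 20-29
ref <- nhdat[AgeDecade %in% " 20-29" & Gender %in% "female",
             .(average = mean(BPSysAve, na.rm = TRUE),
               standard_deviation = sd(BPSysAve, na.rm = TRUE))]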
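The leading space matters because the filter is an exact match against the factor level. A minimal sketch on a made-up table (`toy` and its values are hypothetical, not NHANES data) shows the difference:

```r
library(data.table)

# Hypothetical toy table that mimics the leading-space coding of AgeDecade
toy <- data.table(
  AgeDecade = factor(c(" 20-29", " 20-29", " 30-39")),
  BPSysAve  = c(110, 106, NA)
)

# No level equals "20-29" without the space, so no rows match
n_without_space <- toy[AgeDecade %in% "20-29", .N]

# The exact level " 20-29" matches the first two rows
n_with_space <- toy[AgeDecade %in% " 20-29", .N]

# trimws() removes the padding, so the unpadded string matches as well
n_trimmed <- toy[trimws(as.character(AgeDecade)) %in% "20-29", .N]
```

Running `levels(nhdat$AgeDecade)` on the real data is a quick way to check how the categories are actually spelled before filtering.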
2. Report the min and max values for the same group.
refmm <- nhdat[AgeDecade %in% " 20-29" & Gender %in% "female",
               .(minBP = min(BPSysAve, na.rm = TRUE),
                 maxBP = max(BPSysAve, na.rm = TRUE))]
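The summaries above all pass `na.rm = TRUE`. A small sketch with made-up numbers (the vector `bp` is hypothetical, not taken from NHANES) shows what that argument changes when a value is missing:

```r
# Hypothetical blood pressure readings with one missing value
bp <- c(110, NA, 106)

# Without na.rm, a single NA makes the summary NA
mean_with_na <- mean(bp)

# With na.rm = TRUE, the NA is dropped before summarizing
mean_no_na <- mean(bp, na.rm = TRUE)
min_no_na  <- min(bp, na.rm = TRUE)
max_no_na  <- max(bp, na.rm = TRUE)
```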
3. Compute the average and standard deviation for females, but for
each age group separately rather than a selected decade as in question
1. Note that the age groups are defined by AgeDecade.
reff <- nhdat[Gender %in% "female",
              .(average = mean(BPSysAve, na.rm = TRUE),
                standard_deviation = sd(BPSysAve, na.rm = TRUE)),
              by = AgeDecade]
reff
##    AgeDecade   average standard_deviation
##       <fctr>     <num>              <num>
## 1:     40-49 115.49385          14.530054
## 2:     10-19 104.27466           9.461431
## 3:     50-59 121.84245          16.179333
## 4:       0-9  99.95041           9.071798
## 5:     60-69 127.17787          17.125713
## 6:     20-29 108.42243          10.146681
## 7:     30-39 111.25512          12.314790
## 8:       70+ 133.51652          19.841781
## 9:      <NA> 141.54839          22.908521
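With `by`, the groups appear in the order they are first encountered in the data, which is why the decades above are not sorted. One option, sketched here on a made-up table (`toy` is hypothetical, not NHANES data), is data.table's `keyby` argument, which groups the same way but returns the result sorted by the grouping column:

```r
library(data.table)

# Hypothetical toy table with two age groups in unsorted order
toy <- data.table(
  AgeDecade = factor(c(" 40-49", " 10-19", " 40-49", " 10-19")),
  BPSysAve  = c(120, 100, 110, NA)
)

# by: groups come back in order of first appearance
by_first_seen <- toy[, .(average = mean(BPSysAve, na.rm = TRUE)), by = AgeDecade]

# keyby: same summary, but sorted by AgeDecade
by_sorted <- toy[, .(average = mean(BPSysAve, na.rm = TRUE)), keyby = AgeDecade]
```

Applied to the exercise, this would mean replacing `by=AgeDecade` with `keyby=AgeDecade`; the summary values are unchanged and only the row order differs.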
4. Repeat exercise 3 for males.
refm <- nhdat[Gender %in% "male",
              .(avg = mean(BPSysAve, na.rm = TRUE),
                st_dev = sd(BPSysAve, na.rm = TRUE)),
              by = AgeDecade]
5. For males aged 40-49, compare average systolic blood
pressure across race as reported in the Race1 variable.
Order the resulting table from lowest to highest average systolic blood
pressure.
refx <- nhdat[AgeDecade %in% " 40-49" & Gender %in% "male",
              .(avg = mean(BPSysAve, na.rm = TRUE)),
              by = Race1]
refx[order(avg)]
##       Race1      avg
##      <fctr>    <num>
## 1:    White 119.9188
## 2:    Other 120.4000
## 3: Hispanic 121.6098
## 4:  Mexican 121.8500
## 5:    Black 125.8387
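The summarize-then-sort steps can also be written as one chained expression, since a data.table query returns a data.table that can be indexed again. A minimal sketch on a made-up table (`toy`, its group labels, and its values are hypothetical, not NHANES data):

```r
library(data.table)

# Hypothetical toy table with three groups and one missing reading
toy <- data.table(
  Race1    = c("A", "B", "A", "B", "C"),
  BPSysAve = c(130, 118, 126, NA, 121)
)

# First [] computes the group means, second [] sorts them in ascending order
ranked <- toy[, .(avg = mean(BPSysAve, na.rm = TRUE)), by = Race1][order(avg)]

# order(-avg) would sort from highest to lowest instead
ranked_desc <- toy[, .(avg = mean(BPSysAve, na.rm = TRUE)), by = Race1][order(-avg)]
```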