HW 2, Due: Friday, Feb. 11

NHANES Data

ISLBS: Chapter 1/2 Exercises.

Collaborators (for this assignment): Tim Mckay

NHANES

The National Health and Nutrition Examination Survey (NHANES) is a survey conducted annually by the US National Center for Health Statistics (NCHS). Although the original survey design oversamples certain subpopulations, the data have been reweighted to undo the effects of oversampling and can be treated as if they were a simple random sample from the American population.

# begin by installing (if needed) and loading the NHANES package
#install.packages('NHANES')
library('NHANES')
data(NHANES)
?'NHANES'
## starting httpd help server ... done

1) Using the ? on the NHANES data, how many observations and variables are in this data? Provide descriptions of at least 3 variables.

The NHANES data frame has 10,000 observations and 76 variables. (The figure of 20,293 observations on the help page refers to the raw survey data, NHANESraw.)

  1. AgeMonths gives the age in months of the study participant at screening. It is reported for participants aged 0 to 79 years in the 2009 to 2010 data and for participants aged 0 to 2 years in the 2011 to 2012 data.

  2. Race1 gives the reported race of the study participant: Mexican, Hispanic, White, Black, or Other.

  3. HomeRooms gives the number of rooms in the participant's home, counting the kitchen but not the bathroom.

2)

  1. Describe in words the distribution of ages for the study participants.

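The ages of the study participants range from 0 to 80 years. According to the help page, participants aged 80 or older are recorded as 80, so the top of the range is a cap and not an exact age.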

  2. Using numerical and graphical summaries, describe the distribution of heights among study participants in terms of inches. Note that 1 centimeter is approximately 0.39 inches.

Recorded heights range from 83.6 cm (about 32.6 inches) to 200.4 cm (about 78.2 inches).

# minimum and maximum recorded height, in centimeters (Height is column 20)
range(NHANES$Height, na.rm = TRUE)
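
The question also asks for numerical summaries in inches. A minimal sketch of one way to get them is below, using the 0.39 conversion factor from the question. The helper `summarize_height_in` and the toy heights are assumptions made for illustration; they are not part of the NHANES package or the assignment.

```r
# summarize_height_in() is a hypothetical helper, not part of the NHANES package.
# It converts heights from centimeters to inches and returns a few summaries.
summarize_height_in <- function(height_cm, cm_to_in = 0.39) {
  height_in <- height_cm * cm_to_in
  c(
    min    = min(height_in, na.rm = TRUE),
    median = median(height_in, na.rm = TRUE),
    mean   = mean(height_in, na.rm = TRUE),
    max    = max(height_in, na.rm = TRUE),
    iqr    = IQR(height_in, na.rm = TRUE)
  )
}

# Made-up heights in centimeters, used only to show the call.
toy_height_cm <- c(83.6, 150.0, 165.0, 172.5, 200.4, NA)
height_summary <- summarize_height_in(toy_height_cm)
print(round(height_summary, 1))
```

With the package loaded, the same call on the real data would be `summarize_height_in(NHANES$Height)`, and a histogram of `NHANES$Height * 0.39` would supply the graphical summary.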

  3. Use the following code to draw a random sample of 200 participants from the entire dataset. Using the random sample, nhanes.samp, investigate at which age people generally reach their adult height. Is it possible to do the same for weight; why or why not?

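In the sample, height levels off at around age 20, so adult height is reached by roughly that age. Weight also stops rising steeply at around age 20, but it is far less constant than height and keeps fluctuating among adults, so an age at which people reach their "adult weight" cannot be identified in the same way.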

#draw a random sample
library(ggplot2)
set.seed(5011)
row.num = sample(1:nrow(NHANES), 200, replace = FALSE)
nhanes.samp = NHANES[row.num, ]
ggplot(data = nhanes.samp, aes(x = Age, y= Height)) + 
  geom_point()
## Warning: Removed 5 rows containing missing values (`geom_point()`).

ggplot(data = nhanes.samp, aes(x = Age, y= Weight)) + 
  geom_point()

3)

# keep only participants aged 25 or older
NHANES25_rows <- which(NHANES$Age >= 25)
NHANES25 <- NHANES[NHANES25_rows,]
ggplot(data = NHANES25, aes(x = Education)) + 
  geom_bar() 
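
The bar chart shows counts, but the proportions asked about in question 3 can also be computed directly from the Education column. The sketch below is one way to do that. It assumes that "with a high school degree" means an education level of High School, Some College, or College Grad; the helper `college_grad_props` and the toy data are hypothetical and only illustrate the arithmetic.

```r
# college_grad_props() is a hypothetical helper, not part of the NHANES package.
# Assumption: "with a high school degree" covers the levels
# "High School", "Some College", and "College Grad".
college_grad_props <- function(education) {
  education <- as.character(education[!is.na(education)])  # drop missing values
  is_grad <- education == "College Grad"
  has_hs  <- education %in% c("High School", "Some College", "College Grad")
  c(
    overall       = mean(is_grad),               # graduates / all adults
    among_hs_plus = sum(is_grad) / sum(has_hs)   # graduates / adults with a HS degree
  )
}

# Made-up education levels, used only to show the call.
toy_education <- factor(c("8th Grade", "High School", "Some College",
                          "College Grad", "College Grad", NA))
toy_props <- college_grad_props(toy_education)
print(toy_props)
```

With the data loaded, `college_grad_props(NHANES25$Education)` would return both proportions for participants aged 25 or older.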

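  1. What proportion of Americans at least 25 years of age are college graduates?

Of the 10,000 observations, about 2,000 are college graduates aged 25 or older (read from the bar chart). Dividing that count by the number of participants aged 25 or older gives the proportion.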

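  2. What proportion of Americans at least 25 years of age with a high school degree are college graduates?

Among participants aged 25 or older, about 1,200 list a high school degree as their education level and about 2,000 are college graduates (read from the bar chart). The proportion asked for is the number of college graduates divided by the number of participants with at least a high school degree.

4)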

  1. Calculate the median and interquartile range of the distribution of the variable Poverty. Write a sentence explaining the median in the context of these data.

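The median of Poverty is 2.70 and the interquartile range is 4.71 - 1.24 = 3.47. Poverty is the ratio of family income to the poverty guideline, so a median of 2.70 means that half of the participants live in families whose income is at most 2.70 times the poverty guideline.

# quantile() returns the minimum, 1st quartile, median, 3rd quartile, and maximum
quantile(NHANES$Poverty, na.rm=TRUE)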
##   0%  25%  50%  75% 100% 
## 0.00 1.24 2.70 4.71 5.00
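
The interquartile range is the 3rd quartile minus the 1st quartile, which can be read off the output above or computed directly. The sketch below shows the calculation. The helper `median_and_iqr` is hypothetical, and the toy vector is simply the five values printed above, chosen so that its quartiles match the reported ones; it is not the real Poverty column.

```r
# median_and_iqr() is a hypothetical helper that returns the quartiles,
# the median, and the interquartile range (Q3 - Q1) of a numeric vector.
median_and_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(q1 = q[1], median = q[2], q3 = q[3], iqr = q[3] - q[1])
}

# Toy vector: the five quantile values printed above plus one missing value.
toy_poverty <- c(0.00, 1.24, 2.70, 4.71, 5.00, NA)
toy_summary <- median_and_iqr(toy_poverty)
print(toy_summary)

# The built-in IQR() gives the same interquartile range.
print(IQR(toy_poverty, na.rm = TRUE))
```

With the data loaded, `median_and_iqr(NHANES$Poverty)` would apply the same calculation to the full column.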

  2. Compare the distribution of Poverty across each group in Education among adults (defined as individuals 25 years of age or older). Describe any trends or interesting observations.

Among individuals 25 years of age or older, the typical Poverty value (the ratio of family income to the poverty guideline, read from the boxplots) rises steadily with education:

With only an 8th grade education, it is around 1.2.
With only a 9th-11th grade education, it is around 1.5.
With only a high school degree, it is around 2.3.
With only some college education, it is around 3.2.
With a college degree, it is around 5, the largest value recorded for Poverty.

NHANES25_rows <- which(NHANES$Age >= 25)
NHANES25 <- NHANES[NHANES25_rows,]
ggplot(data = NHANES25, aes(x = Education, y= Poverty)) + 
  geom_boxplot() 
## Warning: Removed 490 rows containing non-finite values (`stat_boxplot()`).
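
The boxplots can be backed up with a numerical summary for each education group. The sketch below computes the median of one variable within each level of another using `tapply`. The helper `median_by_group` and the toy data frame are hypothetical and only illustrate the call.

```r
# median_by_group() is a hypothetical helper: it returns the median of `value`
# within each level of `group`, ignoring missing values.
median_by_group <- function(value, group) {
  tapply(value, group, median, na.rm = TRUE)
}

# Made-up data, used only to show the call.
toy <- data.frame(
  Education = factor(c("High School", "High School",
                       "College Grad", "College Grad", "College Grad")),
  Poverty   = c(1.0, 3.0, 4.0, 5.0, NA)
)
group_medians <- median_by_group(toy$Poverty, toy$Education)
print(group_medians)
```

With the data loaded, `median_by_group(NHANES25$Poverty, NHANES25$Education)` would give one median per education level for adults aged 25 or older.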

ISLBS 1.28

ISLBS 1.30

ISLBS 1.31

ISLBS 2.2

ISLBS 2.3